This PR does the following:
1. Change the nullable mode of `from_unixtime` and `parse_url` from `DEPEND_ON_ARGUMENT` to `ALWAYS_NULLABLE`; this nullable configuration was missing previously.
2. Add new interfaces for the original `NullableMode`. This change is inspired by Scala's mix-in traits: it lets us quickly understand a function's traits without reading lengthy procedural code, and saves us from writing boilerplate, e.g. `class Substring extends ScalarFunction implements ImplicitCastInputTypes, PropagateNullable`. These are the interfaces (see the sketch after this list):
- PropagateNullable: equivalent to `NullableMode.DEPEND_ON_ARGUMENT`
- AlwaysNullable: equivalent to `NullableMode.ALWAYS_NULLABLE`
- AlwaysNotNullable: equivalent to `NullableMode.ALWAYS_NOT_NULLABLE`
- ComputeNullable: equivalent to `NullableMode.CUSTOM` (covers the remaining cases)
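To make the mix-in style concrete, here is a minimal Java sketch; apart from the four interface names above, every class and method in it is an illustrative assumption, not the actual Doris code:
```java
// Marker interfaces declare the nullable trait right in the class declaration.
interface PropagateNullable {}

interface AlwaysNullable {}

interface AlwaysNotNullable {}

// CUSTOM mode still needs a computed answer, so this trait carries a method.
interface ComputeNullable {
    boolean nullable();
}

// Hypothetical base class standing in for the real ScalarFunction.
abstract class ScalarFunction {}

// Reading the declaration alone tells us the function's nullable behavior.
class Substring extends ScalarFunction implements PropagateNullable {}

class FromUnixtime extends ScalarFunction implements AlwaysNullable {}
```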
3. Add `GenerateScalarFunction` to generate Nereids-style function code from legacy functions. It does not actually generate any new function class yet, because the function traits are not ready for use: I still need to add traits for the legacy functions' `CompareMode` and `NonDeterministic`, following the same idea as `ComputeNullable`.
* [tracing] Support OpenTelemetry collector.
1. Support exporting traces to multiple distributed tracing systems via the collector.
2. Support using the collector to process traces.
In #9755, we split the plan into plan & operator, but in subsequent development we found that the rules became complex and counter-intuitive:
1. We must create an operator instance, then wrap it into a plan according to the operator type.
2. The relational algebra (operator) does not contain children.
e.g.
```java
logicalProject().then(project -> {
    List<NamedExpression> boundSlots =
            bind(project.operator.getProjects(), project.children(), project);
    LogicalProject op = new LogicalProject(flatBoundStar(boundSlots));
    // wrap a plan
    return new LogicalUnaryPlan(op, project.child());
})
```
After combining operator and plan, the code becomes:
```java
logicalProject().then(project -> {
    List<NamedExpression> boundSlots =
            bind(project.getProjects(), project.children(), project);
    return new LogicalProject(flatBoundStar(boundSlots), project.child());
})
```
Originally, we thought the split of plan & operator would be convenient for `Memo.copyIn()`, because Memo does not know how to re-create a plan (assembling child plans from the children groups) from the plan type alone, so every plan must provide a `withChildren()` abstract method to assemble its children. The fewer plan types, the lower the code cost (logical/physical crossed with leaf/unary/binary gives about 6 plan classes, with no concrete plans such as LogicalAggregatePlan).
But that convenience had the negative effect of being difficult to understand: people had to learn the concept before they could develop new rules, and the rules became ugly. So we combine plan & operator to make rules as simple as possible; the trade-off is that we must override the withXxx methods in every concrete plan, e.g. LogicalAggregate and PhysicalHashJoin, as sketched below.
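For illustration, a minimal sketch of that trade-off; all names below are assumptions standing in for the real Doris classes:
```java
import java.util.Collections;
import java.util.List;

// After the merge, every concrete plan is its own operator and must know how
// to re-create itself with new children, so Memo.copyIn can re-assemble plans
// from the children groups without knowing each concrete type.
interface PlanSketch {
    List<PlanSketch> children();

    PlanSketch withChildren(List<PlanSketch> children);
}

class LogicalFilterSketch implements PlanSketch {
    private final String predicate;
    private final PlanSketch child;

    LogicalFilterSketch(String predicate, PlanSketch child) {
        this.predicate = predicate;
        this.child = child;
    }

    @Override
    public List<PlanSketch> children() {
        return Collections.singletonList(child);
    }

    @Override
    public PlanSketch withChildren(List<PlanSketch> children) {
        // Keep this node's own state, swap in the child from the memo group.
        return new LogicalFilterSketch(predicate, children.get(0));
    }
}
```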
This PR supports:
1. Remove the generic type from operator, and remove some NODE_TYPE generics from plan and expression.
2. Refactor the Plan and NODE_TYPE generic types.
3. Support child class matching by TypePattern.
4. Analyze the operator code and generate patterns, which makes it easy to create rules.
e.g.
```java
class LogicalJoin extends LogicalBinaryOperator {}
class PhysicalFilter extends PhysicalUnaryOperator {}
```
will generate the following code:
```java
interface GeneratedPatterns extends Patterns {
    default PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan> logicalJoin() {
        return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan>(
                new TypePattern(LogicalJoin.class, Pattern.FIXED, Pattern.FIXED),
                defaultPromise()
        );
    }

    default <C1 extends Plan, C2 extends Plan>
            PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>
            logicalJoin(PatternDescriptor<C1, Plan> child1, PatternDescriptor<C2, Plan> child2) {
        return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>(
                new TypePattern(LogicalJoin.class, child1.pattern, child2.pattern),
                defaultPromise()
        );
    }

    default PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan> physicalFilter() {
        return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan>(
                new TypePattern(PhysicalFilter.class, Pattern.FIXED),
                defaultPromise()
        );
    }

    default <C1 extends Plan>
            PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>
            physicalFilter(PatternDescriptor<C1, Plan> child1) {
        return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>(
                new TypePattern(PhysicalFilter.class, child1.pattern),
                defaultPromise()
        );
    }
}
```
With this in place, we no longer have to hand-write patterns for new operators.
This feature uses JSR 269 (annotation processing) to do work at compile time, and ANTLR4 to analyze the source code of `Operator` classes, from which we generate the corresponding patterns.
Pattern generation steps:
1. The maven-compiler-plugin in pom.xml compiles fe-core in three passes. The first pass compiles `PatternDescribable.java` and `PatternDescribableProcessor.java`.
2. The second pass compiles `PatternDescribableProcessPoint.java` with the annotation processor `PatternDescribableProcessor` enabled; the processor receives the compile event and learns that the `PatternDescribableProcessPoint` class carries the `PatternDescribable` annotation.
3. `PatternDescribableProcessor` does not process `PatternDescribableProcessPoint` itself; instead it finds all Java files under the `operatorPath` specified in pom.xml and parses them into Java ASTs (abstract syntax trees), as sketched below.
4. `PatternDescribableProcessor` collects the Java ASTs and uses `PatternGeneratorAnalyzer` to analyze them, finds the child classes of `PlanOperator`, and generates `GeneratedPatterns.java` from the ASTs.
5. The third pass compiles `GeneratedPatterns.java` together with the other Java files.
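As a rough sketch of the JSR 269 hook (the annotation's package and everything inside `process` are assumptions; the real processor parses the operator sources with ANTLR4 before writing the file):
```java
import java.io.IOException;
import java.io.Writer;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;
import javax.tools.Diagnostic;
import javax.tools.JavaFileObject;

// Minimal JSR 269 skeleton: triggered when the second compile pass sees the
// PatternDescribable annotation, it emits a source file that the third pass
// will then compile.
@SupportedAnnotationTypes("org.apache.doris.nereids.pattern.generator.PatternDescribable")
@SupportedSourceVersion(SourceVersion.RELEASE_8)
public class PatternProcessorSketch extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        if (annotations.isEmpty()) {
            return false; // nothing to do in this round
        }
        try {
            JavaFileObject file = processingEnv.getFiler()
                    .createSourceFile("org.apache.doris.nereids.pattern.GeneratedPatterns");
            try (Writer writer = file.openWriter()) {
                // The real processor walks the operator ASTs here and writes
                // one pattern factory method per operator class.
                writer.write("// generated pattern methods\n");
            }
        } catch (IOException e) {
            processingEnv.getMessager().printMessage(Diagnostic.Kind.ERROR, e.getMessage());
        }
        return true;
    }
}
```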
Currently, we use `UtFrameUtils` to start an FE server in the FE unit tests.
Each test class has to do some initialization and cleanup with the JUnit4
`@BeforeClass` and `@AfterClass` annotations, which is redundant and boring.
Besides, almost all the APIs in `UtFrameUtils` take a `ConnectContext` parameter, which is not easy to use.
This PR proposes an inheritance-based approach: wrap all the common logic in a base class `TestWithFeService`,
leveraging the JUnit5 `@BeforeAll` and `@AfterAll` annotations to narrow the setup and cleanup lifecycle down to each test class instance.
A derived concrete test class can then directly use the utility methods inherited from the base class,
without calling a util class and passing a `ConnectContext` argument.
`UtFrameUtils` and `DorisAssert` are marked as deprecated. We can remove these two classes
if this refactor works well for a while.
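A rough sketch of the lifecycle this enables (class and method names are assumptions, not the actual `TestWithFeService` API); note that `@TestInstance(PER_CLASS)` is what lets JUnit5 run `@BeforeAll`/`@AfterAll` on instance methods, once per test class:
```java
import org.junit.jupiter.api.AfterAll;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.TestInstance;

// The base class owns the FE server lifecycle, once per test class instance.
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
abstract class TestWithFeServiceSketch {
    @BeforeAll
    void setUp() throws Exception {
        // start a single FE server and create the shared ConnectContext here
    }

    @AfterAll
    void tearDown() {
        // stop the FE server and clean up runtime directories here
    }
}

// A derived test inherits the lifecycle and helper methods; no UtFrameUtils
// calls and no ConnectContext argument passing.
class MyQueryTestSketch extends TestWithFeServiceSketch {
    // @Test methods would use inherited helpers directly
}
```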
Nereids (new optimizer) code base
Nereids is the new query planner for Doris. It includes three main parts: parser, analyzer, and optimizer.
The parser, generated by ANTLR4, transforms SQL into a tree-structured logical plan. Analysis and optimization are performed on this logical plan tree. Each transformation is defined as a rule, and rules are applied to the logical plan using pattern matching. The implementation of the optimizer follows the approach of the Cascades paper.
In #8319, I removed the mysql-connector-java dependency because of license incompatibility.
But we need a MySQL-compatible driver for the HTTP query API, so I chose mariadb-java-client,
which is under the LGPL.
1. mysql-connector-java
mysql-connector-java is under the GPLv2 license, which is not compatible with APLv2, and Doris does not use it.
2. org.json
org.json is under the JSON license, which is not compatible with APLv2. I replaced it with `json-simple`.
This bug was introduced in #8112.
Also, I changed the `grpc-netty` dependency to `grpc-netty-shaded` to avoid a dependency conflict:
```
java.lang.NoSuchMethodError: io.netty.buffer.PooledByteBufAllocator.
```
Closes related issue #7389.
Support creating Iceberg external tables in Doris.
This is the first step toward supporting Iceberg external tables.
### Create Iceberg external table
This PR provides two ways to create Iceberg external tables. Neither way requires explicitly specifying column definitions; Doris converts them automatically based on Iceberg's column definitions.
1. Create an Iceberg external table directly
```sql
CREATE [EXTERNAL] TABLE table_name
ENGINE = ICEBERG
[COMMENT "comment"]
PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.table" = "iceberg_table_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
);
```
2. Create an Iceberg database and automatically create all the tables under that db.
```sql
CREATE DATABASE db_name
[COMMENT "comment"]
PROPERTIES (
    "iceberg.database" = "iceberg_db_name",
    "iceberg.hive.metastore.uris" = "thrift://192.168.0.1:9083",
    "iceberg.catalog.type" = "HIVE_CATALOG"
);
```
### Show table creation
1. For an individual table, you can view its creation statement with `SHOW CREATE TABLE` (see `help show create table`).
```sql
mysql> show create table iceberg_db.logs_1;
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| logs_1 | CREATE TABLE `logs_1` (
`level` varchar(-1) NOT NULL COMMENT "null",
`event_time` datetime NOT NULL COMMENT "null",
`message` varchar(-1) NOT NULL COMMENT "null"
) ENGINE=ICEBERG
COMMENT "ICEBERG"
PROPERTIES (
"iceberg.database" = "doris",
"iceberg.table" = "logs_1",
"iceberg.hive.metastore.uris" = "thrift://10.10.10.10:9087",
"iceberg.catalog.type" = "HIVE_CATALOG"
) |
+--------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2. For an Iceberg database, you can view the creation records with `SHOW TABLE CREATION` (see `help show table creation`).
```sql
mysql> show table creation from iceberg_db;
+--------+---------+---------------------+---------------------------------------------------------+
| Table | Status | Create Time | Error Msg |
+--------+---------+---------------------+---------------------------------------------------------+
| logs | fail | 2021-12-14 13:50:10 | Cannot convert unknown type to Doris type: list<string> |
| logs_1 | success | 2021-12-14 13:50:10 | |
+--------+---------+---------------------+---------------------------------------------------------+
2 rows in set (0.00 sec)
```
This is a new syntax that shows table creation records in an Iceberg database.
Syntax:
```sql
SHOW TABLE CREATION [FROM db] [LIKE mask]
```
Users can directly query data in Hive tables from Doris, and can use joins for complex queries, without laboriously importing the data from Hive.
The main changes are listed below:
FE:
- Extend HiveScanNode from BrokerScanNode.
- HiveMetaStoreClientHelper communicates with Hive and HDFS (see the sketch after this list).
BE:
- Treat HiveScanNode as BrokerScanNode, and treat HiveTable as BrokerTable.
- broker_scanner.cpp: support reading columns from the HDFS path.
- orc_scanner.cpp: support reading HDFS files.
POM:
- Add hive.version=2.3.7, hive-metastore and hive-exec.
- Add hadoop.version=2.8.0, hadoop-hdfs.
- Upgrade commons-lang to fix incompatibility with Java 9 and later.
Thrift:
- Add THiveTable.
- Add read_by_column_def in TBrokerRangeDesc.
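For context, a hedged sketch of the kind of call `HiveMetaStoreClientHelper` wraps, assuming hive-metastore 2.3.7 on the classpath; the URI, database, and table names are placeholders:
```java
import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
import org.apache.hadoop.hive.metastore.api.Table;

public class HiveMetaStoreSketch {
    public static void main(String[] args) throws Exception {
        HiveConf conf = new HiveConf();
        // Point the client at the metastore thrift endpoint (placeholder URI).
        conf.set("hive.metastore.uris", "thrift://192.168.0.1:9083");
        HiveMetaStoreClient client = new HiveMetaStoreClient(conf);
        // Fetch table metadata; FE needs this to plan the scan over HDFS files.
        Table table = client.getTable("hive_db", "hive_tbl");
        System.out.println(table.getSd().getLocation());
        client.close();
    }
}
```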
Add a `use_path_style` property for S3.
Upgrade hadoop-common and hadoop-aws to 2.8.0 to support the path-style property.
Fix some S3 URI bugs.
Add some logs for tracing the load process.
This commit is the first stage of #6287.
In this commit, we support:
1. Sync job
   1) Creating sync jobs and data channels in FE.
   2) Pausing sync jobs.
   3) Resuming sync jobs.
   4) Stopping sync jobs.
   5) Showing sync jobs.
2. Canal
   1) Subscribing to and fetching canal binlog data when a sync job is created.
1. Add a new dynamic partition property `create_history_partition`. If set to true, Doris will create all partitions from `start` to `end`.
2. Add a new FE config `max_dynamic_partition_num` to limit the number of partitions created when creating a single table.