In some cases, the query mem tracker does not exist in the BE when transmitting a block, which results in a null pointer when getting the query mem tracker in brpc `transmit_block`.
Support querying Hive tables on S3. Pass the AK/SK, region, and S3 endpoint to the Hive table when creating the external table.
Example CREATE TABLE statement:
```sql
CREATE TABLE `region_s3` (
    `r_regionkey` integer NOT NULL,
    `r_name` char(25) NOT NULL,
    `r_comment` varchar(152)
)
ENGINE=hive
PROPERTIES (
    "database"="default",
    "table"="region_s3",
    "hive.metastore.uris"="thrift://127.0.0.1:9083",
    "AWS_ACCESS_KEY"="YOUR_ACCESS_KEY",
    "AWS_SECRET_KEY"="YOUR_SECRET_KEY",
    "AWS_ENDPOINT"="s3.us-east-1.amazonaws.com",
    "AWS_REGION"="us-east-1"
);
```
* If we disable AVX2 via the config USE_AVX2=0, we need to build croaringbitmap with ROARING_DISABLE_AVX=ON.
* update to trigger regression test again
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
* Compact small data imports quickly (#9791)
1. Merge small-version rowsets as soon as possible to increase the import frequency for small-version data.
2. A small version means a rowset whose number of rows is less than `config::small_compaction_rowset_rows` (default 1000).
This PR supports:
1. Removing the generic type from Operator, and removing some NODE_TYPE parameters from Plan and Expression.
2. Refactoring the Plan and NODE_TYPE generic types.
3. Supporting child-class matching via TypePattern.
4. Analyzing the Operator code and generating patterns, which makes it easy to create rules.
e.g.
```java
class LogicalJoin extends LogicalBinaryOperator {}
class PhysicalFilter extends PhysicalUnaryOperator {}
```
will generate the following code:
```java
interface GeneratedPatterns extends Patterns {
    default PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan> logicalJoin() {
        return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan>(
                new TypePattern(LogicalJoin.class, Pattern.FIXED, Pattern.FIXED),
                defaultPromise()
        );
    }

    default <C1 extends Plan, C2 extends Plan>
            PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>
            logicalJoin(PatternDescriptor<C1, Plan> child1, PatternDescriptor<C2, Plan> child2) {
        return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>(
                new TypePattern(LogicalJoin.class, child1.pattern, child2.pattern),
                defaultPromise()
        );
    }

    default PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan> physicalFilter() {
        return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan>(
                new TypePattern(PhysicalFilter.class, Pattern.FIXED),
                defaultPromise()
        );
    }

    default <C1 extends Plan>
            PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>
            physicalFilter(PatternDescriptor<C1, Plan> child1) {
        return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>(
                new TypePattern(PhysicalFilter.class, child1.pattern),
                defaultPromise()
        );
    }
}
```
After that, we no longer have to add a pattern by hand for each new operator.
This feature uses JSR-269 annotation processing at compile time, together with ANTLR4, to analyze the source code of `Operator` and generate the corresponding patterns.
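For example, a rule class can compose matchers from the generated factory methods instead of hand-writing `TypePattern`s. A minimal sketch using only the generated methods shown above; the `MyRulePatterns` name is hypothetical, the surrounding Rule plumbing is omitted, and it assumes `LogicalBinaryPlan` is a subtype of `Plan`:
```java
// Hypothetical helper interface for illustration; only the generated factory methods are used.
interface MyRulePatterns extends GeneratedPatterns {
    // Match a LogicalJoin whose two children are themselves LogicalJoins,
    // composed from logicalJoin() without writing any TypePattern by hand.
    default PatternDescriptor<
            LogicalBinaryPlan<LogicalJoin,
                    LogicalBinaryPlan<LogicalJoin, Plan, Plan>,
                    LogicalBinaryPlan<LogicalJoin, Plan, Plan>>,
            Plan> joinOfJoins() {
        return logicalJoin(logicalJoin(), logicalJoin());
    }
}
```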
Pattern generation steps:
1. The maven-compiler-plugin in pom.xml compiles fe-core in three passes. The first pass compiles `PatternDescribable.java` and `PatternDescribableProcessor.java`.
2. The second pass compiles `PatternDescribableProcessPoint.java` and enables the annotation processor `PatternDescribableProcessor`. The processor receives the event and knows that the `PatternDescribableProcessPoint` class carries the `PatternDescribable` annotation (see the sketch after this list).
3. `PatternDescribableProcessor` does not process `PatternDescribableProcessPoint` itself; instead it finds all Java files under the `operatorPath` specified in pom.xml and parses them into Java ASTs (abstract syntax trees).
4. `PatternDescribableProcessor` collects the Java ASTs and uses `PatternGeneratorAnalyzer` to analyze them, finds the child classes of `PlanOperator`, and generates `GeneratedPatterns.java` from the ASTs.
5. The third pass compiles `GeneratedPatterns.java` together with the other Java files.
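For reference, a minimal JSR-269 sketch of the hook described above. It uses the standard `javax.annotation.processing` API, but the annotation's package name and the processor body are simplified assumptions, not the actual fe-core implementation:
```java
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.lang.model.element.TypeElement;

// The annotation type name below is assumed for illustration.
@SupportedAnnotationTypes("org.apache.doris.nereids.pattern.PatternDescribable")
public class PatternDescribableProcessorSketch extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        // Triggered in the second compile pass, when PatternDescribableProcessPoint
        // (annotated with @PatternDescribable) is compiled. The real processor then:
        //   1. walks the operator sources under the operatorPath configured in pom.xml;
        //   2. parses each file into a Java AST with ANTLR4;
        //   3. feeds the ASTs to PatternGeneratorAnalyzer and writes out GeneratedPatterns.java.
        return false; // do not claim the annotation so other processors can still see it
    }
}
```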
1. Provide an FE config to test reliability in the single-replica case when tablet scheduling is frequent.
2. According to #6063, apply most of that fix to the current code.
1. Fix a memory leak: when a load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers could not be destructed in time.
2. Fix load tasks being frequently canceled by OOM due to an inaccurate `LoadChannel` mem tracker limit, and rename the `mem limit` variable in `LoadChannel`.
3. Fix a core dump: when logging out a task mem tracker, a failed phmap erase resulted in the same tracker being logged out repeatedly.
4. Fix a deadlock: when the mem limit is exceeded in add_child_tracker, calling log_usage caused a deadlock on `_child_trackers_lock`.
5. Fix frequent log printing when the thread mem tracker limit is exceeded, which hurt readability and performance.
6. Optimize some details of the mem tracker display.
At present, Doris can only access a Hadoop cluster with Kerberos authentication enabled through the broker; Doris BE itself does not support accessing Kerberos-authenticated HDFS files.
This PR solves that problem.
When creating a Hive external table, users just need to specify the following properties to access HDFS data with Kerberos authentication enabled:
```sql
CREATE EXTERNAL TABLE t_hive (
k1 int NOT NULL COMMENT "",
k2 char(10) NOT NULL COMMENT "",
k3 datetime NOT NULL COMMENT "",
k5 varchar(20) NOT NULL COMMENT "",
k6 double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
If you want to `SELECT INTO OUTFILE` to an HDFS cluster with Kerberos authentication enabled, you can refer to the following SQL statement:
```sql
select * from test into outfile "hdfs://tmp/outfile1"
format as csv
properties
(
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
Issue Number: close #9621
Add the following physical operators: PhysicalAgg, PhysicalSort, PhysicalHashJoin.
Add the basic logic of the plan translator:
1. Add a new agg phase enum for Nereids.
2. Remove the Analyzer from PlanContext.java.
3. Implement PlanTranslator::visitPhysicalFilter.
Issue Number: close #9633
The scalar expression rewrite is implemented as a traversal using the visitor pattern.
The abstract class ExpressionVisitor contains the visit methods for all the predicates to rewrite.
We provide a rewrite rule interface, ExpressionRewriteRule. The AbstractExpressionRewriteRule class implements this interface and extends ExpressionVisitor; to implement an expression rewrite rule, directly extend AbstractExpressionRewriteRule and override the predicate-traversal methods it provides.
There are two rules to refer to: NormalizeExpressionRule and SimplifyNotExprRule.
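For illustration, a hedged sketch of what such a rule could look like; the `visitNot` method name and the `Not`, `Expression`, `ExpressionRewriteContext`, and `accept` signatures are assumptions modeled on SimplifyNotExprRule rather than the exact fe-core API. It collapses a double negation, `NOT(NOT(x)) -> x`, and leaves everything else to the default traversal:
```java
// Hypothetical rule for illustration; see SimplifyNotExprRule for the real implementation.
public class SimplifyDoubleNotRule extends AbstractExpressionRewriteRule {
    @Override
    public Expression visitNot(Not not, ExpressionRewriteContext context) {
        // Rewrite the child subtree first, then collapse NOT(NOT(x)) into x.
        Expression child = not.child().accept(this, context);
        if (child instanceof Not) {
            return ((Not) child).child();
        }
        return new Not(child);
    }
}
```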