The `Array<T>` column now consists of an `offsets` column and a `data` column, and the type of the `offsets` column is currently UInt32.
If we call array_union to merge arrays repeatedly, the accumulated array size may overflow a UInt32 offset.
So we need to widen the offset type before the `Array Data Type` release.
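A minimal sketch of why a 32-bit offset overflows when arrays are merged repeatedly (plain Java for illustration; the actual BE column code is C++, and the numbers below are hypothetical):

```java
// Illustrative only: an array column is stored as a flat `data` column plus an
// `offsets` column, where offsets[i] is the end position of row i inside `data`.
// With 32-bit offsets, repeated array_union merges can exceed 2^32 - 1 elements.
public class OffsetOverflowDemo {
    public static void main(String[] args) {
        long elementsPerRow = 1L << 22;             // ~4M elements produced by one array_union
        long rows = 1L << 11;                       // 2048 rows in the merged column
        long totalElements = elementsPerRow * rows; // 2^33 elements in total

        int offset32 = (int) totalElements;         // what a 32-bit offset would store
        System.out.println("real end offset : " + totalElements);
        System.out.println("32-bit offset   : " + Integer.toUnsignedLong(offset32)); // wrapped around
    }
}
```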
1. Fix all checkstyle warnings
2. Change all checkstyle rules to error severity
3. Remove some Javadoc rules:
a. RequireEmptyLineBeforeBlockTagGroup
b. JavadocStyle
c. JavadocParagraph
4. Suppress some rules for old code:
a. All Javadoc rules only affect Nereids
b. DeclarationOrder only affects Nereids
c. OverloadMethodsDeclarationOrder only affects Nereids
d. VariableDeclarationUsageDistance only affects Nereids
e. Suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/ColumnParser.java
f. Suppress OneTopLevelClass on org/apache/doris/load/loadv2/dpp/SparkRDDAggregator.java
g. Suppress LineLength on org/apache/doris/catalog/FunctionSet.java
h. Suppress LineLength on org/apache/doris/common/ErrorCode.java
* [Vectorized][Function] add orthogonal bitmap agg functions
save some files related to the orthogonal bitmap functions
add some files for rebase
update the functions file
* refactor the union_count function
refactor the orthogonal union count functions
* remove bool is_variadic
Support querying Hive tables on S3. Pass the AK/SK, region, and S3 endpoint to the Hive table when creating the external table.
Example CREATE TABLE SQL:
```sql
CREATE TABLE `region_s3` (
    `r_regionkey` integer NOT NULL,
    `r_name` char(25) NOT NULL,
    `r_comment` varchar(152)
)
ENGINE=hive
PROPERTIES (
    "database" = "default",
    "table" = "region_s3",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083",
    "AWS_ACCESS_KEY" = "YOUR_ACCESS_KEY",
    "AWS_SECRET_KEY" = "YOUR_SECRET_KEY",
    "AWS_ENDPOINT" = "s3.us-east-1.amazonaws.com",
    "AWS_REGION" = "us-east-1"
);
```
This PR supports:
1. Remove the generic type from Operator and remove some NODE_TYPE from Plan and Expression.
2. Refactor the Plan and NODE_TYPE generic types.
3. Support child class matching via TypePattern.
4. Analyze the Operator code and generate patterns, which makes it easy to create rules.
e.g.
```java
class LogicalJoin extends LogicalBinaryOperator {}
class PhysicalFilter extends PhysicalUnaryOperator {}
```
will generate the following code:
```java
interface GeneratedPatterns extends Patterns {
    default PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan> logicalJoin() {
        return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan>(
            new TypePattern(LogicalJoin.class, Pattern.FIXED, Pattern.FIXED),
            defaultPromise()
        );
    }

    default <C1 extends Plan, C2 extends Plan>
            PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>
            logicalJoin(PatternDescriptor<C1, Plan> child1, PatternDescriptor<C2, Plan> child2) {
        return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>(
            new TypePattern(LogicalJoin.class, child1.pattern, child2.pattern),
            defaultPromise()
        );
    }

    default PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan> physicalFilter() {
        return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan>(
            new TypePattern(PhysicalFilter.class, Pattern.FIXED),
            defaultPromise()
        );
    }

    default <C1 extends Plan>
            PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>
            physicalFilter(PatternDescriptor<C1, Plan> child1) {
        return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>(
            new TypePattern(PhysicalFilter.class, child1.pattern),
            defaultPromise()
        );
    }
}
```
With this in place, we no longer have to add a pattern by hand for each new operator.
This feature uses JSR-269 (annotation processing) to hook into compile time, and ANTLR4 to analyze the source code of each `Operator`, so that the corresponding patterns can be generated automatically.
Pattern generation steps:
1. The maven-compiler-plugin in pom.xml compiles fe-core in three rounds. The first round compiles `PatternDescribable.java` and `PatternDescribableProcessor.java`.
2. The second round compiles `PatternDescribableProcessPoint.java` with the annotation processor `PatternDescribableProcessor` enabled, so the processor receives the event and knows that the `PatternDescribableProcessPoint` class carries the `PatternDescribable` annotation.
3. `PatternDescribableProcessor` does not process `PatternDescribableProcessPoint` itself; instead, it finds all Java files under the `operatorPath` specified in pom.xml and parses them into Java ASTs (abstract syntax trees).
4. `PatternDescribableProcessor` collects the Java ASTs and uses `PatternGeneratorAnalyzer` to analyze them, finds the child classes of `PlanOperator`, and generates `GeneratedPatterns.java` from the ASTs.
5. The third round compiles `GeneratedPatterns.java` and the remaining Java files.
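For readers unfamiliar with JSR-269, the following is a minimal, hypothetical sketch of how such a processor can be wired up; the package and class names here are illustrative, and the real `PatternDescribableProcessor` additionally parses the operator sources with ANTLR4, which is omitted:

```java
// Hypothetical skeleton of a JSR-269 annotation processor; only the
// javax.annotation.processing API is real, everything else is illustrative.
import java.io.Writer;
import java.util.Set;
import javax.annotation.processing.AbstractProcessor;
import javax.annotation.processing.RoundEnvironment;
import javax.annotation.processing.SupportedAnnotationTypes;
import javax.annotation.processing.SupportedSourceVersion;
import javax.lang.model.SourceVersion;
import javax.lang.model.element.TypeElement;

@SupportedAnnotationTypes("org.example.PatternDescribable")
@SupportedSourceVersion(SourceVersion.RELEASE_8)
public class PatternProcessorSketch extends AbstractProcessor {
    @Override
    public boolean process(Set<? extends TypeElement> annotations, RoundEnvironment roundEnv) {
        if (annotations.isEmpty()) {
            return false; // nothing to do in later rounds
        }
        try {
            // In the real processor this source is generated from the operator ASTs.
            Writer writer = processingEnv.getFiler()
                    .createSourceFile("org.example.GeneratedPatterns")
                    .openWriter();
            writer.write("package org.example;\npublic interface GeneratedPatterns {}\n");
            writer.close();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
        return true;
    }
}
```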
1. Provide an FE config to test reliability in the single-replica case when tablet scheduling is frequent.
2. Following #6063, apply essentially the same fix to the current code.
At present, Doris can only access a Kerberos-enabled Hadoop cluster through the broker; Doris BE itself does not support reading Kerberos-authenticated HDFS files.
This PR aims to solve that problem.
When creating a Hive external table, users only need to specify the following properties to access HDFS data with Kerberos authentication enabled:
```sql
CREATE EXTERNAL TABLE t_hive (
k1 int NOT NULL COMMENT "",
k2 char(10) NOT NULL COMMENT "",
k3 datetime NOT NULL COMMENT "",
k5 varchar(20) NOT NULL COMMENT "",
k6 double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
If you want to `select into outfile` to an HDFS cluster with Kerberos authentication enabled, you can refer to the following SQL statement:
```sql
select * from test into outfile "hdfs://tmp/outfile1"
format as csv
properties
(
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
Issue Number: close #9621
Add the following physical operators: PhysicalAgg, PhysicalSort, PhysicalHashJoin.
Add the basic logic of the plan translator:
1. Add a new agg phase enum for Nereids.
2. Remove the Analyzer from PlanContext.java.
3. Implement PlanTranslator::visitPhysicalFilter.
Issue Number: close #9633
Scalar expressions are rewritten by traversing them with the visitor pattern.
The abstract class ExpressionVisitor contains visit methods for all of the predicates to rewrite.
We provide a rewrite-rule interface, ExpressionRewriteRule. The AbstractExpressionRewriteRule class implements this interface and extends ExpressionVisitor; to implement a new expression rewrite rule, simply extend AbstractExpressionRewriteRule and override the traversal methods for the predicates of interest.
Two rules are provided for reference: NormalizeExpressionRule and SimplifyNotExprRule.
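A self-contained toy sketch of the idea behind a rule like SimplifyNotExprRule (standalone classes, not the real Nereids Expression/ExpressionVisitor hierarchy): a visitor rebuilds the expression tree bottom-up and collapses not(not(x)) into x.

```java
// Toy expression hierarchy and rewrite rule, for illustration only.
interface Expr {
    <R> R accept(ExprVisitor<R> visitor);
}

class Not implements Expr {
    final Expr child;
    Not(Expr child) { this.child = child; }
    public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitNot(this); }
}

class Ref implements Expr {
    final String name;
    Ref(String name) { this.name = name; }
    public <R> R accept(ExprVisitor<R> visitor) { return visitor.visitRef(this); }
    public String toString() { return name; }
}

interface ExprVisitor<R> {
    R visitNot(Not not);
    R visitRef(Ref ref);
}

// The rewrite rule: visit children first, then drop double negation.
class SimplifyNotRule implements ExprVisitor<Expr> {
    public Expr visitNot(Not not) {
        Expr child = not.child.accept(this);
        if (child instanceof Not) {
            return ((Not) child).child;   // not(not(x)) -> x
        }
        return new Not(child);
    }

    public Expr visitRef(Ref ref) {
        return ref;
    }
}

class Demo {
    public static void main(String[] args) {
        Expr expr = new Not(new Not(new Ref("a")));
        System.out.println(expr.accept(new SimplifyNotRule()));  // prints: a
    }
}
```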
Change the DatabaseIf APIs' return type to TableIf.
Use generics in DatabaseIf to avoid changing the return type in Database.
Currently the Database class uses the Table type, and I am trying to avoid changing it to TableIf, because doing so would require changing a lot of code.
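A minimal sketch of the generics trick being described (simplified, hypothetical method names; the real Doris interfaces have many more methods):

```java
// DatabaseIf is generic over its table type, so the concrete Database
// implementation can keep returning Table instead of TableIf.
interface TableIf {
    String getName();
}

class Table implements TableIf {
    private final String name;
    Table(String name) { this.name = name; }
    public String getName() { return name; }
}

interface DatabaseIf<T extends TableIf> {
    T getTableOrNull(String tableName);
}

class Database implements DatabaseIf<Table> {
    // Return type stays Table, so existing callers do not need to change.
    public Table getTableOrNull(String tableName) {
        return new Table(tableName);
    }
}
```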
When planning a bucket shuffle join, we need to know the bucket number of the left table.
Currently, we use the tablet number directly, based on the assumption that the left table has only one partition.
But when the left table is a colocated table, it can have more than one partition.
In this case, some data from the right table is dropped incorrectly, producing wrong query results.
To reproduce, follow the regression test in this PR.
* [fix](fe) select stmt will make BE coredump when its castExpr is like cast(int as array<>)
* fix implicit cast scalar type bug
* Revert "fix implicit cast scalar type bug"
This reverts commit 1f05b6bab72430214dca88f386b50ef9a081e60a.
* only check array cast, retrigger
Add the ntile window function.
For the non-vectorized engine, I implemented it the same way Impala does, rewriting ntile in terms of row_number and count.
For the vectorized engine, I implemented WindowFunctionNTile.
In SetOperationNode we do pass-through if a child's output is the same as the node's own output.
The method isChildPassthrough only considers the memory layout.
When the vectorized engine is used, we need to use the SlotDesc offset in the TupleDesc instead of the memory layout to check whether pass-through can be performed.
closed #9644
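A rough sketch of the offset-based check described above (hypothetical, simplified descriptor types; the real FE SlotDescriptor/TupleDescriptor classes and the actual isChildPassthrough logic carry much more state):

```java
import java.util.List;

// Simplified stand-ins for the FE descriptor classes.
class SlotDesc {
    final int offset;        // byte offset of the slot inside its tuple
    final String type;
    SlotDesc(int offset, String type) { this.offset = offset; this.type = type; }
}

class TupleDesc {
    final List<SlotDesc> slots;
    TupleDesc(List<SlotDesc> slots) { this.slots = slots; }
}

class PassthroughCheck {
    // Pass-through is only safe when every child slot lines up, by offset and type,
    // with the corresponding slot of the set-operation node's own tuple.
    static boolean isChildPassthrough(TupleDesc nodeTuple, TupleDesc childTuple) {
        if (nodeTuple.slots.size() != childTuple.slots.size()) {
            return false;
        }
        for (int i = 0; i < nodeTuple.slots.size(); i++) {
            SlotDesc a = nodeTuple.slots.get(i);
            SlotDesc b = childTuple.slots.get(i);
            if (a.offset != b.offset || !a.type.equals(b.type)) {
                return false;
            }
        }
        return true;
    }
}
```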
Second step of statistics derivation: implement derivation for nodes other than scan_node.
The statistics derivation entry point for all nodes is placed uniformly in DeriveFactory.java.
Added checks to verify that the derivation is correct.
The statistics derivation for each node is placed in its own *StatsDerive.java file.
Detailed design: https://docs.google.com/document/d/1u1L6XhyzKShoyYRwFQ6kE1rnvY2iFwauwg289au5Qq0/edit
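A hypothetical sketch of the factory-style dispatch this describes (all names and estimates below are illustrative; the real DeriveFactory and *StatsDerive classes work on full statistics state):

```java
// One derivation class per node type, dispatched from a factory.
interface StatsDerive {
    long deriveRowCount(long childRowCount);
}

class AggStatsDerive implements StatsDerive {
    public long deriveRowCount(long childRowCount) {
        // Placeholder: assume aggregation reduces cardinality; real estimates use NDV stats.
        return Math.max(1, childRowCount / 10);
    }
}

class JoinStatsDerive implements StatsDerive {
    public long deriveRowCount(long childRowCount) {
        return childRowCount;  // placeholder estimate
    }
}

class DeriveFactory {
    // Returns the derivation implementation for a given plan node type.
    static StatsDerive getStatsDerive(String nodeType) {
        switch (nodeType) {
            case "AGGREGATION_NODE": return new AggStatsDerive();
            case "HASH_JOIN_NODE":   return new JoinStatsDerive();
            default:                 return childRowCount -> childRowCount; // pass-through default
        }
    }
}
```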
Currently, only the root user has node_priv privileges.
That is, only the root user can add and remove nodes.
In the original design of Doris, there is an Operator role. This role can have node_priv for node operations.
This PR supports assigning node_priv to users other than root.
However, only users who have both grant_priv and node_priv can assign node_priv to other users.
This ensures that the root user retains control over this permission: users who are given node_priv
cannot continue to spread it to other users.
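A hypothetical sketch of the grant-side check being described (simplified; the real privilege checking in the FE uses its own Auth/PrivPredicate machinery):

```java
import java.util.EnumSet;
import java.util.Set;

// Illustrative privilege model, not Doris's actual privilege classes.
enum Priv { GRANT_PRIV, NODE_PRIV }

class GrantChecker {
    // A grantor may hand out NODE_PRIV only if it holds both GRANT_PRIV and NODE_PRIV,
    // so a user that was merely given NODE_PRIV cannot propagate it further.
    static boolean canGrantNodePriv(Set<Priv> grantorPrivs) {
        return grantorPrivs.contains(Priv.GRANT_PRIV) && grantorPrivs.contains(Priv.NODE_PRIV);
    }

    public static void main(String[] args) {
        System.out.println(canGrantNodePriv(EnumSet.of(Priv.GRANT_PRIV, Priv.NODE_PRIV))); // true
        System.out.println(canGrantNodePriv(EnumSet.of(Priv.NODE_PRIV)));                  // false
    }
}
```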
- Set the inline view's slot descriptor to nullable when registering the column ref.
- Propagate slot nullability when generating the inline view's query node in SingleNodePlanner.