Commit Graph

2199 Commits

Author SHA1 Message Date
f1c9105af1 [feature] Support hive on s3 (#10128)
Support querying Hive tables stored on S3. The AK/SK, region, and S3 endpoint are passed as table properties when creating the external table.

Example CREATE TABLE statement:
```
CREATE TABLE `region_s3` (
  `r_regionkey` integer NOT NULL,
  `r_name` char(25) NOT NULL,
  `r_comment` varchar(152)
)
ENGINE=hive
PROPERTIES (
  "database" = "default",
  "table" = "region_s3",
  "hive.metastore.uris" = "thrift://127.0.0.1:9083",
  "AWS_ACCESS_KEY" = "YOUR_ACCESS_KEY",
  "AWS_SECRET_KEY" = "YOUR_SECRET_KEY",
  "AWS_ENDPOINT" = "s3.us-east-1.amazonaws.com",
  "AWS_REGION" = "us-east-1"
);
```
2022-06-16 19:15:46 +08:00
4b9d500425 [improvement](profile) Add table name and predicates (#10093) 2022-06-16 10:59:31 +08:00
Pxl
5805f8077f [Feature] [Vectorized] Some pre-refactorings or interface additions for schema change part2 (#10003) 2022-06-16 10:50:08 +08:00
90f229c038 [refactor] remove useless plugin test code (#10061)
* remove plugin test code

* remove plugin test

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-16 10:43:28 +08:00
ca88f258d9 [improvement] remove unused codes and docs for SHOW USER (#10107)
* remove unused codes and docs for `SHOW USER`
2022-06-15 21:49:08 +08:00
49f4437396 [fix] Fix disk used pct only consider the data that used by Doris (#9705) 2022-06-15 16:28:56 +08:00
76a968d1dd [Enhancement][Refactor](Nereids) generate pattern by operator and refactor Plan and NODE_TYPE generic type (#10019)
This PR:
1. removes the generic type from Operator, and removes some NODE_TYPE usages from Plan and Expression
2. refactors the Plan and NODE_TYPE generic types
3. supports child-class matching via TypePattern
4. analyzes the Operator source and generates patterns from it, which makes it easy to create rules


e.g. 
```java
class LogicalJoin extends LogicalBinaryOperator;
class PhysicalFilter extends PhysicalUnaryOperator;
```

will generate the code
```java
interface GeneratedPatterns extends Patterns {
  default PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan> logicalJoin() {
      return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, Plan, Plan>, Plan>(
          new TypePattern(LogicalJoin.class, Pattern.FIXED, Pattern.FIXED),
          defaultPromise()
      );
  }
  
  default <C1 extends Plan, C2 extends Plan>
  PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>
          logicalJoin(PatternDescriptor<C1, Plan> child1, PatternDescriptor<C2, Plan> child2) {
      return new PatternDescriptor<LogicalBinaryPlan<LogicalJoin, C1, C2>, Plan>(
          new TypePattern(LogicalJoin.class, child1.pattern, child2.pattern),
          defaultPromise()
      );
  }

  default PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan> physicalFilter() {
      return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, Plan>, Plan>(
          new TypePattern(PhysicalFilter.class, Pattern.FIXED),
          defaultPromise()
      );
  }
  
  default <C1 extends Plan>
  PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>
          physicalFilter(PatternDescriptor<C1, Plan> child1) {
      return new PatternDescriptor<PhysicalUnaryPlan<PhysicalFilter, C1>, Plan>(
          new TypePattern(PhysicalFilter.class, child1.pattern),
          defaultPromise()
      );
  }
}
```
With this in place, we no longer have to hand-write a pattern for each new operator.

This feature uses JSR-269 (annotation processing) to hook into compile time, and ANTLR4 to parse the source of each `Operator`, from which the corresponding patterns are generated.


Pattern generation steps:
1. maven-compiler-plugin in the pom.xml compiles fe-core in three passes. The first pass compiles `PatternDescribable.java` and `PatternDescribableProcessor.java`.
2. The second pass compiles `PatternDescribableProcessPoint.java` with the annotation processor `PatternDescribableProcessor` enabled; the processor is invoked because the `PatternDescribableProcessPoint` class carries the `PatternDescribable` annotation.
3. `PatternDescribableProcessor` does not process `PatternDescribableProcessPoint` itself; instead it finds all Java files under the `operatorPath` specified in pom.xml and parses them into Java ASTs (abstract syntax trees).
4. `PatternDescribableProcessor` collects the Java ASTs and uses `PatternGeneratorAnalyzer` to analyze them, finds the subclasses of `PlanOperator`, and generates `GeneratedPatterns.java` from the ASTs.
5. The third pass compiles `GeneratedPatterns.java` together with the remaining Java files.
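The generation step above can be sketched in miniature. The snippet below is an illustrative toy, not Doris's actual generator (which works on ANTLR-parsed ASTs): given an operator class name and its arity, it emits the source of the zero-child pattern method shown in the `GeneratedPatterns` example.

```java
// Toy sketch of the pattern-method generation step. All names mirror the
// GeneratedPatterns example above; the real generator derives them from ASTs.
class PatternGenSketch {

    // LogicalJoin -> LogicalBinaryPlan, PhysicalFilter -> PhysicalUnaryPlan, etc.
    static String planType(String op, int arity) {
        String side = op.startsWith("Logical") ? "Logical" : "Physical";
        String shape = (arity == 1) ? "UnaryPlan" : "BinaryPlan";
        return side + shape;
    }

    static String generate(String op, int arity) {
        // logicalJoin, physicalFilter, ...: lower-camel-case method name
        String method = Character.toLowerCase(op.charAt(0)) + op.substring(1);
        StringBuilder fixed = new StringBuilder();
        for (int i = 0; i < arity; i++) {
            fixed.append(", Pattern.FIXED");
        }
        String childTypes = (arity == 1) ? "Plan" : "Plan, Plan";
        return "default PatternDescriptor<" + planType(op, arity) + "<" + op + ", " + childTypes
                + ">, Plan> " + method + "() {\n"
                + "    return new PatternDescriptor<>(\n"
                + "        new TypePattern(" + op + ".class" + fixed + "),\n"
                + "        defaultPromise());\n"
                + "}";
    }

    public static void main(String[] args) {
        System.out.println(generate("LogicalJoin", 2));
        System.out.println(generate("PhysicalFilter", 1));
    }
}
```

Running `main` prints the two zero-child methods corresponding to the `LogicalJoin` and `PhysicalFilter` declarations above.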
2022-06-15 11:44:54 +08:00
c4d0fba713 Add storage policy for remote storage migration (#9997) 2022-06-15 11:00:06 +08:00
4c24586865 [Vectorized][UDF] support java-udaf (#9930) 2022-06-15 10:53:44 +08:00
f4e2f78a1a [fix] Fix the bug that data balance causes tablet loss (#9971)
1. Provide an FE config to test reliability in the single-replica case when tablet scheduling is frequent.
2. Following #6063, apply essentially the same fix to the current code.
2022-06-15 09:52:56 +08:00
f7b5f36da4 [feature] Support read hive external table and outfile into HDFS that authenticated by kerberos (#9579)
At present, Doris can only access a Kerberos-enabled Hadoop cluster through a broker; Doris BE itself does not support access to Kerberos-authenticated HDFS files.

This PR solves that problem.

When creating a Hive external table, users just need to specify the following properties to access HDFS data with Kerberos authentication enabled:

```sql
CREATE EXTERNAL TABLE t_hive (
k1 int NOT NULL COMMENT "",
k2 char(10) NOT NULL COMMENT "",
k3 datetime NOT NULL COMMENT "",
k5 varchar(20) NOT NULL COMMENT "",
k6 double NOT NULL COMMENT ""
) ENGINE=HIVE
COMMENT "HIVE"
PROPERTIES (
'hive.metastore.uris' = 'thrift://192.168.0.1:9083',
'database' = 'hive_db',
'table' = 'hive_table',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```

To `SELECT INTO OUTFILE` to an HDFS cluster with Kerberos authentication enabled, you can refer to the following SQL statement:

```sql
select * from test into outfile "hdfs://tmp/outfile1" 
format as csv
properties
(
'fs.defaultFS'='hdfs://hacluster/',
'dfs.nameservices'='hacluster',
'dfs.ha.namenodes.hacluster'='n1,n2',
'dfs.namenode.rpc-address.hacluster.n1'='192.168.0.1:8020',
'dfs.namenode.rpc-address.hacluster.n2'='192.168.0.2:8020',
'dfs.client.failover.proxy.provider.hacluster'='org.apache.hadoop.hdfs.server.namenode.ha.ConfiguredFailoverProxyProvider',
'dfs.namenode.kerberos.principal'='hadoop/_HOST@REALM.COM',
'hadoop.security.authentication'='kerberos',
'hadoop.kerberos.principal'='doris_test@REALM.COM',
'hadoop.kerberos.keytab'='/path/to/doris_test.keytab'
);
```
2022-06-14 20:07:03 +08:00
25b9d6eba2 [feature](nereids) Plan Translator (#9993)
Issue Number: close #9621

Add the following physical operators: PhysicalAgg, PhysicalSort, PhysicalHashJoin.

Add the basic logic of the plan translator:

1. add new agg phase enum for nereids
2. remove the Analyzer from PlanContext.java
3. implement PlanTranslator::visitPhysicalFilter
2022-06-14 19:39:55 +08:00
2fadaddda0 [Enhancement] (Nereids) scalar expression rewrite framework (#9942)
Issue Number: close #9633

Scalar expressions are rewritten using the visitor pattern as the traversal mechanism.

The abstract class ExpressionVisitor contains visit methods for all the predicates to rewrite.

We provide a rewrite rule interface, ExpressionRewriteRule. The AbstractExpressionRewriteRule class implements this interface and extends ExpressionVisitor, so to implement an expression rewrite rule, you simply extend AbstractExpressionRewriteRule and override the traversal methods for the predicates of interest.

Two rules serve as references: NormalizeExpressionRule and SimplifyNotExprRule.
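The visitor-based rewrite idea can be sketched with a miniature expression tree. The classes below are illustrative stand-ins, not Doris's actual Nereids classes; the rule eliminates double negation in the spirit of SimplifyNotExprRule.

```java
import java.util.Objects;

// Miniature expression model: a variable and a NOT wrapper.
abstract class Expr {
    abstract Expr accept(ExprRewriter r);
}

final class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    Expr accept(ExprRewriter r) { return r.visitVar(this); }
    @Override public boolean equals(Object o) { return o instanceof Var && ((Var) o).name.equals(name); }
    @Override public int hashCode() { return Objects.hash(name); }
}

final class Not extends Expr {
    final Expr child;
    Not(Expr child) { this.child = child; }
    Expr accept(ExprRewriter r) { return r.visitNot(this); }
}

// Default behavior returns the node unchanged; rules override only what they care about.
interface ExprRewriter {
    default Expr visitVar(Var v) { return v; }
    Expr visitNot(Not n);
}

// Hypothetical rule in the spirit of SimplifyNotExprRule: NOT(NOT(x)) -> x.
class SimplifyNot implements ExprRewriter {
    @Override
    public Expr visitNot(Not n) {
        Expr child = n.child.accept(this);  // rewrite children first
        if (child instanceof Not) {
            return ((Not) child).child;     // eliminate double negation
        }
        return new Not(child);
    }
}
```

For example, `new Not(new Not(new Var("a"))).accept(new SimplifyNot())` yields the plain `Var("a")`.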
2022-06-14 16:20:48 +08:00
81e0a348a7 [fix] fix bug that show proc "/cluster_balance/history_tablets" return malformat error (#10073) 2022-06-14 15:34:16 +08:00
bdcf2e7ed2 [Improvement] set table name in olap scanner (#10102) 2022-06-14 08:18:18 +08:00
f26b81e4dd [feature](multi-catalog) Change DatabaseIf APIs' return type to TableIf. (#10044)
Change the DatabaseIf APIs' return type to TableIf.
Use generics in DatabaseIf to avoid changing the return type in Database.
The Database class currently uses the Table type; I am trying to avoid changing it to TableIf, because doing so would require changing a lot of code.
2022-06-13 10:55:44 +08:00
415b6b8086 [feature-wip](array-type) Support array type which doesn't contain null (#9809) 2022-06-12 23:35:28 +08:00
036276c1d3 [fix] Do not send drop task when replay drop table (#10062)
When doing a checkpoint, FE sends DropTasks to BE.
This PR prevents that behavior.
2022-06-12 09:59:38 +08:00
3f575e3e7c [fix](planner) produce wrong result when use bucket shuffle join with colocate left table (#10045)
When planning a bucket shuffle join, we need to know the left table's bucket number.
Currently, we use the tablet number directly, based on the assumption that the left table has only one partition.
But when the left table is a colocated table, it can have more than one partition.
In that case, some data in the right table is dropped incorrectly, producing wrong query results.

To reproduce, follow the regression test in this PR.
2022-06-11 21:44:47 +08:00
a7cca930b9 [fix](planner) fix don't rewrite nested union statement bug (#8513)
Issue Number: close #8512
2022-06-10 19:43:45 +08:00
4135e59f77 [fix](fe) select stmt will make BE coredump when its castExpr is like cast(int as array<>) (#9995)
* [fix](fe) select stmt will make BE coredump when its castExpr is like cast(int as array<>)

* fix implicit cast scalar type bug

* Revert "fix implicit cast scalar type bug"

This reverts commit 1f05b6bab72430214dca88f386b50ef9a081e60a.

* only check array cast, retrigger
2022-06-10 15:03:09 +08:00
Pxl
495c34fa29 [Bug] [Vectorized] core dump on aggregate node over union node (#10040)
* miss check passthrough on vectorized

* format and add test

* update
2022-06-10 15:02:14 +08:00
81a9284305 [improvement][refactor](image) refactor the read and load method of meta image #10005 2022-06-10 14:56:14 +08:00
4a474420c8 [feature](function) Add ntile function (#9867)
Add ntile function.
For the non-vectorized engine, I implemented it the way Impala does, rewriting ntile into row_number and count.
For the vectorized engine, I implemented WindowFunctionNTile.
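As a usage sketch (the table and column names below are made up, not from the PR):

```sql
-- Assign each row to one of 4 roughly equal buckets by descending spend;
-- ntile returns bucket ids 1..4.
SELECT user_id,
       spend,
       ntile(4) OVER (ORDER BY spend DESC) AS quartile
FROM user_spend;
```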
2022-06-10 10:32:40 +08:00
6fab1cbf3c [feature-wip](array-type) Add array functions size and cardinality (#9921)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-06-09 15:03:03 +08:00
050cbba6e5 [fix][hudi] use lowerCase to get hudi fileFormatType (#9873)
use lowerCase of inputFormatName to get hudi fileFormatType
2022-06-09 12:13:02 +08:00
449bfe10d1 fix: fix a thread safe problem in LoadAction.java (#9955) 2022-06-09 00:34:07 +08:00
342ab52270 [fix] Fix type description in PrimitiveType (#9985) 2022-06-09 00:30:32 +08:00
99fb830023 [feature] datetime column type support auto-initialized with default … (#9972) 2022-06-09 00:28:03 +08:00
5f56e17ef2 [feature-wip](multi-catalog)(step2) Introduce Internal Data Source (#9953) 2022-06-08 22:02:22 +08:00
d9bbf67b9e [DefaultConfigChange]enable query vectorization and storage vectorization and storage low cardinality optimization by default (#9848)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-06-08 15:29:43 +08:00
2ed523440f [fix](planner) passthrough child in SetOperationNode is wrong when enable vector engine (#9991)
In SetOperationNode we do passthrough if the child's output is the same as the node's own output.
The method isChildPassthrough previously only considered the memory layout.
When using the vectorized engine, we need to use the SlotDesc offsets in the TupleDesc, instead of the memory layout, to check whether passthrough can be performed.
2022-06-08 14:12:04 +08:00
dcdfc5b32a [fix](coordinator) fix bug that unable to generate query profile (#10002)
This bug was introduced from #9720
2022-06-08 10:59:15 +08:00
e97d835ba7 [feature](statistics) Statistics derivation.Step 2:OtherNode implemen… (#9458)
closed #9644

Second step of statistics derivation: implementation of nodes other than scan nodes.
The statistics derivation interfaces of all nodes are uniformly placed in DeriveFactory.java.
Added unit tests to verify that the derivation is correct.

Statistics derivation for each node is placed in its own *StatsDerive.java
detailed design: https://docs.google.com/document/d/1u1L6XhyzKShoyYRwFQ6kE1rnvY2iFwauwg289au5Qq0/edit
2022-06-07 21:10:28 +08:00
0fa1615147 [fix](fe-ut) Fix FE ut when enable vectorized engine (#9958)
Some node names in query explain results have changed, e.g.:
Aggregate -> VAggregate
2022-06-07 09:13:47 +08:00
856b421086 [feature](priv) Support grant node_priv to other user. (#9951)
Currently, only the root user has the node_priv privilege; that is, only the root user can add and remove nodes.

In Doris's original design there is an Operator role, which can hold node_priv for node operations.

This PR supports assigning node_priv to users other than root.
However, only users who have both grant_priv and node_priv can assign node_priv to other users.
This keeps the privilege under control: initially only root can grant it, and users who are merely given node_priv cannot continue to spread it outward.
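For illustration, a sketch of granting node_priv under this model (the user name is made up, and the exact ON target syntax may differ between Doris versions):

```sql
-- Run as a user holding both grant_priv and node_priv (e.g. root).
-- 'cluster_admin' is a hypothetical user.
GRANT NODE_PRIV ON *.* TO 'cluster_admin'@'%';
```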
2022-06-06 11:04:20 +08:00
24ad11af6a [deps] upgrade fabric8 k8s client to be compatible with new k8s clusters (#9933) 2022-06-06 10:00:36 +08:00
c18f7a31f1 remove redundant this (#9878)
Co-authored-by: vishalsingh <2018uec1001@gmail.com>
2022-06-05 13:09:14 +08:00
da33a48f39 [refactor](policy) Refactor the hierarchy of Policy. (#9786)
The RowPolicy extends Policy
2022-06-04 11:29:09 +08:00
3031919e8f [fix] (planner) slot nullable does not set correctly when plan outer join with inline view (#9927)
- set inline view's slot descriptor to nullable in register column ref
- propagate slot nullable when generate inline view's query node in SingleNodePlanner
2022-06-03 17:50:10 +08:00
937491098e [fix] fix grammar of ADMIN SHOW TABLET STORAGE FORMAT stmt (#9938) 2022-06-03 17:49:34 +08:00
c996334ad1 [improvement] Optimize send fragment logic to reduce send fragment timeout error (#9720)
This CL mainly changes:
1. Reduce rpc timeouts caused by waiting for brpc worker threads:
    1. Merge multiple fragment instances on the same BE into a single request, reducing the number of send-fragment rpcs.
    2. If the fragment count is >= 3, use a 2-phase RPC: phase one sends all fragments, phase two starts them. This way there are at most 2 RPCs per query on each BE.
2. Set the timeout of the send-fragment rpc to the query timeout, to meet users' expectation of the query timeout period consistently.
3. Do not close the connection anymore when an rpc timeout occurs.
4. Change some log levels from info to debug to reduce fe.log noise.

NOTICE:
1. The definition of the execPlanFragment rpc changed; BE must be upgraded first.
2. The FE config `remote_fragment_exec_timeout_ms` is removed.
2022-06-03 15:47:40 +08:00
67fa1fcf2a [fix] fix invalid SQL rewrite for field in materialized view (#9877) 2022-06-02 23:43:13 +08:00
dcf18ac322 [fix](hive) fix bug of invalid user info in external table's scan node (#9908)
Fix a null exception caused by invalid user info in the Hive external table's scan node.
Previously, a Hive external table query would fail when using a local user@ip.
2022-06-02 10:41:40 +08:00
fccb7b8055 [fix](planner) Fix the bug of can't query the data of new added partition when set partition_prune_algorithm_version = 2 (#9844) 2022-06-01 23:44:14 +08:00
2082d0a01f [optimize](planner)remove redundant conjuncts on plan node (#9819) 2022-06-01 23:43:08 +08:00
2fc2113b05 [feature] Support show proc BackendLoadStatistic (#9618)
The proc info method already exists in `ClusterLoadStatistic.getBackendStatistic`; this PR adds a proc node to show it.
2022-06-01 23:30:10 +08:00
8effdd95a7 [fix](routine-load) fix bug that routine load task can not find backend (#9902)
Introduced from #9492.
2022-06-01 17:55:30 +08:00
92babc7a47 [improvement][fix](planner) Add a rewrite rule to optimize InPredicate. (#9739)
1. Convert the child expressions in an InPredicate to the column type, and discard those that cannot be converted exactly.
2. Fix a ColumnRange exception caused by InPredicate child-expression type conversion.
3. Fix the problem that tablets could not be hit, also caused by InPredicate child-expression type conversion.
2022-06-01 15:26:32 +08:00
4ab7694a7f [Enhancement](Nereids)rewrite framework used in Memo (#9807)
Issue Number: close #9627 , #9628

This PR introduces two essentials for Nereids:

1. pattern match iterator used in memo

The pattern-match iterator is implemented as two iterators nested within each other: GroupExpressionIterator and GroupIterator.
GroupExpressionIterator uses GroupIterator to get all child Plans matching the pattern and uses them as children to generate the pattern-matched plan.
GroupIterator uses GroupExpressionIterator to get all pattern-matched Plans related to the GroupExpressions it holds.

2. plan rewrite framework for memo

The rewrite framework is implemented as two jobs: RewriteTopDownJob and RewriteBottomUpJob.
Both take a group, a set of rules to apply, and a context as construction parameters.
RewriteTopDownJob applies the rules from top to bottom, and RewriteBottomUpJob from bottom to top.
When a rule rewrites the plan tree at a plan node, all rules are applied to that node again until no rule can rewrite it.
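The bottom-up fixed-point behavior can be sketched on a toy tree. Node and Rule below are illustrative stand-ins, not Doris's actual classes: children are rewritten first, then all rules are re-applied at the node until none fires.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Optional;

// Toy plan node: a name plus children.
class Node {
    final String name;
    final List<Node> children;
    Node(String name, List<Node> children) { this.name = name; this.children = children; }
    static Node of(String name, Node... children) { return new Node(name, List.of(children)); }
}

// A rule returns the rewritten node, or empty if it does not apply here.
interface Rule {
    Optional<Node> apply(Node node);
}

// Hypothetical mini-version of RewriteBottomUpJob's loop.
class BottomUpRewrite {
    static Node rewrite(Node node, List<Rule> rules) {
        // Rewrite children first (bottom-up).
        List<Node> newChildren = new ArrayList<>();
        for (Node c : node.children) {
            newChildren.add(rewrite(c, rules));
        }
        Node current = new Node(node.name, newChildren);
        // Re-apply all rules at this node until no rule fires (fixed point).
        boolean changed = true;
        while (changed) {
            changed = false;
            for (Rule rule : rules) {
                Optional<Node> out = rule.apply(current);
                if (out.isPresent()) {
                    current = out.get();
                    changed = true;
                }
            }
        }
        return current;
    }
}
```

A rule that collapses a doubled "Not" node, for instance, reduces the tree Not(Not(x)) to x in one bottom-up pass.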
2022-06-01 15:12:58 +08:00