2387 Commits

Author SHA1 Message Date
aa1bcdbc18 [Bug] Show create table null pointer of storage policy and error http path of tablet info (#10950)
Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-07-22 20:55:35 +08:00
d17c906eb7 [chore](FE)add license header check in fe's checkstyle (#11076)
Add license header check in fe's checkstyle
2022-07-22 18:37:32 +08:00
34f328aa57 [feature] (Nereids) Merge memo group recursively (#11043)
In Memo.copyIn(plan, group1, isRewrite), one branch handles a plan that is already recorded in the Memo and owned by group 'group2'. In that case, 'group1' should be merged with 'group2', because they are equivalent.
After the merge, the upper-level group of 'group1', say 'p1 = group1.getLogicalExpression().getOwnerGroup()', and that of 'group2', say 'p2', are also equivalent, so 'p1' and 'p2' need to be merged as well. This process is recursive.
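The recursive merge described above can be sketched with a toy model. This is only an illustration under simplifying assumptions: the class `MemoMergeSketch`, its string group names, and the single-parent map are hypothetical and are not the Nereids Memo implementation (a real group can have several owners).

```java
import java.util.HashMap;
import java.util.Map;

/** Toy model of recursive group merging; names and structure are hypothetical. */
final class MemoMergeSketch {
    // A merged group points at the group it was merged into; a live group points at itself.
    private final Map<String, String> mergedInto = new HashMap<>();
    // Simplifying assumption: each group has at most one upper-level (owner) group.
    private final Map<String, String> parentOf = new HashMap<>();

    void addGroup(String group, String parent) {
        mergedInto.put(group, group);
        if (parent != null) {
            parentOf.put(group, parent);
        }
    }

    /** Follows merge links until the surviving group is reached. */
    String find(String group) {
        String target = mergedInto.get(group);
        return target.equals(group) ? group : find(target);
    }

    /** Merges source into target, then merges their upper-level groups recursively. */
    void merge(String source, String target) {
        String s = find(source);
        String t = find(target);
        if (s.equals(t)) {
            return; // already the same group, recursion stops here
        }
        mergedInto.put(s, t);
        String p1 = parentOf.get(s);
        String p2 = parentOf.get(t);
        if (p1 != null && p2 != null) {
            merge(p1, p2); // the parents are now equivalent as well
        }
    }
}
```

Merging two leaf groups in this model propagates upward until a group without a parent is reached or both sides already share the same group.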
2022-07-22 18:31:32 +08:00
0681e4f04f [Refactor](Nereids) Remove expression type. (#11066)
ExpressionType duplicates the Java class type info, so it is removed.
2022-07-22 17:48:18 +08:00
6963c41a04 [dependency] Upgrade Apache Commons Validator version to the latest one (#10508) 2022-07-22 17:03:46 +08:00
4003489bd0 [fix](update) check LOAD priv for update stmt (#11099) 2022-07-22 11:24:44 +08:00
7e3fc0d321 [enhancement](vec) Support outer join for vectorized exec engine (#11068)
Hash join node adds three new attributes.
The following SQL statement is used as an example to illustrate the meaning of these three attributes:

```
select t1.a from t1 left join t2 on t1.a=t2.b;
```
1. vOutputTupleDesc: Tuple2(a'')

2. vIntermediateTupleDescList: Tuple1(a', b'<nullable>)

3. vSrcToOutputSMap: <Tuple1(a'), Tuple2(a'')>

Each slot in the intermediate tuple corresponds one-to-one to a slot in the output tuple, through the evaluation of the left-child expr in vSrcToOutputSMap.

This code mainly merges the contents of two PRs:
1.  [fix](vectorized) Support outer join for vectorized exec engine (https://github.com/apache/doris/pull/10323)
2. [Fix](Join) Fix the bug of outer join function under vectorization #9954

The following is the specific description of the first PR
In a vectorized scenario, the query plan will generate a new tuple for the join node.
This tuple mainly describes the output schema of the join node.
Adding this tuple mainly solves the problem that the input schema of the join node is different from the output schema.
For example:
1. The case where the null side column caused by outer join is converted to nullable.
2. The projection of the outer tuple.

The following is the specific description of the second PR
This PR mainly fixes the following problems:
1. Queries that combine an inline view with an outer join. After a tuple is added to the join operator, the position of the `tupleisnull` function is inconsistent with the row storage. Currently the vectorized `tupleisnull` is calculated in the HashJoinNode.computeOutputTuple() function.
2. Incorrect column nullable property. At present, once an outer join occurs, the columns on the null side are planned as nullable in the semantic parsing stage.

For example:
```
select * from (select a as k1 from test) tmp right join b on tmp.k1=b.k1
```
At this time, the nullable property of column k1 in the `tmp` inline view should be true.

In the vectorized code, the virtual `tableRef` of tmp is used to construct the output tuple of HashJoinNode (specifically, in the function HashJoinNode.computeOutputTuple()), so the **correctness** of the column nullable property of this tableRef is very important.
In the above case, since the tmp table is right-joined with the b table, tmp is the null side, and the columns involved in the tmp table must be changed to nullable.
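The null-side rule described above can be sketched as follows. This is a minimal illustration, not the Doris planner code: the class `OuterJoinNullableSketch`, the `JoinType` enum, and the boolean-array representation of slot nullability are all hypothetical.

```java
/** Hypothetical sketch: which output slots become nullable after a join. */
final class OuterJoinNullableSketch {
    enum JoinType { INNER, LEFT_OUTER, RIGHT_OUTER, FULL_OUTER }

    /**
     * Returns the nullability of the output slots, left-side slots first,
     * followed by right-side slots. A slot on the null side of an outer join
     * is always nullable, whatever its nullability in the source table.
     */
    static boolean[] outputNullable(JoinType type, boolean[] left, boolean[] right) {
        // A right or full outer join can pad the left side with NULLs.
        boolean leftIsNullSide = type == JoinType.RIGHT_OUTER || type == JoinType.FULL_OUTER;
        // A left or full outer join can pad the right side with NULLs.
        boolean rightIsNullSide = type == JoinType.LEFT_OUTER || type == JoinType.FULL_OUTER;

        boolean[] out = new boolean[left.length + right.length];
        for (int i = 0; i < left.length; i++) {
            out[i] = left[i] || leftIsNullSide;
        }
        for (int i = 0; i < right.length; i++) {
            out[left.length + i] = right[i] || rightIsNullSide;
        }
        return out;
    }
}
```

For `tmp right join b`, the single non-null slot of tmp is on the left, so this sketch marks it nullable while the slot of b keeps its original nullability.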

In non-vectorized code, since the virtual tableRef tmp is not used at all, it uses the `TupleIsNull` function in `outputsmp` to ensure data correctness.
That is to say, the a column of the original table test is still non-null, and it does not affect the correctness of the result.

The vectorized engine has very strict requirements on the nullable attribute.
An outer join changes the nullable attribute of the join column, which in turn changes the nullable attribute of that column in each upper operator, layer by layer.
FE has no mechanism to modify the nullable attribute in the upper operator tuples layer by layer after the analyzer has run,
so at present we can only preset the attributes before the lower join as nullable in advance, during the analyzer stage, to avoid the problem.
(At the same time, BE also contains some workaround code to deal with the null to non-null problem.)

Co-authored-by: EmmyMiao87
Co-authored-by: HappenLee
Co-authored-by: morrySnow

Co-authored-by: EmmyMiao87 <522274284@qq.com>
2022-07-21 23:39:25 +08:00
7147a7c290 [feature-wip](multi-catalog) Support s3 storage for file scan node (#10977)
This is an example of s3 hms_catalog:
```sql
CREATE CATALOG hms_catalog properties(
"type" = "hms",
"hive.metastore.uris"="thrift://localhost:9083",
"AWS_ACCESS_KEY" = "your access key",
"AWS_SECRET_KEY"="your secret key",
"AWS_ENDPOINT"="s3 endpoint",
"AWS_REGION"="s3-region",
"fs.s3a.paging.maximum"="1000");
```
All of these parameters are required.
2022-07-21 17:38:53 +08:00
5f6f35e886 Add the supported sub-type for array (#10824)
1. This PR adds the supported sub-types for array, which were modified in #9916.
2. Add regression tests for the supported sub-types.

Co-authored-by: hucheng01 <hucheng01@baidu.com>
2022-07-21 16:29:17 +08:00
03783ce551 [fix](Nereids) fix merge conflict caused compile error (#11064)
fix merge conflict by #10882 and #10667
remove duplicate function hashCode
2022-07-21 14:14:26 +08:00
f8ad2613cf [Enhancement](Nereids) add some expr rewrite rule and plan rewrite rule of rewrite its expression (#10667)
# first: Add two expr rewrite rules:
1. remove duplicate expr
a = 1 and a = 1 -> a = 1

2. extract common expr
(a or b) and (a or c) -> a or (b and c)
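
The two rules above can be sketched on a simplified representation in which each expr is a plain string and each conjunct is a set of disjuncts. The class `ExprRewriteSketch` and its methods are hypothetical and only illustrate the idea; they are not the Nereids rule classes.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the two expr rewrite rules on string-based exprs. */
final class ExprRewriteSketch {
    /** a = 1 and a = 1 -> a = 1: drops repeated conjuncts, keeping first-seen order. */
    static List<String> removeDuplicates(List<String> conjuncts) {
        return new ArrayList<>(new LinkedHashSet<>(conjuncts));
    }

    /**
     * (a or b) and (a or c) -> a or (b and c).
     * Each element of conjuncts is the set of disjuncts of one conjunct; the list must be non-empty.
     * Returns null when no disjunct is shared by every conjunct.
     */
    static String extractCommon(List<Set<String>> conjuncts) {
        Set<String> common = new LinkedHashSet<>(conjuncts.get(0));
        for (Set<String> conjunct : conjuncts) {
            common.retainAll(conjunct);
        }
        if (common.isEmpty()) {
            return null; // nothing to extract
        }
        List<String> rest = new ArrayList<>();
        for (Set<String> conjunct : conjuncts) {
            Set<String> remaining = new LinkedHashSet<>(conjunct);
            remaining.removeAll(common);
            if (remaining.isEmpty()) {
                // One conjunct is exactly the common part, so the whole predicate reduces to it.
                return String.join(" or ", common);
            }
            rest.add(remaining.size() == 1
                    ? remaining.iterator().next()
                    : "(" + String.join(" or ", remaining) + ")");
        }
        return String.join(" or ", common) + " or (" + String.join(" and ", rest) + ")";
    }
}
```

A string-based form keeps the sketch short; a real rule would work on expression trees and rely on their equals and hashCode methods to detect identical sub-expressions.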

# second: Add some plan rewrite rules that rewrite the exprs of operators
1. NormalizeExpressionOfPlan contains the normalize expr rewrite rules. These normalize rules rewrite the exprs of LogicalFilter, LogicalAggregate, LogicalProject, and LogicalJoin.
2. OptimizeExpressionOfPlan contains the optimize expr rewrite rules. These optimize rules rewrite the exprs of LogicalFilter, LogicalAggregate, LogicalProject, and LogicalJoin.
2022-07-21 12:35:28 +08:00
072479fa21 [enhancement](Nereids)expression equals and hashCode function (#10882)
Review and add all missing equals and hashCode functions to Expression and its subclasses.

Alias
Arithmetic
BoundFunction
CompoundPredicate
Not
UnboundFunction
UnboundSlot
UnboundStar
2022-07-21 12:20:53 +08:00
329f70dc02 [enhancement](Nereids) support case when for TPC-H (#10947)
support case when for TPC-H
for example:
CASE [expression] WHEN [value] THEN [expression] ... ELSE [expression] END
or
CASE WHEN [predicate] THEN [expression] ... ELSE [expression] END
2022-07-21 12:02:37 +08:00
d36b927fdb [improvement](fe-ut) use local journal to make FE ut run fast (#11038)
* [improvement](fe-ut) use local journal to make FE ut run fast
2022-07-21 09:12:21 +08:00
6aadee9a2e [data lake]Support hdfs ha for Iceberg table. (#11002)
* Support Iceberg on HDFS with HA mode enabled.
2022-07-20 19:03:58 +08:00
0a8ae6aeec Refactor COLLECT_LIST and COLLECT_SET register logic (#10956)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-20 18:02:39 +08:00
7bdce8f572 [refactor](policy) refactor some policy create and check logic (#11007)
* [refactor](policy) refactor some policy create and check logic
2022-07-20 16:20:59 +08:00
658a9f7531 [fix](planner)unnecessary cast will be added on children in InPredicate (#11033) 2022-07-20 16:00:26 +08:00
6233b5200e [refactor] (Nereids) rename GroupExpression.getParent() to getOwnerGroup() (#11027)
GroupExpression.getParent() returns the group which contains this expr. This name is misleading, especially in tree structures.
So we change the name to getOwnerGroup.
2022-07-20 15:57:59 +08:00
a1c1cfce47 Add some comments for the feature mow (#11028) 2022-07-20 15:35:41 +08:00
9b91f86c38 [Feature](Nereids) Reorder join to eliminate cross join. (#10890)
Try to eliminate cross join via finding join conditions in filters and changing the join orders.
For example:

-- input:
SELECT * FROM t1, t2, t3 WHERE t1.id=t3.id AND t2.id=t3.id

-- output:
SELECT * FROM t1 JOIN t3 ON t1.id=t3.id JOIN t2 ON t2.id=t3.id
This feature is controlled by the session variable enable_nereids_reorder_to_eliminate_cross_join, which is true by default.
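
The reordering idea can be sketched as a greedy pass over the tables: at each step, pick a table that has a join condition with the tables already joined. This is an illustrative assumption about one possible strategy, not the rule implemented in the PR; the class `JoinReorderSketch` and its table-name representation are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical greedy sketch of reordering joins to avoid cross joins. */
final class JoinReorderSketch {
    /**
     * tables: table names in FROM order (non-empty).
     * equalities: pairs {left table, right table} taken from equality predicates in the filter.
     * Returns a join order in which each table, where possible, shares a condition with an earlier one.
     */
    static List<String> reorder(List<String> tables, List<String[]> equalities) {
        List<String> remaining = new ArrayList<>(tables);
        List<String> ordered = new ArrayList<>();
        ordered.add(remaining.remove(0));
        while (!remaining.isEmpty()) {
            String next = null;
            for (String candidate : remaining) {
                if (connects(candidate, ordered, equalities)) {
                    next = candidate;
                    break;
                }
            }
            if (next == null) {
                next = remaining.get(0); // no usable condition: a cross join is unavoidable here
            }
            remaining.remove(next);
            ordered.add(next);
        }
        return ordered;
    }

    /** True if some equality links the candidate to a table that is already joined. */
    private static boolean connects(String candidate, List<String> joined, List<String[]> equalities) {
        for (String[] eq : equalities) {
            if (eq[0].equals(candidate) && joined.contains(eq[1])) {
                return true;
            }
            if (eq[1].equals(candidate) && joined.contains(eq[0])) {
                return true;
            }
        }
        return false;
    }
}
```

With tables t1, t2, t3 and the conditions t1.id=t3.id and t2.id=t3.id, the sketch skips t2 after t1 because no condition links them yet, which matches the order t1, t3, t2 shown in the example output.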

Simplify usage of Memo and rewrite rule application.
Before this PR, applying a rewrite rule to a plan required code like the following:

    Memo memo = new Memo();
    memo.initialize(root);

    PlannerContext plannerContext = new PlannerContext(memo, new ConnectContext());
    JobContext jobContext = new JobContext(plannerContext, new PhysicalProperties(), 0);
    RewriteTopDownJob rewriteTopDownJob = new RewriteTopDownJob(memo.getRoot(),
            ImmutableList.of(new AggregateDisassemble().build()), jobContext);
    plannerContext.pushJob(rewriteTopDownJob);
    plannerContext.getJobScheduler().executeJobPool(plannerContext);

    Plan after = memo.copyOut();
After this PR, we can use chained calls:

    new Memo(plan)
        .newPlannerContext(connectContext)
        .setDefaultJobContext()
        .topDownRewrite(new AggregateDisassemble())
        .getMemo()
        .copyOut();
Rename the session variable enable_nereids to enable_nereids_planner to make it more meaningful.
2022-07-20 13:53:54 +08:00
56e036e68b [feature-wip](multi-catalog) Support runtime filter for file scan node (#11000)
* [feature-wip](multi-catalog) Support runtime filter for file scan node

Co-authored-by: morningman <morningman@apache.org>
2022-07-20 12:36:57 +08:00
a5a50726bf [Enhancement](planner) Rewrite implicit cast to the predicates (#10920)
During the analysis of a BinaryPredicate, a CastExpr is generated if the slot is implicitly cast, as in the case below:
SELECT * FROM t1 WHERE t1.col1 = '1';
where col1 is an integer column.

This prevents the binary predicate from being pushed down to OlapScan, which hurts performance.
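One way to avoid a cast on the slot in such a predicate is to convert the literal to the column's type instead, so the column side stays a bare slot. The sketch below only illustrates that idea under this assumption; the class `ImplicitCastSketch` and its method are hypothetical and do not reproduce the planner code of this PR.

```java
import java.util.OptionalLong;

/** Hypothetical sketch: fold a string literal into an integer literal. */
final class ImplicitCastSketch {
    /**
     * Tries to convert the string literal compared against an integer column.
     * An empty result means the literal is not a valid integer, so a cast
     * (or some other handling) would still be needed.
     */
    static OptionalLong toIntegerLiteral(String stringLiteral) {
        try {
            return OptionalLong.of(Long.parseLong(stringLiteral.trim()));
        } catch (NumberFormatException e) {
            return OptionalLong.empty();
        }
    }

    /** Renders the rewritten predicate text, or null when the literal cannot be converted. */
    static String rewriteEquals(String column, String stringLiteral) {
        OptionalLong value = toIntegerLiteral(stringLiteral);
        return value.isPresent() ? column + " = " + value.getAsLong() : null;
    }
}
```

For the query above, `rewriteEquals("t1.col1", "1")` yields a predicate that compares the slot directly with an integer literal, which is the shape a scan node can evaluate without a cast.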
2022-07-20 12:28:29 +08:00
Pxl
95366de7f6 cast array element to same type (#10980)
Fix a problem that occurs when an array contains elements of different types.
2022-07-19 21:47:10 +08:00
2d90f4b87c [feature-wip](statistics) step4: collect statistics by implementing statistics tasks (#8861)
This pull request includes some implementations of the statistics(https://github.com/apache/incubator-doris/issues/6370), it will not affect any existing code and users will not be able to create statistics job.

Now only MetaStatisticsTask that directly collects statistics by reading FE meta is implemented. SQLStatisticsTask is still being implemented, it needs to query BE through FE.

The following functions are implemented by this PR:
1. Support statistics collection for partitioned and non-partitioned tables. For partitioned tables, statistics collection for a specified partition is implemented.
2. Tasks are divided according to whether the table is partitioned or non-partitioned, down to the tablet level at the finest granularity. A meta task collects as many statistics as possible.
3. Add partition statistics (Table -> Partition -> Column). For example, the size of the table, the number of rows, the size of the partition, the number of rows, the maximum and minimum values of the columns, etc.
4. Display and modify partition-level statistics.
 …
2022-07-19 16:22:25 +08:00
2acd5efcd8 [improvement-log]print a log when got a lower image version (#10910) 2022-07-19 08:29:58 +08:00
68b9a2936a [improvement](doe) Step1: Fe generates the DSL and is used to explain (#9895)
For the first step, I will only change FE and then change BE once I make sure the DSL is ok.
2022-07-18 23:20:58 +08:00
e769597fd2 [Improvement] (datetime) support microsecond for date literal (#10917)
* [Improvement] (datetime) support microsecond for date literal

* remove joda dependency
2022-07-18 21:39:39 +08:00
60dd322aba [feature-wip](multi-catalog) Optimize threads and thrift interface of FileScanNode (#10942)
FileScanNode in BE launches as many threads as the number of splits.
The thrift interface of FileScanNode is excessively redundant.
2022-07-18 20:50:34 +08:00
a849f5be71 [feature](Nereids): hashCode(), equals() and UT. (#10870)
Add hashCode(), equals() for operator.

Add basic UTs for them (more detailed tests are needed).

**future ticket**: add hashCode(), equals() and UT for `Expression`
2022-07-18 20:33:10 +08:00
b185545243 [refactor](Nereids)remove generic type from Rule and Job (#10897) 2022-07-18 19:35:16 +08:00
b037aca4fd [improvement](dynamic-partition) add replication allocation check for dynamic partition when creating table(#10892) 2022-07-18 18:02:33 +08:00
6736e06679 [feature](udf) Vectorization support remote udaf #10683 (#10685) 2022-07-18 17:15:34 +08:00
9adbd8abbd [feature](resource-tag) support multi tag for a single Backend (#10901) 2022-07-18 16:50:45 +08:00
091e17ecab fix(fe): add , with json_root property in stmt show create routine load for xx_job (#10929)
Fix issue: https://github.com/apache/doris/issues/10928
2022-07-18 16:44:40 +08:00
006d7c9225 [fix]The spring boot startup banner is lost, and the maven package does not package the pictures in the resources directory (#10955) 2022-07-18 16:00:14 +08:00
dc01ea7ad9 [fix](nereids) Fix the substring compilation error caused by merge (#10965)
Compilation error after merging due to Literal refactoring.
Compilation failure:
fe/fe-core/src/main/java/org/apache/doris/nereids/trees/expressions/functions/Substring.java:[40,38] org.apache.doris.nereids.trees.expressions.Literal is abstract; cannot be instantiated
2022-07-18 15:20:25 +08:00
8c544b6e13 fix show storage policy null pointer and redundant log (#10906)
* fix show storage policy null pointer and redundant log
2022-07-18 14:08:54 +08:00
0b177669d9 [feature](nereids) support substring (#10847)
support substring, for example:
select substr(a, 2), substring(b ,3 ,4) from test1;
2022-07-18 12:38:56 +08:00
bf95440c13 [Refactor](nereids)Refactor Literal to an inheritance hierarchy (#10771)
Use an inheritance hierarchy instead of composition to make the framework clearer.
2022-07-18 12:01:30 +08:00
5c88a74792 [Enhancement] generate runtime filter only for tuples with conjunct (#8745)
Remove useless runtime filters in some primary-foreign key join scenarios in the TPC-H case.
2022-07-18 09:37:45 +08:00
2e94674cb5 [fix](alter) fix bug that fe crash because npe on rollupBatchTask (#10943)
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2022-07-18 08:47:25 +08:00
6b1408ce41 [fix](planner) fix create view when using union (#10849) 2022-07-17 20:54:40 +08:00
09d19e3f0f [feature-wip](array-type) explode support more sub types (#10673)
1. explode support more sub types;
2. explode support nullable elements;

Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-07-17 18:08:30 +08:00
f0babfdcf8 print image's version when it is higher than FEConstants.meta_version (#10908) 2022-07-16 19:26:47 +08:00
6751e5b23c [fix](alter)(tablet-scheduler) fix unexpected exception with compaction_too_slow message when add rollup for olap table (#10827) 2022-07-15 19:59:00 +08:00
dc6fbcce14 [feature-wip] (datev2) modify datev2 format in memory (#10873)
* [feature-wip] (datev2) modify datev2 format in memory

* update
2022-07-15 19:57:38 +08:00
401203da6a [feature](code-data) move cold data to object storage without losing any feature(FE) (#10693)
Co-authored-by:platonekosama@gmail.com
2022-07-15 18:00:48 +08:00
97861f517a Revert "[chore][nereids] Bump the version of antlr4 to 4.10.1 (#10780)" (#10876)
This reverts commit b4927a8f151c60357387302723fa808e523d17e3.
2022-07-15 17:05:08 +08:00
ad4751972c [feature-wip] Support in predicate for datev2 type (#10810) 2022-07-15 14:32:40 +08:00