Commit Graph

1904 Commits

Author SHA1 Message Date
31d8fdd9e4 [fix](Nereids) finalize local aggregate should not turn on stream pre agg (#13922) 2022-11-03 11:08:06 +08:00
a4a991207b [fix](agg)fix group by constant value bug (#13827)
* keep only one constant grouping expr if there are no agg exprs
2022-11-03 10:26:59 +08:00
b3c6af0059 [Bugfix](MV) Fixed load negative values into bitmap type materialized views successfully under non-vectorization (#13719)
2022-11-03 09:21:38 +08:00
7b4c2cabb4 [feature](new-scan) support transactional insert in new scan framework (#13858)
Support running transactional insert operations with the new scan framework, e.g.:

admin set frontend config("enable_new_load_scan_node" = "true");
begin;
insert into tbl1 values(1,2);
insert into tbl1 values(3,4);
insert into tbl1 values(5,6);
commit;
Add some limitations to transactional insert:

- Non-literal values are not supported in the insert stmt.

Fix some issues with the array type:

- Forbid casting other non-array types to a NESTED array type, since it may cause a BE crash.
- Add a getStringValueForArray() method to Expr to get a valid string-formatted array value.

Add useLocalSessionState=true to the regression-test JDBC URL.
Without this config, the JDBC driver sends some init commands each time it connects to the server, such as
select @@session.tx_read_only.
But in a transactional insert, after the begin command, Doris does not support any statement type
other than insert, commit, or rollback.
So this config is added to stop the JDBC driver from sending these commands when connecting.
2022-11-03 08:36:07 +08:00
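The statement restriction above (after `begin`, only insert, commit, or rollback are accepted) can be modelled as a small state machine. This Python sketch is illustrative only: `TxnSession`, `TxnStateError` and the statement kinds are hypothetical names, not Doris FE classes, and SQL literals are approximated by Python ints, floats, strings and None.

```python
class TxnStateError(Exception):
    """Raised when a statement is not allowed in the current state (hypothetical)."""


class TxnSession:
    """Hypothetical model of a session running transactional inserts (not Doris code)."""

    ALLOWED_IN_TXN = {"insert", "commit", "rollback"}

    def __init__(self):
        self.in_txn = False
        self.pending_rows = []    # rows buffered by inserts inside the transaction
        self.committed_rows = []  # rows made visible by commit (or by autocommit inserts)

    def execute(self, kind, values=None):
        kind = kind.lower()
        if self.in_txn and kind not in self.ALLOWED_IN_TXN:
            # e.g. a driver-issued "select @@session.tx_read_only" is rejected here
            raise TxnStateError(f"{kind!r} is not allowed inside a transaction")
        if kind == "begin":
            self.in_txn = True
        elif kind == "insert":
            # Only literal values are supported in the insert stmt.
            if any(not isinstance(v, (int, float, str, type(None))) for v in values):
                raise TxnStateError("only literal values are supported")
            target = self.pending_rows if self.in_txn else self.committed_rows
            target.append(tuple(values))
        elif kind == "commit":
            self.committed_rows.extend(self.pending_rows)
            self.pending_rows, self.in_txn = [], False
        elif kind == "rollback":
            self.pending_rows, self.in_txn = [], False
        return self
```

In this model, a JDBC driver that sends its own `select` after `begin` is rejected, which matches the reason given for `useLocalSessionState=true`.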
Fy
e021705053 [feature](nereids) support common table expression (#12742)
Support common table expressions (CTE) in Nereids:
- Only inline CTE is implemented, which means the logicalPlan of a CTE is copied everywhere it is referenced;
- If a CTE has the same name as an existing table or view, the CTE takes precedence.
2022-11-02 23:41:53 +08:00
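A minimal Python sketch of inline CTE resolution as described above. It assumes a hypothetical plan encoding of nested tuples (`("relation", name)` for a name still to resolve, `("scan", table)` for a catalog table) rather than Nereids' `LogicalPlan` classes. Each reference is expanded separately, and CTE names are checked before the catalog.

```python
def inline_ctes(plan, ctes, catalog):
    """Expand every CTE reference in `plan` with the CTE's body (hypothetical encoding).

    plan:    nested tuples (op, *children); leaves are ("relation", name) or ("scan", table)
    ctes:    dict name -> plan of the CTE body
    catalog: dict name -> ("scan", table) for existing tables or views
    """
    op = plan[0]
    if op == "scan":
        return plan  # already resolved to a table, nothing to inline
    if op == "relation":
        name = plan[1]
        if name in ctes:
            # CTE names shadow tables/views with the same name. While expanding the
            # body, hide the CTE's own name so a same-named table is reached instead.
            visible = {k: v for k, v in ctes.items() if k != name}
            return inline_ctes(ctes[name], visible, catalog)
        return catalog[name]
    # Interior node: rebuild it with every child resolved.
    return (op,) + tuple(inline_ctes(child, ctes, catalog) for child in plan[1:])
```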
0ea7f85986 [fix](keyword) add BIN as keyword (#13907) 2022-11-02 22:30:43 +08:00
53814e466b [Enhancement](Nereids)optimize merge group in memo #13900 2022-11-02 20:42:55 +08:00
374303186c [Vectorized](function) support topn_array function (#13869) 2022-11-02 19:49:23 +08:00
b26d8f284c [fix](rpc) The proxy removed when rpc exception occurs is not an abnormal proxy (#13836)
`BackendServiceProxy.getInstance()` uses a round-robin strategy to obtain the proxy,
so when the current RPC request fails, the proxy removed by
`BackendServiceProxy.getInstance().removeProxy(...)` is not necessarily the abnormal proxy.
2022-11-02 19:39:33 +08:00
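A Python sketch of why the removal can miss. It assumes that `getInstance()` hands out proxy holders in round-robin order and that each holder caches one stub per backend address. `ServiceProxy`, `ProxyRegistry` and the two call helpers are hypothetical, and the fixed variant (removing from the holder that was actually used) is an assumption about the fix rather than something stated in the commit.

```python
import itertools


class ServiceProxy:
    """Hypothetical holder that caches one RPC stub per backend address."""

    def __init__(self, name):
        self.name = name
        self.stubs = {}

    def get_stub(self, addr):
        return self.stubs.setdefault(addr, f"{self.name}->{addr}")

    def remove_proxy(self, addr):
        self.stubs.pop(addr, None)


class ProxyRegistry:
    """Hands out holders in round-robin order, like the getInstance() described above."""

    def __init__(self, n):
        self.holders = [ServiceProxy(f"holder{i}") for i in range(n)]
        self._cursor = itertools.cycle(self.holders)

    def get_instance(self):
        return next(self._cursor)  # every call advances the cursor


def call_with_buggy_cleanup(registry, addr):
    holder = registry.get_instance()
    holder.get_stub(addr)
    # Simulated RPC failure: fetching the instance again advances the cursor,
    # so the stub is removed from a different holder.
    registry.get_instance().remove_proxy(addr)
    return holder


def call_with_fixed_cleanup(registry, addr):
    holder = registry.get_instance()
    holder.get_stub(addr)
    # Simulated RPC failure: remove the stub from the holder that was used.
    holder.remove_proxy(addr)
    return holder
```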
6eea855e78 [feature](Nereids) Support lots of scalar function and fix some bug (#13764)
Proposed changes
1. Function interfaces that search for the matching signature, namely ComputeSignature, which is equivalent to Function.CompareMode:
   - IdenticalSignature: equal to Function.CompareMode.IS_IDENTICAL
   - NullOrIdenticalSignature: equal to Function.CompareMode.IS_INDISTINGUISHABLE
   - ImplicitlyCastableSignature: equal to Function.CompareMode.IS_SUPERTYPE_OF
   - ExplicitlyCastableSignature: equal to Function.CompareMode.IS_NONSTRICT_SUPERTYPE_OF
3. Generate lots of scalar functions.
4. Bug fix: the disassembled avg function computed a wrong result because of a wrong input type. AggregateParam.inputTypesBeforeDissemble is used to save the original input type and pass it to the backend, so that the backend can find the correct global aggregate function.
5. Bug fix: a subquery with OneRowRelation crashed because of a wrong nullable property.

Note:
1. There are currently no additional unit tests or regression tests for the scalar functions; I will add them after migrating the aggregate functions, so that they are processed uniformly.
2. A known problem is that variable-length functions cannot be invoked yet; I will fix it later.
2022-11-02 18:01:08 +08:00
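The four ComputeSignature modes can be illustrated with a Python sketch. Everything here is an assumption for illustration: the type names, the widening order, the rule that any type is explicitly castable, and the choice to try the modes from strictest to loosest. Doris' real type-coercion rules are richer.

```python
# Hypothetical implicit-widening order (not Doris' actual rules).
WIDENING = ["TINYINT", "SMALLINT", "INT", "BIGINT", "DOUBLE"]


def identical(arg, param):                 # ~ IS_IDENTICAL
    return arg == param


def null_or_identical(arg, param):         # ~ IS_INDISTINGUISHABLE
    return arg == "NULL" or arg == param


def implicitly_castable(arg, param):       # ~ IS_SUPERTYPE_OF
    if null_or_identical(arg, param):
        return True
    return (arg in WIDENING and param in WIDENING
            and WIDENING.index(arg) <= WIDENING.index(param))


def explicitly_castable(arg, param):       # ~ IS_NONSTRICT_SUPERTYPE_OF
    return True  # sketch assumption: any type may be explicitly cast to any other


MODES = [identical, null_or_identical, implicitly_castable, explicitly_castable]


def search_signature(arg_types, signatures):
    """Return the first signature matched by the strictest mode that matches any."""
    for mode in MODES:
        for sig in signatures:
            if len(sig) == len(arg_types) and all(
                    mode(a, p) for a, p in zip(arg_types, sig)):
                return sig
    return None
```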
a871fef815 [Improve](Nereids): refactor eliminate outer join (#13402)
Refactor eliminate outer join #12985

Evaluate the expression with ConstantFoldRule. If the evaluation result is NULL or FALSE, then the elimination condition is satisfied.
2022-11-02 17:39:05 +08:00
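A Python sketch of the elimination check. It assumes that the expression being folded is the filter above the outer join, with every column of the null-supplying side replaced by NULL; the commit itself only says that the expression is evaluated with ConstantFoldRule. The tuple encoding and `fold` are hypothetical stand-ins for Nereids expressions. A column that is not replaced stays non-constant, so the result is treated as unknown and the join is kept.

```python
import operator

UNKNOWN = object()  # result depends on a column that was not replaced by NULL
COMPARE = {"=": operator.eq, ">": operator.gt, "<": operator.lt}


def fold(expr, null_cols):
    """Constant-fold expr with SQL three-valued logic; None stands for NULL.

    expr: ("col", name) | ("lit", value) | (op, left, right) with op in
    {"and", "or", "=", ">", "<"} (hypothetical encoding).
    """
    kind = expr[0]
    if kind == "col":
        return None if expr[1] in null_cols else UNKNOWN
    if kind == "lit":
        return expr[1]
    left, right = fold(expr[1], null_cols), fold(expr[2], null_cols)
    if kind == "and":
        if left is False or right is False:
            return False
        if left is UNKNOWN or right is UNKNOWN:
            return UNKNOWN
        return None if (left is None or right is None) else True
    if kind == "or":
        if left is True or right is True:
            return True
        if left is UNKNOWN or right is UNKNOWN:
            return UNKNOWN
        return None if (left is None or right is None) else False
    # Comparisons: NULL if either side is NULL.
    if left is None or right is None:
        return None
    if left is UNKNOWN or right is UNKNOWN:
        return UNKNOWN
    return COMPARE[kind](left, right)


def can_eliminate_outer_join(predicate, null_side_cols):
    """True if the filter can never be TRUE for a NULL-extended row."""
    result = fold(predicate, null_side_cols)
    return result is None or result is False
```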
1bafb26217 [fix](Nereids) throw NPE when call getOutputExprIds in LogicalProperties (#13898) 2022-11-02 16:52:18 +08:00
699ffbca0e [enhancement](Nereids) generate correct distribution spec after project (#13725)
After a project, some Slots may be projected to other ones, so we need to replace the ExprIds in DistributionSpecHash with the new ones. If the project contains expressions other than Alias, we need to return DistributionSpecAny instead of the child's DistributionSpec.
2022-11-02 16:50:44 +08:00
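A Python sketch of the rule above. `HashSpec`, `ANY` and the `(output_id, kind, input_id)` encoding are hypothetical stand-ins for Nereids' `DistributionSpecHash`, `DistributionSpecAny` and project expressions. Returning `ANY` when a hash key does not appear in the project output is an extra assumption of the sketch.

```python
from dataclasses import dataclass

ANY = "DistributionSpecAny"  # stand-in for "no useful distribution"


@dataclass(frozen=True)
class HashSpec:
    """Stand-in for DistributionSpecHash: rows are hash-distributed on these ExprIds."""
    expr_ids: tuple


def derive_after_project(child_spec, projections):
    """projections: list of (output_expr_id, kind, input_expr_id) where kind is
    "slot" (pass-through), "alias" (plain rename of a slot) or "other" (computed)."""
    if not isinstance(child_spec, HashSpec):
        return child_spec
    mapping = {}
    for out_id, kind, in_id in projections:
        if kind == "other":
            return ANY  # anything other than Slot/Alias: fall back to Any
        mapping.setdefault(in_id, out_id)
    if not all(i in mapping for i in child_spec.expr_ids):
        return ANY  # sketch assumption: a hash key was projected away
    # Replace each hash ExprId with the id it is projected to.
    return HashSpec(tuple(mapping[i] for i in child_spec.expr_ids))
```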
f2a0adf34e [fix](fe) Inconsistent behavior for string comparison in FE and BE (#13604) 2022-11-02 15:32:13 +08:00
6f3db8b4b4 [enhancement](Nereids) add eliminate unnecessary project rule (#13886)
This rule eliminates a project whose output set is the same as its child's. If the project is the root of the plan, the elimination condition is that the project's output is exactly the same as its child's.

The reason to add this rule is that, when we do join reorder during optimization, the root of the transformed plan may be a Project whose output set is the same as the root of the plan before the transformation. If there is also a Project on top of that root with the same output set, we get two identical projects in the memo, one being the parent of the other. After MergeProject, we get a new Project identical to the child, which must be added to the parent's group. This triggers Merge Group. Since the merge would produce a cycle, it is denied, and we get a final plan with two consecutive projects.

## for example:
**BEFORE OPTIMIZATION**
```
LogicalProject1( projects=[c_custkey#0, c_name#1]) [GroupId#1]
+--LogicalJoin(type=LEFT_SEMI_JOIN)                [GroupId#2]
   |--LogicalProject(...)
   |  +--LogicalJoin(type=INNER_JOIN)
   |  ...
   +--LogicalOlapScan(...)
```
**AFTER APPLY RULE: LOGICAL_SEMI_JOIN_LOGICAL_JOIN_TRANSPOSE_PROJECT**
```
LogicalProject1( projects=[c_custkey#0, c_name#1])    [GroupId#1]
+--LogicalProject2( projects=[c_custkey#0, c_name#1]) [GroupId#2]
   +--LogicalJoin(type=INNER_JOIN)                    [GroupId#10]
      |--LogicalProject(...)
      |  +--LogicalJoin(type=LEFT_SEMI_JOIN)
      |  ...
      +--LogicalOlapScan(...)
```
**AFTER APPLY RULE: MERGE_PROJECTS**
```
LogicalProject3( projects=[c_custkey#0, c_name#1])  [should be in GroupId#1, but in GroupId#2 in fact]
+--LogicalJoin(type=INNER_JOIN)                     [GroupId#10]
   |--LogicalProject(...)
   |  +--LogicalJoin(type=LEFT_SEMI_JOIN)
   |  ...
   +--LogicalOlapScan(...)
```
Since GroupId#1 and GroupId#2 contain exactly the same GroupExpression (LogicalProject3 and LogicalProject2), we need to do MergeGroup(GroupId#1, GroupId#2). But the child of GroupId#1 is in GroupId#2, so the merge is denied.
If the best GroupExpression in GroupId#2 is LogicalProject3, we will get two consecutive projects in the final plan.
2022-11-02 14:16:03 +08:00
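The elimination condition from the first paragraph, as a Python sketch over a hypothetical plan encoding (`("project", outputs, child)` or `("leaf", outputs)`). It does not model the memo or groups; it only shows the root project being compared by exact output and other projects by output set.

```python
def output_of(plan):
    return plan[1]  # both encodings keep the output slot ids at index 1


def can_eliminate_project(project_output, child_output, is_root):
    """Root: output must match the child's exactly (same slots, same order).
    Non-root: the output *set* must match the child's."""
    if is_root:
        return list(project_output) == list(child_output)
    return set(project_output) == set(child_output)


def eliminate_projects(plan, is_root=True):
    """Return a new plan with unnecessary projects removed (bottom-up)."""
    if plan[0] == "leaf":
        return plan
    _, outputs, child = plan
    child = eliminate_projects(child, is_root=False)
    if can_eliminate_project(outputs, output_of(child), is_root):
        return child
    return ("project", outputs, child)
```

Applied to two stacked projects with the same output, as in the example above, only one project survives.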
ee8dffbfb7 [meta](recover) change dropInfo and RecoverInfo to GSON (#13830) 2022-11-02 13:32:46 +08:00
d5becdb4a1 [fix](dynamic-partition) fix wrong check of replication num (#13755) 2022-11-02 12:55:33 +08:00
wxy
947e67fa76 [enhancement](test) retry start be or fe when port has been bind. (#13860)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-11-02 08:42:35 +08:00
0eeb4d2881 [minor](log) remove some e.printStackTrace() (#13870) 2022-11-02 08:42:10 +08:00
7f34698eef [enhancement](Nereids) use join estimation v2 only when stats derive v2 is enable (#13845)
Join estimation V2 should be invoked only when enableNereidsStatsDeriveV2=true.
2022-11-01 20:38:39 +08:00
f0c9867af3 [fix](nereids) map literal to double in FilterSelectivityCalculator (#13776)
Fix the literal-to-double bug: all literal types implement the getDouble() function.
2022-11-01 20:20:44 +08:00
01f9f8ad43 [enhancement](Nereids) add merge project rule to column prune rule set (#13835)
When we do column pruning, we add a project on top of the child plan. If the child plan is a Project, we need to merge them.
2022-11-01 20:17:53 +08:00
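Merging a project into the project below it amounts to substituting the child's output expressions into the parent's expressions. A Python sketch, assuming a hypothetical encoding where a project is a dict from output name to an expression tuple (`("col", name)`, `("lit", value)` or `(op, *args)`):

```python
def substitute(expr, child_outputs):
    """Replace column references in expr with the child's defining expressions."""
    if expr[0] == "col":
        return child_outputs.get(expr[1], expr)
    if expr[0] == "lit":
        return expr
    return (expr[0],) + tuple(substitute(arg, child_outputs) for arg in expr[1:])


def merge_projects(parent, child):
    """Return one project equivalent to `parent` applied on top of `child`."""
    return {name: substitute(expr, child) for name, expr in parent.items()}
```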
61c817f4cc [feature](syntax) support SELECT * EXCEPT (#13844)
* [feature](syntax) support SELECT * EXCEPT: add regression test
2022-11-01 19:41:25 +08:00
1eef986e75 [feature](nereids) add rule for semi/anti join exploration, when there is project between them (#13756) 2022-11-01 19:07:25 +08:00
c14277e587 [fix](analytic) fix coredump cause by empty analytic parameter types (#13808)
* fix fe compile error
2022-11-01 17:25:36 +08:00
83e55cade8 [feature](Nereids): add rule for matching plan into HyperGraph. (#13805) 2022-11-01 14:57:25 +08:00
34e68a41dd [enhancement](explain) add cardinality to explain string and explain graph (#13720)
1. Set cardinality when translating the Nereids plan to the legacy planner's plan.
2. Print cardinality when using EXPLAIN GRAPH.
2022-11-01 11:43:21 +08:00
b27714542d [fix](planner) infer predicate could generate predicates in another scope (#13691)
2022-11-01 09:03:41 +08:00
36a47dfe16 [enhancement](Nereids): use ImmutableList explicitly in Plan (#13817) 2022-10-31 20:23:30 +08:00
18be77af64 [fix](nereids) query cannot execution when both nereids enable and fallback to legacy planner are set to false (#13787)
When enable_nereids_planner=false and enable_fallback_to_origin=false, FE throws an exception for every select statement.
Expected: when enable_nereids_planner=false, all valid queries execute successfully.
2022-10-31 19:02:01 +08:00
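A Python sketch of the expected planner selection. `plan_query`, `NereidsUnsupported` and the planner callables are hypothetical; only the two session variable names come from the commit, and re-raising when Nereids fails with fallback disabled is an assumption.

```python
class NereidsUnsupported(Exception):
    """Raised by the (hypothetical) Nereids path for a query it cannot plan."""


def plan_query(sql, enable_nereids_planner, enable_fallback_to_origin,
               nereids_plan, legacy_plan):
    """Pick a planner according to the two session variables (sketch)."""
    if not enable_nereids_planner:
        # Nereids is off: always use the legacy planner, whatever the fallback flag says.
        return legacy_plan(sql)
    try:
        return nereids_plan(sql)
    except NereidsUnsupported:
        if enable_fallback_to_origin:
            return legacy_plan(sql)
        raise
```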
ba177a15cb [feature-wip](recover) new recover ddl and support show catalog recycle bin (#13067) 2022-10-31 17:44:56 +08:00
ceb7b60a64 [fix](Nereids) update immutable LogicalAggregate attribute by mistake (#13740) 2022-10-31 14:11:55 +08:00
53e5f3939e [fix](plan)result exprs should be substituted in the same way as agg exprs (#13744)
* [fix](cast)ignore implicit cast when comparing two exprs

* fix fe ut
2022-10-31 10:19:32 +08:00
61b7c2c96c [fix](join) fix incorrect result when using anti join with other join predicates (#13743) 2022-10-31 09:51:34 +08:00
efe813ba60 [fix](test)(explain) add full qualified name for scan node explain string (#13777)
1. In the "explain" result of SQL, the table name in `ScanNode` should be fully qualified with the db name.
And for the olap scan node, the selected index name should not be "null".

2. Remove `tpch_sf1_p1/tpch_sf1/nereids/` from the regression test; it will be fixed later.
2022-10-30 13:24:48 +08:00
2a5d3dbb6e feat(nereids): draw hyper graph by graphviz (#13749) 2022-10-28 17:23:35 +08:00
e0667b297f [feature-wip](multi-catalog) reuse hdfsFs and decode parquet values in batch (#13688)
PR(https://github.com/apache/doris/pull/13404) made ParquetReader
break up batch insertion when encountering null values, which leads to poor performance
compared to OrcReader.
So this PR pushes the null map into the decode function, reducing the number of virtual function calls
when null values are encountered.

Furthermore, hdfsFS is reused among file readers to reduce the time spent building connections to HDFS.
2022-10-28 15:52:52 +08:00
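A Python sketch contrasting the two strategies. `Decoder` and its methods are hypothetical; in the real C++ reader each decode call is a virtual call, which this sketch only counts.

```python
class Decoder:
    """Hypothetical decoder over the stream of non-null encoded values."""

    def __init__(self, non_null_values):
        self._values = iter(non_null_values)
        self.calls = 0  # number of decode invocations (virtual calls in C++)

    def decode_run(self, n, out):
        """Old style: decode n consecutive non-null values."""
        self.calls += 1
        out.extend(next(self._values) for _ in range(n))

    def decode_with_null_map(self, null_map, out):
        """New style: one call decodes the whole batch, emitting None for nulls."""
        self.calls += 1
        for is_null in null_map:
            out.append(None if is_null else next(self._values))


def read_batch_split_by_nulls(decoder, null_map):
    """Old approach: break the batch at every null and decode each run separately."""
    out, run = [], 0
    for is_null in null_map:
        if is_null:
            if run:
                decoder.decode_run(run, out)
                run = 0
            out.append(None)
        else:
            run += 1
    if run:
        decoder.decode_run(run, out)
    return out
```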
Pxl
2fab0c45c7 [Feature](runtime-filter) add runtime filter breaking change adapt (#13246)
2022-10-28 10:59:28 +08:00
45b31506c7 [improvement](delete) support delete from partitioned table without partition specified (#13533)
Support deleting from a partitioned table without specifying a partition in the [DELETE] stmt.

## Usage
If it is a partitioned table, you can specify a partition.
If none is specified, Doris will infer the partition from the given conditions.
In two cases, Doris cannot infer the partition from the conditions:
1) the conditions do not contain partition columns;
2) the operator on the partition column is `not in`.
When no partition is specified for a partitioned table, or the partition cannot be inferred from the conditions,
the session variable `delete_without_partition` needs to be `true`
to make the delete statement apply to all partitions.

## Test case
A test case is added in `regression-test/suites/delete_p0/test_delete_from_partition.groovy`;
users can now delete from a partitioned table without specifying a partition.
2022-10-27 21:32:45 +08:00
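A Python sketch of the partition-selection logic described in the Usage section. The function, its arguments, and the list-partition model (each partition holds a set of partition-column values) are hypothetical. It also assumes that a successful inference alone is enough, so `delete_without_partition` is only consulted when inference fails.

```python
def partitions_for_delete(conditions, partition_col, partitions,
                          specified=None, delete_without_partition=False):
    """Decide which partitions a DELETE applies to (sketch).

    conditions: list of (column, op, values) with op in {"=", "in", "not in"}
    partitions: dict partition_name -> set of partition-column values
    """
    if specified:
        return list(specified)
    inferred = None
    for column, op, values in conditions:
        if column != partition_col:
            continue  # only conditions on the partition column help
        if op == "not in":
            inferred = None  # `not in` cannot be used for inference
            break
        wanted = set(values)
        matched = {p for p, vals in partitions.items() if vals & wanted}
        inferred = matched if inferred is None else inferred & matched
    if inferred is not None:
        return sorted(inferred)
    if delete_without_partition:
        return sorted(partitions)  # apply to all partitions
    raise ValueError("cannot infer partition; set delete_without_partition=true")
```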
ec86e9c9b2 [feature-wip][MTMV] The schedule framework for the MTMV (#13147)
Design document: https://github.com/apache/doris/issues/13146
2022-10-27 11:37:24 +08:00
0e70d681d9 [feature](Nereids): Construct join graph (#13679)
* feat: add hypergraph and its api

* feat: add visualization api

* remove unused code

* fix format

* remove unused test

* remove unused tests

* format

Signed-off-by: xiejiann <jianxie0@gmail.com>
2022-10-27 11:32:31 +08:00
2697f72d77 [Improvement][SET-PROPERTY] Support for set query_timeout property (#13444) 2022-10-27 10:03:39 +08:00
7557980d64 [improvement](regression-test) avoid query empty result after loading finished (#13682)
When running regression tests, we kept finding that a query returned an empty result after loading finished,
even if we called "sync" before the query.
This is because for `stream load`, the load task result is returned immediately after the txn's status changes to VISIBLE,
but before the edit log is written.
So if we run the query right after getting the load task result, we may not see the latest loaded data.

The same issue exists for the `insert` operation.
2022-10-27 09:47:18 +08:00
5bd66243ee [minor](log) remove some unused logs (#13689)
1. When running regression tests with specific suites or groups, do not print the names or file names of other suites.
2. Remove the unused alter table job log.
2022-10-27 09:37:32 +08:00
ddb27b9c3f nereids use decimal(27,9) (#13678) 2022-10-26 21:37:24 +08:00
f4c8d4ce85 [feature](nereids) estimate plan cost by column ndv and table row count (#13375)
In this version, we use column ndv information to estimate plan cost.

This is the first version, and it covers TPCH queries.
2022-10-26 20:35:10 +08:00
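The commit does not state its formulas. For reference, these are the classic textbook estimates that use NDV and row count: an equi-join divides the cross product by the larger NDV of the join keys, and an equality filter keeps 1/NDV of the rows under a uniformity assumption. The Python sketch is illustrative and not necessarily what this commit implements.

```python
def estimate_equi_join_rows(left_rows, right_rows, left_ndv, right_ndv):
    """Textbook estimate for R JOIN S ON R.a = S.b."""
    if left_rows == 0 or right_rows == 0:
        return 0
    return left_rows * right_rows / max(left_ndv, right_ndv, 1)


def estimate_filter_eq_rows(rows, ndv):
    """Rows surviving `col = const`, assuming uniformly distributed values."""
    return rows / max(ndv, 1)
```

With round numbers loosely modelled on TPC-H orders (1.5M rows) joined to lineitem (about 6M rows) on the order key, the join estimate equals the lineitem row count.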
bed759b3f5 [Fix](array-type) support CTAS for ARRAY column from collect_list and collect_set (#13627)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-26 19:42:15 +08:00
0841c5bf28 [Bugfix](manager) fix query profile key incompatible with old versions (#13596) 2022-10-26 14:27:58 +08:00
3548d0b824 [fix](statistics) fix cross join statistics exception (#13645) 2022-10-26 14:10:57 +08:00
c418bbd2d1 [feature-wip](new-scan) support Json reader (#13546)
Issue Number: close #12574
This PR adds `NewJsonReader`, which implements the GenericReader interface, to support reading JSON format files.

TODO:
1. modify `_scann_eof` later.
2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.
2022-10-26 12:52:21 +08:00