Commit Graph

6964 Commits

Author SHA1 Message Date
699ffbca0e [enhancement](Nereids) generate correct distribution spec after project (#13725)
After a project, some slots may be projected to new ones, so the ExprIds in DistributionSpecHash must be replaced with the new ones. If the project does anything other than Alias, we need to return DistributionSpecAny rather than the child's DistributionSpec.
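
A minimal sketch of the remapping described in this message, written in Python rather than Doris's Java. `SlotRef`, `Alias`, `HashSpec`, `ANY`, and `spec_after_project` are hypothetical stand-ins for illustration, not the Nereids classes, and the handling of unmapped columns is one reading of the message, not the actual implementation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class SlotRef:
    expr_id: int  # slot passed through the project unchanged


@dataclass(frozen=True)
class Alias:
    expr_id: int  # new ExprId produced by the project
    child_expr_id: Optional[int]  # aliased slot's ExprId, None if the child is not a plain slot


@dataclass(frozen=True)
class HashSpec:
    expr_ids: tuple  # ExprIds the data is hash-distributed on


ANY = "ANY"  # stands in for DistributionSpecAny


def spec_after_project(child_spec, projects):
    """Derive the distribution spec above a project (hypothetical helper)."""
    if not isinstance(child_spec, HashSpec):
        return child_spec
    # Map each child ExprId to the ExprId it carries after the project.
    mapping = {}
    for item in projects:
        if isinstance(item, SlotRef):
            mapping.setdefault(item.expr_id, item.expr_id)
        elif isinstance(item, Alias) and item.child_expr_id is not None:
            mapping.setdefault(item.child_expr_id, item.expr_id)
    new_ids = []
    for old_id in child_spec.expr_ids:
        if old_id not in mapping:
            # A hash column is not carried through a slot or simple alias,
            # so the child's distribution can no longer be claimed.
            return ANY
        new_ids.append(mapping[old_id])
    return HashSpec(tuple(new_ids))
```

With a child hashed on ExprIds `(1, 2)` and a project that keeps slot 1 and aliases slot 2 as ExprId 10, the sketch returns a hash spec on `(1, 10)`.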
2022-11-02 16:50:44 +08:00
f2a0adf34e [fix](fe) Inconsistent behavior for string comparison in FE and BE (#13604) 2022-11-02 15:32:13 +08:00
6f3db8b4b4 [enhancement](Nereids) add eliminate unnecessary project rule (#13886)
This rule eliminates a project whose output set is the same as its child's. If the project is the root of the plan, the elimination condition is stricter: the project's output must be exactly the same as its child's.

The reason to add this rule is that when we do join reorder during optimization, the root of the transformed plan may be a Project whose output set is the same as that of the root of the original plan. If there was already a Project on top of the root whose output set is also the same as the root's, we end up with two identical projects in the memo, one the parent of the other. After MergeProject, we get a new Project that is identical to the child and needs to be added to the parent's group, which triggers a group merge. Since the merge would produce a cycle, it is denied, and the final plan contains two consecutive projects.

## for example:
**BEFORE OPTIMIZATION**
```
LogicalProject1( projects=[c_custkey#0, c_name#1]) [GroupId#1]
+--LogicalJoin(type=LEFT_SEMI_JOIN)                [GroupId#2]
   |--LogicalProject(...)
   |  +--LogicalJoin(type=INNER_JOIN)
   |  ...
   +--LogicalOlapScan(...)
```
**AFTER APPLY RULE: LOGICAL_SEMI_JOIN_LOGICAL_JOIN_TRANSPOSE_PROJECT**
```
LogicalProject1( projects=[c_custkey#0, c_name#1])    [GroupId#1]
+--LogicalProject2( projects=[c_custkey#0, c_name#1]) [GroupId#2]
   +--LogicalJoin(type=INNER_JOIN)                    [GroupId#10]
      |--LogicalProject(...)
      |  +--LogicalJoin(type=LEFT_SEMI_JOIN)
      |  ...
      +--LogicalOlapScan(...)
```
**AFTER APPLY RULE: MERGE_PROJECTS**
```
LogicalProject3( projects=[c_custkey#0, c_name#1])  [should be in GroupId#1, but in GroupId#2 in fact]
+--LogicalJoin(type=INNER_JOIN)                     [GroupId#10]
   |--LogicalProject(...)
   |  +--LogicalJoin(type=LEFT_SEMI_JOIN)
   |  ...
   +--LogicalOlapScan(...)
```
Since GroupId#1 and GroupId#2 contain exactly the same GroupExpression (LogicalProject3 and LogicalProject2), we need to do MergeGroup(GroupId#1, GroupId#2). But a child of GroupId#1 is in GroupId#2, so the merge is denied.
If the best GroupExpression in GroupId#2 is LogicalProject3, we will get two consecutive projects in the final plan.
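
The elimination condition stated at the top of this message can be sketched as follows. This is an illustrative Python sketch, not the Nereids rule itself; `can_eliminate_project` and its parameters are hypothetical names, and outputs are modeled as plain lists of slot names.

```python
def can_eliminate_project(project_output, child_output, is_root):
    """Return True if the project adds nothing over its child (hypothetical helper)."""
    if is_root:
        # The root project fixes the final column order, so the output
        # must match the child's output exactly, order included.
        return list(project_output) == list(child_output)
    # Elsewhere only the set of output slots matters.
    return set(project_output) == set(child_output)
```

Under this sketch, LogicalProject2 in the example (same output set as the join below it, and not the plan root) would qualify for elimination, which avoids the duplicate projects that lead to the denied group merge.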
2022-11-02 14:16:03 +08:00
ba918b40e2 [chore](macOS) Fix compilation errors caused by the deprecated function (#13890) 2022-11-02 13:34:51 +08:00
ee8dffbfb7 [meta](recover) change dropInfo and RecoverInfo to GSON (#13830) 2022-11-02 13:32:46 +08:00
e6080a6e4c [regression](join) add right anti join with other predicate regression case (#13815) 2022-11-02 13:27:58 +08:00
d5becdb4a1 [fix](dynamic-partition) fix wrong check of replication num (#13755) 2022-11-02 12:55:33 +08:00
667cfe5598 [community](collaborators) add more collaborators (#13880) 2022-11-02 12:54:04 +08:00
Pxl
be124523f4 [enhancement](profile) add profile to show column predicates (#13862) 2022-11-02 09:07:26 +08:00
277025b046 [fix](join)ColumnNullable need handle const column with nullable const value (#13866) 2022-11-02 08:52:49 +08:00
bd6070d9b3 [doc](spark-doris-connetor)Add spark Doris connector to support streamload documentation #13834 2022-11-02 08:43:52 +08:00
wxy
3fc1b27c40 [docs](tablet-docs) fix the tablet-repair-and-balance.md document. (#13853)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-11-02 08:43:08 +08:00
wxy
947e67fa76 [enhancement](test) retry starting be or fe when the port is already bound. (#13860)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2022-11-02 08:42:35 +08:00
0eeb4d2881 [minor](log) remove some e.printStackTrace() (#13870) 2022-11-02 08:42:10 +08:00
de1dc62843 [enhancement](olap scanner) Scanner row bytes buffer is too small bug (#13874)
* [enhancement](olap scanner) Scanner row bytes buffer is too small, please try to increase be config

Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-11-02 08:41:50 +08:00
7fedfdcf6a [fix](spark load)The where condition does not take effect when spark load loads the file (#13803) 2022-11-01 23:01:45 +08:00
3924ecead5 [minor](load) Improve error message for string type in loading process (#13718) 2022-11-01 22:02:33 +08:00
8b3afd431e [improvement](memory) simplify memory config related to tcmalloc (#13781)
There are several configs related to tcmalloc, and users do not know how to configure them. In practice users just want two modes, performance or compact: in performance mode, users want Doris to run queries and loads quickly, while in compact mode, users want Doris to run with less memory usage.

If we want to config tcmalloc individually, we can use env variables which are supported by tcmalloc.
2022-11-01 21:45:19 +08:00
287a739510 [javaudf](string) Fix string format in java udf (#13854) 2022-11-01 21:25:12 +08:00
7f34698eef [enhancement](Nereids) use join estimation v2 only when stats derive v2 is enabled (#13845)
join estimation V2 should be invoked when enableNereidsStatsDeriveV2=true
2022-11-01 20:38:39 +08:00
f0c9867af3 [fix](nereids) map literal to double in FilterSelectivityCalculator (#13776)
fix literal-to-double bug: all literal types implement the getDouble() function
2022-11-01 20:20:44 +08:00
01f9f8ad43 [enhancement](Nereids) add merge project rule to column prune rule set (#13835)
When we do column pruning, we add a project on the child plan. If the child plan is itself a Project, we need to merge them.
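
A minimal sketch of that merge for the simplest case, where the pruning project only selects columns by name. `merge_projects` is a hypothetical helper, and modeling the child project as a mapping from output name to expression text is an assumption made for illustration.

```python
def merge_projects(parent_columns, child_projects):
    """Collapse a column-selecting project into the project below it.

    parent_columns: output names the pruning project keeps.
    child_projects: dict mapping each child output name to its expression.
    """
    # Keep only the child expressions that the parent still needs,
    # in the order the parent lists them.
    return {name: child_projects[name] for name in parent_columns}
```

For example, pruning to `["a"]` over a child project `{"a": "x + 1", "b": "y"}` yields the single project `{"a": "x + 1"}`.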
2022-11-01 20:17:53 +08:00
61c817f4cc [feature](syntax) support SELECT * EXCEPT (#13844)
* [feature](syntax) support SELECT * EXCEPT: add regression test
2022-11-01 19:41:25 +08:00
1eef986e75 [feature](nereids) add rule for semi/anti join exploration, when there is project between them (#13756) 2022-11-01 19:07:25 +08:00
f30b974d54 [Bugfix](upgrade) Fix 1.1 upgrade 1.2 coredump when schema change (#13822)
When upgrading from 1.1 to 1.2, the FE version will not match the BE version for a period of time. After the BE is upgraded and a schema change is performed, the BE uses a field desc_tbl that was added in the 1.2 FE. The BE will coredump because the field desc_tbl is nullptr, so it needs to refuse the request.
2022-11-01 17:35:24 +08:00
c14277e587 [fix](analytic) fix coredump caused by empty analytic parameter types (#13808)
* fix fe compile error
2022-11-01 17:25:36 +08:00
83e55cade8 [feature](Nereids): add rule for matching plan into HyperGraph. (#13805) 2022-11-01 14:57:25 +08:00
942611c185 Revert "[enhancement](compaction) opt compaction task producer and quick compaction (#13495)" (#13833)
This reverts commit 4f2ea0776ca3fe5315ab5ef7e00eefabfb5771a0.
2022-11-01 14:22:12 +08:00
7db916fc85 [enhancement](metric)Add metric for exec_state prepare function (#13646)
* add bvar metric for exec_state prepare function
2022-11-01 14:09:47 +08:00
e63608b556 [Bug](test) fix some test cases whose results are random (#13837) 2022-11-01 14:06:47 +08:00
42b2725f03 [Bug](delete) Fix wrong delete operation (#13840) 2022-11-01 13:38:43 +08:00
34e68a41dd [enhancement](explain) add cardinality to explain string and explain graph (#13720)
1. set cardinality when translating a Nereids plan to the legacy planner's plan
2. print cardinality when using EXPLAIN GRAPH
2022-11-01 11:43:21 +08:00
Pxl
164ca1e1a8 [Bug](function) change log fatal to log warning to avoid core dump on nullable double column cast to decimal column (#13819) 2022-11-01 09:54:35 +08:00
7f2166b1fd [fix](thrift) fix that thrift struct sequence number is not consistent in 1.1-lts and master (#13829) 2022-11-01 09:14:33 +08:00
b27714542d [fix](planner) infer predicate could generate predicates in another scope (#13691)
* [fix](planner) infer predicate could generate predicates in another scope
2022-11-01 09:03:41 +08:00
d2c5c1af3b [feature](regression) add custom config file for Regression: regression-conf-custom.groovy (#13783) 2022-10-31 22:49:06 +08:00
cc0fa5fef6 [fix](array-type) fix the be core dump when import array<largeint> (#13821)
- this PR fixes the BE core dump when importing an array.
- before the change, importing an array from a rapidjson string core dumps in the non-vectorized scenario.
- after the change, an array can be imported from a rapidjson string successfully.
2022-10-31 22:08:55 +08:00
36a47dfe16 [enhancement](Nereids): use ImmutableList explicitly in Plan (#13817) 2022-10-31 20:23:30 +08:00
Pxl
57a9b0fa65 [Enhancement](chore) remove unused diagnostic (#12337)
remove unused diagnostic
2022-10-31 19:19:13 +08:00
7ae60a0ad2 [feature](function)add url functions: domain and protocol (#13662) 2022-10-31 19:13:08 +08:00
18be77af64 [fix](nereids) queries cannot execute when both nereids enable and fallback to legacy planner are set to false (#13787)
When enable_nereids_planner=false and enable_fallback_to_origin=false, FE throws an exception for every select statement.
Expected: when enable_nereids_planner=false, all valid queries execute successfully.
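
The expected behavior can be sketched as a small decision function. This is a Python illustration of one reading of the message, not the FE code: `choose_planner` and the `nereids_supports_query` flag are hypothetical, and the two boolean parameters mirror the variable names quoted above.

```python
def choose_planner(enable_nereids_planner, enable_fallback_to_origin, nereids_supports_query):
    """Pick the planner for a query (hypothetical helper)."""
    if not enable_nereids_planner:
        # Nereids is off: the fallback flag is irrelevant and the
        # legacy planner handles every valid query.
        return "legacy"
    if nereids_supports_query:
        return "nereids"
    if enable_fallback_to_origin:
        return "legacy"
    # Assumed behavior: with Nereids on and fallback off, an unsupported query fails.
    raise RuntimeError("Nereids cannot plan this query and fallback is disabled")
```

The first branch is the case this fix addresses: with both flags false, the query should still be planned by the legacy planner.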
2022-10-31 19:02:01 +08:00
ba177a15cb [feature-wip](recover) new recover ddl and support show catalog recycle bin (#13067) 2022-10-31 17:44:56 +08:00
f49a0daf54 [fix](regression) Fix concurrent regression failure (#13798) 2022-10-31 15:57:45 +08:00
ceb7b60a64 [fix](Nereids) update immutable LogicalAggregate attribute by mistake (#13740) 2022-10-31 14:11:55 +08:00
2fb218173e [improvement](scan) change the max thread num and num of free blocks in new scan (#13793)
1. In the previous implementation, the max thread num of the olap scanner was set relatively small, such as 3, which would slow down some queries. In this PR, I changed the max thread num to a quarter of the scanner thread pool (default is 12), which is less than the old scan node's max thread num but larger than in the previous implementation. The upper limit of the old scan node's max thread num is too high, which is not reasonable.

2. Lower the number of pre-allocated free blocks.
2022-10-31 14:00:06 +08:00
4f2ea0776c [enhancement](compaction) opt compaction task producer and quick compaction (#13495)
1. remove quick_compaction's rowset pick policy; call cu compaction when quick compaction is triggered
2. skip a tablet's compaction task when its compaction score is too small

Co-authored-by: yixiutt <yixiu@selectdb.com>
2022-10-31 12:24:05 +08:00
53e5f3939e [fix](plan)result exprs should be substituted in the same way as agg exprs (#13744)
* [fix](cast)ignore implicit cast when comparing two exprs

* fix fe ut
2022-10-31 10:19:32 +08:00
61b7c2c96c [fix](join) fix incorrect result when using anti join with other join predicates (#13743) 2022-10-31 09:51:34 +08:00
2b9e1878a2 [fix](hashjoin) return error if in progress of upgrade (#13753) 2022-10-31 09:41:20 +08:00
f5761c658f [Fix]Fix the extension mysql_to_doris bug (#13723)
* Fix the extension mysql_to_doris  BUG

e_mysql_to_doris.sh: command error. This error causes script execution errors: ERROR 1103 (42000) at line 1: Incorrect table name ''.
The ` ` symbols were in the wrong position.

* Update extension/mysql_to_doris/bin/e_mysql_to_doris.sh


Co-authored-by: Adonis Ling <adonis0147@gmail.com>
2022-10-31 08:45:34 +08:00