Commit Graph

10740 Commits

Author SHA1 Message Date
8a8d3bcb59 [improvement](multi catalog, nereids)Support collect hive table statistics by sql (#19955)
Support collecting Hive external table statistics by running SQL against the Hive table.
By running SQL, we can collect all the statistics that are collected for OLAP tables, including the min and max values of String columns.

With 3 BEs (16 cores, 64 GB), it takes less than 2 minutes to collect statistics for all columns of all TPCH 100GB tables.
Collecting statistics for all columns of the SSB 100GB tables also takes less than 2 minutes.
2023-05-26 10:31:02 +08:00
5621ae08e6 [fix](Nereids) function ABS return type not same between constant folding and function signature (#20059)
ABS returned the wrong type for integer arguments. Return the int type when the argument's type is integer.
2023-05-26 10:24:32 +08:00
f1b949ad59 [fix](Nereids) local sort should not translate to unpartitioned partition (#20031)
1. local sort should not update the current fragment's partition to UNPARTITIONED
2. should set the input fragment's dest exchange node after creating the dest fragment
2023-05-26 10:18:56 +08:00
dca0ebb281 [fix](Nereids) constant folding to null should retain data type (#20070) 2023-05-26 10:14:08 +08:00
56360ba04a [fix](memory) Load flush memtable no check memory exceed #20036 2023-05-26 09:57:00 +08:00
bc4e0e97f2 [enhance](S3FileWriter) abort when s3 file writer abnormally quits and optimize s3 buffer pool (#19944)
1. Reduce the S3 buffer pool's ctor cost.
2. Before this PR, if an S3 file writer returned an error from the append or close function, the caller did not call the abort function, which resulted in a confusing DCHECK failure.
2023-05-26 09:14:38 +08:00
3c2b2361be [docs](memory) debug-tools memory part description Jemalloc #20054 2023-05-26 08:58:57 +08:00
9185b202c5 [Fix](multi-catalog) Fix compilation errors in Column.java. (#20075) 2023-05-25 23:51:29 +08:00
dfaa2db653 [typo](docs) modify the en url to zh url in 2.0 alpha release zh doc (#20038) 2023-05-25 21:08:19 +08:00
d6998723e8 Comment stats unstable cases (#20034) 2023-05-25 21:08:00 +08:00
3f971889b7 [Enhancement](multi catalog) Support hudi mor only java side, be side not support (#19909)
Support reading Hudi MOR tables by using the JNI connector.
Note:
The FE part in the current PR is not fully complete, and the BE part will be added in the next PR.
2023-05-25 20:37:01 +08:00
5ee13ce2ac [fix](Nereids): memo skipProject() shouldn't skip NotEliminated project (#20051) 2023-05-25 20:01:31 +08:00
686711adda [fix](publish) do not use wait_for for publish synchronization (#20029)
It leads to a use-after-free problem.
2023-05-25 20:01:06 +08:00
0dce725120 [fix](nereids)fix decimalv3 type error of mod operator (#20039) 2023-05-25 17:25:11 +08:00
3598518e59 [fix](revert) data stream sender stop sending data to receiver if it returns eos early (#19847)" (#20040)
* Revert "[fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007)"

This reverts commit 2ec1d282c5e27b25d37baf91cacde082cca4ec31.

* [fix](revert) data stream sender stop sending data to receiver if it returns eos early (#19847)"

This reverts commit c73003359567067ea7d44e4a06c1670c9ec37902.
2023-05-25 16:50:17 +08:00
694b8b6cd3 [test](pipeline) adjust mem limit to 50% (#20030) 2023-05-25 15:51:32 +08:00
Pxl
618961053f [Bug](materialized-view) forbid create mv/rollup on mow table (#20001)
forbid create mv/rollup on mow table
2023-05-25 15:30:12 +08:00
002c76e06f [vectorized](udaf) support udaf function work with window function (#19962) 2023-05-25 14:38:47 +08:00
04415d0b35 [opt](balance) add config balance_slot_num_per_path (#19869)
Make balance_slot_num_per_path configurable.
2023-05-25 13:39:42 +08:00
99e0f7b184 [opt](Nereids) restore the set_var hint after finish the execution (#20004)
# Proposed changes
Before the change:
```
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)

mysql> explain select /*+ SET_var(enable_nereids_planner = false) */ year_floor(cast('2023-04-28' as date));
-- omit the result here
10 rows in set (0.01 sec)

mysql> select @@enable_nereids_planner;
+--------------------------+
| @@enable_nereids_planner |
+--------------------------+
|                        0 |
+--------------------------+
1 row in set (0.00 sec)
```

After the change:
```
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)

mysql> explain select /*+ SET_var(enable_nereids_planner = false) */ year_floor(cast('2023-04-28' as date));
-- omit the result here
10 rows in set (0.14 sec)

mysql> select @@enable_nereids_planner;
+------+
| TRUE |
+------+
|    1 |
+------+
1 row in set (0.25 sec)
```

# Problem summary
When `Nereids` handles the `set_var` hint, it records the old session variable values.
If the hint then switches planning to the old optimizer, the old optimizer handles the `set_var` hint again. By that point the hint has already taken effect, so the value recorded as "old" is the already-changed one, and it overwrites the value that was recorded first.

# Describe your changes.
Before recording an old session variable value, we check whether one has already been recorded. If it has, we skip the recording.
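
The fix described above can be illustrated with a minimal sketch. This is not the actual Doris code: the `SessionVars` class and its method names are hypothetical, and Python is used only for brevity. It shows why recording the old value only once lets the restore step bring back the value from before the hint, even if the hint is handled twice.

```python
class SessionVars:
    """Hypothetical session variable store with set_var-style temporary overrides."""

    def __init__(self, values):
        self.values = dict(values)
        self._saved = {}  # original values recorded before a hint changed them

    def set_temporarily(self, name, value):
        # Record the original value only the first time the hint is handled.
        # A second handling (e.g. by the fallback optimizer) must not
        # overwrite it with the already-changed value.
        if name not in self._saved:
            self._saved[name] = self.values[name]
        self.values[name] = value

    def restore(self):
        # Put back every recorded original value after execution finishes.
        self.values.update(self._saved)
        self._saved.clear()


session = SessionVars({"enable_nereids_planner": True})
session.set_temporarily("enable_nereids_planner", False)  # handled by Nereids
session.set_temporarily("enable_nereids_planner", False)  # handled again by the old optimizer
session.restore()
print(session.values["enable_nereids_planner"])
```

Without the `if name not in self._saved` check, the second call would record `False` as the "old" value and the session would stay changed after the statement, which matches the "Before the change" output.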
2023-05-25 12:32:01 +08:00
3ebd6e1649 [feat](stats) Support delete expired auto analysis tasks (#19922) 2023-05-25 12:25:11 +08:00
e04b9cb47e [vectorized](function) fix array_map function return type maybe get wrong (#19320) 2023-05-25 11:30:28 +08:00
53ae24912f [vectorized](feature) support partition sort node (#19708) 2023-05-25 11:22:02 +08:00
c49060a50b [fix](Nereids) the rule of fold constant for logical operator (#20017)
the rule of constant folding on Logical Operator is:
true and true -> true
true and false -> false
false and false -> false
true and x -> x
false and x -> false
null and true -> null
null and false -> false
null and null -> null
null and x -> null and x

true or true -> true
true or false -> true
false or false -> false
true or x -> true
false or x -> false or x
null or true -> true
null or false -> null
null or null -> null
null or x -> null or x
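
The rules listed above can be written down as a small sketch. This is an illustration, not the Nereids implementation: the `Var`, `And`, and `Or` classes and the `fold_and`/`fold_or` functions are hypothetical, `None` stands for SQL NULL, and the mirrored operand orders (for example `x and true`) are handled by symmetry, which is an assumption. Following the list literally, `false or x` is left unfolded.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Var:
    """A non-constant boolean expression, e.g. a column reference."""
    name: str


@dataclass(frozen=True)
class And:
    left: object
    right: object


@dataclass(frozen=True)
class Or:
    left: object
    right: object


def is_literal(e):
    # Literals are True, False, or None (SQL NULL).
    return e is None or isinstance(e, bool)


def fold_and(left, right):
    if left is False or right is False:
        return False            # false and anything -> false
    if left is True:
        return right            # true and x -> x
    if right is True:
        return left             # x and true -> x (by symmetry)
    if left is None and right is None:
        return None             # null and null -> null
    return And(left, right)     # null and x -> null and x


def fold_or(left, right):
    if left is True or right is True:
        return True             # true or anything -> true
    if is_literal(left) and is_literal(right):
        # Both operands are false or null here.
        return None if (left is None or right is None) else False
    return Or(left, right)      # false or x, null or x: kept as-is


x = Var("x")
print(fold_and(None, False), fold_or(None, x))
```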
2023-05-25 11:21:12 +08:00
8149b757c4 [Feature](Nereids)support insert into select command (#18869)
Support inserting the result of a query into a table with `partition`, `with label`, and `cols` tags:

```
insert into t partition (p1, p2)
with label label_1
(c1, c2, c3)
[hint1, hint2]
with cte as (
  select * from src
)
select k1, k2, k3 from cte
```

We create new classes, InsertIntoTableCommand and Unbound/Logical/PhysicalOlapTableSink, to describe the insert command and the OlapTableSink for Nereids.
We create the UnboundOlapTableSink in the parsing phase and bind it, then implement and translate the node to OlapTableSink.
Then we run the command within a transaction.
2023-05-25 10:44:41 +08:00
2ec1d282c5 [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007)
* [fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof
2023-05-25 10:29:35 +08:00
2d668e8d0b [DEBUG](Log) Add debug string for pipeline task cancel (#20026) 2023-05-25 09:58:31 +08:00
4610f26a6e [fix](auth)fix row policy use alias error (#19976)
Issue Number: close #19975
2023-05-25 09:10:31 +08:00
Pxl
f9a4a04bdb [fix](Nereids) npe when one row relation contain aggregate function (#19974)
mysql [test]>select sum(1);
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
2023-05-25 09:09:50 +08:00
a3fd7abf92 [fix](docs) fix wrong link (#20000) 2023-05-25 08:51:01 +08:00
Bin
156c8faac5 [typo](doc)deleted the space which broke the word model (#19991) 2023-05-25 08:22:41 +08:00
f881a2336b [Bug](regression) fix DCHECK failed in not enable pipeline engine (#20010) 2023-05-24 23:51:25 +08:00
bf4072e5b0 [fix](StorageEngine) release DataDir after the thread pool has been shutdown (#20014) 2023-05-24 23:51:06 +08:00
4a7dab228c [fix](doc) Add doris-future doc in sidebar (#19992) 2023-05-24 21:39:14 +08:00
1dd3a4ed3a [fix](Nereids) fix unstable regression test cases and some bugs (#19999)
Fix bugs:
1. should return the other side child of Or if the current side is NULL after constant folding
2. Lead should have three parameters; remove the default-value ctors

Not enable Nereids case under nereids_p0
1. nereids_p0/join/sql
2. nereids_p0/sql_functions/horology_functions/sql

Should disable Nereids explicitly because the results are not the same
1. query_p0/sql_functions/horology_functions/sql
2. query_p0/stats/query_stats_test.groovy
3. query_profile/test_profile.groovy

Unstable regression test case
1. nereids_syntax_p0/join.groovy
2023-05-24 20:34:01 +08:00
a713c225a5 [regressiontest](statistics) Collate and supplement statistics regression test (#19901)
This PR mainly supplements the statistics regression tests. It includes the following:

analyze stats p0 tests:

1. Universal analysis

analyze stats p1 tests:

1. Universal analysis
2. Sampled analysis
3. Incremental analysis
4. Automatic analysis
5. Periodic analysis

manage stats p0 tests:

1. Alter table stats
2. Show table stats
3. Alter column stats
4. Show column stats and histogram
5. Drop column stats
6. Drop expired stats

TODO:

1. Supplement related documents
2. Optimize for unstable cases encountered during testing
3. Add other cases

PRs related to statistics should ensure that all of these cases pass!
2023-05-24 20:17:28 +08:00
4aad88abc4 [test](Nereids) fix tpcds shape out file #20002 2023-05-24 17:40:13 +08:00
2b3db8f2a8 [Bug](functions) Fix functions for array type with nested decimalv3 (#19993) 2023-05-24 16:51:34 +08:00
ff54b45775 [fix](partial-update) should hold tablet meta lock before calling lookup_row_key() (#19964) 2023-05-24 16:37:27 +08:00
Pxl
3ba7c2336b [Chore](build) change CMAKE_CXX_STANDARD from 17 to 20 #19987 2023-05-24 16:16:42 +08:00
e5eed53b89 [improvement](bitmap) Use shared_ptr in BitmapValue to avoid deep copying (#19101)
Currently, BitmapValue is copied between columns, which costs a lot of memory. Use a shared_ptr in BitmapValue to avoid copying the data.
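
A rough sketch of the idea follows. It is not the BitmapValue code: the `SharedBitmap` class is hypothetical, a Python `set` stands in for the real bitmap, and the copy-on-write step is an assumption about how sharing stays safe when one copy is later modified. Copies share one underlying object, and the data is duplicated only when a shared copy is written to.

```python
class SharedBitmap:
    """Hypothetical bitmap wrapper whose copies share data until one is modified."""

    def __init__(self, bits=None):
        self._bits = set() if bits is None else bits
        self._shared = False

    def copy(self):
        # Cheap copy: both objects point at the same underlying set.
        other = SharedBitmap(self._bits)
        self._shared = True
        other._shared = True
        return other

    def add(self, value):
        if self._shared:
            # Detach from the shared data before the first write.
            # The other copy keeps its flag, so it may copy once more
            # than strictly needed, which is safe but not optimal.
            self._bits = set(self._bits)
            self._shared = False
        self._bits.add(value)

    def contains(self, value):
        return value in self._bits


a = SharedBitmap({1, 2})
b = a.copy()   # no data copied here
b.add(3)       # data copied only now
print(a.contains(3), b.contains(3))
```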
2023-05-24 16:13:01 +08:00
c730033595 [improvement](exchange) data stream sender stop sending data to receiver if it returns eos early (#19847)
For a broadcast join, only one build fragment instance builds the hash table. The other fragment instances just receive the build side data and throw it away, which wastes memory and CPU.

This PR improves this: the data stream receiver tells the sender that it does not need data from it, and the sender stops sending any data to that receiver.
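
The behavior described above can be sketched as follows. The `Receiver` and `Sender` classes and the `"EOS"`/`"OK"` return values are hypothetical and only model the protocol, not the real data stream sender: a receiver that does not need the data answers with an end-of-stream status, and the sender skips it from then on.

```python
class Receiver:
    """Hypothetical receiver; only the instance that builds the hash table needs data."""

    def __init__(self, needs_data):
        self.needs_data = needs_data
        self.received = []

    def on_data(self, block):
        if not self.needs_data:
            return "EOS"  # tell the sender to stop sending to this receiver
        self.received.append(block)
        return "OK"


class Sender:
    """Hypothetical broadcast sender that remembers which receivers returned EOS."""

    def __init__(self, receivers):
        self.receivers = receivers
        self.finished = set()  # indexes of receivers that returned EOS
        self.sent = 0          # number of blocks actually transmitted

    def send(self, block):
        for i, receiver in enumerate(self.receivers):
            if i in self.finished:
                continue  # skip receivers that no longer want data
            self.sent += 1
            if receiver.on_data(block) == "EOS":
                self.finished.add(i)


builder, other = Receiver(True), Receiver(False)
sender = Sender([builder, other])
for block in ["b1", "b2", "b3"]:
    sender.send(block)
print(sender.sent, builder.received)
```

In this sketch the receiver that discards data gets only the first block, after which the sender stops transmitting to it.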
2023-05-24 15:11:32 +08:00
d0a3cdfe1a [enhancement](error message) print query id when query timeout (#19972)
In regression tests, many queries time out, but we do not know their query ids, and it is too hard to use the SQL text to find the query id in the audit log. So this PR adds the query id to the query timeout message.
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-05-24 14:40:33 +08:00
14b4c7abf9 [fix](hashtable) Check query cancel status during build hash table #19970
should cancel query during hash table build stage if the query is cancelled.
2023-05-24 14:24:03 +08:00
4603a60650 [opt](Nereids) give the easy understand error message when the window func misses the parameters (#19957)
In the new optimizer, if a window function is missing a parameter, no understandable error message is given. This PR adds one.
2023-05-24 14:18:22 +08:00
c84fd79051 [regression](nereids) fix tpcds plan shape #19985
skip tpcds 88/16/28/61/85/17/9/50/25/39/29/13/48/64
2023-05-24 14:04:28 +08:00
70f2e8ff80 [fix](nereids)enable decimalv3 by default for nereids (#19906) 2023-05-24 13:36:24 +08:00
f14e6189a9 [feature](load-refactor) Unified mysql load use InsertStmt (#19571) 2023-05-24 12:09:16 +08:00
b4669eaeba [Improve](complex-type)add switch for array/struct/map nesting complex type (#19928)
Currently, BE does not support arrays, maps, and structs nested in each other for many operations. If we do not prohibit this in FE, we will hit a lot of undefined behavior in BE, so this PR adds a switch to prohibit nested complex types. Once nesting is fully supported, the switch can be used to allow it.
2023-05-24 11:39:53 +08:00
cf7a74f6ec [fix](memory) query check cancel while waiting for memory in Allocator, and optimize log (#19967)
When a query finds in Allocator that the process memory exceeds the limit, it waits up to 5s for memory.
Previously, Allocator did not check whether the query had been canceled while waiting for memory, so a canceled query could not end quickly.
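
A minimal sketch of such a wait loop is shown below. It is not the Allocator code: the function name, the callbacks, and the polling interval are assumptions, and only the 5s upper bound comes from the description above. The point is that the cancel status is checked on every iteration, so a canceled query leaves the wait immediately.

```python
import time


def wait_for_memory(has_memory, is_cancelled, max_wait_s=5.0, poll_s=0.1):
    """Wait until memory is available, the query is cancelled, or the wait times out.

    has_memory and is_cancelled are hypothetical zero-argument callbacks.
    Returns "ok", "cancelled", or "timeout".
    """
    deadline = time.monotonic() + max_wait_s
    while time.monotonic() < deadline:
        if is_cancelled():
            return "cancelled"  # stop waiting as soon as the query is cancelled
        if has_memory():
            return "ok"
        time.sleep(poll_s)
    return "timeout"


print(wait_for_memory(lambda: False, lambda: True))
```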
2023-05-24 11:08:48 +08:00