Support collecting Hive external table statistics by running SQL against the Hive table.
By running SQL, we can collect all the statistics that are collected for OLAP tables, including the min and max values of String columns.
With 3 BEs (16 cores, 64 GB), it takes less than 2 minutes to collect TPCH 100GB statistics for all columns of all tables.
It also takes less than 2 minutes to collect statistics for all columns of the SSB 100GB tables.
1. Reduce the constructor cost of the S3 buffer pool.
2. Before this PR, if an S3 file writer returned an error from the `append` or `close` function, the caller did not call the `abort` function, which resulted in a confusing DCHECK failure like the one in the following picture.
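The abort-on-failure pattern from item 2 can be sketched roughly as follows. This is only an illustration in Python: `FakeFileWriter` and `write_all` are hypothetical names, and exceptions stand in for the error status that the real writer returns.

```python
class FakeFileWriter:
    """Hypothetical writer for illustration only; not the Doris S3 file writer."""

    def __init__(self, fail_on_append=False):
        self.fail_on_append = fail_on_append
        self.state = "open"

    def append(self, data):
        if self.fail_on_append:
            raise OSError("append failed")

    def close(self):
        self.state = "closed"

    def abort(self):
        # Release buffers / cancel the upload so the writer ends in a defined state.
        self.state = "aborted"


def write_all(writer, chunks):
    """Write every chunk; abort the writer if append or close fails."""
    try:
        for chunk in chunks:
            writer.append(chunk)
        writer.close()
    except OSError:
        writer.abort()
        return False
    return True
```

The point is only that every failure path from `append` or `close` ends in `abort`, so the writer is never destroyed while still "open".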
Support reading Hudi MOR tables by using the JNI connector.
Note:
The FE part of the current PR is not fully complete, and the BE part will be supplemented in the next PR.
* Revert "[fix](sink) fix END_OF_FILE error for pipeline caused by VDataStreamSender eof (#20007)"
This reverts commit 2ec1d282c5e27b25d37baf91cacde082cca4ec31.
* Revert "[fix](revert) data stream sender stop sending data to receiver if it returns eos early (#19847)"
This reverts commit c73003359567067ea7d44e4a06c1670c9ec37902.
# Proposed changes
Before the change:
```
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)
mysql> explain select /*+ SET_var(enable_nereids_planner = false) */ year_floor(cast('2023-04-28' as date));
-- omit the result here
10 rows in set (0.01 sec)
mysql> select @@enable_nereids_planner;
+--------------------------+
| @@enable_nereids_planner |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.00 sec)
```
After the change:
```
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)
mysql> explain select /*+ SET_var(enable_nereids_planner = false) */ year_floor(cast('2023-04-28' as date));
-- omit the result here
10 rows in set (0.14 sec)
mysql> select @@enable_nereids_planner;
+------+
| TRUE |
+------+
| 1 |
+------+
1 row in set (0.25 sec)
```
# Problem summary
We already record the old session variables when `Nereids` handles the `set_var` hint.
However, after we fall back to the old optimizer, it handles the `set_var` hint again. Because the hint has already taken effect, the session variable already holds the changed value, and that changed value then overwrites the recorded old value.
# Describe your changes.
Before recording an old session variable value, we first check whether a value has already been recorded for it. If one exists, we skip recording.
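A minimal sketch of the "record only once" idea, written in Python for brevity. `SessionVariables`, `set_var_hint` and `restore` are hypothetical names for illustration, not the actual FE classes or methods.

```python
class SessionVariables:
    """Hypothetical stand-in for a session's variables; not the Doris class."""

    def __init__(self, **values):
        self.values = dict(values)
        self.old_values = {}  # values to restore once the statement finishes

    def set_var_hint(self, name, value):
        # Record the original value only the first time the hint is applied.
        # A second application (e.g. after falling back to the old optimizer)
        # must not overwrite the record with the already-changed value.
        if name not in self.old_values:
            self.old_values[name] = self.values[name]
        self.values[name] = value

    def restore(self):
        self.values.update(self.old_values)
        self.old_values.clear()


session = SessionVariables(enable_nereids_planner=True)
session.set_var_hint("enable_nereids_planner", False)  # applied by Nereids
session.set_var_hint("enable_nereids_planner", False)  # applied again by the old optimizer
session.restore()
print(session.values["enable_nereids_planner"])  # True
```

Without the `if name not in self.old_values` check, the second call would record `False` as the "old" value and the session would stay on the old planner after the statement.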
The rules of constant folding on logical operators are:
true and true -> true
true and false -> false
false and false -> false
true and x -> x
false and x -> false
null and true -> null
null and false -> false
null and null -> null
null and x -> null and x
true or true -> true
true or false -> true
false or false -> false
true or x -> true
false or x -> false or x
null or true -> true
null or false -> null
null or null -> null
null or x -> null or x
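The table above can be sketched as two small functions. This is an illustrative Python model, not the Nereids rule itself: `True`/`False` are boolean literals, `None` stands for NULL, a string stands for a non-constant expression `x`, and a tuple stands for an expression that is left unfolded. The function names are made up for this sketch.

```python
def is_literal(value):
    """True for the constants TRUE, FALSE and NULL (modelled as None)."""
    return value is None or isinstance(value, bool)


def fold_and(left, right):
    if left is False or right is False:
        return False                      # false and anything -> false
    if left is True:
        return right                      # true and x -> x
    if right is True:
        return left                       # x and true -> x (covers null and true -> null)
    if left is None and right is None:
        return None                       # null and null -> null
    return ("and", left, right)           # e.g. null and x stays as it is


def fold_or(left, right):
    if left is True or right is True:
        return True                       # true or anything -> true
    if is_literal(left) and is_literal(right):
        # Only FALSE and NULL remain here.
        return False if (left is False and right is False) else None
    return ("or", left, right)            # false or x / null or x stay as they are


print(fold_and(True, "x"))    # x
print(fold_and(None, False))  # False
print(fold_or(None, False))   # None
print(fold_or(False, "x"))    # ('or', False, 'x')
```

The sketch follows the listed rules literally, including leaving `false or x` unfolded.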
Support inserting the result of a query into a table with the `partition`, `with label`, and `cols` tags:
```
insert into t partition (p1, p2)
with label label_1
(c1, c2, c3)
[hint1, hint2]
with cte as (
select * from src
)
select k1, k2, k3 from cte
```
We create new classes, InsertIntoTableCommand and Unbound/Logical/PhysicalOlapTableSink, to describe the insert command and the OlapTableSink for Nereids.
We create the UnboundOlapTableSink in the parsing phase and bind it, then implement and translate the node to OlapTableSink.
Then we run the command within a transaction.
Fix bugs:
1. Should return the other side child of Or if the current side is NULL after constant folding
2. Lead should have three parameters; remove the constructors that supply default values
Cases under nereids_p0 for which Nereids is not enabled:
1. nereids_p0/join/sql
2. nereids_p0/sql_functions/horology_functions/sql
Cases that should disable Nereids explicitly because the results are not the same:
1. query_p0/sql_functions/horology_functions/sql
2. query_p0/stats/query_stats_test.groovy
3. query_profile/test_profile.groovy
Unstable regression test case
1. nereids_syntax_p0/join.groovy
This PR mainly supplements the statistics regression tests, including the following:
analyze stats p0 tests:
1. Universal analysis
analyze stats p1 tests:
1. Universal analysis
2. Sampled analysis
3. Incremental analysis
4. Automatic analysis
5. Periodic analysis
manage stats p0 tests:
1. Alter table stats
2. Show table stats
3. Alter column stats
4. Show column stats and histogram
5. Drop column stats
6. Drop expired stats
TODO:
1. Supplement related documents
2. Optimize for unstable cases encountered during testing
3. Add other cases
PRs related to statistics should ensure that all of these cases pass!
For a broadcast join, only one build fragment instance builds the hash table; the other fragment instances just receive and throw away the build-side data, which wastes memory and CPU.
This PR improves this situation: the data stream receiver tells the sender that it does not need data from it, and the sender stops sending any data to it.
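The receiver-to-sender feedback can be modelled roughly like this. It is a toy Python sketch with hypothetical `Receiver` and `Sender` classes; the real exchange between data stream sender and receiver goes over RPC and is not shown here.

```python
class Receiver:
    """Toy receiver; only the instance that builds the hash table needs data."""

    def __init__(self, needs_data):
        self.needs_data = needs_data
        self.received = []

    def on_batch(self, batch):
        """Return False to tell the sender that no more data is needed."""
        if not self.needs_data:
            return False
        self.received.append(batch)
        return True


class Sender:
    """Toy sender that drops a receiver once it reports it needs no data."""

    def __init__(self, receivers):
        self.active = list(receivers)
        self.sent = 0  # number of (batch, receiver) transmissions

    def send(self, batch):
        still_active = []
        for receiver in self.active:
            self.sent += 1
            if receiver.on_batch(batch):
                still_active.append(receiver)
        self.active = still_active


builder = Receiver(needs_data=True)
sender = Sender([builder, Receiver(needs_data=False), Receiver(needs_data=False)])
for batch in ["b1", "b2", "b3"]:
    sender.send(batch)
print(sender.sent)        # 5 instead of 9
print(builder.received)   # ['b1', 'b2', 'b3']
```

In this model the two idle receivers each cost one transmission before the sender stops sending to them.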
In regression tests, many queries time out, but we do not know the query id, and it is too hard to use the SQL text to find the query id in the audit log. So I now add the query id when a query times out.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, many operations in BE do not support array/map/struct types nested in each other. If we do not prohibit this in FE, we will run into a lot of undefined behavior in BE, so I added a switch to prohibit nesting complex types. Once nesting is fully supported, it can be enabled.
Issue Number: close #xxx
After a query finds in Allocator that process memory exceeds the limit, it waits up to 5s.
Previously, Allocator did not check whether the query was canceled while waiting for memory, which prevented the query from ending quickly.
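A rough sketch of a bounded wait that also polls for cancellation, in Python. The function name, the callbacks and the 0.1s poll interval are assumptions made for illustration; only the 5s upper bound comes from the description above.

```python
import time


def wait_for_memory(has_memory, is_cancelled, max_wait_s=5.0, poll_s=0.1,
                    sleep=time.sleep):
    """Wait up to max_wait_s for memory, returning early if the query is cancelled."""
    waited = 0.0
    while waited < max_wait_s:
        if is_cancelled():
            return "cancelled"   # stop waiting as soon as the query is cancelled
        if has_memory():
            return "ok"
        sleep(poll_s)
        waited += poll_s
    return "timeout"
```

Checking `is_cancelled` on every poll is what lets a cancelled query return right away instead of sitting out the full wait.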