# Proposed changes
Before the change:
```
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)
mysql> explain select /*+ SET_var(enable_nereids_planner = false) */ year_floor(cast('2023-04-28' as date));
-- omit the result here
10 rows in set (0.01 sec)
mysql> select @@enable_nereids_planner;
+--------------------------+
| @@enable_nereids_planner |
+--------------------------+
| 0 |
+--------------------------+
1 row in set (0.00 sec)
```
After the change:
```
mysql> SET enable_nereids_planner=true;
Query OK, 0 rows affected (0.01 sec)
mysql> explain select /*+ SET_var(enable_nereids_planner = false) */ year_floor(cast('2023-04-28' as date));
-- omit the result here
10 rows in set (0.14 sec)
mysql> select @@enable_nereids_planner;
+------+
| TRUE |
+------+
| 1 |
+------+
1 row in set (0.25 sec)
```
# Problem summary
When `Nereids` handles the `set_var` hint, it records the old session variable values so that they can be restored later.
If the hint switches the optimizer back to the old one, the old optimizer handles the `set_var` hint again. By then the hint has already taken effect, so the session variable already holds the changed value. That changed value is recorded as the old value and overwrites the real one.
# Describe your changes.
Before recording the old value of a session variable, we first check whether one has already been recorded. If it has, we skip the recording.
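A minimal sketch of the "record only once" check, written in Python for brevity. The class and method names (`SessionVars`, `set_with_hint`, `restore`) are hypothetical and do not correspond to the actual FE classes; the sketch only illustrates why a second pass over the same hint must not overwrite the recorded old value.

```python
class SessionVars:
    """Hypothetical stand-in for a session variable holder."""

    def __init__(self, values):
        self.values = dict(values)
        self.saved = {}  # old values recorded while a set_var hint is active

    def set_with_hint(self, name, value):
        # Record the old value only if none has been recorded yet.
        # A second planner handling the same hint then keeps the real old value.
        if name not in self.saved:
            self.saved[name] = self.values[name]
        self.values[name] = value

    def restore(self):
        # Put back every recorded old value once the statement is done.
        self.values.update(self.saved)
        self.saved.clear()


session = SessionVars({"enable_nereids_planner": True})
session.set_with_hint("enable_nereids_planner", False)  # first planner
session.set_with_hint("enable_nereids_planner", False)  # second planner, same hint
session.restore()
```

Without the `if name not in self.saved` check, the second call would record `False` as the old value, which matches the "Before the change" session above.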
The rules of constant folding on logical operators are:
true and true -> true
true and false -> false
false and false -> false
true and x -> x
false and x -> false
null and true -> null
null and false -> false
null and null -> null
null and x -> null and x
true or true -> true
true or false -> true
false or false -> false
true or x -> true
false or x -> false or x
null or true -> true
null or false -> null
null or null -> null
null or x -> null or x
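The rules above can be sketched as two small functions. This is an illustration in Python, not the Nereids implementation. It assumes that `True`, `False`, and `None` stand for the constants true, false, and null, that any string stands for a non-constant expression `x`, and that each rule applies regardless of operand order. Expressions that the rules leave unfolded are returned as a tuple.

```python
def is_const(v):
    # True, False and None model the constants true, false and null.
    return v is True or v is False or v is None


def fold_and(a, b):
    if a is False or b is False:
        return False            # false and anything -> false
    if a is True:
        return b                # true and x -> x
    if b is True:
        return a
    if a is None and b is None:
        return None             # null and null -> null
    return ("and", a, b)        # null and x, or x and y: left unfolded


def fold_or(a, b):
    if a is True or b is True:
        return True             # true or anything -> true
    if is_const(a) and is_const(b):
        # Only false and null remain: false or false -> false, otherwise null.
        return False if (a is False and b is False) else None
    return ("or", a, b)         # false or x, null or x: left unfolded
```

For example, `fold_and(True, "x")` returns `"x"`, while `fold_or(None, "x")` returns the unfolded tuple `("or", None, "x")`.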
Support inserting the result of a query into a table, with optional `partition`, `with label`, and `cols` clauses:
```
insert into t partition (p1, p2)
with label label_1
(c1, c2, c3)
[hint1, hint2]
with cte as (
select * from src
)
select k1, k2, k3 from cte
```
We add new classes, InsertIntoTableCommand and Unbound/Logical/PhysicalOlapTableSink, to describe the insert command and the OLAP table sink for Nereids.
An UnboundOlapTableSink is created in the parsing phase and then bound. The node is then implemented and translated to an OlapTableSink.
Finally, the command runs within a transaction.
Fix bugs:
1. Return the other child of Or if the current side is NULL after constant folding.
2. Lead should have three parameters; remove the constructors that supply default values.
Cases under nereids_p0 that do not enable Nereids:
1. nereids_p0/join/sql
2. nereids_p0/sql_functions/horology_functions/sql
Cases that should disable Nereids explicitly because the results are not the same:
1. query_p0/sql_functions/horology_functions/sql
2. query_p0/stats/query_stats_test.groovy
3. query_profile/test_profile.groovy
Unstable regression test case
1. nereids_syntax_p0/join.groovy
This PR mainly supplements the statistics regression tests. It includes the following:
analyze stats p0 tests:
1. Universal analysis
analyze stats p1 tests:
1. Universal analysis
2. Sampled analysis
3. Incremental analysis
4. Automatic analysis
5. Periodic analysis
manage stats p0 tests:
1. Alter table stats
2. Show table stats
3. Alter column stats
4. Show column stats and histogram
5. Drop column stats
6. Drop expired stats
TODO:
1. Supplement related documents
2. Optimize for unstable cases encountered during testing
3. Add other cases
PRs related to statistics should ensure that all of these cases pass!
For a broadcast join, only one build fragment instance builds the hash table. The other fragment instances just receive the build-side data and throw it away, which wastes memory and CPU.
This PR improves this: the data stream receiver tells the sender that it does not need data from it, and the sender stops sending any data to that receiver.
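The idea can be illustrated with a small Python sketch. The `Receiver` and `Sender` classes and their methods are hypothetical and greatly simplified; they are not the BE data stream classes. The sketch only shows a sender skipping receivers that have said they do not need the build-side data.

```python
class Receiver:
    """Hypothetical receiver; only one instance builds the hash table."""

    def __init__(self, builds_hash_table):
        self.builds_hash_table = builds_hash_table
        self.rows = []

    def needs_data(self):
        # A receiver that does not build the hash table would discard the data anyway.
        return self.builds_hash_table


class Sender:
    """Hypothetical sender that broadcasts build-side batches."""

    def __init__(self, receivers):
        self.receivers = receivers
        self.batches_sent = 0

    def broadcast(self, batch):
        for receiver in self.receivers:
            if not receiver.needs_data():
                continue  # the receiver opted out, so nothing is sent to it
            receiver.rows.extend(batch)
            self.batches_sent += 1


receivers = [Receiver(True), Receiver(False), Receiver(False)]
sender = Sender(receivers)
sender.broadcast([1, 2, 3])
```

With three receivers and one builder, a single copy of each batch is sent instead of three.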
In regression tests, many queries time out, but we do not know their query IDs, and it is too hard to find a query ID in the audit log from the SQL text. So I add the query ID to the output when a query times out.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, many operations in BE do not support array/map/struct types nested in each other. If we do not prohibit this in FE, we will hit many undefined behaviors in BE, so I add a switch that prohibits nested complex types. Once nesting is fully supported, the switch can be used to allow it.
When the Allocator finds that process memory exceeds the limit, the query waits up to 5s for memory.
Previously, the Allocator did not check whether the query had been canceled while it was waiting for memory, so a canceled query could not end quickly.
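A minimal sketch of a bounded wait that also checks for cancellation, in Python. The function name, the callbacks, and the 0.1s polling interval are assumptions made for illustration. Only the 5s upper bound comes from the description above.

```python
import time


def wait_for_memory(has_memory, is_cancelled, max_wait_s=5.0, poll_s=0.1,
                    sleep=time.sleep, clock=time.monotonic):
    """Wait until memory is available, the query is cancelled, or time runs out."""
    deadline = clock() + max_wait_s
    while clock() < deadline:
        if is_cancelled():
            return "cancelled"   # stop waiting as soon as the query is cancelled
        if has_memory():
            return "ok"
        sleep(poll_s)            # hypothetical polling interval
    return "timeout"
```

Checking `is_cancelled()` on every iteration is what lets a cancelled query leave the loop without waiting for the full 5s.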
The case test_string_concat_extremely_long_string exceeds our test limit. Move it to p2 so that it is tested only in the SelectDB test environment.
Because we need to stay consistent with MySQL and avoid overflow, q67 must keep its current behavior. Once Nereids and decimalV3 are fully applied, it will be fixed automatically.
In the parallel test, even though all query stats are cleaned, cases running in parallel still affect them. So query_stats_test needs to use a unique table.
test_query_sys_tables did not handle some unstable situations. Fixed it.
Temporarily disable the unstable analyze_test case for p0.
Fix the following problem:
Suppose a table has an unfinished schema change job (job-2), and an earlier schema change job (job-1) on the same table has already finished.
When FE restarts, it replays the edit log (the pending log and the waiting_txn log) for job-2 and sets the table's state to SCHEMA_CHANGE. However, loadAlterJob, which runs after replayJournal, adds job-1 to the schema change handler. Running job-1 then sets the table to NORMAL because job-1 is done. At this point job-2 is in runWaitingTxnJob, which checks the table's state, finds that it is not the expected one, and throws an exception without changing the job's state. The job also cannot be canceled, because the table is not under schema change.
First, to reduce memory usage, we no longer pre-allocate blocks. Instead, a block is allocated lazily when the upper layer calls get_free_block. When the upper layer calls return_free_block to return a free block, we add the block to a queue for memory reuse, and we free the blocks in the queue when the scanner_context is closed rather than when it is destructed.
Second, to limit the memory usage of the scanner, we introduce a variable _free_blocks_capacity that indicates the current number of free blocks available to the scanners. The number of scanners that can be scheduled is calculated from this value.
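The first point, lazy allocation with a reuse queue, can be sketched as follows in Python. The class is a hypothetical simplification: only the names get_free_block and return_free_block come from the description above, and a `bytearray` stands in for a block. The scheduling calculation based on _free_blocks_capacity is left out because its formula is not given here.

```python
from collections import deque


class ScannerContext:
    """Hypothetical, simplified model of the block reuse described above."""

    def __init__(self, block_size):
        self._block_size = block_size
        self._free_blocks = deque()  # returned blocks waiting to be reused
        self.allocated = 0           # number of blocks actually allocated

    def get_free_block(self):
        # Reuse a returned block if there is one, otherwise allocate lazily.
        if self._free_blocks:
            return self._free_blocks.popleft()
        self.allocated += 1
        return bytearray(self._block_size)

    def return_free_block(self, block):
        self._free_blocks.append(block)

    def close(self):
        # Queued blocks are released on close, not when the object is destroyed.
        self._free_blocks.clear()


ctx = ScannerContext(block_size=16)
block = ctx.get_free_block()    # allocated on first use
ctx.return_free_block(block)
again = ctx.get_free_block()    # the same block is reused
ctx.return_free_block(again)
ctx.close()
```

No block is allocated until the first get_free_block call, and a returned block is handed out again before a new one is allocated.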
ssb flat test
previous
lineorder 1.2G:
load time: 3s, query time: 0.355s
lineorder 5.8G:
load time: 330s, query time: 0.970s
load time: 349s, query time: 0.949s
load time: 349s, query time: 0.955s
load time: 360s, query time: 0.889s (pipeline enabled)
after
lineorder 1.2G:
load time: 3s, query time: 0.349s
lineorder 5.8G:
load time: 342s, query time: 0.929s
load time: 337s, query time: 0.913s
load time: 345s, query time: 0.946s
load time: 346s, query time: 0.865s (pipeline enabled)
fix s3 resource check:
ERROR 1105 (HY000): Unexpected exception: org.apache.doris.common.DdlException: errCode = 2, detailMessage = Missing [AWS_ACCESS_KEY] in properties.
We should use the new properties to check whether S3 is available.
If a join is the root node, then after some join reordering the root Group in the Memo will contain a GroupExpression whose plan is a LogicalProject and whose child is its own ownerGroup.
This PR adds a rewrite rule that ensures there is a Project on top of the topmost Join of the plan, to avoid a cycle in the Memo.