For a broadcast join, only one build fragment instance builds the hash table; the other fragment instances just receive the build-side data and throw it away, which wastes memory and CPU.
This PR improves this: the data stream receiver tells the sender that it does not need its data, and the sender stops sending any data to it.
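A minimal sketch of the idea, not the actual BE code: the receiver's reply to a block carries a "still needed" flag, and the sender drops a receiver once that flag is false. The class names `Sender` and `Receiver` and the boolean reply are hypothetical and only illustrate the handshake.

```python
class Receiver:
    """Hypothetical receiver; only one instance builds the hash table."""

    def __init__(self, builds_hash_table):
        self.builds_hash_table = builds_hash_table
        self.rows = []

    def on_block(self, block):
        # The return value is the reply to the sender: True means "keep sending".
        if not self.builds_hash_table:
            return False  # nothing is stored, and the sender is told to stop
        self.rows.extend(block)
        return True


class Sender:
    """Hypothetical sender that stops sending to receivers that opt out."""

    def __init__(self, receivers):
        self.receivers = receivers
        self.active = set(receivers)
        self.sent = {rid: 0 for rid in receivers}

    def send(self, block):
        for rid in sorted(self.active):
            still_needed = self.receivers[rid].on_block(block)
            self.sent[rid] += 1
            if not still_needed:
                self.active.discard(rid)
```

In this sketch a receiver that does not build the hash table still sees the first block, because it can only opt out in its reply.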
In regression tests, many queries time out, but we do not know their query ids, and it is too hard to find a query id in the audit log from the SQL text. So I now include the query id when a query times out.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Currently, BE does not support array/map/struct types nested inside each other for many operations. If we do not prohibit this in FE, we will hit a lot of undefined behavior in BE, so I add a switch that prohibits nesting complex types. Once nesting is fully supported, the switch can be used to allow it.
Issue Number: close #xxx
When a query's check in Allocator finds that process memory exceeds the limit, it waits up to 5s.
Previously, Allocator did not check whether the query had been canceled while waiting for memory, so the query could not end quickly.
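A minimal sketch of a bounded wait that also checks for cancellation on every poll. The function name, the callbacks, and the 100 ms poll interval are assumptions for illustration; only the 5s upper bound comes from the description above.

```python
import time


def wait_for_memory(memory_available, is_cancelled,
                    max_wait_ms=5000, poll_ms=100, sleep=time.sleep):
    """Poll until memory is available, the query is cancelled, or time runs out."""
    waited_ms = 0
    while waited_ms < max_wait_ms:
        if is_cancelled():
            return "cancelled"  # end the query quickly instead of waiting the full 5s
        if memory_available():
            return "ok"
        sleep(poll_ms / 1000)
        waited_ms += poll_ms
    return "timeout"
```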
The case test_string_concat_extremely_long_string exceeds our test limit. Move it to p2 so that it is tested only in the SelectDB test environment.
Because we need to stay consistent with MySQL and avoid overflow, q67 must keep its current behavior. Once nereids and decimalV3 are fully applied, it will be fixed automatically.
In the parallel test, although all query stats are cleaned, cases running in parallel still affect them. So we need to use a unique table for query_stats_test.
test_query_sys_tables did not handle some unstable situations. Fixed it.
Temporarily disable the unstable case analyze_test for p0.
Fix the following problem:
Suppose there is an unfinished schema change job (job-2), and before it, another schema change job (job-1) on the same table has already finished.
When FE restarts, it replays the edit log (pending log and waiting_txn log) for job-2 and sets the table's state to SCHEMA_CHANGE. But loadAlterJob, which runs after replayJournal, adds job-1 to the schema change handler, and running job-1 sets the table to NORMAL because job-1 is done. At this point job-2 is in runWaitingTxnJob, which checks the table's state and throws an exception if the state is not as expected. The job's state is therefore never changed, and the job cannot be canceled either, because the table is not under schema change.
First, to reduce memory usage, we no longer pre-allocate blocks; instead we lazily allocate a block when the upper layer calls get_free_block. When the upper layer calls return_free_block to return a block, we add it to a queue for memory reuse, and we free the blocks in the queue when the scanner_context is closed rather than when it is destructed.
Second, to limit the scanner's memory usage, we introduce a variable _free_blocks_capacity that indicates the current number of free blocks available to the scanners. The number of scanners that can be scheduled is calculated from this value.
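A minimal sketch of the two points above, not the BE implementation. The names get_free_block, return_free_block and _free_blocks_capacity come from the description; the `blocks_per_scanner` parameter, the way the capacity is decremented and restored, and the division used to derive the scanner count are assumptions made for illustration.

```python
from collections import deque


class ScannerContext:
    def __init__(self, free_blocks_capacity, blocks_per_scanner=1):
        self._free_blocks = deque()          # returned blocks kept for reuse
        self._free_blocks_capacity = free_blocks_capacity
        self._blocks_per_scanner = blocks_per_scanner  # hypothetical parameter
        self.allocated = 0                   # how many blocks were really allocated

    def get_free_block(self):
        self._free_blocks_capacity -= 1      # assumption: a handed-out block uses capacity
        if self._free_blocks:
            return self._free_blocks.popleft()  # reuse instead of allocating
        self.allocated += 1                  # lazy allocation on first demand
        return []                            # a list stands in for a block

    def return_free_block(self, block):
        block.clear()
        self._free_blocks.append(block)
        self._free_blocks_capacity += 1

    def schedulable_scanners(self):
        # Assumed formula: each scheduled scanner needs blocks_per_scanner blocks.
        return max(self._free_blocks_capacity, 0) // self._blocks_per_scanner

    def close(self):
        # Blocks are released on close, not in the destructor.
        self._free_blocks.clear()
```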
SSB flat test
Before:
  lineorder 1.2G:
    load time: 3s, query time: 0.355s
  lineorder 5.8G:
    load time: 330s, query time: 0.970s
    load time: 349s, query time: 0.949s
    load time: 349s, query time: 0.955s
    load time: 360s, query time: 0.889s (pipeline enabled)
After:
  lineorder 1.2G:
    load time: 3s, query time: 0.349s
  lineorder 5.8G:
    load time: 342s, query time: 0.929s
    load time: 337s, query time: 0.913s
    load time: 345s, query time: 0.946s
    load time: 346s, query time: 0.865s (pipeline enabled)
Fix the s3 resource check, which failed with:
ERROR 1105 (HY000): Unexpected exception: org.apache.doris.common.DdlException: errCode = 2, detailMessage = Missing [AWS_ACCESS_KEY] in properties.
We should use the new properties to check whether s3 is available.
If a join is the root node, then after some join reorder rules are applied, the root Group in Memo will contain a GroupExpression whose plan is a LogicalProject and whose child is its own ownerGroup.
This PR adds a rewrite rule that ensures there is a Project on top of the top Join of the plan, to avoid a cycle in Memo.
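A toy sketch of such a rewrite, assuming a simplified plan tree. The `Node` class and the function name are hypothetical and are not the Nereids API; the sketch only shows the shape of the rule: find the topmost join and wrap it in a project unless one is already directly above it.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    kind: str                       # e.g. "join", "project", "limit", "scan"
    children: list = field(default_factory=list)


def ensure_project_on_top_join(node, parent_kind=None):
    """Return the plan with a project directly above the topmost join."""
    if node.kind == "join":
        if parent_kind == "project":
            return node             # already covered by a project
        return Node("project", [node])
    if len(node.children) == 1:     # walk down through unary operators
        node.children[0] = ensure_project_on_top_join(node.children[0], node.kind)
    return node
```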
- Sink the data source parameters into the specific data source classes
- Simplify some code logic to reduce code complexity
- Provide a data source factory class to extract common logic
- Remove test code from production code. We should not include code for testing purposes in any production code.
An Iceberg table partition name may contain upper-case characters, for example: City=xxx, Nation=xxx.
But in Doris, all column names are lower case. Here we convert the partition name to lower case to keep it consistent with the column name.
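A minimal sketch of the conversion, assuming partitions arrive as a `name=value/name=value` path. The function name and the path format are assumptions for illustration. Only the name is lowered; the value is left untouched.

```python
def normalize_partition_path(path):
    """Lower-case partition names while keeping partition values as they are."""
    parts = []
    for segment in path.split("/"):
        name, sep, value = segment.partition("=")
        parts.append(name.lower() + sep + value)
    return "/".join(parts)
```

For example, `normalize_partition_path("City=Beijing/Nation=CN")` gives `"city=Beijing/nation=CN"`.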
Make the VcompoundPred optimization work well.
PR #19818 tried to enable the VcompoundPred optimization but got a wrong result on tpcds q28.
The reason is that some nullable logic in MySQL needs special handling:
mysql [regression_test_tpcds_sf1_p1]>select null and false;
+----------------+
| NULL AND FALSE |
+----------------+
|              0 |
+----------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null and true;
+---------------+
| NULL AND TRUE |
+---------------+
|          NULL |
+---------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null or false;
+---------------+
| NULL OR FALSE |
+---------------+
|          NULL |
+---------------+
1 row in set (0.00 sec)
mysql [regression_test_tpcds_sf1_p1]>select null or true;
+--------------+
| NULL OR TRUE |
+--------------+
|            1 |
+--------------+
1 row in set (0.00 sec)
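The four results above follow SQL three-valued logic. A minimal sketch, using Python's None for NULL and True/False for 1/0; the helper names `and3` and `or3` are hypothetical and this is not the BE code:

```python
def and3(a, b):
    """SQL AND: FALSE wins over NULL, NULL wins over TRUE."""
    if a is False or b is False:
        return False
    if a is None or b is None:
        return None
    return True


def or3(a, b):
    """SQL OR: TRUE wins over NULL, NULL wins over FALSE."""
    if a is True or b is True:
        return True
    if a is None or b is None:
        return None
    return False
```

The point for short-circuit evaluation is that a NULL operand does not decide the result by itself: `and3(None, False)` is False and `or3(None, True)` is True, so the other operand still has to be looked at.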