In Nereids, the validate-supported-data-types check verifies that a project node's output contains no unsupported data types such as `array` and `map`. This validation should therefore run before the `EliminateUnnecessaryProject` rule; otherwise a project may already have been eliminated when the check runs, letting unsupported types slip through.
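As a hypothetical illustration (the table, its `ARRAY` column, and the need to enable the array type in some versions are assumptions, not taken from the PR), the validation must inspect the project node of a query such as:

```sql
-- hypothetical schema; ARRAY may need to be enabled in some Doris versions
CREATE TABLE t_arr (
  id INT,
  a_col ARRAY<INT>
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES("replication_num" = "1");

-- the project node here outputs ARRAY<INT>, a type Nereids did not support;
-- validation must run while the project node still exists in the plan
SELECT a_col FROM t_arr;
```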
SQL: `select bitmap_empty() from d_table where true;`
This query should always use the base index instead of any materialized view, because its only conjunct is the constant `true` and it uses no column from any mv.
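A minimal sketch of the scenario (the schema and mv definition are assumptions; only the query comes from this PR):

```sql
-- hypothetical base table and materialized view
CREATE TABLE d_table (
  k1 INT,
  v1 INT
) DUPLICATE KEY(k1)
DISTRIBUTED BY HASH(k1) BUCKETS 1
PROPERTIES("replication_num" = "1");

CREATE MATERIALIZED VIEW mv1 AS
SELECT k1, SUM(v1) FROM d_table GROUP BY k1;

-- references no table column at all, so the base index must be chosen
SELECT bitmap_empty() FROM d_table WHERE TRUE;
```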
Currently, the session variable for split size does not take effect once the file splits have been cached.
1. This PR caches files for Hive tables instead of caching file splits, and splits each file on every query using the current split size (see the sketch after this list).
2. Use the self splitter by default.
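A hedged usage sketch, assuming the session variable is named `file_split_size` and using hypothetical catalog/table names:

```sql
-- with file-level caching, the current split size is applied on every query,
-- even after the files of hive_tbl have been cached
SET file_split_size = 134217728;  -- 128 MB
SELECT count(*) FROM hive_catalog.hive_db.hive_tbl;

SET file_split_size = 268435456;  -- 256 MB; re-splitting now honors this value
SELECT count(*) FROM hive_catalog.hive_db.hive_tbl;
```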
In the current implementation of dynamically adding and dropping inverted indexes, the inverted index information of historical data becomes stale after compaction on the base tablet.
I will submit follow-up PRs to solve this problem. For now, inverted indexes are added and dropped through the direct schema change logic.
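For reference, a hedged example of the statements this change affects (table and index names are hypothetical):

```sql
-- with this PR, adding an inverted index goes through the schema change
-- path rather than the lightweight dynamic path
ALTER TABLE t ADD INDEX idx_name (name) USING INVERTED;

-- dropping it takes the same path
ALTER TABLE t DROP INDEX idx_name;
```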
Can be reproduced by:

```sql
CREATE TABLE t (
  name varchar(128)
) ENGINE=OLAP
UNIQUE KEY(name)
DISTRIBUTED BY HASH(name) BUCKETS 1;

insert into t values('abc');

SELECT cd
FROM
  (SELECT cast(now() as string) cd FROM t) t1
JOIN
  (select cast(now() as string) td from t GROUP BY now()) t2
ON t1.cd = t2.td;
```

```
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
```
(affects TPC-H q7, q9, and q14)
1. Equation estimation confidence level
For an equation condition, if either side is almost unique (its NDV is close to the row count), its estimation confidence is high; we call it a trustable condition. If a join contains more than one un-trustable condition, we use only the one with the largest selectivity, to avoid error propagation.
2. Like expression estimation factor: 0.2
Give the `like` operator its own default shrink ratio; the default is 0.2 (see the example after this list).
3. Disable the fat-child penalty
Set `HEAVY_OPERATOR_PUNISH_FACTOR = 1`. This change affects TPC-H q15. The factor should be adaptive to the BE implementation.
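A worked example for item 2 (table and predicate are hypothetical; only the 0.2 ratio comes from this PR):

```sql
-- if the filter input is estimated at 10,000 rows, the default like shrink
-- ratio gives an estimated output of 10,000 * 0.2 = 2,000 rows
EXPLAIN SELECT * FROM lineitem WHERE l_comment LIKE '%special%';
```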
When processing data in the hash table for right join and full outer join, if the output rows of one hash bucket exceed the batch size, the logic for continuing to process that bucket is wrong: it must differentiate between the join types.
Previously, the cold_heat_separation regression suite just tried to create resources/policies. If an earlier case failed, or a BE crashed while running the cases, the resources were not cleaned up, so the next invocation of the suite would fail.
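A minimal sketch of the cleanup idea, assuming a drop-before-create pattern; every name and property value below is a placeholder, and the suite's actual logic may differ:

```sql
-- drop leftovers from a previous failed run before re-creating them
DROP STORAGE POLICY test_policy;
DROP RESOURCE "remote_s3";

CREATE RESOURCE "remote_s3" PROPERTIES (
  "type" = "s3",
  "AWS_ENDPOINT" = "s3.us-east-1.amazonaws.com",
  "AWS_REGION" = "us-east-1",
  "AWS_ROOT_PATH" = "regression/cold_heat",
  "AWS_ACCESS_KEY" = "ak",
  "AWS_SECRET_KEY" = "sk",
  "AWS_BUCKET" = "bucket"
);

CREATE STORAGE POLICY test_policy PROPERTIES (
  "storage_resource" = "remote_s3",
  "cooldown_ttl" = "1d"
);
```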
Fix two bugs:
1. Enabling file caching requires both the `FE session` variable and the `BE` config (`enable_file_cache=true`) to be enabled (see the sketch after this list).
2. `ParquetReader` previously did not use `IOContext`, but `CachedRemoteFileReader::read_at` requires an `IOContext` after PR #17586.
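A hedged sketch of turning on both switches (assuming the FE session variable shares the name `enable_file_cache`):

```sql
-- BE side, in be.conf (stated in item 1 above):
--   enable_file_cache = true
-- FE side, per session:
SET enable_file_cache = true;
```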
#18015 enables the stream load profile log; however, BE encounters RPC failures when loading TPC-H data (see #18291). This is because when `is_report_success` is true, BE reports ExecStatus to FE, but FE cannot find the QueryInfo in `coordinatorMap` and therefore returns an error to BE.
1. Support `show load warnings` for MySQL load to get the detailed error messages (see the example after this list).
2. Fix `fillByteBufferAsync` not marking the load as finished in the same data load.
3. Fix draining data only in client mode.
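A hedged example for item 1 (the file, database, table, and label are hypothetical):

```sql
-- MySQL load via LOAD DATA, then inspect the detailed errors for its label
LOAD DATA LOCAL INFILE 'data.csv'
INTO TABLE db1.t1
COLUMNS TERMINATED BY ',';

SHOW LOAD WARNINGS FROM db1 WHERE LABEL = 'label_xxx';
```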