Problem: when nereids generates a scalarType, it sets byteSize. After the optimizer is switched to the planner, the planner may reuse that scalarType in some cases.
Fix: move the byteSize assignment from the plan translator to toCatalogDataType.
BE core dumps when the whole or sub file cache is used.
CachedRemoteFileReader/WholeFileCache/SubFileCache::read_at_impl() was called without an IOContext when reading the segment footer.
1. The time string in the profile can have the form "xx s xx ms". The framework should extract the time with the re package to support more complicated time strings.
2. Add stats for sortNode and AggNode in `withChildren`.
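The time extraction in item 1 could look roughly like the sketch below. It is only an illustration: the function name `parse_profile_time_ms` and the set of units are assumptions, not the framework's actual code, and the real profile may use units not listed here.

```python
import re

# Assumed unit table (hypothetical); values are milliseconds per unit.
_UNIT_TO_MS = {
    "h": 3_600_000.0,
    "m": 60_000.0,
    "s": 1_000.0,
    "ms": 1.0,
    "us": 0.001,
    "ns": 0.000001,
}

# Two-letter units are listed first so "ms" is not read as "m" followed by "s".
_TOKEN = re.compile(r"(\d+(?:\.\d+)?)\s*(ms|us|ns|h|m|s)")


def parse_profile_time_ms(text):
    """Sum every "<number> <unit>" pair found in text, in milliseconds."""
    pairs = _TOKEN.findall(text)
    if not pairs:
        raise ValueError(f"no time value found in {text!r}")
    return sum(float(value) * _UNIT_TO_MS[unit] for value, unit in pairs)


if __name__ == "__main__":
    print(parse_profile_time_ms("12 s 345 ms"))
    print(parse_profile_time_ms("250ms"))
```

Summing all matched pairs, rather than matching one fixed pattern, lets the same function handle both single-unit and compound strings.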
The supported-data-type validation checks whether a project node's output contains any data type that is unsupported in nereids, such as array or map. This validation should therefore run before the EliminateUnnecessaryProject rule.
sql: select bitmap_empty() from d_table where true;
This query should always use the base index instead of any mv, because the conjunct is constant (true) and uses none of the columns from any mv.
Currently, the session variable for split size does not take effect once the file splits are cached.
1. This PR caches the files of a Hive table instead of the file splits, and splits each file every time using the current split size.
2. Use the self splitter by default.
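The idea in item 1 can be sketched as follows. This is a simplified illustration, not the PR's implementation: `FileListingCache`, `split_file`, and `plan_splits` are hypothetical names, and a real splitter would also account for things like block boundaries and non-splittable formats.

```python
class FileListingCache:
    """Caches file listings (path, length) per table; never caches splits."""

    def __init__(self, list_files):
        self._list_files = list_files  # callable: table name -> [(path, length)]
        self._files = {}

    def files(self, table):
        if table not in self._files:
            self._files[table] = self._list_files(table)
        return self._files[table]


def split_file(path, length, split_size):
    """Cut one file into (path, offset, size) ranges of at most split_size bytes."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    splits = []
    offset = 0
    while offset < length:
        size = min(split_size, length - offset)
        splits.append((path, offset, size))
        offset += size
    return splits


def plan_splits(cache, table, split_size):
    """Recompute splits on every call, so the current split size always applies."""
    splits = []
    for path, length in cache.files(table):
        splits.extend(split_file(path, length, split_size))
    return splits


if __name__ == "__main__":
    cache = FileListingCache(lambda table: [("/warehouse/t/part-0", 10)])
    print(plan_splits(cache, "t", 4))
    print(plan_splits(cache, "t", 8))
```

Because only the file listing is cached, changing the split size between two calls changes the resulting splits without invalidating the cache.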
In the current implementation of dynamically adding and dropping inverted indexes, the inverted index information of historical data becomes out of date after compaction on the base tablet.
I will submit follow-up PRs to solve this problem. For now, add or drop inverted indexes through the direct schema change logic.
Can be reproduced by:
CREATE TABLE t (
name varchar(128)
) ENGINE=OLAP
UNIQUE KEY(name)
DISTRIBUTED BY HASH(name) BUCKETS 1;
insert into t values('abc');
SELECT cd
FROM
(SELECT cast(now() as string) cd FROM t) t1
JOIN
(select cast(now() as string) td from t GROUP BY now()) t2
ON t1.cd = t2.td;
ERROR 1105 (HY000): errCode = 2, detailMessage = Unexpected exception: null
(affects tpch q14/7/9)
1. Equation estimation confidence level
For an equation (equality condition), if either side is almost unique, its estimation confidence is high; we call it a trustable condition.
If a join contains more than one un-trustable condition, we use only the one with the biggest selectivity, in order to avoid error propagation.
2. Like expression estimation factor: 0.2
Give the like operator a separate default shrink ratio; the default ratio is 0.2.
3. Disable fat-child-penalty
Set HEAVY_OPERATOR_PUNISH_FACTOR=1.
This change affects tpch q15. This factor should be adaptive to the implementation of BE.
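The rule in item 1 can be sketched as below. This is an illustration under stated assumptions, not the optimizer's code: the 0.9 threshold for "almost unique" and the choice to multiply the selectivities of all trustable conditions are both assumptions, since the text only states how un-trustable conditions are handled.

```python
# Hypothetical threshold: a side counts as "almost unique" when its
# distinct-value count is at least this fraction of its row count.
ALMOST_UNIQUE_RATIO = 0.9


def is_trustable(left_ndv, left_rows, right_ndv, right_rows):
    """An equality condition is trustable if either side is almost unique."""
    def almost_unique(ndv, rows):
        return rows > 0 and ndv / rows >= ALMOST_UNIQUE_RATIO
    return almost_unique(left_ndv, left_rows) or almost_unique(right_ndv, right_rows)


def join_selectivity(conditions):
    """Combine (selectivity, trustable) pairs into one join selectivity.

    Assumption: trustable selectivities are multiplied together. Of the
    un-trustable conditions, only the one with the biggest selectivity is
    applied, so their estimation errors do not compound.
    """
    result = 1.0
    untrusted = []
    for selectivity, trustable in conditions:
        if trustable:
            result *= selectivity
        else:
            untrusted.append(selectivity)
    if untrusted:
        result *= max(untrusted)
    return result


if __name__ == "__main__":
    conditions = [(0.5, True), (0.1, False), (0.4, False)]
    print(join_selectivity(conditions))
```

In the example, the un-trustable selectivity 0.1 is ignored and only 0.4 is applied alongside the trustable 0.5.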
When processing hash table data for right join and full outer join, if the output rows of one hash bucket exceed the batch size, the logic for continuing to process that bucket is wrong; it should differentiate between the join types.