Fixed a problem where Parquet files compressed with LZ4 could not be read. By default, LZ4 data is now decompressed as the Hadoop LZ4 format; if that fails, decompression falls back to the standard LZ4 format.
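The fallback order described above can be sketched as follows. This is a minimal Python sketch, not the Doris implementation. It assumes Hadoop-framed LZ4 data is a sequence of chunks, each prefixed by a big-endian uint32 decompressed size and a big-endian uint32 compressed size (the layout Apache Arrow's Parquet reader also expects), and it takes the raw LZ4 block decompressor as a parameter. The names `decompress_hadoop_lz4`, `decompress_lz4_with_fallback`, and the identity stand-in `fake_raw_decompress` are hypothetical.

```python
import struct
from typing import Callable

# (compressed bytes, expected decompressed size) -> decompressed bytes.
# Assumption: the decompressor raises ValueError on malformed input.
RawDecompress = Callable[[bytes, int], bytes]


def decompress_hadoop_lz4(data: bytes, raw_decompress: RawDecompress) -> bytes:
    """Decode chunks laid out as: BE uint32 decompressed size, BE uint32
    compressed size, then the compressed bytes; repeated to end of input."""
    out = bytearray()
    pos = 0
    while pos < len(data):
        if len(data) - pos < 8:
            raise ValueError("truncated Hadoop LZ4 chunk header")
        decompressed_size, compressed_size = struct.unpack_from(">II", data, pos)
        pos += 8
        if compressed_size > len(data) - pos:
            raise ValueError("chunk extends past end of input")
        chunk = raw_decompress(data[pos:pos + compressed_size], decompressed_size)
        if len(chunk) != decompressed_size:
            raise ValueError("chunk decompressed to an unexpected size")
        out += chunk
        pos += compressed_size
    return bytes(out)


def decompress_lz4_with_fallback(data: bytes, expected_size: int,
                                 raw_decompress: RawDecompress) -> bytes:
    """Try Hadoop framing first; if it fails, decode the whole page as plain LZ4."""
    try:
        result = decompress_hadoop_lz4(data, raw_decompress)
        if len(result) == expected_size:
            return result
    except ValueError:
        pass  # not Hadoop-framed (or corrupt): fall through to the plain format
    return raw_decompress(data, expected_size)


def fake_raw_decompress(block: bytes, expected_size: int) -> bytes:
    """Identity stand-in for a real LZ4 block decompressor, so the sketch runs
    without an LZ4 library: 'compressed' bytes are stored uncompressed."""
    if len(block) != expected_size:
        raise ValueError("size mismatch")
    return block


framed = struct.pack(">II", 5, 5) + b"hello"
print(decompress_lz4_with_fallback(framed, 5, fake_raw_decompress))    # b'hello'
print(decompress_lz4_with_fallback(b"hello", 5, fake_raw_decompress))  # b'hello'
```

Validating every chunk header and the total output size is what lets the sketch recognize input that is not Hadoop-framed and fall back, instead of returning corrupted bytes.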
Improve performance on the TPC-H dataset by refactoring the join-related code and the way hash tables are used.
Co-authored-by: HappenLee <happenlee@hotmail.com>
Co-authored-by: BiteTheDDDDt <pxl290@qq.com>
Support WHERE, GROUP BY, HAVING, and ORDER BY clauses in query statements that have no FROM clause.
For example:
SELECT 1 AS a, COUNT(*), SUM(2), AVG(1), RANK() OVER() AS w_rank
WHERE 1 = 1
GROUP BY a, w_rank
HAVING COUNT(*) IN (1, 2) AND w_rank = 1
ORDER BY a;
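This returns:
+------+----------+--------+--------+--------+
| a    | count(*) | sum(2) | avg(1) | w_rank |
+------+----------+--------+--------+--------+
|    1 |        1 |      2 |    1.0 |      1 |
+------+----------+--------+--------+--------+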
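Why the query above returns a single row with these values can be modeled by treating the FROM-less SELECT as a scan of exactly one implicit row. The Python below is a conceptual sketch under that assumption, not how Doris plans the query; `ONE_ROW` and `run_example` are hypothetical names.

```python
from statistics import mean

# Assumption: a SELECT without FROM reads exactly one implicit, column-less row.
ONE_ROW = [()]


def run_example():
    rows = [r for r in ONE_ROW if 1 == 1]             # WHERE 1 = 1 keeps the row
    # GROUP BY a, w_rank: with a single input row there is a single group.
    count_star = len(rows)                            # COUNT(*) -> 1
    sum_2 = sum(2 for _ in rows)                      # SUM(2)   -> 2
    avg_1 = float(mean(1 for _ in rows))              # AVG(1)   -> 1.0
    w_rank = 1                                        # RANK() OVER (): one partition, all rows are peers
    result = [(1, count_star, sum_2, avg_1, w_rank)]  # a = 1
    # HAVING COUNT(*) IN (1, 2) AND w_rank = 1
    result = [r for r in result if r[1] in (1, 2) and r[4] == 1]
    return sorted(result, key=lambda r: r[0])         # ORDER BY a


print(run_example())  # [(1, 1, 2, 1.0, 1)]
```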
Another example:
select 1 c1, 2 union (select "hell0", "") order by c1
The data type of the second column will be VARCHAR(65533), where 65533 is the default VARCHAR length.
This returns:
+-------+------+
| c1    | 2    |
+-------+------+
| 1     | 2    |
| hell0 |      |
+-------+------+
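The type resolution for this UNION can be sketched with an assumed rule: when one branch produces a string and the other a number, the column becomes VARCHAR with the default length 65533. UNION then removes duplicate rows, and ORDER BY c1 compares strings, so "1" sorts before "hell0". The literal types (TINYINT, VARCHAR), the integer widening order, and the helper names `union_column_type` and `union_rows` are illustrative assumptions, not Doris internals.

```python
DEFAULT_VARCHAR_LENGTH = 65533  # default VARCHAR length stated above

# Assumed widening order for integer literal types (illustrative only).
INTEGER_ORDER = ["TINYINT", "SMALLINT", "INT", "BIGINT"]


def union_column_type(left: str, right: str) -> str:
    """Assumed rule: string + number -> default-length VARCHAR; integers widen."""
    if left == right:
        return left
    if left.startswith("VARCHAR") or right.startswith("VARCHAR"):
        return f"VARCHAR({DEFAULT_VARCHAR_LENGTH})"
    return max(left, right, key=INTEGER_ORDER.index)


def union_rows(left_row, left_types, right_row, right_types):
    """UNION (distinct) of two single-row branches, ordered by the first column."""
    types = [union_column_type(lt, rt) for lt, rt in zip(left_types, right_types)]

    def cast(value, target):
        # Values in a VARCHAR column are compared and returned as strings.
        return str(value) if target.startswith("VARCHAR") else value

    distinct = {tuple(cast(v, t) for v, t in zip(row, types))
                for row in (left_row, right_row)}
    return types, sorted(distinct, key=lambda row: row[0])  # ORDER BY c1


# select 1 c1, 2 union (select "hell0", "") order by c1
types, rows = union_rows((1, 2), ("TINYINT", "TINYINT"),
                         ("hell0", ""), ("VARCHAR", "VARCHAR"))
print(types)  # ['VARCHAR(65533)', 'VARCHAR(65533)']
print(rows)   # [('1', '2'), ('hell0', '')]
```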
All test cases pass with datetime v2/date v2.
Cases covered:
- Calculation (+, -)
- Predicates (<, >, =, <>, in, not in, is null, is not null)
- Load tests (from CSV and select into)
- Runtime filters
- Delete conditions
- Key columns (aggregate/duplicate/unique key models, distribution/partition columns, bitmap indexes, ...)
Main classes (MTMV: multi-table materialized view):
- MTMVService: the MTMV service entry point that other modules call
- MTMVHookService: handles all operations that affect an MTMV
- MTMVJobManager: handles all operations that affect MTMV jobs
- MTMVCacheManager: handles all operations that affect the MTMV cache
- MTMVTask & MTMVJob: inherit from the job framework
All test cases pass with decimalv3.
Cases covered:
- Calculation (+, -, *, /)
- Predicates (<, >, =, <>, in, not in, is null, is not null)
- Load tests (from CSV and select into)
- Runtime filters
- Delete conditions
- Key columns (aggregate/duplicate/unique key models, distribution/partition columns, bitmap indexes, ...)
1. Optimize runtime filter (RF) pruning when column statistics are not available.
2. Add a regression case that checks the plan and runtime filters for tpcds_sf100 with statistics.
3. Add a regression case that checks the plan and runtime filters for tpcds_sf100 without statistics.