Commit Graph

17971 Commits

Author SHA1 Message Date
ff990eb869 [enhancement](Nereids) refactor expression rewriter to pattern match (#32617)
This PR improves the performance of the Nereids planner in the planning stage.

1. Refactor the expression rewriter to use pattern matching, so that the many expression rewrite rules can be applied interleaved within one large bottom-up iteration, rewriting until the expression becomes stable. More cases can now be handled, because previously there was no loop, and sometimes only the top-level expression was processed (e.g. `SimplifyArithmeticRule`).
2. Replace `Collection.stream()` with `ImmutableXxx.Builder` to avoid unnecessary method calls.
3. Unroll loops in some code, such as `Expression.<init>` and `PlanTreeRewriteBottomUpJob.pushChildrenJobs`.
4. Use type- and arity-specialized code, such as `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()` and `PartitionRangeExpander.enumerableCount()`.
5. Refactor `ExtractCommonFactorRule` so it can extract common factors in more cases. I also fixed an infinite loop that occurred when `ExtractCommonFactorRule` and `SimplifyRange` ran in the same iteration: `SimplifyRange` generates a right-deep tree, but `ExtractCommonFactorRule` generates a left-deep tree.
6. Refactor `FoldConstantRuleOnFE` to support both visitor and pattern-match modes: in `ExpressionNormalization`, pattern matching lets it be applied interleaved with other rules; in `PartitionPruner`, the visitor evaluates expressions faster.
7. Lazily compute and cache the results of some operations.
8. Use an int field to compare dates.
9. Use a `BitSet` to look up `disableNereidsRules`.
10. When binding slots in `Scope`, a two-level loop is usually faster than building a `Multimap`, so I reverted that code.
11. `PlanTreeRewriteBottomUpJob` no longer needs to call `clearStatePhase`.
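The interleaved, run-until-stable rewriting described in item 1 can be sketched as follows. This is a minimal standalone illustration (Java 16 or later, for records and `instanceof` patterns), not the Nereids code: the `Expr`, `Lit`, `Var` and `Add` records and the two rules are hypothetical stand-ins. Each pass rewrites children first, then applies every rule at the current node until none fires; the outer loop repeats passes until the whole tree stops changing.

```java
import java.util.List;
import java.util.function.UnaryOperator;

// Hypothetical mini expression tree; not the Nereids Expression hierarchy.
public final class FixedPointRewriter {
    public interface Expr {}
    public record Lit(long value) implements Expr {}
    public record Var(String name) implements Expr {}
    public record Add(Expr left, Expr right) implements Expr {}

    // Rule: fold `lit + lit` into a single literal.
    public static final UnaryOperator<Expr> FOLD_ADD = e ->
            e instanceof Add a && a.left() instanceof Lit l && a.right() instanceof Lit r
                    ? new Lit(l.value() + r.value())
                    : e;

    // Rule: drop an addition of literal zero on either side.
    public static final UnaryOperator<Expr> REMOVE_ADD_ZERO = e -> {
        if (e instanceof Add a) {
            if (a.right() instanceof Lit r && r.value() == 0) return a.left();
            if (a.left() instanceof Lit l && l.value() == 0) return a.right();
        }
        return e;
    };

    public static final List<UnaryOperator<Expr>> RULES = List.of(FOLD_ADD, REMOVE_ADD_ZERO);

    // One bottom-up pass: rewrite children, then apply all rules at this node,
    // interleaved, until no rule changes it.
    public static Expr rewriteBottomUp(Expr e, List<UnaryOperator<Expr>> rules) {
        if (e instanceof Add add) {
            Expr left = rewriteBottomUp(add.left(), rules);
            Expr right = rewriteBottomUp(add.right(), rules);
            if (left != add.left() || right != add.right()) {
                e = new Add(left, right);
            }
        }
        boolean changed = true;
        while (changed) {
            changed = false;
            for (UnaryOperator<Expr> rule : rules) {
                Expr next = rule.apply(e);
                if (!next.equals(e)) {
                    e = next;
                    changed = true;
                }
            }
        }
        return e;
    }

    // Repeat whole passes until the expression is stable (records compare structurally).
    public static Expr rewriteToFixedPoint(Expr e, List<UnaryOperator<Expr>> rules) {
        while (true) {
            Expr next = rewriteBottomUp(e, rules);
            if (next.equals(e)) return next;
            e = next;
        }
    }

    public static void main(String[] args) {
        // (x + 0) + (1 + 2)
        Expr input = new Add(new Add(new Var("x"), new Lit(0)), new Add(new Lit(1), new Lit(2)));
        System.out.println(rewriteToFixedPoint(input, RULES));
    }
}
```

Running the sample reduces `(x + 0) + (1 + 2)` to `x + 3`. Termination depends on the rules converging: if two rules keep undoing each other, as the right-deep versus left-deep conflict in item 5 did, the inner loop never exits.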

### Test case
100 threads continuously send the following SQL in parallel; it queries an empty table. Tested on my Mac (M2 chip, 8 cores) with SQL cache enabled:
```sql
select count(1), date_format(time_col, '%Y%m%d'), varchar_col1
from tbl
where partition_date > '2024-02-15'
  and (varchar_col2 = '73130' or varchar_col3 = '73130')
  and time_col > '2024-03-04'
  and time_col < '2024-03-05'
group by date_format(time_col, '%Y%m%d'), varchar_col1
order by date_format(time_col, '%Y%m%d') desc, varchar_col1 desc, count(1) asc
limit 1000
```

Before this PR: 3100 peak QPS, about 2700 average QPS
After this PR: 4800 peak QPS, about 4400 average QPS

(cherry picked from commit 7338683fdbdf77711f2ce61e580c19f4ea100723)
2024-04-10 14:59:45 +08:00
6c5dd820c0 [improvement](spill) improve spill timers (#33156) 2024-04-10 14:55:11 +08:00
7f2fdf78ac [Enhancement](inverted index) set need to read data only when delete predicate contains the column (#33172) 2024-04-10 14:53:56 +08:00
c61d6ad1e2 [Feature] support function uuid_to_int and int_to_uuid #33005 2024-04-10 14:53:56 +08:00
bf022f9d8d [enhancement](function truncate) truncate can use column as scale argument (#32746)
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
2024-04-10 14:53:56 +08:00
a69f3eb870 [fix](fe) partitionInfo is null, fe can not start (#33108) 2024-04-10 14:53:56 +08:00
8b1d174b13 [Optimize] Move strings_pool from individual tree nodes to the tree itself (#33089)
Previously, each tree node allocated its own strings_pool. Because the Arena aligns allocated chunks to at least 4K, this allocation was far too large for a single tree node, so a SubcolumnTree with many nodes wasted a significant amount of memory. Moving strings_pool from the individual nodes to the tree itself reduces this waste.
2024-04-10 14:53:56 +08:00
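The memory waste described in the strings_pool commit can be estimated with a simplified model. The sketch below assumes every pool reserves whole 4 KiB chunks (at least one) and ignores how the real Arena grows its chunks, so the class, methods and constant are hypothetical and the numbers are illustrative only.

```java
// Hypothetical model of per-node vs. shared string pools; not the Doris Arena.
public final class StringsPoolOverhead {
    // Assumed minimum chunk size, per the "at least 4K" alignment in the commit message.
    public static final long MIN_CHUNK_BYTES = 4096;

    // Round a byte count up to whole chunks; any pool reserves at least one chunk.
    public static long roundUpToChunk(long bytes) {
        long chunks = Math.max(1, (bytes + MIN_CHUNK_BYTES - 1) / MIN_CHUNK_BYTES);
        return chunks * MIN_CHUNK_BYTES;
    }

    // Old layout: every node owns a pool, so each one pays for its own chunk(s).
    public static long perNodePools(long nodes, long bytesPerNode) {
        return nodes * roundUpToChunk(bytesPerNode);
    }

    // New layout: one pool owned by the tree holds all nodes' strings.
    public static long sharedPool(long nodes, long bytesPerNode) {
        return roundUpToChunk(nodes * bytesPerNode);
    }

    public static void main(String[] args) {
        System.out.println(perNodePools(1000, 32));
        System.out.println(sharedPool(1000, 32));
    }
}
```

Under this model, 1,000 nodes holding 32 bytes of strings each reserve 4,096,000 bytes with per-node pools, but only 32,768 bytes with a single shared pool.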
02b24abed2 [Fix](Nereids) ntile function should check argument (#32994)
Problem:
When ntile was called with 0 as its argument, the BE would core dump because the argument was not checked.
Solution:
Check the argument during FE analysis.
2024-04-10 14:53:56 +08:00
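A minimal sketch of the ntile fix, assuming the usual NTILE semantics (ordered rows are split into `n` buckets whose sizes differ by at most one, larger buckets first). The class and method names are hypothetical, not the Nereids analyzer's. The commit only says the BE cored on 0; dividing by the bucket count, as below, is one illustrative way a zero argument can fail, which is why the check runs before any division.

```java
import java.util.Arrays;

// Hypothetical sketch; not the Doris FE analyzer or BE window function code.
public final class NtileSketch {
    // Analysis-time check: reject a non-positive bucket count up front.
    public static void checkNtileArgument(long buckets) {
        if (buckets <= 0) {
            throw new IllegalArgumentException(
                    "ntile's parameter must be a positive integer, but got " + buckets);
        }
    }

    // Assign 1-based bucket numbers to totalRows ordered rows.
    // The first (totalRows % buckets) buckets each get one extra row.
    public static int[] ntile(int totalRows, long buckets) {
        checkNtileArgument(buckets);             // must happen before dividing by buckets
        long base = totalRows / buckets;         // rows in a "small" bucket
        long remainder = totalRows % buckets;    // number of "large" buckets
        long largeRows = remainder * (base + 1); // rows covered by the large buckets
        int[] result = new int[totalRows];
        for (int i = 0; i < totalRows; i++) {
            long bucket = i < largeRows
                    ? i / (base + 1) + 1
                    : remainder + (i - largeRows) / base + 1;
            result[i] = (int) bucket;
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(ntile(10, 3)));
    }
}
```

With this sketch, `ntile(10, 3)` assigns buckets `[1, 1, 1, 1, 2, 2, 2, 3, 3, 3]`, while `ntile(5, 0)` is rejected with an `IllegalArgumentException` instead of reaching the division.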
a7c8abe58c [feature](nereids) support common sub expression by multi-layer projections (fe part) (#33087)
* cse fe part
2024-04-10 14:53:56 +08:00
1b3a11a02b [Enhancement](merge-on-write) Support dynamic delete bitmap cache (#32991)
* The default delete bitmap cache size is 100MB, which can be insufficient and cause performance issues when the amount of user data is large. To avoid an undersized cache, the delete bitmap cache size is now the larger of 5% of total memory and 100MB.
2024-04-10 14:53:56 +08:00
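The new delete bitmap cache sizing rule is a simple maximum. The sketch below assumes "100MB" means 100 MiB and that "total memory" is a byte count supplied by the caller; which memory figure the BE actually uses is not stated in the commit, and the class name is hypothetical.

```java
// Hypothetical sketch of the sizing rule; not the Doris BE code.
public final class DeleteBitmapCacheSize {
    // Floor from the commit message: 100MB (assumed to mean MiB).
    public static final long MIN_CACHE_BYTES = 100L * 1024 * 1024;

    // Larger of 5% of total memory and the 100MB floor.
    // Integer division by 20 computes 5% without floating-point rounding.
    public static long cacheBytes(long totalMemoryBytes) {
        return Math.max(MIN_CACHE_BYTES, totalMemoryBytes / 20);
    }

    public static void main(String[] args) {
        System.out.println(cacheBytes(1L << 30));  // 1 GiB of memory
        System.out.println(cacheBytes(64L << 30)); // 64 GiB of memory
    }
}
```

With 1 GiB of memory, 5% is only about 51 MiB, so the 100 MiB floor applies; with 64 GiB, the cache grows to 3,435,973,836 bytes (about 3.2 GiB).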
3b42dc73af [improvement](spill) avoid spill if memory is enough (#33075) 2024-04-10 14:53:27 +08:00
517c12478f [improvement](spill) spill trigger improvement (#32641) 2024-04-10 14:52:46 +08:00
b0b5f84e40 [feature](load) support compressed JSON format data for broker load (#30809) 2024-04-10 14:20:53 +08:00
1a2177adb9 [Fix](test) add sync to ensure data synchronization in test_set_operater (#32993) 2024-04-10 12:01:02 +08:00
Pxl
09db427eed [Feature](materialized-view) support ignore not slot is null when count(slot) not has key in mv (#32912)
2024-04-10 11:59:36 +08:00
1d0908e80d [feature](profile) make WaitForLocalExchangeBuffer timer merge (#32946)
2024-04-10 11:57:57 +08:00
Pxl
e4993a19e5 [Chore](column) remove ColumnVectorHelper (#33036)
2024-04-10 11:56:41 +08:00
8e19cdd745 [feature](expr) support common subexpression elimination be part (#32673) 2024-04-10 11:56:21 +08:00
61e214c327 [Fix](Hive-Metastore) fix that if JDBC reads the NULL value, it will cause NPE (#32831) 2024-04-10 11:55:17 +08:00
5116724494 [Fix](hive-writer) Fix the issue of block was not copied to do filtering when hive partition writer write block to file. (#32775) (#33447)
backport #32775
2024-04-10 11:42:23 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
caea45586f fix compile 2024-04-10 11:42:22 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
39fba884fb [fix](typo) typo fix for 'delete bimap' changing to 'delete bitmap' (#32341) 2024-04-10 11:34:30 +08:00
3243053fcd [fix](memory) Fix MemTableWriter flush_async attach task in thread context (#33071) 2024-04-10 11:34:30 +08:00
285e2fcb5a [fix] (vectorization) regexp all_pass string (#32515) 2024-04-10 11:34:30 +08:00
fb910e5304 [fix](planner) retain groupingSlotIds as materialized for aggregate (#33060) 2024-04-10 11:34:30 +08:00
c5a3af5c27 [partitionsort](fix) Fix DCHECK failure (#33035) 2024-04-10 11:34:30 +08:00
c5ab7ca573 [fix](planner) remove and retain input slot for aggregate slot which is not materialized (#33033)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2024-04-10 11:34:30 +08:00
Pxl
5b162a80f2 [Improvement](materialized-view) The materialized view cannot involve an auto-increment column (#32885)
2024-04-10 11:34:30 +08:00
35fa9f98e9 1 Add running query num/waiting query num (#33024)
1 Add running query num/waiting query num
2 Fix auth regression test
2024-04-10 11:34:30 +08:00
d7c1c7dcd4 [fix](mtmv)partition limit #32978 2024-04-10 11:34:30 +08:00
59aa923bce [bug](function) fix milliseconds_diff function return wrong result (#32897)
2024-04-10 11:34:30 +08:00
3b7d75fb4b [fix](inverted index) Clear the index cache corresponding to the table after deleting the table. (#32921) 2024-04-10 11:34:30 +08:00
193600ad9d [Performance](sink) opt mysql result writer (#31816) 2024-04-10 11:34:30 +08:00
fdb9500023 [fix](nereids) null-safe-eq runtime filter denies outer join #32927 2024-04-10 11:34:30 +08:00
1f1932c6b7 [enhancement](nereids)add some date functions for constant fold (#32772) 2024-04-10 11:34:30 +08:00
814e4ed3ec [fix](nereids)partition prune should consider <=> operator (#32965) 2024-04-10 11:34:30 +08:00
2ee6f28cec [fix](nereids)column name should be case insensitive when selecting mv (#33002) 2024-04-10 11:34:30 +08:00
a7be070021 [chore](session_variable) change parallel_scan_min_rows_per_scanner' default value to 16384 (#32939) 2024-04-10 11:34:30 +08:00
7b26feb6de [fix](invert index) Fix the issue of high memory usage. (#31739) 2024-04-10 11:34:30 +08:00
53309e32a9 [Improvement](execution) Use single phase execution commit if only 1 BE is used (#32937) 2024-04-10 11:34:30 +08:00
e214eb1ea7 [chore](ci) fix ci check (#32992)
Co-authored-by: stephen <hello-stephen@qq.com>
2024-04-10 11:34:30 +08:00
3ee14a80ab [chore](ci) adjust ckb expect result (#32856)
Co-authored-by: stephen <hello-stephen@qq.com>
2024-04-10 11:34:30 +08:00
7e802c9127 [fix](variant)group name optimization (#32598)
1. Change the group name stars to repo_name
2024-04-10 11:34:30 +08:00
a6fc2ae176 [fix](test) replace 'null' to null for date/datetime column (#32972) 2024-04-10 11:34:30 +08:00
528a889077 [Fix](hive-writer) Fix correct num when hive writing data to an unpartitioned table if size larger than hive_sink_max_file_size. (#32959) 2024-04-10 11:34:29 +08:00
97a2977f2a [improvement](executor)Add tag property for workload group #32874 2024-04-10 11:34:29 +08:00
f1ee7f5767 [fix](merge-iterator) fix NOT_IMPLEMENTED_ERROR when read next block view (#32961) 2024-04-10 11:34:29 +08:00
dcddd88e01 Limit the max string length to 1024 while collecting column stats to control BE memory usage. (#32470) 2024-04-10 11:34:29 +08:00