This PR improves the performance of the Nereids planner in the plan stage.
1. Refactor the expression rewriter to pattern matching, so that many expression rewrite rules can be applied interleaved in one big bottom-up iteration, rewriting until the expression becomes stable. We can now handle more cases, because previously there was no loop and some rules, like `SimplifyArithmeticRule`, only processed the top-level expression (see the rewrite-loop sketch after this list).
2. Replace `Collection.stream()` with `ImmutableXxx.Builder` to avoid useless method calls (see the sketch after this list).
3. Unroll loops in some code, like `Expression.<init>` and `PlanTreeRewriteBottomUpJob.pushChildrenJobs` (see the sketch after this list).
4. Use type/arity-specialized code, like `OneRangePartitionEvaluator.toNereidsLiterals()`, `PartitionRangeExpander.tryExpandRange()`, and `PartitionRangeExpander.enumerableCount()`.
5. Refactor `ExtractCommonFactorRule` so it can extract more cases, and fix the dead loop that occurred when `ExtractCommonFactorRule` and `SimplifyRange` ran in the same iteration: `SimplifyRange` generates a right-deep tree while `ExtractCommonFactorRule` generates a left-deep tree, so each rule kept undoing the other's shape.
6. Refactor `FoldConstantRuleOnFE` to support both visitor and pattern-match modes: in `ExpressionNormalization`, the pattern-match mode can be applied interleaved with other rules; in `PartitionPruner`, the visitor mode can evaluate expressions faster.
7. Lazily compute and cache some operations (see the memoization sketch after this list).
8. Use an int field to compare dates (see the sketch after this list).
9. Use a `BitSet` to look up `disableNereidsRules` (see the sketch after this list).
10. A two-level loop is usually faster than building a `Multimap` when binding slots in `Scope`, so I reverted that code (see the sketch after this list).
11. `PlanTreeRewriteBottomUpJob` no longer needs `clearStatePhase`.
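A minimal sketch of item 1's rewrite loop, assuming a simplified `Expr` interface and rules expressed as `UnaryOperator<Expr>` (all names here are illustrative, not Doris's actual classes):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.UnaryOperator;

final class RewriteLoopSketch {
    interface Expr {
        List<Expr> children();
        Expr withChildren(List<Expr> newChildren);
    }

    // Repeat whole bottom-up passes until no rule changes anything.
    static Expr rewriteUntilStable(Expr expr, List<UnaryOperator<Expr>> rules) {
        Expr current = expr;
        while (true) {
            Expr rewritten = rewriteBottomUp(current, rules);
            if (rewritten.equals(current)) {
                return rewritten; // fixpoint: the expression is stable
            }
            current = rewritten;
        }
    }

    // One bottom-up pass: rewrite children first, then let every rule try
    // the current node, so different rules can build on each other's output.
    static Expr rewriteBottomUp(Expr expr, List<UnaryOperator<Expr>> rules) {
        List<Expr> newChildren = new ArrayList<>(expr.children().size());
        for (Expr child : expr.children()) {
            newChildren.add(rewriteBottomUp(child, rules));
        }
        Expr current = expr.withChildren(newChildren);
        for (UnaryOperator<Expr> rule : rules) {
            current = rule.apply(current);
        }
        return current;
    }
}
```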
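Item 2 in code: the stream pipeline allocates spliterator, lambda, and collector machinery on every call, while a pre-sized builder is a plain loop. A before/after sketch (the method names are illustrative):

```java
import com.google.common.collect.ImmutableList;
import java.util.List;

final class BuilderVsStreamSketch {
    // Before: convenient, but each call pays for the stream machinery.
    static ImmutableList<String> withStream(List<Integer> ids) {
        return ids.stream()
                .map(String::valueOf)
                .collect(ImmutableList.toImmutableList());
    }

    // After: same result with a pre-sized builder and a plain loop.
    static ImmutableList<String> withBuilder(List<Integer> ids) {
        ImmutableList.Builder<String> builder
                = ImmutableList.builderWithExpectedSize(ids.size());
        for (Integer id : ids) {
            builder.add(String.valueOf(id));
        }
        return builder.build();
    }
}
```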
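Item 3 illustrated with a hypothetical job-push method: since most plan nodes have zero, one, or two children, branching on the arity avoids iterator allocation and generic loop overhead. This is a sketch, not the actual `PlanTreeRewriteBottomUpJob` code:

```java
import java.util.List;

final class UnrollSketch {
    interface Plan { List<Plan> children(); }

    void pushChildrenJobs(Plan plan) {
        List<Plan> children = plan.children();
        switch (children.size()) {
            case 0:               // leaf: nothing to push
                return;
            case 1:               // most common case, no iterator needed
                pushJob(children.get(0));
                return;
            case 2:               // e.g. joins
                pushJob(children.get(1));
                pushJob(children.get(0));
                return;
            default:              // rare wide nodes fall back to a loop
                for (int i = children.size() - 1; i >= 0; i--) {
                    pushJob(children.get(i));
                }
        }
    }

    void pushJob(Plan child) { /* enqueue a rewrite job for the child */ }
}
```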
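One common shape for item 7, using Guava's memoizing supplier; the class and field names are illustrative:

```java
import com.google.common.base.Supplier;
import com.google.common.base.Suppliers;

// Hypothetical example of item 7's pattern: compute an expensive value
// at most once, on first access, then serve the cached result.
final class LazyHashCodeSketch {
    private final Supplier<Integer> cachedHashCode
            = Suppliers.memoize(this::computeHashCode);

    @Override
    public int hashCode() {
        return cachedHashCode.get(); // computed once, cached afterwards
    }

    private int computeHashCode() {
        return 42; // stand-in for an expensive tree traversal
    }
}
```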
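Item 8 in miniature: packing a date into one int makes a date comparison a single integer compare instead of per-field or object comparisons. The packing scheme below is an assumption for illustration, not Doris's actual encoding:

```java
final class IntDateSketch {
    // pack (year, month, day) into one int: 2024-03-05 -> 20240305
    static int pack(int year, int month, int day) {
        return year * 10_000 + month * 100 + day;
    }

    static boolean isBefore(int leftPacked, int rightPacked) {
        return leftPacked < rightPacked; // one compare, no object access
    }
}
// usage: isBefore(pack(2024, 3, 4), pack(2024, 3, 5)) == true
```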
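Item 9 sketched: with dense integer rule ids, a `BitSet` membership test is a word load plus a mask, cheaper than hashing rule names in a `Set<String>`. The rule-id mapping below is a hypothetical stand-in:

```java
import java.util.BitSet;
import java.util.Set;

final class DisabledRulesSketch {
    private final BitSet disabledRules = new BitSet();

    // parse the disabled-rule names once, e.g. from a session variable
    void init(Set<String> disabledRuleNames) {
        for (String name : disabledRuleNames) {
            disabledRules.set(ruleId(name));
        }
    }

    boolean isDisabled(int ruleId) {
        return disabledRules.get(ruleId); // O(1), no hashing, no boxing
    }

    private int ruleId(String name) {
        // stand-in: real code would map a rule name/enum to its ordinal
        return (name.hashCode() & 0x7fffffff) % 1024;
    }
}
```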
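Item 10's trade-off sketched: for the small slot lists typical of a single `Scope`, the `Multimap` pays allocation and hashing up front that a nested loop never does. Both shapes below are illustrative:

```java
import com.google.common.collect.ArrayListMultimap;
import com.google.common.collect.Multimap;
import java.util.ArrayList;
import java.util.List;

final class BindSlotSketch {
    record Slot(String name) {}

    // Map-based: pays allocation and hashing even for tiny inputs.
    static List<Slot> bindWithMultimap(List<String> names, List<Slot> slots) {
        Multimap<String, Slot> byName = ArrayListMultimap.create();
        for (Slot slot : slots) {
            byName.put(slot.name(), slot);
        }
        List<Slot> bound = new ArrayList<>();
        for (String name : names) {
            bound.addAll(byName.get(name));
        }
        return bound;
    }

    // Two-level loop: O(names * slots) compares, but no intermediate
    // structure; faster when both lists are small, as they usually are.
    static List<Slot> bindWithLoops(List<String> names, List<Slot> slots) {
        List<Slot> bound = new ArrayList<>();
        for (String name : names) {
            for (Slot slot : slots) {
                if (slot.name().equals(name)) {
                    bound.add(slot);
                }
            }
        }
        return bound;
    }
}
```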
### test case
100 threads continuously send the following SQL, which queries an empty table, in parallel; tested on my Mac (M2 chip, 8 cores) with SQL cache enabled:
```sql
select count(1),date_format(time_col,'%Y%m%d'),varchar_col1
from tbl
where partition_date>'2024-02-15' and (varchar_col2 ='73130' or varchar_col3='73130') and time_col>'2024-03-04'
and time_col<'2024-03-05'
group by date_format(time_col,'%Y%m%d'),varchar_col1
order by date_format(time_col,'%Y%m%d') desc, varchar_col1 desc,count(1) asc
limit 1000
```
Before this PR: 3100 peak QPS, about 2700 avg QPS
After this PR: 4800 peak QPS, about 4400 avg QPS
(cherry picked from commit 7338683fdbdf77711f2ce61e580c19f4ea100723)
Previously, `strings_pool` was allocated within each tree node. Because the Arena aligns allocated chunks to at least 4 KB, this allocation was excessively large for a single tree node, so when a `SubcolumnTree` contained many nodes, a significant portion of memory was wasted. Moving `strings_pool` to the tree itself optimizes memory usage, reduces waste, and improves overall efficiency.
* The default delete bitmap cache is 100 MB, which can be insufficient and cause performance issues when the amount of user data is large. To mitigate this, we now take the larger of 5% of total memory and 100 MB as the delete bitmap cache size (a sizing sketch follows).
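A minimal sketch of that sizing rule, with hypothetical names (not the actual BE config code):

```java
final class DeleteBitmapCacheSizeSketch {
    static final long MIN_CACHE_BYTES = 100L * 1024 * 1024; // 100 MB floor

    static long cacheBytes(long totalMemBytes) {
        // larger of 5% of total memory and 100 MB
        return Math.max(totalMemBytes / 20, MIN_CACHE_BYTES);
    }
}
// e.g. cacheBytes(64L << 30) is about 3.2 GB on a 64 GB machine
```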