doris

Author	SHA1	Message	Date
Pxl	477961dc21	[Chore](agg) refactor of hash map (#22958 )	2023-08-18 17:59:30 +08:00
Pxl	d371101bfd	[Improvement](aggregation) make fixed hashmap's bitmap_size flexable (#22573 )	2023-08-14 10:47:06 +08:00
ZenoYang	9d3f1dcf44	[improvement](vectorized) Deserialized elements of count distinct aggregation directly inserted into target hashset (#21888 ) The original logic is to first deserialize the ColumnString into a HashSet (insert the deserialized elements into the hashset), and then traverse all the HashSet elements into the target HashSet during the merge phase. After optimization, when deserializing, elements are directly inserted into the target HashSet, thereby reducing unnecessary hashset insert overhead. In one of our internal query tests, 30 hashsets were merged in second phase aggregation(the average cardinality is 1,400,000), and the cardinality after merging is 42,000,000. After optimization, the MergeTime dropped from 5s965ms to 3s375ms.	2023-08-02 21:19:56 +08:00
Pxl	f5e3cd2737	[Improvement](aggregation) optimization for aggregation hash_table_lazy_emplace (#22327 )	2023-08-02 11:50:21 +08:00
zhangstar333	1c6246f7ee	[improve](agg) support distinct agg node (#22169 ) select c_name from customer union select c_name from customer this sql used agg node to get distinct row of c_name, so it's no need to wait for inserted all data to hash map, could output the data which it's inserted into hash map successed.	2023-07-28 13:54:10 +08:00
Pxl	9451382428	[Improvement](aggregate) optimization for AggregationMethodKeysFixed::insert_keys_into_columns (#22216 )	2023-07-26 16:19:15 +08:00
ZenoYang	6512893257	[refactor](vectorized) Remove useless control variables to simplify aggregation node code (#22026 ) * [refactor](vectorized) Remove useless control variables to simplify aggregation node code * fix	2023-07-21 12:45:23 +08:00
HappenLee	254f76f61d	[Agg](exec) support aggregation_node limit short circuit (#21767 )	2023-07-14 00:29:19 +08:00
Xinyi Zou	2c9bdd64fa	[fix](memory) arena support memory reuse after clear() (#21033 )	2023-06-21 23:27:21 +08:00
yiguolei	a6f625676b	[profile](remove child) child is for node, should not be used to organize counters (#20676 ) Currently, there are many profiles using add child profile to orgnanize profile into blocks. But it is wrong. Child profile will have a total time counter. Actually, what we should use is just a label. - MemoryUsage: - HashTable: 23.98 KB - SerializeKeyArena: 446.75 KB Add a new macro ADD_LABEL_COUNTER to add just a label in the profile. --------- Co-authored-by: yiguolei <yiguolei@gmail.com>	2023-06-12 10:00:35 +08:00
Jerry Hu	9f8de89659	[refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758 ) Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity. By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed. This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.	2023-05-29 11:47:31 +08:00
Pxl	b927f8cd37	[Chore](asan) change asan_suppr from interceptor_via_lib to interceptor_via_fun (#19636 )	2023-05-16 10:51:43 +08:00
Xinyi Zou	f23c93b3c6	[fix](memory) Fix AggFunc memory leak due to incorrect destroy (#19126 )	2023-04-27 14:58:32 +08:00
Jerry Hu	c4e469c82c	[feature](agg) Support spill to disk in aggregation (#18051 )	2023-04-20 18:59:08 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
Xinyi Zou	d9fe5f7b67	[enhancement](memory) Remove MemPool and replace it with Arena (#17820 ) Arena can replace MemPool in most scenarios. Except for memory reuse, MemPool supports reuse of previous memory chunks after clear, but Arena does not. Some comparisons between MemPool and Arena: 1. Expansion Arena is less than 128M index 2 alloc chunk; more than 128M memory, allocate 128M * n > `size`, n is equal to the minimum value that satisfies the expression; MemPool less than 512K index 2 alloc chunk, greater than 512K memory, separately apply for a `size` length chunk After Arena applied for a chunk larger than 128M last time, the minimum chunk applied for after that is 128M. Does this seem to be a waste of memory? MemPool is also similar. After the chunk of 512K was applied for last time, the minimum chunk of subsequent applications is 512K. 2. Alignment MemPool defaults to 16 alignment, because memtable and other places that use int128 require 16 alignment; Arena has no default alignment; 3. Memory reuse Arena only supports `rollback`, which reuses the memory of the current chunk, usually the memory requested last time. MemPool supports clear(), all chunks can be reused; or call ReturnPartialAllocation() to roll back the last requested memory; if the last chunk has no memory, search for the most free chunk for allocation 4. Realloc Arena supports realloc contiguous memory; it also supports realloc contiguous memory from any position at the time of the last allocation. The difference between `alloc_continue` and `realloc` is: 1. Alloc_continue does not need to specify the old size, but the default old size = head->pos - range_start 2. alloc_continue supports expansion from range_start when additional_bytes is between head and pos, which is equivalent to reusing a part of memory, while realloc completely allocates a new memory MemPool does not support realloc, but supports transferring or absorbing chunks between two MemPools 5. check mem limit MemPool checks the mem limit, and Arena checks at the Allocator layer. 6. Support for ASAN Arena does something extra 7. Error handling MemPool supports returning the error message of application failure directly through `Status`, and Arena throws Exception. Tests that Arena can consider 1. After the last applied chunk is larger than 128M, the minimum applied chunk is 128M, which seems to waste memory; 2. Support clear, memory multiplexing; 3. Increase the large list, alloc the memory larger than 128M, and the size is equal to `size`, so as to avoid the current chunk not being fully used, which is wasteful. 4. In some cases, it may be possible to allocate backwards to find chunks t	2023-03-29 20:56:49 +08:00
Pxl	ca73c60442	[Chore](build) enable ignored-qualifiers check (#16196 )	2023-02-01 15:15:59 +08:00
Jerry Hu	a9671b6dfd	[feature](agg)support two level-hash map in aggregation node (#15967 )	2023-01-30 16:43:33 +08:00
starocean999	1ec88cbff6	[fix](nereids) AggregationNode process null as key column in wrong way (#16125 ) in AggregationNode, _merge_with_serialized_key_helper method should convert the key column to full column if the key column is null literal.	2023-01-29 20:12:07 +08:00
Pxl	81bab55d43	[Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903 )	2023-01-16 11:21:28 +08:00
Pxl	93f5e440eb	[Bug](execute) fix get next non stop for eos on streaming preagg (#15611 ) * fix get nnext non stop for eos on streaming preagg * update	2023-01-05 09:36:11 +08:00
Gabriel	b085ff49f0	[refactor](non-vec) delete non-vec data sink (#15283 ) * [refactor](non-vec) delete non-vec data sink Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>	2022-12-23 14:10:47 +08:00
HappenLee	b30cd86e9e	[Refactor](pipeline) Refactor operator and builder code of pipeline (#14787 )	2022-12-05 18:35:00 +08:00
TengJianPing	8c0e13ab51	[improvement](profile) add detail memory counter for exec nodes (#14806 ) * [improvement](profile) improve accuraccy of memory usage and add detail memory counter * fix	2022-12-05 11:51:52 +08:00
HappenLee	12304bc0ee	[Pipeline](exec) Support pipeline exec engine (#14736 ) Co-authored-by: Lijia Liu <liutang123@yeah.net> Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: Jerry Hu <mrhhsg@gmail.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: shee <13843187+qzsee@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> ## Problem Summary: ### 1. Design DSIP: https://cwiki.apache.org/confluence/display/DORIS/DSIP-027%3A+Support+Pipeline+Exec+Engine ### 2. How to use: Set the environment variable `set enable_pipeline_engine = true; `	2022-12-02 17:11:34 +08:00
Xinyi Zou	176f519fa1	[enhancement](memtracker) Optimize exec node memory tracking (#14711 )	2022-12-01 14:52:21 +08:00
starocean999	1520e5c88a	[enhancement](agg)use new method to serialize keys in batch if the key is too large (#14484 ) * [enhancement](agg)use new method to serialize keys in batch if the key is too large * fix compile error	2022-11-23 17:35:39 +08:00
starocean999	1f326fc0d6	[enhancement](be)limit mem cost to 16m when pre serialize keys in agg node (#14321 ) * [enhancement](be)limit mem cost to 16m when pre serialize keys in agg node * use only one chunk memory when serializing keys in agg node	2022-11-18 12:31:52 +08:00
starocean999	6d2e6d85d3	[enhancement](be)release memory in Node's close() method (#14258 ) * [enhancement](be)release memory in Node's close() method * format code	2022-11-15 15:59:23 +08:00
zhangstar333	4bc33a54a1	[Fix](agg) fix bitmap agg core dump when phmap pointer assert alignment (#13381 )	2022-10-15 10:39:23 +08:00
Jerry Hu	8f4bb0f804	[improvement](agg) iterate aggregation data in memory written order (#12704 ) Following the iteration order of the hash table will result in out-of-order access to aggregate states, which is very inefficient. Traversing aggregate states in memory write order can significantly improve memory read efficiency. Test hash table items count: 3.35M Before this optimization: insert keys into column takes 500ms With this optimization only takes 80ms	2022-09-21 14:58:50 +08:00
starocean999	8e4374b7ec	[enhancement](agg)remove unnessasery mem alloc and dealloc in agg node (#12535 )	2022-09-15 11:07:06 +08:00
Pxl	0ead048b93	[Enhancement](column) remove ColumnString terminating zero and add a data_version for pblock (#12456 ) 1. remove ColumnString terminating zero 2. add a data_version for pblock 3. change EncryptionMode to enum class	2022-09-14 21:25:22 +08:00
Jerry Hu	3485dfa927	[chore](profile) add some counters in aggregatation & sender (#12385 )	2022-09-07 10:09:05 +08:00
Jerry Hu	dc8f64b3e3	[improvement](agg) Serialize the fixed-length aggregation results with corresponding columns instead of ColumnString (#11801 )	2022-08-22 10:12:06 +08:00
Pxl	cac317430f	[Bug](aggregation) fix core dump on 2nd phase aggregate (#11843 )	2022-08-18 14:42:34 +08:00
starocean999	092a394782	[improvement](agg)limit the output of agg node (#11461 ) * [improvement](agg)limit the output of agg node	2022-08-05 07:53:55 +08:00
Jerry Hu	842a5b8e24	[refactor](agg) Abstract the hash operation into a method" (#11399 )	2022-08-02 17:27:19 +08:00
Jerry Hu	0325fa436e	[fix](agg)Add field of 'is_first_phase' in TAggregationNode (#11321 )	2022-08-01 11:49:50 +08:00
Jerry Hu	b74f36e009	[improvement]Use phmap for aggregation with integer keys (#11175 )	2022-07-27 13:58:20 +08:00
Jerry Hu	b7c9007776	[improvement][agg]Process aggregated results in the vectorized way (#11084 )	2022-07-22 22:04:43 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Jerry Hu	899acb6564	[improvement][agg]import sub hashmap (#10937 )	2022-07-18 18:36:45 +08:00
Jerry Hu	d1573e1a4a	[improvement]Use phmap for aggregation with serialized key (#10821 )	2022-07-14 11:26:09 +08:00
Jerry Hu	e293fbd277	[improvement]pre-serialize aggregation keys (#10700 )	2022-07-09 06:21:56 +08:00
Gabriel	476be35961	[TYPO] fix typo 'destory' -> 'destroy' (#10373 )	2022-06-24 19:11:28 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
Zhengguo Yang	50864aca7d	[refactor] fix warings when compile with clang (#8069 )	2022-02-19 11:29:02 +08:00
HappenLee	505acae931	[fix](vectorization) make sure the mem address use in agg is align in proper way before use (#7960 )	2022-02-08 10:05:03 +08:00
HappenLee	015371ac72	[fix](grouping-set) Fix the bug of grouping set core in both vec and non vec query engine (#7800 )	2022-01-26 16:15:30 +08:00


51 Commits