doris

Author	SHA1	Message	Date
spaces-x	bea9a7ba4f	[feature] Support pre-aggregation for quantile type (#8234 ) Add a new column type to speed up the approximation of quantiles. 1. The new column type is named `quantile_state` with the fixed aggregation function `quantile_union`, which stores the intermediate results of pre-aggregated approximate quantile calculations. 2. Support pre-aggregation of the new column type and quantile_state-related functions.	2022-03-24 09:11:34 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
Pxl	a8af8d2981	[fix](vectorized) fix core dump on get_json_string and add some ut (#8496 )	2022-03-17 10:08:31 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implementation of MemTracker and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Add MemTrackerTaskPool as the ancestor of all query and import trackers, used to track the local memory usage of all executing tasks; 3. Add a consume/release cache that triggers a consume/release only when the accumulated memory exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (experimental feature), controlled by the parameter memory_leak_detection: when a MemTracker is destructed and its remaining statistical value is greater than the specified range, throw an exception and print the accurate statistical value in HTTP; 5. Add Virtual MemTracker, whose consume/release is not synced to its parent. It will be used later when introducing the TCMalloc Hook, to record specified memory independently; 6. Modify the GC logic: register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor that creates a temporary tracker to avoid a lot of redundant code; 2. Add trackers for global objects such as ChunkAllocator and StorageEngine; 3. Add more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, the PlanFragmentExecutor mem_tracker, which was previously used for independent statistics of scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
zhangstar333	e0ef9b8f6c	[refactor](vectorized) to_bitmap(-1) returns NULL instead of a parse-failed error message (#8373 )	2022-03-11 17:21:47 +08:00
Pxl	cd8694e532	[feature][vectorized] support replace() (#8384 )	2022-03-08 18:57:12 +08:00
Zhengguo Yang	5029ef46c9	[fix] fix ltrim result that may be incorrect in some cases (#7963 ) Fix an ltrim result that may be incorrect in some cases. According to https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html, for the built-in functions int __builtin_clz (unsigned int x) and int __builtin_ctz (unsigned int x), if x is 0 the result is undefined, and these functions return different results between gcc and clang when x is 0. So we handle the case of 0 separately.	2022-02-09 13:06:37 +08:00
Pxl	0553ce2944	[feature](vectorization) support function topn && remove some unused code (#7793 )	2022-02-09 13:05:31 +08:00
924060929	c1fef37399	[improvement](runtime-filter) Support adaptive runtime filter(#7546 ) (#7645 ) Change 1: Support an adaptive runtime filter, IN_OR_BLOOM_FILTER. The processing logic is: if the number of rows in the right table < runtime_filter_max_in_num, the IN predicate is used; if the number of rows in the right table >= runtime_filter_max_in_num, the Bloom filter takes effect. Change 2: The default runtime filter type is changed to IN_OR_BLOOM_FILTER.	2022-01-30 16:46:52 +08:00
Pxl	cd73a6b84b	[chore] fix clang compile error (#7883 )	2022-01-26 12:53:35 +08:00
HappenLee	e1d7233e9c	[feature](vectorization) Support Vectorized Exec Engine In Doris (#7785 ) # Proposed changes Issue Number: close #6238 Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: stdpain <34912776+stdpain@users.noreply.github.com> Co-authored-by: Zhengguo Yang <yangzhgg@gmail.com> Co-authored-by: wangbo <506340561@qq.com> Co-authored-by: emmymiao87 <522274284@qq.com> Co-authored-by: Pxl <952130278@qq.com> Co-authored-by: zhangstar333 <87313068+zhangstar333@users.noreply.github.com> Co-authored-by: thinker <zchw100@qq.com> Co-authored-by: Zeno Yang <1521564989@qq.com> Co-authored-by: Wang Shuo <wangshuo128@gmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com> Co-authored-by: Gabriel <gabrielleebuaa@gmail.com> Co-authored-by: xinghuayu007 <1450306854@qq.com> Co-authored-by: weizuo93 <weizuo@apache.org> Co-authored-by: yiguolei <guoleiyi@tencent.com> Co-authored-by: anneji-dev <85534151+anneji-dev@users.noreply.github.com> Co-authored-by: awakeljw <993007281@qq.com> Co-authored-by: taberylyang <95272637+taberylyang@users.noreply.github.com> Co-authored-by: Cui Kaifeng <48012748+azurenake@users.noreply.github.com> ## Problem Summary: ### 1. Some code from ClickHouse ClickHouse is an excellent implementation of a vectorized execution engine database, so here we have referenced and learned a lot from its implementation of data structures and functions. We are based on ClickHouse v19.16.2.2 and would like to thank the ClickHouse community and developers. The following comment has been added to the code taken from ClickHouse, e.g.: // This file is copied from // https://github.com/ClickHouse/ClickHouse/blob/master/src/Interpreters/AggregationCommon.h // and modified by Doris ### 2. Support exec node and query: * vaggregation_node * vanalytic_eval_node * vassert_num_rows_node * vblocking_join_node * vcross_join_node * vempty_set_node * ves_http_scan_node * vexcept_node * vexchange_node * vintersect_node * vmysql_scan_node * vodbc_scan_node * volap_scan_node * vrepeat_node * vschema_scan_node * vselect_node * vset_operation_node * vsort_node * vunion_node * vhash_join_node The exec engine can run the SSB/TPCH and 70% of the TPCDS standard query test sets. ### 3. Data Model The Vec Exec Engine supports Dup/Agg/Unq tables and a vectorized Block Reader. Segment Vec is a work in progress. ### 4. How to use 1. Set the session variable `set enable_vectorized_engine = true; ` (required) 2. Set the session variable `set batch_size = 4096; ` (recommended) ### 5. Some differences from the original exec engine https://github.com/doris-vectorized/doris-vectorized/issues/294 ## Checklist(Required) 1. Does it affect the original behavior: (No) 2. Have unit tests been added: (Yes) 3. Has documentation been added or modified: (No) 4. Does it need to update dependencies: (No) 5. Are there any changes that cannot be rolled back: (Yes)	2022-01-18 10:07:15 +08:00
Universe	5b0f11b665	[feature](mysql-compatibility)(function) add `WEEKDAY` function (#7673 ) `WEEKDAY` in MySQL: returns an index from 0 to 6 for Monday to Sunday. `DAYOFWEEK` in MySQL: returns an index from 1 to 7 for Sunday to Saturday. Doris only had the `DAYOFWEEK` function, so I added the `WEEKDAY` function. Thanks for the following materials: - https://github.com/apache/incubator-doris/pull/6982/files - https://www.bilibili.com/video/BV1V44y1Y7Ro	2022-01-16 10:39:21 +08:00
Mingyu Chen	5e1caea2b1	[fix](lateral-view) Fix some bugs about lateral view (#7721 ) 1. fix core dump when using multiple explode_bitmap #7716 2. fix bug that extracting a json array by json path gives a wrong result #7717 3. fix bug that after lateral view, null values become non-null values #7718 4. fix bug that lateral view may return error: couldn't resolve slot descriptor 1. #7719 5. fix wrong result when using lateral view with a where predicate #7720	2022-01-13 15:30:38 +08:00
924060929	563545475e	[Optimize](Runtime Filter) Support merge in runtime filter(#7546 ) (#7547 ) Support merging the IN predicate when a remote target exists (e.g. shuffle hash join). Remove the code that implicitly converts the IN predicate to a Bloom filter when a remote target exists. Close related #7546	2022-01-06 19:08:35 +08:00
Zhengguo Yang	07e2acb2f3	[feature] Support national commercial cryptography (SM) algorithms SM3/SM4 (#7464 ) SM3 is a cryptographic hash algorithm. SM4 is a block cipher used in place of DES/AES and other international algorithms.	2021-12-28 10:39:54 +08:00
zhangstar333	0c154733e0	[feature](function) support more column parameters in bitmap_union/intersect (#7379 ) Support multiple bitmap parameters for all bitmap aggregation functions	2021-12-26 11:03:20 +08:00
shee	3ba6dcf236	[fix](function) fix round function for inaccuracy (#7421 )	2021-12-24 21:23:11 +08:00
caiconghui	382351b0ee	[fix](ut) Fix failed FE UT runs, BE UT memory leak and failed thirdparty build (#7377 )	2021-12-15 11:00:20 +08:00
HappenLee	d3316ff567	[performance](function) Support SIMD in some string functions (#7236 ) Support SIMD in some string functions: ltrim, rtrim, trim, reverse, hex	2021-12-06 10:24:26 +08:00
Zhengguo Yang	c9e578032b	optimize bitmap function count using the roaring cardinality method, which is faster than the current version (#7151 )	2021-11-24 14:42:48 +08:00
Pxl	a74fdf184c	[refactor](be) refactor predicate function creator (#7054 ) Refactor predicate function creator, make MinMaxFunction/HybridSet/BloomFilter use a unified interface through template to get function.	2021-11-24 10:39:29 +08:00
Zhengguo Yang	6c6380969b	[refactor] replace boost smart ptr with stl (#6856 ) 1. replace all boost::shared_ptr with std::shared_ptr 2. replace all boost::scoped_ptr with std::unique_ptr 3. replace all boost::scoped_array with std::unique_ptr<T[]> 4. replace all boost::thread with std::thread	2021-11-17 10:18:35 +08:00
Xinyi Zou	e69249c082	sub_bitmap (#6977 ) Starting from the offset position, take the specified limit of bitmap elements and return them as a bitmap subset. Types of chang	2021-11-06 13:31:03 +08:00
pengxiangyu	599ecb1f30	[Function] Add bitmap function bitmap_subset_limit (#6980 ) Add bitmap function bitmap_subset_limit. This function will return subset in specified index.	2021-11-04 12:14:47 +08:00
xy720	aeec9c45e6	[Function] Add bitmap-xor-count function for doris (#6982 ) Add bitmap-xor-count function for doris, related to #6875	2021-11-02 16:37:00 +08:00
zhangstar333	1ff3d708ca	[Function] add functions of bitmap_and/or_count (#6912 ) issue #6875 add bitmap_and_count/ bitmap_or_count	2021-11-01 14:00:07 +08:00
luozenglin	c7a3116f98	[Function] add bitmap function of bitmap_has_all (#6918 ) The 'bitmap_has_all' function returns true if the first bitmap contains all the elements of the second bitmap.	2021-11-01 12:50:47 +08:00
qiye	65ded82778	[Function] add BE bitmap function bitmap_subset_in_range (#6917 ) Add bitmap function bitmap_subset_in_range. This function returns the subset within the specified range (excluding range_end).	2021-11-01 11:05:19 +08:00
Pxl	28030294f7	[Feature] Support bitmap_and_not & bitmap_and_not_count (#6910 ) Support bitmap_and_not & bitmap_and_not_count.	2021-11-01 10:11:54 +08:00
zhuixun	a842d41b87	[Function] add BE bitmap function bitmap_max (#6942 ) Support bitmap_max.	2021-10-30 18:16:38 +08:00
EmmyMiao87	adb9b0d9c6	[Bug] Return 0 when hex(0) (#6837 )	2021-10-15 10:18:55 +08:00
tianhui5	58440b90f0	[Bug] Left() string function does not behave identically to the MySQL implementation (#6811 ) See Fix #6810	2021-10-15 10:17:21 +08:00
zhoubintao	ad949c2f65	Optimize Hex and add related Doc (#6697 ) I tested hex in a for loop of 1000w (10 million) iterations with random numbers: the old hex averaged 4.92 s and the optimized hex averaged 0.46 s, roughly 10x faster.	2021-10-13 11:36:14 +08:00
Cui Kaifeng	020282e885	[Bug] Fix aes_decrypt to handle null input correctly. (#6636 )	2021-09-14 11:19:55 +08:00
qiye	225bdb1fda	[Bug] fix `replace` function bug (#6605 ) * fix replace function bug * fix replace docs	2021-09-14 09:59:13 +08:00
Pxl	577ff01094	[Bug][Function] Fix pad function wrong result when len.val==str_char_size (#6564 ) like #6563 and #6562	2021-09-07 11:55:49 +08:00
zhangstar333	7a15e583a7	[Feature] Support functions json_array, json_object, json_quote (#6504 )	2021-09-02 09:59:02 +08:00
Hao Tan	66a7a4b294	[Feature] Support exact percentile aggregate function (#6410 ) Support to calculate the exact percentile value array of numeric column `col` at the given percentage(s).	2021-08-18 15:56:06 +08:00
caiconghui	285d44cd48	[BUG] Fix potential overflow exception when do money format for double (#6408 ) * [BUG] Fix potential overflow bug when do money format for double Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-15 18:40:26 +08:00
caiconghui	2f5b06ae70	[Bug][Optimize] Fix race condition problem and optimize do_money_format function (#6350 ) * [Bug][Optimize] Fix race condition problem and optimize do_money_format function Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-06 16:29:34 +08:00
stdpain	4c0fdd2800	[Bug] Fix core dump in BloomFilter when building a Runtime Filter whose right-table string column contains null (#6305 ) When the right table has null values in a string column, the runtime filter may core dump ``` select count(*) from baseall t1 join test t2 where t1.k7 = t2.k7; ```	2021-07-26 09:41:41 +08:00
xinghuayu007	13ef2c9e1d	[Function][Enhance] lower/upper case transfer function vectorized (#6253 ) Currently, the functions lower()/upper() can only handle one char at a time. A vectorized version has been implemented, making them up to about 2x faster. Here is the performance test: string length 26, 100 runs: vectorized-function-cost: 99491 ns, normal-function-cost: 134766 ns; string length 260, 100 runs: vectorized-function-cost: 179341 ns, normal-function-cost: 344995 ns	2021-07-26 09:38:07 +08:00
HappenLee	fae3eff2e6	[Bug] Fix the bug of cast string to datetime return not null (#6228 )	2021-07-17 10:55:08 +08:00
stdpain	bf5db6eefe	[BUG][Timeout][QueryLeak] Fixed memory not released in time (#6221 ) * Revert "[Optimize] Put _Tuple_ptrs into mempool when RowBatch is initialized (#6036)" This reverts commit f254870aeb18752a786586ef5d7ccf952b97f895. * [BUG][Timeout][QueryLeak] Fixed memory not released in time, Fix Core dump in bloomfilter	2021-07-16 12:32:10 +08:00
Mingyu Chen	c2695e9716	[Bug][RoutineLoad] Can not match whole json in routine load (#6213 ) Support using json path "$" to match the whole json in routine load Co-authored-by: chenmingyu <chenmingyu@baidu.com>	2021-07-16 09:21:27 +08:00
Zhengguo Yang	ed3ff470ce	[ARRAY] Support array type load and select, not including access by index (#5980 ) This is part of the array type support and has not been fully completed. The following functions are implemented: 1. FE array type support and implementation of array functions, including array syntax analysis and planning 2. Support importing array type data through insert into 3. Support selecting array type data 4. The array type is only supported in value columns of duplicate tables. This PR merges some code from #4655 #4650 #4644 #4643 #4623 #2979	2021-07-13 14:02:39 +08:00
stdpain	290a844e04	[optimize] Optimize bloomfilter performance (#6180 ) Refactor the runtime filter bloomfilter and eliminate some virtual function calls, which gives a performance improvement of about 5%. Introduce a block bloom filter; the AVX version gives a 40% performance improvement. Before: bloomfilter size: default, about 2000W (20 million) items cost about 1s400ms. After: bloomfilter size: 524288, about 2000W items cost about 400ms	2021-07-10 10:12:12 +08:00
DinoZhang	c929a8935a	[Feature][Function] support bit_length function (#6140 ) support bit_length function like mysql	2021-07-08 09:40:30 +08:00
Zhengguo Yang	739c0268ff	[refactor] Remove decimal v1 related code from code base (#6079 ) Remove ALL DECIMAL V1 type code; this is a part of #6073	2021-07-07 10:26:32 +08:00
stdpain	149def9e42	[Feature] Support RuntimeFilter in Doris (BE Implement) (#6077 ) 1. support in/bloomfilter/minmax 2. support broadcast/shuffle/bucket shuffle/colocate join 3. optimize memory use and CPU cache misses while building runtime filters 4. optimize memory use in left semi join (works well on tpcds-95)	2021-07-04 20:59:05 +08:00

127 Commits
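
Commit e17aef9467 adds a consume/release cache that only propagates to the MemTracker once the accumulated change exceeds mem_tracker_consume_min_size_bytes. The following is a single-threaded C++ sketch of that batching idea. `CachedTracker` and its methods are hypothetical names, and the real tracker is hierarchical and uses thread-local state:

```cpp
#include <cassert>
#include <cstdint>
#include <cstdlib>

// Hypothetical sketch: small consume/release calls accumulate in `untracked_`
// and are only added to `consumption_` once their magnitude exceeds the threshold.
class CachedTracker {
public:
    explicit CachedTracker(int64_t min_size_bytes) : min_size_bytes_(min_size_bytes) {
        assert(min_size_bytes >= 0);
    }

    void consume(int64_t bytes) {
        untracked_ += bytes;
        if (std::abs(untracked_) > min_size_bytes_) flush();  // batch the update
    }

    void release(int64_t bytes) { consume(-bytes); }

    // Push whatever is still cached, e.g. when a task finishes.
    void flush() {
        consumption_ += untracked_;
        untracked_ = 0;
    }

    int64_t consumption() const { return consumption_; }

private:
    int64_t min_size_bytes_;
    int64_t untracked_ = 0;
    int64_t consumption_ = 0;
};
```

The trade-off is that `consumption()` can lag the true value by up to the threshold, in exchange for far fewer updates to the shared counter.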
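
Commit 5029ef46c9 notes that `__builtin_clz`/`__builtin_ctz` are undefined when the argument is 0. The following minimal C++ sketch shows the pattern of guarding that case. It assumes GCC or Clang, because the builtin is compiler-specific, and `ltrim_spaces` is a hypothetical chunked helper rather than the Doris code. The helper builds a mask of non-space bytes per 32-byte chunk and only calls the builtin when the mask is non-zero:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string_view>

// __builtin_ctz(0) is undefined in GCC/Clang, so the zero case is handled explicitly.
inline int safe_ctz32(uint32_t x) {
    return x == 0 ? 32 : __builtin_ctz(x);
}

// Hypothetical helper (not the Doris implementation): strips leading spaces by
// scanning 32-byte chunks. Bit i of `mask` is set when chunk byte i is not a space.
std::string_view ltrim_spaces(std::string_view s) {
    size_t pos = 0;
    while (pos < s.size()) {
        const size_t n = std::min<size_t>(32, s.size() - pos);
        uint32_t mask = 0;
        for (size_t i = 0; i < n; ++i) {
            if (s[pos + i] != ' ') mask |= uint32_t{1} << i;
        }
        const int first = safe_ctz32(mask);  // 32 means "whole chunk is spaces"
        if (first < static_cast<int>(n)) return s.substr(pos + first);
        pos += n;
    }
    return s.substr(s.size());  // input was empty or all spaces
}
```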
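
Commit c1fef37399 describes IN_OR_BLOOM_FILTER, which chooses between an IN predicate and a Bloom filter by comparing the right-table row count with runtime_filter_max_in_num. The following C++ sketch illustrates that decision. The class name `InOrBloomFilter` and the toy Bloom filter (65536 bits, three hashes via double hashing) are assumptions for illustration, not Doris's RuntimeFilter code:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <unordered_set>
#include <vector>

// Hypothetical sketch of the IN_OR_BLOOM_FILTER choice. `max_in_num` stands in for
// runtime_filter_max_in_num; the Bloom filter is a toy, not the Doris one.
class InOrBloomFilter {
public:
    InOrBloomFilter(const std::vector<int64_t>& build_keys, size_t max_in_num)
        : use_in_(build_keys.size() < max_in_num), bits_(kNumBits) {
        for (int64_t key : build_keys) {
            if (use_in_) {
                in_set_.insert(key);  // few rows: exact IN predicate
            } else {
                add_to_bloom(key);    // many rows: compact Bloom filter
            }
        }
    }

    bool uses_in_predicate() const { return use_in_; }

    // IN answers exactly; the Bloom filter may give false positives but never
    // false negatives, so probe-side rows that match are never dropped.
    bool may_contain(int64_t key) const {
        if (use_in_) return in_set_.count(key) > 0;
        for (size_t i = 0; i < kNumHashes; ++i) {
            if (!bits_[bit_index(key, i)]) return false;
        }
        return true;
    }

private:
    static constexpr size_t kNumBits = size_t{1} << 16;
    static constexpr size_t kNumHashes = 3;

    // Double hashing: index_i = h1 + i * h2 (mod number of bits).
    static size_t bit_index(int64_t key, size_t i) {
        const size_t h1 = std::hash<int64_t>{}(key);
        const size_t h2 = h1 * static_cast<size_t>(0x9E3779B97F4A7C15ULL) + 1;
        return (h1 + i * h2) % kNumBits;
    }

    void add_to_bloom(int64_t key) {
        for (size_t i = 0; i < kNumHashes; ++i) bits_[bit_index(key, i)] = true;
    }

    bool use_in_;
    std::unordered_set<int64_t> in_set_;
    std::vector<bool> bits_;
};
```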
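
Commit 5b0f11b665 contrasts MySQL's `WEEKDAY` (0 to 6 for Monday to Sunday) with `DAYOFWEEK` (1 to 7 for Sunday to Saturday). The two indexes are related by weekday = (dayofweek + 5) mod 7. The following C++ sketch illustrates this relationship. It uses Sakamoto's day-of-week method for Gregorian dates as a swapped-in technique, which is not necessarily how Doris computes it. The helpers `dayofweek` and `weekday` are illustrative names:

```cpp
#include <cassert>

// Sakamoto's method: returns 0 = Sunday ... 6 = Saturday (Gregorian calendar).
int sakamoto_dow(int year, int month, int day) {
    static const int kOffsets[] = {0, 3, 2, 5, 0, 3, 5, 1, 4, 6, 2, 4};
    assert(year >= 1 && month >= 1 && month <= 12 && day >= 1);
    if (month < 3) year -= 1;  // treat Jan/Feb as months of the previous year
    return (year + year / 4 - year / 100 + year / 400 + kOffsets[month - 1] + day) % 7;
}

// DAYOFWEEK semantics: 1 = Sunday ... 7 = Saturday.
int dayofweek(int year, int month, int day) {
    return sakamoto_dow(year, month, day) + 1;
}

// WEEKDAY semantics: 0 = Monday ... 6 = Sunday.
int weekday(int year, int month, int day) {
    return (dayofweek(year, month, day) + 5) % 7;
}
```

For example, 2022-01-16 was a Sunday, so `dayofweek` gives 1 and `weekday` gives 6.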
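
Commits 28030294f7, 1ff3d708ca, aeec9c45e6, c7a3116f98, 65ded82778, a842d41b87 and e69249c082 add several bitmap functions. Their semantics can be sketched in C++ with `std::set<uint64_t>` standing in for the Roaring bitmaps Doris actually uses. The helper names mirror the SQL function names but are illustrative only. `bitmap_subset_in_range` excludes range_end, as its commit states. `sub_bitmap` here only handles non-negative offsets, which is an assumption:

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <iterator>
#include <optional>
#include <set>

using Bitmap = std::set<uint64_t>;  // ordered stand-in for a Roaring bitmap

size_t bitmap_and_count(const Bitmap& a, const Bitmap& b) {
    Bitmap out;
    std::set_intersection(a.begin(), a.end(), b.begin(), b.end(), std::inserter(out, out.end()));
    return out.size();
}

size_t bitmap_or_count(const Bitmap& a, const Bitmap& b) {
    Bitmap out;
    std::set_union(a.begin(), a.end(), b.begin(), b.end(), std::inserter(out, out.end()));
    return out.size();
}

size_t bitmap_xor_count(const Bitmap& a, const Bitmap& b) {
    Bitmap out;
    std::set_symmetric_difference(a.begin(), a.end(), b.begin(), b.end(),
                                  std::inserter(out, out.end()));
    return out.size();
}

size_t bitmap_and_not_count(const Bitmap& a, const Bitmap& b) {  // |a \ b|
    Bitmap out;
    std::set_difference(a.begin(), a.end(), b.begin(), b.end(), std::inserter(out, out.end()));
    return out.size();
}

// True if `a` contains every element of `b`.
bool bitmap_has_all(const Bitmap& a, const Bitmap& b) {
    return std::includes(a.begin(), a.end(), b.begin(), b.end());
}

std::optional<uint64_t> bitmap_max(const Bitmap& a) {
    if (a.empty()) return std::nullopt;
    return *a.rbegin();
}

// Elements in [range_start, range_end).
Bitmap bitmap_subset_in_range(const Bitmap& a, uint64_t range_start, uint64_t range_end) {
    if (range_start >= range_end) return {};
    return Bitmap(a.lower_bound(range_start), a.lower_bound(range_end));
}

// Skip `offset` elements in ascending order, then take up to `limit` elements.
Bitmap sub_bitmap(const Bitmap& a, size_t offset, size_t limit) {
    Bitmap out;
    auto it = a.begin();
    for (size_t i = 0; i < offset && it != a.end(); ++i) ++it;
    for (size_t i = 0; i < limit && it != a.end(); ++i, ++it) out.insert(*it);
    return out;
}
```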
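
Commits adb9b0d9c6 and ad949c2f65 concern `hex` on integers: `hex(0)` must return "0", and a faster implementation was added. The following C++ sketch shows a table-driven integer-to-hex conversion. It assumes MySQL-style output: uppercase digits, with negative values shown as their 64-bit two's-complement pattern. The function name `hex_of` is hypothetical, and this is not the optimized Doris code:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Converts an integer to uppercase hex, 4 bits at a time via a digit table.
std::string hex_of(int64_t value) {
    static constexpr char kDigits[] = "0123456789ABCDEF";
    uint64_t v = static_cast<uint64_t>(value);  // -1 becomes FFFFFFFFFFFFFFFF
    if (v == 0) return "0";                     // zero would otherwise produce ""
    char buf[16];
    int pos = 16;
    while (v != 0) {
        buf[--pos] = kDigits[v & 0xF];
        v >>= 4;
    }
    return std::string(buf + pos, buf + 16);
}
```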
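
Commit 66a7a4b294 adds an exact percentile that returns an array of values for the given percentages. The following C++ sketch sorts once and then answers every requested percentage. It assumes linear interpolation between the two nearest ranks at position p * (n - 1). This is a common convention, but the commit message does not state which interpolation Doris uses:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Exact percentiles with linear interpolation; each p must lie in [0, 1].
std::vector<double> exact_percentiles(std::vector<double> values, const std::vector<double>& ps) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());  // one O(n log n) sort shared by all p
    std::vector<double> result;
    result.reserve(ps.size());
    for (double p : ps) {
        assert(p >= 0.0 && p <= 1.0);
        const double pos = p * static_cast<double>(values.size() - 1);
        const size_t lo = static_cast<size_t>(std::floor(pos));
        const size_t hi = std::min(lo + 1, values.size() - 1);
        const double frac = pos - static_cast<double>(lo);
        result.push_back(values[lo] + (values[hi] - values[lo]) * frac);
    }
    return result;
}
```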
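
Commit 13ef2c9e1d vectorizes lower()/upper() so that more than one character is handled at a time. One portable way to do that is SWAR (SIMD within a register). The technique treats eight bytes as one `uint64_t` and builds an "is uppercase ASCII" mask with additions that cannot carry across byte boundaries. This is a swapped-in technique for illustration, not necessarily what the Doris patch does. Bytes at or above 0x80 (for example, UTF-8 continuation bytes) are left unchanged:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Lowercases the ASCII letters in 8 bytes at once; bytes >= 0x80 are untouched.
inline uint64_t lower_ascii_8(uint64_t x) {
    constexpr uint64_t kOnes = 0x0101010101010101ULL;
    const uint64_t heptets = x & (0x7F * kOnes);              // clear high bits: sums below cannot carry
    const uint64_t is_gt_Z = heptets + (0x7F - 'Z') * kOnes;  // high bit set if byte > 'Z'
    const uint64_t is_ge_A = heptets + (0x80 - 'A') * kOnes;  // high bit set if byte >= 'A'
    const uint64_t is_ascii = ~x & (0x80 * kOnes);            // high bit set if byte < 0x80
    const uint64_t is_upper = is_ascii & (is_ge_A ^ is_gt_Z); // 0x80 where 'A' <= byte <= 'Z'
    return x | (is_upper >> 2);                               // 0x80 >> 2 == 0x20, the case bit
}

std::string to_lower_ascii(std::string s) {
    size_t i = 0;
    for (; i + 8 <= s.size(); i += 8) {  // 8 bytes per iteration
        uint64_t word;
        std::memcpy(&word, s.data() + i, sizeof word);
        word = lower_ascii_8(word);
        std::memcpy(s.data() + i, &word, sizeof word);
    }
    for (; i < s.size(); ++i) {  // scalar tail
        const unsigned char c = static_cast<unsigned char>(s[i]);
        if (c >= 'A' && c <= 'Z') s[i] = static_cast<char>(c | 0x20);
    }
    return s;
}
```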
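
Commit 290a844e04 introduces a block Bloom filter with an AVX version. In a split-block design, every key touches a single 256-bit block (eight 32-bit words) and sets one bit per word. As a result, a lookup reads only 32 bytes, and the eight word operations map naturally onto one AVX2 register. The following scalar C++ sketch shows that layout. Its block selection and per-word salts follow the Parquet split-block Bloom filter specification. It assumes a splitmix64-style hash instead of the spec's xxHash. The names are illustrative, and this is not the Doris source:

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Scalar sketch of a split-block Bloom filter: each key maps to one 256-bit
// block and sets exactly one bit in each of the block's eight 32-bit words.
class BlockBloomFilter {
public:
    explicit BlockBloomFilter(size_t num_blocks) : blocks_(num_blocks) {
        assert(num_blocks > 0);
    }

    void insert(uint64_t key) {
        const uint64_t h = mix64(key);
        Block& block = blocks_[block_index(h)];
        for (int i = 0; i < 8; ++i) block[i] |= bit_in_word(h, i);
    }

    bool may_contain(uint64_t key) const {
        const uint64_t h = mix64(key);
        const Block& block = blocks_[block_index(h)];
        for (int i = 0; i < 8; ++i) {
            if ((block[i] & bit_in_word(h, i)) == 0) return false;
        }
        return true;
    }

private:
    using Block = std::array<uint32_t, 8>;  // 8 x 32 bits = 256 bits

    // Per-word salt constants from the Parquet split-block Bloom filter spec.
    static constexpr uint32_t kSalt[8] = {0x47b6137bU, 0x44974d91U, 0x8824ad5bU, 0xa2b7289dU,
                                          0x705495c7U, 0x2df1424bU, 0x9efc4947U, 0x5c6bfb31U};

    // splitmix64 finalizer, so consecutive integer keys spread across blocks.
    static uint64_t mix64(uint64_t x) {
        x += 0x9E3779B97F4A7C15ULL;
        x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
        x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
        return x ^ (x >> 31);
    }

    // Upper 32 hash bits pick the block (multiply-shift instead of modulo).
    size_t block_index(uint64_t h) const {
        return static_cast<size_t>(((h >> 32) * blocks_.size()) >> 32);
    }

    // Lower 32 hash bits, multiplied by a per-word salt, pick one of 32 bits.
    static uint32_t bit_in_word(uint64_t h, int i) {
        return uint32_t{1} << ((static_cast<uint32_t>(h) * kSalt[i]) >> 27);
    }

    std::vector<Block> blocks_;
};
```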