doris

Author	SHA1	Message	Date
ElvinWei	76ad599fd7	[enhancement](histogram) optimise aggregate function histogram (#15317 ) This PR mainly optimizes the histogram aggregation function (👉🏻 https://github.com/apache/doris/pull/14910), including the following: 1. Support input parameters `sample_rate` and `max_bucket_num` 2. Add UT and regression test 3. Add documentation 4. Optimize function implementation logic Parameter description: - `sample_rate`: Optional. The proportion of sample data used to generate the histogram. The default is 0.2. - `max_bucket_num`: Optional. Limits the number of histogram buckets. The default value is 128. --- Example: ``` MySQL [test]> SELECT histogram(c_float) FROM histogram_test; +-------------------------------------------------------------------------------------------------------------------------------------+ \| histogram(`c_float`) \| +-------------------------------------------------------------------------------------------------------------------------------------+ \| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} \| +-------------------------------------------------------------------------------------------------------------------------------------+ MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test; +-------------------------------------------------------------------------------------------------------------------------------------+ \| histogram(`c_string`) \| +-------------------------------------------------------------------------------------------------------------------------------------+ \| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} \| +-------------------------------------------------------------------------------------------------------------------------------------+ ``` Query result description: ``` { "sample_rate": 0.2, "max_bucket_num": 128, "bucket_num": 3, 
"buckets": [ { "lower": "0.1", "upper": "0.2", "count": 2, "pre_sum": 0, "ndv": 2 }, { "lower": "0.8", "upper": "0.9", "count": 2, "pre_sum": 2, "ndv": 2 }, { "lower": "1.0", "upper": "1.0", "count": 2, "pre_sum": 4, "ndv": 1 } ] } ``` Field description: - sample_rate: The sampling rate - max_bucket_num: The maximum number of buckets allowed - bucket_num: The actual number of buckets - buckets: All buckets - lower: Lower bound of the bucket - upper: Upper bound of the bucket - count: The number of elements contained in the bucket - pre_sum: The total number of elements in all preceding buckets - ndv: The number of distinct values in the bucket > Total number of histogram elements = number of elements in the last bucket (count) + total number of elements in the preceding buckets (pre_sum).	2023-01-07 00:50:32 +08:00
starocean999	51b14c06d3	[enhancement](nereids) support approx_count_distinct function (#15406 )	2022-12-27 22:25:21 +08:00
Gabriel	4dbe30d37b	[regression](vectorized) delete vectorized config in regression tests (#15126 )	2022-12-16 17:08:29 +08:00
abmdocrt	529bdfb153	[Fix](function) Fix retention function return wrong value type (#14552 ) MySQL [db]> SELECT SUM(a.r[1]) as active_user_num, SUM(a.r[2]) as active_user_num_1day, SUM(a.r[3]) as active_user_num_3day, SUM(a.r[4]) as active_user_num_7day FROM ( SELECT user_id, retention( day = '2022-11-01', day = '2022-11-02', day = '2022-11-04', day = '2022-11-07') as r FROM login_event WHERE (day >= '2022-11-01') AND (day <= '2022-11-21') GROUP BY user_id ) a; ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))	2022-11-28 15:56:18 +08:00
HappenLee	38b4cbe253	[Bug](regression) regression fail random in fuzzy mode (#14614 )	2022-11-27 09:23:36 +08:00
zy-kkk	59b31a03c4	[Improvement](agg function) support group_bit_and/group_bit_or/group_bit_xor functions (#14386 )	2022-11-24 16:46:42 +08:00
lihangyu	8afe298a0f	[Fix](function) fix function `retention` lost `ARRAY`'s element type … (#14538 )	2022-11-24 15:19:50 +08:00
abmdocrt	70ea07bc4b	[fix](nullable) Fix nullable cache to avoid function returning wrong value (#14463 )	2022-11-24 09:35:08 +08:00
Pxl	bcd641877f	[Enhancement](scan) disable build key range and filters when push down agg work (#14248 ) disable build key range and filters when push down agg work	2022-11-21 12:47:57 +08:00
abmdocrt	b6ba654f5b	[Feature](Sequence) Support sequence_match and sequence_count functions (#13785 )	2022-11-11 13:38:45 +08:00
zhangstar333	374303186c	[Vectorized](function) support topn_array function (#13869 )	2022-11-02 19:49:23 +08:00
camby	bed759b3f5	[Fix](array-type) support CTAS for ARRAY column from collect_list and collect_set (#13627 ) Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>	2022-10-26 19:42:15 +08:00
luozenglin	207f4e559e	[feature](agg) support `group_bitmap_xor` agg function. (#13287 ) support `group_bitmap_xor` agg function	2022-10-17 18:40:06 +08:00
abmdocrt	045bccdbea	[Feature](Retention) support retention function (#13056 )	2022-10-17 11:00:47 +08:00
luozenglin	d63a80eaba	[fix](bitmap_intersect) fix `bitmap_intersect` result error (#13298 )	2022-10-12 19:12:11 +08:00
Yongqiang YANG	ff1971f916	[improvement](test) add dryRun option and group all cases into either p0 or p1 (#11576 ) 1. add dryRun option to list tests 2. group all cases into p0 p1 p2	2022-08-17 22:45:53 +08:00

16 Commits
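
The field description in commit 76ad599fd7 states that the total number of histogram elements equals the last bucket's `count` plus its `pre_sum`. The following is a minimal sketch of that arithmetic in Python, using the JSON from that commit's "Query result description" as input. The helper names `total_elements` and `pre_sums_consistent` are hypothetical and not part of Doris, and the sketch assumes the buckets arrive in ascending order, as they do in the example.

```python
import json

# JSON copied from the "Query result description" in commit 76ad599fd7.
RESULT = """
{
  "sample_rate": 0.2,
  "max_bucket_num": 128,
  "bucket_num": 3,
  "buckets": [
    {"lower": "0.1", "upper": "0.2", "count": 2, "pre_sum": 0, "ndv": 2},
    {"lower": "0.8", "upper": "0.9", "count": 2, "pre_sum": 2, "ndv": 2},
    {"lower": "1.0", "upper": "1.0", "count": 2, "pre_sum": 4, "ndv": 1}
  ]
}
"""


def total_elements(histogram_json: str) -> int:
    """Hypothetical helper: last bucket's count + last bucket's pre_sum."""
    buckets = json.loads(histogram_json)["buckets"]
    if not buckets:
        return 0
    last = buckets[-1]
    return last["count"] + last["pre_sum"]


def pre_sums_consistent(histogram_json: str) -> bool:
    """Hypothetical check: each pre_sum equals the sum of earlier counts."""
    running = 0
    for bucket in json.loads(histogram_json)["buckets"]:
        if bucket["pre_sum"] != running:
            return False
        running += bucket["count"]
    return True


if __name__ == "__main__":
    print(total_elements(RESULT))
    print(pre_sums_consistent(RESULT))
```

Because `pre_sum` is a running total, only the last bucket is needed to recover the overall element count; the consistency check is there purely to illustrate how the field relates to the per-bucket `count` values.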