121 Commits

Author SHA1 Message Date
d6a841409f [Enhancement](func)Introduce non_nullable extraction function. #16621
Introduces a new BE function, non_nullable, which extracts the concrete data column from a nullable column. If the input argument is not a nullable column, an error is raised.
2023-02-18 20:44:07 +08:00
de1337511c [Bug](Datetime) Fix date time function mem use after free (#16814) 2023-02-16 16:15:58 +08:00
41947c73eb [Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382) 2023-02-08 12:51:07 +08:00
289a4b2ea4 [fix](func) fix truncate float type result error (#16468)
When the argument of the truncate function is of float type, it can match both truncate(DECIMALV3) and truncate(DOUBLE). If truncate(DECIMALV3) is matched, precision is lost when the float is converted to DECIMALV3(38, 0).

This change makes the float argument match truncate(DOUBLE) for now. The precision loss when converting float to DECIMALV3 may still need to be solved separately.
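
The precision loss described above can be illustrated outside Doris with a minimal Python sketch. It assumes that a cast to DECIMALV3(38, 0) behaves like quantizing to scale 0; the helpers `to_decimal_scale0` and `truncate` are hypothetical names that only model the described behavior, not the BE implementation.

```python
from decimal import Decimal, ROUND_DOWN, ROUND_HALF_UP

def to_decimal_scale0(x: float) -> Decimal:
    # Assumption: a cast to DECIMALV3(38, 0) keeps no fractional digits.
    return Decimal(repr(x)).quantize(Decimal("1"), rounding=ROUND_HALF_UP)

def truncate(value: Decimal, digits: int) -> Decimal:
    # Drop everything beyond `digits` fractional digits without rounding.
    return value.quantize(Decimal(1).scaleb(-digits), rounding=ROUND_DOWN)

x = 123.456
via_decimal = truncate(to_decimal_scale0(x), 1)  # fraction was lost by the cast
via_double = truncate(Decimal(repr(x)), 1)       # fraction is still available

print(via_decimal, via_double)
```

In this model the scale-0 path can no longer recover the first fractional digit, which is why routing float arguments to the DOUBLE overload avoids the wrong result.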
2023-02-08 08:57:43 +08:00
09870098af [fix](func) fix core dump when the pattern of the regexp_extract_all function does not contain subpatterns (#16408) 2023-02-05 01:16:54 +08:00
ca7b2e27a8 [regression-test](function) add regression test for money_format with truncate (#16052) 2023-02-04 23:10:01 +08:00
918004c016 [Bug](date) Fix BE crash caused by function datediff (#16397)
* [Bug](date) Fix BE crash caused by function `datediff`

* update
2023-02-04 18:43:23 +08:00
941e192019 [enhancement](test) add function case date_sub(datetime,INTERVAL dayofmonth(datetime)-1 DAY) (#16306) 2023-02-02 09:56:01 +08:00
e3c8fffd99 [function](round) fix decimal scale for scale not specified (#15541) 2023-02-01 14:58:48 +08:00
95d7c2de26 [Refactor](function) Rewrite the function elt (#16287) 2023-02-01 11:17:06 +08:00
ca7eb94f23 [improvement](agg-function) Increase the limit maximum number of agg function parameters (#15924) 2023-01-31 21:03:50 +08:00
6bebf92254 [fix][FE] fix be coredump when children of FunctionCallExpr is folded (#16064)
Co-authored-by: shizhiqiang03 <shizhiqiang03@meituan.com>
Fix BE core dump when children of FunctionCallExpr are folded.
2023-01-30 15:25:00 +08:00
eb7da1c0ee [fix](datatype) fix some bugs about data type array datetimev2 and decimalv3 (#16132) 2023-01-29 14:26:08 +08:00
578a855b3e [Bug](topn-opt) filter condition for analytic info for two phase read opt (#16173)
Two-phase read optimization should not be enabled when the query has analytic info.
2023-01-29 12:06:18 +08:00
b919cbe487 [ehancement](nereids) Enhancement for limit clause (#16114)
Support LIMIT with an offset without ORDER BY.
The legacy planner supports this feature as of PR #15218.
2023-01-28 11:04:03 +08:00
9ffd109b35 [fix](datetimev2) Fix BE datetimev2 type returning wrong result (#15885) 2023-01-20 22:25:20 +08:00
Pxl
81bab55d43 [Bug](function) catch function calculation error on aggregate node to avoid core dump (#15903) 2023-01-16 11:21:28 +08:00
7441b4dc96 [Feature](function) Support width_bucket function (#14396) 2023-01-12 13:59:21 +08:00
05f6e4c48a [fix](predicate) fix be core dump caused by pushing down the double column predicate (#15693) 2023-01-09 19:31:04 +08:00
76ad599fd7 [enhancement](histogram) optimise aggregate function histogram (#15317)
This PR mainly optimizes the histogram aggregate function (see https://github.com/apache/doris/pull/14910). It includes the following:
1. Support input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression test
3. Add documentation
4. Optimize function implementation logic
 
Parameter description:
- `sample_rate`: Optional. The proportion of sample data used to generate the histogram. The default is 0.2.
- `max_bucket_num`: Optional. The maximum number of histogram buckets. The default is 128.

---

Example:

```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`)                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+

MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`)                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```

Query result description:

```
{
    "sample_rate": 0.2, 
    "max_bucket_num": 128, 
    "bucket_num": 3, 
    "buckets": [
        {
            "lower": "0.1", 
            "upper": "0.2", 
            "count": 2, 
            "pre_sum": 0, 
            "ndv": 2
        }, 
        {
            "lower": "0.8", 
            "upper": "0.9", 
            "count": 2, 
            "pre_sum": 2, 
            "ndv": 2
        }, 
        {
            "lower": "1.0", 
            "upper": "1.0", 
            "count": 2, 
            "pre_sum": 4, 
            "ndv": 1
        }
    ]
}
```

Field description:
- sample_rate: Sampling rate
- max_bucket_num: Maximum number of buckets allowed
- bucket_num: Actual number of buckets
- buckets: All buckets
    - lower: Lower bound of the bucket
    - upper: Upper bound of the bucket
    - count: Number of elements contained in the bucket
    - pre_sum: Total number of elements in all preceding buckets
    - ndv: Number of distinct values in the bucket

> Total number of histogram elements = number of elements in the last bucket (count) + total number of elements in all preceding buckets (pre_sum).
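
The relationship between `count` and `pre_sum` can be checked with a short Python sketch over the sample result shown in the query result description. The helper names `total_elements` and `pre_sums_consistent` are hypothetical and exist only for this illustration; the JSON is copied from the sample, not produced by a live query.

```python
import json

# Sample output copied from the query result description above.
HISTOGRAM_JSON = """
{"sample_rate": 0.2, "max_bucket_num": 128, "bucket_num": 3, "buckets": [
  {"lower": "0.1", "upper": "0.2", "count": 2, "pre_sum": 0, "ndv": 2},
  {"lower": "0.8", "upper": "0.9", "count": 2, "pre_sum": 2, "ndv": 2},
  {"lower": "1.0", "upper": "1.0", "count": 2, "pre_sum": 4, "ndv": 1}
]}
"""

def total_elements(histogram: dict) -> int:
    # Total = count of the last bucket + pre_sum of the last bucket.
    buckets = histogram["buckets"]
    if not buckets:
        return 0
    last = buckets[-1]
    return last["count"] + last["pre_sum"]

def pre_sums_consistent(histogram: dict) -> bool:
    # Each pre_sum should equal the running sum of all earlier counts.
    running = 0
    for bucket in histogram["buckets"]:
        if bucket["pre_sum"] != running:
            return False
        running += bucket["count"]
    return True

histogram = json.loads(HISTOGRAM_JSON)
print(total_elements(histogram), pre_sums_consistent(histogram))
```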
2023-01-07 00:50:32 +08:00
a97f582b93 [fix](nereids) use DAYS as default unit for DATE_ADD and DATE_SUB function (#15559) 2023-01-04 01:55:15 +08:00
Pxl
85fe9d2496 [Bug](filter) fix not in(null) return true (#15466)
Fix `NOT IN (NULL)` incorrectly returning true.
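
For context, standard SQL three-valued logic evaluates `x NOT IN (NULL)` to NULL (unknown) rather than true, so a filter on it keeps no rows. The following is a minimal Python sketch of that rule, with `None` standing in for NULL; `sql_not_in` and `passes_filter` are hypothetical helpers, not Doris code.

```python
from typing import Optional, Sequence

def sql_not_in(value, candidates: Sequence) -> Optional[bool]:
    """Model SQL `value NOT IN (candidates)`; None means NULL / unknown."""
    if value is None:
        return None
    saw_null = False
    for candidate in candidates:
        if candidate is None:
            saw_null = True          # comparison with NULL is unknown
        elif candidate == value:
            return False             # a definite match makes NOT IN false
    # No match found: unknown if any NULL was compared, otherwise true.
    return None if saw_null else True

def passes_filter(result: Optional[bool]) -> bool:
    # A WHERE clause keeps a row only when the predicate is exactly TRUE.
    return result is True

print(sql_not_in(1, [None]), passes_filter(sql_not_in(1, [None])))
```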
2023-01-03 21:14:50 +08:00
781fa17993 [fix](Nereids) round function return type should be double (#15502) 2022-12-30 23:36:15 +08:00
edb9a3b58d [Bug](timediff) Fix wrong result for function timediff (#15312) 2022-12-30 00:28:51 +08:00
51b14c06d3 [enhancement](nereids) support approx_count_distinct function (#15406) 2022-12-27 22:25:21 +08:00
6bec1ffc47 [feature](planner) remove restrict of offset without order by (#15218)
Support SELECT * FROM tbl LIMIT 5, 3;
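
In the MySQL-style `LIMIT offset, count` form, the statement above skips 5 rows and returns the next 3. A minimal Python sketch of that semantics follows; the `limit` helper and the sample rows are hypothetical, and without ORDER BY the engine gives no guarantee about which rows occupy those positions.

```python
def limit(rows: list, offset: int, count: int) -> list:
    # Skip `offset` rows, then return at most `count` rows.
    return rows[offset:offset + count]

rows = list(range(10))       # stand-in for the rows of tbl in scan order
print(limit(rows, 5, 3))
```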
2022-12-26 09:37:41 +08:00
df5969ab58 [Feature] Support function roundBankers (#15154) 2022-12-22 22:53:09 +08:00
6712f1fc1d [fix](Nereids) encryption function with 4 params should auto-complate last param with config (#15038) 2022-12-20 13:55:54 +08:00
4dbe30d37b [regression](vectorized) delete vectorized config in regression tests (#15126) 2022-12-16 17:08:29 +08:00
21c2e485ae [improvment](function) add new function substring_index (#15024) 2022-12-15 09:54:34 +08:00
1200b22fd2 [function](round) compute accurate round value by decimal (#14946) 2022-12-13 09:53:43 +08:00
b5c0d4870d [fix](nereids)fix bug of elt and sub_replace function (#14971) 2022-12-12 17:37:36 +08:00
38570312dd [feature](split_by_string)support split by string function (#13741) 2022-12-12 15:22:30 +08:00
33349c3419 [feature](function)Support negative index for function split_part (#13914) 2022-12-12 09:56:09 +08:00
3286fb48ab [fix](if) fix coredump of if const (#14858) 2022-12-07 09:43:10 +08:00
a60490651f [improvement](function) add timezone cache for convert_tz (#14616) 2022-11-29 17:00:54 +08:00
529bdfb153 [Fix](function) Fix retention function return wrong value type (#14552)
Before this fix, the following query failed:
MySQL [db]> SELECT SUM(a.r[1]) as active_user_num, SUM(a.r[2]) as active_user_num_1day, SUM(a.r[3]) as active_user_num_3day, SUM(a.r[4]) as active_user_num_7day
    FROM ( SELECT user_id, retention( day = '2022-11-01', day = '2022-11-02', day = '2022-11-04', day = '2022-11-07') as r
           FROM login_event WHERE (day >= '2022-11-01') AND (day <= '2022-11-21') GROUP BY user_id ) a;
ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))
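
The query above sums each position of the per-user retention array, which only works when the array elements are numeric. Below is a hedged Python sketch of that computation. It assumes retention returns 0/1 flags where every condition after the first is paired with the first one; the `retention` helper and the `LOGINS` sample data are hypothetical.

```python
def retention(flags: list) -> list:
    """flags[i] is True when condition i matched at least one row of the group."""
    if not flags:
        return []
    first = flags[0]
    # Assumption: conditions after the first only count if the first also matched.
    return [int(first)] + [int(first and flag) for flag in flags[1:]]

DAYS = ["2022-11-01", "2022-11-02", "2022-11-04", "2022-11-07"]

# Hypothetical login days per user_id.
LOGINS = {
    "u1": {"2022-11-01", "2022-11-02"},
    "u2": {"2022-11-01", "2022-11-07"},
    "u3": {"2022-11-02"},
}

# One retention array per user, as in the GROUP BY user_id subquery.
arrays = [retention([day in days for day in DAYS]) for days in LOGINS.values()]

# SUM(a.r[i]) for each position i of the array.
totals = [sum(column) for column in zip(*arrays)]
print(totals)
```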
2022-11-28 15:56:18 +08:00
38b4cbe253 [Bug](regression) regression fail random in fuzzy mode (#14614) 2022-11-27 09:23:36 +08:00
7ba4cd764a [enhancement](array-function) array_position,array_contains,countequal which in FunctionArrayIndex handle target NULL (#14564)
Previously, the result was:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
|                                 NULL |
+--------------------------------------+
1 row in set (0.02 sec)
```

After this commit, the result becomes:
```
mysql> select array_position([1, null], null);
+--------------------------------------+
| array_position(ARRAY(1, NULL), NULL) |
+--------------------------------------+
|                                    2 |
+--------------------------------------+
1 row in set (0.02 sec)
```
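
The new behavior treats a NULL target as matching NULL elements. A minimal Python sketch of that lookup follows, with `None` standing in for NULL; the 1-based index comes from the output above, while returning 0 for a missing target is an assumption of this sketch.

```python
def array_position(arr: list, target) -> int:
    # Return the 1-based index of the first element matching `target`.
    for index, element in enumerate(arr, start=1):
        if target is None:
            if element is None:      # NULL target matches a NULL element
                return index
        elif element is not None and element == target:
            return index
    return 0  # assumption: 0 signals "not found"

print(array_position([1, None], None))
```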
2022-11-25 14:19:50 +08:00
d5d356b17f [vectorized](function) support order by field function (#14528)
* [vectorized](function) support order by field function

* update

* update test
2022-11-25 14:00:46 +08:00
bc699511d0 [Fix](array-function) fix array_distinct null values (#14544)
Previously, the result was:
```
mysql> select array_distinct([1,1,3,3,null, null, null]);
+-----------------------------------------------------+
| array_distinct(ARRAY(1, 1, 3, 3, NULL, NULL, NULL)) |
+-----------------------------------------------------+
| [1, 3, NULL, NULL, NULL]                            |
+-----------------------------------------------------+
1 row in set (0.00 sec)
```

After this fix, the result becomes:
```
mysql> select array_distinct([1,1,3,3,null, null, null]);
+-----------------------------------------------------+
| array_distinct(ARRAY(1, 1, 3, 3, NULL, NULL, NULL)) |
+-----------------------------------------------------+
| [1, 3, NULL]                                        |
+-----------------------------------------------------+
1 row in set (0.00 sec)
```
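
The fixed behavior amounts to order-preserving deduplication in which all NULLs count as one value. A minimal Python sketch, with `None` standing in for NULL; it models only the result shown above, not the BE implementation.

```python
def array_distinct(arr: list) -> list:
    # Keep the first occurrence of each value; None (NULL) is deduplicated too.
    seen = set()
    result = []
    for element in arr:
        if element not in seen:
            seen.add(element)
            result.append(element)
    return result

print(array_distinct([1, 1, 3, 3, None, None, None]))
```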
2022-11-24 19:07:28 +08:00
59b31a03c4 [Improvement](agg function) support group_bit_and/group_bit_or/group_bit_xor functions (#14386) 2022-11-24 16:46:42 +08:00
8afe298a0f [Fix](function) fix function retention lost ARRAY's element type … (#14538) 2022-11-24 15:19:50 +08:00
70ea07bc4b [fix](nullable) Fix nullable cache to avoid function returning wrong value (#14463) 2022-11-24 09:35:08 +08:00
18b9db17b3 [fix](test) move cases in query to query_p0 (#14452) 2022-11-22 21:35:18 +08:00
16d8a1853a [Bug](array-function) array set function not handle all null value (#14318) 2022-11-22 09:07:43 +08:00
Pxl
bcd641877f [Enhancement](scan) disable build key range and filters when push down agg work (#14248)
Disable building key ranges and filters when aggregation pushdown is in effect.
2022-11-21 12:47:57 +08:00
b4aef889f2 [feature-array](array-function) add array constructor function array() (#14250)
* [feature-array](array-function) add array constructor function `array()`

```
mysql>  select array(qid, creationDate) from nested_c_2  limit 10;
+------------------------------+
| array(`qid`, `creationDate`) |
+------------------------------+
| [1000038, 20090616074056]    |
| [1000069, 20090616075005]    |
| [1000130, 20090616080918]    |
| [1000145, 20090616081545]    |
+------------------------------+
10 rows in set (0.01 sec)
```
2022-11-19 10:49:50 +08:00
f86886f8f5 [Feature](function) Support array_compact function (#14141) 2022-11-15 14:24:37 +08:00
93e5d8e660 [Vectorized](function) support bitmap_from_array function (#14259) 2022-11-15 01:55:51 +08:00