Commit Graph

34 Commits

Author SHA1 Message Date
fd2d9ae63e [improve](test) fix regression test case report error when run times (#30531) 2024-02-01 19:01:08 +08:00
3e10e5af39 [Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946)
This pr makes three changes to the display of complex types:
1. NULL value in complex types refers to being displayed as `null`, not `NULL`
2. struct type is displayed as "column_name": column_value
3. Time types such as `datetime` and `date`, are displayed with double quotes in complex types. like
    `{1, "2023-10-26 12:12:12"}`

This pr also do a code refactor:
1. nesting_level is set to a member variable of the `DataTypeSerDe`, rather than a parameter in methods.

What's more, this pr fix a bug that fileSize is not correct, introduced by this pr: #25854
2023-11-01 23:48:55 +08:00
9649e09aaa [feature](function) support bitmap type in min/max_by agg function (#25430)
support bitmap type in min/max_by agg function
2023-10-16 11:05:32 +08:00
4608dcb2d9 [fix](agg) fix coredump caused by push down count aggregation (#22699)
fix coredump caused by push down count aggregation
2023-08-09 10:21:20 +08:00
1a8a1e5b16 [Feature](count_by_enum) support count_by_enum function (#22071)
count_by_enum(expr1, expr2, ... , exprN);

Treats the data in a column as an enumeration and counts the number of values in each enumeration. Returns the number of enumerated values for each column, and the number of non-null values versus the number of null values.
2023-08-06 16:05:14 +08:00
Pxl
a0d4f11667 [Bug](function) catch error state in function cast to avoid core dump (#20751)
catch error state in function cast to avoid core dump
2023-06-14 17:34:34 +08:00
feb21fc9e9 [fix](group_concat) use default seperator ',' instead of ', ' for group_concat, to be consistant with mysql (#20741) 2023-06-13 17:20:29 +08:00
55ccddb62c [Conf](decimalv3) enable decimalv3 by default 2023-05-29 15:38:31 +08:00
ee34b6de2d [Refact] (serde) refact mysql serde with data type (#19543)
refact mysql output (de)serialize with data type serde , avoid accoriding switch case Primitive type writed in mysqlWriter
2023-05-26 14:11:17 +08:00
df0aaece1d [Function](test) add some test cases for agg functions (#18610) 2023-04-13 10:23:41 +08:00
66f3ef568e (functions) optimize const_column to full convert 2023-03-15 10:57:03 +08:00
5b39fa9843 [Feature](vec)(quantile_state): support quantile state in vectorized engine (#16562)
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------

Co-authored-by: spaces-x <weixiang06@meituan.com>
2023-03-14 10:54:04 +08:00
782001c75b [fix](planner) project should be done inside subquery (#17630)
WITH t0 AS(
SELECT report.date1 AS date2 FROM(
SELECT DATE_FORMAT(date, '%Y%m%d') AS date1 FROM cir_1756_t1
) report GROUP BY report.date1
),
t3 AS(
SELECT date_format(date, '%Y%m%d') AS date3
FROM cir_1756_t2
)
SELECT row_number() OVER(ORDER BY date2)
FROM(
SELECT t0.date2 FROM t0 LEFT JOIN t3 ON t0.date2 = t3.date3
) tx;

The DATE_FORMAT(date, '%Y%m%d') was calculated in GROUP BY node, which is wrong. This expr should be calculated inside the subquery.
2023-03-13 11:10:27 +08:00
55c42da511 [Feature](array) Support array<decimalv3> data type (#16640) 2023-03-13 10:48:13 +08:00
bd5ed2b0c2 [enhancement](histogram) optimize the histogram bucketing strategy, etc (#17264)
* optimize the histogram bucketing strategy, etc

* fix p0 regression of histogram
2023-03-08 20:12:05 +08:00
c0360f80bb [enhancement](aggregate-function) enhance aggregate funtion collect and add group_array aliases (#15339)
Enhance aggregate function `collect_set` and `collect_list` to support optional `max_size` param,
which enables to limit the number of elements in result array.
2023-02-27 14:22:30 +08:00
7956800df7 [refactor](Nereids) let type coercion same with legacy planner (#16844)
- change for Nereids
1. add a variable length parameter to the ctor of Count for a good error reporting of Count(a, b)
2. refactor StringRegexPredicate, let it inherit from ScalarFunction
3. remove useless class TypeCollection
4. use catalog.Type.Collection to check expression arguments type
5. change type coercion for TimestampArithmetic, divide, integral divide, comparison predicate, case when and in predicate. Let them same as legacy planner.

- change for legacy planner
1. change the common type of floating and Decimal from Decimal to Double
2023-02-22 17:29:37 +08:00
ca7eb94f23 [improvement](agg-function) Increase the limit maximum number of agg function parameters (#15924) 2023-01-31 21:03:50 +08:00
76ad599fd7 [enhancement](histogram) optimise aggregate function histogram (#15317)
This pr mainly to optimize the histogram(👉🏻 https://github.com/apache/doris/pull/14910)  aggregation function. Including the following:
1. Support input parameters `sample_rate` and `max_bucket_num`
2. Add UT and regression test
3. Add documentation
4. Optimize function implementation logic
 
Parameter description:
- `sample_rate`:Optional. The proportion of sample data used to generate the histogram. The default is 0.2.
- `max_bucket_num`:Optional. Limit the number of histogram buckets. The default value is 128.

---

Example:

```
MySQL [test]> SELECT histogram(c_float) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_float`)                                                                                                                |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.2,"max_bucket_num":128,"bucket_num":3,"buckets":[{"lower":"0.1","upper":"0.1","count":1,"pre_sum":0,"ndv":1},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+

MySQL [test]> SELECT histogram(c_string, 0.5, 2) FROM histogram_test;
+-------------------------------------------------------------------------------------------------------------------------------------+
| histogram(`c_string`)                                                                                                               |
+-------------------------------------------------------------------------------------------------------------------------------------+
| {"sample_rate":0.5,"max_bucket_num":2,"bucket_num":2,"buckets":[{"lower":"str1","upper":"str7","count":4,"pre_sum":0,"ndv":3},...]} |
+-------------------------------------------------------------------------------------------------------------------------------------+
```

Query result description:

```
{
    "sample_rate": 0.2, 
    "max_bucket_num": 128, 
    "bucket_num": 3, 
    "buckets": [
        {
            "lower": "0.1", 
            "upper": "0.2", 
            "count": 2, 
            "pre_sum": 0, 
            "ndv": 2
        }, 
        {
            "lower": "0.8", 
            "upper": "0.9", 
            "count": 2, 
            "pre_sum": 2, 
            "ndv": 2
        }, 
        {
            "lower": "1.0", 
            "upper": "1.0", 
            "count": 2, 
            "pre_sum": 4, 
            "ndv": 1
        }
    ]
}
```

Field description:
- sample_rate:Rate of sampling
- max_bucket_num:Limit the maximum number of buckets
- bucket_num:The actual number of buckets
- buckets:All buckets
    - lower:Upper bound of the bucket
    - upper:Lower bound of the bucket
    - count:The number of elements contained in the bucket
    - pre_sum:The total number of elements in the front bucket
    - ndv:The number of different values in the bucket

> Total number of histogram elements = number of elements in the last bucket(count) + total number of elements in the previous bucket(pre_sum).
2023-01-07 00:50:32 +08:00
51b14c06d3 [enhancement](nereids) support approx_count_distinct function (#15406) 2022-12-27 22:25:21 +08:00
4dbe30d37b [regression](vectorized) delete vectorized config in regression tests (#15126) 2022-12-16 17:08:29 +08:00
529bdfb153 [Fix](function) Fix retention function return wrong value type (#14552)
MySQL [db]> SELECT SUM(a.r[1]) as active_user_num, SUM(a.r[2]) as active_user_num_1day, SUM(a.r[3]) as active_user_num_3day, SUM(a.r[4]) as active_user_num_7day FROM ( SELECT user_id, retention( day = '2022-11-01', day = '2022-11-02', day = '2022-11-04', day = '2022-11-07') as r FROM login_event WHERE (day >= '2022-11-01') AND (day <= '2022-11-21') GROUP BY user_id ) a;
ERROR 1105 (HY000): errCode = 2, detailMessage = sum requires a numeric parameter: sum(%element_extract%(a.r, 1))
2022-11-28 15:56:18 +08:00
38b4cbe253 [Bug](regression) regression fail random in fuzzy mode (#14614) 2022-11-27 09:23:36 +08:00
59b31a03c4 [Improvement](agg function) support group_bit_and/group_bit_or/group_bit_xor functions (#14386) 2022-11-24 16:46:42 +08:00
8afe298a0f [Fix](function) fix function retention lost ARRAY's element type … (#14538) 2022-11-24 15:19:50 +08:00
70ea07bc4b [fix](nullable) Fix nullable cache to avoid function returning wrong value (#14463) 2022-11-24 09:35:08 +08:00
Pxl
bcd641877f [Enhancement](scan) disable build key range and filters when push down agg work (#14248)
disable build key range and filters when push down agg work
2022-11-21 12:47:57 +08:00
b6ba654f5b [Feature](Sequence) Support sequence_match and sequence_count functions (#13785) 2022-11-11 13:38:45 +08:00
374303186c [Vectorized](function) support topn_array function (#13869) 2022-11-02 19:49:23 +08:00
bed759b3f5 [Fix](array-type) support CTAS for ARRAY column from collect_list and collect_set (#13627)
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
2022-10-26 19:42:15 +08:00
207f4e559e [feature](agg) support group_bitmap_xor agg function. (#13287)
support `group_bitmap_xor` agg function
2022-10-17 18:40:06 +08:00
045bccdbea [Feature](Retention) support retention function (#13056) 2022-10-17 11:00:47 +08:00
d63a80eaba [fix](bitmap_intersect) fix bitmap_intersect result error (#13298) 2022-10-12 19:12:11 +08:00
ff1971f916 [improvement](test) add dryRun option and group all cases into either p0 or p1 (#11576)
1. add dryRun option to list tests
2. group all cases into p0 p1 p2
2022-08-17 22:45:53 +08:00