Commit Graph

489 Commits

Author SHA1 Message Date
0f439bb1ca [vectorized](udf) java udf support map type (#22059) 2023-07-25 11:56:20 +08:00
b41fcbb783 [feature](agg) add the aggregation function 'mag_agg' (#22043)
New aggregation function: map_agg.

This function requires two arguments: a key and a value, which are used to build a map.

select map_agg(column1, column2) from t group by column3;
2023-07-25 11:21:03 +08:00
a0463ea047 [round](decimalv2) round decimalv2 to precision value (#22138)
* [round](decimalv2) round decimalv2 to precision value

* update

* update`
2023-07-25 03:29:48 +08:00
d2531db1cf [fix](inverted index) fix regression case test_index_change_7 occasional failure (#22066) 2023-07-24 15:39:08 +08:00
ddd7e9871d [improvement](Jsonb) optimization Jsonb path parse (#21495)
The previous logic was to read jsonbvalue while parsing the json path. For complex json paths, there will be a lot of repeated parsing work. The optimization idea is to separate the analysis and value of jsonpath
2023-07-23 18:59:12 +08:00
b35cfc5d5e [opt](join) Opt the performance of join probe (#21845) 2023-07-19 01:21:22 +08:00
845cf94a7a [feature](function) support time_to_sec (#21722)
mysql >select sec_to_time(time_to_sec(cast('16:32:18' as time)));
+----------------------------------------------------+
| sec_to_time(time_to_sec(CAST('16:32:18' AS TIME))) |
+----------------------------------------------------+
| 16:32:18                                           |
+----------------------------------------------------+
1 row in set (0.53 sec)

mysql [test]>select sec_to_time(59538);
+--------------------+
| sec_to_time(59538) |
+--------------------+
| 16:32:18           |
+--------------------+
1 row in set (0.03 sec)
2023-07-19 01:09:48 +08:00
03b575842d [Feature](table function) support explode_json_array_json (#21795) 2023-07-17 11:40:02 +08:00
be55cb8dfc [Improve](jsonb_extract) support jsonb_extract multi parse path (#21555)
support jsonb_extract multi parse path
2023-07-12 21:37:36 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
8973610543 [feature](datetime) "timediff" supports calculating microseconds (#21371) 2023-07-10 19:21:32 +08:00
36524f2b72 [improvement](functions) avoid copying of block in create_block_with_nested_columns (#21526)
avoid copying of block in create_block_with_nested_columns
2023-07-10 17:21:23 +08:00
bb985cd9a1 [refactor](udf) refactor java-udf execute method by using for loop (#21388) 2023-07-07 11:43:11 +08:00
181dad4181 [fix](executor) make elt / repeat smooth upgrade. (#21493)
BE : 2.0,FE : 1.2

before

mysql [(none)]>select elt(1, 'aaa', 'bbb');
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INTERNAL_ERROR]Function elt get failed, expr is VectorizedFnCall[elt](arguments=,return=String) and return type is String.

mysql [test]> INSERT INTO tbb VALUES (1, repeat("test1111", 8192))(2, repeat("test1111", 131072));
mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | d41d8cd98f00b204e9800998ecf8427e |            0 |
| 2    | d41d8cd98f00b204e9800998ecf8427e |            0 |
+------+----------------------------------+--------------+

now

mysql [test]>select elt(1, 'aaa', 'bbb');
+----------------------+
| elt(1, 'aaa', 'bbb') |
+----------------------+
| aaa                  |
+----------------------+

mysql [test]>select k1, md5(v1), length(v1) from tbb;
+------+----------------------------------+--------------+
| k1   | md5(`v1`)                        | length(`v1`) |
+------+----------------------------------+--------------+
| 1    | 1f44fb91f47cab16f711973af06294a0 |        65536 |
| 2    | 3c514d3b89e26e2f983b7bd4cbb82055 |      1048576 |
+------+----------------------------------+--------------+
2023-07-06 19:15:06 +08:00
25b5bab22d [fix](memory) Fix hash table buf initialize null pointer (#21315)
When compiling FunctionArrayEnumerateUniq::_execute_by_hash, AllocatorWithStackMemory::free(buf)
will be called when delete HashMapContainer. the gcc compiler will think that size > N and buf is not heap memory,
and report an error ' void free(void*)' called on unallocated object 'hash_map'

This only fails on doris docker + gcc 11.1, no problem on doris docker + clang 16.0.1,
no problem on ldb_toolchanin gcc 11.1 and clang 16.0.1.
2023-06-30 14:50:53 +08:00
d76fa427a3 [improve](jsonb)Invalid json path prompts an error instead of null (#19646)
1. Invalid json path prompts an error instead of null:
before:
```sql
mysql> SELECT jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[a]');
+-------------------------------------------------------------+
| jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[a]') |
+-------------------------------------------------------------+
| NULL                                                        |
+-------------------------------------------------------------+
1 row in set (0.01 sec)
```
now
```sql
mysql> SELECT jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[a]');
ERROR 1105 (HY000): errCode = 2, detailMessage = (127.0.0.1)[INVALID_ARGUMENT]Json path error: Invalid Json Path for value: $[a]
```
2. fix some problem: https://github.com/apache/doris/pull/19185
   a. support negative numbers
```sql
mysql> SELECT jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[-2]');
+--------------------------------------------------------------+
| jsonb_extract('[{"k1":"v41","k2":400},1,"a",3.14]', '$[-2]') |
+--------------------------------------------------------------+
| "a"                                                          |
+--------------------------------------------------------------+
1 row in set (0.02 sec)
```
  b. Avoid using unnecessary memory
3. Supplementary regression test
2023-06-30 14:29:21 +08:00
a4fdf7324a [Bug](javaudf) fix BE crash if javaudf is push down (#21139) 2023-06-28 15:01:24 +08:00
e0b20f0437 [feature](function) add ip function ipv4numtostring (alias inet_ntoa) (#20936) 2023-06-27 10:17:40 +08:00
0cf9de8cef [fix](decimalv3) fix result error when cast a round decimalv3 to double (#20678) 2023-06-21 00:02:48 +08:00
55a6649da9 [fix](testcase) fix test case failure of insert null value into not null column (#20963) 2023-06-20 20:46:07 +08:00
824bc02603 [Function] Support date function: microsecond() (#20044) 2023-06-20 10:32:54 +08:00
97135a1cbb [Feature] (json)add json_contains function (#20824) 2023-06-16 15:10:12 +08:00
420603317b [improve](match) Improve performance for match query without inverted index (#20815) 2023-06-16 10:35:38 +08:00
Pxl
01e53f4e67 [Bug](materialized-view) fix problems about create mv on ssb_flat q4.1 failed (#20658)
fix problems about create mv on ssb_flat q4.1 failed
2023-06-15 14:38:21 +08:00
Pxl
a0d4f11667 [Bug](function) catch error state in function cast to avoid core dump (#20751)
catch error state in function cast to avoid core dump
2023-06-14 17:34:34 +08:00
1433544c56 [fix](case expr) fix coredump of case for null value 3 #20711 2023-06-12 20:58:01 +08:00
99c0592157 [Feature](array-function) Support array_pushback function #17417 (#19988)
Implement array_pushback.

mysql> select array_pushback([1, 2], 3);
+--------------------------------+
| array_pushback(ARRAY(1, 2), 3) |
+--------------------------------+
| [1, 2, 3]                      |
+--------------------------------+
1 row in set (0.01 sec)
2023-06-12 16:51:12 +08:00
4c340f2851 [Feature] (Multi-Catalog) support query hll column in doris jdbc table - part 1 (#19413)
Issue Number: close #17895
2023-06-12 11:16:19 +08:00
Pxl
ab7ac31d89 [Chore](case) fix failed on test_big_pad when enable pipeline engine #20644 2023-06-12 09:15:55 +08:00
a347063390 [fix](case expr) fix coredump of case for null value 2 (#20635)
fix coredump of case for null value 2
2023-06-11 23:08:53 +08:00
dd71e101d3 [fix](case expr) fix coredump of case for null value (#20564)
be coredump when when expr is null:
2023-06-08 20:05:23 +08:00
Pxl
36216f0925 [Bug](Agg-State) fix coredump when state combinator input const column (#20510)
fix coredump when state combinator input const column
2023-06-07 11:28:55 +08:00
35a2be1074 [refactor](agg_state) refactor agg_state type to support fixed length object type (#20370)
before the agg_state type only support with datatype string,
But with some agg functions, eg: avg,sum,mix...
those functions need serialize type is fixed length object type
2023-06-07 10:05:00 +08:00
49f8f20fb1 [fix](regex) String with Chinese characters matching failed (#20493) 2023-06-07 07:27:47 +08:00
1b94b6368f [fix](load) in strict mode, return error for insert if datatype convert fails (#20378)
* [fix](load) in strict mode, return error for load and insert if datatype convert fails

Revert "[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416)"

This reverts commit 68eb420cabe5b26b09d6d4a2724ae12699bdee87.

Since it changed other behaviours, e.g. in strict mode insert into t_int values ("a"),
it will result 0 is inserted into table, but it should return error instead.

* fix be ut

* fix regression tests
2023-06-06 12:04:03 +08:00
d02737a293 [feature](struct-type) support struct_element function (#19045)
This commit support a function allows return a field column in named struct column.
Since the function can return any type, this commit also supports ANY_STRUCT_TYPE
and ANY_ELEMENT_TYPE.
2023-06-06 10:44:08 +08:00
d03bb4ba7b [Optimize](function) Optimize locate function by compare across strings (#20290)
Optimize locate function by compare across strings. about 90% speed up test by sum()
2023-06-05 12:43:14 +08:00
59a0f80233 [Improve](array-function)Improve array function intersect (#20085)
now we just support array function with 2 arrays , but intersect operator can support more than 2 arrays
2023-06-05 10:38:48 +08:00
Pxl
8e39f0cf6b [Enchancement](Agg State) storage function name and result is nullable in agg state type (#20298)
storage function name and result is nullable in agg state type
2023-06-04 22:44:48 +08:00
Pxl
90d710e83d [Enchancement](function) optimize for padding function && add string length check on string op (#20363) 2023-06-02 21:24:41 +08:00
b62c5a70c7 [fix](match query) fix array column match query failed without inverted index (#20344) 2023-06-02 21:10:12 +08:00
78c37b5244 [Optimize](Function) Add fast path for col like '%%' or col like '%' or regexp '\\.*' (#20143)
Add fast path for col like '%%' or col like '%' or regexp '\\.*'
(1) like about 34% speed up when use count() test
support col like '%%' , col like '%', col not like '%%' , col not like '%'

(2) regexp about 37% speed up when use count() test
support col regexp '\\.', col not regexp '\\.'

Q1: select count() From hits where url like '%';
Q2: select count() From hits where url regexp '\\.*';
2023-06-02 16:26:56 +08:00
06e7c14320 [Improve](json-array) Support json array with nereids bool (#20248)
Support json array with nereids bool
now : 

```
set enable_nereids_planner=true;
mysql> SELECT json_array(1, "abc", NULL, TRUE, '10:00:00');
+----------------------------------------------+
| json_array(1, 'abc', NULL, TRUE, '10:00:00') |
+----------------------------------------------+
| [1,"abc",null,false,"10:00:00"]              |
+----------------------------------------------+
1 row in set (0.02 sec)
```
 
nereids boolean is "true"/"false" is not '0' /'1' , so we always get false
2023-06-02 14:47:24 +08:00
d68f3f3b3d [Feature](array-functions)improve array functions for array_last_index (#20294)
Now we just support array_first_index for lambda input , but no array_last_index
2023-06-02 13:54:03 +08:00
6adb3fdf11 [fix](match_phrase) Fix the inconsistent query result for 'match_phrase' after creating index without support_phrase property (#20258)
if create inverted index without support_phrase property, remaining the match_phrase condition to filter by match function.
2023-05-31 18:09:50 +08:00
c03a19ea23 [improvement](bitmap) Using set to store a small number of elements to improve performance (#19973)
Test on SSB 100g:

select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey;
exec time: 4.388s

create materialized view:

create materialized view customer_uv as select lo_suppkey, bitmap_union(to_bitmap(lo_linenumber)) from lineorder group by lo_suppkey;
select lo_suppkey, count(distinct lo_linenumber) from lineorder group by lo_suppkey;
exec time: 12.908s

test with the patch, exec time: 5.790s
2023-05-31 16:13:42 +08:00
de08c4a57b [enhance](match) Support match query without inverted index (#19936) 2023-05-30 15:02:57 +08:00
bb12a1cb49 [Enhance](array function) add support for DecimalV3 for array_enumerate_uniq() (#17724) 2023-05-30 13:09:19 +08:00
Pxl
bbb3af6ce6 [Feature](agg_state) support agg_state combinators (#19969)
support agg_state combinators state/merge/union
2023-05-29 13:07:29 +08:00
a86134cb39 [fix](executor) Fixed an error with cast as time. #20144
before

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 00:00:00                      |
+-------------------------------+
after

mysql [(none)]>select cast("10:10:10" as time);
+-------------------------------+
| CAST('10:10:10' AS TIMEV2(0)) |
+-------------------------------+
| 10:10:10                      |
+-------------------------------+
In the past, we supported this syntax.

mysql [(none)]>select cast("2023:05:01 13:14:15" as time);
+------------------------------------------+
| CAST('2023:05:01 13:14:15' AS TIMEV2(0)) |
+------------------------------------------+
| 13:14:15                                 |
+------------------------------------------+
However, "10:10:10" is also a valid datetime.

mysql [(none)]>select cast("10:10:10" as datetime);
+-----------------------------------+
| CAST('10:10:10' AS DATETIMEV2(0)) |
+-----------------------------------+
| 2010-10-10 00:00:00               |
+-----------------------------------+
So here, the order of parsing has been adjusted.
2023-05-29 12:17:21 +08:00