In this PR, we introduce the TOKENIZE function for the inverted index. It is used as follows:
```
SELECT TOKENIZE('I love my country', 'english');
```
It takes two arguments: the first is the text to be tokenized, and the second is the parser type, which can be **english**, **chinese**, or **unicode**.
It can also be used with an existing table, like this:
```
mysql> SELECT TOKENIZE(c,"chinese") FROM chinese_analyzer_test;
+---------------------------------------+
| tokenize(`c`, 'chinese') |
+---------------------------------------+
| ["来到", "北京", "清华大学"] |
| ["我爱你", "中国"] |
| ["人民", "得到", "更", "实惠"] |
+---------------------------------------+
```
New aggregation function: map_agg.
This function requires two arguments: a key and a value, which are used to build a map.
```
select map_agg(column1, column2) from t group by column3;
```
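For illustration, a minimal sketch of what this aggregation produces (the sample rows are hypothetical and the exact map output format may differ):
```
-- hypothetical table t(column1, column2, column3) with rows:
--   ('a', 1, 10), ('b', 2, 10), ('c', 3, 20)
select column3, map_agg(column1, column2) from t group by column3;
-- expected shape: one map per group, roughly
--   10 -> {"a":1, "b":2}
--   20 -> {"c":3}
```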
The previous logic read the JsonbValue while parsing the JSON path, so complex JSON paths caused a lot of repeated parsing work. The optimization is to separate JSON path parsing from value extraction, so the parsed path can be reused.
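For context, a sketch of the kind of query that benefits (the table and path are illustrative, and `json_extract` is used here only as an example JSON function; the description does not name the functions affected):
```
-- with path parsing separated from value extraction, a complex path
-- like this no longer has to be re-analyzed for every row
select json_extract(payload, '$.order.items[0].price') from orders_json;
```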
```
mysql> select sec_to_time(time_to_sec(cast('16:32:18' as time)));
+----------------------------------------------------+
| sec_to_time(time_to_sec(CAST('16:32:18' AS TIME))) |
+----------------------------------------------------+
| 16:32:18                                           |
+----------------------------------------------------+
1 row in set (0.53 sec)

mysql [test]> select sec_to_time(59538);
+--------------------+
| sec_to_time(59538) |
+--------------------+
| 16:32:18           |
+--------------------+
1 row in set (0.03 sec)
```
When compiling FunctionArrayEnumerateUniq::_execute_by_hash, AllocatorWithStackMemory::free(buf)
is called when the HashMapContainer is deleted. The gcc compiler assumes that size > N and that buf is not heap memory,
and reports the error `'void free(void*)' called on unallocated object 'hash_map'`.
This only fails on doris docker + gcc 11.1; there is no problem on doris docker + clang 16.0.1,
nor on ldb_toolchain gcc 11.1 and clang 16.0.1.
Previously, the agg_state type only supported the string datatype.
But some agg functions, e.g. avg, sum, min, ...,
need their serialization type to be a fixed-length object type.
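As a rough illustration of how such functions are exercised with agg_state (a sketch assuming the usual `_state`/`_merge` combinator naming; the table `d_table` and column `k1` are hypothetical):
```
-- avg's intermediate state is a fixed-length object (sum + count),
-- so it cannot be represented by a string-only agg_state
select avg_merge(avg_state(k1)) from d_table;
```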
* [fix](load) in strict mode, return an error for load and insert if datatype conversion fails
Revert "[fix](MySQL) the way Doris handles boolean type is consistent with MySQL (#19416)"
This reverts commit 68eb420cabe5b26b09d6d4a2724ae12699bdee87.
Since it changed other behaviours, e.g. in strict mode `insert into t_int values ("a")`
would insert 0 into the table, but it should return an error instead.
* fix be ut
* fix regression tests
This commit supports a function that returns a field column from a named struct column.
Since the function can return any type, this commit also supports ANY_STRUCT_TYPE
and ANY_ELEMENT_TYPE.
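For illustration, a minimal sketch of how such a function might be called (the function name `struct_element` and the use of `named_struct` to build a struct literal are assumptions, not confirmed by this description):
```
-- extract the field named 'b' from a struct value
select struct_element(named_struct('a', 1, 'b', 'doris'), 'b');
-- expected result: 'doris'
```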
Add a fast path for col like '%%', col like '%', and col regexp '\\.*'.
(1) like: about 34% speed up in a count() test.
Supports col like '%%', col like '%', col not like '%%', col not like '%'.
(2) regexp: about 37% speed up in a count() test.
Supports col regexp '\\.', col not regexp '\\.'
Q1: select count() From hits where url like '%';
Q2: select count() From hits where url regexp '\\.*';
Support json_array with Nereids booleans.
Currently:
```
set enable_nereids_planner=true;
mysql> SELECT json_array(1, "abc", NULL, TRUE, '10:00:00');
+----------------------------------------------+
| json_array(1, 'abc', NULL, TRUE, '10:00:00') |
+----------------------------------------------+
| [1,"abc",null,false,"10:00:00"] |
+----------------------------------------------+
1 row in set (0.02 sec)
```
A Nereids boolean is rendered as "true"/"false" rather than '0'/'1', so we always get false.