1b108604d5
branch-2.1: [fix](function) fix error result in split_by_string with utf8 chars #40710 ( #50689 )
...
Cherry-picked from #40710
Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com >
2025-05-08 19:15:52 +08:00
995f1e5dc0
branch-2.1:[fix](Nereids) fix regression framework compare issue and fix code point count ( #49575 ) ( #50667 )
...
backport: https://github.com/apache/doris/pull/49575
Co-authored-by: LiBinfeng <libinfeng@selectdb.com >
2025-05-08 16:53:02 +08:00
a40a4bbc67
branch-2.1: [fix](Nereids) fold constant for string function process emoji character by mistake #49087 ( #49344 )
...
pick: #49087
Related PR: #40441
Problem Summary:
Some string functions calculated the length of emoji characters incorrectly
when constant folding was done in FE. For example:
select STRLEFT('😊😉👍', 2);
should return 😊😉, but FE returned only 😊 when folding constants.
Fixed functions:
- left
- strleft
- right
- strright
- locate
- character_length
- split_by_string
- overlay
- replace_empty
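A minimal Python sketch of the length mismatch described above. It assumes the FE miscount came from measuring strings in UTF-16 code units (as Java's String.length() does) instead of code points; the function names are hypothetical and are not taken from the Doris code base.

```python
def utf16_units(s: str) -> int:
    """Length in UTF-16 code units, which is what Java's String.length() reports."""
    return len(s.encode("utf-16-le")) // 2


def left_by_utf16_units(s: str, n: int) -> str:
    """Assumed buggy variant: keep the first n UTF-16 code units."""
    raw = s.encode("utf-16-le")[: 2 * n]
    # A cut in the middle of a surrogate pair leaves a dangling half; drop it.
    return raw.decode("utf-16-le", errors="ignore")


def left_by_code_points(s: str, n: int) -> str:
    """Intended behaviour: keep the first n Unicode code points."""
    return s[: max(n, 0)]


emoji = "😊😉👍"
print(utf16_units(emoji), len(emoji))   # code units vs. code points
print(left_by_utf16_units(emoji, 2))    # one emoji: each needs two code units
print(left_by_code_points(emoji, 2))    # two emoji
```

Each of these emoji lies outside the Basic Multilingual Plane, so it occupies two UTF-16 code units but counts as one character for STRLEFT.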
2025-03-22 07:44:55 +08:00
aa47a35384
[fix](mem) heap-buffer-overflow for function convert_to ( #46405 ) ( #46502 )
...
pick #46405 to branch-2.1
2025-01-07 13:46:32 +08:00
e7520ae6cf
branch-2.1: [fix](hyperscan) Fix hyper scan fall back to re2 #44547 ( #44653 )
...
Cherry-picked from #44547
Co-authored-by: zhiqiang <hezhiqiang@selectdb.com >
2024-11-28 16:00:43 +08:00
355170a921
[cherry-pick](branch2.1) impl scalar functions trim_in, ltrim_in and rtrim_in ( #42641 )
...
pick https://github.com/apache/doris/pull/41681
2024-11-01 09:55:50 +08:00
6c3d42e09a
[cherry-pick](branch-21) cherry-pick pr about ( #42488 ) ( #42099 ) ( #42055 ) ( #42916 )
...
2024-10-31 14:14:19 +08:00
f112af0fd2
[pick](branch-2.1) pick #41555 #41592 #38204 ( #41781 )
...
pick #41555 #41592 #38204
2024-10-14 14:05:08 +08:00
f6917acd6a
[cherry-pick](branch2.1) Impl translate and url encode 2.1 ( #41051 )
...
## Proposed changes
pick https://github.com/apache/doris/pull/40567
Some of the constant-folding code has to wait until this PR is picked:
https://github.com/apache/doris/pull/40441
2024-09-23 14:26:27 +08:00
9877a08834
[feature](function) support ngram_search function #38226 ( #40893 )
...
https://github.com/apache/doris/pull/38226
```sql
mysql [test]>select ngram_search('123456789' , '12345' , 3);
+---------------------------------------+
| ngram_search('123456789', '12345', 3) |
+---------------------------------------+
| 0.6 |
+---------------------------------------+
1 row in set (0.01 sec)
mysql [test]>select ngram_search("abababab","babababa",2);
+-----------------------------------------+
| ngram_search('abababab', 'babababa', 2) |
+-----------------------------------------+
| 1 |
+-----------------------------------------+
1 row in set (0.01 sec)
```
doc https://github.com/apache/doris-website/pull/899
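The two results above are consistent with a Dice coefficient over the sets of distinct n-grams, 2 * |A ∩ B| / (|A| + |B|). The Python sketch below works under that assumption; it is an illustration with hypothetical names, not the actual Doris implementation, which may hash or count n-grams differently.

```python
def ngrams(s: str, n: int) -> set:
    """Set of distinct substrings of length n."""
    return {s[i:i + n] for i in range(len(s) - n + 1)}


def ngram_similarity(text: str, pattern: str, n: int) -> float:
    """Dice coefficient of the two n-gram sets (assumed formula)."""
    a, b = ngrams(text, n), ngrams(pattern, n)
    if not a or not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


# '123456789' has 7 distinct 3-grams, '12345' has 3, and all 3 are shared.
print(ngram_similarity("123456789", "12345", 3))
# Both strings reduce to the same two 2-grams: 'ab' and 'ba'.
print(ngram_similarity("abababab", "babababa", 2))
```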
2024-09-21 20:34:44 +08:00
163193b1d4
[branch-2.1](function) fix random_bytes return same data for multi rows ( #39891 ) ( #40137 )
...
pick https://github.com/apache/doris/pull/39891
before:
```sql
mysql [optest]>SELECT random_bytes(10) a FROM numbers("number" = "10");
+------------------------+
| a |
+------------------------+
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
+------------------------+
```
now:
```sql
mysql [optest]>SELECT random_bytes(10) a FROM numbers("number" = "10");
+------------------------+
| a |
+------------------------+
| 0xd82cf60825b29ef2a0fd |
| 0x6f8c808415bdbaa6d257 |
| 0x7c26b5214297a151c25c |
| 0x43f02c77293063900437 |
| 0x5e5727569dec5e24f96b |
| 0x434f20bf74d7759640b7 |
| 0x087ed96b739750c733a6 |
| 0xdf05f6d7ede4972eb846 |
| 0xcefab471912264b5c54f |
| 0x1bddc019409d1926aa10 |
+------------------------+
```
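The difference between the two outputs can be sketched in Python as generating the value once per batch versus once per row. Both function names are hypothetical, and the sketch only illustrates the symptom; it does not claim to mirror how the BE evaluates the function.

```python
import secrets


def random_bytes_shared(length: int, rows: int) -> list:
    """Symptom before the fix: one value is generated and reused for every row."""
    value = secrets.token_bytes(length)
    return ["0x" + value.hex()] * rows


def random_bytes_per_row(length: int, rows: int) -> list:
    """Behaviour after the fix: a fresh value is generated for each row."""
    return ["0x" + secrets.token_bytes(length).hex() for _ in range(rows)]


for row in random_bytes_per_row(10, 3):
    print(row)
```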
## Proposed changes
Issue Number: close #xxx
<!--Describe your changes.-->
2024-08-30 10:43:42 +08:00
bb687bd69c
[cherry-pick](branch-2.1) add function regexp_extract_or_null ( #39561 )
...
# Proposed changes
pick https://github.com/apache/doris/pull/38296
2024-08-21 09:14:58 +08:00
eb7eaee386
[fix](function) money format ( #34680 )
2024-05-18 18:35:29 +08:00
ca9eb56233
[Fix](functions) fix strcmp return value #34565
2024-05-12 09:49:38 +08:00
6cf7468073
[enhancement](function) change some function nullable mode ( #30991 )
...
change some function nullable mode
2024-02-18 14:45:25 +08:00
ca5a314765
[fix](function) make STRLEFT and STRRIGHT and SUBSTR function DEPEND_ON_ARGUMENT ( #28352 )
...
make STRLEFT and STRRIGHT function DEPEND_ON_ARGUMENT
2024-01-25 13:23:59 +08:00
e894911cda
[function](char) change char function behaviour same with mysql ( #30034 )
...
select char(0) = '\0';
should return true;
2024-01-18 10:04:21 +08:00
1a46cf6fb5
[fix](split_by_string) Fix split by string core on column string ( #28030 )
2023-12-07 16:36:13 +08:00
007506ce42
[fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode ( #27842 )
2023-12-01 17:32:46 +08:00
49b73483fd
[fix](field) fix coredump of field function when the first argument is const ( #25859 )
2023-10-25 14:14:32 +08:00
b8452812df
[bug](function) fix regexp_extract_all can't handle empty str ( #25717 )
2023-10-23 15:47:12 +08:00
0128dd42d9
[fix](regexp_extract_all) fix be OOM when querying with regexp_extrac… ( #23284 )
2023-08-29 10:34:12 +08:00
affe36d32e
[test](find_in_set) add find_in_set function test case ( #20718 )
2023-06-14 09:43:48 +08:00
49f8f20fb1
[fix](regex) String with Chinese characters matching failed ( #20493 )
2023-06-07 07:27:47 +08:00
d03bb4ba7b
[Optimize](function) Optimize locate function by compare across strings ( #20290 )
...
Optimize the locate function by comparing across strings; about a 90% speedup when tested with sum().
2023-06-05 12:43:14 +08:00
ee34b6de2d
[Refact] (serde) refact mysql serde with data type ( #19543 )
...
Refactor MySQL output (de)serialization to use the data type serde, avoiding the switch-case on primitive type written in mysqlWriter.
2023-05-26 14:11:17 +08:00
88ca4f3e6b
[feature](like) make like regexp used as a sql function ( #19755 )
2023-05-18 10:03:12 +08:00
b75f4c97f3
[function](string) support char function ( #18878 )
...
* [function](string) support char function
* fix
2023-04-22 08:36:48 +08:00
0b074ade02
[fix](const column) fix coredump caused by const column for some functions ( #18737 )
2023-04-18 13:57:55 +08:00
9877143210
[fix](like) fix wrong result of like pattern with backslash ( #18039 )
...
The query select * from person where address like '%\\\\%'; returns an empty result, but MySQL returns one row.
CREATE TABLE `person` (
`id` int(11) NULL,
`name` text NULL,
`age` int(11) NULL,
`class` int(11) NULL,
`address` text NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
insert into person values (10001,'test1',30,2,'test\\\\,xxx');
Adding logs:
select * from person where address like '%\\\\%';
I0323 10:26:15.907760 2387043 like.cpp:558] arg str: %\\%, size: 4, pattern LIKE_ENDS_WITH_RE: (?:%+)(((\\%)|(\\_)|([^%_]))+), size: 30
I0323 10:26:15.907789 2387043 like.cpp:562] match 0: \\%, size: 3
I0323 10:26:15.907801 2387043 like.cpp:562] match 1: \%, size: 2
I0323 10:26:15.907811 2387043 like.cpp:562] match 2: \%, size: 2
I0323 10:26:15.907821 2387043 like.cpp:562] match 3: , size: 0
I0323 10:26:15.907830 2387043 like.cpp:562] match 4: \, size: 1
I0323 10:26:15.907842 2387043 like.cpp:615] search_string : \\%
I0323 10:26:15.907855 2387043 like.cpp:619] search_string escape removed: \%
The pattern is matched against LIKE_ENDS_WITH_RE, which is wrong; the SQL should match strings that contain a backslash anywhere.
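The intended semantics can be sketched in Python by translating a LIKE pattern into a regular expression in a single left-to-right pass, so that an escaped backslash is consumed before the following % is examined. This assumes backslash is the escape character and uses hypothetical function names; it is not the logic in like.cpp.

```python
import re


def like_to_regex(pattern: str, escape: str = "\\") -> "re.Pattern[str]":
    """Translate a SQL LIKE pattern into an anchored regular expression."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == escape and i + 1 < len(pattern):
            # The escaped character is literal, whether it is %, _ or the escape itself.
            out.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            out.append(".*")
        elif ch == "_":
            out.append(".")
        else:
            out.append(re.escape(ch))
        i += 1
    return re.compile("".join(out), re.DOTALL)


def like(value: str, pattern: str) -> bool:
    return like_to_regex(pattern).fullmatch(value) is not None


# After SQL string unescaping, the pattern reaching LIKE is %\\% (4 characters),
# matching the "arg str" in the log above.
print(like("test\\\\,xxx", "%\\\\%"))  # the stored value contains backslashes
print(like("test,xxx", "%\\\\%"))      # no backslash in the value
```

Because the escape is handled before the wildcard check, the pair \\ becomes a literal backslash and the trailing % remains a wildcard, rather than being read as a backslash followed by an escaped %.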
2023-03-30 11:05:09 +08:00
f84481886b
[feature](string_functions) The 'split_part' function supports non-constant parameters ( #18029 )
2023-03-25 12:03:11 +08:00
7b93c17364
[Bug][Fix] regexp function core dump DCHECK failed and error result ( #17953 )
...
CREATE TABLE `test` (
`name` varchar(64) NULL,
`age` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`name`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`name`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
insert into `test` values ("lemon",1),("tom",2);
select a.name regexp concat('^', a.name) from test a;
2023-03-21 08:56:19 +08:00
3e40467ce6
[Bug](vec) Fix chinese pinyin order by ( #17152 )
...
Bug: some Chinese characters are not sorted by pinyin under GBK encoding.
CREATE TABLE `test_convert` (
`a` varchar(100) NULL
) ENGINE=OLAP
DUPLICATE KEY(`a`)
DISTRIBUTED BY HASH(`a`) BUCKETS 3
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a |
+------+
| a |
| c |
| 丝 |
| b |
| 多 |
| 睿 |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);
+------+
| a |
+------+
| a |
| b |
| c |
| 多 |
| 丝 |
| 睿 |
+------+
6 rows in set (0.01 sec)
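The expected ordering can be reproduced in Python by sorting on the GBK-encoded bytes of each value, which is assumed here to be what ordering by convert(a using gbk) amounts to. The helper name is hypothetical.

```python
def gbk_sort_key(s: str) -> bytes:
    """Sort key: the GBK byte sequence of the string."""
    return s.encode("gbk")


words = ["b", "a", "c", "睿", "多", "丝"]

by_gbk = sorted(words, key=gbk_sort_key)
by_code_point = sorted(words)

print(by_gbk)         # GBK byte order, as in the second query above
print(by_code_point)  # Unicode code point order, for comparison
```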
2023-02-28 14:29:56 +08:00
883f575cfe
[fix](string function) fix wrong usage of iconv_open ( #17048 )
...
* [fix](string function) fix wrong usage of iconv_open
Also add test case for function convert
* fix test case
2023-02-24 09:13:10 +08:00
09870098af
[fix](func) fix core dump when the pattern of the regexp_extract_all function does not contain subpatterns ( #16408 )
2023-02-05 01:16:54 +08:00
ca7b2e27a8
[regression-test](function) add regression test for money_format with truncate ( #16052 )
2023-02-04 23:10:01 +08:00
95d7c2de26
[Refactor](function) Rewrite the function elt ( #16287 )
2023-02-01 11:17:06 +08:00
578a855b3e
[Bug](topn-opt) filter condition for analytic info for two phase read opt ( #16173 )
...
two phase read optimization should not be enabled when query has analytic info
2023-01-29 12:06:18 +08:00
05f6e4c48a
[fix](predicate) fix be core dump caused by pushing down the double column predicate ( #15693 )
2023-01-09 19:31:04 +08:00
21c2e485ae
[improvment](function) add new function substring_index ( #15024 )
2022-12-15 09:54:34 +08:00
b5c0d4870d
[fix](nereids)fix bug of elt and sub_replace function ( #14971 )
2022-12-12 17:37:36 +08:00
38570312dd
[feature](split_by_string)support split by string function ( #13741 )
2022-12-12 15:22:30 +08:00
33349c3419
[feature](function)Support negative index for function split_part ( #13914 )
2022-12-12 09:56:09 +08:00
d5d356b17f
[vectorized](function) support order by field function ( #14528 )
...
* [vectorized](function) support order by field function
* update
* update test
2022-11-25 14:00:46 +08:00
18b9db17b3
[fix](test) move cases in query to query_p0 ( #14452 )
2022-11-22 21:35:18 +08:00
34f43ac781
[bug](like function)fix like '' (empty string) get wrong result with all rows #14035
2022-11-08 08:51:39 +08:00
b83744d2f6
[feature](function)add regexp functions: regexp_replace_one, regexp_extract_all ( #13766 )
2022-11-02 23:15:57 +08:00
20363edc73
[BugFix](function) fix reverse function dynamic buffer overflow due to illegal character ( #13671 )
...
The previous logic of the reverse function was not robust against illegal characters. For example, a single-byte character could be mistaken for the start of a UTF-8 character that occupies more than one byte, and later processing would then run past the end of the buffer.
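A minimal Python sketch of a UTF-8 aware reverse that stays inside the buffer: the length implied by the lead byte is clamped to the number of bytes that remain. This illustrates the guard only, under the assumption that the overflow came from trusting the lead byte; it is not the BE code, and the names are hypothetical.

```python
def utf8_char_len(lead: int) -> int:
    """Byte length implied by a UTF-8 lead byte; anything unexpected counts as 1."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    return 1  # stray continuation byte or invalid lead byte


def reverse_utf8(src: bytes) -> bytes:
    """Reverse character by character without reading or writing out of bounds."""
    n = len(src)
    dst = bytearray(n)
    pos = 0
    while pos < n:
        # Clamp: a truncated multi-byte sequence must not extend past the input.
        clen = min(utf8_char_len(src[pos]), n - pos)
        dst[n - pos - clen : n - pos] = src[pos : pos + clen]
        pos += clen
    return bytes(dst)


print(reverse_utf8("a多b".encode("utf-8")).decode("utf-8"))
print(reverse_utf8(b"ab\xe5"))  # 0xE5 claims 3 bytes, but only 1 remains
```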
2022-10-28 08:44:08 +08:00
43c6428aea
[Function](string) support sub_replace function ( #13736 )
...
* [Function](string) support sub_replace function
* remove conf
2022-10-28 08:40:08 +08:00
8a068c8c92
[function](string_function) add new string function 'not_null_or_empty' ( #13418 )
2022-10-19 11:10:37 +08:00