Commit Graph

56 Commits

Author SHA1 Message Date
1b108604d5 branch-2.1: [fix](function) fix error result in split_by_string with utf8 chars #40710 (#50689)
Cherry-picked from #40710

Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
2025-05-08 19:15:52 +08:00
995f1e5dc0 branch-2.1:[fix](Nereids) fix regression framework compare issue and fix code point count (#49575) (#50667)
backport: https://github.com/apache/doris/pull/49575

Co-authored-by: LiBinfeng <libinfeng@selectdb.com>
2025-05-08 16:53:02 +08:00
a40a4bbc67 branch-2.1: [fix](Nereids) fold constant for string function process emoji character by mistake #49087 (#49344)
pick: #49087 
Related PR: #40441

Problem Summary:

wrong calculation of emoji character length in some String function when
do constant folding in FE. For example:

select STRLEFT('😊😉👍', 2);

should return 😊😉, but fe return 😊 only when folding constant

fixed functions:
- left
- strleft
- right
- strright
- locate
- character_length
- split_by_string
- overlay
- replace_empty
2025-03-22 07:44:55 +08:00
aa47a35384 [fix](mem) heap-buffer-overflow for function convert_to (#46405) (#46502)
pick #46405 to branch-2.1
2025-01-07 13:46:32 +08:00
e7520ae6cf branch-2.1: [fix](hyperscan) Fix hyper scan fall back to re2 #44547 (#44653)
Cherry-picked from #44547

Co-authored-by: zhiqiang <hezhiqiang@selectdb.com>
2024-11-28 16:00:43 +08:00
355170a921 [cherry-pick](branch2.1) impl scalar functions trim_in、ltrim_in and rtrim_in (#42641)
pick https://github.com/apache/doris/pull/41681
2024-11-01 09:55:50 +08:00
6c3d42e09a [cherry-pick](branch-21) cherry-pick pr about (#42488) (#42099) (#42055) (#42916)
## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-10-31 14:14:19 +08:00
f112af0fd2 [pick](branch-2.1) pick #41555 #41592 #38204 (#41781)
pick #41555 #41592 #38204
2024-10-14 14:05:08 +08:00
f6917acd6a [cherry-pick](branch2.1) Impl translate and url encode 2.1 (#41051)
## Proposed changes

pick https://github.com/apache/doris/pull/40567

some code about const folding should wait the pr picked:
https://github.com/apache/doris/pull/40441
2024-09-23 14:26:27 +08:00
9877a08834 [feature](function) support ngram_search function #38226 (#40893)
https://github.com/apache/doris/pull/38226 
mysql [test]>select ngram_search('123456789' , '12345' , 3);
+---------------------------------------+
| ngram_search('123456789', '12345', 3) |
+---------------------------------------+
|                                   0.6 |
+---------------------------------------+
1 row in set (0.01 sec)

mysql [test]>select ngram_search("abababab","babababa",2);
+-----------------------------------------+
| ngram_search('abababab', 'babababa', 2) |
+-----------------------------------------+
|                                       1 |
+-----------------------------------------+
1 row in set (0.01 sec)
```

doc https://github.com/apache/doris-website/pull/899

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-09-21 20:34:44 +08:00
163193b1d4 [branch-2.1](function) fix random_bytes return same data for multi rows (#39891) (#40137)
pick https://github.com/apache/doris/pull/39891

Issue Number: close #xxx

before:
```sql
mysql [optest]>SELECT random_bytes(10) a FROM numbers("number" = "10");
+------------------------+
| a                      |
+------------------------+
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
| 0x7b4e5727024bc5b59e2c |
+------------------------+
```

now:
```sql
mysql [optest]>SELECT random_bytes(10) a FROM numbers("number" = "10");
+------------------------+
| a                      |
+------------------------+
| 0xd82cf60825b29ef2a0fd |
| 0x6f8c808415bdbaa6d257 |
| 0x7c26b5214297a151c25c |
| 0x43f02c77293063900437 |
| 0x5e5727569dec5e24f96b |
| 0x434f20bf74d7759640b7 |
| 0x087ed96b739750c733a6 |
| 0xdf05f6d7ede4972eb846 |
| 0xcefab471912264b5c54f |
| 0x1bddc019409d1926aa10 |
+------------------------+
```

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-08-30 10:43:42 +08:00
bb687bd69c [cherry-pick](branch-2.1) add function regexp_extract_or_null (#39561)
# Proposed changes

pick https://github.com/apache/doris/pull/38296
2024-08-21 09:14:58 +08:00
eb7eaee386 [fix](function) money format (#34680) 2024-05-18 18:35:29 +08:00
ca9eb56233 [Fix](functions) fix strcmp return value #34565 2024-05-12 09:49:38 +08:00
6cf7468073 [enhancement](function) change some function nullable mode (#30991)
change some function nullable mode
2024-02-18 14:45:25 +08:00
ca5a314765 [fix](function) make STRLEFT and STRRIGHT and SUBSTR function DEPEND_ON_ARGUMENT (#28352)
make STRLEFT and STRRIGHT function DEPEND_ON_ARGUMENT
2024-01-25 13:23:59 +08:00
e894911cda [function](char) change char function behaviour same with mysql (#30034)
select char(0) = '\0';
should return true;
2024-01-18 10:04:21 +08:00
1a46cf6fb5 [fix](split_by_string) Fix split by string core on column string (#28030) 2023-12-07 16:36:13 +08:00
007506ce42 [fix](like_func) incorrect result of like with 'NO_BACKSLASH_ESCAPES' mode (#27842) 2023-12-01 17:32:46 +08:00
49b73483fd [fix](field) fix coredump of field function when the first argument is const (#25859) 2023-10-25 14:14:32 +08:00
b8452812df [bug](function) fix regexp_extract_all can't handle empty str (#25717) 2023-10-23 15:47:12 +08:00
0128dd42d9 [fix](regexp_extract_all) fix be OOM when quering with regexp_extrac… (#23284) 2023-08-29 10:34:12 +08:00
affe36d32e [test](find_in_set) add find_in_set function test case (#20718) 2023-06-14 09:43:48 +08:00
49f8f20fb1 [fix](regex) String with Chinese characters matching failed (#20493) 2023-06-07 07:27:47 +08:00
d03bb4ba7b [Optimize](function) Optimize locate function by compare across strings (#20290)
Optimize locate function by compare across strings. about 90% speed up test by sum()
2023-06-05 12:43:14 +08:00
ee34b6de2d [Refact] (serde) refact mysql serde with data type (#19543)
refact mysql output (de)serialize with data type serde , avoid accoriding switch case Primitive type writed in mysqlWriter
2023-05-26 14:11:17 +08:00
88ca4f3e6b [feature](like) make like regexp used as a sql function (#19755) 2023-05-18 10:03:12 +08:00
b75f4c97f3 [function](string) support char function (#18878)
* [function](string) support char function

* fix
2023-04-22 08:36:48 +08:00
0b074ade02 [fix](const column) fix coredump caused by const column for some functions (#18737) 2023-04-18 13:57:55 +08:00
9877143210 [fix](like) fix wrong result of like pattern with backslash (#18039)
Result is empty for query select * from person where address like '%\\\\%';, but MySQL can get a line of result.

CREATE TABLE `person` (
  `id` int(11) NULL,
  `name` text NULL,
  `age` int(11) NULL,
  `class` int(11) NULL,
  `address` text NULL
) ENGINE=OLAP
UNIQUE KEY(`id`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`id`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
); 

insert into person values (10001,'test1',30,2,'test\\\\,xxx');
Adding logs:

select * from person where address like '%\\\\%';

I0323 10:26:15.907760 2387043 like.cpp:558] arg str: %\\%, size: 4, pattern LIKE_ENDS_WITH_RE: (?:%+)(((\\%)|(\\_)|([^%_]))+), size: 30
I0323 10:26:15.907789 2387043 like.cpp:562] match 0: \\%, size: 3
I0323 10:26:15.907801 2387043 like.cpp:562] match 1: \%, size: 2
I0323 10:26:15.907811 2387043 like.cpp:562] match 2: \%, size: 2
I0323 10:26:15.907821 2387043 like.cpp:562] match 3: , size: 0
I0323 10:26:15.907830 2387043 like.cpp:562] match 4: \, size: 1
I0323 10:26:15.907842 2387043 like.cpp:615] search_string : \\%
I0323 10:26:15.907855 2387043 like.cpp:619] search_string escape removed: \%
It matchs against the LIKE_ENDS_WITH_RE which is wrong, the meaning of the sql should be: match strings that have one backslash in any place.
2023-03-30 11:05:09 +08:00
f84481886b [feature](string_functions) The 'split_part' function supports non-constant parameters (#18029) 2023-03-25 12:03:11 +08:00
7b93c17364 [Bug][Fix] regexp function core dump DCHECK failed and error result (#17953)
CREATE TABLE `test` (
`name` varchar(64) NULL,
`age` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`name`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`name`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1",
"in_memory" = "false",
"storage_format" = "V2",
"disable_auto_compaction" = "false"
);
insert into `test` values ("lemon",1),("tom",2);

select a.name regexp concat('^', a.name) from test a;
2023-03-21 08:56:19 +08:00
3e40467ce6 [Bug](vec) Fix chinese pinyin order by (#17152)
bug: some chinese word not sort by pinyin in GBK coding

CREATE TABLE `test_convert` (
                 `a` varchar(100) NULL
             ) ENGINE=OLAP
               DUPLICATE KEY(`a`)
               DISTRIBUTED BY HASH(`a`) BUCKETS 3
               PROPERTIES (
               "replication_allocation" = "tag.location.default: 1"
               );
insert into test_convert values("b"), ("a"), ("c"), ("睿"), ("多"), ("丝");
Query OK, 6 rows affected (0.03 sec)
{'label':'insert_ca73a6acc2194d5b_888218a3949355a6', 'status':'VISIBLE', 'txnId':'18068'}
mysql [test]>select * from test_convert;
+------+
| a    |
+------+
| a    |
| c    |
| 丝   |
| b    |
| 多   |
| 睿   |
+------+
6 rows in set (0.01 sec)
mysql [test]>select * from test_convert order by convert(a using gbk);          
+------+
| a    |
+------+
| a    |
| b    |
| c    |
| 多   |
| 丝   |
| 睿   |
+------+
6 rows in set (0.01 sec)
2023-02-28 14:29:56 +08:00
883f575cfe [fix](string function) fix wrong usage of iconv_open (#17048)
* [fix](string function) fix wrong usage of iconv_open

Also add test case for function convert

* fix test case
2023-02-24 09:13:10 +08:00
09870098af [fix](func) fix core dump when the pattern of the regexp_extract_all function does not contain subpatterns (#16408) 2023-02-05 01:16:54 +08:00
ca7b2e27a8 [regression-test](function) add regression test for money_format with truncate (#16052) 2023-02-04 23:10:01 +08:00
95d7c2de26 [Refactor](function) Rewrite the function elt (#16287) 2023-02-01 11:17:06 +08:00
578a855b3e [Bug](topn-opt) filter condition for analytic info for two phase read opt (#16173)
two phase read optimization should not be enabled when query has analytic info
2023-01-29 12:06:18 +08:00
05f6e4c48a [fix](predicate) fix be core dump caused by pushing down the double column predicate (#15693) 2023-01-09 19:31:04 +08:00
21c2e485ae [improvment](function) add new function substring_index (#15024) 2022-12-15 09:54:34 +08:00
b5c0d4870d [fix](nereids)fix bug of elt and sub_replace function (#14971) 2022-12-12 17:37:36 +08:00
38570312dd [feature](split_by_string)support split by string function (#13741) 2022-12-12 15:22:30 +08:00
33349c3419 [feature](function)Support negative index for function split_part (#13914) 2022-12-12 09:56:09 +08:00
d5d356b17f [vectorized](function) support order by field function (#14528)
* [vectorized](function) support order by field function

* update

* update test
2022-11-25 14:00:46 +08:00
18b9db17b3 [fix](test) move cases in query to query_p0 (#14452) 2022-11-22 21:35:18 +08:00
34f43ac781 [bug](like function)fix like '' (empty string) get wrong result with all rows #14035 2022-11-08 08:51:39 +08:00
b83744d2f6 [feature](function)add regexp functions: regexp_replace_one, regexp_extract_all (#13766) 2022-11-02 23:15:57 +08:00
20363edc73 [BugFix](function) fix reverse function dynamic buffer overflow due to illegal character (#13671)
Previous logic of reverse function might not be strong enough to handle illegal character. For example, one one byte size character would be mistaken as one utf-8 character which occupies more than one byte space. And unfortunately exceeding the buffer space during future process.
2022-10-28 08:44:08 +08:00
43c6428aea [Function](string) support sub_replace function (#13736)
* [Function](string) support sub_replace function

* remove conf
2022-10-28 08:40:08 +08:00
8a068c8c92 [function](string_function) add new string function 'not_null_or_empty' (#13418) 2022-10-19 11:10:37 +08:00