Boost tokenizer requires explicit "." after "$" to correctly extract
JSON path tokens. Without this, expressions like "$[0].key" cannot be
properly split, causing issues in downstream logic. This commit ensures
a "." is automatically added after "$" to maintain consistent token
parsing behavior.
…ght, append_trailing_char_if_absent (#49127)
The url_encode function previously performed a modulus operation on a
signed number. Converting it to an unsigned number will fix the issue.
```
before
mysql> select url_encode('编码');
+----------------------+
| url_encode('编码') |
+----------------------+
| %5.%23%0-%5.%10%/( |
+----------------------+
now
mysql> select url_encode('编码');
+----------------------+
| url_encode('编码') |
+----------------------+
| %E7%BC%96%E7%A0%81 |
+----------------------+
```
The strright function did not calculate the length according to the
number of UTF-8 characters.
```
before
mysql> select strright("你好世界",5);
+----------------------------+
| strright("你好世界",5) |
+----------------------------+
| |
+----------------------------+
now
mysql> select strright("你好世界",5);
+----------------------------+
| strright("你好世界",5) |
+----------------------------+
| 你好世界 |
+----------------------------+
```
he case of inputting a UTF-8 character was not considered.
```
mysql> select append_trailing_char_if_absent('中文', '文');
+-------------------------------------------------+
| append_trailing_char_if_absent('中文', '文') |
+-------------------------------------------------+
| NULL |
+-------------------------------------------------+
now
mysql> select append_trailing_char_if_absent('中文', '文');
+-------------------------------------------------+
| append_trailing_char_if_absent('中文', '文') |
+-------------------------------------------------+
| 中文 |
+-------------------------------------------------+
```
### What problem does this PR solve?
backport:https://github.com/apache/doris/pull/48537
Issue Number: close #xxx
Related PR: #xxx
Problem Summary:
### Release note
None
### Check List (For Author)
- Test <!-- At least one of them must be included. -->
- [ ] Regression test
- [ ] Unit Test
- [ ] Manual test (add detailed scripts or steps below)
- [ ] No need to test or manual test. Explain why:
- [ ] This is a refactor/code format and no logic has been changed.
- [ ] Previous test can cover this change.
- [ ] No code files have been changed.
- [ ] Other reason <!-- Add your reason? -->
- Behavior changed:
- [ ] No.
- [ ] Yes. <!-- Explain the behavior change -->
- Does this need documentation?
- [ ] No.
- [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->
### Check List (For Reviewer who merge this PR)
- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
Pick #49403
If the two arrays have the same non-null elements, they are considered
overlapping, and the result is 1.
If the two arrays have no common non-null elements and either array
contains a null element, the result is null.
Otherwise, the result is 0.
```
select arrays_overlap([1, 2, 3], [1, null]); -- result should be 1
select arrays_overlap([2, 3], [1, null]); -- result should be null
select arrays_overlap([2, 3], [1]); -- result should be 0
```
### What problem does this PR solve?
pick https://github.com/apache/doris/pull/37062
1. revert https://github.com/apache/doris/pull/25097. we decide to rely
on OS. not maintain independent tzdata anymore to keep result
consistency
2. refactor timezone load. removed rwlock.
before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| 16000000 | 16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| 16000000 | 16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. now don't support timezone offset format string like 'UTC+8', like we
already said in
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. support case-insensitive timezone parsing in nereids.
5. a bug when parse timezone using nereids. should check DST by input,
but wrongly by now before. now fixed.
doc pr: https://github.com/apache/doris-website/pull/810
1. Fix memory issues in LoadStreamMgrTest.
2. Skip S3FileWriterTest by default because it depends on the environment in teamcity.
3. Fix VTimestampFunctionsTest.convert_tz_test.
this PR #22602 have check function.
only support date_trunc(column, const), so the second must be const literal
and no need to check time unit every row.