Commit Graph

7957 Commits

Author SHA1 Message Date
03e21dddff [cherry-pick](branch-21) fix cast string to int return wrong result (#36788) (#37803)
## Proposed changes
cherry-pick from master:
https://github.com/apache/doris/pull/36788
https://github.com/apache/doris/pull/36505

<!--Describe your changes.-->
2024-07-15 18:48:49 +08:00
Pxl
e5219467dd [Bug](join) avoid overflow on bucket_size+1 (#37807)
## Proposed changes
pick from #37493
2024-07-15 18:47:36 +08:00
b7dbd5c186 [feature](inverted index) add ordered functionality to match_phrase query (#37751)
## Proposed changes

1. select count() from tbl where b match_phrase 'the brown ~2+';
2024-07-15 18:42:48 +08:00
967173d7d0 [cherry-pick-2.1](table-function) pick some table functions exec performance (#34090) (#37778)
## Proposed changes

pick from master:
https://github.com/apache/doris/pull/33904
https://github.com/apache/doris/pull/34090

Co-authored-by: HappenLee <happenlee@hotmail.com>
2024-07-15 17:15:56 +08:00
a4d37d96ca [opt](file-scanner) add not found file number in profile (#37042) (#37764)
bp #37042
2024-07-15 17:11:06 +08:00
e5339a4014 [feature](ES Catalog)Support control scroll level by config #37180 (#37290)
## Proposed changes

backport #37180
2024-07-15 16:41:38 +08:00
8df2432e94 [fix](inverted index) implementation of match function without index #36471 (#36918) 2024-07-15 16:19:41 +08:00
8360e3f6cf [fix](sleep) sleep with character const make be crash (#37681) (#37775)
cherry-pick #37681 to branch-2.1
2024-07-15 14:57:46 +08:00
de61887cdc [chore](log) reduce print warning msg during be starting up #36710 (#37780)
cherry pick from #36710
2024-07-15 14:46:54 +08:00
79f6b647d5 [FIX] should check fe host standing when coordinator is not found. (#37772)
fix https://github.com/apache/doris/pull/37707
2024-07-15 12:27:31 +08:00
232202b71f [improve](load) reduce memory reserved in memtable limiter (#37511) (#37699)
cherry-pick #37511
2024-07-15 11:09:09 +08:00
2759383365 [branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) (#37269)
pick https://github.com/apache/doris/pull/37062

1. revert https://github.com/apache/doris/pull/25097. we decide to rely
on OS. not maintain independent tzdata anymore to keep result
consistency
2. refactor timezone load. removed rwlock.

before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. now don't support timezone offset format string like 'UTC+8', like we
already said in
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. support case-insensitive timezone parsing in nereids.
5. a bug when parse timezone using nereids. should check DST by input,
but wrongly by now before. now fixed.

doc pr: https://github.com/apache/doris-website/pull/810
2024-07-15 10:56:48 +08:00
351ba4aeb2 [opt](spill) handle oom exception in spill tasks (#35025) (#35171) 2024-07-15 10:33:33 +08:00
31b3afa2c8 [fix](pipeline) fix exception safety issue in MultiCastDataStreamer (#36814)
## Proposed changes

pick #36748

```cpp
RETURN_IF_ERROR(vectorized::MutableBlock(block).merge(*pos_to_pull->_block))
```
this line may throw an exception(cannot allocate)

```
*** Query id: b7b80bfd76cc42a5-a9916f8364d5a4d3 ***
*** tablet id: 0 ***
*** Aborted at 1719187603 (unix time) try "date -d @1719187603" if you are using GNU date ***
*** Current BE git commitID: a8c48f5328 ***
*** SIGSEGV address not mapped to object (@0x47) received by PID 1197117 (TID 1197376 OR 0x7f49a25e4640) from PID 71; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris_branch-2.0/doris/be/src/common/signal_handler.h:417
 1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 4# 0x00007F4ABB927520 in /lib/x86_64-linux-gnu/libc.so.6
 5# std::default_delete<doris::vectorized::Block>::operator()(doris::vectorized::Block*) const at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85
 6# doris::pipeline::MultiCastDataStreamer::close_sender(int) at /root/doris_branch-2.0/doris/be/src/pipeline/exec/multi_cast_data_streamer.cpp:60
 7# doris::pipeline::MultiCastDataStreamerSourceOperator::close(doris::RuntimeState*) at /root/doris_branch-2.0/doris/be/src/pipeline/exec/multi_cast_data_stream_source.cpp:120
 8# doris::pipeline::PipelineTask::close() at /root/doris_branch-2.0/doris/be/src/pipeline/pipeline_task.cpp:334
 9# doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /root/doris_branch-2.0/doris/be/src/pipeline/task_scheduler.cpp:353
10# doris::pipeline::TaskScheduler::_do_work(unsigned long) in /mnt/disk1/STRESS_ENV/be/lib/doris_be
11# doris::ThreadPool::dispatch_thread() in /mnt/disk1/STRESS_ENV/be/lib/doris_be
12# doris::Thread::supervise_thread(void*) at /root/doris_branch-2.0/doris/be/src/util/thread.cpp:499
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F4ABBA0B850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```

<!--Describe your changes.-->
2024-07-15 10:32:20 +08:00
9556c07a16 [mac](compile) fix compile error on mac (#37726) 2024-07-15 10:19:42 +08:00
8de13c5cc8 [fix](function) error scale set in unix_timestamp (#36110) (#37619)
## Proposed changes

```
mysql [test]>set DEBUG_SKIP_FOLD_CONSTANT = true;
Query OK, 0 rows affected (0.00 sec)

mysql [test]>select cast(unix_timestamp("2024-01-01",'yyyy-MM-dd') as bigint);
+------------------------------------------------------------+
| cast(unix_timestamp('2024-01-01', 'yyyy-MM-dd') as BIGINT) |
+------------------------------------------------------------+
|                                           1704038400000000 |
+------------------------------------------------------------+
```
now
```
mysql [test]>select cast(unix_timestamp("2024-01-01",'yyyy-MM-dd') as bigint);
+------------------------------------------------------------+
| cast(unix_timestamp('2024-01-01', 'yyyy-MM-dd') as BIGINT) |
+------------------------------------------------------------+
|                                                 1704038400 |
+------------------------------------------------------------+
1 row in set (0.01 sec)
```

The column does not have a scale set, but the cast uses the scale to
perform the cast.


<!--Describe your changes.-->

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-07-15 10:00:04 +08:00
b55dd6f644 [fix](delete) fix the error message for valid decimal data for 2.1 (#37710)
## Proposed changes

cherry-pick : #36802

<!--Describe your changes.-->
2024-07-15 09:54:42 +08:00
747172237a [branch-2.1](memory) Pick some memory GC patch (#37725)
pick
#36768
#37164
#37174
#37525
2024-07-14 15:19:40 +08:00
5162789234 [Refactor](Variant) make many insterfaces exception safe (#37640) (#37719) 2024-07-13 16:52:10 +08:00
8930df3b31 [Feature](iceberg-writer) Implements iceberg partition transform. (#37692)
## Proposed changes

Cherry-pick iceberg partition transform functionality. #36289 #36889

---------

Co-authored-by: kang <35803862+ghkang98@users.noreply.github.com>
Co-authored-by: lik40 <lik40@chinatelecom.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
2024-07-13 16:07:50 +08:00
7887f51e9b [fix](partial update) fix a mem leak issue (#37706) (#37730)
cherry-pick #37706
2024-07-13 09:20:01 +08:00
20758576b2 [fix](split) remove retry when fetch split batch failed (#37637)
bp: #37636
2024-07-12 22:46:03 +08:00
326b40cde2 [branch-2.1](memory) Add HTTP API to clear data cache (#37704)
pick #36599

Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-07-12 17:21:52 +08:00
a61030215e [branch-2.1](memory) Support make all memory snapshots (#37705)
pick #36679
2024-07-12 16:21:37 +08:00
035027f831 [fix](query cancel) Fix query is cancelled when it comes from follower FE #37662 (#37707)
cherry pick from #37662
2024-07-12 15:50:45 +08:00
ef031c5fb2 [branch-2.1](memory) Fix reserve memory compatible with memory GC and logging (#37682)
pick
#36307
#36412
2024-07-12 11:43:26 +08:00
4dc933bb28 [cherry-pick] (branch-2.1) fix query errors caused by ignore_above (#37685)
## Proposed changes
pick from master #37679
2024-07-12 09:31:45 +08:00
87912de93f [fix](scan) catch exceptions thrown in scanner (#36101) (#37408)
## Proposed changes

pick #36101

The uncaught exceptions thrown in the scanner will cause the BE to
crash.
2024-07-12 08:49:39 +08:00
79a208259e [cherry-pick] (branch-2.1) Remove the check for inverted index file exists #36945 (#37423) 2024-07-11 21:35:52 +08:00
217eac790b [pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526) 2024-07-11 21:25:34 +08:00
cf2fb6945a [branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658)
pick 
#36235
#35965
2024-07-11 21:04:01 +08:00
62e0230523 [branch-2.1](memory) Add ThreadMemTrackerMgr BE UT (#37654)
## Proposed changes

pick #35518
2024-07-11 21:03:49 +08:00
fed632bf4a [fix](move-memtable) check segment num when closing each tablet (#36753) (#37536)
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
e66ffc1b6d [branch-2.1](arrow-flight-sql) Fix pipelineX Unknown result sink type (#37540)
pick ##35804
2024-07-11 12:30:46 +08:00
9f4e7346fb [fix](compaction) fixing the inaccurate statistics of concurrent compaction tasks (#37318) (#37496) 2024-07-10 22:23:25 +08:00
741807bb22 [performance](move-memtable) only call _select_streams when necessary (#35576) (#37406)
cherry-pick #35576
2024-07-10 22:20:23 +08:00
afcc6170f6 [fix](txn_manager) Add ingested rowsets to unused rowsets when removing txn (#37417)
Generally speaking, as long as a rowset has a version, it can be
considered not to be in a pending state. However, if the rowset was
created through ingesting binlogs, it will have a version but should
still be considered in a pending state because the ingesting txn has not
yet been committed.

This PR updates the condition for determining the pending state. If a
rowset is COMMITTED, the txn should be allowed to roll back even if a
version exists.

Cherry-pick #36551
2024-07-10 14:25:44 +08:00
0cdb371624 [branch-2.1](memory) Disable Arrow Jemalloc step 2 (#37556)
pick #37533
2024-07-10 11:34:18 +08:00
5247e0ff3a [fix](clone) Increase robustness for clone #36642 (#37413)
cherry pick from #36642
2024-07-09 17:18:14 +08:00
7cda8db020 [fix](load) The NodeChannel should be canceled when failed to add block #37500 (#37527)
cherry pick from #37500
2024-07-09 17:01:04 +08:00
f7f0c20f00 [branch-2.1](cgroup memory) Correct cgroup mem info cache (#37440)
pick #36966

Co-authored-by: Hongkun Xu <xuhongkun666@163.com>
2024-07-09 16:19:37 +08:00
005304953e [performance](load) do not copy input_block in memtable (#36939) (#37407)
cherry-pick #36939
2024-07-09 15:59:44 +08:00
81360cf897 [opt](test) shorten the external p0 running time (#37320) (#37473)
bp #37320
2024-07-09 15:35:15 +08:00
3337c1bbe3 [[enhancement](compaction) adjust compaction concurrency based on compaction score and workload (#37491)
adjust compaction concurrency based on compaction score and workload
#36672
fix null pointer when retrieving CPU load average #37171
2024-07-09 09:56:35 +08:00
1e3ab0ff8c [fix](group commit) Pick make group commit cancel in time (#36249) (#37404)
pick https://github.com/apache/doris/pull/36249/
2024-07-09 09:25:11 +08:00
1a25270918 [fix](group commit) Pick Fix the incorrect group commit count in log; fix the core in get_first_block (#36408) (#37405)
Pick https://github.com/apache/doris/pull/36408/
2024-07-09 09:24:43 +08:00
0a103aa11f [improve](json)improve json support empty keys #36762 (#37351) 2024-07-08 19:04:51 +08:00
5280e277e7 [chore](be) Acquire and check MD5 digest of the file to download (#37418)
Cherry-pick #35807, #36621, #36726
2024-07-08 18:55:35 +08:00
494b54a5a5 [enhancement](trash) support skip trash, update trash default expire time (#37170) (#37409)
cherry-pick #37170
2024-07-08 15:33:02 +08:00
c66df8d9e6 [branch-2.1](load) fix no error url if no partition can be found (#36831) (#37401)
## Proposed changes

pick #36831

before
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]Encountered unqualified data, stop processing",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0
}
```

after
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]too many filtered rows",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 0,
    "NumberFilteredRows": 1,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://XXXX:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_c6461270125a615b-2873833fb48d56a3_c6461270125a615b_2873833fb48d56a3"
}
```

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-07-08 10:41:33 +08:00