Commit Graph

1671 Commits

Author SHA1 Message Date
1f779ba9de [branch-2.1](arrow-flight-sql) Open regression-test/pipeline/p0/arrow_flight_sql (#37727)
pick #36854
2024-07-16 16:23:43 +08:00
de61887cdc [chore](log) reduce print warning msg during be starting up #36710 (#37780)
cherry pick from #36710
2024-07-15 14:46:54 +08:00
79f6b647d5 [FIX] should check fe host standing when coordinator is not found. (#37772)
fix https://github.com/apache/doris/pull/37707
2024-07-15 12:27:31 +08:00
2759383365 [branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) (#37269)
pick https://github.com/apache/doris/pull/37062

1. revert https://github.com/apache/doris/pull/25097. We decided to rely
on the OS time zone data and no longer maintain an independent tzdata,
to keep results consistent.
2. refactor timezone loading and remove the rwlock.

before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. timezone offset format strings like 'UTC+8' are no longer supported, as
already stated in
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. support case-insensitive timezone parsing in nereids.
5. fix a bug when parsing timezones in nereids: DST should be checked
against the input time, but was wrongly checked against the current time.
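
A minimal sketch of the parsing rules in items 3 and 4, written in Python for illustration only; it is not the nereids implementation. The function `normalize_timezone`, the `known_names` parameter, and the assumption that numeric offsets take the `+HH:MM` form (as in the `'+00:00'` argument of the query above) are all hypothetical.

```python
import re

# Assumption: numeric offsets look like '+00:00' or '-08:00'.
_OFFSET_RE = re.compile(r"[+-]\d{2}:\d{2}")


def normalize_timezone(name, known_names):
    """Return the canonical spelling of a time zone name, or raise ValueError.

    known_names is a hypothetical list of region names (e.g. taken from
    the OS tzdata). Matching against it ignores letter case.
    """
    if _OFFSET_RE.fullmatch(name):
        return name
    # Map lower-cased names back to their canonical spelling.
    by_lower = {n.lower(): n for n in known_names}
    canonical = by_lower.get(name.lower())
    if canonical is None:
        # Strings such as 'UTC+8' are neither an offset nor a known name.
        raise ValueError(f"unsupported time zone: {name!r}")
    return canonical
```

With `known_names = ["Asia/Shanghai", "America/Los_Angeles"]`, the input `asia/shanghai` resolves to `Asia/Shanghai`, while `UTC+8` is rejected.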

doc pr: https://github.com/apache/doris-website/pull/810
2024-07-15 10:56:48 +08:00
9556c07a16 [mac](compile) fix compile error on mac (#37726) 2024-07-15 10:19:42 +08:00
326b40cde2 [branch-2.1](memory) Add HTTP API to clear data cache (#37704)
pick #36599

Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-07-12 17:21:52 +08:00
a61030215e [branch-2.1](memory) Support make all memory snapshots (#37705)
pick #36679
2024-07-12 16:21:37 +08:00
035027f831 [fix](query cancel) Fix query is cancelled when it comes from follower FE #37662 (#37707)
cherry pick from #37662
2024-07-12 15:50:45 +08:00
ef031c5fb2 [branch-2.1](memory) Fix reserve memory compatible with memory GC and logging (#37682)
pick
#36307
#36412
2024-07-12 11:43:26 +08:00
cf2fb6945a [branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658)
pick 
#36235
#35965
2024-07-11 21:04:01 +08:00
62e0230523 [branch-2.1](memory) Add ThreadMemTrackerMgr BE UT (#37654)
## Proposed changes

pick #35518
2024-07-11 21:03:49 +08:00
fed632bf4a [fix](move-memtable) check segment num when closing each tablet (#36753) (#37536)
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
1e3ab0ff8c [fix](group commit) Pick make group commit cancel in time (#36249) (#37404)
pick https://github.com/apache/doris/pull/36249/
2024-07-09 09:25:11 +08:00
1a25270918 [fix](group commit) Pick Fix the incorrect group commit count in log; fix the core in get_first_block (#36408) (#37405)
Pick https://github.com/apache/doris/pull/36408/
2024-07-09 09:24:43 +08:00
5280e277e7 [chore](be) Acquire and check MD5 digest of the file to download (#37418)
Cherry-pick #35807, #36621, #36726
2024-07-08 18:55:35 +08:00
70f46c12b3 [improve](group commit) Pick Modify group commit case and modify cancel status (#35995) (#37398)
Pick https://github.com/apache/doris/pull/35995
2024-07-08 10:27:08 +08:00
423483ed8f [branch-2.1](routine-load) optimize out of range error message (#37391)
## Proposed changes
pick #36450

before
```
ErrorReason{code=errCode = 105, msg='be 10002 abort task, task id: d846f3d3-7c9e-44a7-bee0-3eff8cd11c6f job id: 11310 with reason: [INTERNAL_ERROR]Offset out of range,

        0#  doris::Status doris::Status::Error<6, true>(std::basic_string_view<char, std::char_traits<char> >) at /mnt/disk1/laihui/doris/be/src/common/status.h:422
        1#  doris::Status doris::Status::InternalError<true>(std::basic_string_view<char, std::char_traits<char> >) at /mnt/disk1/laihui/doris/be/src/common/status.h:468
        2#  doris::KafkaDataConsumer::group_consume(doris::BlockingQueue<RdKafka::Message*>*, long) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/data_consumer.cpp:226
        3#  doris::KafkaDataConsumerGroup::actual_consume(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message*>*, long, std::function<void (doris::Status const&)>) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/data_consumer_group.cpp:200
        4#  void std::__invoke_impl<void, void (doris::KafkaDataConsumerGroup::*&)(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message*>*, long, std::function<void (doris::Status const&)>), doris::KafkaDataConsumerGroup*&, std::shared_ptr<doris::DataConsumer>&, doris::BlockingQueue<RdKafka::Message*>*&, long&, doris::KafkaDataConsumerGroup::start_all(std::shared_ptr<doris::StreamLoadContext>, std::shared_ptr<doris::io::KafkaConsumerPipe>)::$_0&>(std::__invoke_memfun_deref, void (doris::KafkaDataConsumerGroup::*&)(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message*>*, long, std::function<void (doris::Status const&)>), doris::KafkaDataConsumerGroup*&, std::shared_ptr<doris::DataConsumer>&, doris::BlockingQueue<RdKafka::Message*>*&, long&, doris::KafkaDataConsumerGroup::start_all(std::shared_ptr<doris::StreamLoadContext>, std::shared_ptr<doris::io::KafkaConsumerPipe>)::$_0&) at /mnt/disk1/laihui/build/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74
...
```

now
```
ErrorReason{code=errCode = 105, msg='be 10002 abort task, task id: 3ba0c0f4-d13c-4dfa-90ce-3df922fd9340 job id: 11310 with reason: [INTERNAL_ERROR]Offset out of range, consume partition 0, consume offset 100, the offset used by job does not exist in kafka, please check the offset, using the Alter ROUTINE LOAD command to modify it, and resume the job'}
```

2024-07-07 18:29:04 +08:00
7d423b3a6a [chery-pick](branch-2.1) Pick "[Fix](group commit) Fix group commit block queue mem estimate fault" (#37379)
Pick [Fix](group commit) Fix group commit block queue mem estimate fault
#35314

## Proposed changes

**Problem:** When `group commit=async_mode` and NULL data is imported
into a `variant` type column, the memory statistics used for group
commit backpressure become incorrect, and the load gets stuck.

**Cause:** In group commit mode, blocks are first added to a queue in
batches using `add block`, and then retrieved from the queue using
`get block`. To track memory usage for backpressure, the block size is
added to the memory statistics during `add block` and subtracted during
`get block`. However, for `variant` types, the write to WAL during
`add block` serializes the block, which can merge types (e.g., merging
`int` and `bigint` into `bigint`) and thereby change the block size. The
block size seen at `get block` then differs from the size at
`add block`, causing the memory statistics to overflow.

**Solution:** Record the block size at the time of `add block` and use
this recorded size during `get block` instead of the actual block size.
This keeps the memory addition and subtraction consistent.
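
The accounting idea in the solution can be sketched as follows. This is a simplified Python illustration, not the BE code: `Block`, `TrackedBlockQueue`, and `tracked_bytes` are hypothetical names, and the real queue also handles locking and backpressure, which are omitted here.

```python
from collections import deque


class Block:
    """Hypothetical block whose byte size may change after it is enqueued."""

    def __init__(self, nbytes):
        self.nbytes = nbytes


class TrackedBlockQueue:
    """Queue that remembers each block's size at enqueue time."""

    def __init__(self):
        self._queue = deque()
        self.tracked_bytes = 0

    def add_block(self, block):
        # Record the size now; later serialization may change block.nbytes.
        size_at_add = block.nbytes
        self._queue.append((block, size_at_add))
        self.tracked_bytes += size_at_add

    def get_block(self):
        block, size_at_add = self._queue.popleft()
        # Subtract the recorded size, not the block's current size.
        self.tracked_bytes -= size_at_add
        return block
```

Because the subtraction uses the recorded size, `tracked_bytes` returns to zero once every block has been retrieved, even if a block grew or shrank while it was queued.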

2024-07-07 18:27:49 +08:00
f2693152bb [fix](multi-table-load) fix be core when multi table load pipe finish fail (#37383)
pick #36269
2024-07-07 18:24:16 +08:00
Pxl
e2c2702dff [Bug](runtime-filter) fix some rf error problems (#37155)
## Proposed changes
pick from #37273
2024-07-04 20:03:46 +08:00
b272247a57 [pick]log thread num (#37258)
## Proposed changes

pick #37159
2024-07-04 15:27:52 +08:00
Pxl
ffc57c9ef4 [Bug](runtime-filter) fix brpc ctrl use after free (#37223)
part of #35186
2024-07-03 21:01:50 +08:00
bd24a8bdd9 [Fix](csv_reader) Add a session variable to control whether empty rows in CSV files are read as NULL values (#37153)
bp: #36668
2024-07-02 22:12:17 +08:00
f5572ac732 [pick]reset memtable flush thread num (#37092)
## Proposed changes

pick #37028
2024-07-02 19:20:17 +08:00
07278e9dcb [improvement](segmentcache) limit segment cache by memory or segment … (#37035)
…num (#37026)

pick #37026
2024-06-30 20:34:13 +08:00
Pxl
cb80ae906f [Bug](runtime-filter) disable sync filter when pipeline engine is off (#36994)
## Proposed changes
1. disable sync filter when pipeline engine is off
2. reduce some warning log
2024-06-28 16:59:26 +08:00
785a1f49f5 [fix](txn) Fix coordidator be restart not abort txn #35342 (#36437)
cherry pick from #35342
2024-06-25 13:35:01 +08:00
c8e4c404fa [Fix]check if fe set thrift field current_connect_fe (#36681)
bp #36678
2024-06-21 22:15:25 +08:00
a79b56ac23 [chore](be) Support config max message size for be thrift server (#36595)
Cherry-pick #36467
2024-06-20 20:15:43 +08:00
bd47d5a681 [branch-2.1](auto-partition) Fix auto partition load failure in multi replica (#36586)
this pr
1. picked #35630, which was previously reverted by #36098.
2. picked #36344 from master

these two PRs fix existing bugs in auto partition load.

---------

Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-06-20 17:51:18 +08:00
88e02c836d [Fix]Fix insert select missing audit log when connect follower FE (#36481)
## Proposed changes

pick #36472
2024-06-20 15:16:16 +08:00
612f2ae961 [feature](api) add BE HTTP /api/load_streams (#36312) (#36338)
cherry-pick #36312
2024-06-16 22:09:04 +08:00
6bb670ab38 [metrics](bvar) add bvar for load stream and file writer count (#36300) (#36336)
cherry-pick #36300
2024-06-16 10:14:59 +08:00
7051431671 [branch-2.1](memory) fix query thread attach memory tracker (#36245)
## Proposed changes

fix dcheck
```
*** Check failure stack trace: ***
F20240613 12:33:01.700206 1467887 thread_context.h:204] Check failed: doris::k_doris_exit || !doris::config::enable_memory_orphan_check || thread_mem_tracker()->label() != "Orphan" If you crash here, it means that SCOPED_ATTACH_TASK and SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER are not used correctly. starting position of each thread is expected to use SCOPED_ATTACH_TASK to bind a MemTrackerLimiter belonging to Query/Load/Compaction/Other Tasks, otherwise memory alloc using Doris Allocator in the thread will crash. If you want to switch MemTrackerLimiter during thread execution, please use SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER, do not repeat Attach. Of course, you can modify enable_memory_orphan_check=false in be.conf to avoid this crash.

44# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0::operator()() const at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/runtime/fragment_mgr.cpp:981
45# void std::__invoke_impl<void, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&>(std::__invoke_other, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
46# std::enable_if<is_invocable_r_v<void, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&>, void>::type std::__invoke_r<void, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&>(doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117
47# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
48# std::function<void ()>::operator()() const at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560
49# doris::FunctionRunnable::run() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:48
50# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:543
51# void std::__invoke_impl<void, void (doris::ThreadPool::*&)(), doris::ThreadPool*&>(std::__invoke_memfun_deref, void (doris::ThreadPool::*&)(), doris::ThreadPool*&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74
52# std::__invoke_result<void (doris::ThreadPool::*&)(), doris::ThreadPool*&>::type std::__invoke<void (doris::ThreadPool::*&)(), doris::ThreadPool*&>(void (doris::ThreadPool::*&)(), doris::ThreadPool*&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
53# void std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:420
54# void std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>::operator()<, void>() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:503
55# void std::__invoke_impl<void, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&>(std::__invoke_other, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
56# std::enable_if<is_invocable_r_v<void, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&>, void>::type std::__invoke_r<void, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&>(std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117
57# std::_Function_handler<void (), std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()> >::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
58# std::function<void ()>::operator()() const at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560
59# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:498
60# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
61# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
```
2024-06-15 13:32:42 +08:00
3b23eee37c Revert "[fix](auto-partition) fix auto partition load lost data in multi sender (#35287)" (#36098)
Reverts apache/doris#35630 because it introduced more damaging bugs.
We will fix it and merge it in the next version.
2024-06-11 17:11:42 +08:00
75a6f28f2e [cherry-pick]Add query type when report (#35918)
pick #34978
2024-06-11 10:51:59 +08:00
Pxl
4fc3d0ce2c [Chore](pipeline) set PipelineFragmentContext::_timeout and adjust dump_pipeline_tasks infomation display (#36023)
## Proposed changes
pick from #35328
2024-06-07 15:36:29 +08:00
af779f5cd8 Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc (#32793)" (#35978)
## Proposed changes
Pick "[fix](gclog) Skip tablet dir without schema hash dir in path gc
(#32793)"
2024-06-06 22:24:30 +08:00
5cecbfc6ea [cherry-pick]Add workload metric query_be_memory (#35911) 2024-06-06 14:33:30 +08:00
8df1a3c849 [Bug](load) fix s3 load not display the progress info (#35719)
## Proposed changes
The load progress info should be displayed so that the user can tell
which step the load has reached.
```
         JobId: 49088
         Label: rpt_10002184_syqzzywqkb10
         State: FINISHED
      Progress: 100.00% (10/10)
```


2024-06-01 11:24:54 +08:00
b864aa7aa2 [fix](pipeline) Fix query hang up if limited rows is reached (#35513) (#35746)
Follow-up for #35466.

We should ensure that closed tasks do not block other tasks.

2024-05-31 22:50:57 +08:00
c2fc485327 [fix](auto-partition) fix auto partition load lost data in multi sender (#35287) (#35630)
## Proposed changes

Change the `use_cnt` mechanism for incremental (auto partition) channels
and streams: it is now counted dynamically.
Use `close_wait()` of regular partitions as a synchronization point to
make sure all sinks are in the close phase before closing any incremental
(auto partition) channels and streams.
Add a dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.

Backport #35287

Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2024-05-31 10:27:03 +08:00
300582f2e5 [branch-2.1](routine-load) fix be core when partial table load failed (#35622) 2024-05-30 09:35:36 +08:00
680be6d19f [fix](ub) fix uninitialized accesses in BE (#35370)
ubsan hints:
```c++
/root/doris/be/src/olap/hll.h:93:29: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/olap/hll.h:94:23: runtime error: load of value 3078029312, which is not a valid value for type 'HllDataType'
/root/doris/be/src/runtime/descriptors.h:439:38: runtime error: load of value 118, which is not a valid value for type 'bool'
/root/doris/be/src/vec/exec/vjdbc_connector.cpp:61:50: runtime error: load of value 35, which is not a valid value for type 'bool' 
```
2024-05-29 20:31:07 +08:00
4294b7360e Revert "Revert "[fix](memory) Fix nested scoped tracker and nested reserve memory (#35257)""
This reverts commit 95393b531d340a865bfd2711ea77d39a04e61993.
2024-05-29 20:16:16 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
589518ff72 [fix](Nereids) fix Illegal aggregate node: group by and output is empty (#35497)
fix Illegal aggregate node: group by and output is empty.
introduced by #33091
2024-05-29 15:01:47 +08:00
b06794d619 [opt](spill) add session variable of 'enable_force_spill' (#34664) (#35561)
## Proposed changes

pick #34664

2024-05-29 09:57:31 +08:00
95393b531d Revert "[fix](memory) Fix nested scoped tracker and nested reserve memory (#35257)"
This reverts commit f8fcd17f33deab0605c9378850a21714293ef1b5.
2024-05-28 23:14:19 +08:00
5c40e87667 [opt](s3) auto retry when meeting 429 error (#35397)
- Add 2 new BE configs

	- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`

		When an S3 429 error occurs, the "get" request
		sleeps `s3_read_base_wait_time_ms (*1, *2, *3, *4)` ms and tries again.
		The max sleep time is `s3_read_max_wait_time_ms`
		and the max number of retries is `max_s3_client_retry`.
		
- Add more metrics for s3 file reader

	- `s3_file_reader_too_many_request`: counter of 429 error.
	- `s3_file_reader_s3_get_request`: the QPS of s3 get request.

	- `TotalGetRequest`: Get request counter in profile
	- `TooManyRequestErr`: 429 error counter in profile
	- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
	- `TotalBytesRead`: Total bytes read from s3 in profile
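
The retry behaviour described above can be sketched as a linearly growing, capped wait. This Python sketch is illustrative only and is not the BE implementation: `TooManyRequests`, `get_with_backoff`, and the injectable `sleep` parameter are hypothetical, and it assumes that the cap applies to each individual sleep and that the retry limit counts retries after the first attempt.

```python
import time


class TooManyRequests(Exception):
    """Hypothetical stand-in for an HTTP 429 response from S3."""


def get_with_backoff(do_get, base_wait_ms, max_wait_ms, max_retries,
                     sleep=time.sleep):
    """Call do_get(), retrying on 429 with waits of base*1, base*2, ...

    Returns (result, total_sleep_ms). The total mirrors the idea of a
    sleep-time counter such as TooManyRequestSleepTime.
    """
    total_sleep_ms = 0
    attempt = 0
    while True:
        try:
            return do_get(), total_sleep_ms
        except TooManyRequests:
            attempt += 1
            if attempt > max_retries:
                raise  # retries exhausted, surface the 429 to the caller
            # Assumption: each wait is capped at max_wait_ms.
            wait_ms = min(base_wait_ms * attempt, max_wait_ms)
            sleep(wait_ms / 1000.0)
            total_sleep_ms += wait_ms
```

Passing a recording function as `sleep` makes the schedule easy to inspect without actually waiting.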
2024-05-28 23:00:31 +08:00