Commit Graph

7754 Commits

Author SHA1 Message Date
589518ff72 [fix](Nereids) fix Illegal aggregate node: group by and output is empty (#35497)
Fix "Illegal aggregate node: group by and output is empty", introduced by #33091.
2024-05-29 15:01:47 +08:00
3736d0af13 [Fix](hive-writer) Fix s3 file committer not working. (#35502) (#35579)
bp #35502

Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-05-29 12:14:42 +08:00
746c6207fc [fix](index) bitmap and bloomfilter index should not do light index change (#35225) 2024-05-29 10:09:31 +08:00
b06794d619 [opt](spill) add session variable of 'enable_force_spill' (#34664) (#35561)
## Proposed changes

pick #34664

2024-05-29 09:57:31 +08:00
95393b531d Revert "[fix](memory) Fix nested scoped tracker and nested reserve memory (#35257)"
This reverts commit f8fcd17f33deab0605c9378850a21714293ef1b5.
2024-05-28 23:14:19 +08:00
5c40e87667 [opt](s3) auto retry when meeting 429 error (#35397)
- Add 2 new BE config

	- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`

		When an S3 429 error is encountered, the "get" request sleeps for
		`s3_read_base_wait_time_ms` (*1, *2, *3, *4, ...) ms and then retries.
		The maximum sleep time is `s3_read_max_wait_time_ms`,
		and the maximum retry count is `max_s3_client_retry`.
		A minimal sketch of this backoff follows the metric list below.
		
- Add more metrics for s3 file reader

	- `s3_file_reader_too_many_request`: counter of 429 error.
	- `s3_file_reader_s3_get_request`: the QPS of s3 get request.

	- `TotalGetRequest`: Get request counter in profile
	- `TooManyRequestErr`: 429 error counter in profile
	- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
	- `TotalBytesRead`: Total bytes read from s3 in profile
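
A minimal sketch of the retry-on-429 backoff described above. The helper name, config struct, and default values are illustrative stand-ins for the BE configs, not the actual Doris implementation:

```cpp
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Illustrative stand-ins for s3_read_base_wait_time_ms, s3_read_max_wait_time_ms
// and max_s3_client_retry; values are placeholders.
struct RetryConfig {
    int base_wait_time_ms = 100;
    int max_wait_time_ms = 3000;
    int max_retry = 10;
};

// Returns true when the S3 "get" eventually succeeds, false if retries are
// exhausted. `do_get` returns an HTTP-like status code.
bool get_with_backoff(const std::function<int()>& do_get, const RetryConfig& cfg) {
    for (int attempt = 1; attempt <= cfg.max_retry; ++attempt) {
        int code = do_get();
        if (code == 200) return true;
        if (code != 429) return false;  // only retry "too many requests"
        // Sleep base * attempt (i.e. *1, *2, *3, ...), capped by the max wait time.
        int sleep_ms = std::min(cfg.base_wait_time_ms * attempt, cfg.max_wait_time_ms);
        std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));
    }
    return false;
}
```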
2024-05-28 23:00:31 +08:00
1fab4b63ec [fix](group commit) should set wal id in runtime_state when building pipeline task (#35552)
pick from master #35445
2024-05-28 20:17:29 +08:00
Pxl
aacc3bb993 [Bug](runtime-filter) do not process rf on HashJoinBuildSinkLocalState::close when query ca… (#35487)
do not process rf on HashJoinBuildSinkLocalState::close when query

```cpp
*** Query id: ee97f0c64a76436b-babc251c7d6702fb ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1716780426 (unix time) try "date -d @1716780426" if you are using GNU date ***
*** Current BE git commitID: 813074b ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 12924 (TID 15847 OR 0x7efbe5aa5700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo_t*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F064FF1C090 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::BloomFilterFuncBase::merge(doris::BloomFilterFuncBase*) at /root/doris/be/src/exprs/bloom_filter_func.h:169
 5# doris::RuntimePredicateWrapper::merge(doris::RuntimePredicateWrapper const*) at /root/doris/be/src/exprs/runtime_filter.cpp:507
 6# doris::IRuntimeFilter::merge_from(doris::RuntimePredicateWrapper const*) at /root/doris/be/src/exprs/runtime_filter.cpp:1497
 7# doris::IRuntimeFilter::publish(bool)::$_2::operator()() const in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
 8# doris::IRuntimeFilter::publish(bool) at /root/doris/be/src/exprs/runtime_filter.cpp:1015
 9# doris::VRuntimeFilterSlots::publish(bool) at /root/doris/be/src/exprs/runtime_filter_slots.h:137
10# doris::pipeline::HashJoinBuildSinkLocalState::close(doris::RuntimeState*, doris::Status) in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
11# doris::pipeline::DataSinkOperatorXBase::close(doris::RuntimeState*, doris::Status) at /root/doris/be/src/pipeline/exec/operator.h:491
12# doris::pipeline::PipelineTask::close(doris::Status) at /root/doris/be/src/pipeline/pipeline_task.cpp:436
13# doris::pipeline::_close_task(doris::pipeline::PipelineTask*, doris::Status) at /root/doris/be/src/pipeline/task_scheduler.cpp:88
14# doris::pipeline::TaskScheduler::_do_work(unsigned long) in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
15# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:551
16# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:499
17# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
18# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
```
2024-05-28 18:55:31 +08:00
eefeb4d80c [fix](spill) fix wrong disk usage of spill (#35423)
2024-05-28 18:53:55 +08:00
2e1318b8a0 [fix] (compaction) fix CompactionPermitLimiter causing compaction to stall (#35078)
## BUG
1. config::total_permits_for_compaction_score = 20000
2. Thread-B requests 11000 permits, so used_permits = 11000
3. Thread-A requests 12000 permits and waits for used_permits + 12000 <= 20000
4. config::total_permits_for_compaction_score is adjusted to 10000
5. Thread-B releases its permits (used_permits = 0) and notifies Thread-A, but used_permits + 12000 <= 10000 can never hold, so Thread-A waits forever

## Fix
Initialize total_permits once instead of re-reading the config value in the wait condition; a sketch follows.
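
A minimal sketch of this fix with an illustrative limiter class (not the real CompactionPermitLimiter code): the total permit budget is captured once at construction, so lowering the config value later cannot strand a waiter admitted under the old budget.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

class CompactionPermitLimiter {
public:
    explicit CompactionPermitLimiter(int64_t total_permits)
            : _total_permits(total_permits) {}

    void request(int64_t permits) {
        std::unique_lock<std::mutex> lock(_mutex);
        // The fixed _total_permits is used in the wait predicate instead of a
        // mutable config value, which is what avoids the stall described above.
        _cv.wait(lock, [&] { return _used_permits + permits <= _total_permits; });
        _used_permits += permits;
    }

    void release(int64_t permits) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _used_permits -= permits;
        }
        _cv.notify_all();
    }

private:
    const int64_t _total_permits;
    int64_t _used_permits = 0;
    std::mutex _mutex;
    std::condition_variable _cv;
};
```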
2024-05-28 18:52:34 +08:00
86c7092f21 [opt](external) ignore not find files (#35319)
The file list is obtained from the external meta cache, and a file may already
have been removed from storage.
We should ignore files that are not found and let the query continue.
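
A hedged sketch of the skip-missing-files behavior, using a hypothetical local-filesystem helper rather than the actual external table scan code:

```cpp
#include <filesystem>
#include <string>
#include <vector>

// Files listed by the (possibly stale) external meta cache that no longer
// exist on storage are skipped instead of failing the whole query.
std::vector<std::string> filter_existing_files(const std::vector<std::string>& cached_files) {
    std::vector<std::string> readable;
    for (const auto& path : cached_files) {
        std::error_code ec;
        if (!std::filesystem::exists(path, ec) || ec) {
            continue;  // removed after the meta cache was populated; ignore it
        }
        readable.push_back(path);
    }
    return readable;
}
```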
2024-05-28 18:51:56 +08:00
d97788dec8 [Refactor](Status) Refactor the scanner scheduler code to make the returned error message meaningful (#35286)
## Proposed changes

Before error msg:
```
Failed to submit scanner to scanner pool
```

After error msg:
```
Failed to submit scanner to scanner pool reason:Scan thread pool had shutdown|type 1

```
2024-05-28 18:49:55 +08:00
70106067ab Revert "[fix](group commit) should set wal id in runtime_state when building pipeline task (#35506)"
This reverts commit 9f6d82672f5d445822f0a2d5b13a6c9ffdcca13a.
2024-05-28 18:22:20 +08:00
84e9a14063 [Fix](hive-writer) Fix partition column order issue when the partition fields inserted into the target table are inconsistent with the field order and schema field order of the query source table. (#35543)
## Proposed changes

backport #35347

2024-05-28 18:11:55 +08:00
9f6d82672f [fix](group commit) should set wal id in runtime_state when building pipeline task (#35506)
pick from master #35445

2024-05-28 17:48:10 +08:00
43890ffd3a [fix](compaction) fix repeatedly picking tablets with disable auto compaction (#35472) (#35505)
pick master #35472
2024-05-28 15:57:54 +08:00
96a4159f73 [opt](scan) Use lazy-init for segment iterators and avoid caching all segments in the rowset reader (#35432)
2024-05-28 13:19:18 +08:00
4e7e8d700f [enhancement](atomicstatus) use lock to make the status object more stable (#35476)
Previously, if the error code was not OK and the status was then fetched, the
returned status might still be OK, so some DCHECKs could fail.

This PR uses a std::mutex to make this behavior stable.
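
A rough sketch of the idea with illustrative names (not the real Doris AtomicStatus): both the error code and the message are guarded by one mutex, so a reader never sees a non-OK code paired with an OK status.

```cpp
#include <mutex>
#include <string>
#include <utility>

class GuardedStatus {
public:
    // Record the first error; later errors are ignored.
    bool update(int code, std::string msg) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_code != 0) return false;
        _code = code;
        _msg = std::move(msg);
        return true;
    }

    // Returns a consistent snapshot of (code, message).
    std::pair<int, std::string> status() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return {_code, _msg};
    }

    bool ok() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return _code == 0;
    }

private:
    mutable std::mutex _mutex;
    int _code = 0;  // 0 means OK
    std::string _msg;
};
```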


---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-28 13:18:42 +08:00
97a5f55a37 [fix](function) bitmap to base64 error length check (#35117) 2024-05-28 13:17:16 +08:00
2310915c26 [fix](pipeline) Fix query hang if limited rows is reached (#35466)
## Proposed changes

Some operators have a limit condition; the source operator should notify the
sink operator when the limit is reached.
Although the FE has limit logic, it does not always send it.
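
A hedged sketch of this notification pattern with hypothetical operator and state types (not the actual pipeline operators): the source flips a shared flag once the limit is hit so the sink side can stop instead of waiting forever.

```cpp
#include <atomic>
#include <cstdint>

struct SharedQueryState {
    std::atomic<bool> reached_limit{false};
};

class SourceOperator {
public:
    SourceOperator(SharedQueryState* state, int64_t limit) : _state(state), _limit(limit) {}

    // Called after each batch; once enough rows are produced, publish the flag.
    void on_rows_emitted(int64_t rows) {
        _emitted += rows;
        if (_limit >= 0 && _emitted >= _limit) {
            _state->reached_limit.store(true, std::memory_order_release);
        }
    }

private:
    SharedQueryState* _state;
    int64_t _limit;
    int64_t _emitted = 0;
};

class SinkOperator {
public:
    explicit SinkOperator(SharedQueryState* state) : _state(state) {}

    // Checked before blocking for more input, so the query does not hang
    // once the limit has been reached.
    bool should_stop() const { return _state->reached_limit.load(std::memory_order_acquire); }

private:
    SharedQueryState* _state;
};
```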

2024-05-28 13:15:31 +08:00
f8fcd17f33 [fix](memory) Fix nested scoped tracker and nested reserve memory (#35257)
SCOPED_ATTACH_TASK cannot be nested, but SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER can continue to be called, so attach_limiter_tracker may be nested.
2024-05-28 13:12:03 +08:00
9d6b2d66ca [feature](metrics)support be jvm metrics. (#35023)
support be jvm metrics.
if you `curl http://be_host:webserver_port/metrics` , you will get :
```
doris_be_jvm_heap_size_bytes{type="max"} 8589934592
doris_be_jvm_heap_size_bytes{type="committed"} 8589934592
doris_be_jvm_heap_size_bytes{type="used"} 364159504

doris_be_jvm_non_heap_size_bytes{type="committed"} 117899264
doris_be_jvm_non_heap_size_bytes{type="used"} 115330424

doris_be_jvm_young_size_bytes{type="used"} 255852544
doris_be_jvm_young_size_bytes{type="peak_used"} 255852544
doris_be_jvm_young_size_bytes{type="max"} 8589934592

doris_be_jvm_old_size_bytes{type="used"} 94393344
doris_be_jvm_old_size_bytes{type="peak_used"} 94393344
doris_be_jvm_old_size_bytes{type="max"} 8589934592

doris_be_jvm_gc{name="G1 Young Generation Count", type="count"} 3
doris_be_jvm_gc{name="G1 Young Generation Time", type="time"} 33
doris_be_jvm_gc{name="G1 Old Generation Count", type="count"} 0
doris_be_jvm_gc{name="G1 Old Generation Time", type="time"} 0

doris_be_jvm_thread{type="count"} 147
doris_be_jvm_thread{type="peak_count"} 147
doris_be_jvm_thread{type="new_count"} 0
doris_be_jvm_thread{type="runnable_count"} 25
doris_be_jvm_thread{type="blocked_count"} 0
doris_be_jvm_thread{type="waiting_count"} 48
doris_be_jvm_thread{type="timed_waiting_count"} 74
doris_be_jvm_thread{type="terminated_count"} 0
```
2024-05-28 13:12:03 +08:00
79cd726132 [Fix](inverted index) fix race condition in index build (#35427)
Fix a race condition introduced by #35366 that could cause heap-use-after-free.
2024-05-28 13:12:03 +08:00
d8eefd0be8 [fix] fix wrong result of spill agg with limit (#35403) 2024-05-28 13:12:03 +08:00
7058b31edd [fix](move-memtable) clear load streams before shutdown SegmentFileWriterThreadPool (#35217) 2024-05-28 13:12:03 +08:00
238e218312 [fix](httpapi) restore compaction/run_status api can show be's overall compaction status and refactor code (#35409) 2024-05-28 09:43:43 +08:00
596fb6f327 [improve](ub) fix some runtime error of ubsan when downcast (#35343)
The code works correctly, but it reports runtime errors under UBSan when
downcasting, so refactor it so that UBSan runs cleanly.
2024-05-27 15:27:43 +08:00
c44affb43f Add downgrade scan thread num by column num (#35351) 2024-05-27 15:27:12 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
When the dictionary column is wrapped in null-related functions, as in the following SQL, the results will be incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
2024-05-27 15:25:29 +08:00
7284b6959f [Configurations](multi-catalog) Fix enable_orc_filter_by_min_max functionality, a mistake from #35012. (#35320)
Fix a bug introduced by #35012.
2024-05-27 15:25:07 +08:00
09f9012817 [Fix](hive-writer) Fix hive partition update core. (#35311)
Issue: #31442
```
/home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F963FA9D090 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::VHivePartitionWriter::_build_partition_update() at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:215
5# doris::vectorized::VHivePartitionWriter::close(doris::Status const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:164
6# doris::vectorized::VHiveTableWriter::close(doris::Status) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_table_writer.cpp:209
7# doris::vectorized::AsyncResultWriter::process_block(doris::RuntimeState*, doris::RuntimeProfile*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/async_result_writer.cpp:184
8# doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0::operator()() const at
```
2024-05-27 15:24:53 +08:00
5ab5ec3d0d [Fix](inverted index) fix build index wrong size for inverted index (#35366) 2024-05-27 15:24:17 +08:00
2e20e38523 [improvement](jdbc catalog) remove useless jdbc catalog code (#34986) (#35418) 2024-05-27 14:25:26 +08:00
b6eaf95720 [fix](memory) Fix BE memory info compatible with Cgroup (#35412) (#35425)
1. `memory.usage_in_bytes ~= free.used + free.(buff/cache) - (buff)`; the free cache can be reused,
   so change cgroup_memory_usage to `memory.usage_in_bytes - memory.meminfo["Cached"]`.
2. If the system is not configured with cgroup, finding the cgroup file path will fail; refactor the
   cgroup memory info refresh so it is compatible with that failure.
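
An illustrative sketch of the adjusted calculation; the cgroup v1 file paths and key names below are assumptions, and any read failure falls back to "no cgroup":

```cpp
#include <cstdint>
#include <fstream>
#include <optional>
#include <string>

std::optional<int64_t> read_int64(const std::string& path) {
    std::ifstream in(path);
    int64_t v;
    if (in >> v) return v;
    return std::nullopt;  // system not configured with cgroup, or file missing
}

// cgroup memory usage = memory.usage_in_bytes minus the reusable page cache.
std::optional<int64_t> cgroup_memory_usage(const std::string& cgroup_dir) {
    auto usage = read_int64(cgroup_dir + "/memory.usage_in_bytes");
    if (!usage) return std::nullopt;

    int64_t cached = 0;
    std::ifstream stat(cgroup_dir + "/memory.stat");
    std::string key;
    int64_t value;
    while (stat >> key >> value) {
        if (key == "cache") {  // reusable page cache
            cached = value;
            break;
        }
    }
    return *usage - cached;
}
```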
2024-05-27 12:31:44 +08:00
8f5deb10be [be](oom) add stacktrace in debug mode to find OOM reason 2024-05-26 23:39:46 +08:00
ade1841a01 [fix](shuffle) Do not return error if local recvr is null (#35399) 2024-05-26 20:20:50 +08:00
6e17dc1e87 (cherry-pick)[branch-2.1] add calc tablet file crc and fix single compaction test #33076 #34915 (#35215)
* [fix](compaction test) show single replica compaction status and fix test (#33076)
* [improve](http action) add http interface to calculate the crc of all files in tablet (#34915)
2024-05-26 17:15:09 +08:00
65b9e5ab69 [fix](chore) fix DCHECK failure of BufferWritable if failed to alloc memory (#35345) 2024-05-25 17:48:04 +08:00
Pxl
b143f0dfe2 [Improvement](date) shortcut for str to date parse (#35288)
shortcut for str to date parse
2024-05-25 17:47:20 +08:00
34e5030702 [bugfix](core) fix logical error of status check in nestedloop join (#35365) 2024-05-25 17:46:44 +08:00
c6c90ff63e [chore](routine-load) make routine_load_consumer_pool_size can update using HTTP API (#35315) 2024-05-25 17:46:29 +08:00
5bcdc75283 fix compile 2024-05-25 09:00:48 +08:00
0f550aeda7 [fix](compression) handle exception to reuse compression context (#35338) (#35380)
* [fix](compression) handle exception to reuse compression context

Otherwise, there is a memory leak and a new context is allocated each time,
and the resulting TLB flushes consume a lot of system CPU.
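
A hedged sketch of the fix pattern with placeholder types (not the actual Doris compression code): the context is reset and returned to the pool even when compression throws, so it is reused instead of leaked.

```cpp
#include <memory>
#include <mutex>
#include <vector>

struct CompressionContext {
    void reset() { /* clear internal state so it is safe to reuse */ }
};

class ContextPool {
public:
    std::unique_ptr<CompressionContext> acquire() {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_free.empty()) return std::make_unique<CompressionContext>();
        auto ctx = std::move(_free.back());
        _free.pop_back();
        return ctx;
    }

    void release(std::unique_ptr<CompressionContext> ctx) {
        std::lock_guard<std::mutex> lock(_mutex);
        _free.push_back(std::move(ctx));
    }

private:
    std::mutex _mutex;
    std::vector<std::unique_ptr<CompressionContext>> _free;
};

template <typename Fn>
void compress_with_pool(ContextPool& pool, Fn&& do_compress) {
    auto ctx = pool.acquire();
    try {
        do_compress(*ctx);
    } catch (...) {
        ctx->reset();                // drop any partial state before reuse
        pool.release(std::move(ctx));
        throw;                       // propagate the error after recycling
    }
    pool.release(std::move(ctx));
}
```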
2024-05-24 19:56:27 +08:00
c4b2ddd688 [Fix](Variant) clear block after a flush complete (#35226) (#35372)
Otherwise it results in a crash:

```
*** SIGSEGV address not mapped to object (@0x0) received by PID 4149909 (TID 4152328 OR 0x7efefc60d700) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F031AD0E090 in /lib/x86_64-linux-gnu/libc.so.6
 4# doris::Status doris::vectorized::MutableBlock::merge_impl<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:586
 5# doris::Status doris::vectorized::MutableBlock::merge<doris::vectorized::Block const&>(doris::vectorized::Block const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/core/block.h:521
```
2024-05-24 19:10:07 +08:00
41f29cf4cd [fix](decompress)(review) context leaked in failure path (#33622) (#35364)
* [fix](decompress)(review) context leaked in failure path

* [fix](decompress)(review) context leaked in failure path review fix

Co-authored-by: Vallish Pai <vallishpai@gmail.com>
2024-05-24 17:40:13 +08:00
639c7ee7fb [fix](decimalv2) fix scale of decimalv2 to string (#35222) (#35359)
* [fix](decimalv2) fix scale of decimalv2 to string
2024-05-24 17:20:43 +08:00
4b91ad003f [opt](memory) avoid allocate memory in agg operator constructor (#35301)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-24 16:23:58 +08:00
309503855e [Fix](bloom filter) Fix bloom filter memory leak (#34871)
* Issue: Doris occasionally encounters an issue where memory usage becomes exceptionally high and does not decrease. The leaked memory is occupied by Bloom filters stored in memory.

Reason: The segment cache stores segment objects read from files into memory. It functions as an LRU cache with an eviction strategy: when the number of segments exceeds the maximum number, or the total memory size of segment objects in the cache exceeds the maximum usage, it evicts the older segments. However, there is a piece of logic in the code that first reads the segment object into memory, assuming it occupies memory size A, then places the read segment object into the cache (at this point, the cache considers the segment object size to be A). It then reads the segment's Bloom filter from the file and assigns it to the segment's Bloom filter member variable, assuming the Bloom filter occupies memory size B. Thus, the total size of the segment object at this point is A+B. However, the cache does not update this size, leading to the actual size of the segment object stored in the cache (A+B) being larger than the size considered by the cache (A). When the number of segment objects in the cache increases to a certain extent, the used memory will surge dramatically. However, the cache does not perceive the size as reaching the eviction limit, so it does not evict the segment objects. In such cases, a memory leak issue arises.

Solution: Since each segment object only reads the Bloom filter once, the issue can be resolved by changing the logic from reading the segment, placing it into the cache, and then reading the Bloom filter to reading the segment, reading the Bloom filter, and then placing it into the cache.
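
A sketch of the reordered load path; the types below are placeholders, not Doris' real Segment/SegmentCache API. The point is that the bloom filter is loaded before the segment is inserted, so the cache charges A + B rather than just A:

```cpp
#include <cstddef>
#include <memory>
#include <string>
#include <unordered_map>

struct Segment {
    size_t base_size = 0;          // size "A": the segment object itself
    size_t bloom_filter_size = 0;  // size "B": bloom filters read from the file
    size_t mem_size() const { return base_size + bloom_filter_size; }
};

class SegmentCache {
public:
    // The cache charges mem_size() at insertion time, so the bloom filter
    // must already be loaded before the segment is inserted (the fix).
    void insert(const std::string& key, std::shared_ptr<Segment> seg) {
        _charged_bytes += seg->mem_size();
        _entries[key] = std::move(seg);
        // eviction by _charged_bytes would happen here in a real LRU cache
    }
    size_t charged_bytes() const { return _charged_bytes; }

private:
    std::unordered_map<std::string, std::shared_ptr<Segment>> _entries;
    size_t _charged_bytes = 0;
};

std::shared_ptr<Segment> open_segment(SegmentCache& cache, const std::string& path,
                                      size_t segment_bytes, size_t bloom_bytes) {
    auto seg = std::make_shared<Segment>();
    seg->base_size = segment_bytes;        // read the segment (size A)
    seg->bloom_filter_size = bloom_bytes;  // read its bloom filter (size B)
    cache.insert(path, seg);               // cache now accounts for A + B
    return seg;
}
```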
2024-05-24 16:23:58 +08:00
682d72bf4d [fix](noexcept) Remove incorrect noexcept #35230 2024-05-24 16:23:58 +08:00
4b7608c2bf [fix](inverted index)Change index_id from int32 to int64 to avoid overflow (#35206)
Co-authored-by: Luennng <luennng@gmail.com>
2024-05-23 19:12:55 +08:00