Commit Graph

19059 Commits

Author SHA1 Message Date
e68834158c [fix](inverted index)Support Chinese column name with inverted index #36321 (#36374)
1. `std::string` to `std::wstring` conversion only supports ASCII
characters. For non-ASCII characters, we need to use
`StringUtil::string_to_wstring`
2. Fix index_tool check_terms_stats_v2 and add field info to print
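
To illustrate point 1, here is a minimal sketch of why byte-wise widening breaks on non-ASCII column names. `naive_widen` and `utf8_to_wstring` are hypothetical helpers written for this example; they are not the implementation of `StringUtil::string_to_wstring`, and the sketch assumes a 32-bit `wchar_t` (as on Linux).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>

static_assert(sizeof(wchar_t) >= 4, "sketch assumes a 32-bit wchar_t");

// Naive widening: copies each byte into one wchar_t. Only correct for ASCII.
std::wstring naive_widen(const std::string& s) {
    return std::wstring(s.begin(), s.end());
}

// Hypothetical helper: decodes UTF-8 into one wchar_t per code point.
// It validates lead and continuation bytes but does not reject overlong forms.
std::wstring utf8_to_wstring(const std::string& s) {
    std::wstring out;
    std::size_t i = 0;
    while (i < s.size()) {
        unsigned char lead = static_cast<unsigned char>(s[i]);
        std::uint32_t cp = 0;
        std::size_t len = 0;
        if (lead < 0x80) { cp = lead; len = 1; }
        else if ((lead >> 5) == 0x06) { cp = lead & 0x1F; len = 2; }
        else if ((lead >> 4) == 0x0E) { cp = lead & 0x0F; len = 3; }
        else if ((lead >> 3) == 0x1E) { cp = lead & 0x07; len = 4; }
        else throw std::invalid_argument("invalid UTF-8 lead byte");
        if (i + len > s.size()) throw std::invalid_argument("truncated UTF-8 sequence");
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char c = static_cast<unsigned char>(s[i + k]);
            if ((c >> 6) != 0x02) throw std::invalid_argument("invalid UTF-8 continuation byte");
            cp = (cp << 6) | (c & 0x3F);
        }
        out.push_back(static_cast<wchar_t>(cp));
        i += len;
    }
    assert(i == s.size());  // every byte was consumed exactly once
    return out;
}
```

For a three-byte UTF-8 character, `naive_widen` produces three meaningless wide characters, while `utf8_to_wstring` produces the single expected code point.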

pick from master #36321
2024-06-17 19:41:09 +08:00
4008a04da7 [bugfix](paimon)Fix field case issues for 2.1 (#36288)
bp:  #36239
2024-06-17 18:38:00 +08:00
b6aa17ae32 [fix](statistics)Fix stats analyze p0 case. (#36251) (#36364)
Fix stats p0 case. Define a variable before using it.
backport https://github.com/apache/doris/pull/36251
2024-06-17 13:04:51 +08:00
612f2ae961 [feature](api) add BE HTTP /api/load_streams (#36312) (#36338)
cherry-pick #36312
2024-06-16 22:09:04 +08:00
6bb670ab38 [metrics](bvar) add bvar for load stream and file writer count (#36300) (#36336)
cherry-pick #36300
2024-06-16 10:14:59 +08:00
98fccb1809 [improvement](build index)Make build index and clone mutually exclusive and add timeout for index change job (#36293)
Currently, an index change job and a clone task can run at the same
time. If the clone task gets stuck, the index change job gets stuck as
well and keeps retrying. To solve this problem, we follow the alter job
approach: make the index change job mutually exclusive with the clone
task, and introduce a timeout to prevent infinite retries of build
index.

Add the following checks and status in FE:
1. Check whether the table is stable (build index is not allowed while a
clone is in progress):
1.1. The tablet is HEALTHY.
1.2. The tablet is not in the tablet scheduler; if it is, the tablet is
currently being cloned.
2. When creating the index change job, set its timeout at the same time.
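
The checks above can be sketched roughly as follows. This is an illustration only: the real logic lives in the Java FE, and every name here (`Tablet`, `is_table_stable`, `IndexChangeJob`) is hypothetical.

```cpp
#include <cassert>
#include <chrono>
#include <unordered_set>
#include <vector>

// Hypothetical tablet record: an id plus a health flag.
struct Tablet {
    long id;
    bool healthy;
};

// Check 1: the table is stable only if every tablet is healthy (1.1)
// and no tablet is queued in the tablet scheduler, i.e. being cloned (1.2).
bool is_table_stable(const std::vector<Tablet>& tablets,
                     const std::unordered_set<long>& scheduled_tablet_ids) {
    for (const Tablet& t : tablets) {
        if (!t.healthy) return false;
        if (scheduled_tablet_ids.count(t.id) != 0) return false;
    }
    return true;
}

// Check 2: the job records its creation time and timeout when it is created,
// so a stuck job can be cancelled instead of retrying forever.
struct IndexChangeJob {
    std::chrono::steady_clock::time_point created_at;
    std::chrono::seconds timeout;

    bool is_timed_out(std::chrono::steady_clock::time_point now) const {
        assert(timeout.count() > 0);  // a job is always created with a timeout
        return now - created_at >= timeout;
    }
};
```

A scheduler loop would refuse to create the job while `is_table_stable` returns false, and cancel a running job once `is_timed_out` returns true.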

pick from master #35724
2024-06-16 09:34:32 +08:00
55b4cf1658 [fix](load) fix NPE in LoadManager#jobRemovedTrigger() (#36173) (#36337)
cherry-pick #36173
2024-06-15 23:06:31 +08:00
bfab7a2537 [fix](shuffle) fix tablets num calculation in shuffle condition (#36050) (#36339)
cherry-pick #36050
2024-06-15 23:06:00 +08:00
7051431671 [branch-2.1](memory) fix query thread attach memory tracker (#36245)
## Proposed changes

fix dcheck
```
*** Check failure stack trace: ***
F20240613 12:33:01.700206 1467887 thread_context.h:204] Check failed: doris::k_doris_exit || !doris::config::enable_memory_orphan_check || thread_mem_tracker()->label() != "Orphan" If you crash here, it means that SCOPED_ATTACH_TASK and SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER are not used correctly. starting position of each thread is expected to use SCOPED_ATTACH_TASK to bind a MemTrackerLimiter belonging to Query/Load/Compaction/Other Tasks, otherwise memory alloc using Doris Allocator in the thread will crash. If you want to switch MemTrackerLimiter during thread execution, please use SCOPED_SWITCH_THREAD_MEM_TRACKER_LIMITER, do not repeat Attach. Of course, you can modify enable_memory_orphan_check=false in be.conf to avoid this crash.

44# doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0::operator()() const at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/runtime/fragment_mgr.cpp:981
45# void std::__invoke_impl<void, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&>(std::__invoke_other, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
46# std::enable_if<is_invocable_r_v<void, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&>, void>::type std::__invoke_r<void, doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&>(doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117
47# std::_Function_handler<void (), doris::FragmentMgr::exec_plan_fragment(doris::TPipelineFragmentParams const&, std::function<void (doris::RuntimeState*, doris::Status*)> const&)::$_0>::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
48# std::function<void ()>::operator()() const at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560
49# doris::FunctionRunnable::run() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:48
50# doris::ThreadPool::dispatch_thread() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/threadpool.cpp:543
51# void std::__invoke_impl<void, void (doris::ThreadPool::*&)(), doris::ThreadPool*&>(std::__invoke_memfun_deref, void (doris::ThreadPool::*&)(), doris::ThreadPool*&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74
52# std::__invoke_result<void (doris::ThreadPool::*&)(), doris::ThreadPool*&>::type std::__invoke<void (doris::ThreadPool::*&)(), doris::ThreadPool*&>(void (doris::ThreadPool::*&)(), doris::ThreadPool*&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:96
53# void std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>::__call<void, , 0ul>(std::tuple<>&&, std::_Index_tuple<0ul>) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:420
54# void std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>::operator()<, void>() at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/functional:503
55# void std::__invoke_impl<void, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&>(std::__invoke_other, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:61
56# std::enable_if<is_invocable_r_v<void, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&>, void>::type std::__invoke_r<void, std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&>(std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()>&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:117
57# std::_Function_handler<void (), std::_Bind<void (doris::ThreadPool::*(doris::ThreadPool*))()> >::_M_invoke(std::_Any_data const&) at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
58# std::function<void ()>::operator()() const at /var/local/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:560
59# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:498
60# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
61# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
```
2024-06-15 13:32:42 +08:00
Pxl
c2fa60cbe5 [Enchancement](scan) enable parallel scan when preagg is on (#36302)
## Proposed changes
pick from #35810
2024-06-14 23:44:41 +08:00
Pxl
db2721915e [Bug](runtime-filter) release dependency when rf rpc failed or meet error status (#36297)
pick from #36126
2024-06-14 23:44:08 +08:00
0e0142f3c7 [branch-2.1](test) fix wrong external p0 (#36284) 2024-06-14 23:34:51 +08:00
7c0ec4ea2e [fix](autobucket) fix autobucket config masterOnly=true #36116 (#36286)
cherry pick from #36116
2024-06-14 14:26:23 +08:00
bfb41c15de [fix](statistics)Fix sync analyze job timeout block bug. (#36199)
Fix the bug where a sync analyze job blocks on timeout. When a task of an
analyze job times out, it should throw an exception instead of finishing silently.
2024-06-14 09:47:51 +08:00
a23aee2883 [fix](broker) fix no error url when broker data quality error (#35643) (#36089)
## Proposed changes

cherry-pick from #35643
2024-06-14 09:29:14 +08:00
af1c4ecf89 Revert "[opt](inverted index) performance optimization for need_read_data in …" (#36260)
Reverts apache/doris#36192
2024-06-14 08:41:17 +08:00
f01039c224 [Pick 2.1](inverted index) fix memory leak of inverted index writer for array values #36208 (#36276)
fix inverted index writer's field leak
pick from  #36208
2024-06-14 08:39:55 +08:00
845dcce7f0 Revert "[opt](inverted index) performance optimization for need_read_data in …" (#36260)
Reverts apache/doris#36192
2024-06-13 21:31:20 +08:00
56ccb9a657 [fix](parquet) fix parquet reader missing column and filter missing column (#36182)
bp #36189
2024-06-13 21:30:05 +08:00
d8eac07178 [branch-2.1](test) fix external p0 unstable test (#36262)
Fix some unstable external p0 tests
2024-06-13 20:55:41 +08:00
e2f7e0da0a [Fix](nereids) fix merge aggregate rule, rules should not have mutable members (#36223)
cherry-pick #36145  to branch-2.1
2024-06-13 17:49:57 +08:00
d70751a808 [fix](planner)remove constant expr in window function's partition and order exprs (#36185)
pick from master https://github.com/apache/doris/pull/36184
2024-06-13 15:05:21 +08:00
e51cd58d6e [fix](clone) fix check replica failed due to replica had drop #35994 (#36219)
cherry pick from #35994
2024-06-13 13:39:09 +08:00
106a55497b [minor] better column name error description (#36154) (#36212)
bp #36154

Co-authored-by: Jensen <czjourney@163.com>
2024-06-13 11:51:33 +08:00
375770f2b4 [fix](hudi) move wrong members in HMSExternalTable (#36187)
Previously, there were 2 members in HMSExternalTable: TableScanParams
and IncrementalRelation.
These 2 members are used for Hudi's incremental query, so their
lifecycle should follow the query task,
and they should not be saved in HMSExternalTable.

This PR mainly changes:

- Add LogicalHudiScan and PhysicalHudiScan, extends from LogicalFileScan
and PhysicalFileScan.
- Move TableScanParams and IncrementalRelation from HMSExternalTable to
XXXHudiScan.
- Add or modify related Nereids rules
2024-06-13 11:50:40 +08:00
c84b56140c [Fix](outfile) Add a configuration for exporting data in Parquet format using select into outfile (#36143)
backport: #36142
2024-06-13 11:49:46 +08:00
226775f059 [Feature](Point Query) fully support in nereids #35823 (#36205) 2024-06-13 08:37:31 +08:00
3a3c8cd9ee [cherry-pick](branch-2.1) fix inverted index format is lost during a schema change #36059 (#36100) 2024-06-12 23:06:51 +08:00
cc7ab2b9fe [fix](inverted index)Delete tmp dirs when BE starts to avoid tmp files left by last crash #35951 (#36190)
When BE crashes, there may be tmp files left in the tmp dir, so we
remove and rebuild the tmp dir every time we start BE to prevent rubbish
data from occupying the disk.
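
A minimal sketch of the "remove and rebuild on start" idea, using `std::filesystem`. The helper name `reset_tmp_dir` is an assumption for illustration and is not the function used in BE.

```cpp
#include <cassert>
#include <filesystem>

namespace fs = std::filesystem;

// Hypothetical startup helper: drop whatever a previous crash left behind,
// then recreate an empty tmp directory.
void reset_tmp_dir(const fs::path& tmp_dir) {
    fs::remove_all(tmp_dir);          // no error if the directory does not exist
    fs::create_directories(tmp_dir);  // recreate it, including missing parents
    assert(fs::is_directory(tmp_dir));
}
```

Calling this once per tmp directory before any writer starts is enough; it must not run while other threads are still using the directory.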
2024-06-12 23:05:44 +08:00
04e62d9c42 [fix](invert index) ensure that the pred result sign of the inlist is in order #36085 (#36191) 2024-06-12 23:04:31 +08:00
6d54527395 [fix](dynamic partition) fix dynamic partition thread met uncatch exception #35778 (#36166)
cherry pick from #35778
2024-06-12 22:16:51 +08:00
f1e83f5656 [opt](inverted index) performance optimization for need_read_data in compound #35346 (#36192) 2024-06-12 20:02:00 +08:00
e1694e3d91 [Pick 2.1](inverted index) fix memory leak in inverted index writer for array values #36144 (#36165) 2024-06-12 19:59:57 +08:00
205bf73d4e [Pick 2.1](inverted index) low level log for fulltext query info #35820 (#36183) 2024-06-12 19:59:22 +08:00
9708ca8fcb [Feature](Prepared Statment) Implement in nereids planner (#35318) (#36172) 2024-06-12 19:54:17 +08:00
0b28420e1c [pick](Variant) make remote schema fetch rpc timeout configurable (#35296) (#36174) 2024-06-12 19:51:53 +08:00
d1eb917076 [fix](rpc) fix transfer large data and enable transfer_large_data_by_brpc by default #35770 (#36169)
cherry pick from #35770
2024-06-12 19:39:07 +08:00
ff517ab677 [opt](load) use notify to replace polling for FlushToken #35796 (#36170)
cherry pick from #35796
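
For context, replacing polling with notification usually looks like the sketch below: the waiter blocks on a condition variable and is woken by the last finishing task, instead of sleeping and rechecking a counter. `PendingTasks` is a hypothetical stand-in and does not reproduce the actual FlushToken code.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>

// Hypothetical token that tracks in-flight tasks.
class PendingTasks {
public:
    // Called when a task is submitted.
    void submit() {
        std::lock_guard<std::mutex> lock(_mu);
        ++_pending;
    }

    // Called when a task completes; the last one wakes all waiters.
    void finish() {
        std::lock_guard<std::mutex> lock(_mu);
        assert(_pending > 0);
        if (--_pending == 0) _cv.notify_all();
    }

    // Blocks until no task is pending. No sleep-and-recheck loop is needed:
    // the predicate is only re-evaluated after a notification (or a spurious wakeup).
    void wait() {
        std::unique_lock<std::mutex> lock(_mu);
        _cv.wait(lock, [this] { return _pending == 0; });
    }

    int pending() {
        std::lock_guard<std::mutex> lock(_mu);
        return _pending;
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    int _pending = 0;
};
```

Worker threads call `finish()` as they complete, and the thread that needs all flushes done calls `wait()` once.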
2024-06-12 19:37:27 +08:00
14ece32b87 [Pick 2.1](inverted index) add inverted index reader memory size into searcher cache (#36160)
Pick from #35149
2024-06-12 14:40:20 +08:00
73eda9bdb7 [fix](ci) external pipeline use regression-test/pipeline/external/conf/be.conf (#36139)
external pipeline use regression-test/pipeline/external/conf/be.conf instead of regression-test/pipeline/p0/conf/be.conf
relate to master #36132
Co-authored-by: stephen <hello-stephen@qq.com>
2024-06-12 11:40:16 +08:00
b75533e72b [branch-2.1](beut) fix BE UT (#36147)
only for branch-2.1
2024-06-12 08:21:38 +08:00
c78c7f6b45 [branch-2.1](test) fix some tests in external p0 (#36127)
Also move the analysis exception "Not support insert with partition
spec in hive catalog."
from the create sink phase to the bind sink phase,
so that when `set enable_fallback_to_original_planner=false;`, the
returned error is correct.
2024-06-11 22:15:28 +08:00
acbfcf7ad9 [fix](Nereids) fix four phase aggregation compute wrong result (#36131)
cherry pick from #36128
2024-06-11 20:40:18 +08:00
d2a6911791 [opt](split) close the batch mode of file split in default (#36109)
bp: #36108
2024-06-11 19:19:09 +08:00
596a9a16d3 [chore](Compile) Fix segment cache ut's compile error due to miss cherry-pick (#36099) 2024-06-11 17:12:42 +08:00
3b23eee37c Revert "[fix](auto-partition) fix auto partition load lost data in multi sender (#35287)" (#36098)
Reverts apache/doris#35630 because it introduced more damaging bugs.
We will fix it and merge it in the next version.
2024-06-11 17:11:42 +08:00
fce09ae2f6 [fix](third-party) enable keepalive on socket created by libevent (#36088)
pick #35805 #36026
2024-06-11 14:18:22 +08:00
e46ed37530 [fix](snappy) avoid potential buffer overflow (#35537) (#36094)
pick #35537

If a skip happens more than once while available is zero, a buffer
overflow occurs.


![photo-size-5-6244711526321733357-y](https://github.com/apache/doris/assets/98214048/b0bb9c79-df22-4582-8e7a-1a214e9b69bb)
2024-06-11 14:17:59 +08:00
0dccc4e6e4 [cherry-pick](branch-2.1)fix http error when downloading varaint inverted index file #35668 (#36061)
pick from master [#35668](https://github.com/apache/doris/pull/35668)
2024-06-11 14:09:05 +08:00
4a277affdc [fix](scan) In-predicate should not be pushed down for non-key column(#35913) (#35968)
pick #35913
2024-06-11 11:13:34 +08:00