6556 Commits

SHA1 Message Date
e326ebb63e [feature](pipelineX) control exchange sink by memory usage (#28814) 2023-12-25 10:31:50 +08:00
d42fd68d6b [opt](invert index) Empty strings are not written to the index in the case of TOKENIZED (#28822) 2023-12-25 10:23:07 +08:00
b7ae7a07c7 [fix](join) incorrect result of left semi/anti join with empty build side (#28898) 2023-12-25 09:07:38 +08:00
bade50db56 [chore](test) Add testing util sync point (#28924) 2023-12-24 21:59:11 +08:00
145683ccdb [improvement](group commit) make get column function more reliable when replaying wal (#28900) 2023-12-24 21:17:39 +08:00
1545c36d16 Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727)" (#28931)
This reverts commit 4066de375efe6ff8e156a61df4f9316b3d9eaa4e.
2023-12-24 20:37:33 +08:00
db1da161f5 [optimize](zonemap) skip zonemap if predicate does not support_zonemap (#28595)
* [optimize](zonemap) skip zonemap if predicate does not support_zonemap #27608 (#28506)
2023-12-24 19:34:13 +08:00
dfbf082e06 [fix](merge-on-write) migration may cause duplicate keys for mow table (#28923) 2023-12-23 23:37:00 +08:00
96d4778f2e [fix](parquet) the end offset of column chunk may be wrong in parquet metadata (#28891) 2023-12-23 22:21:04 +08:00
de6c7a792e [fix](chore) update dcheck to avoid core during stress test (#28895) 2023-12-23 18:49:57 +08:00
2014396707 [fix](block) add block columns size dcheck (#28539) 2023-12-23 15:21:53 +08:00
e51f75e424 [FIX](map)fix map with rowstore table (#28877) 2023-12-23 12:11:06 +08:00
4066de375e [bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727) 2023-12-23 11:09:46 +08:00
43776465d9 [fix](segcompaction) disable segcompaction by default (#28906)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-12-23 07:43:41 +08:00
3b830f89a7 [improve](move-memtable) avoid using heavy work pool during append data (#28745) 2023-12-22 22:51:30 +08:00
f781f0cf24 [improve](load) limit delta writer flush task parallelism (#28883) 2023-12-22 21:50:56 +08:00
b1c5747f56 [improve](load) remove extra layer of heavy work pool in tablet_writer_add_block (#28550) 2023-12-22 20:10:50 +08:00
18c9ebce95 [improve](move-memtable) tweak load stream flush token num and max tasks (#28884) 2023-12-22 20:08:47 +08:00
fa0ad56817 [exec](compress) use FragmentTransmissionCompressionCodec control the exchange compress behavior (#28818) 2023-12-22 19:50:57 +08:00
3ed82bcee2 [Feature](inverted index) add lowercase option for inverted index analyzer (#28704) 2023-12-22 18:22:44 +08:00
9e0a2e861c [pipelineX](refactor) rename functions (#28846) 2023-12-22 17:24:39 +08:00
aca8406e31 [refactor](executor)remove scan group #28847 2023-12-22 17:05:50 +08:00
d75300f166 [fix](hash join) fix stack overflow caused by evaluate case expr on huge build block (#28851) 2023-12-22 15:45:12 +08:00
9b67c86219 [optimize](count) optimize pk exact query without reading data (#28494) 2023-12-22 14:18:15 +08:00
8c59e16f81 [opt](query cancel) optimization for query cancel #28778 2023-12-22 12:48:37 +08:00
012e66729a [improvement](executor) Add tvf and regression test for Workload Scheduler (#28733)
1. Add a TVF for selecting workload schedule policies
2. Add a regression test
2023-12-22 12:09:51 +08:00
83e7235bab [fix](memory) Add thread asynchronous purge jemalloc dirty pages (#28655)
Purging the dirty pages of all arenas through jemallctl may take several seconds, which blocks memory GC and can cause OOM.
The purge therefore runs asynchronously in a separate thread.
2023-12-22 12:05:20 +08:00
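
The idea in the commit above, moving a slow purge off the path that performs memory GC, can be sketched as follows. This is a minimal Python illustration of the general technique only, not the Doris C++ code; `AsyncPurger`, `request_purge`, and the `purge_fn` callback are hypothetical names introduced for the example.

```python
import threading
import time


class AsyncPurger:
    """Runs a slow purge callable on a background thread (hypothetical sketch)."""

    def __init__(self, purge_fn):
        self._purge_fn = purge_fn              # stand-in for the slow purge call
        self._requested = threading.Event()    # set when a purge is wanted
        self._stopping = threading.Event()     # set when the worker should exit
        self.purge_count = 0
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def request_purge(self):
        # Returns immediately, so the caller (for example a GC loop) never blocks.
        self._requested.set()

    def _run(self):
        while True:
            self._requested.wait()
            if self._stopping.is_set():
                return
            self._requested.clear()
            self._purge_fn()                   # the slow part runs only here
            self.purge_count += 1

    def stop(self):
        self._stopping.set()
        self._requested.set()                  # wake the worker so it can exit
        self._thread.join()
```

Requests that arrive while a purge is already running collapse into a single follow-up purge, which is usually acceptable for a "purge everything" operation.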
453e3c18f4 [refactor](buffer) remove download buffer since it is no longer useful (#28832)
2023-12-22 11:53:31 +08:00
0af6bd6390 [fix](group-commit) check if wal need recovery is abnormal (#28769) 2023-12-22 11:06:11 +08:00
172f68480b [Enhancement](load) Limit the number of incorrect data drops and add documents (#27727)
During a load, if the source data has problems, the erroneous data is stored in an error_log file on disk for later debugging. When there is a lot of erroneous data, this file can occupy a lot of disk space, so this change limits how much error data is saved to disk.

* Become familiar with how the Doris load function is used and how it is implemented internally
* Add a new BE configuration item, load_error_log_limit_bytes, with a default value of 200MB
* Use the newly added threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk
* Write regression cases for testing and verification

Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
2023-12-22 10:43:18 +08:00
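
The byte limit described in the commit above can be sketched as a writer that stops appending once a threshold is reached. This is a minimal Python illustration under stated assumptions, not the Doris implementation: `LimitedErrorLog` and its fields are hypothetical, and whether the real change drops whole messages or truncates them is not stated in the commit message.

```python
import io


class LimitedErrorLog:
    """Appends error messages to a sink until a byte limit is reached."""

    def __init__(self, sink, limit_bytes):
        self._sink = sink                # any binary file-like object
        self._limit_bytes = limit_bytes  # plays the role of load_error_log_limit_bytes
        self.bytes_written = 0
        self.dropped = 0

    def append(self, message: str) -> bool:
        data = (message + "\n").encode("utf-8")
        # Assumption: a message that would exceed the limit is dropped whole.
        if self.bytes_written + len(data) > self._limit_bytes:
            self.dropped += 1
            return False
        self._sink.write(data)
        self.bytes_written += len(data)
        return True


log = LimitedErrorLog(io.BytesIO(), limit_bytes=10)
log.append("abcd")   # 5 bytes including the newline, written
log.append("efgh")   # 5 more bytes, exactly at the limit, written
log.append("x")      # would exceed the limit, dropped
```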
0b9b1be1f1 [fix](function) Fix from_second functions overflow and wrong result (#28685) 2023-12-22 10:22:49 +08:00
49eaf0cc32 [fix](partial update) only report error when in strict mode partial update when finding missing rowsets during flushing memtable (#28764)
related pr: #28062, #28674, #28677
fix #28677
2023-12-22 09:50:10 +08:00
5153137b83 [fix](metrics) fix bvar memtable_input_block_allocated_size (#28725) 2023-12-21 21:16:14 +08:00
e74ff95087 [fix](compaction) compaction should catch exception when vertical block reader read next block (#28625) 2023-12-21 20:30:37 +08:00
0070909d30 [fix](group commit)Fix the issue of duplicate addition of wal path when encounter exception (#28691) 2023-12-21 20:27:33 +08:00
ee73833d6e [improve](load) reduce lock scope in MemTableWriter active consumption (#28790) 2023-12-21 20:18:35 +08:00
cd65796874 [opt](inverted index) ignore_above only affects untokenized strings (#28819) 2023-12-21 20:06:56 +08:00
4f1aebb8e8 (topN)runtime_predicate is only triggered when the column name is obtained (#28419)
Issue Number: close #27485
2023-12-21 18:08:23 +08:00
5c469a8b6c [pipelineX](fix) Fix TPCH Q2 (#28783) 2023-12-21 17:11:01 +08:00
db523dafcb [improve](move-memtable) limit task num in load stream flush token (#28748) 2023-12-21 12:19:58 +08:00
34fd376f33 [fix](publish version) fix publish fail but return ok (#28425) 2023-12-21 11:10:08 +08:00
bcf2683b9d [fix](scanner) fix concurrency bugs when scanner is stopped or finished (#28650)
`ScannerContext` could keep scheduling scanners even after it was stopped, and it confused `_is_finished` with `_should_stop`.
This change only fixes the concurrency bugs that occur when a scanner is stopped or finished, as reported in https://github.com/apache/doris/pull/28384
2023-12-21 10:37:58 +08:00
970e1c8475 [fix](group_commit) fix group commit cancel stuck (#28749) 2023-12-21 10:32:21 +08:00
007f152e5e [Improve](compile) add __AVX2__ macro for JsonbParser (#28754)
* [Improve](compile) add `__AVX2__` macro for JsonbParser

* throw exception instead of CHECK
2023-12-21 10:25:26 +08:00
18ad8562f2 [refactor](broadcastbuffer) using a queue to remove ref and unref codes (#28698)
Add a new class broadcastbufferholderqueue to manage holders.
Use shared ptr to manage holders instead of ref and unref, which is too difficult to maintain.

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-12-20 21:23:25 +08:00
280a01b815 [pipelineX](improvement) Support global runtime filter (#28692) 2023-12-20 20:06:26 +08:00
504693be7f [bug](coredump) Fix coredump in aggregation node's destruction(#28684)
2023-12-20 20:02:48 +08:00
36857006cd [Fix](json reader) fix json reader crash due to fmt::format_to (#28737)
```
4# __gnu_cxx::__verbose_terminate_handler() [clone .cold] at ../../../../libstdc++-v3/libsupc++/vterminate.cc:75
5# __cxxabiv1::__terminate(void (*)()) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
6# 0x00005622F33D22B1 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
7# 0x00005622F33D2404 in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
8# fmt::v7::detail::error_handler::on_error(char const*) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
9# char const* fmt::v7::detail::parse_replacement_field<char, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&>(char const*, char const*, fmt::v7::detail::format_handler<fmt::v7::detail::buffer_appender<char>, char, fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<char>, char> >&) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
10# void fmt::v7::detail::vformat_to<char>(fmt::v7::detail::buffer<char>&, fmt::v7::basic_string_view<char>, fmt::v7::basic_format_args<fmt::v7::basic_format_context<fmt::v7::detail::buffer_appender<fmt::v7::type_identity<char>::type>, fmt::v7::type_identity<char>::type> >, fmt::v7::detail::locale_ref) in /mnt/ssd01/pipline/OpenSourceDoris/clusterEnv/P0/Cluster0/be/lib/doris_be
11# doris::vectorized::NewJsonReader::_append_error_msg(rapidjson::GenericValue<rapidjson::UTF8<char>, rapidjson::MemoryPoolAllocator<rapidjson::CrtAllocator> > const&, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >, bool*) at /root/doris/be/src/vec/exec/format/json/new_json_reader.cpp:924
12# doris::vectorized::NewJsonReader::_set_column_value
```
2023-12-20 19:58:30 +08:00
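
The trace above ends in fmt's `parse_replacement_field`, reached from `NewJsonReader::_append_error_msg`. One plausible reading, which is an assumption and not something the commit message states, is that text derived from the input ended up inside the format string itself. Python's `str.format` uses a similar brace syntax, so the hazard can be sketched there; `report_unsafe` and `report_safe` are hypothetical names.

```python
def report_unsafe(value: str) -> str:
    # Hazard: data-derived text becomes part of the format string, so a
    # stray brace in the data is parsed as a replacement field.
    return ("bad value: " + value).format()


def report_safe(value: str) -> str:
    # The format string is a constant; the data is passed only as an argument.
    return "bad value: {}".format(value)


try:
    report_unsafe("{")
except ValueError as err:
    print("format error:", err)

print(report_safe("{"))
```

In Python the failure surfaces as a catchable `ValueError`; in the trace above the fmt error handler leads to `__terminate` instead, which is why the same class of mistake shows up there as a crash.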
7b96730e87 [fix](block) fix nullptr in MutableBlock::allocated_bytes (#28738) 2023-12-20 19:46:13 +08:00
e8d0569d8b [refine](pipelineX)Make the 'set ready' logic of SenderQueue in pipelineX the same as that in the pipeline (#28488) 2023-12-20 19:26:00 +08:00