31db633624
[improve](load) add profile for WaitFlushLimitTime ( #29013 )
2023-12-26 15:41:54 +08:00
52eeee347f
[opt](compound) Optimize by deleting the compound expr after obtaining the final result ( #28934 )
2023-12-26 14:10:53 +08:00
c8ed14f11c
[enhance](tablet) Reduce log in tablet meta ( #28719 )
2023-12-26 13:37:30 +08:00
92660bb1b2
[chore](config) modify variant_ratio_of_defaults_as_sparse_column from 0.95 to 1 ( #28984 )
...
Because the sparse column feature is not yet stable.
2023-12-26 10:24:43 +08:00
f30e50676e
[opt](scanner) optimize the number of threads of scanners ( #28640 )
...
1. Remove `doris_max_remote_scanner_thread_pool_thread_num`, use `doris_scanner_thread_pool_thread_num` only.
2. Set the default value of `doris_scanner_thread_pool_thread_num` to `std::max(48, CpuInfo::num_cores() * 4)`.
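
A minimal sketch of the default described in item 2, not the actual Doris code. `detect_num_cores` is a hypothetical stand-in for `CpuInfo::num_cores()`, and `default_scanner_thread_num` is an illustrative name.

```cpp
#include <algorithm>
#include <cassert>
#include <thread>

// Hypothetical stand-in for CpuInfo::num_cores(); falls back to 1 when the
// standard library cannot report the core count.
int detect_num_cores() {
    unsigned n = std::thread::hardware_concurrency();
    return n == 0 ? 1 : static_cast<int>(n);
}

// Default pool size as stated in the commit message: at least 48 threads,
// otherwise four threads per core.
int default_scanner_thread_num(int cores) {
    assert(cores > 0);
    return std::max(48, cores * 4);
}
```

Under this rule, machines with 12 or fewer cores get the floor of 48 threads, and larger machines scale with the core count.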
2023-12-26 10:24:12 +08:00
75a45484b6
[chore](config) modify tablet_schema_cache_recycle_interval from 24h to 1h ( #28980 )
...
Prevents too many tablet schema cache entries from accumulating in memory, which leads to
performance issues while the lock is held to erase items.
2023-12-26 00:34:58 +08:00
cefae3dc90
[bug](storage) Fix gc rowset bug ( #28979 )
2023-12-26 00:29:03 +08:00
137f785698
[fix](parquet_reader) misused bool pointer ( #28986 )
...
Signed-off-by: pengyu <pengyu@selectdb.com>
2023-12-25 22:58:08 +08:00
c2c5df9341
[opt](assert_num_rows) support filter in AssertNumRows operator and fix some explain ( #28935 )
...
* NEED
* Update pipeline x
* fix pipelinex compile
2023-12-25 22:47:23 +08:00
0af9371a96
[fix](hash join) fix column ref DCHECK failure of hash join node block mem reuse ( #28991 )
...
Introduced by #28851. After the build-side expressions are evaluated, some columns in the resulting block may be referenced more than once within that block.
For example, with coalesce(col_a, 'string'), if col_a is nullable but actually contains no null values, the coalesce function inserts a new nullable column that references the original col_a.
2023-12-25 22:19:01 +08:00
7081139bdc
[fix](block) fix be core while mutable block merge may cause different row size between columns in origin block ( #27943 )
2023-12-25 20:35:22 +08:00
91e5b47439
[fix](hdfs) Fix HdfsFileSystem::exists_impl crash ( #28952 )
...
Calling hdfsGetLastExceptionRootCause without initializing ThreadLocalState
crashes. This PR changes how the existence of an HDFS file is determined:
because hdfsExists sets errno to ENOENT when the file does not exist, we can
check errno to decide whether the file exists, rather than checking for the
existence of a root cause.
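
The errno-based check can be sketched as follows. This is an illustration, not the code from the PR: `probe` is a hypothetical stand-in for a call such as hdfsExists(fs, path), and the convention that it returns 0 when the path exists is an assumption. `ExistsResult` and `classify_exists` are illustrative names.

```cpp
#include <cassert>
#include <cerrno>
#include <functional>

enum class ExistsResult { kExists, kNotFound, kError };

// `probe` is assumed to return 0 when the path exists and non-zero otherwise,
// setting errno to ENOENT when the path is simply missing.
ExistsResult classify_exists(const std::function<int()>& probe) {
    assert(probe != nullptr);
    errno = 0;
    int rc = probe();
    int saved_errno = errno;  // capture before any other call can overwrite it
    if (rc == 0) return ExistsResult::kExists;
    if (saved_errno == ENOENT) return ExistsResult::kNotFound;
    return ExistsResult::kError;  // any other failure is a real error
}
```

Separating "not found" from other failures means no exception root cause has to be inspected to answer an existence query.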
2023-12-25 19:18:01 +08:00
c2eabbd441
[fix](load) fix nullptr when getting memtable flush running count ( #28942 )
...
* [fix](load) fix nullptr when getting memtable flush running count
* style
2023-12-25 13:49:18 +08:00
e9e1e2894b
[performance](variant) support topn 2phase read for variant column ( #28318 )
2023-12-25 11:50:41 +08:00
f374beaa4e
[fix](log) regularise some BE error type and fix a load task check #28729
2023-12-25 10:45:19 +08:00
3273e0e635
[refactor](pipelineX)do not override dependency() function in pipelineX ( #28848 )
2023-12-25 10:36:31 +08:00
24b1b4d96b
[fix](pipelineX) fix use global rf when there no shared_scans ( #28869 )
2023-12-25 10:35:22 +08:00
e326ebb63e
[feature](pipelineX) control exchange sink by memory usage ( #28814 )
2023-12-25 10:31:50 +08:00
d42fd68d6b
[opt](invert index) Empty strings are not written to the index in the case of TOKENIZED ( #28822 )
2023-12-25 10:23:07 +08:00
b7ae7a07c7
[fix](join) incorrect result of left semi/anti join with empty build side ( #28898 )
2023-12-25 09:07:38 +08:00
bade50db56
[chore](test) Add testing util sync point ( #28924 )
2023-12-24 21:59:11 +08:00
145683ccdb
[improvement](group commit) make get column function more reliable when replaying wal ( #28900 )
2023-12-24 21:17:39 +08:00
1545c36d16
Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile ( #28727 )" ( #28931 )
...
This reverts commit 4066de375efe6ff8e156a61df4f9316b3d9eaa4e.
2023-12-24 20:37:33 +08:00
db1da161f5
[optimize](zonemap) skip zonemap if predicate does not support_zonemap ( #28595 )
...
* [optimize](zonemap) skip zonemap if predicate does not support_zonemap #27608 (#28506 )
2023-12-24 19:34:13 +08:00
dfbf082e06
[fix](merge-on-write) migration may cause duplicate keys for mow table ( #28923 )
2023-12-23 23:37:00 +08:00
96d4778f2e
[fix](parquet) the end offset of column chunk may be wrong in parquet metadata ( #28891 )
2023-12-23 22:21:04 +08:00
de6c7a792e
[fix](chore) update dcheck to avoid core during stress test ( #28895 )
2023-12-23 18:49:57 +08:00
2014396707
[fix](block) add block columns size dcheck ( #28539 )
2023-12-23 15:21:53 +08:00
e51f75e424
[FIX](map)fix map with rowstore table ( #28877 )
2023-12-23 12:11:06 +08:00
4066de375e
[bugfix](scannercore) scanner will core in deconstructor during collect profile ( #28727 )
2023-12-23 11:09:46 +08:00
43776465d9
[fix](segcompaction) disable segcompaction by default ( #28906 )
...
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-12-23 07:43:41 +08:00
3b830f89a7
[improve](move-memtable) avoid using heavy work pool during append data ( #28745 )
2023-12-22 22:51:30 +08:00
f781f0cf24
[improve](load) limit delta writer flush task parallelism ( #28883 )
2023-12-22 21:50:56 +08:00
b1c5747f56
[improve](load) remove extra layer of heavy work pool in tablet_writer_add_block ( #28550 )
2023-12-22 20:10:50 +08:00
18c9ebce95
[improve](move-memtable) tweak load stream flush token num and max tasks ( #28884 )
2023-12-22 20:08:47 +08:00
fa0ad56817
[exec](compress) use FragmentTransmissionCompressionCodec control the exchange compress behavior ( #28818 )
2023-12-22 19:50:57 +08:00
3ed82bcee2
[Feature](inverted index) add lowercase option for inverted index analyzer ( #28704 )
2023-12-22 18:22:44 +08:00
9e0a2e861c
[pipelineX](refactor) rename functions ( #28846 )
2023-12-22 17:24:39 +08:00
aca8406e31
[refactor](executor)remove scan group #28847
2023-12-22 17:05:50 +08:00
d75300f166
[fix](hash join) fix stack overflow caused by evaluate case expr on huge build block ( #28851 )
2023-12-22 15:45:12 +08:00
9b67c86219
[optimize](count) optimize pk exact query without reading data ( #28494 )
2023-12-22 14:18:15 +08:00
8c59e16f81
[opt](query cancel) optimization for query cancel #28778
2023-12-22 12:48:37 +08:00
012e66729a
[improvement](executor) Add tvf and regression test for Workload Scheduler ( #28733 )
...
1. Add a TVF to select workload schedule policies.
2. Add a regression test.
2023-12-22 12:09:51 +08:00
83e7235bab
[fix](memory) Add thread asynchronous purge jemalloc dirty pages ( #28655 )
...
Purging all arena dirty pages via jemallctl may take several seconds, which blocks memory GC and can cause OOM.
The purge therefore runs asynchronously in a separate thread.
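
A minimal sketch of the pattern, assuming a generic slow `purge` callback in place of the real jemalloc call. `AsyncPurger` and its members are hypothetical names, not Doris classes.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Runs a slow purge callback on a background thread so that the caller
// (for example a memory GC path) never blocks on it.
class AsyncPurger {
public:
    explicit AsyncPurger(std::function<void()> purge)
        : purge_(std::move(purge)), worker_([this] { run(); }) {
        assert(purge_ != nullptr);
    }

    ~AsyncPurger() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            stop_ = true;
        }
        cv_.notify_one();
        worker_.join();
    }

    // Returns immediately; repeated requests before the worker wakes coalesce.
    void request_purge() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            pending_ = true;
        }
        cv_.notify_one();
    }

    int completed() const { return completed_.load(); }

private:
    void run() {
        std::unique_lock<std::mutex> lock(mu_);
        while (true) {
            cv_.wait(lock, [this] { return pending_ || stop_; });
            if (stop_) return;
            pending_ = false;
            lock.unlock();
            purge_();  // slow work happens without holding the lock
            ++completed_;
            lock.lock();
        }
    }

    std::function<void()> purge_;
    std::mutex mu_;
    std::condition_variable cv_;
    bool pending_ = false;
    bool stop_ = false;
    std::atomic<int> completed_{0};
    std::thread worker_;  // declared last so it starts after the other members exist
};
```

In this sketch a request that is still pending at shutdown is dropped rather than executed, which keeps the destructor fast.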
2023-12-22 12:05:20 +08:00
453e3c18f4
[refactor](buffer) remove download buffer since it is no longer useful ( #28832 )
2023-12-22 11:53:31 +08:00
0af6bd6390
[fix](group-commit) check if wal need recovery is abnormal ( #28769 )
2023-12-22 11:06:11 +08:00
172f68480b
[Enhancement](load) Limit the number of incorrect data drops and add documents ( #27727 )
...
During a load, if the source data has problems, the erroneous rows are stored in an error_log file on disk for later debugging. When there are many erroneous rows, this file can take up a lot of disk space, so this change limits how much error data is saved to disk.
* Become familiar with the usage of Doris's import function and its internal implementation.
* Add a new BE configuration item, load_error_log_limit_bytes, with a default value of 200MB.
* Use the new threshold to limit the amount of data that RuntimeState::append_error_msg_to_file writes to disk.
* Write regression cases for testing and verification.
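
The byte limit can be sketched as a writer that tracks how much it has written and drops messages once the budget is used up. This is an illustration only: `LimitedErrorLog` is a hypothetical class, not the logic inside RuntimeState::append_error_msg_to_file, and whether the real code truncates or drops the message that crosses the limit is not stated here.

```cpp
#include <cassert>
#include <cstddef>
#include <ostream>
#include <sstream>
#include <string>

// Writes error messages to a stream until a byte budget is exhausted.
class LimitedErrorLog {
public:
    LimitedErrorLog(std::ostream& out, std::size_t limit_bytes)
        : out_(out), limit_(limit_bytes) {}

    // Returns true if the message was written, false if it was dropped
    // because it would exceed the budget.
    bool append(const std::string& msg) {
        assert(written_ <= limit_);
        std::size_t need = msg.size() + 1;  // message plus trailing newline
        if (need > limit_ - written_) return false;
        out_ << msg << '\n';
        written_ += need;
        return true;
    }

    std::size_t written() const { return written_; }

private:
    std::ostream& out_;
    std::size_t limit_;
    std::size_t written_ = 0;
};
```

With a limit equivalent to load_error_log_limit_bytes, the error file stops growing once the budget is reached while the load itself continues.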
Co-authored-by: xy720 <22125576+xy720@users.noreply.github.com>
2023-12-22 10:43:18 +08:00
0b9b1be1f1
[fix](function) Fix from_second functions overflow and wrong result ( #28685 )
2023-12-22 10:22:49 +08:00
49eaf0cc32
[fix](partial update) only report error when in strict mode partial update when finding missing rowsets during flushing memtable ( #28764 )
...
related pr: #28062 , #28674 , #28677
fix #28677
2023-12-22 09:50:10 +08:00
5153137b83
[fix](metrics) fix bvar memtable_input_block_allocated_size ( #28725 )
2023-12-21 21:16:14 +08:00