6430 Commits

Author SHA1 Message Date
0436013baf [fix](decimal) fix cast decimal overflow and add test cases for casting decimalv2 to decimalv3 (#29165) 2023-12-27 20:58:37 +08:00
5f71691401 [fix](read) fix unexpected overflow of uninitialized column data in VStatisticsIterator::next_batch (#29141) 2023-12-27 20:58:02 +08:00
6d817bc253 [fix](topn opt) avoid using topn runtime predicate which segment does not contain such column(column unique id) when pruning segment (#29148) 2023-12-27 20:31:03 +08:00
c75e63a2a5 [Improvement](scan) Use scanner to do projection of scan node (#29124) 2023-12-27 16:00:52 +08:00
cd1e109cc3 [debug string](pipeline) Add necessary debug info (#29119) 2023-12-27 15:57:22 +08:00
2d2f14bc75 [fix](paimon) use SlotDescriptor to parse the required fields (#28990)
Before this PR, Paimon created the schema of `VectorTable` by accessing meta information. However, if the schema of `VectorTable` in Java is not the same as that of `Block` in C++, BE will crash, and there is no good way to troubleshoot the error.
2023-12-27 15:45:53 +08:00
cfed36afbf [Fix](topn opt) prevent from merge __TEMP__ column in segment iterator (#29121) 2023-12-27 15:42:48 +08:00
6f5672f318 [Refact](inverted index) refactor inverted index writer init (#29072) 2023-12-27 12:49:26 +08:00
3e5c8d9949 [fix](read) remove logic of estimating count of rows to read in segment iterator to avoid wrong result of unique key. (#29109) 2023-12-27 12:25:14 +08:00
9ff8bd2e9c [Enhancement](Wal)Support dynamic wal space limit (#27726) 2023-12-27 11:51:32 +08:00
6d26aca4ca [fix](pipeline) sort_merge should throw exception in has_next_block if got failed status (#29076)
The test at regression-test/suites/datatype_p0/decimalv3/test_decimalv3_overflow.groovy::249 sometimes fails when there are multiple BEs and the FE processes the status report slowly for some reason.

explain select k1, k2, k1 * k2 from test_decimal128_overflow2 order by 1,2,3
--------------

+----------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
| PLAN FRAGMENT 0                                                                                                            |
|   OUTPUT EXPRS:                                                                                                            |
|     k1[#5]                                                                                                                 |
|     k2[#6]                                                                                                                 |
|     (k1 * k2)[#7]                                                                                                          |
|   PARTITION: UNPARTITIONED                                                                                                 |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   VRESULT SINK                                                                                                             |
|      MYSQL_PROTOCAL                                                                                                        |
|                                                                                                                            |
|   111:VMERGING-EXCHANGE                                                                                                    |
|      offset: 0                                                                                                             |
|                                                                                                                            |
| PLAN FRAGMENT 1                                                                                                            |
|                                                                                                                            |
|   PARTITION: HASH_PARTITIONED: k1[#0], k2[#1]                                                                              |
|                                                                                                                            |
|   HAS_COLO_PLAN_NODE: false                                                                                                |
|                                                                                                                            |
|   STREAM DATA SINK                                                                                                         |
|     EXCHANGE ID: 111                                                                                                       |
|     UNPARTITIONED                                                                                                          |
|                                                                                                                            |
|   108:VSORT                                                                                                                |
|   |  order by: k1[#5] ASC, k2[#6] ASC, (k1 * k2)[#7] ASC                                                                   |
|   |  offset: 0                                                                                                             |
|   |                                                                                                                        |
|   102:VOlapScanNode                                                                                                        |
|      TABLE: regression_test_datatype_p0_decimalv3.test_decimal128_overflow2(test_decimal128_overflow2), PREAGGREGATION: ON |
|      partitions=1/1 (test_decimal128_overflow2), tablets=8/8, tabletList=22841,22843,22845 ...                             |
|      cardinality=6, avgRowSize=0.0, numNodes=1                                                                             |
|      pushAggOp=NONE                                                                                                        |
|      projections: k1[#0], k2[#1], (k1[#0] * k2[#1])                                                                        |
|      project output tuple id: 1                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------+
36 rows in set (0.03 sec)
Why it failed:

There are multiple BEs.
Fragments 0 and 1 MUST be on different BEs.
The pipeline task of VOlapScanNode, which executes k1*k2, fails and sets the query status to cancelled.
The pipeline task of VSort calls try close and sends the Cancelled status to VMergeExchange.
sort_cursor did not throw an exception when it met the error.
2023-12-27 10:06:01 +08:00
a8e6676640 [Bug](security) BE download_files function exists log print sensitive msg #28592 (#28594) 2023-12-26 21:59:47 +08:00
6440fbfab6 [feature](scan) Implement parallel scanning by dividing the tablets based on the row range (#28967)
* [feature](scan) parallel scan on dup/mow mode

* fix bugs
2023-12-26 17:18:41 +08:00
4a60d01dc7 [improve](move-memtable) increase load_stream_flush_token_max_tasks (#29011) 2023-12-26 17:08:49 +08:00
1964a77d6c [enhencement](config) change default memtable size & loadStreamPerNode & default load parallelism (#28977)
We change memtable size from 200MB to 100MB to achieve smoother flush
performance. We change loadStreamPerNode from 20 to 60 to prevent stream
RPC from becoming the bottleneck when memtable_on_sink_node is enabled. We
change the default s3 & broker load parallelism to make the most of CPUs on
modern multi-core systems.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-12-26 16:22:52 +08:00
31db633624 [improve](load) add profile for WaitFlushLimitTime (#29013) 2023-12-26 15:41:54 +08:00
52eeee347f [opt](compound) Optimize by deleting the compound expr after obtaining the final result (#28934) 2023-12-26 14:10:53 +08:00
c8ed14f11c [enhance](tablet) Reduce log in tablet meta (#28719) 2023-12-26 13:37:30 +08:00
92660bb1b2 [chore](config) modify variant_ratio_of_defaults_as_sparse_column from 0.95 to 1 (#28984)
since sparse column is not stable at present
2023-12-26 10:24:43 +08:00
f30e50676e [opt](scanner) optimize the number of threads of scanners (#28640)
1. Remove `doris_max_remote_scanner_thread_pool_thread_num`, use `doris_scanner_thread_pool_thread_num` only.
2. Set the default value of `doris_scanner_thread_pool_thread_num` to `std::max(48, CpuInfo::num_cores() * 4)`
2023-12-26 10:24:12 +08:00
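Note on f30e50676e: the new default is a floor of 48 threads that grows with the core count. A minimal sketch of that formula, assuming `default_scanner_threads` as a hypothetical helper name and passing the core count as a parameter in place of `CpuInfo::num_cores()`; this is an illustration, not the actual Doris code:

```cpp
#include <algorithm>

// Hypothetical helper illustrating the default described in the commit body.
// `cores` stands in for the value CpuInfo::num_cores() would return.
constexpr int default_scanner_threads(int cores) {
    // Never fewer than 48 threads; otherwise four threads per core.
    return std::max(48, cores * 4);
}
```

Under this formula, machines with 12 or fewer cores stay at 48 threads, while a 32-core machine gets 128.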
75a45484b6 [chore](config) modify tablet_schema_cache_recycle_interval from 24h to 1h (#28980)
To prevent too many tablet schema cache entries from accumulating in memory and causing
performance issues while the lock is held to erase items
2023-12-26 00:34:58 +08:00
cefae3dc90 [bug](storage) Fix gc rowset bug (#28979) 2023-12-26 00:29:03 +08:00
137f785698 [fix](parquet_reader) misused bool pointer (#28986)
Signed-off-by: pengyu <pengyu@selectdb.com>
2023-12-25 22:58:08 +08:00
c2c5df9341 [opt](assert_num_rows) support filter in AssertNumRows operator and fix some explain (#28935)
* NEED

* Update pipeline x

* fix pipelinex compile
2023-12-25 22:47:23 +08:00
0af9371a96 [fix](hash join) fix column ref DCHECK failure of hash join node block mem reuse (#28991)
Introduced by #28851, after evaluating build side expr, some columns in resulting block may be referenced more than once in the same block.

e.g. coalesce(col_a, 'string'): if col_a is nullable but actually contains no null values, the coalesce function will insert a new nullable column that references the original col_a.
2023-12-25 22:19:01 +08:00
7081139bdc [fix](block) fix be core while mutable block merge may cause different row size between columns in origin block (#27943) 2023-12-25 20:35:22 +08:00
91e5b47439 [fix](hdfs) Fix HdfsFileSystem::exists_impl crash (#28952)
Calling hdfsGetLastExceptionRootCause without initializing ThreadLocalState
will crash. This PR modifies the condition for determining whether an
hdfs file exists: because hdfsExists sets errno to ENOENT when the file does
not exist, we can use this condition to check whether a file exists rather
than checking for the existence of the root cause.
2023-12-25 19:18:01 +08:00
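Note on 91e5b47439: the errno-based check described in the commit body can be sketched as a small classification step. This is a minimal sketch under stated assumptions, not the code from the PR: `ExistsResult` and `classify_exists` are hypothetical names, `rc` is assumed to be the return code of an hdfsExists-style call (0 meaning the path exists), and `err` is assumed to be the errno value captured immediately after that call.

```cpp
#include <cerrno>

// Hypothetical result type for an existence check.
enum class ExistsResult { kExists, kNotFound, kError };

// Hypothetical helper: interprets the return code and errno of an
// hdfsExists-style call without consulting any exception root cause.
constexpr ExistsResult classify_exists(int rc, int err) {
    return rc == 0          ? ExistsResult::kExists    // call succeeded: path exists
         : err == ENOENT    ? ExistsResult::kNotFound  // failure with ENOENT: no such file
                            : ExistsResult::kError;    // any other failure is a real error
}
```

Keeping the decision on `rc` and `errno` alone means the thread-local exception state is never read on this path, which is the crash the commit describes avoiding.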
c2eabbd441 [fix](load) fix nullptr when getting memtable flush running count (#28942)
* [fix](load) fix nullptr when getting memtable flush running count

* style
2023-12-25 13:49:18 +08:00
e9e1e2894b [performance](variant) support topn 2phase read for variant column (#28318)
2023-12-25 11:50:41 +08:00
f374beaa4e [fix](log) regularise some BE error type and fix a load task check #28729 2023-12-25 10:45:19 +08:00
3273e0e635 [refactor](pipelineX)do not override dependency() function in pipelineX (#28848) 2023-12-25 10:36:31 +08:00
24b1b4d96b [fix](pipelineX) fix use global rf when there no shared_scans (#28869) 2023-12-25 10:35:22 +08:00
e326ebb63e [feature](pipelineX) control exchange sink by memory usage (#28814) 2023-12-25 10:31:50 +08:00
d42fd68d6b [opt](invert index) Empty strings are not written to the index in the case of TOKENIZED (#28822) 2023-12-25 10:23:07 +08:00
b7ae7a07c7 [fix](join) incorrect result of left semi/anti join with empty build side (#28898) 2023-12-25 09:07:38 +08:00
bade50db56 [chore](test) Add testing util sync point (#28924) 2023-12-24 21:59:11 +08:00
145683ccdb [improvement](group commit) make get column function more reliable when replaying wal (#28900) 2023-12-24 21:17:39 +08:00
1545c36d16 Revert "[bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727)" (#28931)
This reverts commit 4066de375efe6ff8e156a61df4f9316b3d9eaa4e.
2023-12-24 20:37:33 +08:00
db1da161f5 [optimize](zonemap) skip zonemap if predicate does not support_zonemap (#28595)
* [optimize](zonemap) skip zonemap if predicate does not support_zonemap #27608 (#28506)
2023-12-24 19:34:13 +08:00
dfbf082e06 [fix](merge-on-write) migration may cause duplicate keys for mow table (#28923) 2023-12-23 23:37:00 +08:00
96d4778f2e [fix](parquet) the end offset of column chunk may be wrong in parquet metadata (#28891) 2023-12-23 22:21:04 +08:00
de6c7a792e [fix](chore) update dcheck to avoid core during stress test (#28895) 2023-12-23 18:49:57 +08:00
2014396707 [fix](block) add block columns size dcheck (#28539) 2023-12-23 15:21:53 +08:00
e51f75e424 [FIX](map)fix map with rowstore table (#28877) 2023-12-23 12:11:06 +08:00
4066de375e [bugfix](scannercore) scanner will core in deconstructor during collect profile (#28727) 2023-12-23 11:09:46 +08:00
43776465d9 [fix](segcompaction) disable segcompaction by default (#28906)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-12-23 07:43:41 +08:00
3b830f89a7 [improve](move-memtable) avoid using heavy work pool during append data (#28745) 2023-12-22 22:51:30 +08:00
f781f0cf24 [improve](load) limit delta writer flush task parallelism (#28883) 2023-12-22 21:50:56 +08:00
b1c5747f56 [improve](load) remove extra layer of heavy work pool in tablet_writer_add_block (#28550) 2023-12-22 20:10:50 +08:00
18c9ebce95 [improve](move-memtable) tweak load stream flush token num and max tasks (#28884) 2023-12-22 20:08:47 +08:00