Currently, rowids may be heavily fragmented after `_get_row_ranges_by_column_conditions`, which can incur high CPU costs when processing these scattered rowid ranges.
This PR enhances the `SegmentIterator` by eliminating the initial range read in the `BitmapRangeIterator` constructor and introducing a `read_batch_rowids` method to both the `BitmapRangeIterator` and `BackwardBitmapRangeIterator` classes, removing redundant read operations and thereby reducing execution time.
Moreover, to avoid unnecessary point reads when the range is relatively complete, we employ a simple `is_continuous` check: if the block of rowids is continuous, we call `next_batch` instead of `read_by_rowids`, streamlining the processing of consecutive rowids.
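As a rough illustration, here is a minimal sketch of that dispatch, assuming a simplified reader interface (`seek_to_rowid` and the stub bodies are stand-ins; only `next_batch` and `read_by_rowids` are named after the methods discussed above, and the actual Doris signatures differ):

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Stand-in for the column reader; the real Doris interfaces differ.
struct ColumnReader {
    void seek_to_rowid(uint32_t rowid) { /* position the reader */ }
    void next_batch(size_t n) { /* one sequential read of n consecutive rows */ }
    void read_by_rowids(const std::vector<uint32_t>& rowids) { /* scattered point reads */ }
};

// A sorted batch of distinct rowids is continuous iff it covers
// [front, back] without gaps.
static bool is_continuous(const std::vector<uint32_t>& rowids) {
    return !rowids.empty() &&
           rowids.back() - rowids.front() + 1 == rowids.size();
}

// Dispatch: one cheap sequential read for a gap-free batch,
// scattered reads otherwise.
void read_batch(ColumnReader& reader, const std::vector<uint32_t>& rowids) {
    if (rowids.empty()) return;
    if (is_continuous(rowids)) {
        reader.seek_to_rowid(rowids.front());
        reader.next_batch(rowids.size());
    } else {
        reader.read_by_rowids(rowids);
    }
}
```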
We selected three SQL statements to test the effect of the optimization:
1. ```select COUNT() from wc_httplogs_inverted_index where request match "images" and (size >= 10 and status = 200);```
2. ```select COUNT() from wc_httplogs_inverted_index where request match "HTTP" and (size >= 10 and status = 200);```
3. ```select COUNT() from wc_httplogs_inverted_index where request match "GET" and (size >= 10 and status = 200);```
- The first SQL statement represents the scenario this PR primarily optimizes: the first read matches a large number of rows, but they are highly fragmented.
- The second SQL statement represents a scenario where the first read fully hits, mainly to verify that the PR causes no performance degradation when a complete rowid range is hit.
- The third SQL statement represents a near-total hit with only occasional misses, used to check that the PR does not regress when the rowid range contains many continuous runs.
The results are as follows:
| SQL | Execution time (before → after) | FirstReadTime (before → after) |
|-----|---------------------------------|--------------------------------|
| 1st | 0.32 sec → 0.16 sec             | 6s628ms → 1s604ms              |
| 2nd | 0.16 sec → 0.15 sec             | 682.816ms → 635.156ms          |
| 3rd | 0.16 sec → 0.16 sec             | 787.904ms → 798.861ms          |
This commit overhauls the JDBC connector logic, replacing the previous mechanism of fetching data through per-item JNI calls on the ResultSet with a more efficient, unified approach based on the VectorTable data structure.
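To make the contrast concrete, here is a hypothetical before/after sketch; all types and functions below are invented for illustration, since the real connector goes through JNI and Doris's execution blocks:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Invented stand-in for the JNI-backed result set.
struct JniResultSet {
    std::vector<std::vector<std::string>> rows;
    size_t pos = 0;
    bool next() { return pos++ < rows.size(); }  // models one JNI call per row
    std::string get_string(int col) { return rows[pos - 1][col]; }  // one JNI call per cell
};

// Invented stand-in for a VectorTable batch: one (modeled) JNI call
// materializes the whole batch into columnar buffers.
struct VectorTable {
    size_t num_rows = 0;
    std::vector<std::vector<std::string>> columns;
};

// Before: O(rows x cols) JNI boundary crossings.
std::vector<std::string> read_per_item(JniResultSet& rs, int num_cols) {
    std::vector<std::string> out;
    while (rs.next())
        for (int c = 0; c < num_cols; ++c) out.push_back(rs.get_string(c));
    return out;
}

// After: the C++ side consumes each column in bulk.
std::vector<std::string> read_vector_table(const VectorTable& batch) {
    std::vector<std::string> out;
    for (const auto& col : batch.columns)
        out.insert(out.end(), col.begin(), col.end());
    return out;
}
```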
1. The closure should be managed by a `unique_ptr` and released by brpc; it should not be held by our code. If our code held it, we would have to wait for brpc to finish during cancel or close.
2. The closure should be exception safe: if any exception occurs, it must not leak memory.
3. A dedicated callback interface is implemented on the Doris side, so we can write arbitrary logic while Doris manages the callback's lifecycle.
4. A `weak_ptr` connects the callback and the closure, so if the callback is destructed before the closure's `Run()` is invoked, the process must not crash.
Fixes bug #24059.
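A minimal sketch of points 1 to 4 above, with simplified names (in the real code the closure derives from `google::protobuf::Closure` and is handed to brpc, which invokes `Run()` exactly once):

```cpp
#include <memory>
#include <utility>

// Hypothetical callback interface implemented on the Doris side; Doris owns
// the callback object and manages its lifecycle (point 3).
struct Callback {
    virtual ~Callback() = default;
    virtual void call() = 0;
};

// The closure is created on the heap and handed to brpc; our code never
// holds it (point 1).
class Closure {  // derives from google::protobuf::Closure in real code
public:
    explicit Closure(std::weak_ptr<Callback> cb) : _cb(std::move(cb)) {}

    void Run() {
        // Taking ownership in a unique_ptr makes Run() exception safe:
        // the closure is freed even if the callback throws (point 2).
        std::unique_ptr<Closure> self(this);
        // weak_ptr decouples the lifetimes (point 4): if the callback was
        // destructed before Run(), lock() fails and we simply do nothing.
        if (auto cb = _cb.lock()) {
            cb->call();
        }
    }

private:
    std::weak_ptr<Callback> _cb;
};
```

With this scheme, cancel or close only drops the `shared_ptr` to the callback; any in-flight closure still runs, fails to lock the `weak_ptr`, and frees itself.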
Added information_schema scanner tests for the following tables:
- `files`
- `schema_privileges`
- `table_privileges`
- `partitions`
- `rowsets`
- `statistics`
- `table_constraints`

With `infodb_support_ext_catalog=false`, the tests currently cover all tables under the `information_schema` database.
Optimize the ORC/Parquet string dictionary filter in the not_single_conjunct case: filter the block first by dictionary code, then by the not_single_conjunct. Because dictionary codes are integers, they filter faster than strings.
For example:
```
select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
```
`l_receiptdate` and `l_shipmode` use string dictionary filtering, and `l_commitdate < l_receiptdate` is a not_single_conjunct that contains a dictionary-filtered column. With this change, the cheap integer dictionary-code filter runs first, so the more expensive conjunct is evaluated only on the rows that survive it.
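A simplified sketch of the two-phase filtering; the data layout and names are invented for illustration, and the real ORC/Parquet reader code differs:

```cpp
#include <cstdint>
#include <functional>
#include <vector>

// One dictionary-encoded string column: each row stores an integer code
// indexing into the (not shown) string dictionary.
struct DictColumn {
    std::vector<int32_t> codes;     // per-row dictionary code
    std::vector<char> code_passes;  // per dictionary entry: does the single-slot
                                    // predicate (e.g. l_shipmode IN (...)) hold?
};

// Phase 1 prunes rows with cheap integer lookups on the dict codes;
// phase 2 runs the not_single_conjunct only on the survivors.
std::vector<uint32_t> filter_rows(
        const DictColumn& col, uint32_t num_rows,
        const std::function<bool(uint32_t)>& not_single_conjunct) {
    std::vector<uint32_t> selected;
    selected.reserve(num_rows);
    for (uint32_t row = 0; row < num_rows; ++row) {
        if (col.code_passes[col.codes[row]]) selected.push_back(row);
    }
    std::vector<uint32_t> out;
    for (uint32_t row : selected) {
        if (not_single_conjunct(row)) out.push_back(row);  // expensive check, few rows
    }
    return out;
}
```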
### Test Result:
Before:
```
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (6.87 sec)
```
After:
```
mysql> select count(l_receiptdate) from lineitem_date_as_string where l_shipmode in ('MAIL', 'SHIP') and l_commitdate < l_receiptdate and l_receiptdate >= '1994-01-01' and l_receiptdate < '1995-01-01';
+----------------------+
| count(l_receiptdate) |
+----------------------+
|             49314694 |
+----------------------+
1 row in set (4.85 sec)
```
For better performance and elasticity, we move the memtable from the load channel to the sink. VTabletSinkV2 is introduced, so now both VTabletWriter and VTabletSinkV2 distribute rows to tablets. WHERE clauses on materialized views are executed in VTabletWriter, but VTabletSinkV2 needs them as well, so the common code is moved into row distribution.
Layering the code along the rows' data flow makes it much easier to understand and maintain:
`ScanNode -> Sink/Writer (RowDistribution -> IndexChannel / DeltaWriter)`
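A rough sketch of this layering, with class shapes heavily simplified; the point is only that both writers share the same `RowDistribution` step:

```cpp
#include <cstdint>
#include <vector>

struct Row {};  // a row produced by the ScanNode
struct TabletTarget { int64_t tablet_id; };

// Shared layer: applies per-materialized-view WHERE clauses and decides
// which tablet each row goes to, independent of the sink flavor.
class RowDistribution {
public:
    std::vector<TabletTarget> distribute(const std::vector<Row>& rows) {
        std::vector<TabletTarget> targets(rows.size());
        // ... evaluate mv WHERE clauses, hash/partition rows to tablets ...
        return targets;
    }
};

// Both writers reuse the same distribution step and differ only in the
// downstream component that receives the rows.
class VTabletWriter {  // rows -> RowDistribution -> IndexChannel
    RowDistribution _row_distribution;
};
class VTabletSinkV2 {  // rows -> RowDistribution -> DeltaWriter
    RowDistribution _row_distribution;
};
```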