doris

Author	SHA1	Message	Date
Xinyi Zou	ef031c5fb2	[branch-2.1](memory) Fix reserve memory compatible with memory GC and logging (#37682 ) pick #36307 #36412	2024-07-12 11:43:26 +08:00
Sun Chenyang	4dc933bb28	[cherry-pick] (branch-2.1) fix query errors caused by ignore_above (#37685 ) ## Proposed changes pick from master #37679	2024-07-12 09:31:45 +08:00
Jerry Hu	87912de93f	[fix](scan) catch exceptions thrown in scanner (#36101 ) (#37408 ) ## Proposed changes pick #36101 The uncaught exceptions thrown in the scanner will cause the BE to crash.	2024-07-12 08:49:39 +08:00
Sun Chenyang	79a208259e	[cherry-pick] (branch-2.1) Remove the check for inverted index file exists #36945 (#37423 )	2024-07-11 21:35:52 +08:00
lihangyu	217eac790b	[pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526 )	2024-07-11 21:25:34 +08:00
Xinyi Zou	cf2fb6945a	[branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658 ) pick #36235 #35965	2024-07-11 21:04:01 +08:00
Xinyi Zou	62e0230523	[branch-2.1](memory) Add `ThreadMemTrackerMgr` BE UT (#37654 ) ## Proposed changes pick #35518	2024-07-11 21:03:49 +08:00
Kaijie Chen	fed632bf4a	[fix](move-memtable) check segment num when closing each tablet (#36753 ) (#37536 ) cherry-pick #36753 and #37660	2024-07-11 20:33:44 +08:00
Xinyi Zou	e66ffc1b6d	[branch-2.1](arrow-flight-sql) Fix pipelineX Unknown result sink type (#37540 ) pick #35804	2024-07-11 12:30:46 +08:00
Luwei	9f4e7346fb	[fix](compaction) fixing the inaccurate statistics of concurrent compaction tasks (#37318 ) (#37496 )	2024-07-10 22:23:25 +08:00
Kaijie Chen	741807bb22	[performance](move-memtable) only call _select_streams when necessary (#35576 ) (#37406 ) cherry-pick #35576	2024-07-10 22:20:23 +08:00
walter	afcc6170f6	[fix](txn_manager) Add ingested rowsets to unused rowsets when removing txn (#37417 ) Generally speaking, as long as a rowset has a version, it can be considered not to be in a pending state. However, if the rowset was created through ingesting binlogs, it will have a version but should still be considered in a pending state because the ingesting txn has not yet been committed. This PR updates the condition for determining the pending state. If a rowset is COMMITTED, the txn should be allowed to roll back even if a version exists. Cherry-pick #36551	2024-07-10 14:25:44 +08:00
Xinyi Zou	0cdb371624	[branch-2.1](memory) Disable Arrow Jemalloc step 2 (#37556 ) pick #37533	2024-07-10 11:34:18 +08:00
deardeng	5247e0ff3a	[fix](clone) Increase robustness for clone #36642 (#37413 ) cherry pick from #36642	2024-07-09 17:18:14 +08:00
Xin Liao	7cda8db020	[fix](load) The NodeChannel should be canceled when failed to add block #37500 (#37527 ) cherry pick from #37500	2024-07-09 17:01:04 +08:00
Xinyi Zou	f7f0c20f00	[branch-2.1](cgroup memory) Correct cgroup mem info cache (#37440 ) pick #36966 Co-authored-by: Hongkun Xu <xuhongkun666@163.com>	2024-07-09 16:19:37 +08:00
Kaijie Chen	005304953e	[performance](load) do not copy input_block in memtable (#36939 ) (#37407 ) cherry-pick #36939	2024-07-09 15:59:44 +08:00
Mingyu Chen	81360cf897	[opt](test) shorten the external p0 running time (#37320 ) (#37473 ) bp #37320	2024-07-09 15:35:15 +08:00
Luwei	3337c1bbe3	[enhancement](compaction) adjust compaction concurrency based on compaction score and workload (#37491 ) adjust compaction concurrency based on compaction score and workload #36672 fix null pointer when retrieving CPU load average #37171	2024-07-09 09:56:35 +08:00
meiyi	1e3ab0ff8c	[fix](group commit) Pick make group commit cancel in time (#36249 ) (#37404 ) pick https://github.com/apache/doris/pull/36249/	2024-07-09 09:25:11 +08:00
meiyi	1a25270918	[fix](group commit) Pick Fix the incorrect group commit count in log; fix the core in get_first_block (#36408 ) (#37405 ) Pick https://github.com/apache/doris/pull/36408/	2024-07-09 09:24:43 +08:00
amory	0a103aa11f	[improve](json)improve json support empty keys #36762 (#37351 )	2024-07-08 19:04:51 +08:00
walter	5280e277e7	[chore](be) Acquire and check MD5 digest of the file to download (#37418 ) Cherry-pick #35807, #36621, #36726	2024-07-08 18:55:35 +08:00
zhannngchen	494b54a5a5	[enhancement](trash) support skip trash, update trash default expire time (#37170 ) (#37409 ) cherry-pick #37170	2024-07-08 15:33:02 +08:00
hui lai	c66df8d9e6	[branch-2.1](load) fix no error url if no partition can be found (#36831 ) (#37401 ) ## Proposed changes pick #36831 before ``` Stream load result: { "TxnId": 2014, "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf", "Comment": "", "TwoPhaseCommit": "false", "Status": "Fail", "Message": "[DATA_QUALITY_ERROR]Encountered unqualified data, stop processing", "NumberTotalRows": 1, "NumberLoadedRows": 1, "NumberFilteredRows": 0, "NumberUnselectedRows": 0, "LoadBytes": 1669, "LoadTimeMs": 58, "BeginTxnTimeMs": 0, "StreamLoadPutTimeMs": 10, "ReadDataTimeMs": 0, "WriteDataTimeMs": 47, "CommitAndPublishTimeMs": 0 } ``` after ``` Stream load result: { "TxnId": 2014, "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf", "Comment": "", "TwoPhaseCommit": "false", "Status": "Fail", "Message": "[DATA_QUALITY_ERROR]too many filtered rows", "NumberTotalRows": 1, "NumberLoadedRows": 0, "NumberFilteredRows": 1, "NumberUnselectedRows": 0, "LoadBytes": 1669, "LoadTimeMs": 58, "BeginTxnTimeMs": 0, "StreamLoadPutTimeMs": 10, "ReadDataTimeMs": 0, "WriteDataTimeMs": 47, "CommitAndPublishTimeMs": 0, "ErrorURL": "http://XXXX:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_c6461270125a615b-2873833fb48d56a3_c6461270125a615b_2873833fb48d56a3" } ```	2024-07-08 10:41:33 +08:00
hui lai	dd18652861	[branch-2.1](routine-load) make get Kafka meta timeout configurable (#37399 ) pick #36619	2024-07-08 10:39:17 +08:00
meiyi	70f46c12b3	[improve](group commit) Pick Modify group commit case and modify cancel status (#35995 ) (#37398 ) Pick https://github.com/apache/doris/pull/35995	2024-07-08 10:27:08 +08:00
bobhan1	a05406ecc9	[branch-2.1] Picks "[Fix](delete) Fix delete job timeout when executing delete from ... #37363 " (#37374 ) ## Proposed changes picks https://github.com/apache/doris/pull/37363	2024-07-07 18:33:17 +08:00
hui lai	423483ed8f	[branch-2.1](routine-load) optimize out of range error message (#37391 ) ## Proposed changes pick #36450 before ``` ErrorReason{code=errCode = 105, msg='be 10002 abort task, task id: d846f3d3-7c9e-44a7-bee0-3eff8cd11c6f job id: 11310 with reason: [INTERNAL_ERROR]Offset out of range, 0# doris::Status doris::Status::Error<6, true>(std::basic_string_view<char, std::char_traits<char> >) at /mnt/disk1/laihui/doris/be/src/common/status.h:422 1# doris::Status doris::Status::InternalError<true>(std::basic_string_view<char, std::char_traits<char> >) at /mnt/disk1/laihui/doris/be/src/common/status.h:468 2# doris::KafkaDataConsumer::group_consume(doris::BlockingQueue<RdKafka::Message>, long) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/data_consumer.cpp:226 3# doris::KafkaDataConsumerGroup::actual_consume(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message>, long, std::function<void (doris::Status const&)>) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/data_consumer_group.cpp:200 4# void std::__invoke_impl<void, void (doris::KafkaDataConsumerGroup::&)(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message>, long, std::function<void (doris::Status const&)>), doris::KafkaDataConsumerGroup&, std::shared_ptr<doris::DataConsumer>&, doris::BlockingQueue<RdKafka::Message>&, long&, doris::KafkaDataConsumerGroup::start_all(std::shared_ptr<doris::StreamLoadContext>, std::shared_ptr<doris::io::KafkaConsumerPipe>)::$_0&>(std::__invoke_memfun_deref, void (doris::KafkaDataConsumerGroup::&)(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message>, long, std::function<void (doris::Status const&)>), doris::KafkaDataConsumerGroup&, std::shared_ptr<doris::DataConsumer>&, doris::BlockingQueue<RdKafka::Message>&, long&, doris::KafkaDataConsumerGroup::start_all(std::shared_ptr<doris::StreamLoadContext>, std::shared_ptr<doris::io::KafkaConsumerPipe>)::$_0&) at /mnt/disk1/laihui/build/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74 ... ``` now ``` ErrorReason{code=errCode = 105, msg='be 10002 abort task, task id: 3ba0c0f4-d13c-4dfa-90ce-3df922fd9340 job id: 11310 with reason: [INTERNAL_ERROR]Offset out of range, consume partition 0, consume offset 100, the offset used by job does not exist in kafka, please check the offset, using the Alter ROUTINE LOAD command to modify it, and resume the job'} ```	2024-07-07 18:29:04 +08:00
abmdocrt	89857d3780	[cherry-pick](branch-2.1) Pick "Use async group commit rpc call (#36499 )" (#37380 ) ## Proposed changes Pick #36499	2024-07-07 18:28:19 +08:00
abmdocrt	7d423b3a6a	[cherry-pick](branch-2.1) Pick "[Fix](group commit) Fix group commit block queue mem estimate fault" (#37379 ) Pick [Fix](group commit) Fix group commit block queue mem estimate fault #35314 ## Proposed changes Problem: When `group commit=async_mode` and NULL data is imported into a `variant` type column, the memory statistics used for group commit backpressure become incorrect, and the load gets stuck. Cause: In group commit mode, blocks are first added to a queue in batches using `add block`, and then blocks are retrieved from the queue using `get block`. To track memory usage during backpressure, we add the block size to the memory statistics during `add block` and subtract the block size from the memory statistics during `get block`. However, for `variant` types, during the `add block` write to WAL, serialization occurs, which can merge types (e.g., merging `int` and `bigint` into `bigint`), thereby changing the block size. This results in a discrepancy between the block size during `get block` and `add block`, causing memory statistics to overflow. Solution: Record the block size at the time of `add block` and use this recorded size during `get block` instead of the actual block size. This ensures consistency in the memory addition and subtraction.	2024-07-07 18:27:49 +08:00
hui lai	61bc624938	[branch-2.1](move-memtable) fix move memtable core when use multi table load (#37370 ) ## Proposed changes pick https://github.com/apache/doris/pull/35458	2024-07-07 18:25:00 +08:00
hui lai	f2693152bb	[fix](multi-table-load) fix be core when multi table load pipe finish fail (#37383 ) pick #36269	2024-07-07 18:24:16 +08:00
zzzxl	c399a0e216	[opt](inverted index) reduce generation of the rowid_result if not necessary #35357 (#36569 )	2024-07-06 21:33:03 +08:00
bobhan1	38b3870fe8	[branch-2.1] Picks "[fix](autoinc) Fix AutoIncrementGenerator and add more logs about auto-increment column #37306 " (#37366 ) ## Proposed changes picks https://github.com/apache/doris/pull/37306	2024-07-06 16:53:29 +08:00
Gabriel	a803e1493a	[pipeline](fix) Set upstream operators always runnable once source operator closed (#37297) (#37325 ) Some kinds of source operators have a 1-1 relationship with a sink operator (such as AnalyticOperator). We must ensure AnalyticSinkOperator will not be blocked if AnalyticSourceOperator is already closed. pick #37297	2024-07-05 13:54:34 +08:00
qiye	f8cee439b6	[feature](ES Catalog) map nested/object type in ES to JSON type in Doris (#37101 ) (#37182 ) backport #37101	2024-07-05 10:48:32 +08:00
daidai	c8978fc9d1	[fix](HadoopLz4BlockCompression)Fixed the bug that HadoopLz4BlockCompression creates _decompressor every time it decompresses.(#37187 ) (#37299 ) bp : #37187	2024-07-04 20:22:27 +08:00
Pxl	e2c2702dff	[Bug](runtime-filter) fix some rf error problems (#37155 ) ## Proposed changes pick from #37273	2024-07-04 20:03:46 +08:00
wangbo	b272247a57	[pick]log thread num (#37258 ) ## Proposed changes pick #37159	2024-07-04 15:27:52 +08:00
Mingyu Chen	ceef9ee123	[feature](serde) support presto compatible output format (#37039 ) (#37253 ) bp #37039	2024-07-04 13:56:05 +08:00
TengJianPing	fb344b66ca	[fix](hash join) fix numeric overflow when calculating hash table bucket size #37193 (#37213 ) ## Proposed changes Bp #37193	2024-07-04 11:12:52 +08:00
Gabriel	4532ba990a	[fix](pipeline) Avoid to close task twice (#36747 ) (#37115 )	2024-07-04 10:02:56 +08:00
Pxl	70e1c563b3	[Chore](runtime-filter) enlarge sync filter size rpc timeout limit (#37103 ) (#37225 ) pick from #37103	2024-07-03 21:02:26 +08:00
Pxl	ffc57c9ef4	[Bug](runtime-filter) fix brpc ctrl use after free (#37223 ) part of #35186	2024-07-03 21:01:50 +08:00
zhannngchen	97945af947	[fix](merge-on-write) when full clone failed, duplicate key might occur (#37001 ) (#37229 ) cherry-pick #37001	2024-07-03 19:48:10 +08:00
Tiewei Fang	0aeb768bf9	[Fix](export/outfile) Support compression when exporting data to Parquet / ORC. (#37167 ) bp: #36490	2024-07-03 10:53:57 +08:00
Tiewei Fang	bd24a8bdd9	[Fix](csv_reader) Add a session variable to control whether empty rows in CSV files are read as NULL values (#37153 ) bp: #36668	2024-07-02 22:12:17 +08:00
Mingyu Chen	e25717458e	[opt](catalog) add some profile for parquet reader and change meta cache config (#37040 ) (#37146 ) bp #37040	2024-07-02 20:58:43 +08:00
wangbo	f5572ac732	[pick]reset memtable flush thread num (#37092 ) ## Proposed changes pick #37028	2024-07-02 19:20:17 +08:00


7932 Commits
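
Note on 7d423b3a6a: the fix described in that commit message (record the block size at `add block`, subtract that same recorded size at `get block`) can be illustrated with a small sketch. This is not the Doris implementation, which lives in the C++ backend; `Block`, `BlockQueue` and `tracked_bytes` are hypothetical names invented for this illustration, and the size change is simulated by hand rather than produced by real serialization.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Block:
    """Hypothetical stand-in for a data block whose size can change after enqueue."""
    size: int


@dataclass
class BlockQueue:
    """Toy queue that tracks the bytes it holds, as a backpressure counter would."""
    tracked_bytes: int = 0
    _items: deque = field(default_factory=deque)

    def add_block(self, block: Block) -> None:
        # Remember the size seen at enqueue time next to the block itself.
        recorded = block.size
        self._items.append((block, recorded))
        self.tracked_bytes += recorded

    def get_block(self) -> Block:
        # Subtract the recorded size, not the block's current size,
        # so every addition is matched by an equal subtraction.
        block, recorded = self._items.popleft()
        self.tracked_bytes -= recorded
        return block


queue = BlockQueue()
block = Block(size=100)
queue.add_block(block)
block.size = 140  # simulated: a type merge during serialization grew the block
queue.get_block()
remaining = queue.tracked_bytes  # counter returns to zero despite the size change
```

Had `get_block` subtracted `block.size` instead, the counter in this sketch would end at -40, which mirrors the drift between `add block` and `get block` that the commit message describes.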