Commit Graph

564 Commits

0571342538 [fix](sink) The issue with 2GB limit of protocol buffer (#37990) (#39112)
```
Fail to serialize doris.PFetchDataResult
```

If the size of `PFetchDataResult` is greater than 2 GB, protocol buffer
cannot serialize the message.
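
A minimal sketch of guarding against that limit before serialization; the helper name and fallback behavior here are assumptions, not the actual fix in #37990:

```
#include <google/protobuf/message.h>

#include <cstdint>
#include <limits>
#include <string>

// Hypothetical guard: protobuf refuses to serialize a message whose
// serialized size exceeds INT_MAX (~2 GiB), so check first and let the
// caller fall back (e.g. return an error to the client).
bool try_serialize(const google::protobuf::Message& msg, std::string* out) {
    if (msg.ByteSizeLong() >
        static_cast<uint64_t>(std::numeric_limits<int32_t>::max())) {
        return false;  // message too large for a single protobuf
    }
    return msg.SerializeToString(out);
}
```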

pick #37990
2024-08-09 04:01:56 +08:00
0a3874f203 [fix](move-memtable) close stream when cancel load stream stub (#38912) (#39039)
backport #38912
2024-08-07 23:24:00 +08:00
3abb222064 [fix](group commit) Fix test_group_commit_async_wal_msg_fault_injection case (#35313) (#38911)
pick https://github.com/apache/doris/pull/35313
2024-08-06 17:57:22 +08:00
70a518e099 [Fix](multi-catalog) Fix not throw error when call close() in hive/iceberg writer. (#38902)
## Proposed changes

When the file writer's close() is called, it syncs its buffer to commit
the data. As a result, some data is written only at close() time, which
can expose errors (for example, in hdfs_file_writer). These errors
therefore need to be captured throughout the entire close process.
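
A minimal sketch, with hypothetical `Status`/`FileWriter` types rather than the Doris classes, of surfacing a failure raised during close() instead of swallowing it:

```
#include <string>

// Hypothetical types for illustration; not the Doris classes.
struct Status {
    bool ok = true;
    std::string msg;
};

class FileWriter {
public:
    virtual ~FileWriter() = default;
    // Buffered data may only reach storage here (e.g. an HDFS writer
    // syncing its buffer), so close() itself can fail.
    virtual Status close() = 0;
};

Status close_and_check(FileWriter& writer) {
    Status st = writer.close();
    if (!st.ok) {
        return st;  // propagate instead of ignoring the close error
    }
    return Status{};
}
```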
2024-08-06 08:51:12 +08:00
91f0301b43 [fix](group commit) Pick some group commit pr (#38320)
Pick https://github.com/apache/doris/pull/38292,
https://github.com/apache/doris/pull/34021,
https://github.com/apache/doris/pull/38228, and parts of
https://github.com/apache/doris/pull/37260 and
https://github.com/apache/doris/pull/37595
2024-07-25 17:32:44 +08:00
ffc0d6884d [Fix](load) Fix the channel leak when close wait has been cancelled #38031 (#38125)
cherry pick from #38031
2024-07-19 22:58:54 +08:00
d7e84b7ee3 [Enhancement](bitmap) optimize bitmap deserialize and remove some unused code (#37623)
## Proposed changes
pick from #35789
2024-07-16 11:21:54 +08:00
47096f2083 [test](regression) add cases for data quality error url (#34987) (#37777)
cherry-pick #34987
2024-07-16 11:12:52 +08:00
8930df3b31 [Feature](iceberg-writer) Implements iceberg partition transform. (#37692)
## Proposed changes

Cherry-pick iceberg partition transform functionality. #36289 #36889

---------

Co-authored-by: kang <35803862+ghkang98@users.noreply.github.com>
Co-authored-by: lik40 <lik40@chinatelecom.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
2024-07-13 16:07:50 +08:00
a61030215e [branch-2.1](memory) Support make all memory snapshots (#37705)
pick #36679
2024-07-12 16:21:37 +08:00
ef031c5fb2 [branch-2.1](memory) Fix reserve memory compatible with memory GC and logging (#37682)
pick
#36307
#36412
2024-07-12 11:43:26 +08:00
fed632bf4a [fix](move-memtable) check segment num when closing each tablet (#36753) (#37536)
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
741807bb22 [performance](move-memtable) only call _select_streams when necessary (#35576) (#37406)
cherry-pick #35576
2024-07-10 22:20:23 +08:00
7cda8db020 [fix](load) The NodeChannel should be canceled when failed to add block #37500 (#37527)
cherry pick from #37500
2024-07-09 17:01:04 +08:00
1a25270918 [fix](group commit) Pick Fix the incorrect group commit count in log; fix the core in get_first_block (#36408) (#37405)
Pick https://github.com/apache/doris/pull/36408/
2024-07-09 09:24:43 +08:00
c66df8d9e6 [branch-2.1](load) fix no error url if no partition can be found (#36831) (#37401)
## Proposed changes

pick #36831

before
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]Encountered unqualified data, stop processing",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0
}
```

after
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]too many filtered rows",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 0,
    "NumberFilteredRows": 1,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://XXXX:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_c6461270125a615b-2873833fb48d56a3_c6461270125a615b_2873833fb48d56a3"
}
```

2024-07-08 10:41:33 +08:00
38b3870fe8 [branch-2.1] Picks "[fix](autoinc) Fix AutoIncrementGenerator and add more logs about auto-increment column #37306" (#37366)
## Proposed changes

picks https://github.com/apache/doris/pull/37306
2024-07-06 16:53:29 +08:00
ceef9ee123 [feature](serde) support presto compatible output format (#37039) (#37253)
bp #37039
2024-07-04 13:56:05 +08:00
0aeb768bf9 [Fix](export/outfile) Support compression when exporting data to Parquet / ORC. (#37167)
bp: #36490
2024-07-03 10:53:57 +08:00
f5572ac732 [pick] reset memtable flush thread num (#37092)
## Proposed changes

pick #37028
2024-07-02 19:20:17 +08:00
4210a6a8d6 [branch-2.1] Pick "[Fix](autoinc) Handle the processing of auto_increment column on exchange node rather than on TabletWriter when using TABLET_SINK_SHUFFLE_PARTITIONED #36836" (#37029)
## Proposed changes

pick https://github.com/apache/doris/pull/36836
2024-07-01 09:56:30 +08:00
12dddfc26c [branch-2.1] Pick "[Fix](autoinc) try fix concurrent load problem with auto inc column #36421" (#37027)
## Proposed changes

pick https://github.com/apache/doris/pull/36421
2024-06-30 13:10:03 +08:00
3652fc31c3 [Pick 2.1] "Fix data loss when node channel been cancelled before close wait (#36662)" (#36744)
## Proposed changes

Pick from https://github.com/apache/doris/pull/36662
2024-06-25 11:36:31 +08:00
bd47d5a681 [branch-2.1](auto-partition) Fix auto partition load failure in multi replica (#36586)
This PR:
1. picks #35630, which was previously reverted in #36098.
2. picks #36344 from master.

These two PRs fix existing bugs in auto partition load.

---------

Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-06-20 17:51:18 +08:00
3b23eee37c Revert "[fix](auto-partition) fix auto partition load lost data in multi sender (#35287)" (#36098)
Reverts apache/doris#35630 because it introduced some more damaging bugs.
We will fix it and merge it in the next version.
2024-06-11 17:11:42 +08:00
f03cee5e30 [enhancement](oom) add exception in olap data convertor when memory is not enough to prevent oom (#35761)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-06-02 21:12:53 +08:00
cb96a79d07 [bugfix](iceberg)fix datetime conversion error and data path error (#35708)
## Proposed changes
Issue #31442


1. The unit of the seventh parameter of `ZonedDateTime.of` is
nanoseconds, so the microseconds should be multiplied by 1000.
2. When writing to a non-partitioned Iceberg table, the data path had an
extra slash.
2024-06-01 00:42:48 +08:00
c2fc485327 [fix](auto-partition) fix auto partition load lost data in multi sender (#35287) (#35630)
## Proposed changes

Change the `use_cnt` mechanism for incremental (auto partition) channels
and streams; it is now counted dynamically.
Use `close_wait()` of regular partitions as a synchronization point to
make sure all sinks are in the close phase before closing any
incremental (auto partition) channels and streams.
Add a dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.
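
A very rough sketch, under assumed names, of how such a synchronization point could be built; this is not the Doris implementation:

```
#include <condition_variable>
#include <mutex>

// Hypothetical barrier: incremental (auto partition) channels are only
// torn down after every sink has reached its close phase.
class CloseBarrier {
public:
    explicit CloseBarrier(int sinks) : _open_sinks(sinks) {}

    // Each sink calls this when it enters the close phase.
    void sink_entered_close() {
        std::lock_guard<std::mutex> lk(_mu);
        if (--_open_sinks == 0) {
            _cv.notify_all();
        }
    }

    // Analogous to close_wait() on regular partitions: block until all
    // sinks are closing, then it is safe to close incremental channels.
    void close_wait() {
        std::unique_lock<std::mutex> lk(_mu);
        _cv.wait(lk, [this] { return _open_sinks == 0; });
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    int _open_sinks;
};
```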

Backport #35287

Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2024-05-31 10:27:03 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
3736d0af13 [Fix](hive-writer) Fix s3 file committer not working. (#35502) (#35579)
bp #35502

Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-05-29 12:14:42 +08:00
5c40e87667 [opt](s3) auto retry when meeting 429 error (#35397)
- Add 2 new BE config

	- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`

		When an S3 429 error is encountered, the "get" request
		sleeps `s3_read_base_wait_time_ms` (*1, *2, *3, *4) ms and tries again.
		The max sleep time is `s3_read_max_wait_time_ms`
		and the max number of retries is `max_s3_client_retry`
		(see the sketch after this list).
		
- Add more metrics for s3 file reader

	- `s3_file_reader_too_many_request`: counter of 429 error.
	- `s3_file_reader_s3_get_request`: the QPS of s3 get request.

	- `TotalGetRequest`: Get request counter in profile
	- `TooManyRequestErr`: 429 error counter in profile
	- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
	- `TotalBytesRead`: Total bytes read from s3 in profile
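
A sketch of the retry loop described above, assuming the sleep grows linearly with the attempt number and is capped; the function and the default values are illustrative, while the config names mirror the BE options:

```
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Assumed defaults for illustration; the real values come from BE config.
static constexpr int s3_read_base_wait_time_ms = 100;
static constexpr int s3_read_max_wait_time_ms = 800;
static constexpr int max_s3_client_retry = 10;

// do_get returns the HTTP status code of one S3 "get" attempt.
int get_with_429_retry(const std::function<int()>& do_get) {
    int code = do_get();
    for (int attempt = 1; attempt <= max_s3_client_retry && code == 429;
         ++attempt) {
        int wait_ms = std::min(s3_read_base_wait_time_ms * attempt,
                               s3_read_max_wait_time_ms);
        std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));
        code = do_get();  // retry after backing off
    }
    return code;  // success, a non-retryable error, or 429 after max retries
}
```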
2024-05-28 23:00:31 +08:00
84e9a14063 [Fix](hive-writer) Fix partition column order issue when the partition fields inserted into the target table are inconsistent with the field order or the schema field order of the query source table. (#35543)
## Proposed changes

backport #35347

2024-05-28 18:11:55 +08:00
09f9012817 [Fix](hive-writer) Fix hive partition update core. (#35311)
Issue: #31442
```
/home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F963FA9D090 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::VHivePartitionWriter::_build_partition_update() at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:215
5# doris::vectorized::VHivePartitionWriter::close(doris::Status const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:164
6# doris::vectorized::VHiveTableWriter::close(doris::Status) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_table_writer.cpp:209
7# doris::vectorized::AsyncResultWriter::process_block(doris::RuntimeState*, doris::RuntimeProfile*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/async_result_writer.cpp:184
8# doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0::operator()() const at
```
2024-05-27 15:24:53 +08:00
ade1841a01 [fix](shuffle) Do not return error if local recvr is null (#35399) 2024-05-26 20:20:50 +08:00
11971eddb4 [atomicstatus](be) add atomic status to share state between multiple threads (#35002) 2024-05-22 01:11:07 +08:00
f38ecd349c [enhancement](memory) return error if allocate memory failed during add rows method (#35085)
* return an error when adding rows fails

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-22 00:53:34 +08:00
42425808a1 [Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788" (#35056)
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)

* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.

**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.

**Solution:** Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs.
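
A toy sketch of the per-key decision described above; the function and its parameters are hypothetical, not the actual Doris code:

```
#include <cstdint>

// Each BE resolves the auto-increment value for a partial-update row:
// existing keys keep their stored value; new keys take the value that
// was generated once before distribution and shipped as the block's
// last column, so every replica makes the same choice.
int64_t resolve_auto_inc_value(bool key_exists,
                               int64_t stored_value,
                               int64_t value_from_block) {
    return key_exists ? stored_value : value_from_block;
}
```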

* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
2024-05-20 15:43:46 +08:00
dff6171546 [fix](auto inc) db_id and table_id should be int64_t instead of int32_t (#34912) 2024-05-18 18:29:59 +08:00
4b96f9834f [fix](move-memtable) change brpc connection type to single (#34883) 2024-05-18 18:29:20 +08:00
849eeb39e9 [fix](load) skip sending cancel rpc if VNodeChannel is not inited (#34897) 2024-05-18 18:29:10 +08:00
e13ce905cf [Fix](hive-writer) Fix hive partition update file size and remove redundant column names. (#34651) (#34885)
Backport #34651.
2024-05-15 11:23:32 +08:00
0ae1b9c70a [chore](remove code) Remove dragonbox related (#34528)
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"

This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.

2024-05-13 22:16:57 +08:00
0a79c547ff [Refactor](Sink) Remove is_append mode in table sink (#34684)
Remove the is_append mode from the sink component due to the following reasons:
1. The performance improvement from this mode is relatively minor, approximately 10%, as demonstrated in previous benchmarks.
2. The mode complicates maintenance. It requires a separate data writing path to avoid copying, which increases complexity and poses a risk of potential data loss.

I've already tested the compatibility with the previous version.
2024-05-11 11:20:10 +08:00
853dbdcb00 [Feature](PreparedStatement) implement general server side prepared (#33807) 2024-05-10 22:10:11 +08:00
e2fc231b7b [refactor](move-memtable) simplify LoadStreamStub::open (#34488) 2024-05-10 14:43:31 +08:00
7cb00a8e54 [Feature](hive-writer) Implements s3 file committer. (#34307)
Backport #33937.
2024-04-29 19:56:49 +08:00
cd1c9edd71 [fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072) (#34204)
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
2024-04-27 18:19:08 +08:00
080c07ad87 [bug](random distribution) fix data loss and incorrect data in random distribution table #33962 2024-04-24 17:13:50 +08:00
ffd9da44a2 [fix](move-memtable) fix commit may fail due to duplicated reports (#32403) 2024-04-19 15:02:49 +08:00
315f6e44c2 [Branch-2.1](Outfile) Fixed the problem of concurrent Outfile writing multiple Success files (#33870)
backport: #33016
2024-04-19 12:09:53 +08:00