Commit Graph

564 Commits

0571342538 [fix](sink) The issue with 2GB limit of protocol buffer (#37990) (#39112)
```
Fail to serialize doris.PFetchDataResult
```

If the size of `PFetchDataResult` is greater than 2 GB, protocol buffer
cannot serialize the message.
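
A minimal sketch of guarding against that limit before serialization; the helper name and fallback behavior here are assumptions, not the actual fix in #37990:

```
#include <google/protobuf/message.h>

#include <cstdint>
#include <limits>
#include <string>

// Hypothetical guard: protobuf refuses to serialize a message whose
// serialized size exceeds INT_MAX (~2 GiB), so check first and let the
// caller fall back (e.g. return an error to the client).
bool try_serialize(const google::protobuf::Message& msg, std::string* out) {
    if (msg.ByteSizeLong() >
        static_cast<uint64_t>(std::numeric_limits<int32_t>::max())) {
        return false;  // message too large for a single protobuf
    }
    return msg.SerializeToString(out);
}
```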

pick #37990
2024-08-09 04:01:56 +08:00
0a3874f203 [fix](move-memtable) close stream when cancel load stream stub (#38912) (#39039)
backport #38912
2024-08-07 23:24:00 +08:00
3abb222064 [fix](group commit) Fix test_group_commit_async_wal_msg_fault_injection case (#35313) (#38911)
pick https://github.com/apache/doris/pull/35313
2024-08-06 17:57:22 +08:00
70a518e099 [Fix](multi-catalog) Fix not throw error when call close() in hive/iceberg writer. (#38902)
## Proposed changes

When the file writer's close() is called, it syncs its buffer to commit
the data. As a result, some data is written only at close() time, which
can expose errors (for example, in hdfs_file_writer). These errors
therefore need to be captured throughout the entire close process.
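
A minimal sketch, with hypothetical `Status`/`FileWriter` types rather than the Doris classes, of surfacing a failure raised during close() instead of swallowing it:

```
#include <string>

// Hypothetical types for illustration; not the Doris classes.
struct Status {
    bool ok = true;
    std::string msg;
};

class FileWriter {
public:
    virtual ~FileWriter() = default;
    // Buffered data may only reach storage here (e.g. an HDFS writer
    // syncing its buffer), so close() itself can fail.
    virtual Status close() = 0;
};

Status close_and_check(FileWriter& writer) {
    Status st = writer.close();
    if (!st.ok) {
        return st;  // propagate instead of ignoring the close error
    }
    return Status{};
}
```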
2024-08-06 08:51:12 +08:00
91f0301b43 [fix](group commit) Pick some group commit pr (#38320)
Pick https://github.com/apache/doris/pull/38292,
https://github.com/apache/doris/pull/34021,
https://github.com/apache/doris/pull/38228, and parts of
https://github.com/apache/doris/pull/37260 and
https://github.com/apache/doris/pull/37595
2024-07-25 17:32:44 +08:00
ffc0d6884d [Fix](load) Fix the channel leak when close wait has been cancelled #38031 (#38125)
cherry pick from #38031
2024-07-19 22:58:54 +08:00
d7e84b7ee3 [Enhancement](bitmap) optimize bitmap deserialize and remove some unused code (#37623)
## Proposed changes
pick from #35789
2024-07-16 11:21:54 +08:00
47096f2083 [test](regression) add cases for data quality error url (#34987) (#37777)
cherry-pick #34987
2024-07-16 11:12:52 +08:00
8930df3b31 [Feature](iceberg-writer) Implements iceberg partition transform. (#37692)
## Proposed changes

Cherry-pick iceberg partition transform functionality. #36289 #36889

---------

Co-authored-by: kang <35803862+ghkang98@users.noreply.github.com>
Co-authored-by: lik40 <lik40@chinatelecom.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
2024-07-13 16:07:50 +08:00
a61030215e [branch-2.1](memory) Support make all memory snapshots (#37705)
pick #36679
2024-07-12 16:21:37 +08:00
ef031c5fb2 [branch-2.1](memory) Fix reserve memory compatible with memory GC and logging (#37682)
pick
#36307
#36412
2024-07-12 11:43:26 +08:00
fed632bf4a [fix](move-memtable) check segment num when closing each tablet (#36753) (#37536)
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
741807bb22 [performance](move-memtable) only call _select_streams when necessary (#35576) (#37406)
cherry-pick #35576
2024-07-10 22:20:23 +08:00
7cda8db020 [fix](load) The NodeChannel should be canceled when failed to add block #37500 (#37527)
cherry pick from #37500
2024-07-09 17:01:04 +08:00
1a25270918 [fix](group commit) Pick Fix the incorrect group commit count in log; fix the core in get_first_block (#36408) (#37405)
Pick https://github.com/apache/doris/pull/36408/
2024-07-09 09:24:43 +08:00
c66df8d9e6 [branch-2.1](load) fix no error url if no partition can be found (#36831) (#37401)
## Proposed changes

pick #36831

before
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]Encountered unqualified data, stop processing",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 1,
    "NumberFilteredRows": 0,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0
}
```

after
```
Stream load result: {
    "TxnId": 2014,
    "Label": "83ba46bd-280c-4e22-b581-4eb126fd49cf",
    "Comment": "",
    "TwoPhaseCommit": "false",
    "Status": "Fail",
    "Message": "[DATA_QUALITY_ERROR]too many filtered rows",
    "NumberTotalRows": 1,
    "NumberLoadedRows": 0,
    "NumberFilteredRows": 1,
    "NumberUnselectedRows": 0,
    "LoadBytes": 1669,
    "LoadTimeMs": 58,
    "BeginTxnTimeMs": 0,
    "StreamLoadPutTimeMs": 10,
    "ReadDataTimeMs": 0,
    "WriteDataTimeMs": 47,
    "CommitAndPublishTimeMs": 0,
    "ErrorURL": "http://XXXX:8040/api/_load_error_log?file=__shard_4/error_log_insert_stmt_c6461270125a615b-2873833fb48d56a3_c6461270125a615b_2873833fb48d56a3"
}
```

2024-07-08 10:41:33 +08:00
38b3870fe8 [branch-2.1] Picks "[fix](autoinc) Fix AutoIncrementGenerator and add more logs about auto-increment column #37306" (#37366)
## Proposed changes

picks https://github.com/apache/doris/pull/37306
2024-07-06 16:53:29 +08:00
ceef9ee123 [feature](serde) support presto compatible output format (#37039) (#37253)
bp #37039
2024-07-04 13:56:05 +08:00
0aeb768bf9 [Fix](export/outfile) Support compression when exporting data to Parquet / ORC. (#37167)
bp: #36490
2024-07-03 10:53:57 +08:00
f5572ac732 [pick] reset memtable flush thread num (#37092)
## Proposed changes

pick #37028
2024-07-02 19:20:17 +08:00
4210a6a8d6 [branch-2.1] Pick "[Fix](autoinc) Handle the processing of auto_increment column on exchange node rather than on TabletWriter when using TABLET_SINK_SHUFFLE_PARTITIONED #36836" (#37029)
## Proposed changes

pick https://github.com/apache/doris/pull/36836
2024-07-01 09:56:30 +08:00
12dddfc26c [branch-2.1] Pick "[Fix](autoinc) try fix concurrent load problem with auto inc column #36421" (#37027)
## Proposed changes

pick https://github.com/apache/doris/pull/36421
2024-06-30 13:10:03 +08:00
3652fc31c3 [Pick 2.1] "Fix data loss when node channel been cancelled before close wait (#36662)" (#36744)
## Proposed changes

Pick from https://github.com/apache/doris/pull/36662
2024-06-25 11:36:31 +08:00
bd47d5a681 [branch-2.1](auto-partition) Fix auto partition load failure in multi replica (#36586)
This PR:
1. picks #35630, which was previously reverted in #36098.
2. picks #36344 from master.

These two PRs fix existing bugs in auto partition load.

---------

Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-06-20 17:51:18 +08:00
3b23eee37c Revert "[fix](auto-partition) fix auto partition load lost data in multi sender (#35287)" (#36098)
Reverts apache/doris#35630 because it introduced some more damaging bugs.
We will fix it and merge it in the next version.
2024-06-11 17:11:42 +08:00
f03cee5e30 [enhancement](oom) add exception in olap data convertor when memory is not enough to prevent oom (#35761)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-06-02 21:12:53 +08:00
cb96a79d07 [bugfix](iceberg)fix datetime conversion error and data path error (#35708)
## Proposed changes
Issue #31442


1. The unit of the seventh parameter of `ZonedDateTime.of` is
nanoseconds, so the microseconds should be multiplied by 1000.
2. When writing to a non-partitioned Iceberg table, the data path had an
extra slash.
2024-06-01 00:42:48 +08:00
c2fc485327 [fix](auto-partition) fix auto partition load lost data in multi sender (#35287) (#35630)
## Proposed changes

Change the `use_cnt` mechanism for incremental (auto partition) channels
and streams; it is now counted dynamically.
Use `close_wait()` of regular partitions as a synchronization point to
make sure all sinks are in the close phase before closing any
incremental (auto partition) channels and streams.
Add a dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.
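
A very rough sketch, under assumed names, of how such a synchronization point could be built; this is not the Doris implementation:

```
#include <condition_variable>
#include <mutex>

// Hypothetical barrier: incremental (auto partition) channels are only
// torn down after every sink has reached its close phase.
class CloseBarrier {
public:
    explicit CloseBarrier(int sinks) : _open_sinks(sinks) {}

    // Each sink calls this when it enters the close phase.
    void sink_entered_close() {
        std::lock_guard<std::mutex> lk(_mu);
        if (--_open_sinks == 0) {
            _cv.notify_all();
        }
    }

    // Analogous to close_wait() on regular partitions: block until all
    // sinks are closing, then it is safe to close incremental channels.
    void close_wait() {
        std::unique_lock<std::mutex> lk(_mu);
        _cv.wait(lk, [this] { return _open_sinks == 0; });
    }

private:
    std::mutex _mu;
    std::condition_variable _cv;
    int _open_sinks;
};
```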

Backport #35287

Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2024-05-31 10:27:03 +08:00
b91d2caab8 [Feature](iceberg-writer) Implements iceberg sink basic functionality for inserting into table. (#35587)
backport #34929
2024-05-29 16:40:54 +08:00
3736d0af13 [Fix](hive-writer) Fix s3 file committer not working. (#35502) (#35579)
bp #35502

Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-05-29 12:14:42 +08:00
5c40e87667 [opt](s3) auto retry when meeting 429 error (#35397)
- Add 2 new BE config

	- `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`

		When an S3 429 error is encountered, the "get" request
		sleeps `s3_read_base_wait_time_ms` (*1, *2, *3, *4) ms and tries again.
		The max sleep time is `s3_read_max_wait_time_ms`
		and the max number of retries is `max_s3_client_retry`
		(see the sketch after this list).
		
- Add more metrics for s3 file reader

	- `s3_file_reader_too_many_request`: counter of 429 error.
	- `s3_file_reader_s3_get_request`: the QPS of s3 get request.

	- `TotalGetRequest`: Get request counter in profile
	- `TooManyRequestErr`: 429 error counter in profile
	- `TooManyRequestSleepTime`: Sum of sleep time after 429 error in profile
	- `TotalBytesRead`: Total bytes read from s3 in profile
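
A sketch of the retry loop described above, assuming the sleep grows linearly with the attempt number and is capped; the function and the default values are illustrative, while the config names mirror the BE options:

```
#include <algorithm>
#include <chrono>
#include <functional>
#include <thread>

// Assumed defaults for illustration; the real values come from BE config.
static constexpr int s3_read_base_wait_time_ms = 100;
static constexpr int s3_read_max_wait_time_ms = 800;
static constexpr int max_s3_client_retry = 10;

// do_get returns the HTTP status code of one S3 "get" attempt.
int get_with_429_retry(const std::function<int()>& do_get) {
    int code = do_get();
    for (int attempt = 1; attempt <= max_s3_client_retry && code == 429;
         ++attempt) {
        int wait_ms = std::min(s3_read_base_wait_time_ms * attempt,
                               s3_read_max_wait_time_ms);
        std::this_thread::sleep_for(std::chrono::milliseconds(wait_ms));
        code = do_get();  // retry after backing off
    }
    return code;  // success, a non-retryable error, or 429 after max retries
}
```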
2024-05-28 23:00:31 +08:00
84e9a14063 [Fix](hive-writer) Fix partition column order issue when the partition fields inserted into the target table are inconsistent with the field order or the schema field order of the query source table. (#35543)
## Proposed changes

backport #35347

2024-05-28 18:11:55 +08:00
09f9012817 [Fix](hive-writer) Fix hive partition update core. (#35311)
Issue: #31442
```
/home/zcp/repo_center/doris_master/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F963FA9D090 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::vectorized::VHivePartitionWriter::_build_partition_update() at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:215
5# doris::vectorized::VHivePartitionWriter::close(doris::Status const&) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_partition_writer.cpp:164
6# doris::vectorized::VHiveTableWriter::close(doris::Status) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/vhive_table_writer.cpp:209
7# doris::vectorized::AsyncResultWriter::process_block(doris::RuntimeState*, doris::RuntimeProfile*) at /home/zcp/repo_center/doris_master/doris/be/src/vec/sink/writer/async_result_writer.cpp:184
8# doris::vectorized::AsyncResultWriter::start_writer(doris::RuntimeState*, doris::RuntimeProfile*)::$_0::operator()() const at
```
2024-05-27 15:24:53 +08:00
ade1841a01 [fix](shuffle) Do not return error if local recvr is null (#35399) 2024-05-26 20:20:50 +08:00
11971eddb4 [atomicstatus](be) add atomic status to share state between multiple threads (#35002) 2024-05-22 01:11:07 +08:00
f38ecd349c [enhancement](memory) return error if allocate memory failed during add rows method (#35085)
* return an error when adding rows fails

---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-05-22 00:53:34 +08:00
42425808a1 [Cherry-Pick](branch-2.1) Pick "Fix multiple replica partial update auto inc data inconsistency problem #34788" (#35056)
* [Fix](auto inc) Fix multiple replica partial update auto inc data inconsistency problem (#34788)

* **Problem:** For tables with auto-increment columns, updating partial columns can cause data inconsistency among replicas.

**Cause:** Previously, the implementation for updating partial columns in tables with auto-increment columns was done independently on each BE (Backend), leading to potential inconsistencies in the auto-increment column values generated by each BE.

**Solution:** Before distributing blocks, determine if the update involves partial columns of a table with an auto-increment column. If so, add the auto-increment column to the last column of the block. After distributing to each BE, each BE will check if the data key for the partial column update exists. If it exists, the previous auto-increment column value is used; if not, the auto-increment column value from the last column of the block is used. This ensures that the auto-increment column values are consistent across different BEs.
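
A toy sketch of the per-key decision described above; the function and its parameters are hypothetical, not the actual Doris code:

```
#include <cstdint>

// Each BE resolves the auto-increment value for a partial-update row:
// existing keys keep their stored value; new keys take the value that
// was generated once before distribution and shipped as the block's
// last column, so every replica makes the same choice.
int64_t resolve_auto_inc_value(bool key_exists,
                               int64_t stored_value,
                               int64_t value_from_block) {
    return key_exists ? stored_value : value_from_block;
}
```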

* [Fix](regression-test) Fix auto inc partial update unstable regression test (#34940)
2024-05-20 15:43:46 +08:00
dff6171546 [fix](auto inc) db_id and table_id should be int64_t instead of int32_t (#34912) 2024-05-18 18:29:59 +08:00
4b96f9834f [fix](move-memtable) change brpc connection type to single (#34883) 2024-05-18 18:29:20 +08:00
849eeb39e9 [fix](load) skip sending cancel rpc if VNodeChannel is not inited (#34897) 2024-05-18 18:29:10 +08:00
e13ce905cf [Fix](hive-writer) Fix hive partition update file size and remove redundant column names. (#34651) (#34885)
Backport #34651.
2024-05-15 11:23:32 +08:00
0ae1b9c70a [chore](remove code) Remove dragonbox related (#34528)
* Revert "[refactor](mysql result format) use new serde framework to tuple convert (#25006)"

This reverts commit e5ef0aa6d439c3f9b1f1fe5bc89c9ea6a71d4019.

2024-05-13 22:16:57 +08:00
0a79c547ff [Refactor](Sink) Remove is_append mode in table sink (#34684)
Remove the is_append mode from the sink component due to the following reasons:
1. The performance improvement from this mode is relatively minor, approximately 10%, as demonstrated in previous benchmarks.
2. The mode complicates maintenance. It requires a separate data writing path to avoid copying, which increases complexity and poses a risk of potential data loss.

I've already tested the compatibility with the previous version.
2024-05-11 11:20:10 +08:00
853dbdcb00 [Feature](PreparedStatement) implement general server side prepared (#33807) 2024-05-10 22:10:11 +08:00
e2fc231b7b [refactor](move-memtable) simplify LoadStreamStub::open (#34488) 2024-05-10 14:43:31 +08:00
7cb00a8e54 [Feature](hive-writer) Implements s3 file committer. (#34307)
Backport #33937.
2024-04-29 19:56:49 +08:00
cd1c9edd71 [fix](pipeline-load) fix no error url when data quality error and total rows is negative (#34072) (#34204)
Co-authored-by: HHoflittlefish777 <77738092+HHoflittlefish777@users.noreply.github.com>
2024-04-27 18:19:08 +08:00
080c07ad87 [bug](random distribution) fix data loss and incorrect data in random distribution table #33962 2024-04-24 17:13:50 +08:00
ffd9da44a2 [fix](move-memtable) fix commit may fail due to duplicated reports (#32403) 2024-04-19 15:02:49 +08:00
315f6e44c2 [Branch-2.1](Outfile) Fixed the problem of concurrent Outfile writing multiple Success files (#33870)
backport: #33016
2024-04-19 12:09:53 +08:00