Commit Graph

18 Commits

Author SHA1 Message Date
1c91fbc167 [fix](multi table) do not use strlen to calculate the length of msg (#40367) (#40511)
pick #40367

A core dump occurs when using single-stream multi-table load:
```
SUMMARY: AddressSanitizer: heap-buffer-overflow /root/doris/be/src/io/fs/multi_table_pipe.cpp:99:22 in doris::io::MultiTablePipe::dispatch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, char const*, unsigned long, doris::Status (doris::io::KafkaConsumerPipe::*)(char const*, unsigned long))
```

1. It is hard to guarantee that msg is a C-style string ending in a '\0'
character. If it is not, strlen may read memory out of bounds and cause
a core dump.
2. There is no need to calculate the length of msg twice.

Therefore, delete the logic that uses strlen to calculate the length
of msg.
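
As a minimal sketch of the idea (not the actual Doris code; `copy_payload` is a hypothetical helper), a buffer can be handled with the length the caller already knows, so nothing ever scans for a terminating '\0':

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper, not part of Doris: copies a message buffer that may
// lack a trailing '\0', relying only on the caller-supplied length.
std::string copy_payload(const char* data, std::size_t size) {
    // The view is bounded by `size`, so exactly `size` bytes are read.
    // Calling strlen(data) here could run past the end of the buffer.
    return std::string(std::string_view(data, size));
}
```

Because the length is passed in, embedded '\0' bytes are preserved and the length is computed only once, by whoever produced the buffer.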
2024-09-09 10:35:59 +08:00
830f250a80 [opt](query cancel) cancel query if it has pipeline task leakage #39223 (#39537)
pick #39223 with some modifications. Optimization will only be applied
to pipeline x.
2024-08-19 14:33:59 +08:00
6035edad0b [fix](multi table) fix single stream multi table memory leak (#38255) (#38824)
pick (#38255)

We hit OOM when using single-stream multi-table load.


![image](https://github.com/user-attachments/assets/748e9914-d591-4f41-8b28-412d3cecc841)

There is a memory leak; the heap profile looks like:


![image](https://github.com/user-attachments/assets/af30c593-88ea-44f6-bba1-82436b13f99f)

The stream load context is not released under some exception conditions,
for example when planning fails because high concurrency causes a timeout
while obtaining the read lock. The leak was introduced by https://github.com/apache/doris/pull/35458
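
One general way to avoid this class of leak is to tie the release to scope exit, so that every return path frees the context. The sketch below only illustrates that technique and is not the fix made in this PR; `ScopeGuard` and `plan_and_execute` are hypothetical names, and the counter stands in for releasing a real context:

```cpp
#include <functional>
#include <utility>

// Hypothetical scope guard: runs the given callback when it goes out of scope.
class ScopeGuard {
public:
    explicit ScopeGuard(std::function<void()> on_exit) : _on_exit(std::move(on_exit)) {}
    ~ScopeGuard() {
        if (_on_exit) _on_exit();
    }
    ScopeGuard(const ScopeGuard&) = delete;
    ScopeGuard& operator=(const ScopeGuard&) = delete;

private:
    std::function<void()> _on_exit;
};

// Hypothetical stand-in for the load path: `plan_ok` simulates whether
// planning succeeded, and `released` counts how often the context was freed.
bool plan_and_execute(bool plan_ok, int& released) {
    ScopeGuard guard([&released] { ++released; });
    if (!plan_ok) {
        return false;  // early return on failure: the guard still releases
    }
    return true;  // normal path: the guard releases here as well
}
```

With this shape, a failed plan and a successful one both release exactly once.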

The effect of the fix is shown in the following figure: the load runs
stably with a small amount of memory.


![image](https://github.com/user-attachments/assets/4483e0a5-6c0c-4cdc-b8ed-3408da6a86b2)
2024-08-04 22:12:44 +08:00
61bc624938 [branch-2.1](move-memtable) fix move memtable core when use multi table load (#37370)
## Proposed changes

pick https://github.com/apache/doris/pull/35458
2024-07-07 18:25:00 +08:00
300582f2e5 [branch-2.1](routine-load) fix be core when partial table load failed (#35622) 2024-05-30 09:35:36 +08:00
e38d844d40 [fix](multi-table-load) fix single stream multi table load cannot finish (#33816) 2024-04-19 15:03:06 +08:00
f8d1fa2be3 [chore](multi-table-load) add context info in log when using single-stream-multi-table load (#33317) 2024-04-10 16:03:05 +08:00
6ef9ed08aa [fix](multi-table-load) fix multi table load can not finish (#29957) 2024-01-18 10:03:35 +08:00
2fa511f80e [improve](multi-table-load) avoid plan and execute too many plan at once (#29951) 2024-01-16 21:14:35 +08:00
db17f5fe79 [improve](move-memtbale) enable move memtable in routine load (#28974) 2024-01-06 18:22:01 +08:00
Pxl
696ecc8c83 [Chore](log) adjust error code on too many filtered rows (#26168) 2023-11-01 00:15:56 +08:00
e20cab64f4 [improvement](scan) avoid too many scanners for file scan node (#25727)
Previously, when using a file scan node (e.g., querying a Hive table), the max number of scanners for each scan node
was `doris_scanner_thread_pool_thread_num` (default is 48).
If the query parallelism is N, the total number of scanners would be 48 * N, which is too many.

This PR changes the logic so that the max number of scanners for each scan node
is `doris_scanner_thread_pool_thread_num / query parallelism`. The total number of scanners
is then at most `doris_scanner_thread_pool_thread_num`.

Reducing the number of scanners can significantly reduce the memory usage of a query.
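
The arithmetic can be sketched as follows. This is an illustration only, not the code from the PR: `max_scanners_per_node` is a hypothetical name, and the floor of one scanner per node (and treating non-positive parallelism as 1) is an assumption added so the sketch never returns zero.

```cpp
#include <algorithm>

// Hypothetical helper: per-scan-node scanner cap derived from the thread pool
// size (doris_scanner_thread_pool_thread_num) and the query parallelism.
int max_scanners_per_node(int thread_pool_threads, int query_parallelism) {
    // Assumption: treat non-positive parallelism as 1 to avoid dividing by zero.
    const int parallelism = std::max(1, query_parallelism);
    // Assumption: keep at least one scanner per node even when the
    // parallelism exceeds the thread pool size.
    return std::max(1, thread_pool_threads / parallelism);
}
```

For example, with the default of 48 threads and a parallelism of 8, each scan node gets 6 scanners, so the total is 6 * 8 = 48 instead of 48 * 8 = 384. Note that with the assumed floor of one, the total can exceed the pool size when the parallelism is larger than the thread count.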
2023-10-29 17:41:31 +08:00
Pxl
2e2d5bcba2 [Improvements](status) catch some error status (#25677)
catch some error status
2023-10-23 10:19:08 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
b013f8006d [enhancement](multi-table) enable mullti table routine load on pipeline engine (#21729) 2023-07-14 12:16:32 +08:00
Pxl
ca71048f7f [Chore](status) avoid empty error msg on status (#21454)
avoid empty error msg on status
2023-07-11 13:48:16 +08:00
4bf15b9788 [fix](load) fix race condition problem when insert commitinfo (#20823)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-06-15 09:53:32 +08:00
09344eaab5 [feature](load) introduce single-stream-multi-table load (#20006)
For routine load (Kafka load), users can produce the data for different
tables into a single topic, and Doris will dispatch each message to the
corresponding table.
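
The dispatch step can be sketched roughly as below. This is not the Doris implementation: the `table|payload` message layout, `split_message`, and `dispatch` are assumptions made for illustration, and messages without a valid table prefix are simply skipped here.

```cpp
#include <map>
#include <optional>
#include <string>
#include <string_view>
#include <utility>
#include <vector>

// Assumed message layout: "<table>|<payload>". Splits at the first '|' and
// returns nothing when the delimiter is missing or the table name is empty.
std::optional<std::pair<std::string, std::string>> split_message(std::string_view msg) {
    const auto pos = msg.find('|');
    if (pos == std::string_view::npos || pos == 0) {
        return std::nullopt;
    }
    return std::make_pair(std::string(msg.substr(0, pos)),
                          std::string(msg.substr(pos + 1)));
}

// Groups the payloads of a single stream by their target table.
std::map<std::string, std::vector<std::string>> dispatch(
        const std::vector<std::string>& messages) {
    std::map<std::string, std::vector<std::string>> per_table;
    for (const auto& msg : messages) {
        if (auto parts = split_message(msg)) {
            per_table[parts->first].push_back(std::move(parts->second));
        }
        // Messages without a valid table prefix are skipped in this sketch.
    }
    return per_table;
}
```

A real implementation would hand each group to that table's load pipe rather than collect it in a map; the map only keeps the sketch self-contained.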

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-06-07 17:55:25 +08:00