7ca00c114b
[fix](load) add lock for runtime_state->tablet_commit_infos ( #48709 ) ( #48850 )
...
backport #48709
2025-03-11 12:03:39 +08:00
c9381b0285
[fix](load) Fix import failure when the stream load parameter specifies Transfer-Encoding:chunked ( #48196 ) ( #48503 )
...
pick from master #48196
2025-03-04 10:12:54 +08:00
82fe8bc3d7
branch-2.1:[fix](libhdfs) fix the lifecycle issue of libhdfs config ( #48352 )
...
pick part of #47299
When calling `hdfsBuilderSetKerb5Conf`, the `value` string must stay alive
for as long as `hdfs_builder` does.
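The lifetime rule can be sketched as follows. This is only an illustration, not the code from the patch: `FakeBuilder`, `fake_builder_set_kerb5_conf`, and `OwnedBuilder` are hypothetical stand-ins for a C-style builder that stores the pointer it receives without copying the string.

```cpp
#include <string>
#include <utility>

// Hypothetical stand-in for a C-style builder that keeps the pointer it is
// given instead of copying the string.
struct FakeBuilder {
    const char* kerb5_conf = nullptr;
};

inline void fake_builder_set_kerb5_conf(FakeBuilder* b, const char* value) {
    b->kerb5_conf = value; // no copy: the caller must keep `value` alive
}

// Owns both the builder and the string, so the string lives exactly as long
// as the builder that points into it.
class OwnedBuilder {
public:
    explicit OwnedBuilder(std::string kerb5_conf) : _kerb5_conf(std::move(kerb5_conf)) {
        fake_builder_set_kerb5_conf(&_builder, _kerb5_conf.c_str());
    }
    // Copying or moving could leave the stored pointer dangling, so forbid both.
    OwnedBuilder(const OwnedBuilder&) = delete;
    OwnedBuilder& operator=(const OwnedBuilder&) = delete;

    const FakeBuilder& get() const { return _builder; }

private:
    std::string _kerb5_conf; // owned copy of the configuration path
    FakeBuilder _builder;    // points into _kerb5_conf
};
```

Passing a temporary `std::string`'s `c_str()` straight to such a setter would leave the builder holding a dangling pointer once the temporary is destroyed.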
2025-02-26 14:10:49 +08:00
feb4b09cb3
[fix](hive)Spelling mistake of the word "failed" for 2.1 ( #48193 )
2025-02-22 23:18:21 +08:00
c099ccdbd0
branch-2.1: [improve](load) print error string in local fs error messages #47918 ( #48010 )
...
Cherry-picked from #47918
Co-authored-by: Kaijie Chen <chenkaijie@selectdb.com >
2025-02-19 09:25:41 +08:00
209ddb374e
branch-2.1: [chore](io) Add debug log for critical file operations #46770 ( #46859 )
...
cherry pick from #46770
2025-02-06 09:41:45 +08:00
bbfb8fd41c
[branch-2.1] Fix local data dir metric missing ( #46200 ) ( #46603 )
...
pick #46200
2025-01-09 00:03:13 +08:00
64195d79ee
[refactor](metrics) Remove IntAtomicCounter & CoreLocal #45742 ( #45870 )
...
cherry pick from #45742
2024-12-24 23:13:48 +08:00
02feb16530
branch-2.1: [bug](s3) fix S3 file system gets absolute path #44965 ( #45529 )
...
Cherry-picked from https://github.com/apache/doris/pull/44965
2024-12-18 22:29:24 +08:00
23c5d52b04
branch-2.1: [fix](s3) improve error msg #45360 ( #45432 )
...
Cherry-picked from #45360
Co-authored-by: Socrates <suyiteng@selectdb.com >
2024-12-16 14:59:08 +08:00
667f5e6e6a
[feat](iceberg)Supports using rest type catalog to read tables in unity catalog for 2.1 ( #43525 ) ( #45217 )
...
bp: #43525
2024-12-12 00:49:36 -08:00
5d3f0a267a
[opt](scan) unify the local and remote scan bytes stats for all scanners for 2.1 ( #45167 )
...
pick part of #40493
TODO: not working with s3 reader
2024-12-10 14:19:19 +08:00
702abbff0f
[Opt](orc)Optimize the merge io when orc reader read multiple tiny stripes. ( #42004 ) ( #44239 )
...
bp #42004
Co-authored-by: kaka11chen <kaka11.chen@gmail.com >
2024-11-22 11:01:41 +08:00
dc67086d97
[fix](scan) Avoid memory allocated by buffered_reader from being traced ( #41921 ) ( #44253 )
...
Use OwnedSlice to replace `char*` in BufferedReader
## Proposed changes
pick #41921
2024-11-20 10:37:06 +08:00
aa0053347f
[fix](crash) be crash on ~LRUFileCache ( #42498 )
...
come from: https://github.com/apache/doris/pull/39036
Crash stack:
```
(gdb) bt
#0 0x00007ff5c8c6c387 in raise () from /lib64/libc.so.6
#1 0x00007ff5c8c6da78 in abort () from /lib64/libc.so.6
#2 0x0000561eb0a5e38a in __gnu_cxx::__verbose_terminate_handler () at ../../../../libstdc++-v3/libsupc++/vterminate.cc:95
#3 0x0000561eb0a5caf6 in __cxxabiv1::__terminate (handler=<optimized out>) at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:48
#4 0x0000561eb0a5cb61 in std::terminate () at ../../../../libstdc++-v3/libsupc++/eh_terminate.cc:58
#5 0x0000561ea4028552 in doris::io::LRUFileCache::~LRUFileCache (this=0x7ff540fed000) at /root/be/src/io/cache/block/block_lru_file_cache.h:62
#6 0x0000561ea402857e in doris::io::LRUFileCache::~LRUFileCache (this=0x33688) at /root/be/src/io/cache/block/block_lru_file_cache.h:54
#7 0x0000561ea4251cd2 in std::default_delete<doris::io::IFileCache>::operator() (this=0x7ff54101d000, __ptr=0x33688)
at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85
#8 std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> >::~unique_ptr (this=0x7ff54101d000)
at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:361
#9 std::destroy_at<std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> > > (__location=0x7ff54101d000)
at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:88
#10 std::_Destroy<std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> > > (__pointer=0x7ff54101d000)
at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:138
#11 std::_Destroy_aux<false>::__destroy<std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> >*> (__first=0x7ff54101d000,
__last=0x7ff54101d008) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:152
#12 std::_Destroy<std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> >*> (__first=<optimized out>, __last=0x7ff54101d008)
at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_construct.h:184
#13 std::_Destroy<std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> >*, std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> > > (__first=<optimized out>, __last=0x7ff54101d008)
at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/alloc_traits.h:746
#14 std::vector<std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> >, std::allocator<std::unique_ptr<doris::io::IFileCache, std::default_delete<doris::io::IFileCache> > > >::~vector (this=0x7ff596a24ba8)
at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/stl_vector.h:680
#15 doris::io::FileCacheFactory::~FileCacheFactory (this=0x7ff596a24b80) at /root/be/src/io/cache/block/block_file_cache_factory.h:42
#16 doris::ExecEnv::destroy (this=0x561eb130b800 <doris::ExecEnv::GetInstance()::s_exec_env>) at /root/be/src/runtime/exec_env_init.cpp:651
#17 0x0000561ea35c3fe3 in main (argc=<optimized out>, argv=<optimized out>) at /root/be/src/service/doris_main.cpp:628
```
2024-10-25 20:42:20 +08:00
a3c1657c4b
[cherry-pick](branch-2.1) check end of file when reading page ( #42159 )
...
## Proposed changes
pick pr: https://github.com/apache/doris/pull/41816
2024-10-21 17:01:04 +08:00
d32688e091
[Enhancement](multi-catalog) Set hdfs native client logger to glog and redirect jvm stdout/stderr logger to jni.log. ( #41633 )
...
Backport #39540 .
Co-authored-by: Mingyu Chen <morningman@163.com >
2024-10-10 17:47:21 +08:00
d7659ff34d
[fix](bytebuffer) fix allocate size improper in append_and_flush ( #40613 ) ( #41133 )
...
pick (#40613 )
Fix the improper allocation size in append_and_flush introduced by
https://github.com/apache/doris/pull/38960
2024-09-24 16:01:52 +08:00
b52b572ade
[branch-2.1](memory) When Load ends, check memory tracker value returns is equal to 0 ( #40850 )
...
pick
#38960
#39908
#40043
#40092
#40016
#40439
---------
Co-authored-by: hui lai <1353307710@qq.com >
Co-authored-by: yiguolei <676222867@qq.com >
2024-09-15 23:47:53 +08:00
1c91fbc167
[fix](multi table) do not use strlen to calculate the length of msg ( #40367 ) ( #40511 )
...
pick #40367
A core dump occurs when using single-stream multi-table load:
```
SUMMARY: AddressSanitizer: heap-buffer-overflow /root/doris/be/src/io/fs/multi_table_pipe.cpp:99:22 in doris::io::MultiTablePipe::dispatch(std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char>> const&, char const*, unsigned long, doris::Status (doris::io::KafkaConsumerPipe::*)(char const*, unsigned long))
```
1. It is hard to guarantee that msg is a C-style string ending in a '\0'
character. If it is not, strlen reads out of bounds, which may cause a
core dump.
2. There is no need to calculate the length of msg twice.
Therefore, this change removes the logic that uses strlen to calculate the
length of msg.
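A minimal sketch of the idea, assuming the caller already knows the message length: the hypothetical helper `copy_message` uses the explicit length and never scans for a terminator. It is not the actual `MultiTablePipe::dispatch` code.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

// Hypothetical helper: copies exactly `len` bytes starting at `data`.
// It never calls strlen, so it is safe for buffers that are not
// '\0'-terminated and for payloads that contain embedded '\0' bytes.
inline std::string copy_message(const char* data, std::size_t len) {
    return std::string(std::string_view(data, len));
}
```

With a buffer such as `{'a', 'b', 'c', 'd'}` that has no terminator, `strlen` would read past the end, while `copy_message(buf, 4)` touches only the four valid bytes.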
2024-09-09 10:35:59 +08:00
87ac378c4a
[branch-2.1](be-ut) wait lazy open in ut ( #40453 )
...
## Proposed changes
The LRUFileCache test needs to wait for lazy open to finish.
2024-09-06 09:47:47 +08:00
131238ff71
[fix](file-cache) change metric_value column in file_cache_statistics table to string ( #40083 )
...
Make it more flexible
followup #39552
2024-08-29 16:39:22 +08:00
6915d76731
[opt](file-cache) add evict file number per round ( #39721 )
...
Previously, when getting a block from the file cache, it might try to evict
many blocks to reserve capacity for the LRU cache. This operation could
take a long time while holding the lock, blocking other operations.
This PR adds a new BE config `file_cache_max_evict_num_per_round`
(default 1000), so that the lock is not held for a long time.
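The capped eviction can be sketched as below. This is a simplified illustration under assumptions, not the Doris implementation: `BoundedEvictLru` is a hypothetical class that tracks only block sizes, and its constructor argument plays the role of `file_cache_max_evict_num_per_round`.

```cpp
#include <cstddef>
#include <list>
#include <mutex>

// Hypothetical LRU that stores only block sizes, oldest block at the front.
class BoundedEvictLru {
public:
    explicit BoundedEvictLru(std::size_t max_evict_per_round)
            : _max_evict_per_round(max_evict_per_round) {}

    void add(std::size_t block_size) {
        std::lock_guard<std::mutex> guard(_mutex);
        _blocks.push_back(block_size);
    }

    // Evicts from the LRU end until `need` bytes are freed, the per-round cap
    // is reached, or the cache is empty. Returns the number of bytes freed.
    // The cap bounds how long the lock is held in a single round.
    std::size_t evict_round(std::size_t need) {
        std::lock_guard<std::mutex> guard(_mutex);
        std::size_t freed = 0;
        std::size_t evicted = 0;
        while (freed < need && evicted < _max_evict_per_round && !_blocks.empty()) {
            freed += _blocks.front();
            _blocks.pop_front();
            ++evicted;
        }
        return freed;
    }

    std::size_t block_count() const {
        std::lock_guard<std::mutex> guard(_mutex);
        return _blocks.size();
    }

private:
    const std::size_t _max_evict_per_round;
    mutable std::mutex _mutex;
    std::list<std::size_t> _blocks;
};
```

A caller that still needs space after one round can release the lock and try again later, which gives other operations a chance to run in between.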
2024-08-28 08:49:12 +08:00
a5c8ed1cde
[branch2.1][fix](cache) Catch the directory_iterator's error_code ( #39922 )
...
## Proposed changes
Catch the directory_iterator's error_code to prevent exceptions from
causing a core dump.
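As an illustration of the non-throwing style, the sketch below uses the `std::error_code` overloads of `std::filesystem::directory_iterator`. The helper name `count_entries` is hypothetical and the code is not taken from the patch.

```cpp
#include <cstddef>
#include <filesystem>
#include <system_error>

namespace fs = std::filesystem;

// Counts the entries directly under `dir` without throwing. On failure, `ec`
// is set and the number of entries counted so far is returned.
inline std::size_t count_entries(const fs::path& dir, std::error_code& ec) {
    std::size_t n = 0;
    fs::directory_iterator it(dir, ec); // non-throwing constructor overload
    if (ec) {
        return 0;
    }
    const fs::directory_iterator end;
    while (it != end) {
        ++n;
        it.increment(ec); // non-throwing alternative to operator++
        if (ec) {
            break;
        }
    }
    return n;
}
```

The throwing overloads raise `std::filesystem::filesystem_error`, which terminates the process if nothing catches it; checking `ec` turns the failure into an ordinary error path.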
2024-08-27 08:00:52 +08:00
6ceb574aa0
[branch-2.1]Pick IO limit/workload group usage table ( #39839 )
2024-08-23 18:51:47 +08:00
0bfcee1251
[opt](file-cache) support system table file_cache_statistics ( #39552 )
...
1. Add a new system table: `file_cache_statistics`
This table is used for viewing file cache metrics on the BE side.
```
mysql> select * from information_schema.file_cache_statistics limit 10;
+-------+---------------+----------------------------+--------------------------------+--------------------+
| BE_ID | BE_IP         | CACHE_PATH                 | METRIC_NAME                    | METRIC_VALUE       |
+-------+---------------+----------------------------+--------------------------------+--------------------+
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_curr_elements | 0                  |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_curr_size     | 0                  |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_max_elements  | 102400             |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | disposable_queue_max_size      | 21474836480        |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio                     | 0.8539634687001242 |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio_1h                  | 0                  |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | hits_ratio_5m                  | 0                  |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_curr_elements      | 0                  |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_curr_size          | 0                  |
| 10003 | 172.20.32.136 | /mnt/output/be/file_cache/ | index_queue_max_elements       | 102400             |
+-------+---------------+----------------------------+--------------------------------+--------------------+
```
It shows the metrics of the file caches on each BE.
2. Add new metrics `hits_ratio_1h` and `hits_ratio_5m` for file cache
These two metrics show the file cache hit ratio over the most recent 1 hour
and 5 minutes, so that we can see the recent hit ratio instead of only the
global historical hit ratio.
2024-08-21 10:03:39 +08:00
85f97a745a
[fix](s3) Fix fmt in s3 file writer S3FileWriter::_dump_completed_part OOM ( #39562 )
2024-08-19 22:02:06 +08:00
830f250a80
[opt](query cancel) cancel query if it has pipeline task leakage #39223 ( #39537 )
...
pick #39223 with some modifications. Optimization will only be applied
to pipeline x.
2024-08-19 14:33:59 +08:00
0680c8d314
[improve](cache) File cache async init ( #39036 )
...
## Proposed changes
Run `load_cache_info_into_memory()` asynchronously in a background thread
in `LRUFileCache::initialize()`.
While the cache is not ready, `LRUFileCache::get_or_set()` returns
a FileBlock whose state is SKIP_CACHE.
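The pattern can be sketched roughly as follows. This is a simplified illustration, not the `LRUFileCache` code: `AsyncInitCache`, `wait_ready`, and the `CACHED` state are hypothetical names, and the background loading step is reduced to setting a flag.

```cpp
#include <atomic>
#include <thread>

enum class BlockState { SKIP_CACHE, CACHED };

// Hypothetical cache whose metadata is loaded by a background thread.
class AsyncInitCache {
public:
    AsyncInitCache() = default;

    // Join before destruction: destroying a joinable std::thread calls
    // std::terminate.
    ~AsyncInitCache() { wait_ready(); }

    // Starts the background load and returns immediately.
    void initialize() {
        _loader = std::thread([this] {
            // Placeholder for scanning the cache directory into memory.
            _ready.store(true, std::memory_order_release);
        });
    }

    // Until loading has finished, callers are told to bypass the cache.
    BlockState get_or_set() const {
        return _ready.load(std::memory_order_acquire) ? BlockState::CACHED
                                                      : BlockState::SKIP_CACHE;
    }

    // Blocks until the background load is done (useful in tests).
    void wait_ready() {
        if (_loader.joinable()) {
            _loader.join();
        }
    }

private:
    std::atomic<bool> _ready{false};
    std::thread _loader;
};
```

Readers never block on startup in this design; they simply skip the cache until the flag flips, at the cost of some cache misses right after start.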
2024-08-15 16:27:51 +08:00
6035edad0b
[fix](multi table) fix single stream multi table memory leak ( #38255 ) ( #38824 )
...
pick (#38255 )
We hit an OOM when using single-stream multi-table load.
There is a memory leak (heap profile figure not included here).
The stream load context is not released under some exception conditions,
for example when planning fails because high concurrency causes a timeout
while acquiring the read lock. The leak was introduced by
https://github.com/apache/doris/pull/35458
With the fix, the load runs stably with a small amount of memory (result
figure not included here).
2024-08-04 22:12:44 +08:00
fed632bf4a
[fix](move-memtable) check segment num when closing each tablet ( #36753 ) ( #37536 )
...
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
61bc624938
[branch-2.1](move-memtable) fix move memtable core when use multi table load ( #37370 )
...
## Proposed changes
pick https://github.com/apache/doris/pull/35458
2024-07-07 18:25:00 +08:00
6abec887f0
[fix](compile) fix compile issue introduced from #35397
2024-05-30 12:17:59 +08:00
300582f2e5
[branch-2.1](routine-load) fix be core when partial table load failed ( #35622 )
2024-05-30 09:35:36 +08:00
5c40e87667
[opt](s3) auto retry when meeting 429 error ( #35397 )
...
- Add 2 new BE configs: `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`
When an S3 429 error occurs, the "get" request sleeps
`s3_read_base_wait_time_ms (*1, *2, *3, *4)` ms and then tries again.
The max sleep time is `s3_read_max_wait_time_ms`
and the max retry count is `max_s3_client_retry`.
- Add more metrics for the s3 file reader
  - `s3_file_reader_too_many_request`: counter of 429 errors.
  - `s3_file_reader_s3_get_request`: the QPS of s3 get requests.
  - `TotalGetRequest`: get request counter in profile
  - `TooManyRequestErr`: 429 error counter in profile
  - `TooManyRequestSleepTime`: sum of sleep time after 429 errors in profile
  - `TotalBytesRead`: total bytes read from s3 in profile
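The retry schedule described above can be sketched like this. It is an illustration under stated assumptions, not the S3 reader code: the function names and `RetryResult` are hypothetical, the request and the sleep are injected as callables, and the max wait time is assumed to cap each individual sleep.

```cpp
#include <algorithm>
#include <cstdint>
#include <functional>

// Wait before the n-th retry: base * n, capped at max_ms (assumed per-sleep cap).
inline std::int64_t wait_ms_for_attempt(int attempt, std::int64_t base_ms, std::int64_t max_ms) {
    return std::min(base_ms * attempt, max_ms);
}

struct RetryResult {
    int status;                   // last HTTP status returned by the request
    int retries;                  // number of retries performed
    std::int64_t total_sleep_ms;  // sum of all sleeps
};

// Calls `do_get` and retries while it returns 429, up to `max_retry` retries.
inline RetryResult get_with_retry(const std::function<int()>& do_get,
                                  const std::function<void(std::int64_t)>& sleep_ms,
                                  std::int64_t base_ms, std::int64_t max_ms, int max_retry) {
    RetryResult r{do_get(), 0, 0};
    while (r.status == 429 && r.retries < max_retry) {
        ++r.retries;
        const std::int64_t wait = wait_ms_for_attempt(r.retries, base_ms, max_ms);
        sleep_ms(wait);
        r.total_sleep_ms += wait;
        r.status = do_get();
    }
    return r;
}
```

With a base of 100 ms and a cap of 250 ms, three consecutive 429 responses lead to sleeps of 100, 200, and 250 ms. The counters in `RetryResult` correspond loosely to what `TooManyRequestErr` and `TooManyRequestSleepTime` would record.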
2024-05-28 23:00:31 +08:00
eb49cd839b
[refactor](datalake) return the error status instead of static_cast<void> ( #34873 )
...
Followup #34797
`static_cast<void>` silently discards error statuses. Some of them should make the query finish with an error status, so replace `static_cast<void>` with `RETURN_IF_ERROR`.
The following three scenarios need to be handled separately and cannot be simply replaced:
1. The outer function returns void;
2. The status-returning function is called inside a constructor or destructor;
3. The status-returning function is called on a best-effort basis, and its error status should be ignored.
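The difference between the two styles can be sketched as follows. The `Status` class and the `RETURN_IF_ERROR` macro here are simplified stand-ins written for this example, not the Doris definitions, and `open_file` is a hypothetical function.

```cpp
#include <string>
#include <utility>

// Simplified stand-in for a status type.
class Status {
public:
    static Status OK() { return Status(); }
    static Status Error(std::string msg) {
        Status s;
        s._ok = false;
        s._msg = std::move(msg);
        return s;
    }
    bool ok() const { return _ok; }
    const std::string& msg() const { return _msg; }

private:
    bool _ok = true;
    std::string _msg;
};

// Simplified stand-in: return early from the enclosing function on error.
#define RETURN_IF_ERROR(expr)        \
    do {                             \
        Status _st = (expr);         \
        if (!_st.ok()) return _st;   \
    } while (0)

// Hypothetical operation that can fail.
inline Status open_file(bool fail) {
    return fail ? Status::Error("open failed") : Status::OK();
}

// Old style: the failure is dropped and the caller sees success.
inline Status read_ignoring_error(bool fail) {
    static_cast<void>(open_file(fail));
    return Status::OK();
}

// New style: the failure is propagated to the caller.
inline Status read_propagating_error(bool fail) {
    RETURN_IF_ERROR(open_file(fail));
    return Status::OK();
}
```

The macro only works where the enclosing function itself returns a status, which is why void functions, constructors, and destructors need separate handling.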
2024-05-23 19:06:21 +08:00
adc364a6fd
[feature](Paimon) support deletion vector for Paimon native reader ( #34743 ) ( #35241 )
...
bp #34743
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com >
2024-05-23 00:01:30 +08:00
d63c3ae2d4
[bugfix](hive)fix testcase for viewfs for 2.1 #35178
2024-05-22 18:13:09 +08:00
4dd5379951
[bugfix](hive)fix error for writing to hive for 2.1 ( #34518 )
...
mirror #34520
2024-05-14 23:27:29 +08:00
a8be47f3ff
[fix](fs) Close local file writer when downloading via broker fs ( #34714 )
2024-05-12 09:45:24 +08:00
7a40f2a547
[branch-2.1](resource)fix check available fail when s3 aws_token is set and reset as, sk failed on be. ( #34219 )
2024-05-09 19:06:14 +08:00
7cb00a8e54
[Feature](hive-writer) Implements s3 file committer. ( #34307 )
...
Backport #33937 .
2024-04-29 19:56:49 +08:00
417431fd83
[Enhancement](hdfs-file-system) Change fs_handler ptr to shared_ptr and remove ref count operations. ( #34049 )
...
Backport #33959 .
2024-04-28 19:45:30 +08:00
e38d844d40
[fix](multi-table-load) fix single stream multi table load cannot finish ( #33816 )
2024-04-19 15:03:06 +08:00
cea02c4fb6
[fix](fs) Close local file writer when downloading finished ( #33556 )
2024-04-17 23:42:00 +08:00
f8d1fa2be3
[chore](multi-table-load) add context info in log when using single-stream-multi-table load ( #33317 )
2024-04-10 16:03:05 +08:00
cf7595d423
[opt](memory) Optimize mem tracker accuracy ( #32039 ) ( #33140 )
2024-04-10 11:42:19 +08:00
97850cf2bb
[fix](cooldown) Fix hdfs path ( #33315 )
2024-04-09 12:55:53 +08:00
69bf3b9da4
[fix](hdfs-writer) Catch error information after hdfsCloseFile() ( #33195 )
2024-04-07 23:24:17 +08:00
02430e6e53
[enhance](S3) Print the oss request id for each error s3 request ( #32499 )
2024-03-21 14:07:50 +08:00