Commit Graph

62 Commits

Author SHA1 Message Date
bd47d5a681 [branch-2.1](auto-partition) Fix auto partition load failure in multi replica (#36586)
this pr
1. picked #35630, which was reverted #36098 before.
2. picked #36344 from master

these two pr fixed existing bug about auto partition load.

---------

Co-authored-by: Kaijie Chen <ckj@apache.org>
2024-06-20 17:51:18 +08:00
3b23eee37c Revert "[fix](auto-partition) fix auto partition load lost data in multi sender (#35287)" (#36098)
Reverts apache/doris#35630 because it brought some more damaging bugs.
we will fix it and merge in next version
2024-06-11 17:11:42 +08:00
c2fc485327 [fix](auto-partition) fix auto partition load lost data in multi sender (#35287) (#35630)
## Proposed changes

Change `use_cnt` mechanism for incremental (auto partition) channels and
streams, it's now dynamically counted.
Use `close_wait()` of regular partitions as a synchronize point to make
sure all sinks are in close phase before closing any incremental (auto
partition) channels and streams.
Add dummy (fake) partition and tablet if there is no regular partition
in the auto partition table.

Backport #35287

Co-authored-by: zhaochangle <zhaochangle@selectdb.com>
2024-05-31 10:27:03 +08:00
cf7595d423 [opt](memory) Optimize mem tracker accuracy (#32039) (#33140) 2024-04-10 11:42:19 +08:00
e2e9b9d8a3 [improve](insert-into) record rows info in log for check (#29581) 2024-01-05 17:28:07 +08:00
6da36e1077 [feature](merge-cloud) Refactor write path code by abstract base class (#26537)
Refactor write path code by abstract base class. Whether to use `StorageEngine` or `CloudStorageEngine` will be determined during compilation instead of runtime `config::cloud_mode` to avoid unexpected null pointer or undefined behavior issues caused by merging code.

Class that depend on `StorageEngine` but are shared by the cloud mode need to have an abstract base class. Common code should be extracted into the base class, while the code that depends on `StorageEngine` should be implemented in a `StorageEngine` mix-in class of the base class.
2023-12-08 14:50:36 +08:00
642e5cdb69 [Fix](Status) Make Status [[nodiscard]] and handle returned Status correctly (#23395) 2023-09-29 22:38:52 +08:00
d3f1388717 [Feature](partitions) Support auto-partition (#24153)
Co-authored-by: zhangstar333 <2561612514@qq.com>
2023-09-12 15:23:15 +08:00
fdb7a44f57 Revert "[Feature](partitions) Support auto partition" (#24024)
* Revert "[Feature](partitions) Support auto partition (#23236)"

This reverts commit 6c544dd2011d731b8c9c51384c77bcf19c017981.

* Update config.h
2023-09-07 17:08:26 +08:00
6c544dd201 [Feature](partitions) Support auto partition (#23236)
Co-authored-by: zhangstar333 <2561612514@qq.com>
2023-09-06 16:26:45 +08:00
eea9a43d97 [fix](load) add lock for serialize profile (#23856) 2023-09-05 09:11:55 +08:00
24c1953e91 [fix](debug) add bvar counter for memtable & loadchannel (#22578)
* [fix](debug) add bvar counter for memtable & loadchannel

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

* format code

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>

---------

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2023-08-04 13:58:28 +08:00
ee754307bb [refactor](load) refactor memtable flush actively (#21634) 2023-07-30 21:31:54 +08:00
c6063ed92f [Revert](lazy open) revert lazy open and add case (#21821) 2023-07-18 19:41:33 +08:00
6d579d924d [fix](profile) delete useless profile add_child #20989 2023-06-20 23:21:52 +08:00
ce1b39e79d [fix](profile) avoid unnecessary refresh profile of TabletsChannel
Before, refresh the TabletsChannel profile in the LoadChannelMgr refresh memory statistics thread

This means that enable_profile=false will refresh and have performance loss in stress test
2023-06-20 21:09:43 +08:00
6ee99c4138 [fix](load_profile) fix rows stat and add close_wait in sink (#20181) 2023-05-31 18:23:30 +08:00
e0d9f7f955 [enhancement](load) add some profile items for load (#20141) 2023-05-29 09:54:03 +08:00
fe111207a9 [Fix](lazy_open) Fix lazy open null point (#19829) 2023-05-23 09:17:46 +08:00
f8ef25bb10 [enhancement](load) lazy-open necessary partitions when load (#18874) 2023-05-14 16:09:55 +08:00
Pxl
60cda12e57 [Bug](pipeline-engine) fix hang on insert into select when enable pipeline engine (#19075) 2023-04-26 16:50:19 +08:00
3899c08036 [optimize](compile) remove unused template param from load channel (#18980)
* [optimize](compile) remove unused template param from load channel



---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-04-24 23:36:47 +08:00
8e4710079d [improvement](profile) Insert into add LoadChannel runtime profile (#18908)
TabletSink and LoadChannel in BE are M: N relationship,
Every once in a while LoadChannel will randomly return its own runtime profile to a TabletSink, so usually all LoadChannel runtime profiles are saved on each TabletSink, and the timeliness of the same LoadChannel profile saved on different TabletSinks is different, and each TabletSink will periodically send fe reports all the LoadChannel profiles saved by itself, and ensures to update the latest LoadChannel profile according to the timestamp.
2023-04-24 09:41:57 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
4380f1ec54 [Enhancement](load) reduce memory by memory size of global delta writer (#14491) 2023-01-03 20:05:21 +08:00
b23d068281 [refactor](remove-non-vec) Remove non vec load from memtable and delta writer (#15517)
Co-authored-by: yiguolei <yiguolei@gmail.com>
2022-12-30 21:22:58 +08:00
33b50860c7 [improvement](load) release load channel actively when error occurs (#14218) 2022-11-13 12:31:15 +08:00
32a029d9dc [enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795) 2022-11-03 09:47:12 +08:00
c55d08fa2f [fix](memtracker) Refactor load channel mem tracker to improve accuracy (#12791)
The mem hook record tracker cannot guarantee that the final consumption is 0, nor can it guarantee that the memory alloc and free are recorded in a one-to-one correspondence.

In the life cycle of a memtable from insert to flush, the memory free of hook is more than that of alloc, resulting in tracker consumption less than 0.

In order to avoid the cumulative error of the upper load channel tracker, the memtable tracker consumption is reset to zero on destructor.
2022-09-21 20:16:19 +08:00
942b31038f [fix](memory) Fix BE OOM when load -238 fail (#12666)
When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated.

Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.
2022-09-17 00:17:53 +08:00
1304a17600 [fix](memtracker) Improve performance of tracking real physical memory of PodArray #12021 2022-08-24 14:24:14 +08:00
b300b4faa0 [enhancement](memtracker) Optimize readability of mem exceed limit error message #11877 2022-08-18 14:39:41 +08:00
73d8f5901d fix mem tracker limiter (#11376) 2022-08-01 09:44:04 +08:00
b6bdb3bdbc [fix] (mem tracker) Fix MemTracker accuracy (#11190) 2022-07-27 18:59:24 +08:00
ad31b6c902 [bugfix and improvement]fix mem tracker for load and simplify some macros (#11125) 2022-07-22 21:59:36 +08:00
4960043f5e [enhancement] Refactor to improve the usability of MemTracker (step2) (#10823) 2022-07-21 17:11:28 +08:00
eb25df5a2c [fix] (mem tracker) Fix inaccurate mem tracker leads to load OOM (#10409)
* fix load tracker

* fix comment
2022-06-25 14:13:02 +08:00
85362a907e [fix](mem tracker) Fix some memory leaks, inaccurate statistics, core dump, deadlock bugs (#10072)
1. Fix the memory leak. When the load task is canceled, the `IndexChannel` and `NodeChannel` mem trackers cannot be destructed in time.
2. Fix Load task being frequently canceled by oom and inaccurate `LoadChannel` mem tracker limit, and rewrite the variable name of `mem limit` in `LoadChannel`.
3. Fix core dump, when logout task mem tracker, phmap erase fails, resulting in repeated logout of the same tracker.
4. Fix the deadlock, when add_child_tracker mem limit exceeds, calling log_usage causes `_child_trackers_lock` deadlock.
5. Fix frequent log printing when thread mem tracker limit exceeds, which will affect readability and performance.
6. Optimize some details of mem tracker display.
2022-06-14 21:38:37 +08:00
d588e99b8b [fix][mem tracker] Fix logout load task mem tracker dcheck fail (#9943)
* fix tracker 0602

* fix format
2022-06-07 11:31:49 +08:00
ca05d1ee01 [fix](memory tracker) Fix lru cache, compaction tracker, add USE_MEM_TRACKER compile (#9661)
1. Fix Lru Cache MemTracker consumption value is negative.
2. Fix compaction Cache MemTracker has no track.
3. Add USE_MEM_TRACKER compile option.
4. Make sure the malloc/free hook is not stopped at any time.
2022-05-25 08:56:17 +08:00
b34ed43ec9 [feature-wip] (memory tracker) (step6, End) Fix some details (#9301)
1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track.
2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability.
3. More powerful MemTracker debugging capabilities.
4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%.
5. Fix some other details.
2022-05-10 18:17:09 +08:00
c9961c9bb9 [style] clang-format all c++ code (#9305)
- sh build-support/clang-format.sh  to  clang-format all c++ code
2022-04-29 16:14:22 +08:00
d330bc3806 [Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709) (#9280)
Implement vectorized stream load.
Added fe configuration option `enable_vectorized_load` to enable vectorized stream load.

    Co-authored-by: tengjp@outlook.com
    Co-authored-by: mrhhsg@gmail.com
    Co-authored-by: minghong.zhou@163.com
    Co-authored-by: HappenLee <happenlee@hotmail.com>
    Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>
2022-04-29 09:50:51 +08:00
519305cb22 [feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669)
Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.
2022-04-08 09:02:26 +08:00
eeae516e37 [Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476)
Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G

Implement a new way of memory statistics based on TCMalloc New/Delete Hook,
MemTracker and TLS, and it is expected that all memory new/delete/malloc/free
of the BE process can be counted.
2022-03-20 23:06:54 +08:00
e17aef9467 [refactor] refactor the implement of MemTracker, and related usage (#8322)
Modify the implementation of MemTracker:
1. Simplify a lot of useless logic;
2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing;
3. Add cosume/release cache, trigger a cosume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes;
4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection
5. Added Virtual MemTracker, cosume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently;
6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later;
7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env;
8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.;

Modify where MemTracker is used:
1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code;
2. Added trackers for global objects such as ChunkAllocator and StorageEngine;
3. Added more fine-grained trackers such as ExprContext;
4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode;
5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;
2022-03-11 22:04:23 +08:00
26289c28b0 [fix](load)(compaction) Fix NodeChannel coredump bug and modify some compaction logic (#8072)
1. Fix the problem of BE crash caused by destruct sequence. (close #8058)
2. Add a new BE config `compaction_task_num_per_fast_disk`

    This config specify the max concurrent compaction task num on fast disk(typically .SSD).
    So that for high speed disk, we can execute more compaction task at same time,
    to compact the data as soon as possible

3. Avoid frequent selection of unqualified tablet to perform compaction.
4. Modify some log level to reduce the log size of BE.
5. Modify some clone logic to handle error correctly.
2022-02-17 10:52:08 +08:00
ef984a6a72 [improvement](load) Improve load fault tolerance (#7674)
Currently, if we encounter a problem with a replica of a tablet during the load process,
such as a write error, rpc error, -235, etc., it will cause the entire load job to fail,
which results in a significant reduction in Doris' fault tolerance.

This PR mainly changes:

1. refined the judgment of failed replicas in the load process, so that the failure of a few replicas will not affect the normal completion of the load job.
2. fix a bug introduced from #7754 that may cause BE coredump
2022-01-20 09:23:21 +08:00
5f8d91257b [improvement](routine-load) Reduce the probability that the routine load task rpc timeout (#7754)
If an load task has a relatively short timeout, then we need to ensure that
each RPC of this task does not get blocked for a long time.
And an RPC is usually blocked for two reasons.

1. handling "memory exceeds limit" in the RPC
    
    If the system finds that the memory occupied by the load exceeds the threshold,
    it will select the load channel that occupies the most memory and flush the memtable in it.
    this operation is done in the RPC, which may be more time consuming.

2. close the load channel

    When the load channel receives the last batch, it will end the task.
    It will wait for all memtables flushes to finish synchronously. This process is also time consuming.

Therefore, this PR solves this problem by.

1. Use timeout to determine whether it is a high-priority load task

    If the timeout of an load task is relatively short, then we mark it as a high-priority task.

2. not processing "memory exceeds limit" for high priority tasks
3. use a separate flush thread to flush memtable for high priority tasks.
2022-01-16 10:41:31 +08:00
db1c281be5 [Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947)
## Case

In the load process, each tablet will have a memtable to save the incoming data,
and if the data in a memtable is larger than 100MB, it will be flushed to disk as a `segment` file. And then
a new memtable will be created to save the following data/

Assume that this is a table with N buckets(tablets). So the max size of all memtables will be `N * 100MB`.
If N is large, it will cost too much memory.

So for memory limit purpose, when the size of all memtables reach a threshold(2GB as default), Doris will
try to flush all current memtables to disk(even if their size are not reach 100MB).

So you will see that the memtable will be flushed when it's size reach `2GB/N`, which maybe much smaller
than 100MB, resulting in too many small segment files.

## Solution

When decide to flush memtable to reduce memory consumption, NOT to flush all memtable, but to flush part
of them.
For example, there are 50 tablets(with 50 memtables). The memory limit is 1GB, so when each memtable reach
20MB, the total size reach 1GB, and flush will occur.

If I only flush 25 of 50 memtables, then next time when the total size reach 1GB, there will be 25 memtables with
size 10MB, and other 25 memtables with size 30MB. So I can flush those memtables with size 30MB, which is larger
than 20MB.

The main idea is to introduce some jitter during flush to ensure the small unevenness of each memtable, so as to ensure that flush will only be triggered when the memtable is large enough.

In my test, loading a table with 48 buckets, mem limit 2G, in previous version, the average memtable size is 44MB,
after modification, the average size is 82MB
2021-11-01 10:51:50 +08:00