doris

Author	SHA1	Message	Date
bobhan1	642e5cdb69	[Fix](Status) Make `Status` `[[nodiscard]]` and handle returned `Status` correctly (#23395 )	2023-09-29 22:38:52 +08:00
bobhan1	c926e8ff9d	[Enhancement](Status) use `Status` to expose the error info more explicitly in `FlushToken` (#24240 )	2023-09-12 19:30:16 +08:00
Kaijie Chen	26905e36e5	[fix](load) fix nullptr in memtable limiter flush (#23149 )	2023-08-18 19:55:53 +08:00
Kaijie Chen	2013dcd0e9	[refactor](load) cleanup segment flush logic in beta rowset writer (#21635 )	2023-07-18 18:17:57 +08:00
Pxl	ca71048f7f	[Chore](status) avoid empty error msg on status (#21454 )	2023-07-11 13:48:16 +08:00
Kaijie Chen	dac2b638c6	[refactor](load) move memtable flush logic to flush token and rowset writer (#21547 )	2023-07-06 17:04:30 +08:00
Adonis Ling	e412dd12e8	[chore](build) Use include-what-you-use to optimize includes (PART II) (#18761 ) Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.	2023-04-19 23:11:48 +08:00
zhannngchen	4be1b9e784	[enhancement](load) add slow log for memtable flush (#17962 )	2023-03-22 20:21:39 +08:00
Mingyu Chen	3fec5ff0f5	[refactor](scan-pool) move scan pool from env to scanner scheduler (#15604 ) The original scan pools are in exec_env. But after new_load_scan_node was enabled by default, the scan pool in exec_env is no longer used. All scan tasks are submitted to the scan pool in scanner_scheduler. BTW, reorganize the scan pool into 3 kinds: (1) local scan pool, for olap scan node; (2) remote scan pool, for file scan node; (3) limited scan pool, for queries which set a cpu resource limit or have a small limit clause. TODO: Use bthread to unify all IO tasks. Some trivial issues: fix the bug that the memtable flush size printed in the log is not right; add a RuntimeProfile param in VScanner.	2023-01-11 09:38:42 +08:00
Xin Liao	0d5291801d	[fix](load) fix that flush memtable concurrently may cause data inconsistency (#15005 )	2022-12-13 09:27:35 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
Xinyi Zou	c55d08fa2f	[fix](memtracker) Refactor load channel mem tracker to improve accuracy (#12791 ) The mem hook record tracker cannot guarantee that the final consumption is 0, nor can it guarantee that memory allocs and frees are recorded in one-to-one correspondence. Over the life cycle of a memtable from insert to flush, the hook records more freed memory than allocated memory, resulting in tracker consumption of less than 0. To avoid accumulating this error in the upper load channel tracker, the memtable tracker consumption is reset to zero in the destructor.	2022-09-21 20:16:19 +08:00
yixiutt	3072e17b39	[Bugfix](primary-key) fix calc delete bitmap bug in concurrent memtable flush (#12605 ) Co-authored-by: yixiutt <yixiu@selectdb.com>	2022-09-15 21:50:24 +08:00
Xinyi Zou	7d836cf0c7	[fix](memtracker) Fix flush memtable to reduce load channel mem not executed (#11771 ) The memory value automatically tracked by the tcmalloc hook in the DeltaWriter is smaller than the value recorded manually in the memtable, because the first 4096-byte Chunk requested by each MemPool when the memtable is initialized is not tracked to the DeltaWriter by the hook. The values of the two are not equal, causing the mem_consumption() == _mem_table->memory_usage branch judgment to fail.	2022-08-16 14:30:45 +08:00
Xinyi Zou	73d8f5901d	fix mem tracker limiter (#11376 )	2022-08-01 09:44:04 +08:00
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190 )	2022-07-27 18:59:24 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Xinyi Zou	6ad024a2bf	[fix] (mem tracker) Refactor memtable mem tracker, fix flush memtable DCHECK failed (#10156 ) 1. Added memory leak detection for the `DeltaWriter` and `MemTable` mem trackers. 2. Modified the memtable mem tracker to be virtual, to avoid frequent recursive consumption of the parent tracker. 3. Disabled attaching the memtable tracker in the memtable flush thread, to ensure that the memtable mem tracker is completely accurate. 4. Modified `memory_verbose_track=false`. At present, frequently switching the thread mem tracker has a performance problem: because the mem tracker is held as a shared_ptr in thread-local storage, each switch decrements the atomic use_count in the shared_ptr of the current tracker and increments the use_count of the replacement tracker, and frequent multi-threaded changes to the same tracker shared_ptr are slow. TODO: 1. Reduce unnecessary thread mem tracker switches. 2. Consider using raw pointers for the mem tracker in thread-local storage.	2022-06-19 16:48:42 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are 2 status codes in BE: one is in common/Status.h, and the other, called OLAPStatus, is in olap/olap_define.h. OLAPStatus is just an enum type; it is very simple and cannot carry much information, so I will unify this code to common/Status.	2022-04-14 11:43:49 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
Mingyu Chen	5f8d91257b	[improvement](routine-load) Reduce the probability that the routine load task rpc timeout (#7754 ) If a load task has a relatively short timeout, then we need to ensure that each RPC of this task does not get blocked for a long time. An RPC is usually blocked for two reasons. 1. Handling "memory exceeds limit" in the RPC: if the system finds that the memory occupied by the load exceeds the threshold, it will select the load channel that occupies the most memory and flush the memtable in it. This operation is done in the RPC, which may be time consuming. 2. Closing the load channel: when the load channel receives the last batch, it will end the task and wait synchronously for all memtable flushes to finish. This process is also time consuming. Therefore, this PR solves the problem as follows. 1. Use the timeout to determine whether it is a high-priority load task: if the timeout of a load task is relatively short, then we mark it as a high-priority task. 2. Do not process "memory exceeds limit" for high-priority tasks. 3. Use a separate flush thread to flush memtables for high-priority tasks.	2022-01-16 10:41:31 +08:00
caiconghui	0393c9b3b9	[Optimize] Support send batch parallelism for olap table sink (#6397 ) Co-authored-by: caiconghui <caiconghui@xiaomi.com>	2021-08-30 11:03:09 +08:00
Mingyu Chen	ab06e92021	[Load Parallel][2/3] Support parallel flushing memtable during load (#5163 ) In the previous implementation, in a load job, multiple memtables of the same tablet are written to disk sequentially. In fact, multiple memtables can be written out of order in parallel; we only need to ensure that each memtable uses a different segment writer.	2021-01-24 10:10:30 +08:00
LingBin	c617fc9064	Fix the flush_status bug in flush-executor (#2933 ) For a tablet, there may be multiple memtables, which will be flushed to disk one by one in the order of generation. If a memtable flush fails, then the load job will definitely fail, but the previous implementation overwrites `_flush_status`, which may hide the error and lead to a failed load job being reported as successful. This patch also has two other changes: 1. Use `std::bind` to replace `boost::bind`; 2. Remove some unneeded headers.	2020-02-19 20:23:19 +08:00
lichaoyong	1cf0fb9117	Use ThreadPool to refactor MemTableFlushExecutor (#2931 ) 1. MemTableFlushExecutor maintains a ThreadPool to receive FlushTasks. 2. FlushToken is used to separate tasks from different tablets. Every DeltaWriter of a tablet constructs a FlushToken; tasks within a FlushToken are handled serially, and tasks in different FlushTokens are handled concurrently. 3. I have removed the thread limit on data_dir, because I/O is not the main time consumer of the flush thread. Much of the time is consumed by CPU-bound encoding and compression.	2020-02-18 18:39:04 +08:00
LingBin	3c539aac54	[Refactor] Some tiny refactor on streaming-load related code (#2891 ) Mainly contains the following modifications: 1. Use `std::unique_ptr` to replace some naked pointers 2. Modify some methods from member-method to local-static-function 3. Modify some methods do not need to be public to private 4. Some formatting changes: such as wrapping lines that are too long 5. Remove some useless variables 6. Add or modify some comments for easier understanding No functional changes in this patch.	2020-02-13 10:42:52 +08:00
Dayue Gao	83b5455be5	[Load] Fix several races in stream load that could cause BE crash (#2414 ) This CL fixes the following problems: 1. check whether TabletsChannel has been closed/cancelled in `reduce_mem_usage` to avoid using a closed DeltaWriter; 2. make `FlushHandle.wait` wait for all submitted tasks to finish so that the memtable is deallocated before its delta writer; 3. make `~MemTracker()` release its consumption bytes to accommodate situations in aggregate_func.h where bitmap and hll call `MemTracker::consume` without a corresponding `MemTracker::release`, which causes the consumption of the root tracker to never drop to zero.	2019-12-10 21:59:05 +08:00
LingBin	6fbb5b31fa	Need to check the return status of push_memtable (#2328 ) When a `BlockingQueue` is shut down, `blocking_put()` returns false, and this must not be ignored.	2019-11-30 17:17:46 +08:00
LingBin	324f1b8f51	Unify the type of path_hash to `size_t` (#2324 ) The type of path hash should be `size_t`(i.e. `uint32_t`), but the current code mixes `int64_t`, `int32_t` and `size_t`.	2019-11-28 18:48:52 +08:00
Mingyu Chen	ee5b79ac2b	Fix bug that memtable should be destroyed before finishing the load process (#1983 ) The parent mem tracker may be released before the child mem tracker visits it, which causes a segfault.	2019-10-15 22:46:19 +08:00
Mingyu Chen	62acf5d098	Limit the memory usage of Loading process (#1954 )	2019-10-15 09:26:20 +08:00
Mingyu Chen	c643cbd30c	Optimize the load performance for large file (#1798 ) The current load process is: Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk. In the path of Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed: (1) insert tuples into different memtables according to tablet ID; (2) when a memtable's size reaches the threshold, write it to disk. The above operations are equivalent to single-thread execution for a single load task. In fact, the insertion into a memtable and the flush of a memtable can be executed asynchronously. Performing these operations in separate threads prevents the insertion into the memtable from being delayed by slow disk writes. In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads. By default, each data directory uses two worker threads for flush, which can be modified by the BE parameter flush_thread_num_per_store. DeltaWriter will push the full memtable to MemTableFlushExecutor for the flush operation and generate a new memtable for receiving new data. This design can improve the performance of loading large files. In single-host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.	2019-09-25 13:49:32 +08:00
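
Two ideas recur in the flush-related entries above: tasks belonging to one token are handled serially (#2931), and a failed flush must not have its status overwritten by a later one (#2933). The block below is a minimal, single-threaded sketch of those two ideas only. `SketchStatus`, `FlushTokenSketch`, and `run_example` are hypothetical names invented for illustration; they are not the Doris `Status` or `FlushToken` types, and the real code submits work to a thread pool rather than running it inline.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a status type; NOT Doris's real `Status`.
struct SketchStatus {
    bool ok;
    std::string msg;
    SketchStatus(bool ok_ = true, std::string msg_ = "")
            : ok(ok_), msg(std::move(msg_)) {}
};

// Hypothetical token: tasks of one tablet run in submission order,
// and the first failure is kept instead of being overwritten.
class FlushTokenSketch {
public:
    void submit(std::function<SketchStatus()> task) {
        _tasks.push_back(std::move(task));
    }

    // Runs queued tasks serially and stops at the first failure,
    // so a later success can never mask an earlier error.
    SketchStatus wait() {
        for (auto& task : _tasks) {
            if (!_first_error.ok) break;
            SketchStatus st = task();
            if (!st.ok) _first_error = st;
        }
        _tasks.clear();
        return _first_error;
    }

private:
    std::vector<std::function<SketchStatus()>> _tasks;
    SketchStatus _first_error;  // stays ok until some task fails
};

// Example: the second flush fails; the third would have succeeded.
inline SketchStatus run_example() {
    FlushTokenSketch token;
    token.submit([] { return SketchStatus{}; });
    token.submit([] { return SketchStatus{false, "disk full"}; });
    token.submit([] { return SketchStatus{}; });
    return token.wait();
}
```

In this sketch `run_example()` returns the failing status from the second task, because `wait()` stops at the first error rather than letting the third task's success replace it.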

34 Commits