doris

Author	SHA1	Message	Date
Xin Liao	03ea2866b7	[fix](load) add to error tablets when delta writer failed to close (#15118 ) The load result should be a failure when the delta writers of all tablets fail to close on a single node, but the result returned to the client is success. The reason is that the committed tablets and the error tablets are both empty, so publish succeeds. We should add the tablet to the error tablets when its delta writer fails to close, so that the transaction fails.	2022-12-19 14:22:25 +08:00
zhannngchen	0cd791ec57	[fix](load) delta writer init failed might cause data inconsistency between multiple replicas (#15058 ) In the following case, data inconsistency would happen between multiple replicas: 1. The current delta writer only writes a few lines of data (which means the write() method is only called once). 2. The writer fails in init() (which is called the first time we call write()), and the current tablet is recorded in _broken_tablets. 3. The delta writer is closed; in the close() method, the delta writer finds that it is not inited, treats this case as an empty load, and tries to init again, which creates an empty rowset. 4. The tablet sink receives the error report in the rpc response and marks the replica as failed, but since a quorum of replicas succeeded, the following load commit operation succeeds. 5. FE sends a publish version task to each BE, and the one with the empty rowset publishes the version successfully. We end up with 2 replicas with data and 1 empty replica.	2022-12-16 22:07:00 +08:00
plat1ko	f3aea7f0f0	[Enhancement](status) Unify error code and enable customed err msg for BE internal errors (#14744 )	2022-12-11 23:33:18 +08:00
AlexYue	898d0d42f1	[improvement](load)add more log for better bug tracing experience for be write (#14424 ) Recently when tracing one bug happened in version 1.1.4 I found out there were some places we can add more log for a better tracing.	2022-11-29 22:28:39 +08:00
Xin Liao	33b50860c7	[improvement](load) release load channel actively when error occurs (#14218 )	2022-11-13 12:31:15 +08:00
Xinyi Zou	32a029d9dc	[enhancement](memtracker) Refactor load channel + memtable mem tracker (#13795 )	2022-11-03 09:47:12 +08:00
Yongqiang YANG	c24e5585c3	[fix](load) clear and notify when an error happens in flushing (#13589 )	2022-10-25 13:39:17 +08:00
zhannngchen	1b0dafcaa1	[Enhancement](load) consider memtable in flush while reducing load me… (#13480 ) We should consider memory which are being flushed from memtable to disk when trying to reduce memory by flushing memtable. Otherwise, we might not release memory space as expected. (e.g. lots of large memtable is in flush, the reduce_mem_usage method picks some small memtables to flush, it can't release enough memory and also can generate lots of small segments, which can cause -238 error)	2022-10-21 08:35:35 +08:00
zhannngchen	d8ec53c83f	[enhancement](load) avoid duplicate reduce on same TabletsChannel #12975 In the policy changed by PR #12716, when reaching the hard limit, there might be multiple threads can pick same LoadChannel and call reduce_mem_usage on same TabletsChannel. Although there's a lock and condition variable can prevent multiple threads to reduce mem usage concurrently, but they still can do same reduce-work on that channel multiple times one by one, even it's just reduced.	2022-09-27 22:03:08 +08:00
zhannngchen	57d5f69814	[fix](load) print detailed error message (#12938 ) fix flush failure return message	2022-09-25 10:31:41 +08:00
zhannngchen	3bb920ba54	[Enhancement](load) Refine the load channel flush policy on mem limit (#12716 ) 1. Remove single load channel mem limit, only use load channel mgr mem limit 2. Default load channel mgr mem limit from 50% to 80% 3. load channel mgr add soft mem limit. When the soft limit is exceeded, other threads will not hang, only current thread triggers flush 4. When exceed load channel mgr mem limit, find a load channel with the largest mem usage, continue to find a tablet channel with the largest mem usage, and try to flush 1/3 of the mem usage of this tablet channel.	2022-09-24 10:01:13 +08:00
zhannngchen	27f7ae258d	[Enhancement](load) optimize flush policy to avoid small segments #12706 In current policy, if mem-limit exceeded, load channel will pick tablets that consume most memory, but mem_consumption contains memory in flush, if some delta writer flushing a full memtable(default 200MB), the current memtable might be very small, we should avoid flush such memtable, which can generate a very small segment.	2022-09-21 14:33:05 +08:00
Xinyi Zou	942b31038f	[fix](memory) Fix BE OOM when load -238 fail (#12666 ) When the flush is triggered when the load channel exceeds the mem limit, if the flush fails, an error message is returned and the load is terminated. Usually flush failure is -238 error code. Because the memtable is frequently flushed after the load channel exceeds the mem limit, the number of segments exceeds the max value.	2022-09-17 00:17:53 +08:00
zhengyu	445f0882d1	[Enhancement](log) improve error msg for delta writer fail (#12121 ) (#12360 ) Signed-off-by: freemandealer <freeman.zhang1992@gmail.com> Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>	2022-09-07 10:10:51 +08:00
weizuo93	f730a048b1	[feature-wip](load) Support single replica load (#10298 ) During load process, the same operation are performed on all replicas such as sort and aggregation, which are resource-intensive. Concurrent data load would consume much CPU and memory resources. It's better to perform write process (writing data into MemTable and then data flush) on single replica and synchronize data files to other replicas before transaction finished.	2022-08-02 11:44:18 +08:00
Xinyi Zou	73d8f5901d	fix mem tracker limiter (#11376 )	2022-08-01 09:44:04 +08:00
Xinyi Zou	b6bdb3bdbc	[fix] (mem tracker) Fix MemTracker accuracy (#11190 )	2022-07-27 18:59:24 +08:00
Xinyi Zou	4960043f5e	[enhancement] Refactor to improve the usability of MemTracker (step2) (#10823 )	2022-07-21 17:11:28 +08:00
Lightman	486cf0ebd4	[Feature] Lightweight schema change of add/drop column (#10136 ) * [Schema Change] support fast add/drop column (#49) * [feature](schema-change) support fast schema change. coauthor: yixiutt * [schema change] Using columns desc from fe to read data. coauthor: Lchangliang * [feature](schema change) schema change optimize for add/drop columns. 1.add uniqueId field for class column. 2.schema change for add/drop columns directly update schema meta Co-authored-by: yixiutt <yixiu@selectdb.com> Co-authored-by: SWJTU-ZhangLei <1091517373@qq.com> [Feature](schema change) fix write and add regression test (#69) Co-authored-by: yixiutt <yixiu@selectdb.com> [schema change] be ssupport that delete use newest schema add delete regression test fix regression case (#107) tmp [feature](schema change) light schema change exclude rollup and agg/uniq/dup key type. [feature](schema change) fe olapTable maxUniqueId write in disk. [feature](schema change) add rpc iface for sc add column. [feature](schema change) add columnsDesc to TPushReq for ligtht sc. resolve the deadlock when schema change (#124) fix columns from fe don't has bitmap_index flag (#134) add update/delete case construct MATERIALIZED schema from origin schema when insert fix not vectorized compaction coredump use segment cache choose newest schema by schema version when compaction (#182) [bugfix](schema change) fix ligth schema change problem. [feature](schema change) light schema change add alter job. (#1) fix be ut [bug] (schema change) unique drop key column should not light schema change [feature](schema change) add schema change regression-test. fix regression test [bugfix](schema change) fix multi alter clauses for light schema change. (#2) [bugfix](schema change) fix multi clauses calculate column unique id (#3) modify PushTask process (#217) [Bugfix](schema change) fix jobId replay cause bdbje exception. [bug](schema change) fix max col unique id repeatitive. (#232) [optimize](schema change) modify pendingMaxColUniqueId generate rule. fix compaction error * fix be ut * fix snapshot load core fix unique_id error (#278) [refact](fe) remove redundant code for light schema change. (#4) [refact](fe) remove redundant code for light schema change. (#4) format fe core format be core fix be ut modify fe meta version fix rebase error flush schema into rowset_meta in old table [refactor](schema change) refact fe light schema change. (#5) delete the change of schemahash and support get max version schema * modify for review * fix be ut * fix schema change test	2022-07-12 19:41:06 +08:00
Tiewei Fang	c9f86bc7e2	[refactor] Refactoring Status static methods to format message using fmt(#9533 )	2022-07-02 18:58:23 +08:00
yiguolei	97996c9275	[fix](Insert) fix 5 concurrent "insert...select..." OOM (#10501 ) * [hotfix](dev-1.0.1) 5 concurrent insert...select... OOM Co-authored-by: minghong <minghong.zhou@163.com> Co-authored-by: yiguolei <yiguolei@gmail.com>	2022-07-01 15:29:26 +08:00
Kidd	eb25df5a2c	[fix] (mem tracker) Fix inaccurate mem tracker leads to load OOM (#10409 ) * fix load tracker * fix comment	2022-06-25 14:13:02 +08:00
pengxiangyu	75b3707a28	[refactor](load) add tablet errors when close_wait return error (#9619 )	2022-05-22 21:27:42 +08:00
Adonis Ling	718a51a388	[refactor][style] Use clang-format to sort includes (#9483 )	2022-05-10 21:25:35 +08:00
Xinyi Zou	b34ed43ec9	[feature-wip] (memory tracker) (step6, End) Fix some details (#9301 ) 1. Fix LoadTask, ChunkAllocator, TabletMeta, Brpc, the accuracy of memory track. 2. Modified some MemTracker names, deleted some unnecessary trackers, and improved readability. 3. More powerful MemTracker debugging capabilities. 4. Avoid creating TabletColumn temporary objects and improve BE startup time by 8%. 5. Fix some other details.	2022-05-10 18:17:09 +08:00
pengxiangyu	7234c964ae	[Bug] Missing error tablet list when close_wait return error (#9418 )	2022-05-08 06:45:28 +08:00
chenlinzhong	53574ce0ea	[Bug] (fix) DeltaWriter::mem_consumption() coredump (#9245 )	2022-05-07 19:13:08 +08:00
chenlinzhong	c9961c9bb9	[style] clang-format all c++ code (#9305 ) - sh build-support/clang-format.sh to clang-format all c++ code	2022-04-29 16:14:22 +08:00
HappenLee	d330bc3806	[Vectorized](stream-load-vec) Support stream load in vectorized engine (#8709 ) (#9280 ) Implement vectorized stream load. Added fe configuration option `enable_vectorized_load` to enable vectorized stream load. Co-authored-by: tengjp@outlook.com Co-authored-by: mrhhsg@gmail.com Co-authored-by: minghong.zhou@163.com Co-authored-by: HappenLee <happenlee@hotmail.com> Co-authored-by: zhoubintao <35688959+zbtzbtzbt@users.noreply.github.com>	2022-04-29 09:50:51 +08:00
yiguolei	e5e0dc421d	[refactor] Change ALL OLAPStatus to Status (#8855 ) Currently, there are 2 status code in BE, one is common/Status.h, and the other is olap/olap_define.h called OLAPStatus. OLAPStatus is just an enum type, it is very simple and could not save many informations, I will unify these code to common/Status.	2022-04-14 11:43:49 +08:00
Xinyi Zou	519305cb22	[feature-wip] (memory tracker) (step4) Switch TLS mem tracker to separate more detailed memory usage (#8669 ) Based on #8605, Separate out the memory usage of each operator from the Query/Load/StorageEngine mem tracker.	2022-04-08 09:02:26 +08:00
Xinyi Zou	eeae516e37	[Feature](Memory) Hook TCMalloc new/delete automatically counts to MemTracker (#8476 ) Early Design Documentation: https://shimo.im/docs/DT6JXDRkdTvdyV3G Implement a new way of memory statistics based on TCMalloc New/Delete Hook, MemTracker and TLS, and it is expected that all memory new/delete/malloc/free of the BE process can be counted.	2022-03-20 23:06:54 +08:00
Xinyi Zou	e17aef9467	[refactor] refactor the implement of MemTracker, and related usage (#8322 ) Modify the implementation of MemTracker: 1. Simplify a lot of useless logic; 2. Added MemTrackerTaskPool, as the ancestor of all query and import trackers, This is used to track the local memory usage of all tasks executing; 3. Add consume/release cache, trigger a consume/release when the memory accumulation exceeds the parameter mem_tracker_consume_min_size_bytes; 4. Add a new memory leak detection mode (Experimental feature), throw an exception when the remaining statistical value is greater than the specified range when the MemTracker is destructed, and print the accurate statistical value in HTTP, the parameter memory_leak_detection; 5. Added Virtual MemTracker, consume/release will not sync to parent. It will be used when introducing TCMalloc Hook to record memory later, to record the specified memory independently; 6. Modify the GC logic, register the buffer cached in DiskIoMgr as a GC function, and add other GC functions later; 7. Change the global root node from Root MemTracker to Process MemTracker, and remove Process MemTracker in exec_env; 8. Modify the macro that detects whether the memory has reached the upper limit, modify the parameters and default behavior of creating MemTracker, modify the error message format in mem_limit_exceeded, extend and apply transfer_to, remove Metric in MemTracker, etc.; Modify where MemTracker is used: 1. MemPool adds a constructor to create a temporary tracker to avoid a lot of redundant code; 2. Added trackers for global objects such as ChunkAllocator and StorageEngine; 3. Added more fine-grained trackers such as ExprContext; 4. RuntimeState removes FragmentMemTracker, that is, PlanFragmentExecutor mem_tracker, which was previously used for independent statistical scan process memory, and replaces it with _scanner_mem_tracker in OlapScanNode; 5. MemTracker is no longer recorded in ReservationTracker, and ReservationTracker will be removed later;	2022-03-11 22:04:23 +08:00
yiguolei	d880559214	[refactor] remove old schema change code on BE (#8342 )	2022-03-09 13:05:44 +08:00
Mingyu Chen	ef984a6a72	[improvement](load) Improve load fault tolerance (#7674 ) Currently, if we encounter a problem with a replica of a tablet during the load process, such as a write error, rpc error, -235, etc., it will cause the entire load job to fail, which results in a significant reduction in Doris' fault tolerance. This PR mainly changes: 1. refined the judgment of failed replicas in the load process, so that the failure of a few replicas will not affect the normal completion of the load job. 2. fix a bug introduced from #7754 that may cause BE coredump	2022-01-20 09:23:21 +08:00
Mingyu Chen	5f8d91257b	[improvement](routine-load) Reduce the probability that the routine load task rpc timeout (#7754 ) If an load task has a relatively short timeout, then we need to ensure that each RPC of this task does not get blocked for a long time. And an RPC is usually blocked for two reasons. 1. handling "memory exceeds limit" in the RPC If the system finds that the memory occupied by the load exceeds the threshold, it will select the load channel that occupies the most memory and flush the memtable in it. this operation is done in the RPC, which may be more time consuming. 2. close the load channel When the load channel receives the last batch, it will end the task. It will wait for all memtables flushes to finish synchronously. This process is also time consuming. Therefore, this PR solves this problem by. 1. Use timeout to determine whether it is a high-priority load task If the timeout of an load task is relatively short, then we mark it as a high-priority task. 2. not processing "memory exceeds limit" for high priority tasks 3. use a separate flush thread to flush memtable for high priority tasks.	2022-01-16 10:41:31 +08:00
曹建华	948a2a738d	[performance] Improve DeltaWriter's performance. (#7216 ) 1. Support batch write for DeltaWriter. 2. Use mutex instead of SpinLock.	2021-11-26 10:15:27 +08:00
Mingyu Chen	db1c281be5	[Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947 ) ## Case In the load process, each tablet will have a memtable to save the incoming data, and if the data in a memtable is larger than 100MB, it will be flushed to disk as a `segment` file. And then a new memtable will be created to save the following data. Assume that this is a table with N buckets(tablets). So the max size of all memtables will be `N * 100MB`. If N is large, it will cost too much memory. So for memory limit purpose, when the size of all memtables reaches a threshold(2GB as default), Doris will try to flush all current memtables to disk(even if their sizes have not reached 100MB). So you will see that the memtable will be flushed when its size reaches `2GB/N`, which may be much smaller than 100MB, resulting in too many small segment files. ## Solution When deciding to flush memtables to reduce memory consumption, do NOT flush all memtables, but flush part of them. For example, there are 50 tablets(with 50 memtables). The memory limit is 1GB, so when each memtable reaches 20MB, the total size reaches 1GB, and flush will occur. If I only flush 25 of 50 memtables, then next time when the total size reaches 1GB, there will be 25 memtables with size 10MB, and other 25 memtables with size 30MB. So I can flush those memtables with size 30MB, which is larger than 20MB. The main idea is to introduce some jitter during flush to ensure the small unevenness of each memtable, so as to ensure that flush will only be triggered when the memtable is large enough. In my test, loading a table with 48 buckets, mem limit 2G, in previous version, the average memtable size is 44MB, after modification, the average size is 82MB	2021-11-01 10:51:50 +08:00
Mingyu Chen	e8cabfff27	[S3] Support path style endpoint (#6962 ) Add a use_path_style property for S3 Upgrade hadoop-common and hadoop-aws to 2.8.0 to support path style property Fix some S3 URI bugs Add some logs for tracing load process.	2021-11-01 10:48:10 +08:00
HappenLee	1a81b9e160	[MemTracker] Some enchance of MemTracker (#5783 ) 1 Make some MemTracker have reasonable parent MemTracker not the root tracker 2 Make each MemTracker can be easily to trace. 3 Add show level of MemTracker to reduce the MemTracker show in the web page to have a way to control show how many tracker in web page.	2021-05-19 09:27:50 +08:00
stdpain	a1bce25677	[BUG] Fix Memory Leak in SchemaChange And Fix some DCHECK error (#5491 )	2021-03-17 09:27:05 +08:00
Yingchun Lai	0131c33966	[Enhance] Improve the readability of memtrackers' name (#5455 ) Improve the readability of memtrackers' name, then you will be happy to read website be_ip:port/mem_tracker	2021-03-11 22:33:31 +08:00
Mingyu Chen	51ccd44865	[Load Parallel][3/3] Support parallel delta writer (#5369 ) In the previous broker load, multiple OlapTableSinks would send data to the same LoadChannel, and because of the lock granularity problem, LoadChannel could only process these requests serially, which made it impossible to make full use of cluster resources. This CL modifies the related locks so that LoadChannel can process these requests in parallel. In the test, with a size of 20G, the load speed of 334 million rows of data in 3 nodes has been increased from 9min to 5min, and after enabling 2 concurrency, it can be increased to 3min. Also modify the profile of load job.	2021-02-07 22:42:18 +08:00
Zhengguo Yang	93a4c7efc1	[LOG] Standardize the use of VLOG in code (#5264 ) At present, the application of vlog in the code is quite confusing. It is inherited from impala VLOG_XX format, and there is also VLOG(number) format. VLOG(number) format does not have a unified specification, so this pr standardizes the use of VLOG	2021-01-21 12:09:09 +08:00
sduzh	6fedf5881b	[CodeFormat] Clang-format cpp sources (#4965 ) Clang-format all c++ source files.	2020-11-28 18:36:49 +08:00
Yingchun Lai	3438a746ac	[Typo] Fix typo in metrics macros (#4739 ) Just fix typo. Rename DEFINE_GAUGE_METRIC_PROTOTYPE_5ARG(name, unit) to DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit) Rename DEFINE_GAUGE_METRIC_PROTOTYPE_2ARG(name, unit) witch define core metrics to DEFINE_GAUGE_CORE_METRIC_PROTOTYPE_2ARG(name, unit)	2020-10-15 19:56:43 +08:00
HuangWei	704bcec9d3	[Bug] add_batch check state fix (#4575 )	2020-09-12 11:18:10 +08:00
Yingchun Lai	e71152132c	[metrics] Redesign metrics to 3 layers (#4115 ) Redesign metrics to 3 layers: MetricRegistry - MetricEntity - Metrics MetricRegistry : the register center MetricEntity : the entity registered on MetricRegistry. Generally a MetricRegistry can be registered on several MetricEntities, each of MetricEntity is an independent entity, such as server, disk_devices, data_directories, thrift clients and servers, and so on. Metric : metrics of an entity. Such as fragment_requests_total on server entity, disk_bytes_read on a disk_device entity, thrift_opened_clients on a thrift_client entity. MetricPrototype: the type of a metric. MetricPrototype is a global variable, can be shared by the same metrics across different MetricEntities.	2020-08-08 11:23:01 +08:00
HuangWei	10f822eb43	[MemTracker] make all MemTrackers shared (#4135 ) We make all MemTrackers shared, in order to show MemTracker real-time consumptions on the web. As follows: 1. nearly all MemTracker raw ptr -> shared_ptr 2. Use CreateTracker() to create new MemTracker(in order to add itself to its parent) 3. RowBatch & MemPool still use raw ptrs of MemTracker, it's easy to ensure RowBatch & MemPool destructor exec before MemTracker's destructor. So we don't change these code. 4. MemTracker can use RuntimeProfile's counter to calc consumption. So RuntimeProfile's counter need to be shared too. We add a shared counter pool to store the shared counter, don't change other counters of RuntimeProfile. Note that, this PR doesn't change the MemTracker tree structure. So there still have some orphan trackers, e.g. RowBlockV2's MemTracker. If you find some shared MemTrackers are little memory consumption & too time-consuming, you could make them be the orphan, then it's fine to use the raw ptr.	2020-07-31 21:57:21 +08:00
Dayue Gao	f9a52f5db4	[Bug] Insert may leak DeltaWriter when re-analyzed (#3973 )	2020-06-30 11:09:53 +08:00
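
The arithmetic in commit db1c281be5 (#6947 ) above can be checked with a small simulation. This is only an illustrative sketch, not Doris code: the function name `simulate_partial_flush`, the even growth of all memtables, the "flush the largest half" rule, and treating 1GB as 1000MB (to match the commit's round numbers) are all assumptions made for the example.

```python
def simulate_partial_flush(num_memtables=50, limit_mb=1000.0, rounds=3):
    """Return, per round, the sizes (in MB) of the memtables that get flushed.

    Hypothetical model: every memtable grows by the same amount until the
    total reaches limit_mb, then the largest half is flushed (reset to 0).
    """
    sizes = [0.0] * num_memtables
    flushed_per_round = []
    for _ in range(rounds):
        # Grow all memtables evenly until the total hits the memory limit.
        growth = (limit_mb - sum(sizes)) / num_memtables
        sizes = [s + growth for s in sizes]
        # Pick the largest half as flush victims (ties keep index order).
        order = sorted(range(num_memtables), key=lambda i: sizes[i], reverse=True)
        victims = order[: num_memtables // 2]
        flushed_per_round.append([sizes[i] for i in victims])
        for i in victims:
            sizes[i] = 0.0
    return flushed_per_round


if __name__ == "__main__":
    for round_no, flushed in enumerate(simulate_partial_flush(), start=1):
        print(round_no, len(flushed), min(flushed), max(flushed))
```

Under these assumptions the first round flushes 25 memtables of 20MB each, and the second round flushes the 25 memtables that have reached 30MB, which is the unevenness the commit message describes.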


57 Commits