We make all MemTrackers shared, so that real-time MemTracker consumption can be shown on the web UI.
The changes are as follows:
1. Nearly all MemTracker raw pointers -> shared_ptr.
2. Use CreateTracker() to create a new MemTracker (so that it adds itself to its parent), as sketched after this list.
3. RowBatch & MemPool still use raw pointers to MemTracker, since it is easy to ensure that the RowBatch & MemPool destructors run
before the MemTracker's destructor. So we leave this code unchanged.
4. MemTracker can use a RuntimeProfile counter to calculate consumption, so that counter needs to be shared
too. We add a shared counter pool to store the shared counters; the other RuntimeProfile counters are unchanged.
Note that this PR doesn't change the MemTracker tree structure, so there are still some orphan trackers, e.g. RowBlockV2's MemTracker. If you find that a shared MemTracker tracks little memory but costs too much time, you can make it an orphan, and then using a raw pointer is fine.
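A minimal sketch of the shared-ownership pattern, assuming a simplified MemTracker with only a consumption counter; CreateTracker() is the factory named above, but its signature and the member names here are illustrative, not the exact Doris API:
```
#include <atomic>
#include <cstdint>
#include <memory>
#include <mutex>
#include <string>
#include <vector>

class MemTracker {
public:
    // Hypothetical factory: returns a shared_ptr and registers the new tracker
    // as a child of its parent, so the parent can enumerate its live children
    // when rendering real-time consumption on the web page.
    static std::shared_ptr<MemTracker> CreateTracker(
            const std::string& label,
            const std::shared_ptr<MemTracker>& parent = nullptr) {
        auto tracker = std::shared_ptr<MemTracker>(new MemTracker(label, parent));
        if (parent != nullptr) {
            std::lock_guard<std::mutex> l(parent->_child_lock);
            parent->_children.push_back(tracker);  // stored as weak_ptr: children may die first
        }
        return tracker;
    }

    void consume(int64_t bytes) { _consumption.fetch_add(bytes); }
    int64_t consumption() const { return _consumption.load(); }

private:
    MemTracker(const std::string& label, std::shared_ptr<MemTracker> parent)
            : _label(label), _parent(std::move(parent)) {}

    std::string _label;
    std::shared_ptr<MemTracker> _parent;              // keeps the parent alive
    std::mutex _child_lock;
    std::vector<std::weak_ptr<MemTracker>> _children;
    std::atomic<int64_t> _consumption{0};
};
```
Holding the parent via shared_ptr (and the children via weak_ptr) is what lets a tracker safely outlive the object that created it, which is what the web page needs to read consumption at any time.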
LSAN-detected errors have been fixed by a prior patch (#3326), but
there are still some ASAN-detected errors.
This patch tries to fix these errors to make Doris BE more robust.
Then we can add a CI run in LSAN/ASAN mode to detect memory errors
as early as possible.
Implementation Notes
NodeChannel
_cur_batch -> _pending_batches: when _cur_batch is filled up, move it to _pending_batches.
add_row() just produces batches.
try_send_and_fetch_status() tries to consume one pending batch. If there is an in-flight packet, skip sending in this round.
So we can add one sender thread in charge of try_send for all node channels (see the sketch below).
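A simplified sketch of that producer/consumer split, with stand-in types; the member names follow the description above, but this is not the actual NodeChannel code:
```
#include <atomic>
#include <memory>
#include <mutex>
#include <queue>
#include <vector>

// Stand-in for the real RowBatch: just enough to show the hand-off.
struct RowBatch {
    std::vector<int> rows;
    bool is_full() const { return rows.size() >= 1024; }
};

class NodeChannel {
public:
    // Producer side, called by the sink: fill _cur_batch, and when it is
    // full, move it into _pending_batches.
    void add_row(int row) {
        _cur_batch->rows.push_back(row);
        if (_cur_batch->is_full()) {
            std::lock_guard<std::mutex> l(_pending_lock);
            _pending_batches.push(std::move(_cur_batch));
            _cur_batch = std::make_unique<RowBatch>();
        }
    }

    // Consumer side, called by the sender thread: send at most one pending
    // batch per call, and skip the round while a packet is still in flight.
    void try_send_and_fetch_status() {
        if (_has_in_flight_packet.load()) {
            return;
        }
        std::unique_ptr<RowBatch> batch;
        {
            std::lock_guard<std::mutex> l(_pending_lock);
            if (_pending_batches.empty()) return;
            batch = std::move(_pending_batches.front());
            _pending_batches.pop();
        }
        _has_in_flight_packet = true;  // cleared by the RPC callback in real code
        // ... issue the async RPC carrying `batch` here ...
    }

private:
    std::unique_ptr<RowBatch> _cur_batch = std::make_unique<RowBatch>();
    std::mutex _pending_lock;
    std::queue<std::unique_ptr<RowBatch>> _pending_batches;
    std::atomic<bool> _has_in_flight_packet{false};
};
```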
IndexChannel
init(), open() stay the same.
Use for_each_node_channel() to expose the detailed NodeChannel operations at the call site (this is easier to read & modify).
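A plausible shape of that helper, reusing the NodeChannel class from the sketch above (the member names here are assumptions):
```
#include <functional>
#include <vector>

class IndexChannel {
public:
    // Apply `func` to every NodeChannel this index manages; the detailed
    // per-node logic stays readable at the call site.
    void for_each_node_channel(const std::function<void(NodeChannel*)>& func) {
        for (auto* ch : _node_channels) {
            func(ch);
        }
    }

private:
    std::vector<NodeChannel*> _node_channels;  // owned elsewhere in this sketch
};

// e.g. one send round across all node channels:
//   index_channel->for_each_node_channel(
//           [](NodeChannel* ch) { ch->try_send_and_fetch_status(); });
```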
Sender thread
See func OlapTableSink::_send_batch_process()
Why use polling?
If we use wait/notify, notification happens when a new batch is generated. We can't simply skip sending that batch, because the same batch will never be notified again. So wait/notify cannot avoid blocking in a simple way.
So I choose polling.
Continuously calling try_send() is wasteful, but it's difficult to choose a suitable polling interval. Thus, I add std::this_thread::yield() to give up the time slice and give priority to other processes/threads (if there are any waiting in the run queue).
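Put together, the sender thread reduces to a polling loop roughly like this sketch, built on the classes above; the stop flag and the container of index channels are assumptions, not the exact members of OlapTableSink:
```
#include <atomic>
#include <thread>
#include <vector>

// Sketch of the loop in OlapTableSink::_send_batch_process(): keep trying to
// send on every node channel and yield the time slice after each round
// instead of guessing a sleep interval.
void send_batch_process(std::vector<IndexChannel*>& channels,
                        std::atomic<bool>& stop) {
    while (!stop.load()) {
        for (auto* index_channel : channels) {
            index_channel->for_each_node_channel(
                    [](NodeChannel* ch) { ch->try_send_and_fetch_status(); });
        }
        // Polling instead of wait/notify: a batch skipped because of an
        // in-flight packet would never be re-notified, so just retry every
        // round and yield to give other threads a chance to run.
        std::this_thread::yield();
    }
}
```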
With these metrics we can observe the workload of BE, and they also provide a way to check
whether there is any problem in BE, such as some container growing
too large and leading to OOM.
This patch adds the following metrics:
| Name | Description |
| --- | --- |
| rowset_count_generated_and_in_use | The total count of rowset IDs generated and in use since BE last started |
| unused_rowsets_count | The total count of unused rowsets waiting to be GCed |
| broker_count | The total count of brokers under management |
| data_stream_receiver_count | The total count of data stream receivers under management |
| fragment_endpoint_count | The total count of fragment endpoints of data streams under management; should always be equal to data_stream_receiver_count |
| active_scan_context_count | The total count of active scan contexts |
| plan_fragment_count | The total count of plan fragments being executed |
| load_channel_count | The total count of load channels under management |
| result_buffer_block_count | The total count of result buffer blocks for queries; each block has a limited queue size (default 1024) |
| result_block_queue_count | The total count of queues for fragments; each queue has a limited size (default 20, by config::max_memory_sink_batch_count) |
| routine_load_task_count | The total count of routine load tasks being executed |
| small_file_cache_count | The total count of cached small files' digest info |
| stream_load_pipe_count | The total count of stream load pipes; each pipe has a limited buffer size (default 1MB) |
| tablet_writer_count | The total count of tablet writers |
| brpc_endpoint_stub_count | The total count of brpc endpoint stubs |
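As an illustration only (this is not the Doris metrics API), counts like these are typically exposed as gauges computed on demand from the container they describe, for example:
```
#include <cstddef>
#include <functional>
#include <map>
#include <mutex>
#include <string>

// Illustrative pattern: unused_rowsets_count reported as the size of the map
// of rowsets waiting for GC, evaluated each time metrics are scraped.
struct UnusedRowsets {
    std::mutex lock;
    std::map<std::string, int> rowsets;   // rowset_id -> stand-in payload

    std::function<size_t()> count_metric() {
        return [this]() -> size_t {
            std::lock_guard<std::mutex> l(lock);
            return rowsets.size();
        };
    }
};
```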
Use the same UUID as both the query ID and the load ID of a load execution plan.
Each load execution plan has a load ID, and as a plan, it also has a query ID.
We can use the same UUID for the query ID and the load ID, to trace the load process more easily.
Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise BE cannot
distinguish the old load request from the new one.
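A sketch of both points, assuming a stand-in UniqueId type and helper (the real Doris types and call sites may differ):
```
#include <cstdint>
#include <cstring>
#include <boost/uuid/uuid.hpp>
#include <boost/uuid/uuid_generators.hpp>

struct UniqueId { int64_t hi; int64_t lo; };   // stand-in for the plan/load id type

// Generate one fresh UUID per plan execution.
UniqueId next_uuid() {
    boost::uuids::uuid uuid = boost::uuids::random_generator()();
    UniqueId id;
    std::memcpy(&id.hi, uuid.data, 8);
    std::memcpy(&id.lo, uuid.data + 8, 8);
    return id;
}

// When building (or retrying) a load execution plan:
//   UniqueId id = next_uuid();      // a retry generates a new UUID here,
//   plan.query_id = id;             // so BE can tell old and new requests apart;
//   plan.load_id  = id;             // the query ID and load ID share the same UUID
```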
Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled, or
it may occupy a worker thread for a long time.
Remove the unnecessary query reports when executing a load execution plan.
Only the last query report is needed.
Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPCs of the tablet sink. The default is 600 seconds, which is long enough for flushing
about 6GB of data. The long timeout reduces the possibility of hitting a "fail to send batch" error when loading.
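For example, in be.conf (the key name and the 600-second default come from this patch; the comment is illustrative):
```
# Timeout for tablet sink RPCs during load; default 600s, long enough to
# flush about 6GB of data.
tablet_writer_rpc_timeout_sec = 600
```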
Use streaming_load_max_mb instead of mini_load_max_mb in BE config.
Add more logs to make tracing a broker load process easier.
NOTE: This patch modifies all of a Backend's data,
and this will cause BE to take a very long time to restart.
So if you don't want to disturb your production environment,
you should upgrade the backends one by one.
1. Refactor BE to clarify the structure of the code.
2. Use a unique ID to identify a rowset.
Naming a rowset with tablet_id and version leads to
many conflicts among compaction, clone, and restore.
3. Extract a Rowset interface to encapsulate rowsets
with different formats.
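A hypothetical sketch of that direction (the interface methods and the concrete format names are only placeholders, not the actual Doris classes):
```
#include <cstdint>
#include <string>

using RowsetId = std::string;   // a globally unique ID, not tablet_id + version

// The extracted interface: readers and writers see rowsets only through it,
// regardless of the on-disk format.
class Rowset {
public:
    virtual ~Rowset() = default;
    virtual const RowsetId& rowset_id() const = 0;
    virtual int64_t num_rows() const = 0;
    // ... shared load/read/convert entry points for all formats ...
};

// Concrete formats implement the same interface (placeholder name).
class SomeFormatRowset : public Rowset {
    // format-specific members and overrides go here
};
```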