This lock was introduced by lazy open in #18874.
Holding a lock while writing data to DeltaWriter is unnecessary and costly in the first place, and now that lazy open has been reverted in #21821, we can omit this lock entirely.
_tablet_writers is not supposed to change once we've reached TabletsChannel::add_batch.
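A minimal sketch of the resulting invariant (the surrounding class is heavily simplified; only the names _tablet_writers, DeltaWriter, and add_batch come from the BE code): mutation is confined to the open phase, so the hot write path can look up writers without taking the mutex.

```cpp
#include <cstdint>
#include <map>
#include <memory>
#include <mutex>

// Simplified stand-in for the real BE type.
struct DeltaWriter {
    void write() { /* append rows to the memtable */ }
};

class TabletsChannel {
public:
    // Open phase: mutations of the writer map are still serialized.
    void register_writer(int64_t tablet_id) {
        std::lock_guard<std::mutex> l(_lock);
        _tablet_writers.emplace(tablet_id, std::make_unique<DeltaWriter>());
    }

    // Hot write path: _tablet_writers is never modified after open finishes,
    // so reading it without _lock is safe and avoids holding a mutex across
    // the (potentially slow) DeltaWriter::write call.
    void add_batch(int64_t tablet_id) {
        auto it = _tablet_writers.find(tablet_id);  // lock-free lookup
        if (it != _tablet_writers.end()) {
            it->second->write();
        }
    }

private:
    std::mutex _lock;  // guards _tablet_writers during open only
    std::map<int64_t, std::unique_ptr<DeltaWriter>> _tablet_writers;
};
```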
Previously, the TabletsChannel profile was refreshed in the LoadChannelMgr memory-statistics refresh thread. This means the profile is refreshed even when enable_profile=false, causing performance loss in stress tests.
TabletSink and LoadChannel in BE have an M:N relationship.
Every once in a while, a LoadChannel returns its own runtime profile to a random TabletSink, so over time every TabletSink holds the runtime profiles of all LoadChannels, though the copies of the same LoadChannel profile held by different TabletSinks may differ in freshness. Each TabletSink periodically reports all the LoadChannel profiles it holds to FE, and the latest profile of each LoadChannel is kept according to its timestamp.
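A hedged sketch of the timestamp-based merge described above (the type and member names here are assumptions, not the actual FE/BE API): whichever TabletSink reports, a stored profile is replaced only by a strictly newer copy.

```cpp
#include <cstdint>
#include <map>
#include <string>

// Hypothetical serialized profile stamped with its production time.
struct LoadChannelProfile {
    int64_t timestamp_ms = 0;
    std::string payload;
};

// Keeps one profile per LoadChannel, preferring the freshest copy.
// Different TabletSinks may report stale copies of the same profile,
// so an update is accepted only when the timestamp moves forward.
class LoadProfileStore {
public:
    void update(const std::string& load_channel_id, const LoadChannelProfile& p) {
        auto& current = _profiles[load_channel_id];
        if (p.timestamp_ms > current.timestamp_ms) {
            current = p;  // newer report wins; older duplicates are dropped
        }
    }

private:
    std::map<std::string, LoadChannelProfile> _profiles;
};
```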
Currently, there are some useless includes in the codebase. We can use the include-what-you-use tool to optimize them. Enforcing a strict include-what-you-use policy brings benefits such as faster builds and clearer header dependencies.
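For illustration, a before/after of what a strict include-what-you-use pass typically does to a file (the headers here are toy examples, not taken from the Doris codebase):

```cpp
// Before: this compiles only because <string> leaks in transitively
// through <iostream>, which can break whenever that header changes.
//   #include <iostream>

// After a strict IWYU pass, every symbol used below has a direct include:
#include <iostream>  // for std::cout
#include <string>    // for std::string

int main() {
    std::string msg = "hello";  // needs <string> directly
    std::cout << msg << '\n';   // needs <iostream> directly
    return 0;
}
```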
The current distribution model for Doris is as follows:
OlapTableSink separates the original Block into several sub-blocks, one per node (BE), according to the tablet distribution, and sends the sub-blocks to the storage engines of the backends; the storage engine then splits each sub-block across multiple tablet channels, and each DeltaWriter handles part of the block.
This model splits blocks by tablet, and the splitting can be a relatively heavy operation. After splitting, the blocks are distributed to different DeltaWriters (memtables) through RPCs to TabletChannels, and the distribution step on the TabletChannels is also relatively heavy. If the table's distribution property is RANDOM, we have the opportunity to distribute each block whole, without splitting it. The advantage is less memory copying and better write locality, similar to appending the entire block to the memtable.
This optimization could save 10%~20% of the CPU cost of loading a RANDOM-distribution table when load_to_single_tablet is enabled.
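A minimal sketch of the single-tablet fast path this enables (the types and method names are simplifications, not the actual BE classes): when every row of a block targets one tablet, the block is appended in one bulk operation instead of being partitioned row by row.

```cpp
#include <cstdint>
#include <vector>

// Simplified stand-ins for a block of rows and a per-tablet memtable.
struct Block {
    std::vector<int64_t> rows;
};

struct MemTable {
    std::vector<int64_t> rows;
    // Fast path: one bulk copy of the complete block, preserving write
    // locality instead of scattering rows across many memtables.
    void append_block(const Block& b) {
        rows.insert(rows.end(), b.rows.begin(), b.rows.end());
    }
};

// With RANDOM distribution and load_to_single_tablet, the whole block maps
// to a single tablet, so no per-tablet split step is needed at all.
void write_block(MemTable& tablet_memtable, const Block& block) {
    tablet_memtable.append_block(block);
}
```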
fmt::format doesn't support objects without a registered formatter as arguments, even if they implement `to_string()` or `operator<<`, so the original code may cause `false` to be printed instead of the real cause of the failure. `to_string()` therefore needs to be invoked manually.
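A minimal reproduction of the symptom, under the assumption that the status type is implicitly convertible to bool (which is one way a literal `false` can leak into the formatted message); the fix is the explicit `to_string()` call:

```cpp
#include <string>

#include <fmt/format.h>

// Stand-in for a Status-like type; the implicit bool conversion below is an
// assumption made for this sketch, not a claim about the real class.
struct Status {
    bool ok = false;
    std::string msg = "IO error: disk full";
    operator bool() const { return ok; }
    std::string to_string() const { return ok ? "OK" : msg; }
};

int main() {
    Status st;
    // Broken: with no fmt formatter registered for Status, the call either
    // fails to compile or, via the bool conversion, logs just "false":
    //   fmt::format("close failed: {}", st);

    // Fixed: invoke to_string() manually to surface the real cause.
    std::string log = fmt::format("close failed: {}", st.to_string());
    // log == "close failed: IO error: disk full"
    return 0;
}
```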
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
The load should fail when all tablet delta writers fail to close on a single node, but the result returned to the client is success.
The reason is that both the committed tablets and the error tablets are empty, so publish succeeds.
We should add the tablet to the error tablets when its delta writer fails to close; the transaction will then fail as expected.
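A hedged sketch of that fix with simplified stand-ins for the BE types (the real code fills protobuf tablet lists in the channel's close path): a writer that fails to close now lands in the error-tablet list, so the transaction is failed instead of published.

```cpp
#include <cstdint>
#include <map>
#include <string>
#include <vector>

// Simplified stand-ins for the BE close path.
struct Status {
    bool ok = true;
    std::string msg;
};

struct DeltaWriter {
    Status close() { return {false, "segment writer error"}; }  // failing case
};

struct TabletError {
    int64_t tablet_id;
    std::string msg;
};

void close_writers(std::map<int64_t, DeltaWriter>& tablet_writers,
                   std::vector<int64_t>* committed_tablets,
                   std::vector<TabletError>* error_tablets) {
    for (auto& [tablet_id, writer] : tablet_writers) {
        Status st = writer.close();
        if (st.ok) {
            committed_tablets->push_back(tablet_id);
        } else {
            // The fix: record the failed tablet instead of dropping it.
            // With a non-empty error list, the transaction fails rather
            // than publishing a "successful" empty load.
            error_tablets->push_back({tablet_id, st.msg});
        }
    }
}
```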
In the following case, data inconsistency would happen between multiple replicas (a sketch of the fixed close() path follows this list):
1. The delta writer only writes a few rows of data, which means the write() method is called only once.
2. The writer fails in init() (which is called the first time we call write()), and the current tablet is recorded in _broken_tablets.
3. The delta writer is closed; in the close() method, the delta writer finds it was never inited, treats this case as an empty load, and tries to init again, which creates an empty rowset.
4. The tablet sink receives the error report in the RPC response and marks the replica as failed, but since a quorum of replicas succeeded, the subsequent load commit operation succeeds.
5. FE sends a publish version task to each BE, and the one with the empty rowset publishes the version successfully.
6. We end up with 2 replicas with data and 1 empty replica.
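A minimal sketch of the fixed close() behavior (names follow the description above; the real DeltaWriter is far more involved, and the boolean return stands in for Status): once init() has failed, close() must not fall back to the empty-load path.

```cpp
// Sketch: a DeltaWriter that remembers a failed init() so close() cannot
// silently manufacture an empty, publishable rowset. true means success.
class DeltaWriter {
public:
    bool write() {
        if (!_is_init) {
            if (!init()) {
                _init_failed = true;  // the tablet also lands in _broken_tablets
                return false;
            }
        }
        // ... append rows to the memtable ...
        return true;
    }

    bool close() {
        if (!_is_init) {
            if (_init_failed) {
                // Fixed behavior: propagate the earlier failure instead of
                // re-initing into an empty rowset that FE could publish.
                return false;
            }
            // Genuine empty load: write() was never called at all.
            return init();
        }
        // ... flush the memtable and close segment writers ...
        return true;
    }

private:
    bool init() {
        // ... open the rowset writer, etc.; may fail ...
        _is_init = true;
        return true;
    }

    bool _is_init = false;
    bool _init_failed = false;
};
```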
When trying to reduce memory by flushing memtables, we should also account for memory that is currently being flushed from memtable to disk; otherwise we might not release as much memory as expected. For example, when lots of large memtables are already being flushed, reduce_mem_usage may pick some additional small memtables to flush: it cannot release enough memory and also generates lots of small segments, which can cause the -238 error.
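A hedged sketch of the accounting idea (the struct and function names are assumptions): bytes already handed to the flush executor count toward the reduction target, so the reducer stops scheduling extra small memtables once in-flight flushes cover the deficit.

```cpp
#include <cstdint>
#include <vector>

struct MemTableStats {
    int64_t active_bytes = 0;    // memory still being written to
    int64_t flushing_bytes = 0;  // memory already queued for flush
};

// How many additional bytes still need to be scheduled for flush.
// Counting flushing_bytes avoids flushing lots of small memtables (and
// producing many tiny segments, risking -238) while large flushes are
// already in flight.
int64_t bytes_still_to_flush(const std::vector<MemTableStats>& memtables,
                             int64_t reduce_target_bytes) {
    int64_t already_flushing = 0;
    for (const auto& m : memtables) {
        already_flushing += m.flushing_bytes;
    }
    int64_t remaining = reduce_target_bytes - already_flushing;
    return remaining > 0 ? remaining : 0;  // in-flight flushes may suffice
}
```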
In the policy changed by PR #12716, when the hard limit is reached, multiple threads may pick the same LoadChannel and call reduce_mem_usage on the same TabletsChannel. Although a lock and a condition variable prevent multiple threads from reducing memory usage concurrently, those threads can still perform the same reduce work on that channel multiple times, one after another, even right after it has just been reduced.
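One way to express the missing guard, sketched with assumed names (the real channel code is more involved): a thread that waited out another thread's reduction re-checks usage before reducing again, instead of redoing the same work back to back.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>

class TabletsChannel {
public:
    void reduce_mem_usage(int64_t limit_bytes) {
        std::unique_lock<std::mutex> l(_lock);
        if (_reducing) {
            // Another thread is already reducing this channel: wait for it,
            // then skip if its work already brought usage below the limit.
            _cond.wait(l, [this] { return !_reducing; });
            if (_mem_usage <= limit_bytes) return;
        }
        _reducing = true;
        l.unlock();
        int64_t freed = flush_biggest_memtables();  // heavy work, lock released
        l.lock();
        _mem_usage -= freed;
        _reducing = false;
        _cond.notify_all();
    }

private:
    // Placeholder for picking the largest memtables and flushing them.
    int64_t flush_biggest_memtables() { return 1024; /* bytes freed */ }

    std::mutex _lock;
    std::condition_variable _cond;
    bool _reducing = false;
    int64_t _mem_usage = 0;
};
```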