14 Commits

Author SHA1 Message Date
0e4b3755a2 Refactor txn manager methods (#1950) 2019-10-11 17:16:13 +08:00
b72a4a4bc6 Add tablet meta checkpoint mechanism (#1936) 2019-10-10 09:39:02 +08:00
c643cbd30c Optimize the load performance for large file (#1798)
The current load process is:

Tablet Sink -> Tablet Channel Mgr -> Tablets Channel -> Delta Writer -> MemTable -> Flush to disk

In the path of Tablets Channel -> DeltaWriter -> MemTable -> Flush to disk, the following operations are performed:

Insert tuple into different memtables according to tablet ID
When the memtable size reaches the threshold, it is written to disk.
For a single load task, the above operations effectively run in a single thread.
In fact, the insertion into the memtable and the flush of the memtable can be executed concurrently.
Performing both operations in a single thread delays memtable insertion whenever disk writes are slow.

In the new implementation, I added a MemTableFlushExecutor class with a set of flush queues and corresponding worker threads.
By default, each data directory uses two worker threads for flush, which can be modified by the parameter flush_thread_num_per_store of BE.
DeltaWriter will push the full memtable to MemTableFlushExecutor for flush operation and generate a new memtable for receiving new data.

This design can improve the performance of loading large files.
In single host testing, the time to load a 1GB text file is reduced from 48 seconds to 29 seconds.
2019-09-25 13:49:32 +08:00
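
The hand-off described in c643cbd30c can be illustrated with a minimal sketch. It is not the actual MemTableFlushExecutor code: every name below (`FlushExecutorSketch`, `MemTableSketch`, `load_rows_sketch`) is hypothetical, a memtable is reduced to a vector of rows, and a single shared queue stands in for the "set of flush queues" mentioned in the message. The point is only that the writer pushes a full memtable and continues inserting while worker threads perform the slow flush.

```cpp
#include <atomic>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical stand-in for a memtable: just the rows it holds.
using MemTableSketch = std::vector<int>;

// Hypothetical executor: one queue drained by a fixed number of worker threads.
class FlushExecutorSketch {
public:
    FlushExecutorSketch(std::size_t num_threads,
                        std::function<void(const MemTableSketch&)> flush_fn)
        : _flush_fn(std::move(flush_fn)) {
        for (std::size_t i = 0; i < num_threads; ++i) {
            _workers.emplace_back([this] { _work(); });
        }
    }

    // Lets the workers drain the queue, then joins them.
    ~FlushExecutorSketch() {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _stopped = true;
        }
        _cv.notify_all();
        for (auto& worker : _workers) worker.join();
    }

    // Called by the writer: hand over a full memtable and return immediately.
    void push(MemTableSketch full) {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _queue.push(std::move(full));
        }
        _cv.notify_one();
    }

private:
    void _work() {
        for (;;) {
            MemTableSketch table;
            {
                std::unique_lock<std::mutex> lock(_mutex);
                _cv.wait(lock, [this] { return _stopped || !_queue.empty(); });
                if (_queue.empty()) return;  // stopped and fully drained
                table = std::move(_queue.front());
                _queue.pop();
            }
            _flush_fn(table);  // the slow "disk write" runs outside the lock
        }
    }

    std::function<void(const MemTableSketch&)> _flush_fn;
    std::mutex _mutex;
    std::condition_variable _cv;
    std::queue<MemTableSketch> _queue;
    bool _stopped = false;
    std::vector<std::thread> _workers;
};

// Hypothetical writer loop: insert rows and hand off the memtable once it
// holds `threshold` rows. Returns the number of rows the workers flushed.
std::size_t load_rows_sketch(int num_rows, std::size_t threshold,
                             std::size_t num_threads) {
    std::atomic<std::size_t> flushed_rows{0};
    {
        FlushExecutorSketch executor(num_threads, [&](const MemTableSketch& t) {
            flushed_rows += t.size();
        });
        MemTableSketch current;
        for (int i = 0; i < num_rows; ++i) {
            current.push_back(i);
            if (current.size() >= threshold) {
                executor.push(std::move(current));
                current = MemTableSketch();  // fresh memtable for new rows
            }
        }
        if (!current.empty()) executor.push(std::move(current));
    }  // the executor's destructor waits for all pending flushes
    return flushed_rows.load();
}
```

Calling `load_rows_sketch(1000, 64, 2)` mimics a load with two flush threads; the thread count plays the role that flush_thread_num_per_store plays per data directory in the commit message.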
720808fda5 Remove config::max_file_descriptor_number (#1833) 2019-09-20 07:50:57 +08:00
d1676c3c3d Check file descriptor number is larger than 65536 upon start (#1819) 2019-09-19 12:48:36 +08:00
981e0feb99 Check rowset is useful atomically (#1750)
* Check rowset is useful atomically

* Only release rowset id when it is added to unused rowset

* Remove the release of rowset id when saving rowset meta
2019-09-06 17:21:42 +08:00
a63989cc61 Use RowsetFactory to create and init RowsetWriter (#1740) 2019-09-04 17:02:43 +08:00
6f4feca3dc Add rowset id generator to FE and BE (#1678) 2019-09-02 18:51:31 +08:00
7e981b2b14 Limit the disk usage to avoid running out of disk capacity (#1702)
Set a high watermark and a flood stage for disk usage,
and forbid some operations if disk usage is too high.
2019-08-27 22:18:17 +08:00
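
The two-threshold idea in 7e981b2b14 can be sketched as follows. The commit message does not state the threshold values or which operations are forbidden, so the 85% and 95% defaults, the operation kinds, and the policy in `allow_operation_sketch` are all assumptions made for illustration, as are all names.

```cpp
#include <cstdint>

// Hypothetical disk states derived from the used-capacity ratio.
enum class DiskStateSketch { kNormal, kHighWatermark, kFloodStage };

// Hypothetical operation kinds; the source only says "some operations".
enum class DiskOpSketch { kRead, kLoad, kPlaceNewTablet };

struct DiskLimitsSketch {
    double high_watermark = 0.85;  // assumed value, not from the source
    double flood_stage = 0.95;     // assumed value, not from the source
};

// Classify a disk by its usage ratio against the two thresholds.
inline DiskStateSketch classify_disk_sketch(
        std::uint64_t used_bytes, std::uint64_t capacity_bytes,
        const DiskLimitsSketch& limits = DiskLimitsSketch()) {
    if (capacity_bytes == 0) return DiskStateSketch::kFloodStage;
    const double usage = static_cast<double>(used_bytes) /
                         static_cast<double>(capacity_bytes);
    if (usage >= limits.flood_stage) return DiskStateSketch::kFloodStage;
    if (usage >= limits.high_watermark) return DiskStateSketch::kHighWatermark;
    return DiskStateSketch::kNormal;
}

// Assumed policy: reads are always allowed, new tablets are refused above the
// high watermark, and loads are additionally refused at the flood stage.
inline bool allow_operation_sketch(DiskStateSketch state, DiskOpSketch op) {
    if (op == DiskOpSketch::kRead) return true;
    if (state == DiskStateSketch::kFloodStage) return false;
    if (state == DiskStateSketch::kHighWatermark) {
        return op != DiskOpSketch::kPlaceNewTablet;
    }
    return true;
}
```

Keeping classification separate from policy means the thresholds can change without touching the decision about which operations to refuse.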
2b2bc82ae2 Add timeout on snapshot of data (#1672)
Release snapshot when finishing or cancelling backup/restore job.
Snapshots may take a lot of disk space if they are not released in time.
2019-08-21 21:18:53 +08:00
851b2ca3bd Remove unused code in StorageEngine (#1671) 2019-08-20 10:50:07 +08:00
dcb75729db Change cumulative compaction for decoupling storage from computation (#1576)
1. Calculate cumulative point when loading tablet first time.
2. Simplify the rowset-picking logic when a delete predicate is present.
3. Save meta and modify rowsets only once after cumulative compaction.
2019-08-13 18:25:56 +08:00
d938f9a6ea Implement the initial version of BetaRowset (#1568) 2019-08-06 10:40:16 +08:00
0d48a3961c Refactor Storage Engine (#1478)
NOTE: This patch modifies all of the Backend's data,
which makes restarting BE take a very long time.
To avoid disrupting your production environment,
upgrade the backends one by one.

1. Refactor BE to clarify the structure of the code.
2. Use unique id to indicate a rowset.
   Naming a rowset by tablet_id and version leads to
   many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
   with different formats.
2019-07-15 21:18:22 +08:00
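
The naming conflict mentioned in item 2 of 0d48a3961c can be shown with a small sketch. The naming schemes and the counter-based generator below are hypothetical and only illustrate the idea; they are not the formats or the generator used by the storage engine.

```cpp
#include <atomic>
#include <cstdint>
#include <string>

// Hypothetical description of a rowset by tablet and version range.
struct RowsetKeySketch {
    std::int64_t tablet_id;
    std::int64_t start_version;
    std::int64_t end_version;
};

// Hypothetical generator: a process-wide counter that never repeats a value.
class RowsetIdGeneratorSketch {
public:
    std::int64_t next() { return _next.fetch_add(1) + 1; }

private:
    std::atomic<std::int64_t> _next{0};
};

// Old-style name (assumed format): derived from tablet_id and version, so two
// rowsets covering the same versions of the same tablet get the same name.
inline std::string name_by_version_sketch(const RowsetKeySketch& key) {
    return std::to_string(key.tablet_id) + "_" +
           std::to_string(key.start_version) + "_" +
           std::to_string(key.end_version);
}

// New-style name (assumed format): derived from a unique id only.
inline std::string name_by_id_sketch(std::int64_t rowset_id) {
    return "rowset_" + std::to_string(rowset_id);
}
```

With version-based names, a rowset produced by compaction and one fetched by clone for the same tablet and version range would share a name, whereas each call to `next()` yields a distinct id and therefore a distinct name.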