doris

Files

Mingyu Chen db1c281be5 [Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947 )

## Case

During a load, each tablet has a memtable that buffers the incoming data.
When the data in a memtable exceeds 100MB, it is flushed to disk as a `segment` file, and
a new memtable is created to hold the data that follows.

Assume a table with N buckets (tablets). The maximum total size of all memtables is then `N * 100MB`.
If N is large, this consumes too much memory.

To bound memory usage, when the total size of all memtables reaches a threshold (2GB by default), Doris
tries to flush all current memtables to disk (even if they have not reached 100MB).

As a result, each memtable is flushed when its size reaches about `2GB/N`, which may be much smaller
than 100MB, resulting in too many small segment files.
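
The arithmetic behind `2GB/N` can be sketched as follows. This is only an illustration of the reasoning above, not Doris code: the function name `effective_flush_size` and its parameters are hypothetical, and it assumes data is spread evenly across tablets.

```python
MB = 1024 * 1024
GB = 1024 * MB


def effective_flush_size(num_tablets, total_limit=2 * GB, memtable_limit=100 * MB):
    """Approximate size (in bytes) at which a memtable gets flushed.

    Hypothetical helper: assumes every tablet receives data at the same
    rate and that hitting the total limit flushes all memtables at once.
    """
    # A memtable is flushed either because it hit its own limit or because
    # the sum over all memtables hit the total limit, whichever comes first.
    return min(memtable_limit, total_limit // num_tablets)


if __name__ == "__main__":
    for n in (10, 48, 200):
        print(n, "tablets ->", round(effective_flush_size(n) / MB, 1), "MB per flush")
```

With few tablets the per-memtable limit of 100MB applies; once `N` exceeds about 20, the total limit dominates and the flushed segments shrink as `N` grows.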

## Solution

When a flush is triggered to reduce memory consumption, do NOT flush all memtables; flush only part
of them.
For example, suppose there are 50 tablets (with 50 memtables) and the memory limit is 1GB. When each memtable
reaches 20MB, the total size reaches 1GB and a flush occurs.

If I flush only 25 of the 50 memtables, then the next time the total size reaches 1GB, there will be 25 memtables of
10MB and another 25 memtables of 30MB. So I can flush the 30MB memtables, which are larger
than 20MB.
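
The 50-tablet example can be reproduced with a small simulation. This is a sketch under stated assumptions, not the actual Doris implementation: `simulate` and its parameters are hypothetical, data arrives evenly across tablets, 1GB is treated as 1000MB to match the arithmetic above, and the policy of flushing the largest memtables first is an assumption made for illustration.

```python
def simulate(num_tablets, mem_limit_mb, total_load_mb, flush_fraction,
             max_memtable_mb=100.0, step_mb=1.0):
    """Return the sizes (MB) of memtables flushed under memory pressure.

    Hypothetical model: each step adds step_mb to every memtable; when the
    total reaches mem_limit_mb, the largest flush_fraction of memtables
    is flushed (assumed policy).
    """
    sizes = [0.0] * num_tablets
    flushed = []
    loaded = 0.0
    while loaded < total_load_mb:
        for i in range(num_tablets):
            sizes[i] += step_mb
            # Regular flush: the memtable reached its own size limit.
            if sizes[i] >= max_memtable_mb:
                flushed.append(sizes[i])
                sizes[i] = 0.0
        loaded += step_mb * num_tablets
        # Memory-pressure flush: total size reached the limit.
        if sum(sizes) >= mem_limit_mb:
            count = max(1, int(num_tablets * flush_fraction))
            largest_first = sorted(range(num_tablets),
                                   key=lambda i: sizes[i], reverse=True)
            for i in largest_first[:count]:
                flushed.append(sizes[i])
                sizes[i] = 0.0
    return flushed


if __name__ == "__main__":
    flush_all = simulate(50, 1000, 2000, flush_fraction=1.0)
    flush_half = simulate(50, 1000, 1500, flush_fraction=0.5)
    print("flush all: ", sorted(set(flush_all)))
    print("flush half:", sorted(set(flush_half)))
```

In this model, flushing everything always produces 20MB segments, while flushing half produces a first round of 20MB segments followed by a round of 30MB segments, as described in the example.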

The main idea is to introduce some jitter into flushing so that memtable sizes stay slightly uneven, which ensures that a flush is triggered only for memtables that are large enough.

In my test, loading a table with 48 buckets and a 2GB memory limit, the average memtable size was 44MB in the previous version
and 82MB after this change.

2021-11-01 10:51:50 +08:00

conf

[Feature][LDAP] Add LDAP authentication login and LDAP group authorization support. (#6333 )

2021-07-30 09:24:50 +08:00

fe-common

[Dependency] Upgrade thirdparty libs (#6766 )

2021-10-15 13:03:04 +08:00

fe-core

[Enhance][Load] Reduce the number of segments when loading a large volume data in one batch (#6947 )

2021-11-01 10:51:50 +08:00

spark-dpp

[Typo] Correct misspellings in SparkDpp (#6789 )

2021-10-10 23:07:39 +08:00

checkstyle-apache-header.txt

Add Checkstyle for doris-fe (#1353 )

2019-06-21 21:45:54 +08:00

checkstyle.xml

Add classes related to "tag". (#2343 )

2019-12-15 20:13:29 +08:00

pom.xml

[S3] Support path style endpoint (#6962 )

2021-11-01 10:48:10 +08:00

README

[CodeRefactor] Modify FE modules (#4146 )

2020-07-29 16:18:05 +08:00

README

# fe-common

This module holds common classes shared by the other modules.

# spark-dpp

This module is the Spark DPP program, used by the Spark Load feature.
Depends: fe-common

# fe-core

This module is the main process module of FE.
Depends: fe-common, spark-dpp