doris

Files

Mingyu Chen 7c0bcbdca1 [enhance](parquet-reader) cache file meta of parquet to speed up query (#18074)

Problem:
1. FE divides each parquet file into splits, so one file can have several splits.
2. BE scans each split and reads the footer of the parquet file.
3. If two splits belong to the same parquet file, that file's footer is read twice.

This PR mainly changes:
1. Use a KV cache to cache the footer of the parquet file.
2. The KV cache belongs to a scan node, so all parquet readers that belong to that scan node share the same KV cache.
3. In the cache, the key is "meta_file_path" and the value is the parsed thrift footer.

The KV cache is sharded into multiple sub-caches,
so that different files can use different sub-caches and avoid blocking each other.
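
The idea can be sketched roughly as below. This is a minimal illustration, not the code in this PR: `ShardedKVCache`, `FileFooter`, `get_or_load`, and the shard count of 16 are all hypothetical names and values chosen for the example. The key is hashed to pick a sub-cache, and each sub-cache has its own lock, so lookups for files that land in different sub-caches do not wait on each other.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <string>
#include <unordered_map>

// Hypothetical stand-in for the parsed thrift footer of a parquet file.
struct FileFooter {
    long num_rows = 0;
};

// Sketch of a sharded KV cache (assumed design, not the actual Doris class).
// Each shard owns its own mutex and map.
template <typename V, std::size_t kShards = 16>
class ShardedKVCache {
public:
    using Loader = std::function<std::shared_ptr<V>()>;

    // Returns the cached value for `key`; calls `loader` only on a miss.
    // The shard lock is held while loading, so two readers asking for the
    // same file parse the footer once, and only that shard is blocked.
    std::shared_ptr<V> get_or_load(const std::string& key, const Loader& loader) {
        Shard& shard = _shards[std::hash<std::string>{}(key) % kShards];
        std::lock_guard<std::mutex> guard(shard.lock);
        auto it = shard.map.find(key);
        if (it != shard.map.end()) {
            return it->second; // hit: reuse the parsed footer
        }
        std::shared_ptr<V> value = loader(); // miss: parse once
        assert(value != nullptr);
        shard.map.emplace(key, value);
        return value;
    }

private:
    struct Shard {
        std::mutex lock;
        std::unordered_map<std::string, std::shared_ptr<V>> map;
    };
    std::array<Shard, kShards> _shards;
};
```

In this sketch a reader would call `get_or_load(file_path, parse_footer)` once per split; the second split of the same file gets the already parsed footer back instead of reading it again.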

In my test, a query with 26 splits reduced the footer parse time from 4s to 1s.

2023-03-25 23:22:57 +08:00

parquet

[enhance](parquet-reader) cache file meta of parquet to speed up query (#18074)

2023-03-25 23:22:57 +08:00

vgeneric_iterators_test.cpp

[fix](union iterator) fix bug that result data order of VUnionIterator is different (#16938)

2023-02-21 14:17:21 +08:00

vtablet_sink_test.cpp

[refactor](remove unused code) remove load stream mgr (#16580)

2023-02-10 07:46:18 +08:00