doris

Mingyu Chen 7c0bcbdca1 [enhance](parquet-reader) cache file meta of parquet to speed up query (#18074 )

Problem:
1. FE splits a parquet file into multiple splits, so one file can have several splits.
2. BE scans each split and reads the footer of the parquet file.
3. If 2 splits belong to the same parquet file, the footer of this file is read twice.

This PR mainly changes:
1. Use a kv cache to cache the footer of the parquet file.
2. The kv cache belongs to a scan node, so all parquet readers that belong to this scan node share the same kv cache.
3. In the cache, the key is "meta_file_path" and the value is the parsed thrift footer.

The KV cache is sharded into multiple sub caches,
so that different files can use different sub caches and avoid blocking each other.
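
A minimal sketch of the idea in Python, not the code from this PR (the actual change lives in the BE). The class name `ShardedKVCache`, the helper `parse_footer`, the shard count, and the `"meta_" + path` key format are all assumptions made for illustration. Each shard has its own lock, so a footer parse for one file only blocks lookups that hash to the same shard.

```python
import threading
import zlib
from typing import Any, Callable, Dict, List, Tuple


class ShardedKVCache:
    """Hypothetical sharded key-value cache shared by all readers of one scan node."""

    def __init__(self, num_shards: int = 8) -> None:
        # Each shard is an independent (dict, lock) pair.
        self._shards: List[Tuple[Dict[str, Any], threading.Lock]] = [
            ({}, threading.Lock()) for _ in range(num_shards)
        ]

    def _shard_for(self, key: str) -> Tuple[Dict[str, Any], threading.Lock]:
        # A stable hash of the key selects the shard.
        index = zlib.crc32(key.encode("utf-8")) % len(self._shards)
        return self._shards[index]

    def get_or_load(self, key: str, loader: Callable[[], Any]) -> Any:
        entries, lock = self._shard_for(key)
        with lock:
            # Only the first caller for a key pays the load cost.
            if key not in entries:
                entries[key] = loader()
            return entries[key]


parse_calls: List[str] = []


def parse_footer(path: str) -> Dict[str, Any]:
    """Stand-in for reading and parsing a parquet footer (assumed helper)."""
    parse_calls.append(path)
    return {"path": path}


cache = ShardedKVCache(num_shards=4)
# Three splits: two of them come from the same file.
split_paths = ["/data/a.parquet", "/data/a.parquet", "/data/b.parquet"]
footers = [
    cache.get_or_load("meta_" + path, lambda p=path: parse_footer(p))
    for path in split_paths
]
print(len(parse_calls))  # footers parsed, one per distinct file
```

In this sketch the loader runs while the shard lock is held, which keeps two readers from parsing the same footer at the same time, at the cost of briefly blocking other keys in that shard.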

In my test, a query with 26 splits reduces the footer parse time from 4s to 1s.

2023-03-25 23:22:57 +08:00

check/checkstyle

[improvement](dry-run)(tvf) support csv schema in tvf and add "dry_run_query" variable (#16983 )

2023-03-02 16:51:27 +08:00

fe-common

[Bug](DECIMALV3) Fix wrong precision for plus/minus (#18052 )

2023-03-25 09:42:39 +08:00

fe-core

[enhance](parquet-reader) cache file meta of parquet to speed up query (#18074 )

2023-03-25 23:22:57 +08:00

hive-udf

[chore](maven) Prefer protoc in thirdparty to the one in maven artifacts (#17596 )

2023-03-09 16:21:38 +08:00

java-udf

[improve](clickhouse jdbc) support clickhouse array type (#17993 )

2023-03-22 19:42:32 +08:00

spark-dpp

[fix](log) use logger to replace printStackTrace() (#17382 )

2023-03-03 14:51:30 +08:00

pom.xml

[bugfix](k8s)roll back jackson version (#18046 )

2023-03-24 19:36:59 +08:00

README

…

README

# Licensed to the Apache Software Foundation (ASF) under one
# or more contributor license agreements.  See the NOTICE file
# distributed with this work for additional information
# regarding copyright ownership.  The ASF licenses this file
# to you under the Apache License, Version 2.0 (the
# "License"); you may not use this file except in compliance
# with the License.  You may obtain a copy of the License at
#
#   http://www.apache.org/licenses/LICENSE-2.0
#
# Unless required by applicable law or agreed to in writing,
# software distributed under the License is distributed on an
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
# KIND, either express or implied.  See the License for the
# specific language governing permissions and limitations
# under the License.

# fe-common

This module holds common classes shared by the other modules.

# spark-dpp

This module is the Spark DPP program, which is used by the Spark Load feature.
Depends: fe-common

# fe-core

This module contains the main process of FE.
Depends: fe-common, spark-dpp
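
The `Depends:` lines above imply a build order. As a rough illustration (not part of the build itself, which is driven by `pom.xml`), the following Python sketch derives that order with the standard-library `graphlib` module. The `depends` mapping simply restates the dependencies listed in this README.

```python
from graphlib import TopologicalSorter

# Module -> modules it depends on, as stated in this README.
depends = {
    "fe-common": set(),
    "spark-dpp": {"fe-common"},
    "fe-core": {"fe-common", "spark-dpp"},
}

# static_order() yields each module only after all of its dependencies.
build_order = list(TopologicalSorter(depends).static_order())
print(build_order)  # ['fe-common', 'spark-dpp', 'fe-core']
```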