[improvement](statistics, multi catalog)Estimate hive table row count based on file size. (#21207)

Support estimating a table's row count based on its file size.

With sample size=3000 (out of 87491 partitions in total), loading the cache takes 45s.
With sample size=100000 (more than the total partition count of 87505), loading the cache takes 388s.
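The idea behind a file-size-based estimate can be sketched as follows. This is a minimal illustration under stated assumptions, not the commit's implementation. It assumes the file bytes of a sample of partitions are known, that unsampled partitions are on average the same size as sampled ones, and that an average on-disk row width is available (for example, derived from column types). The class and method names are hypothetical.

```java
import java.util.List;

/**
 * Hypothetical sketch: estimate a table's row count from file sizes.
 * Assumes a known average row width in bytes; not the actual Doris code.
 */
public class FileSizeRowCountEstimator {

    /**
     * @param sampledPartitionBytes total file bytes of each sampled partition
     * @param totalPartitions       number of partitions in the whole table
     * @param estimatedRowBytes     assumed average size of one row on disk
     * @return estimated row count, or -1 if no estimate can be made
     */
    public static long estimateRowCount(List<Long> sampledPartitionBytes,
                                        long totalPartitions,
                                        long estimatedRowBytes) {
        if (sampledPartitionBytes.isEmpty() || totalPartitions <= 0 || estimatedRowBytes <= 0) {
            return -1;
        }
        long sampledBytes = 0;
        for (long bytes : sampledPartitionBytes) {
            sampledBytes += bytes;
        }
        // Extrapolate the sampled size to the whole table, assuming the
        // unsampled partitions match the sampled partitions' average size.
        double totalBytes = (double) sampledBytes * totalPartitions / sampledPartitionBytes.size();
        // Convert bytes to rows using the assumed average row width.
        return (long) (totalBytes / estimatedRowBytes);
    }

    public static void main(String[] args) {
        // 3 sampled partitions out of 10 (400 bytes on average), 8-byte rows.
        System.out.println(estimateRowCount(List.of(300L, 400L, 500L), 10, 8)); // 500
    }
}
```

The sample size controls the trade-off seen in the timings above. A larger sample lists more files, which makes the estimate less dependent on the equal-size assumption but makes loading the cache slower.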
Jibing-Li
2023-07-05 16:07:12 +08:00
committed by GitHub
parent 1121e7d0c3
commit 37a52789bd
6 changed files with 168 additions and 11 deletions

@@ -2023,4 +2023,9 @@ public class Config extends ConfigBase {
             "是否禁止使用 WITH REOSOURCE 语句创建 Catalog。",
             "Whether to disable creating catalog with WITH RESOURCE statement."})
     public static boolean disallow_create_catalog_with_resource = true;
+
+    @ConfField(mutable = true, masterOnly = false, description = {
+            "Hive行数估算分区采样数",
+            "Sample size for hive row count estimation."})
+    public static int hive_stats_partition_sample_size = 3000;
 }
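A setting such as `hive_stats_partition_sample_size` caps how many partitions are inspected. The sketch below shows one possible way such a cap could be applied. It is an assumption for illustration only: the random, seeded selection and the names `PartitionSampler` and `samplePartitions` are hypothetical and are not taken from the commit. When the cap is at least the partition count (as in the sample size=100000 case), every partition is kept.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

/**
 * Hypothetical sketch: choose at most sampleSize partitions to inspect.
 * sampleSize plays the role of hive_stats_partition_sample_size.
 */
public class PartitionSampler {

    public static <T> List<T> samplePartitions(List<T> partitions, int sampleSize, long seed) {
        if (sampleSize <= 0 || partitions.isEmpty()) {
            return Collections.emptyList();
        }
        if (sampleSize >= partitions.size()) {
            // The cap exceeds the partition count, so every partition is kept.
            return new ArrayList<>(partitions);
        }
        List<T> copy = new ArrayList<>(partitions);
        // A fixed seed makes the chosen sample reproducible across runs.
        Collections.shuffle(copy, new Random(seed));
        return new ArrayList<>(copy.subList(0, sampleSize));
    }

    public static void main(String[] args) {
        List<String> parts = List.of("dt=2023-07-01", "dt=2023-07-02", "dt=2023-07-03", "dt=2023-07-04");
        System.out.println(samplePartitions(parts, 2, 42L).size());    // 2
        System.out.println(samplePartitions(parts, 3000, 42L).size()); // 4
    }
}
```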