[opt](file_reader) add prefetch buffer to read csv&json file (#18301)
Co-authored-by: ByteYue <yj976240184@gmail.com>

This PR is an optimization for https://github.com/apache/doris/pull/17478:

1. Change the buffer size of `LineReader` to 4MB to align with the size of the prefetch buffer.
2. Lazily prefetch data on the first read to prevent wasted reads.
3. The S3 block size is only 32MB, which is too small for a file split. Set 128MB as the default file split size.
4. Add `_end_offset` to the prefetch buffer to prevent wasted reads.

Query performance when reading data on object storage improves by more than 3x.
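As a rough illustration of the lazy-prefetch and `_end_offset` ideas in the commit message, here is a minimal Java sketch. It is not the Doris implementation (which lives in the backend and prefetches asynchronously). `LazyPrefetchSketch`, `RecordingSource`, and `PrefetchReader` are hypothetical names, and the fetch is synchronous to keep the example small. The sketch only shows that no I/O is issued until the first read and that no fetch extends past the end offset.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch of a lazily filled, range-bounded read buffer. Not Doris code. */
final class LazyPrefetchSketch {

    /** Stand-in for a remote file; records every ranged read issued against it. */
    static final class RecordingSource {
        final byte[] data;
        final List<long[]> requests = new ArrayList<>(); // each entry is {offset, length}

        RecordingSource(byte[] data) {
            this.data = data;
        }

        byte[] readRange(long offset, int length) {
            requests.add(new long[] {offset, length});
            byte[] out = new byte[length];
            System.arraycopy(data, (int) offset, out, 0, length);
            return out;
        }
    }

    /** Reads [startOffset, endOffset) through a buffer that is filled only on demand. */
    static final class PrefetchReader {
        private final RecordingSource source;
        private final int bufferSize;
        private final long endOffset;      // exclusive upper bound, plays the role of _end_offset
        private long position;
        private long bufferStart;
        private byte[] buffer = new byte[0];

        PrefetchReader(RecordingSource source, long startOffset, long endOffset, int bufferSize) {
            this.source = source;
            this.bufferSize = bufferSize;
            this.endOffset = endOffset;
            this.position = startOffset;
            this.bufferStart = startOffset;
            // Deliberately no read here: the first fetch is deferred to the first read() call.
        }

        /** Copies up to dst.length bytes into dst; returns the count, or -1 at the end of the range. */
        int read(byte[] dst) {
            if (position >= endOffset) {
                return -1;
            }
            long bufferEnd = bufferStart + buffer.length;
            if (position < bufferStart || position >= bufferEnd) {
                // Clamp the fetch so it never reaches past the end of the split.
                int toFetch = (int) Math.min(bufferSize, endOffset - position);
                buffer = source.readRange(position, toFetch);
                bufferStart = position;
                bufferEnd = bufferStart + buffer.length;
            }
            int n = (int) Math.min(dst.length, bufferEnd - position);
            System.arraycopy(buffer, (int) (position - bufferStart), dst, 0, n);
            position += n;
            return n;
        }
    }
}
```

A reader created over a split triggers no request until `read` is called, and the final fetch shrinks to whatever remains before the end offset instead of pulling a full buffer.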
@@ -1742,8 +1742,11 @@ public class Config extends ConfigBase {
     @ConfField(mutable = true, masterOnly = false)
     public static long file_scan_node_split_num = 128;
 
+    // 0 means use the block size in HDFS/S3 as split size.
+    // HDFS block size is 128MB, while S3 block size is 32MB.
+    // 32MB is too small for a S3 file split, so set 128MB as default split size.
     @ConfField(mutable = true, masterOnly = false)
-    public static long file_split_size = 0; // 0 means use the block size in HDFS/S3 as split size
+    public static long file_split_size = 134217728;
 
     /**
      * If set to TRUE, FE will:
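To make the effect of the new default concrete, the following Java sketch shows how a configured split size of 0 could fall back to the storage block size, and how the split count changes between 32MB and 128MB splits. `FileSplitSketch` and its methods are hypothetical helpers written for this illustration; the real split logic in Doris may apply further rules (for example, tolerances on the last split) that are not modeled here.

```java
/** Hypothetical helper illustrating the file_split_size semantics. Not Doris code. */
final class FileSplitSketch {
    static final long MB = 1024L * 1024L;

    /**
     * Returns the split size to use: the configured value when it is positive,
     * otherwise the block size reported by the storage system (the "0 means
     * use the block size" rule from the config comment).
     */
    static long effectiveSplitSize(long configuredSplitSize, long blockSize) {
        return configuredSplitSize > 0 ? configuredSplitSize : blockSize;
    }

    /** Number of splits needed to cover a file, using plain ceiling division (an assumption). */
    static long splitCount(long fileLength, long splitSize) {
        if (fileLength <= 0) {
            return 0;
        }
        return (fileLength + splitSize - 1) / splitSize;
    }
}
```

Under these assumptions, a 1GB file on S3 is cut into 32 splits with the old default (0, so the 32MB block size applies) and into 8 splits with the new default of 134217728 bytes (128MB).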