[opt](FileReader) InMemoryReader is only used in s3 (#23486)

If the file size is smaller than 8MB, the file is read into memory in one pass; this idea comes from https://github.com/apache/hadoop/blob/trunk/hadoop-tools/hadoop-aws/src/site/markdown/tools/hadoop-aws/prefetching.md#s3inmemoryinputstream. However, in some cases we only read one or two columns of a file, so the bytes actually required are only about 1% of the file, and reading the whole file multiplies the amount of data transferred several times over. Therefore, `InMemoryReader` should only be used on object storage, and the threshold is reduced.
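A minimal sketch of the gating logic described above, not the actual Doris implementation: the reader classes, `is_object_storage()`, and `choose_reader()` are hypothetical names used for illustration, while `in_memory_file_size` is the config key touched in this commit.

```cpp
#include <cstdint>
#include <iostream>
#include <memory>
#include <string>

struct FileReader {
    virtual ~FileReader() = default;
    virtual std::string name() const = 0;
};

// Reads the file on demand (range requests / local reads).
struct DirectReader : FileReader {
    std::string name() const override { return "DirectReader"; }
};

// Reads the whole object into memory up front (one request to S3).
struct InMemoryReader : FileReader {
    std::string name() const override { return "InMemoryReader"; }
};

// Hypothetical check: only remote object-storage paths (e.g. s3://) qualify.
bool is_object_storage(const std::string& path) {
    return path.rfind("s3://", 0) == 0;
}

// After this change, the in-memory reader is only considered for object
// storage, and only when the file is below the (reduced) threshold.
std::unique_ptr<FileReader> choose_reader(const std::string& path, int64_t file_size,
                                          int64_t in_memory_file_size /* config, now 1MB */) {
    if (is_object_storage(path) && file_size < in_memory_file_size) {
        return std::make_unique<InMemoryReader>();
    }
    // Local/HDFS files, or larger files where only a few columns (perhaps 1%
    // of the bytes) are needed, are read on demand instead.
    return std::make_unique<DirectReader>();
}

int main() {
    const int64_t threshold = 1048576;  // 1MB, matching the new default below
    std::cout << choose_reader("s3://bucket/small.parquet", 512 * 1024, threshold)->name() << "\n";    // InMemoryReader
    std::cout << choose_reader("s3://bucket/big.parquet", 64 * 1024 * 1024, threshold)->name() << "\n"; // DirectReader
    std::cout << choose_reader("hdfs://nn/warehouse/t.parquet", 512 * 1024, threshold)->name() << "\n"; // DirectReader
}
```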
Ashin Gau authored on 2023-08-30 20:43:39 +08:00, committed by GitHub
parent b7404896fa
commit 449c595f9d
4 changed files with 13 additions and 5 deletions


@@ -850,6 +850,9 @@ DEFINE_Validator(jsonb_type_length_soft_limit_bytes,
// is greater than object_pool_buffer_size, release the object in the unused_object_pool.
DEFINE_Int32(object_pool_buffer_size, "100");
// Threshold of reading a small file into memory
DEFINE_mInt32(in_memory_file_size, "1048576"); // 1MB
// ParquetReaderWrap prefetch buffer size
DEFINE_Int32(parquet_reader_max_buffer_size, "50");
// Max size of parquet page header in bytes