Commit Graph

58 Commits

Author SHA1 Message Date
2678afd2db [fix][improvement](fs) add HdfsIO profile and modification time (#21638)
Refactor the interface of create_file_reader

The file_size and mtime are merged into FileDescription and are no longer part of FileReaderOptions.
Now the file handle cache can get the file's correct modification time from FileDescription.
Add HdfsIO for the hdfs file reader.
Picked from [Enhancement](multi-catalog) Add hdfs read statistics profile. #21442
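Below is a minimal sketch of the idea of carrying size and mtime in the file description; the field names are illustrative, not the actual Doris definitions.
```cpp
#include <cstdint>
#include <string>

// Hypothetical shape of the per-file metadata: file_size and mtime travel
// with the description, so the file handle cache can key on (path, mtime)
// instead of reading them from the reader options.
struct FileDescription {
    std::string path;
    int64_t file_size = -1;  // -1 means unknown; the reader may probe it later
    int64_t mtime = 0;       // last modification time, used by the handle cache
};
```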
2023-07-08 14:49:44 +08:00
db50face41 [fix](time_zone) be compatible with doris old version for CST time_zone when load orc file in broker load (#21263)
Fix the broker load error for orc files when time_zone is CST, whose message is "Failed to create orc row reader. reason = Can't open /usr/share/zoneinfo/CST".
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
2023-06-28 09:44:42 +08:00
bad22dd4e2 [Fix](orc-reader) Fix orc dict filter null value issue in _convert_dict_cols_to_string_cols which caused incorrect result. (#21047)
Query results should not have empty values.
```
use regresssion.multi_catalog;
select commit_id from github_events_orc WHERE (event_type = 'CommitCommentEvent') AND commit_id != "" limit 10;
```
```
+------------------------------------------+
| commit_id                                |
+------------------------------------------+
| 685c1fd8dbbdc10c042932f9a9f88be00ff96c75 |
| 685c1fd8dbbdc10c042932f9a9f88be00ff96c75 |
| 4e3ab2ff2d2474f5d51334b9b0fdf17e9845a166 |
|                                          |
|                                          |
|                                          |
|                                          |
|                                          |
|                                          |
| 7191c20cb49da07a7fc16aa32dc0de4faff528b2 |
+------------------------------------------+
10 rows in set (0.54 sec) 
```
2023-06-21 14:54:01 +08:00
c85271d2ae [Fix](orc-reader) Fix filter size mismatch in orc reader. (#20998)
Fix filter size mismatch in orc reader introduced by #20806
2023-06-20 12:27:16 +08:00
b7a50a09fe [Opt](orc-reader) Optimize orc reader by dict filtering. (#20806)
Optimize the orc reader with dictionary filtering; it is similar to #17594. A sketch of the idea follows the benchmark table below.
Test result
**ssb-flat-100**: (3 nodes)
| Query | before opt (s) | after opt (s) |
| ----- | -------------- | ------------- |
| Q1.1 | 1.239 | 1.145 |
| Q1.2 | 1.254 | 1.128 |
| Q1.3 | 1.931 | 1.644 |
| Q2.1 | 1.359 | 1.006 |
| Q2.2 | 1.229 | 0.674 |
| Q2.3 | 0.934 | 0.427 |
| Q3.1 | 2.226 | 1.712 |
| Q3.2 | 2.042 | 1.562 |
| Q3.3 | 1.631 | 1.021 |
| Q3.4 | 1.618 | 0.732 |
| Q4.1 | 2.294 | 1.858 |
| Q4.2 | 2.511 | 1.961 |
| Q4.3 | 1.736 | 1.446 |
| total | 22.004 | 16.316 |
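As a rough illustration of dictionary filtering (not the Doris implementation), the sketch below evaluates a string equality predicate once per dictionary entry and then filters rows by their dictionary codes, avoiding one string comparison per row.
```cpp
#include <cstdint>
#include <string>
#include <vector>

// Evaluate the predicate once per distinct dictionary value.
std::vector<bool> build_dict_mask(const std::vector<std::string>& dict,
                                  const std::string& target) {
    std::vector<bool> mask(dict.size());
    for (size_t i = 0; i < dict.size(); ++i) {
        mask[i] = (dict[i] == target);  // e.g. event_type = 'CommitCommentEvent'
    }
    return mask;
}

// Filter rows by looking up their dictionary codes in the precomputed mask.
std::vector<uint32_t> filter_rows(const std::vector<uint32_t>& codes,
                                  const std::vector<bool>& dict_mask) {
    std::vector<uint32_t> selected;
    for (size_t row = 0; row < codes.size(); ++row) {
        if (dict_mask[codes[row]]) selected.push_back(static_cast<uint32_t>(row));
    }
    return selected;
}
```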
2023-06-16 13:11:37 +08:00
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
After supporting insert-only transactional hive tables (#19518, #19419), this PR supports transactional hive full acid tables.

Supports hive3 transactional full acid tables.
Hive2 transactional full acid tables need to have major compactions run first.
2023-06-13 08:55:16 +08:00
93b53cf2f4 [improvement](exception-safe) create and prepare node/sink support exception safe (#20551) 2023-06-09 21:06:59 +08:00
4c6b99d1f9 [Fix](orc-reader) Fix the inner reader of MergeRangeFileReader is not correct when creating MergeRangeFileReader in orc reader. (#20393)
Fix the incorrect inner reader used when creating MergeRangeFileReader in the orc reader.
2023-06-09 08:53:27 +08:00
845d459f05 [Fix](orc-reader) Fix some bugs of orc lazy materialization. (#20410)
Fix some bugs of orc lazy materialization (#18615):
- Fix the issue that column size kept increasing after `execute_conjuncts()` by calling `Block::erase_useless_column()`.
- Fix partition issues of orc lazy materialization.
- Fix lazy materialization not being used when the predicate column is inconsistent with the orc file.
2023-06-09 08:53:01 +08:00
4faee4d8fd [Fix](multi-catalog) Fix be crashed when query hive table after schema changed(new column added). (#20537)
Fix be crashed when query hive table after schema changed(new column added).

Regression Test: test_hive_schema_evolution.groovy
2023-06-08 18:10:36 +08:00
9b32d42ee4 [Fix](multi-catalog) fix all nested type test which introduced by #19518(support insert-only transactional table). (#20194)
Fix `qt_nested_types_orc` in `test_tvf_p2`, which was introduced by #19518 (support insert-only transactional tables).

### Test case error
`qt_nested_types_orc` in `test_tvf_p2`
```
select count(array0), count(array1), count(array2), count(array3), count(struct0), count(struct1), count(map0)
            from hdfs(
            "uri" = "hdfs://172.21.16.47:4007/catalog/tvf/orc/all_nested_types.orc",
            "format" = "orc",
            "fs.defaultFS" = "hdfs://172.21.16.47:4007")
```

**Error Message:**
errCode = 2, detailMessage = (172.21.0.101)[INTERNAL_ERROR]Wrong data type for colum 'struct1'
2023-05-30 09:55:40 +08:00
9f8de89659 [refactor](exec) replace the single pointer with an array of 'conjuncts' in ExecNode (#19758)
Refactoring the filtering conditions in the current ExecNode from an expression tree to an array can simplify the process of adding runtime filters. It eliminates the need for complex merge operations and removes the requirement for the frontend to combine expressions into a single entity.

By representing the filtering conditions as an array, each condition can be treated individually, making it easier to add runtime filters without the need for complex merging logic. The array can store the individual conditions, and the runtime filter logic can iterate through the array to apply the filters as needed.

This refactoring simplifies the codebase, improves readability, and reduces the complexity associated with handling filtering conditions and adding runtime filters. It separates the conditions into discrete entities, enabling more straightforward manipulation and management within the execution node.
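A minimal sketch of the before/after shapes, with simplified stand-in types (the real ExecNode members differ):
```cpp
#include <memory>
#include <vector>

struct VExpr {};  // stands in for a compiled predicate expression

// Before: a single conjunct tree; adding a runtime filter meant merging
// the new predicate into the existing tree as a combined AND expression.
struct ExecNodeBefore {
    std::shared_ptr<VExpr> conjunct_tree;
};

// After: conjuncts are kept as independent entries, so a runtime filter
// is just one more element appended to the vector.
struct ExecNodeAfter {
    std::vector<std::shared_ptr<VExpr>> conjuncts;

    void add_runtime_filter(std::shared_ptr<VExpr> filter) {
        conjuncts.push_back(std::move(filter));
    }
};
```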
2023-05-29 11:47:31 +08:00
cb4a57f44f [Opt](orc-reader) Support merge small IO facility in orc reader. (#20092)
#18976 introduced the merge-small-IO facility to optimize performance, and it is used by the parquet reader.
This PR supports this facility in the orc reader. The current ORC reader implementation needs to reposition the parent present stream when reading lazy columns in the lazy materialization facility, so make it work by removing `DCHECK_GE(offset, cached_data.end_offset)`.
2023-05-26 21:06:12 +08:00
53ba46e404 [Fix][Refactor] Fix 'not member call on null pointer of type 'doris::TextConverter' error in ubsan env and refactor text converter. (#19849)
Fix 'not member call on null pointer of type doris::TextConverter' error in ubsan env and refactor text converter.
2023-05-22 21:00:19 +08:00
Pxl 2a02561863 [Bug](ubsan) fix some wrong downcasts found by ubsan (#19591)
Fix some wrong downcasts found by ubsan.
```cpp
doris/be/src/olap/bloom_filter_predicate.h:43:32: runtime error: downcast of address 0x7f8ec2b691a0 which does not point to an object of type 'doris::BloomFilterColumnPredicate<doris::TYPE_DATE>::SpecificFilter' (aka 'BloomFilterFunc<(doris::PrimitiveType)11U>')
0x7f8ec2b691a0: note: object is of type 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'
 e5 55 00 00  10 74 58 42 e5 55 00 00  00 00 10 00 8e 7f 00 00  20 07 6f cc 8e 7f 00 00  80 fe 68 cc
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::BloomFilterFunc<(doris::PrimitiveType)12>'  
```
1. TYPE_DATE/TYPE_DATETIME have the same data format, so the bloom filter cast is changed to a reinterpret cast.
```cpp
doris/be/src/vec/exec/format/orc/vorc_reader.h:281:17: runtime error: downcast of address 0x7f562f4c3180 which does not point to an object of type 'ColumnVector<int>'
0x7f562f4c3180: note: object is of type 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
 74 65 00 00  20 91 70 f5 ca 55 00 00  02 00 00 00 00 00 00 00  f0 d4 4c 2f 56 7f 00 00  f0 d4 4c 2f
              ^~~~~~~~~~~~~~~~~~~~~~~
              vptr for 'doris::vectorized::ColumnDecimal<doris::vectorized::Decimal<int> >'
```
2. Doris uses ColumnDecimal to store decimal elements.
2023-05-15 14:27:48 +08:00
0b25376cf8 [feature](torc) support insert only transactional hive table on be side (#19518) 2023-05-11 14:15:09 +08:00
d7ad299154 [fix](NestedType) throw error when reading complex nested type in orc&parquet (#19489)
Doris blocks do not support complex nested types yet, but the orc and parquet readers generated complex nested columns,
which made the mysql client output wrong and confused users.
2023-05-11 07:51:02 +08:00
3ba3b6c66f [opt](FileCache) use modification time to determine whether the file is changed (#18906)
Get the last modification time from the file status, and use the combination of path and modification time to generate the cache identifier.
When a file changes, its modification time changes, so the former cache path becomes invalid (a sketch of the idea follows below).
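A minimal sketch of deriving a cache identifier from path plus modification time; the hashing scheme below is illustrative, not the one Doris actually uses.
```cpp
#include <cstdint>
#include <functional>
#include <string>

// Illustrative cache key: a changed file gets a new mtime, so the old
// cache entries for (path, old_mtime) simply stop being referenced.
uint64_t cache_key(const std::string& path, int64_t mtime) {
    return std::hash<std::string>{}(path + "#" + std::to_string(mtime));
}
```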
2023-05-11 07:50:39 +08:00
096aa25ca6 [improvement](orc-reader) Implements ORC lazy materialization (#18615)
- Implements ORC lazy materialization, integrating with the implementation of https://github.com/apache/doris-thirdparty/pull/56 and https://github.com/apache/doris-thirdparty/pull/62.
- Refactor code: Move `execute_conjuncts()` and `execute_conjuncts_and_filter_block()` from `parquet_group_reader` to `VExprContext`, used by both the parquet reader and the orc reader.
- Add session variables `enable_parquet_lazy_materialization` and `enable_orc_lazy_materialization` to control whether lazy materialization is enabled.
- Modify `build.sh` to update apache-orc submodule or download package every time.
2023-05-09 23:33:33 +08:00
b6c7f3aeb8 [opt](FileCache) Add file cache metrics and management (#19177)
Add file cache metrics and management.
1. Get file cache metrics
> If the file cache performance is poor, there are currently no metrics to investigate the cause. In practice, hit ratio, disk usage, and removed-segments status are very important information.

API: `http://be_host:be_webserver_port/metrics`
File cache metrics for each base path start with the `doris_be_file_cache_` prefix. `hits_ratio` is the hit ratio of the cache since BE startup; `removed_elements` is the number of removed segment files since BE startup. Every cache path has three queues: index, normal and disposable. The capacity ratio of the three queues is 1:17:2 (see the sketch at the end of this entry).
```
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/file_cache"} 0.500000
doris_be_file_cache_hits_ratio{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0.500000
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 0
doris_be_file_cache_removed_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 0

doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/file_cache"} 912680550400
doris_be_file_cache_normal_queue_max_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 8500000000
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 217600
doris_be_file_cache_normal_queue_max_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 102400

doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/file_cache"} 14129846
doris_be_file_cache_normal_queue_curr_size{path="/mnt/datadisk1/gaoxin/small_file_cache"} 14874904
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/file_cache"} 18
doris_be_file_cache_normal_queue_curr_elements{path="/mnt/datadisk1/gaoxin/small_file_cache"} 22

...
```
2. Release file cache
> Frequent segment files swapping can seriously affect the performance of file cache. Adding a deletion interface helps users clean up the file cache.

API: `http://be_host:be_webserver_port/api/file_cache?op=release&base_path=${file_cache_base_path}`
Return the number of released segment files. If `base_path` is not provided in the url, all cache paths will be released.
It is thread-safe to call this API; only segment files that are not currently being read will be released.
```
{"released_elements":22}
```
3. Specify the base path to store cache data
> Currently, regression testing lacks test cases of file cache, which cannot guarantee the stability of file cache. This interface is generally used in regression testing scenarios. Different queries use different paths to verify different usage cases and performance.

Users can set the session variable `file_cache_base_path` to specify the base path to store cache data. The default is `file_cache_base_path="random"`, which means choosing a random path from the cache paths to store cache data. If `file_cache_base_path` is not one of the base paths in the BE configuration, a random path is used.
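A minimal sketch of splitting one cache path's capacity by the 1:17:2 ratio above; the helper itself is illustrative.
```cpp
#include <cstdint>

struct QueueCapacities {
    int64_t index;
    int64_t normal;
    int64_t disposable;
};

// Split the total capacity of one cache path by the 1:17:2 ratio of the
// index, normal and disposable queues (20 parts in total).
QueueCapacities split_capacity(int64_t total_bytes) {
    QueueCapacities c;
    c.index = total_bytes / 20;                       // 1 part
    c.normal = total_bytes * 17 / 20;                 // 17 parts
    c.disposable = total_bytes - c.index - c.normal;  // remaining 2 parts
    return c;
}
```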
2023-05-05 14:28:01 +08:00
339d804ec4 [Refactor](exceptionsafe) add factory creator to some class (#19000) 2023-04-25 14:33:47 +08:00
29f502380c [opt](FileReader) merge small IO to optimize read performance (#18796)
Add `MergeRangeFileReader` to merge small IO to optimize parquet&orc read performance.

`MergeRangeFileReader` is a FileReader that efficiently supports random access in formats like parquet and orc.
In order to merge small IO in parquet and orc, the random access ranges should be generated when creating the
reader. The random access ranges are a list of ranges ordered by offset.
The ranges should be read sequentially; a range can be skipped, but cannot be read repeatedly.
When calling read_at, if the start offset lies within the random access ranges, the slice size must not span two ranges.

For example, in parquet, the random access ranges are the column offsets in a row group.

When reading at offset, if [offset, offset + 8MB) contains many random access ranges,
the reader reads the data in [offset, offset + 8MB) as a whole and copies the data in the random access ranges into small
buffers (called boxes, 1MB each by default, 64MB in total). A box can be occupied by many ranges,
and a reference counter records how many ranges are cached in the box. When the reference counter reaches zero,
the box can be released or reused by other ranges. When there is no empty box for a new read operation,
the read is performed directly.
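A minimal sketch of the box bookkeeping described above, with illustrative types and sizes:
```cpp
#include <cstddef>
#include <memory>
#include <vector>

// One cached buffer ("box"). A box can hold data for several ranges; the
// reference counter tracks how many cached ranges still live in it.
struct Box {
    static constexpr size_t kBoxSize = 1 << 20;  // 1MB per box by default
    std::unique_ptr<char[]> data{new char[kBoxSize]};
    int ref_count = 0;  // 0 means no range uses this box any more
};

struct BoxPool {
    std::vector<Box> boxes;  // bounded overall (e.g. 64MB in total)

    // Pick a box whose ranges have all been consumed; the caller copies
    // newly merged data into it and bumps ref_count once per cached range.
    Box* acquire_free_box() {
        for (auto& box : boxes) {
            if (box.ref_count == 0) return &box;
        }
        return nullptr;  // no free box: the caller reads directly instead
    }

    // Called when one cached range has been fully read out of the box.
    void range_consumed(Box* box) {
        if (box != nullptr && box->ref_count > 0) --box->ref_count;
    }
};
```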

## Effects
The runtime of ClickBench reduces from 102s to 77s, and the runtime of Query 24 reduces from 24.74s to 9.45s.
The profile of Query 24:
```
 VFILE_SCAN_NODE  (id=0):(Active:  8s344ms,  %  non-child:  83.06%)
    -  FileReadBytes:  534.46  MB
    -  FileReadCalls:  1.031K  (1031)
    -  FileReadTime:  28s801ms
    -  GetNextTime:  8s304ms
    -  MaxScannerThreadNum:  12
    -  MergedSmallIO:  0ns
        -  CopyTime:  157.774ms
        -  MergedBytes:  549.91  MB
        -  MergedIO:  94
        -  ReadTime:  28s642ms
        -  RequestBytes:  507.96  MB
        -  RequestIO:  1.001K  (1001)
    -  NumScanners:  18
```
1,001 request IOs have been merged into 94 IOs.

## Remaining problems
1. Add p2 regression test in the next PR
2. Profiles are scattered across the code and will be refactored in the next PR
3. Support ORC reader
2023-04-23 10:51:38 +08:00
3328a65b75 [Fix](multi-catalog) Use decimal v3 type to fix decimal loss issue in multi-catalog module. (#18835)
Fix decimal v3 precision loss issues in the multi-catalog module.
Now it will use decimal v3 to represent decimal type in the multi-catalog module.
Regression Test: `test_load_with_decimal.groovy`
2023-04-20 11:02:53 +08:00
e412dd12e8 [chore](build) Use include-what-you-use to optimize includes (PART II) (#18761)
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
2023-04-19 23:11:48 +08:00
cc4778a271 [Fix](orc-reader) Check hasNulls() first when using notNull data in ColumnVectorBatch. #18674 2023-04-15 19:48:31 +08:00
f28c75bd80 [fix](file_reader) bad_typeid when reading csv&json files (#18400)
PR #18340, while resolving the conflict with PR #18301, changed which file_reader is created, resulting in an [E-123] std::bad_typeid exception.
2023-04-06 10:00:29 +08:00
47aa8a6d8a [fix](file_cache) turn on file cache by FE session variable (#18340)
Fix two bugs:
1. Enabling file caching requires both the `FE session` variable and the `BE` configuration (enable_file_cache=true) to be enabled.
2. `ParquetReader` did not use `IOContext` previously, but `CachedRemoteFileReader::read_at` needs `IOContext` after PR #17586.
2023-04-05 15:51:47 +08:00
eb0fd0017e [Fix](orc-reader) Fix the scale of decimal column is incorrect when query orc tables. (#18324)
The scale of decimal columns was incorrect when querying orc tables.
2023-04-04 08:50:47 +08:00
a813ad56ad [fix](multi-catalog) key and value columns of map are normal column type (#18160)
PR #17330 changed the column type of key and value from array to normal column, but the orc&parquet readers still cast to array column, resulting in a cast error.
2023-03-28 23:11:40 +08:00
cb79e42e5c [refactor](file-system)(step-1) refactor file system on BE and remove storage_backend (#17586)
See #17764 for details
I have tested:
- Unit test for local/s3/hdfs/broker file system: be/test/io/fs/file_system_test.cpp
- Outfile to local/s3/hdfs/broker.
- Load from local/s3/hdfs/broker.
- Query file on local/s3/hdfs/broker file system, with table value function and catalog.
- Backup/Restore with local/s3/hdfs/broker file system

Not tested:
- cold & hot data separation case.
2023-03-21 21:08:38 +08:00
bd8e3e6405 [refactor](date) unify DateTimeValue and VecDateTimeValue (#17670) 2023-03-20 16:27:08 +08:00
Pxl 16fc3a0e22 [Chore](compile) remove some unused static on inline function to reduce compile time (#17603)
Remove some unused `static` specifiers on inline functions to reduce compile time.
2023-03-13 11:11:59 +08:00
3d0beec01d [fix](orc) fix heap-use-after-free and potential memory leak of orc reader (#17431)
Heap-use-after-free
The OrcReader has an internal FileInputStream. If the file is empty, the memory of FileInputStream will leak.
Besides, there is a Statistics instance in FileInputStream. FileInputStream may be deleted if the orc reader
fails to init, but Statistics may still be used when the orc reader is closed, causing a heap-use-after-free error.

Potential memory leak
When initializing the file scanner in the file scan node, if the file scanner fails to prepare, its memory will leak.
2023-03-06 08:42:35 +08:00
bf5037d6d5 [fix](OrcReader) typo in analyze null values (#17156)
Fix a typographical error in analyzing null values for the OrcReader.
2023-02-28 14:29:13 +08:00
a0782a1855 [fix](file reader) fix be core in broker file reader (#17039)
A const reference member variable stored a temporary object; the value can no longer be accessed after the temporary is destroyed, causing a BE core dump when debug-level logging is enabled.

`_broker_addr` had already been destroyed in BrokerFileReader (see the illustration below).
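A minimal, self-contained illustration of the pitfall (not the actual BrokerFileReader code): a const reference member bound to a temporary dangles once the temporary is destroyed, while storing a copy keeps the value alive.
```cpp
#include <string>
#include <utility>

struct BrokerAddress {
    std::string host;
    int port = 0;
};

// Buggy shape: the member binds to whatever is passed in; if the caller
// passes a temporary, the reference dangles after the constructor returns.
class ReaderWithDanglingRef {
public:
    explicit ReaderWithDanglingRef(const BrokerAddress& addr) : _broker_addr(addr) {}
    const BrokerAddress& _broker_addr;  // may refer to a destroyed temporary
};

// Fixed shape: keep a copy, so the lifetime is owned by the reader itself.
class ReaderWithCopy {
public:
    explicit ReaderWithCopy(BrokerAddress addr) : _broker_addr(std::move(addr)) {}
    BrokerAddress _broker_addr;
};
```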
2023-02-26 12:35:31 +08:00
c43e521d29 [feature](multi-catalog) support map&struct type in parquet&orc reader (#17087)
Support parsing map&struct type in parquet&orc reader.

## Remaining Problems
1. Doris uses the array type to build the key and value columns of a `map`, but doesn't fill the offsets in the value column, so the offsets in the value column are wasted.
2. Parquet supports reading only the key or the value column of a `map`; this PR doesn't support that yet.
3. Parquet supports reading partial columns of a `struct`; this PR doesn't support that yet.
2023-02-26 08:55:39 +08:00
e42465ae59 [fix](OrcReader) handle null values in orc reader for string type (#17135)
Orc doesn't fill null values in the new batch, but the former batch has been released.
Other types like int/long/timestamp... are flat types without pointers in them,
so they do not need to be handled separately like string (a sketch of the string handling follows below).
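A minimal sketch of the handling, with plain arrays mimicking the ORC string batch layout (data/length/notNull); the empty-string fallback for null slots is illustrative.
```cpp
#include <cstdint>
#include <string>
#include <vector>

// Mimics the relevant fields of an ORC string batch: per-row char pointers
// and lengths, plus a notNull bitmap that is only meaningful when has_nulls.
struct StringBatchView {
    char** data;
    int64_t* length;
    const char* not_null;  // 1 = value present, 0 = null
    bool has_nulls;
    uint64_t num_rows;
};

// Copy values out, substituting an empty string for null slots instead of
// dereferencing pointers that may still point into a released batch.
std::vector<std::string> to_strings(const StringBatchView& batch) {
    std::vector<std::string> out;
    out.reserve(batch.num_rows);
    for (uint64_t i = 0; i < batch.num_rows; ++i) {
        if (batch.has_nulls && !batch.not_null[i]) {
            out.emplace_back();  // null: don't touch data[i] / length[i]
        } else {
            out.emplace_back(batch.data[i], batch.length[i]);
        }
    }
    return out;
}
```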
2023-02-26 08:10:40 +08:00
29c46d6926 [fix](struct-type) fix be core when load array orc file (#16978)
* fix be core when load array orc file
2023-02-22 10:15:39 +08:00
491d269412 [fix](tvf) fix bug that failed to get schema of tvf when file is empty (#16928)
In the previous implementation, when querying a tvf, FE gets the schema from BE,
and BE tries to open the first file to get its schema info; but for orc or parquet format,
if the file is empty, it returns an error.
However, even for an empty file, we can still get the schema info from the file's footer,
so we should handle the empty file to get the schema info correctly (see the sketch below).
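A minimal sketch of reading the schema from an ORC file's footer with the ORC C++ API, assuming a local path; even a file with zero rows still carries its type in the footer.
```cpp
#include <iostream>
#include <orc/OrcFile.hh>

int main() {
    // Open the file and parse only its footer; no stripes need to exist.
    orc::ReaderOptions options;
    auto reader = orc::createReader(orc::readLocalFile("/path/to/empty.orc"), options);

    // The footer carries the full schema even when the file has zero rows.
    std::cout << "rows: " << reader->getNumberOfRows() << "\n";
    std::cout << "schema: " << reader->getType().toString() << "\n";
    return 0;
}
```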

Also modify the catalog doc to add some FAQ.
2023-02-21 14:14:32 +08:00
a46941c684 [Fix](multi-catalog) Fix switch-case fall-through issue in multi-catalog module. (#16931)
Fix switch-case fall-through issue in multi-catalog module.
2023-02-20 21:35:41 +08:00
292926e5aa [Fix](multi catalog)Fix partition case bug (#16763)
Set column names from the path to lower case in the case-insensitive case.
This is for Iceberg columns from the path; Iceberg columns are case sensitive,
which may cause errors for tables with partitions.
2023-02-16 15:47:23 +08:00
0d9714b179 [Fix](multi catalog)Support read hive1.x orc file. (#16677)
Hive 1.x may write orc files with internal column names (_col0, _col1, _col2...).
This causes query results to be NULL because the column names in the orc file don't match
the column names in the Doris table schema. This pr supports querying Hive orc files with internal column names (a sketch of the mapping idea follows below).

For now, we haven't seen any problem with Parquet files; a new pr will be sent to fix parquet if any problem shows up in the future.
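A minimal sketch of the mapping idea: when every file column looks like `_col<N>`, resolve columns by ordinal position against the Doris table schema instead of by name. The helper names are illustrative.
```cpp
#include <cstdio>
#include <string>
#include <vector>

// Returns true if every column name looks like _col<N>, i.e. the file was
// written by Hive 1.x with internal names.
bool uses_internal_names(const std::vector<std::string>& file_columns) {
    for (size_t i = 0; i < file_columns.size(); ++i) {
        char expected[32];
        std::snprintf(expected, sizeof(expected), "_col%zu", i);
        if (file_columns[i] != expected) return false;
    }
    return !file_columns.empty();
}

// Resolve which file column to read for a table column: by ordinal when the
// file uses internal names, by (already lower-cased) name otherwise.
std::string resolve_file_column(const std::vector<std::string>& file_columns,
                                const std::vector<std::string>& table_columns,
                                size_t table_index) {
    if (uses_internal_names(file_columns) && table_index < file_columns.size()) {
        return file_columns[table_index];
    }
    return table_columns[table_index];
}
```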
2023-02-14 14:32:27 +08:00
Pxl 5e4bb98900 [Chore](build) enable -Wpedantic and update lowest gcc version to 11.1 (#16290)
Enable -Wpedantic and update the lowest supported gcc version to 11.1.
2023-02-03 11:28:48 +08:00
9618427020 [improvement](multi-catalog) increase default batch_size to 4064 (#16326)
The performance of ClickBench Q30 is affected by batch_size:
| batch_size | 1024 | 4096 | 20480 |
| -- | -- | -- | -- |
| Q30 query time | 2.27 | 1.08 | 0.62 |

Because the aggregation operator creates a new result block for each batch block, and Q30 has 90 columns, this is time-consuming. A larger batch_size decreases the number of aggregation blocks, so it improves performance.

The Doris internal reader reads at least 4064 rows even if batch_size < 4064, so this PR keeps the process of reading external tables the same as internal tables.
2023-02-02 11:51:09 +08:00
1589d453a3 [fix](multi catalog)Support parquet and orc upper case column name (#16111)
External hms catalog table column names in doris are all lower case,
while an iceberg table or a spark-sql-created hive table may contain upper case column names,
which causes empty query results. This pr fixes this bug.
1. For parquet files, convert all column names to lower case while parsing the parquet metadata.
2. For orc files, store the original column names and the lower-case column names in two vectors, and use the suitable names in each case.
3. On the FE side, change the column name back to the original column name in iceberg while doing convertToIcebergExpr.
2023-01-27 23:52:11 +08:00
199d7d3be8 [Refactor]Merged string_value into string_ref (#15925) 2023-01-22 16:39:23 +08:00
f17d69e450 [feature](file cache)Import file cache for remote file reader (#15622)
The main purpose of this pr is to import `fileCache` for lakehouse reads of remote files.
The local disk is used as the cache for reading a remote file, so the next time the file is read,
the data can be obtained directly from the local disk.
In addition, this pr includes a few other minor changes.
In addition, this pr includes a few other minor changes

Import File Cache:
1. The imported `fileCache` is called `block_file_cache`, which uses an LRU replacement policy.
2. Implement a new FileReader, `CachedRemoteFileReader`, so that the `file cache` logic is hidden under `CachedRemoteFileReader` (a sketch of the wrapper idea follows below).
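A minimal sketch of the wrapper idea behind `CachedRemoteFileReader`, with simplified, illustrative interfaces: reads are served from a local cache when present, otherwise fetched from the remote reader and written back to the cache.
```cpp
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// Simplified stand-in for the underlying remote reader.
struct RemoteReader {
    virtual ~RemoteReader() = default;
    virtual std::vector<char> read_at(int64_t offset, int64_t len) = 0;
};

// Toy cached wrapper: the real block_file_cache stores fixed-size blocks on
// local disk with an LRU policy; here an in-memory map keyed by (offset, len)
// stands in for it.
class CachedRemoteReader {
public:
    explicit CachedRemoteReader(RemoteReader* remote) : _remote(remote) {}

    std::vector<char> read_at(int64_t offset, int64_t len) {
        auto key = std::make_pair(offset, len);
        auto it = _cache.find(key);
        if (it != _cache.end()) return it->second;                // hit: no remote IO
        std::vector<char> data = _remote->read_at(offset, len);   // miss: fetch remotely
        _cache[key] = data;  // the next read of this range is served locally
        return data;
    }

private:
    RemoteReader* _remote;
    std::map<std::pair<int64_t, int64_t>, std::vector<char>> _cache;
};
```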

Other changes:
1. Add a new interface `fs()` for `FileReader`.
2. `IOContext` adds some statistics to track the behavior of `FileCache`.

Co-authored-by: Lightman <31928846+Lchangliang@users.noreply.github.com>
2023-01-10 12:23:56 +08:00
f8bb8c7829 [fix](broker) fix be core dump caused by broker load (#15390)
* [fix](broker) fix be core dump caused by broker load
2022-12-28 10:57:41 +08:00
ec055e1acb [feature](new file reader) Integrate new file reader (#15175) 2022-12-26 08:55:52 +08:00
5cefd05869 [fix](multi-catalog) fix and optimize iceberg v2 reader (#15274)
Fix three bugs when reading iceberg v2 tables:
1. The `delete position` in a `delete file` represents the position of the deleted row in the entire file, but the `read range` in 
`RowGroupReader` represents the position in the current row group. Therefore, we need to subtract the position of the first 
row of the current row group from the `delete position`.
2. When only reading the partition columns, `RowGroupReader` skipped processing the `delete position`.
3. If the `delete position` deletes all rows in a row group, the `read range` is empty, but we read the whole row 
group in such a case.

Optimize four performance issues:
1. We used to change the `delete position` to a `delete range`, and then merge the `delete range` and the `read range` into the final read 
ranges. This process is too tedious and time-consuming; we can merge the `delete position` and the `read range` directly.
2. The `delete position` is ordered within a `delete file`, so we can use merge-sort instead of an ordered set (see the sketch below).
3. Initialize `RowGroupReader` when reading, instead of initializing all row groups when opening a `ParquetReader`, to 
save memory usage, the same as `IcebergReader`.
4. Change the recursive call of `_do_lazy_read` to loop logic.
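A minimal sketch of merging ordered delete positions into a read range, merge-sort style, assuming both are already in the same coordinate space; names and types are illustrative.
```cpp
#include <cstdint>
#include <vector>

struct Range {
    int64_t start;  // inclusive
    int64_t end;    // exclusive
};

// Given one read range and the sorted delete positions, emit the sub-ranges
// that are not deleted. Both inputs are ordered, so a single forward pass is
// enough (merge-sort style); no ordered set is required.
std::vector<Range> apply_deletes(const Range& read_range,
                                 const std::vector<int64_t>& delete_positions) {
    std::vector<Range> result;
    int64_t cursor = read_range.start;
    for (int64_t pos : delete_positions) {
        if (pos < read_range.start) continue;  // delete falls before this range
        if (pos >= read_range.end) break;      // deletes past this range: done
        if (pos > cursor) result.push_back({cursor, pos});
        cursor = pos + 1;                      // skip the deleted row
    }
    if (cursor < read_range.end) result.push_back({cursor, read_range.end});
    return result;
}
```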
2022-12-24 16:02:07 +08:00