Commit Graph

9 Commits

Author SHA1 Message Date
0ac5228c05 [feature-wip][multi-catalog]Support prefetch for orc file format (#11292)
Refactor the prefetch code in parquet and support prefetch for orc file format
2022-08-02 11:01:15 +08:00
8a366c9ba2 [feature](multi-catalog) read parquet file by start/offset (#10843)
To avoid reading the same row group more than once, align the offsets.
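
A minimal sketch of one way such alignment can work, not necessarily what this PR does. It assumes a row group belongs to the scan range that contains its first byte, so adjacent ranges never read the same row group. `RowGroup` and `row_groups_for_split` are hypothetical names used only for illustration.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical, simplified row group metadata: byte offset and length in the file.
struct RowGroup {
    std::int64_t offset;
    std::int64_t length;
};

// Returns the indices of the row groups owned by the scan range
// [split_start, split_start + split_size).
// Assumption: ownership is decided by the row group's start offset alone,
// so a row group that straddles a range boundary is read by exactly one range.
std::vector<std::size_t> row_groups_for_split(const std::vector<RowGroup>& groups,
                                              std::int64_t split_start,
                                              std::int64_t split_size) {
    std::vector<std::size_t> picked;
    const std::int64_t split_end = split_start + split_size;
    for (std::size_t i = 0; i < groups.size(); ++i) {
        if (groups[i].offset >= split_start && groups[i].offset < split_end) {
            picked.push_back(i);
        }
    }
    return picked;
}
```

With row groups starting at bytes 4, 104, and 204 and two ranges of 128 bytes each, the first range owns the first two row groups and the second range owns the third, even though the second row group extends past byte 128.
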
2022-07-18 20:51:08 +08:00
f21ce35059 [refactor]remove unused private field _profile (#10732) 2022-07-11 14:04:09 +08:00
c358a43f35 [feature-wip] support parquet predicate push down (#10512) 2022-07-08 23:11:25 +08:00
7c330e38d9 [fix](multi-catalog)Fix coredump when reading the parquet file for multi-thread (#10635)
This PR fixes two issues:

**The first issue** violates the C++ rule `do not call virtual functions in constructors or destructors`.
The destructor of `ArrowReaderWrap` calls the virtual function `close()`.
During destruction, that call never reaches `ParquetReaderWrap::close()`; it only invokes `ArrowReaderWrap::close()`.
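
The dispatch rule behind the first issue can be reproduced in isolation. The sketch below uses hypothetical stand-in classes (`BaseReader`, `BrokenReader`, `FixedReader`), not the actual Doris types, and shows one possible remedy: the derived class calls its own `close()` from its own destructor. Whether the PR uses this exact remedy is not stated here.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for a base reader that calls close() in its destructor.
struct BaseReader {
    explicit BaseReader(std::vector<std::string>& log) : log_(log) {}
    virtual ~BaseReader() {
        // The derived part is already destroyed at this point, so this call
        // resolves to BaseReader::close(), never to an override.
        close();
    }
    virtual void close() { log_.push_back("BaseReader::close"); }

protected:
    std::vector<std::string>& log_;
};

// Relies on the base destructor to run its override: the override is skipped.
struct BrokenReader : BaseReader {
    using BaseReader::BaseReader;
    void close() override { log_.push_back("BrokenReader::close"); }
};

// Releases its own resources explicitly before the base destructor runs.
struct FixedReader : BaseReader {
    using BaseReader::BaseReader;
    ~FixedReader() override { FixedReader::close(); }
    void close() override { log_.push_back("FixedReader::close"); }
};

// Constructs and destroys a Reader, returning the close() calls that ran.
template <typename Reader>
std::vector<std::string> destroy_and_log() {
    std::vector<std::string> log;
    { Reader reader(log); }
    return log;
}
```

Destroying a `BrokenReader` records only `BaseReader::close`, while destroying a `FixedReader` records `FixedReader::close` followed by `BaseReader::close`.
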

**The second issue** is that `ParquetReaderWrap` can be destroyed while `prefetch_batch` is still running.
`prefetch_batch` uses `thread.detach()` to run independently of `ParquetReaderWrap`, but it still relies on members of `ParquetReaderWrap` such as **`_closed`, `_total_groups`, and `_reader`**.

In this case, the destructor of `ParquetReaderWrap` may run before `prefetch_batch` finishes, and the prefetch thread then accesses destroyed members, which causes the core dump.
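
A common way to remove this kind of race is to keep ownership of the thread and join it before the object's members go away, instead of detaching it. The sketch below illustrates that idea with a hypothetical `PrefetchingReader`; it is not the Doris implementation, and the PR may solve the problem differently.

```cpp
#include <atomic>
#include <chrono>
#include <thread>

// Hypothetical reader that owns its prefetch thread (illustrative names only).
class PrefetchingReader {
public:
    explicit PrefetchingReader(int total_groups)
        : total_groups_(total_groups), worker_([this] { prefetch_loop(); }) {}

    ~PrefetchingReader() { close(); }

    // Non-virtual and idempotent, so it is safe to call from the destructor.
    // Assumption: close() is not called from several threads at once.
    void close() {
        closed_.store(true);
        if (worker_.joinable()) {
            worker_.join();  // wait until the worker no longer touches members
        }
    }

    int fetched() const { return fetched_.load(); }

private:
    void prefetch_loop() {
        while (!closed_.load() && fetched_.load() < total_groups_) {
            // Simulated I/O for one row group.
            std::this_thread::sleep_for(std::chrono::milliseconds(1));
            fetched_.fetch_add(1);
        }
    }

    const int total_groups_;
    std::atomic<bool> closed_{false};
    std::atomic<int> fetched_{0};
    std::thread worker_;  // declared last so it starts after the fields it reads
};
```

Because the destructor sets the flag and joins, the worker has always exited before `closed_`, `fetched_`, and `total_groups_` are destroyed, even when the reader is dropped in the middle of prefetching.
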
2022-07-08 14:54:10 +08:00
Pxl
fd0bd395ac [Enhancement] Remove some unused include (#10035) 2022-06-17 10:47:25 +08:00
94089b9192 [Refactor] Use file factory to replace create file reader/writer (#9505)
1. Simplify the code logic and improve the abstraction
2. Fix a memory leak caused by a raw pointer

Co-authored-by: lihaopeng <lihaopeng@baidu.com>
2022-06-08 15:07:39 +08:00
f377c26bf7 [refactor][be] Optimize headers (#9708) 2022-05-30 16:12:10 +08:00
cbbda7857b [feature-wip](parquet-orc) Support orc scanner in vectorized engine (#9541) 2022-05-26 21:39:12 +08:00