Commit Graph

10 Commits

Author SHA1 Message Date
7147a7c290 [feature-wip](multi-catalog) Support s3 storage for file scan node (#10977)
This is an example of s3 hms_catalog:
```sql
CREATE CATALOG hms_catalog properties(
"type" = "hms",
"hive.metastore.uris"="thrift://localhost:9083",
"AWS_ACCESS_KEY" = "your access key",
"AWS_SECRET_KEY"="your secret key",
"AWS_ENDPOINT"="s3 endpoint",
"AWS_REGION"="s3-region",
"fs.s3a.paging.maximum"="1000");
```
All these params are necessary;
2022-07-21 17:38:53 +08:00
56e036e68b [feature-wip](multi-catalog) Support runtime filter for file scan node (#11000)
* [feature-wip](multi-catalog) Support runtime filter for file scan node

Co-authored-by: morningman <morningman@apache.org>
2022-07-20 12:36:57 +08:00
8a366c9ba2 [feature](multi-catalog) read parquet file by start/offset (#10843)
To avoid reading the repeat row group, we should align offsets
2022-07-18 20:51:08 +08:00
60dd322aba [feature-wip](multi-catalog) Optimize threads and thrift interface of FileScanNode (#10942)
FileScanNode in be will launch as many threads as the number of splits.
The thrift interface of FileScanNode is excessive redundant.
2022-07-18 20:50:34 +08:00
ba1c527a23 [improvement](arrow) Avoid parse timezone for each datetime value (#10869)
* [improvement](arrow) Avoid parse timezone for each datetime value

Convert arrow batch to doris block is too slow when there are datetime values.
Because we call `TimezoneUtils::find_cctz_time_zone` for each values.

After modify, the tpch-100 q1 with external table cost from 40s -> 9s

Co-authored-by: morningman <morningman@apache.org>
2022-07-15 21:19:36 +08:00
f21ce35059 [refactor]remove unused private field _profile (#10732) 2022-07-11 14:04:09 +08:00
639f1cd26c [improvement](parquet-reader) Add some profile for parquet reader (#10740) 2022-07-11 12:19:06 +08:00
24d824a783 [improvement](multi-catalog) Impl parallel for file scanner to improve the scanner performance (#10620)
Add multi-thread support in FileScanNode on be and impl the file spilt logic in fe.
2022-07-09 15:52:53 +08:00
c358a43f35 [feature-wip] support parquet predicate push down (#10512) 2022-07-08 23:11:25 +08:00
abd10f0f3e [feature-wip](multi-catalog) Impl FileScanNode in be (#10402)
Define a new file scanner node for hms table in be.
This file scanner node is different from broker scan node as blow:
1. Broker scan node will define src slot and dest slot, there is two memory copy in it: first is from file to src slot
    and second from src to dest slot. Otherwise FileScanNode only have one stemp memory copy just from file to dest slot.
2. Broker scan node will read all the filed in the file to src slot and FileScanNode only read the need filed.
3. Broker scan node will convert type into string type for src slot and then use cast to convert to dest slot type,
    but FileScanNode will have the final type.

Now FileScanNode is a standalone code, but we will uniform the file scan and broker scan in the feature.
2022-06-29 11:04:01 +08:00