Commit Graph

17 Commits

Author SHA1 Message Date
96d4778f2e [fix](parquet) the end offset of column chunk may be wrong in parquet metadata (#28891) 2023-12-23 22:21:04 +08:00
5d8c465644 [regression](p2) fix test cases result (#28768)
regression-test/data/external_table_p2/hive/test_hive_hudi.out
regression-test/data/external_table_p2/hive/test_hive_to_array.out
regression-test/suites/external_table_p2/tvf/test_local_tvf_compression.groovy
regression-test/suites/external_table_p2/tvf/test_path_partition_keys.groovy
regression-test/data/external_table_p2/hive/test_hive_text_complex_type.out
2023-12-21 14:38:30 +08:00
ec40603b93 [fix](parquet) compressed_page_size has the same meaning in page v1 and v2 (#26783)
1. Parquet with page v2 is parsed error when using other codec except snappy. Because `compressed_page_size` has the same meaning in page v1 and v2, it always contains the bytes of definition level, repetition level and compressed data.
2. Add regression test for `fix_length_byte_array` stored decimal type, and dictionary encoded date/datetime type.
2023-11-14 08:30:42 +08:00
57ed781bb6 [fix](regression-test) Add tvf regression tests (#26455) 2023-11-09 12:09:32 +08:00
727fa2c0cd [opt](tvf) refine the class of ExternalFileTableValuedFunction (#24706)
`ExternalFileTableValuedFunction` now has 3 derived classes:

- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction

All these tvfs are for reading data from file. The difference is where to read the file, eg, from HDFS or from local filesystem.

So I refine the fields and methods of these classes.
Now there 3 kinds of properties of these tvfs:

1. File format properties

	File format properties, such as `format`, `column_separator`. For all these tvfs, they are common properties.
	So these properties should be analyzed in parenet class `ExternalFileTableValuedFunction`.
	
2. URI or file path

	The URI or file path property indicate the file location. For different storage, the format of the uri are not same.
	So they should be analyzed in each derived classes.
	
3. Other properties

	All other properties which are special for certain tvf.
	So they should be analyzed in each derived classes.
	
There are 2 new classes:

- `FileFormatConstants`: Define some common property names or variables related to file format.
- `FileFormatUtils`: Define some util methods related to file format.

After this PR, if we want to add some common properties for all these tvfs, only need to handled it in
`ExternalFileTableValuedFunction`, to avoid missing handle it in any one of them.

### Behavior change

1. Remove `fs.defaultFS` property in `hdfs()`, it can be got from `uri`
2. Use `\t` as the default column separator of csv format, same as stream load
2023-10-07 12:44:04 +08:00
ef72321878 [fix](regression-text)fix test_path_partition_keys regression test (#24796)
fix test_path_partition_keys regression test
2023-09-24 23:32:33 +08:00
4dad7c94da [fix](orc) fix the count(*) pushdown issue in orc format (#24446)
In previous, when querying hive table in orc format, and the file is splitted.
the result of select count(*) may be multiple of the real row number.

This is because the number of rows should be got after orc strip prune,
otherwise, it may return wrong result
2023-09-16 09:57:39 +08:00
b407f275c8 [fix](hive) fix partition prune issue and some external table test cases (#24338)
1. Fix hive partition prune bug, introduced from #23845, will fail `test_hive_default_partition` test case.
2. Fix `test_local_tvf.groovy` test case, the path of local tvf should be relative path.
3. Fix `test_external_catalog_hive` test case, the `partitions` is now reserve keywords
4. Support `local` tvf in Nereids, but fix related issue like:

```
Caused by: java.lang.NullPointerException
        at org.apache.doris.nereids.stats.ExpressionEstimation.castMinMax(ExpressionEstimation.java:171) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.stats.ExpressionEstimation.visitCast(ExpressionEstimation.java:167) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.stats.ExpressionEstimation.visitCast(ExpressionEstimation.java:109) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.trees.expressions.Cast.accept(Cast.java:55) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.stats.ExpressionEstimation.visitAlias(ExpressionEstimation.java:394) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.stats.ExpressionEstimation.visitAlias(ExpressionEstimation.java:109) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.trees.expressions.Alias.accept(Alias.java:145) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.stats.ExpressionEstimation.estimate(ExpressionEstimation.java:119) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.nereids.stats.StatsCalculator.lambda$computeProject$7(StatsCalculator.java:785) ~[doris-fe.jar:1.2-SNAPSHOT]
        at java.util.stream.ReferencePipeline$3$1.accept(ReferencePipeline.java:193) ~[?:1.8.0_341]
        at java.util.Spliterators$ArraySpliterator.forEachRemaining(Spliterators.java:948) ~[?:1.8.0_341]
        at java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:482) ~[?:1.8.0_341]
```
2023-09-15 20:57:04 +08:00
ebe3749996 [fix](tvf)support s3,local compress_type and append regression test (#24055)
support s3,local compress_type and append regression test.
2023-09-13 00:32:59 +08:00
9df72a96f3 [Feature](multi-catalog) Support hadoop viewfs. (#24168)
### Feature

Support hadoop viewfs.

### Test

- Regression tests: 
  - hive viewfs test.
  - tvf viewfs test.

- Broker load with broker and with hdfs tests manually.
2023-09-13 00:20:12 +08:00
f9a75b5c4f [feature](csv_serde)1.append csv serde for serialize to csv and deserialize from csv. 2.let csvReader use csv serde not text_converter. (#23352)
1. append csv serde for serialize to csv and deserialize from csv.
2. let csvReader use csv serde not text_converter.
2023-09-10 00:16:21 +08:00
00826185c1 [fix](tvf view)Support Table valued function view for nereids (#23317)
Nereids doesn't support view based table value function, because tvf view doesn't contain the proper qualifier (catalog, db and table name). This pr is to support this function.

Also, fix nereids table value function explain output exprs incorrect bug.
2023-08-25 21:23:16 +08:00
8be0202b94 [improvement](old planner)Prune extra slots with old planner for sql like select count(1) from view (#23393)
The sql like
Select count(1) from view 
would contain all the columns in old planner's execution plan, which is slow, because BE need to read all the column in data files. This pr is to improve the plan to only contain one column.
2023-08-25 21:22:03 +08:00
ceb931c513 [regression-test](hdfs_tvf)append regression test that hdfs_tvf read compression file (#23454) 2023-08-25 09:00:21 +08:00
9d2e23b1aa [fix](parquet) A row of complex type may be stored across more pages (#23277)
A row of complex type may be stored across two(or more) pages, and the parameter `align_rows` indicates that whether the reader should read the remaining value of the last row in previous page.
2023-08-22 14:47:10 +08:00
91b15183e7 [enhance][external]enhance and fix external cases 0807 (#22689)
enhance and fix external cases 0807
2023-08-08 10:53:08 +08:00
c31226b144 [refractor](regression-test) sort out test cases of external tables (#22640)
sort out the test cases of external table.
After modify, there are 2 directories:

1. `external_table_p0`: all p0 cases of external tables: hive, es, jdbc and tvf
2. `external_table_p2`: all p2 cases of external tables: hive, es, mysql, pg, iceberg and tvf

So that we can run it with one line command like:

```
sh run-regression-test.sh --run -d external_table_p0,external_table_p2
```
2023-08-07 11:12:30 +08:00