Commit Graph

34 Commits

Author SHA1 Message Date
193be20c86 [feature](csv)Supports reading CSV data using LF and CRLF as line separators. (#37687) (#38099)
bp #37687
2024-07-22 22:53:04 +08:00
e7a001c420 [enhance](mtmv)support partition tvf (#37795)
pick from: https://github.com/apache/doris/pull/36479 and
https://github.com/apache/doris/pull/37201
2024-07-16 09:27:44 +08:00
ca0e44f83f [fix](case) fix struct format out files (#37350) (#37499)
bp #37350
2024-07-09 10:11:50 +08:00
ceef9ee123 [feature](serde) support presto compatible output format (#37039) (#37253)
bp #37039
2024-07-04 13:56:05 +08:00
4761090848 [fix](tvf) Partition columns in CTAS need to be compatible with the STRING type of external tables/TVF (#37161)
bp: #35489
2024-07-03 10:58:08 +08:00
b445c783eb [test](tvf) move p2 tvf tests from p2 to p0 (#37081) (#37152)
bp: #37081
2024-07-02 22:38:22 +08:00
74086189d3 [test](tvf) move p2 tvf tests from p2 to p0 (#36871) (#37150)
bp: #36871
2024-07-02 22:37:43 +08:00
bd24a8bdd9 [Fix](csv_reader) Add a session variable to control whether empty rows in CSV files are read as NULL values (#37153)
bp: #36668
2024-07-02 22:12:17 +08:00
5c1eef5f06 [feature](tvf) support max_filter_ratio (#35431) (#36911)
bp #35431

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-06-27 20:58:53 +08:00
f6beeb1ddd [Enhencement](tvf) select tvf supports using resource (#35139)
Create an S3/HDFS resource that TVF can use it directly to access the data source.
2024-05-24 16:23:58 +08:00
c0fd98abe5 [Fix](tvf) Fix that tvf reading empty files in compressed formats. (#34926)
1. Fix the issue with tvf reading empty compressed files.
2. move two test cases (`test_local_tvf_compression` and `test_s3_tvf_compression`) from p2 to p0
2024-05-21 12:59:31 +08:00
0e3ad5cd9d [fix](parquet) fix time zone error(isAdjustedToUTC=true) in parquet reader (#33675) (#33924)
bp (#33675)

Co-authored-by: Ashin Gau <AshinGau@users.noreply.github.com>
2024-04-20 19:06:54 +08:00
29556f758e [fix](parquet) fix time zone error in parquet reader (#33217)
`isAdjustedToUTC` is exactly the opposite in parquet reader(https://github.com/apache/parquet-format/blob/master/LogicalTypes.md), resulting the time with `isAdjustedToUTC=true` has increased by eight hours(UTC8).

The parquet with `isAdjustedToUTC=true` can be produced by spark-sql with the following configuration:
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS
```

However, using the following configuration, there's no logical and convert type in parquet meta data, so the time read by doris will also increase by eight hours(UTC8). Users need to set their own UTC time zone in doris(https://doris.apache.org/docs/dev/advanced/time-zone/)
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=INT96
```
2024-04-07 23:24:22 +08:00
8bd101129a [behavior change](output) change float output format (#32049) 2024-03-21 14:07:22 +08:00
ade720470d [Improve](config)delete confused config for nested complex type (#29988) 2024-01-18 12:03:07 +08:00
81a0f8c041 [Feature](function) support generating const values from tvf numbers (#28051)
If specified, got a column of constant. otherwise an incremental series like it always be.

mysql> select * from numbers("number" = "5", "const_value" = "-123");
+--------+
| number |
+--------+
|   -123 |
|   -123 |
|   -123 |
|   -123 |
|   -123 |
+--------+
5 rows in set (0.11 sec)
2023-12-07 22:26:43 +08:00
573f0eaad9 [fix](regression)fix parquet data page v2 unstable case (#27753) 2023-11-29 18:58:37 +08:00
d771f16b79 [fix](parquet)fix bug that can not read parquet data page v2 (#27655) 2023-11-28 22:43:46 +08:00
cd6c61347d [Feature](tvf)(avro-jni) avro-jni add projection push down (#26885) 2023-11-27 10:33:27 +08:00
a16517a061 [test](tvf) append tvf read hive_text file regression case. (#26790) 2023-11-14 15:03:19 +08:00
22bf2889e5 [feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236)
Support avro's enum, record, union data types
2023-11-09 13:58:49 +08:00
57ed781bb6 [fix](regression-test) Add tvf regression tests (#26455) 2023-11-09 12:09:32 +08:00
7730a9025e [Fix](Regression-test) add test for tvf (#26322) 2023-11-03 19:07:07 +08:00
3e10e5af39 [Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946)
This pr makes three changes to the display of complex types:
1. NULL value in complex types refers to being displayed as `null`, not `NULL`
2. struct type is displayed as "column_name": column_value
3. Time types such as `datetime` and `date`, are displayed with double quotes in complex types. like
    `{1, "2023-10-26 12:12:12"}`

This pr also do a code refactor:
1. nesting_level is set to a member variable of the `DataTypeSerDe`, rather than a parameter in methods.

What's more, this pr fix a bug that fileSize is not correct, introduced by this pr: #25854
2023-11-01 23:48:55 +08:00
85b8497624 [fix](Tvf) return empty set when tvf queries an empty file or an error uri (#25280)
### Before:
return errors when tvf queries an empty file or an error uri:
1. get parsed schema failed, empty csv file
2. Can not get first file, please check uri.

### Now:
we just return empty set when tvf queries an empty file or an error uri.
```sql
mysql> select * from s3( 
"uri" = "https://error_uri/exp_1.csv", 
"s3.access_key"= "xx", 
"s3.secret_key" = "yy", 
"format" = "csv") limit 10;

Empty set (1.29 sec)
```
2023-10-17 09:52:53 +08:00
977d119545 [fix](Insert select tvf) fix NPE because tvf do not have catalog name (#25149) 2023-10-09 18:02:43 +08:00
657e927d50 [fix](json)Fix the bug that read json file Out of bounds access (#23411) 2023-09-02 01:11:37 +08:00
f7a3d2778a [FIX](array)update array olapconvertor and support array nested other complex type (#23489)
* update array olapconvertor and support array nested other complex type

* update for inverted index
2023-08-29 16:18:11 +08:00
419e922a69 [fix](json)Fix the bug that does not stop when reading json files (#23062)
* [fix](json)Fix the bug that does not stop when reading json files
2023-08-18 18:23:19 +08:00
390c52f73a [Improve](complex-type) update for array/map element_at with nested complex type with local tvf (#22927) 2023-08-16 20:47:36 +08:00
707a527775 [FIX](map)insert into doris table with array/map type by local tvf (#22955) 2023-08-15 13:11:23 +08:00
5e2748d2b4 [Improve](complex-type)update orc reader for complex type and add regress tests (#22856) 2023-08-12 07:06:12 +08:00
28561f77e9 [fix](regression)fix test_hdfs_tvf regression_test out file : decimalv3 -> decimal (#22852) 2023-08-11 20:44:18 +08:00
c31226b144 [refractor](regression-test) sort out test cases of external tables (#22640)
sort out the test cases of external table.
After modify, there are 2 directories:

1. `external_table_p0`: all p0 cases of external tables: hive, es, jdbc and tvf
2. `external_table_p2`: all p2 cases of external tables: hive, es, mysql, pg, iceberg and tvf

So that we can run it with one line command like:

```
sh run-regression-test.sh --run -d external_table_p0,external_table_p2
```
2023-08-07 11:12:30 +08:00