Commit Graph

28 Commits

Author SHA1 Message Date
263135c193 [fix](case) fix export data consistency case (#32005) 2024-03-09 19:45:50 +08:00
1f825ee2d6 [improve](export) Support partition data consistency (#31290) 2024-03-01 04:25:43 +08:00
a8d8c6a271 [fix](file-writer) opt s3 file writer and fix empty file related issue #28983 #30703 #31169 (#31213)
* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)

* [fix](outfile) Fix unable to export empty data (#30703)

Issue Number: close #30600
Fix unable to export empty data to hdfs / S3, this behavior is inconsistent with version 1.2.7,
version 1.2.7 can export empty data to hdfs/ S3, and there will be exported files on S3/HDFS.

* [fix](file-writer) avoid empty file for segment writer (#31169)

---------

Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>
2024-02-21 16:48:54 +08:00
f65844fae4 [Enhencement](Outfile/Export) Export data to csv file format with BOM (#30533)
The UTF8 format of the Windows system has BOM. 

We add a new user property to `Outfile/Export`。Therefore, when exporting Doris data, users can choose whether to bring BOM on the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
203daba19d [fix](outfile) fix outfile csv did not write json column with string (#29067) 2024-02-01 19:01:08 +08:00
78b0fec33a [Fix](Outfile) Support export nested complex type data to orc file format (#28182) 2023-12-13 11:55:27 +08:00
3dcbf16404 [Fix](Outfile) The Struct type data exported from select outfile to the csv file format should contain a column name #28068
If the original data is:
```sql
+-----------------------------------------------------+
| s_info                                              |
+-----------------------------------------------------+
| {"s_id": 2, "s_name": "nereids", "s_address": "20"} |
| {"s_id": 1, "s_name": "doris", "s_address": "18"}   |
+-----------------------------------------------------+
```

In the original logic, the struct type data exported to a csv file format did not contain column names,like
```
{2, "nereids", "20"} 
{1, "doris", "18"}
```

This pr do not need to be merged into branch-2.0
2023-12-07 18:23:36 +08:00
9b59bc14b5 [test](Export) add show export regression testes (#27140) 2023-11-22 00:13:30 +08:00
3e10e5af39 [Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946)
This pr makes three changes to the display of complex types:
1. NULL value in complex types refers to being displayed as `null`, not `NULL`
2. struct type is displayed as "column_name": column_value
3. Time types such as `datetime` and `date`, are displayed with double quotes in complex types. like
    `{1, "2023-10-26 12:12:12"}`

This pr also do a code refactor:
1. nesting_level is set to a member variable of the `DataTypeSerDe`, rather than a parameter in methods.

What's more, this pr fix a bug that fileSize is not correct, introduced by this pr: #25854
2023-11-01 23:48:55 +08:00
99b45e1938 [fix](Outfile) Export DateTimev2 type of doris to ORC's TimeStamp type (#25470)
Previously,doris's `DateTimev2` was exported to orc as a `String` type.
Now, export doris's `DateTimev2` to orc timestamp type.
2023-10-29 15:59:38 +08:00
7f66be84d5 [fix](Outfile) Infer the column name if the column is expression in select into outfile (#25854)
This pr do two things:
1. Infer the column name if the column is expression in `select into outfile`. The rule for column name generation can be refered in pr: #24990 
2. fix bug that it will core dump if the `_schema` fails to build in the open phase in vorc_transformer.cpp


TODO:
1. Support infer the column name if the column is expression in `select into outfile` in new optimizer(Nereids).
2023-10-25 22:49:04 +08:00
ade475a52b [regression](outfile)add regression for select outfile with underscore prefix #25797 2023-10-24 17:58:38 +08:00
6f9a084d99 [Fix](Outfile) Use data_type_serde to export data to parquet file format (#24998) 2023-10-13 13:58:34 +08:00
c6b1c903e4 [fix](Regression-test) fix that the String type in a nested type should contain double quotes and add regression-test (#25115) 2023-10-11 18:30:26 +08:00
a48b19ceb6 [feature](Outfile) select into outfile supports to export struct/map/array type data to orc file format (#24350)
We do not support nested complex type in this pr.
2023-09-21 20:15:18 +08:00
a946f99b8c [Fix](regression-test) fix regression-test of export parquet file format (#24450) 2023-09-20 15:41:49 +08:00
29fe87982f [improve](outfile) add file_suffix options for outfile (#24334) 2023-09-15 12:58:41 +08:00
9847f7789f [Feature](Export) Export sql supports to export data of view and exrernal table (#24070)
Previously, EXPORT only supported the export of the olap table,
This pr supports the export of view table and external table.
2023-09-13 22:55:19 +08:00
a27349c83a [fix](Export) Concatenation the outfile sql for Export (#23635)
In the original logic, the `Export` statement generates `Selectstmt` for execution. But there is no way to make the `SelectStmt` use the new optimizer.

Now, we change the `Export` statement to generate the `outfile SQL`, and then use the new optimizer to parse the SQL so that outfile can use the new optimizer.
2023-09-08 10:20:18 +08:00
103fa4eb55 [feature](Export) support export with nereids (#23319) 2023-08-29 19:36:19 +08:00
f32efe5758 [Fix](Outfile) Fix that it does not report error when export table to S3 with an incorrect ak/sk/bucket (#23441)
Problem:
It will return a result although we use wrong ak/sk/bucket name, such as:
```sql
mysql> select * from demo.student
    -> into outfile "s3://xxxx/exp_"
    -> format as csv
    -> properties(
    ->   "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
    ->   "s3.region" = "ap-beijing",
    ->   "s3.access_key"= "xxx",
    ->   "s3.secret_key" = "yyyy"
    -> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                                                |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
|          1 |         3 |       26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```

The reason for this is that we did not catch the error returned by `close()` phase.
2023-08-26 00:19:30 +08:00
18094511e7 [fix](Outfile/Nereids) fix that csv_with_names and csv_with_names_and_types file format could not be exported on nereids (#23387)
This problem is casued by #21197

Fixed an issue that `csv_with_names` and `csv_with_names_and_types` file format could not be exported on nereids optimizer when using `select...into outfile`.
2023-08-25 11:12:04 +08:00
1d05feea1b [Feature](Nereids) add executable function to support fold constant for functions (#18209)
1. Add date-time functions for fold constant for Nereids.
This is the list of executable date-time function nereids supports up to now:
- now()
- now(int)
- current_timestamp()
- current_timestamp(int)
- localtime()
- localtimestamp()
- curdate()
- current_date()
- curtime()
- current_time()
- date_{add/sub}(),{years/months/days/hours/minutes/seconds}_{add/sub}()
- datediff()
- {date/datev2}()
- {year/quarter/month/day/hour/minute/second}()
- dayof{year/month/week}()
- date_format()
- date_trunc()
- from_days()
- last_day()
- to_monday()
- from_unixtime()
- unix_timestamp()
- utc_timestamp()
- to_date()
- to_days()
- str_to_date()
- makedate()

2. solved problem:
- enable datev2/datetimev2 default.
- refactor Nereids foldConstantOnFE and support fold nested expression.
- separate the executable into multi-files for easily-reading and adding new functions
2023-05-17 21:26:31 +08:00
91cdb79d89 [Bugfix](Outfile) fix that export data to parquet and orc file format (#19436)
1. support export `LARGEINT` data type to parquet/orc file format.
2. Export the DORIS `DATE/DATETIME` type to the `Date/Timestamp` logic type of parquet file format.
3. Fix that the data is not correct when the DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
45d0f53529 [Regression-test](Export) add regression test for export #18897 2023-04-23 19:43:22 +08:00
c2fae109c3 [Improvement](outfile) Support output null in parquet writer (#12970) 2022-09-29 13:36:30 +08:00
1f9eec5462 [Regression](datev2) Add test cases for datev2/datetimev2 (#11831) 2022-08-19 10:57:55 +08:00
ff1971f916 [improvement](test) add dryRun option and group all cases into either p0 or p1 (#11576)
1. add dryRun option to list tests
2. group all cases into p0 p1 p2
2022-08-17 22:45:53 +08:00