Commit Graph

55 Commits

Author SHA1 Message Date
b15ccdbe98 [Pick](Variant) pick some fixes (#37922)
#37674
#37839
#37883 
#37857 
#37794
2024-07-16 21:38:47 +08:00
0aeb768bf9 [Fix](export/outfile) Support compression when exporting data to Parquet / ORC. (#37167)
bp: #36490
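A hedged usage sketch of the compression feature; the property key and value shown here are assumptions for illustration only, not taken from the commit.

```sql
-- Hypothetical property key; check the documentation for the exact name.
SELECT * FROM demo.student
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS parquet
PROPERTIES(
    "compress_type" = "snappy"
);
```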
2024-07-03 10:53:57 +08:00
cbaff8a700 [fix](nereids)change the decimal's precision and scale for cast(xx as decimal) (#36540)
pick from master #36316

The data type of the expression cast(xx as decimal) may be decimalv3 or
decimalv2, depending on the enable_decimal_conversion value in the FE conf
file. If enable_decimal_conversion is true, the data type is decimalv3(9, 0),
but it was decimalv3(38, 9) in 2.0 releases. So this PR changes the data type
to match the 2.0 releases and keep the behavior consistent.
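A minimal sketch of the behavior described above; the table and column names are hypothetical.

```sql
-- The result type of an unqualified DECIMAL cast depends on the FE config
-- enable_decimal_conversion (decimalv3 vs decimalv2).
SELECT CAST(amount AS DECIMAL) FROM demo.orders;
-- With enable_decimal_conversion = true, this branch inferred decimalv3(9, 0),
-- while 2.0 releases used decimalv3(38, 9); this commit restores the 2.0 type.
```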
2024-06-20 17:46:11 +08:00
d4956bfaf5 do not use path style to access s3 (#35788)
2024-06-03 13:57:13 +08:00
27cf5a667f [enhancement](export) filter empty partition before export table to remote storage (#35389) (#35542)
Linked PR: #35389
2024-05-28 18:11:12 +08:00
3ef5ed1ad0 [opt](Nereids) normalize column name of output file (#34650)
When exporting to an output file, normalize the column names.
For example, for

> SELECT 1 > 2 INTO OUTFILE "..."

the column name of 1 > 2 will be __greater_than_0.
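A runnable variant of the example above; the output path is hypothetical, and csv_with_names is assumed so the normalized header is visible in the file.

```sql
SELECT 1 > 2
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS csv_with_names;
-- The header written for the expression 1 > 2 is __greater_than_0
-- rather than the raw expression text.
```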
2024-05-13 22:12:46 +08:00
520774a24b [fix](serde) fix ipv4/v6 serde functions for arrow, orc, parquet format (#34042)
This PR is based on @sjyango's work in #32326, which was intended to be merged into the master branch but stayed a draft and was not maintained for a long time, so this new PR was opened.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>
2024-05-10 14:37:04 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
315f6e44c2 [Branch-2.1](Outfile) Fixed the problem that concurrent Outfile wrote multiple SUCCESS files (#33870)
backport: #33016
2024-04-19 12:09:53 +08:00
b882704eaf [fix](Export) Set the default value of the data_consistence property of export to partition (#32830) 2024-04-07 23:24:22 +08:00
d7a3ff1ddf [Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281)
| Doris Type | ORC Type                       | Parquet Type           |
|------------|--------------------------------|------------------------|
| Date       | Long (logical: DATE)           | int32 (logical: Date)  |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                  |
2024-03-22 08:52:16 +08:00
dc7d80860f [fix](case) fix export data consistency table key type (#32045) 2024-03-12 14:20:18 +08:00
263135c193 [fix](case) fix export data consistency case (#32005) 2024-03-09 19:45:50 +08:00
e2ebf9d566 [feature](Nereids) parallel output file (#31623)
legacy planner impl PR: #6539
2024-03-06 13:04:30 +08:00
1f825ee2d6 [improve](export) Support partition data consistency (#31290) 2024-03-01 04:25:43 +08:00
a8d8c6a271 [fix](file-writer) opt s3 file writer and fix empty file related issue #28983 #30703 #31169 (#31213)
* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)

* [fix](outfile) Fix unable to export empty data (#30703)

Issue Number: close #30600
Fix being unable to export empty data to HDFS/S3. This behavior was inconsistent with version 1.2.7, which could export empty data to HDFS/S3 and leave exported files there.

* [fix](file-writer) avoid empty file for segment writer (#31169)

---------

Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>
2024-02-21 16:48:54 +08:00
f65844fae4 [Enhancement](Outfile/Export) Export data to csv file format with BOM (#30533)
The UTF-8 format on Windows systems includes a BOM.

We add a new user property to `Outfile/Export`, so that when exporting Doris data, users can choose whether to add a BOM at the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
37c36b0491 [fix](regression-test) fix test_show_export case #30892 2024-02-16 10:12:23 +08:00
203daba19d [fix](outfile) fix outfile csv did not write json column with string (#29067) 2024-02-01 19:01:08 +08:00
b0cac0014d [enhance](FS) Improve FS error code (#29432) 2024-01-06 21:17:22 +08:00
8c05f7a784 [refactor](cluster)(step-4) remove cluster related to Database (#27861)
Issue Number: #19897

Remove the `default_cluster` prefix related to databases.
When upgrading, all prefixes will be removed.
2023-12-16 18:28:53 +08:00
78b0fec33a [Fix](Outfile) Support export nested complex type data to orc file format (#28182) 2023-12-13 11:55:27 +08:00
b6e72d57c5 [Improvement](hms catalog) support show_create_database for hms catalog (#28145)
2023-12-09 01:34:21 +08:00
97932d0381 [fix](export) the label of export should be unique within database scope (#27401)
### How to reproduce
1. Create a database db1 and a table tbl1.
2. Insert some data and export with label L1.
3. Drop db1 and tbl1, then recreate them with the same names.
4. Insert some data and export with the same label L1.

Expected: the export succeeds.
Actual: error: Label L1 has already been used.

This PR fixes it (see the sketch below).
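A hedged reproduction sketch of the steps above; the table schema, paths, and label property usage are illustrative assumptions.

```sql
CREATE DATABASE db1;
CREATE TABLE db1.tbl1 (id INT)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES("replication_num" = "1");
INSERT INTO db1.tbl1 VALUES (1), (2), (3);
EXPORT TABLE db1.tbl1 TO "file:///tmp/export/exp_"
PROPERTIES("label" = "L1", "format" = "csv");
-- Drop and recreate db1/tbl1 with the same names, insert data again, and
-- export with the same label L1: before this fix, the second export failed
-- with "Label L1 has already been used".
```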
2023-11-23 14:30:57 +08:00
b821672f8b [test](regression) add 'sync' for some stream load (#27357) 2023-11-22 10:52:34 +08:00
4fbcad9c7c [minor](show_export) make result of file url usable (#27209)
* update regression-test
2023-11-22 10:14:45 +08:00
9b59bc14b5 [test](Export) add show export regression tests (#27140) 2023-11-22 00:13:30 +08:00
99b45e1938 [fix](Outfile) Export DateTimev2 type of doris to ORC's TimeStamp type (#25470)
Previously, Doris's `DateTimev2` was exported to ORC as a `String` type.
Now, Doris's `DateTimev2` is exported to ORC's `Timestamp` type.
2023-10-29 15:59:38 +08:00
7f66be84d5 [fix](Outfile) Infer the column name if the column is expression in select into outfile (#25854)
This PR does two things:
1. Infer the column name if the column is an expression in `select into outfile`. The rule for column name generation is described in PR #24990.
2. Fix a bug where it would core dump if `_schema` failed to build in the open phase in vorc_transformer.cpp.

TODO:
1. Support inferring the column name when the column is an expression in `select into outfile` in the new optimizer (Nereids).
2023-10-25 22:49:04 +08:00
ade475a52b [regression](outfile) add regression for select outfile with underscore prefix #25797 2023-10-24 17:58:38 +08:00
6f9a084d99 [Fix](Outfile) Use data_type_serde to export data to parquet file format (#24998) 2023-10-13 13:58:34 +08:00
c6b1c903e4 [fix](Regression-test) fix that String values inside nested types should be wrapped in double quotes, and add a regression test (#25115) 2023-10-11 18:30:26 +08:00
21d6f41492 [fix](regression-test) Fix occasional failures of the test_outfile_exception regression-test case (#24937) 2023-09-28 10:05:43 +08:00
a48b19ceb6 [feature](Outfile) select into outfile supports to export struct/map/array type data to orc file format (#24350)
We do not support nested complex types in this PR.
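A hedged sketch of the feature; the table and column names are hypothetical.

```sql
SELECT c_array, c_map, c_struct
FROM demo.complex_tbl
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS orc;
-- Nested complex types (e.g. array<struct<...>>) are not covered by this commit.
```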
2023-09-21 20:15:18 +08:00
a946f99b8c [Fix](regression-test) fix regression-test of export parquet file format (#24450) 2023-09-20 15:41:49 +08:00
29fe87982f [improve](outfile) add file_suffix options for outfile (#24334) 2023-09-15 12:58:41 +08:00
9847f7789f [Feature](Export) Export SQL supports exporting data of views and external tables (#24070)
Previously, EXPORT only supported exporting OLAP tables.
This PR adds support for exporting views and external tables.
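A hedged sketch; the view name and path are hypothetical.

```sql
EXPORT TABLE demo.my_view TO "file:///tmp/export/exp_"
PROPERTIES("format" = "csv");
-- With this commit, EXPORT also accepts views and external (catalog) tables,
-- not only OLAP tables.
```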
2023-09-13 22:55:19 +08:00
a27349c83a [fix](Export) Concatenate the outfile SQL for Export (#23635)
In the original logic, the `Export` statement generated a `SelectStmt` for execution, but there was no way to make that `SelectStmt` use the new optimizer.

Now, the `Export` statement generates outfile SQL instead, and the new optimizer parses that SQL, so outfile can use the new optimizer.
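A hedged illustration of the rewrite described above; the table name and path are hypothetical.

```sql
-- The EXPORT statement:
EXPORT TABLE demo.student TO "file:///tmp/export/exp_"
PROPERTIES("format" = "csv");

-- is now turned into outfile SQL roughly equivalent to the following,
-- which the new optimizer (Nereids) can parse and plan:
SELECT * FROM demo.student
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS csv;
```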
2023-09-08 10:20:18 +08:00
103fa4eb55 [feature](Export) support export with nereids (#23319) 2023-08-29 19:36:19 +08:00
f32efe5758 [Fix](Outfile) Fix that it does not report error when export table to S3 with an incorrect ak/sk/bucket (#23441)
Problem:
It returns a result even when a wrong ak/sk/bucket name is used, such as:
```sql
mysql> select * from demo.student
    -> into outfile "s3://xxxx/exp_"
    -> format as csv
    -> properties(
    ->   "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
    ->   "s3.region" = "ap-beijing",
    ->   "s3.access_key"= "xxx",
    ->   "s3.secret_key" = "yyyy"
    -> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                                                |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
|          1 |         3 |       26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```

The reason is that we did not catch the error returned by the `close()` phase.
2023-08-26 00:19:30 +08:00
18094511e7 [fix](Outfile/Nereids) fix that csv_with_names and csv_with_names_and_types file format could not be exported on nereids (#23387)
This problem is caused by #21197.

Fixed an issue where the `csv_with_names` and `csv_with_names_and_types` file formats could not be exported on the Nereids optimizer when using `select...into outfile`.
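For reference, a minimal statement exercising the affected path; the path is hypothetical.

```sql
SELECT * FROM demo.student
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS csv_with_names;
-- Before this fix, csv_with_names / csv_with_names_and_types failed under
-- the Nereids optimizer.
```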
2023-08-25 11:12:04 +08:00
10abbd2b62 [Feature](Export) support parallel export job using Job Schedule (#22854) 2023-08-18 22:24:42 +08:00
f863c653e2 [Fix](Planner) fix limit executing before sort in show export job (#21663)
Problem:
When showing export jobs, LIMIT was executed before ORDER BY, so LIMIT cut the result set first and we could not get the rows we wanted.

Example:
We have export job1 and job2 with JobId1 > JobId2 and want the job with JobId1:
show export from db order by JobId desc limit 1;
Since LIMIT 1 ran first and JobIds are assigned from small to large, we would probably get job2 instead.

Solution:
Do not cut the result set first when there is an ORDER BY clause; apply the limit after sorting.
2023-07-13 11:17:28 +08:00
91cdb79d89 [Bugfix](Outfile) fix exporting data to the parquet and orc file formats (#19436)
1. Support exporting the `LARGEINT` data type to the parquet/orc file formats.
2. Export the Doris `DATE/DATETIME` types to the `Date/Timestamp` logical types of the parquet file format.
3. Fix incorrect data when DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
45d0f53529 [Regression-test](Export) add regression test for export #18897 2023-04-23 19:43:22 +08:00
5c265d8183 [fix](vec) fix crash caused by parallel output file (#17384) 2023-03-03 19:03:53 +08:00
0e1e5a802b [config](load) enable new load scan node by default (#14808)
Set FE `enable_new_load_scan_node` to true by default.
So that all load tasks (broker load, stream load, routine load, insert into) will use FileScanNode instead of BrokerScanNode
to read data.

1. Support loading parquet files in stream load with the new load scan node.
2. Fix a bug where the new parquet reader could not read columns without a logical or converted type.
3. Change the jsonb parse function to "jsonb_parse_error_to_null",
    so that if the input string is not a valid JSON string, the jsonb column will be null in the load task.
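A hedged note on enabling the flag: it can be set in fe.conf, or at runtime with a statement like the one below (assumed usage, not taken from the commit).

```sql
ADMIN SET FRONTEND CONFIG ("enable_new_load_scan_node" = "true");
```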
2022-12-16 09:41:43 +08:00
6f18726f01 [improvement](test) add sync for test_agg_keys_schema_change_datev2 (#13643)
1. add "sync" to avoid some potential meta sync problem when running regression test on multi-node cluster
2. Use /tmp dir as dest dir of outfile test, to avoid "No such file or directory" error.
2022-10-25 22:29:05 +08:00
b85c78ee00 [fix](regression) add 'if not exists' to 'create table' to support parallel test (#13576) (#13578)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-10-25 16:37:07 +08:00
4b5a2c1a65 [fix](export)(outfile) fix bug that export may fail when writing SUCCESS file (#13574) 2022-10-23 13:02:49 +08:00