Commit Graph

55 Commits

Author SHA1 Message Date
b15ccdbe98 [Pick](Variant) pick some fixes (#37922)
#37674
#37839
#37883 
#37857 
#37794
2024-07-16 21:38:47 +08:00
0aeb768bf9 [Fix](export/outfile) Support compression when exporting data to Parquet / ORC. (#37167)
bp: #36490
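A hedged usage sketch of the compression feature; the property key and value shown here are assumptions for illustration only, not taken from the commit.

```sql
-- Hypothetical property key; check the documentation for the exact name.
SELECT * FROM demo.student
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS parquet
PROPERTIES(
    "compress_type" = "snappy"
);
```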
2024-07-03 10:53:57 +08:00
cbaff8a700 [fix](nereids)change the decimal's precision and scale for cast(xx as decimal) (#36540)
pick from master #36316

The data type of the expression cast(xx as decimal) may be decimalv3 or
decimalv2, depending on the enable_decimal_conversion value in the FE conf
file. If enable_decimal_conversion is true, the data type is decimalv3(9, 0),
but it was decimalv3(38, 9) in 2.0 releases. So this PR changes the data type
to match the 2.0 releases and keep the behavior consistent.
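A minimal sketch of the behavior described above; the table and column names are hypothetical.

```sql
-- The result type of an unqualified DECIMAL cast depends on the FE config
-- enable_decimal_conversion (decimalv3 vs decimalv2).
SELECT CAST(amount AS DECIMAL) FROM demo.orders;
-- With enable_decimal_conversion = true, this branch inferred decimalv3(9, 0),
-- while 2.0 releases used decimalv3(38, 9); this commit restores the 2.0 type.
```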
2024-06-20 17:46:11 +08:00
d4956bfaf5 do not use path style to access s3 (#35788)
2024-06-03 13:57:13 +08:00
27cf5a667f [enhancement](export) filter empty partition before export table to remote storage (#35389) (#35542)
Linked PR: #35389
2024-05-28 18:11:12 +08:00
3ef5ed1ad0 [opt](Nereids) normalize column name of output file (#34650)
When exporting to an output file, normalize the column names.
For example, for

> SELECT 1 > 2 INTO OUTFILE "..."

the column name of 1 > 2 will be __greater_than_0.
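A runnable variant of the example above; the output path is hypothetical, and csv_with_names is assumed so the normalized header is visible in the file.

```sql
SELECT 1 > 2
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS csv_with_names;
-- The header written for the expression 1 > 2 is __greater_than_0
-- rather than the raw expression text.
```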
2024-05-13 22:12:46 +08:00
520774a24b [fix](serde) fix ipv4/v6 serde functions for arrow, orc, parquet format (#34042)
This PR is based on @sjyango's work in #32326, which was intended to be merged into the master branch but stayed a draft and was not maintained for a long time, so this new PR was opened.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>
2024-05-10 14:37:04 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
315f6e44c2 [Branch-2.1](Outfile) Fixed the problem that concurrent Outfile wrote multiple SUCCESS files (#33870)
backport: #33016
2024-04-19 12:09:53 +08:00
b882704eaf [fix](Export) Set the default value of the data_consistence property of export to partition (#32830) 2024-04-07 23:24:22 +08:00
d7a3ff1ddf [Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281)
| Doris Type | ORC Type                       | Parquet Type           |
|------------|--------------------------------|------------------------|
| Date       | Long (logical: DATE)           | int32 (logical: Date)  |
| DateTime   | TIMESTAMP (logical: TIMESTAMP) | int96                  |
2024-03-22 08:52:16 +08:00
dc7d80860f [fix](case) fix export data consistency table key type (#32045) 2024-03-12 14:20:18 +08:00
263135c193 [fix](case) fix export data consistency case (#32005) 2024-03-09 19:45:50 +08:00
e2ebf9d566 [feature](Nereids) parallel output file (#31623)
legacy planner impl PR: #6539
2024-03-06 13:04:30 +08:00
1f825ee2d6 [improve](export) Support partition data consistency (#31290) 2024-03-01 04:25:43 +08:00
a8d8c6a271 [fix](file-writer) opt s3 file writer and fix empty file related issue #28983 #30703 #31169 (#31213)
* (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983)

* [fix](outfile) Fix unable to export empty data (#30703)

Issue Number: close #30600
Fix being unable to export empty data to HDFS/S3. This behavior was inconsistent with version 1.2.7, which could export empty data to HDFS/S3 and leave exported files there.

* [fix](file-writer) avoid empty file for segment writer (#31169)

---------

Co-authored-by: AlexYue <yj976240184@gmail.com>
Co-authored-by: zxealous <zhouchangyue@baidu.com>
2024-02-21 16:48:54 +08:00
f65844fae4 [Enhancement](Outfile/Export) Export data to csv file format with BOM (#30533)
The UTF-8 format on Windows systems includes a BOM.

We add a new user property to `Outfile/Export`, so that when exporting Doris data, users can choose whether to add a BOM at the beginning of the CSV file.

**Usage:**
```sql
-- outfile:
select * from demo.student
into outfile "file:///xxx/export/exp_"
format as csv
properties(
    "column_separator" = ",",
    "with_bom" = "true"
);

-- Export:
EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_"
PROPERTIES(
    "format" = "csv",
    "with_bom" = "true"
);
```
2024-02-16 10:16:40 +08:00
37c36b0491 [fix](regression-test) fix test_show_export case #30892 2024-02-16 10:12:23 +08:00
203daba19d [fix](outfile) fix outfile csv did not write json column with string (#29067) 2024-02-01 19:01:08 +08:00
b0cac0014d [enhance](FS) Improve FS error code (#29432) 2024-01-06 21:17:22 +08:00
8c05f7a784 [refactor](cluster)(step-4) remove cluster related to Database (#27861)
Issue Number: #19897

Remove the `default_cluster` prefix related to databases.
When upgrading, all prefixes will be removed.
2023-12-16 18:28:53 +08:00
78b0fec33a [Fix](Outfile) Support export nested complex type data to orc file format (#28182) 2023-12-13 11:55:27 +08:00
b6e72d57c5 [Improvement](hms catalog) support show_create_database for hms catalog (#28145)
2023-12-09 01:34:21 +08:00
97932d0381 [fix](export) the label of export should be unique within database scope (#27401)
### How to reproduce
1. Create a database db1 and a table tbl1.
2. Insert some data and export with label L1.
3. Drop db1 and tbl1, then recreate them with the same names.
4. Insert some data and export with the same label L1.

Expected: the export succeeds.
Actual: error: Label L1 has already been used.

This PR fixes it (see the sketch below).
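A hedged reproduction sketch of the steps above; the table schema, paths, and label property usage are illustrative assumptions.

```sql
CREATE DATABASE db1;
CREATE TABLE db1.tbl1 (id INT)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES("replication_num" = "1");
INSERT INTO db1.tbl1 VALUES (1), (2), (3);
EXPORT TABLE db1.tbl1 TO "file:///tmp/export/exp_"
PROPERTIES("label" = "L1", "format" = "csv");
-- Drop and recreate db1/tbl1 with the same names, insert data again, and
-- export with the same label L1: before this fix, the second export failed
-- with "Label L1 has already been used".
```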
2023-11-23 14:30:57 +08:00
b821672f8b [test](regression) add 'sync' for some stream load (#27357) 2023-11-22 10:52:34 +08:00
4fbcad9c7c [minor](show_export) make result of file url usable (#27209)
* update regression-test
2023-11-22 10:14:45 +08:00
9b59bc14b5 [test](Export) add show export regression tests (#27140) 2023-11-22 00:13:30 +08:00
99b45e1938 [fix](Outfile) Export DateTimev2 type of doris to ORC's TimeStamp type (#25470)
Previously, Doris's `DateTimev2` was exported to ORC as a `String` type.
Now, Doris's `DateTimev2` is exported to ORC's `Timestamp` type.
2023-10-29 15:59:38 +08:00
7f66be84d5 [fix](Outfile) Infer the column name if the column is expression in select into outfile (#25854)
This PR does two things:
1. Infer the column name if the column is an expression in `select into outfile`. The rule for column name generation is described in PR #24990.
2. Fix a bug where it would core dump if `_schema` failed to build in the open phase in vorc_transformer.cpp.

TODO:
1. Support inferring the column name when the column is an expression in `select into outfile` in the new optimizer (Nereids).
2023-10-25 22:49:04 +08:00
ade475a52b [regression](outfile) add regression for select outfile with underscore prefix #25797 2023-10-24 17:58:38 +08:00
6f9a084d99 [Fix](Outfile) Use data_type_serde to export data to parquet file format (#24998) 2023-10-13 13:58:34 +08:00
c6b1c903e4 [fix](Regression-test) fix that String values inside nested types should be wrapped in double quotes, and add a regression test (#25115) 2023-10-11 18:30:26 +08:00
21d6f41492 [fix](regression-test) Fix occasional failures of the test_outfile_exception regression-test case (#24937) 2023-09-28 10:05:43 +08:00
a48b19ceb6 [feature](Outfile) select into outfile supports to export struct/map/array type data to orc file format (#24350)
We do not support nested complex types in this PR.
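A hedged sketch of the feature; the table and column names are hypothetical.

```sql
SELECT c_array, c_map, c_struct
FROM demo.complex_tbl
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS orc;
-- Nested complex types (e.g. array<struct<...>>) are not covered by this commit.
```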
2023-09-21 20:15:18 +08:00
a946f99b8c [Fix](regression-test) fix regression-test of export parquet file format (#24450) 2023-09-20 15:41:49 +08:00
29fe87982f [improve](outfile) add file_suffix options for outfile (#24334) 2023-09-15 12:58:41 +08:00
9847f7789f [Feature](Export) Export SQL supports exporting data of views and external tables (#24070)
Previously, EXPORT only supported exporting OLAP tables.
This PR adds support for exporting views and external tables.
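A hedged sketch; the view name and path are hypothetical.

```sql
EXPORT TABLE demo.my_view TO "file:///tmp/export/exp_"
PROPERTIES("format" = "csv");
-- With this commit, EXPORT also accepts views and external (catalog) tables,
-- not only OLAP tables.
```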
2023-09-13 22:55:19 +08:00
a27349c83a [fix](Export) Concatenate the outfile SQL for Export (#23635)
In the original logic, the `Export` statement generated a `SelectStmt` for execution, but there was no way to make that `SelectStmt` use the new optimizer.

Now, the `Export` statement generates outfile SQL instead, and the new optimizer parses that SQL, so outfile can use the new optimizer.
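A hedged illustration of the rewrite described above; the table name and path are hypothetical.

```sql
-- The EXPORT statement:
EXPORT TABLE demo.student TO "file:///tmp/export/exp_"
PROPERTIES("format" = "csv");

-- is now turned into outfile SQL roughly equivalent to the following,
-- which the new optimizer (Nereids) can parse and plan:
SELECT * FROM demo.student
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS csv;
```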
2023-09-08 10:20:18 +08:00
103fa4eb55 [feature](Export) support export with nereids (#23319) 2023-08-29 19:36:19 +08:00
f32efe5758 [Fix](Outfile) Fix that it does not report error when export table to S3 with an incorrect ak/sk/bucket (#23441)
Problem:
It returns a result even when a wrong ak/sk/bucket name is used, such as:
```sql
mysql> select * from demo.student
    -> into outfile "s3://xxxx/exp_"
    -> format as csv
    -> properties(
    ->   "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com",
    ->   "s3.region" = "ap-beijing",
    ->   "s3.access_key"= "xxx",
    ->   "s3.secret_key" = "yyyy"
    -> );
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
| FileNumber | TotalRows | FileSize | URL                                                                                                |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
|          1 |         3 |       26 | s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ |
+------------+-----------+----------+----------------------------------------------------------------------------------------------------+
1 row in set (0.15 sec)
```

The reason is that we did not catch the error returned by the `close()` phase.
2023-08-26 00:19:30 +08:00
18094511e7 [fix](Outfile/Nereids) fix that csv_with_names and csv_with_names_and_types file format could not be exported on nereids (#23387)
This problem is caused by #21197.

Fixed an issue where the `csv_with_names` and `csv_with_names_and_types` file formats could not be exported on the Nereids optimizer when using `select...into outfile`.
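For reference, a minimal statement exercising the affected path; the path is hypothetical.

```sql
SELECT * FROM demo.student
INTO OUTFILE "file:///tmp/export/exp_"
FORMAT AS csv_with_names;
-- Before this fix, csv_with_names / csv_with_names_and_types failed under
-- the Nereids optimizer.
```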
2023-08-25 11:12:04 +08:00
10abbd2b62 [Feature](Export) support parallel export job using Job Schedule (#22854) 2023-08-18 22:24:42 +08:00
f863c653e2 [Fix](Planner) fix limit executing before sort in show export job (#21663)
Problem:
When showing export jobs, LIMIT was executed before ORDER BY, so LIMIT cut the result set first and we could not get the rows we wanted.

Example:
We have export job1 and job2 with JobId1 > JobId2 and want the job with JobId1:
show export from db order by JobId desc limit 1;
Since LIMIT 1 ran first and JobIds are assigned from small to large, we would probably get job2 instead.

Solution:
Do not cut the result set first when there is an ORDER BY clause; apply the limit after sorting.
2023-07-13 11:17:28 +08:00
91cdb79d89 [Bugfix](Outfile) fix exporting data to the parquet and orc file formats (#19436)
1. Support exporting the `LARGEINT` data type to the parquet/orc file formats.
2. Export the Doris `DATE/DATETIME` types to the `Date/Timestamp` logical types of the parquet file format.
3. Fix incorrect data when DATE type data is exported to ORC.
2023-05-13 22:39:24 +08:00
45d0f53529 [Regression-test](Export) add regression test for export #18897 2023-04-23 19:43:22 +08:00
5c265d8183 [fix](vec) fix crash caused by parallel output file (#17384) 2023-03-03 19:03:53 +08:00
0e1e5a802b [config](load) enable new load scan node by default (#14808)
Set FE `enable_new_load_scan_node` to true by default.
So that all load tasks (broker load, stream load, routine load, insert into) will use FileScanNode instead of BrokerScanNode
to read data.

1. Support loading parquet files in stream load with the new load scan node.
2. Fix a bug where the new parquet reader could not read columns without a logical or converted type.
3. Change the jsonb parse function to "jsonb_parse_error_to_null",
    so that if the input string is not a valid JSON string, the jsonb column will be null in the load task.
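A hedged note on enabling the flag: it can be set in fe.conf, or at runtime with a statement like the one below (assumed usage, not taken from the commit).

```sql
ADMIN SET FRONTEND CONFIG ("enable_new_load_scan_node" = "true");
```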
2022-12-16 09:41:43 +08:00
6f18726f01 [improvement](test) add sync for test_agg_keys_schema_change_datev2 (#13643)
1. add "sync" to avoid some potential meta sync problem when running regression test on multi-node cluster
2. Use /tmp dir as dest dir of outfile test, to avoid "No such file or directory" error.
2022-10-25 22:29:05 +08:00
b85c78ee00 [fix](regression) add 'if not exists' to 'create table' to support parallel test (#13576) (#13578)
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
2022-10-25 16:37:07 +08:00
4b5a2c1a65 [fix](export)(outfile) fix bug that export may fail when writing SUCCESS file (#13574) 2022-10-23 13:02:49 +08:00