doris

Author	SHA1	Message	Date
lihangyu	b15ccdbe98	[Pick](Variant) pick some fix (#37922 ) #37674 #37839 #37883 #37857 #37794	2024-07-16 21:38:47 +08:00
Mingyu Chen	ceef9ee123	[feature](serde) support presto compatible output format (#37039 ) (#37253 ) bp #37039	2024-07-04 13:56:05 +08:00
Tiewei Fang	0aeb768bf9	[Fix](export/outfile) Support compression when exporting data to Parquet / ORC. (#37167 ) bp: #36490	2024-07-03 10:53:57 +08:00
starocean999	cbaff8a700	[fix](nereids)change the decimal's precision and scale for cast(xx as decimal) (#36540 ) pick from master #36316 The datatype of the expression cast( xx as decimal ) may be decimalv3 or decimalv2, depending on the enable_decimal_conversion value in the FE conf file. If enable_decimal_conversion is true, the datatype is decimalv3(9, 0), but it was decimalv3(38, 9) in 2.0 releases. This PR changes the datatype to match the 2.0 releases and keep the behavior consistent.	2024-06-20 17:46:11 +08:00
morrySnow	3ef5ed1ad0	[opt](Nereids) normalize column name of output file (#34650 ) When exporting to an output file, normalize the column names. For example, in `SELECT 1 > 2 INTO OUTFILE "..."` the column name of `1 > 2` will be `__greater_than_0`.	2024-05-13 22:12:46 +08:00
zhangstar333	520774a24b	[fix](serde) fix ipv4/v6 serde functions for arrow, orc, parquet format (#34042 ) This PR is based on @sjyango's work in #32326. The intent was to merge #32326 into the master branch, but that PR is a draft and has not been maintained for a long time, hence this new PR. Co-authored-by: sjyango <sjyang2022@zju.edu.cn>	2024-05-10 14:37:04 +08:00
Tiewei Fang	d7a3ff1ddf	[Fix](Outfile) Fix the column type mapping in the orc/parquet file format (#32281 ) \| Doris Type \| Orc Type \| Parquet Type \| \|---------------------\|--------------------\|------------------------\| \| Date \| Long (logical: DATE) \| int32 (Logical: Date) \| \| DateTime \| TIMESTAMP (logical: TIMESTAMP) \| int96 \|	2024-03-22 08:52:16 +08:00
Mryange	8bd101129a	[behavior change](output) change float output format (#32049 )	2024-03-21 14:07:22 +08:00
walter	263135c193	[fix](case) fix export data consistency case (#32005 )	2024-03-09 19:45:50 +08:00
walter	1f825ee2d6	[improve](export) Support partition data consistency (#31290 )	2024-03-01 04:25:43 +08:00
Mingyu Chen	a8d8c6a271	[fix](file-writer) opt s3 file writer and fix empty file related issue #28983 #30703 #31169 (#31213 ) * (feature)(cloud) Use dynamic allocator instead of static buffer pool for better elasticity. (#28983) * [fix](outfile) Fix unable to export empty data (#30703) Issue Number: close #30600 Fix the inability to export empty data to HDFS / S3. This behavior is inconsistent with version 1.2.7, which can export empty data to HDFS / S3 and leaves exported files there. * [fix](file-writer) avoid empty file for segment writer (#31169) --------- Co-authored-by: AlexYue <yj976240184@gmail.com> Co-authored-by: zxealous <zhouchangyue@baidu.com>	2024-02-21 16:48:54 +08:00
Tiewei Fang	f65844fae4	[Enhancement](Outfile/Export) Export data to csv file format with BOM (#30533 ) UTF-8 files on Windows systems carry a BOM. We add a new user property to `Outfile/Export` so that, when exporting Doris data, users can choose whether to write a BOM at the beginning of the CSV file. Usage: ```sql -- outfile: select * from demo.student into outfile "file:///xxx/export/exp_" format as csv properties( "column_separator" = ",", "with_bom" = "true" ); -- Export: EXPORT TABLE student TO "file:///xx/tmpdata/export/exp_" PROPERTIES( "format" = "csv", "with_bom" = "true" ); ```	2024-02-16 10:16:40 +08:00
xueweizhang	203daba19d	[fix](outfile) fix outfile csv did not write json column with string (#29067 )	2024-02-01 19:01:08 +08:00
Tiewei Fang	78b0fec33a	[Fix](Outfile) Support export nested complex type data to orc file format (#28182 )	2023-12-13 11:55:27 +08:00
Tiewei Fang	3dcbf16404	[Fix](Outfile) The Struct type data exported from select outfile to the csv file format should contain a column name #28068 If the original data is: ```sql +-----------------------------------------------------+ \| s_info \| +-----------------------------------------------------+ \| {"s_id": 2, "s_name": "nereids", "s_address": "20"} \| \| {"s_id": 1, "s_name": "doris", "s_address": "18"} \| +-----------------------------------------------------+ ``` In the original logic, struct type data exported to the csv file format did not contain column names, like ``` {2, "nereids", "20"} {1, "doris", "18"} ``` This PR does not need to be merged into branch-2.0	2023-12-07 18:23:36 +08:00
Tiewei Fang	9b59bc14b5	[test](Export) add `show export` regression tests (#27140 )	2023-11-22 00:13:30 +08:00
Tiewei Fang	3e10e5af39	[Fix](Serde) Fix content displayed by complex types in MySQL Client (#25946 ) This PR makes three changes to the display of complex types: 1. A NULL value inside a complex type is displayed as `null`, not `NULL`. 2. A struct type is displayed as "column_name": column_value. 3. Time types such as `datetime` and `date` are displayed with double quotes inside complex types, like `{1, "2023-10-26 12:12:12"}`. This PR also does a code refactor: 1. nesting_level becomes a member variable of `DataTypeSerDe` rather than a parameter of its methods. In addition, this PR fixes a bug where fileSize is not correct, introduced by PR #25854	2023-11-01 23:48:55 +08:00
Tiewei Fang	99b45e1938	[fix](Outfile) Export `DateTimev2` type of doris to ORC's `TimeStamp` type (#25470 ) Previously, Doris's `DateTimev2` was exported to ORC as a `String` type. Now Doris's `DateTimev2` is exported to the ORC timestamp type.	2023-10-29 15:59:38 +08:00
Tiewei Fang	7f66be84d5	[fix](Outfile) Infer the column name if the column is expression in `select into outfile` (#25854 ) This PR does two things: 1. Infer the column name if the column is an expression in `select into outfile`. The rule for column name generation is described in PR #24990. 2. Fix a core dump that occurs if the `_schema` fails to build in the open phase in vorc_transformer.cpp. TODO: 1. Support inferring the column name if the column is an expression in `select into outfile` in the new optimizer (Nereids).	2023-10-25 22:49:04 +08:00
lsy3993	ade475a52b	[regression](outfile)add regression for select outfile with underscore prefix #25797	2023-10-24 17:58:38 +08:00
Tiewei Fang	6f9a084d99	[Fix](Outfile) Use data_type_serde to export data to `parquet` file format (#24998 )	2023-10-13 13:58:34 +08:00
Tiewei Fang	c6b1c903e4	[fix](Regression-test) fix that the String type in a nested type should contain double quotes and add regression-test (#25115 )	2023-10-11 18:30:26 +08:00
Tiewei Fang	a48b19ceb6	[feature](Outfile) `select into outfile` supports exporting struct/map/array type data to orc file format (#24350 ) Nested complex types are not supported in this PR.	2023-09-21 20:15:18 +08:00
Tiewei Fang	a946f99b8c	[Fix](regression-test) fix regression-test of export parquet file format (#24450 )	2023-09-20 15:41:49 +08:00
wudi	29fe87982f	[improve](outfile) add file_suffix options for outfile (#24334 )	2023-09-15 12:58:41 +08:00
Tiewei Fang	9847f7789f	[Feature](Export) `Export` sql supports exporting data of `view` and `external table` (#24070 ) Previously, EXPORT only supported exporting OLAP tables. This PR adds support for exporting views and external tables.	2023-09-13 22:55:19 +08:00
Tiewei Fang	a27349c83a	[fix](Export) Concatenate the outfile sql for Export (#23635 ) In the original logic, the `Export` statement generates a `SelectStmt` for execution, but there is no way to make the `SelectStmt` use the new optimizer. Now the `Export` statement generates the `outfile SQL`, which is then parsed by the new optimizer, so that outfile can use the new optimizer.	2023-09-08 10:20:18 +08:00
Tiewei Fang	103fa4eb55	[feature](Export) support export with nereids (#23319 )	2023-08-29 19:36:19 +08:00
Tiewei Fang	f32efe5758	[Fix](Outfile) Fix that no error is reported when exporting a table to S3 with an incorrect ak/sk/bucket (#23441 ) Problem: A result is returned even though a wrong ak/sk/bucket name is used, such as: ```sql mysql> select * from demo.student -> into outfile "s3://xxxx/exp_" -> format as csv -> properties( -> "s3.endpoint" = "https://cos.ap-beijing.myqcloud.com", -> "s3.region" = "ap-beijing", -> "s3.access_key"= "xxx", -> "s3.secret_key" = "yyyy" -> ); +------------+-----------+----------+----------------------------------------------------------------------------------------------------+ \| FileNumber \| TotalRows \| FileSize \| URL \| +------------+-----------+----------+----------------------------------------------------------------------------------------------------+ \| 1 \| 3 \| 26 \| s3://xxxx/exp_2ae166e2981d4c08-b577290f93aa82ba_ \| +------------+-----------+----------+----------------------------------------------------------------------------------------------------+ 1 row in set (0.15 sec) ``` The reason is that the error returned by the `close()` phase was not caught.	2023-08-26 00:19:30 +08:00
Tiewei Fang	18094511e7	[fix](Outfile/Nereids) fix that `csv_with_names` and `csv_with_names_and_types` file format could not be exported on nereids (#23387 ) This problem is caused by #21197. Fixed an issue where the `csv_with_names` and `csv_with_names_and_types` file formats could not be exported on the Nereids optimizer when using `select...into outfile`.	2023-08-25 11:12:04 +08:00
mch_ucchi	1d05feea1b	[Feature](Nereids) add executable function to support fold constant for functions (#18209 ) 1. Add date-time functions for fold constant for Nereids. The executable date-time functions Nereids supports so far are: - now() - now(int) - current_timestamp() - current_timestamp(int) - localtime() - localtimestamp() - curdate() - current_date() - curtime() - current_time() - date_{add/sub}(),{years/months/days/hours/minutes/seconds}_{add/sub}() - datediff() - {date/datev2}() - {year/quarter/month/day/hour/minute/second}() - dayof{year/month/week}() - date_format() - date_trunc() - from_days() - last_day() - to_monday() - from_unixtime() - unix_timestamp() - utc_timestamp() - to_date() - to_days() - str_to_date() - makedate() 2. Problems solved: - enable datev2/datetimev2 by default. - refactor Nereids foldConstantOnFE and support folding nested expressions. - split the executable functions into multiple files for easier reading and for adding new functions	2023-05-17 21:26:31 +08:00
Tiewei Fang	91cdb79d89	[Bugfix](Outfile) fix exporting data to parquet and orc file format (#19436 ) 1. Support exporting the `LARGEINT` data type to parquet/orc file format. 2. Export the Doris `DATE/DATETIME` type to the `Date/Timestamp` logical type of the parquet file format. 3. Fix incorrect data when DATE type data is exported to ORC.	2023-05-13 22:39:24 +08:00
Tiewei Fang	45d0f53529	[Regression-test](Export) add regression test for export #18897	2023-04-23 19:43:22 +08:00
Gabriel	c2fae109c3	[Improvement](outfile) Support output null in parquet writer (#12970 )	2022-09-29 13:36:30 +08:00
Gabriel	1f9eec5462	[Regression](datev2) Add test cases for datev2/datetimev2 (#11831 )	2022-08-19 10:57:55 +08:00
Yongqiang YANG	ff1971f916	[improvement](test) add dryRun option and group all cases into either p0 or p1 (#11576 ) 1. add dryRun option to list tests 2. group all cases into p0 p1 p2	2022-08-17 22:45:53 +08:00

36 Commits
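
For the `with_bom` property introduced in commit f65844fae4 (#30533), a consumer of the exported CSV may want to check whether a file actually starts with a BOM and read it either way. The sketch below is illustrative only: it assumes the BOM in question is the standard UTF-8 BOM (bytes EF BB BF), and the helper names `has_utf8_bom` and `read_csv_rows` are hypothetical, not part of Doris.

```python
import codecs
import csv
import io
from typing import List


def has_utf8_bom(data: bytes) -> bool:
    """Return True if the byte string starts with the UTF-8 BOM (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def read_csv_rows(data: bytes) -> List[List[str]]:
    """Parse CSV bytes into rows, tolerating an optional leading UTF-8 BOM.

    The "utf-8-sig" codec drops a leading BOM if one is present and
    otherwise behaves like plain UTF-8.
    """
    text = data.decode("utf-8-sig")
    return list(csv.reader(io.StringIO(text)))


# Made-up sample payloads standing in for exported files (assumption:
# comma as the column separator, as in the commit's usage example).
sample_with_bom = codecs.BOM_UTF8 + b"1,doris\n2,nereids\n"
sample_without_bom = b"1,doris\n2,nereids\n"

if __name__ == "__main__":
    print(has_utf8_bom(sample_with_bom))
    print(has_utf8_bom(sample_without_bom))
    print(read_csv_rows(sample_with_bom))
```

Decoding with `utf-8-sig` rather than stripping bytes by hand keeps the reader working for files exported with or without `"with_bom" = "true"`.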