doris

Author	SHA1	Message	Date
Mingyu Chen	55636e8035	[test](migrate) move 3 cases from p2 to p0 (#36957 ) (#37264 ) bp #36957 Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>	2024-07-04 20:09:59 +08:00
Mingyu Chen	3613413a54	[fix](hive) support finding serde info from both table properties and serde properties (#37043 ) (#37188 ) bp #37043	2024-07-04 13:55:38 +08:00
Jibing-Li	bf3ea1839c	[test] Move external p2 test cases to p0. (#37070 ) (#37140 ) backport: https://github.com/apache/doris/pull/37070	2024-07-04 11:19:31 +08:00
zy-kkk	a9f9113c48	[branch-2.1][test](external) move hive cases from p2 to p0 (#37149 ) pick (#36855) cases: test_hive_same_db_table_name, test_hive_special_char_partition, test_complex_types, test_wide_table	2024-07-03 19:44:52 +08:00
Mingyu Chen	e5695e058f	[test](migrate) move 2 cases from p2 to p0 (#36935 ) (#37200 ) bp #36935 Co-authored-by: zhangdong <493738387@qq.com>	2024-07-03 17:29:01 +08:00
Qi Chen	e857680661	[Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. (#37175 ) Backport #36989.	2024-07-03 11:08:49 +08:00
wuwenchi	e7e1e967cf	[test](migrate) move 2 cases from p2 to p0 for 2.1 (#37139 ) pick #37004	2024-07-02 22:50:53 +08:00
Tiewei Fang	74086189d3	[test](tvf) move p2 tvf tests from p2 to p0 (#36871 ) (#37150 ) bp: #36871	2024-07-02 22:37:43 +08:00
Ashin Gau	cf86eb8647	[test](migrate) move test_hive_text_complex_type from p2 to p0 (#37007 ) (#37123 ) bp: #37007	2024-07-02 17:36:37 +08:00
Mingyu Chen	fcc26cc671	[test](migrate) move some cases from p2 to p0 (#36750 )(#36787 ) (#36922 ) bp #36750 and #36787	2024-06-27 20:59:50 +08:00
daidai	bc062a2595	[fix](orc) fix orc reader missing column. (#35735 ) bp #35583	2024-05-31 22:51:44 +08:00
Qi Chen	68eda58a8c	[Fix](multi-catalog) Fix string dict filtering when using null-related functions in parquet and orc reader. (#35335 ) When a dictionary-encoded column is wrapped in null-related functions, as in the following SQL, the results are incorrect. ``` select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null'; ``` ``` select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'; ``` ``` select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'; ```	2024-05-27 15:25:29 +08:00
Qi Chen	99af54f779	[Fix](orc-reader) Fix the issue when a string column has mixed plain and dict encoding in different stripes. (#34146 ) (#34248 ) backport #34146	2024-04-28 19:43:57 +08:00
Qi Chen	acc2b532e7	[Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve a special-characters issue with git on Windows. (#34026 )	2024-04-26 15:05:47 +08:00
苏小刚	1c025c0488	[docker](hive) add hive3 docker compose and modify scripts (#33115 ) add hive3 docker compose from: big-data-europe/docker-hive#56	2024-04-17 23:42:13 +08:00
Qi Chen	4963d60a07	[Fix](multi-catalog) Fix the writer not being initialized due to refactoring, and add a hive writing regression test. (#32721 ) (#33446 ) backport #32721.	2024-04-10 11:42:22 +08:00
Mingyu Chen	73de61ed84	[opt](hive) skip hidden file and dir (#32412 ) When querying a hive table, we should skip all hidden dirs and files, such as: ``` /visible/.hidden/path /visible/.hidden.txt ```	2024-03-21 14:07:24 +08:00
wuwenchi	926908ece2	[fix](hive) fix spelling mistakes for "separatorChar" #32061	2024-03-12 14:20:18 +08:00
Ashin Gau	248ea20901	Revert "[test](regression) add regression test for schema change of complex …" (#31660 ) This reverts commit dcd2afdb4e857791fed66a46f28ab3adc25494e1. Reverts #31207	2024-03-01 19:06:59 +08:00
Ashin Gau	e3b4b83bca	[test](regression) add regression test for schema change of complex type (#31207 ) Add regression test for #31128	2024-02-22 19:50:07 +08:00
Qi Chen	92cad69fc4	[Fix](parquet-reader) Fix reading fixed length byte array decimal in parquet reader. (#30535 )	2024-01-31 23:53:40 +08:00
zhangdong	658c869aac	[improvement](mtmv) mtmv supports partitioning by hms table (#29989 )	2024-01-29 19:02:46 +08:00
wuwenchi	7da86c37ec	[fix](hive) add support for `quoteChar` and `seperatorChar` for hive (#28613 ) add support for quoteChar and seperatorChar .	2023-12-19 19:35:03 +08:00
bobhan1	01c94a554d	[fix](autoinc) Fix broker load when target table has autoinc column (#28402 )	2023-12-14 18:02:54 +08:00
Jibing-Li	a271fee3c5	[test](statistics)Add external empty table test case. (#28267 )	2023-12-13 21:48:01 +08:00
Qi Chen	60bc3be8a2	[Opt](Compression) Opt zstd block decompression by `ZSTD_decompressDCtx()`. (#27534 ) Use `ZSTD_decompressDCtx()` for zstd block decompression instead of streaming decompression. This improves performance but consumes more memory. Test result: - env: 1 node (16 cores, 64G). - parquet column: 100 million rows of a char(255) column. - result: 5.2 -> 4.6.	2023-12-01 09:10:32 +08:00
Qi Chen	e4149c6e4c	[Fix](parquet-reader) Fix null map issue in parquet reader. (#27777 ) Fix a null map issue in the parquet reader that caused incorrect results for functions such as `min()` and `max()`. To avoid copying, the null map is shared between the parquet-converted src column and the dst column. This is very tricky: it calls the mutable function `doris_nullable_column->get_null_map_column_ptr()`, which sets `_need_update_has_null = true`. Some operations, such as agg, call `has_null()`, which sets `_need_update_has_null = false`.	2023-11-30 13:55:37 +08:00
Qi Chen	cc395f5428	[Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27563 )	2023-11-25 10:29:39 +08:00
daidai	3585c7e216	[test](parquet) add parquet reader byte_array_decimal and rle_bool cases (#26751 )	2023-11-14 15:05:10 +08:00
wudongliang	22bf2889e5	[feature](tvf)(jni-avro) jni-avro scanner adds complex data types (#26236 ) Support avro's enum, record, and union data types	2023-11-09 13:58:49 +08:00
Jibing-Li	80f654ec2a	[Fix](statistics)Fix analyze min max sql syntax error. #26240	2023-11-02 09:22:32 +08:00
Jibing-Li	78204f7c92	[Fix](statistics) Fix bug where external databases couldn't be analyzed (#26025 )	2023-10-31 11:32:47 +08:00
zy-kkk	501c6096dd	Revert "[Test](multi-catalog) Add tpcds sf100 hive shape. (#25639 )" (#26069 ) This reverts commit 3beba1764c01b6712b108556433c96429c59cc45.	2023-10-29 12:45:32 +08:00
Qi Chen	3beba1764c	[Test](multi-catalog) Add tpcds sf100 hive shape. (#25639 ) Add tpcds sf100 hive shapes. Temporarily disable query64 because its shape differs from the emr cluster's after collecting metadata by analyze table xxx. The root cause still needs analysis; it will be enabled in a future PR.	2023-10-27 18:39:29 +08:00
Qi Chen	c86fad7cbd	[Fix](orc-reader) Fix orc decimal128 scale issue. (#25977 )	2023-10-26 08:50:18 -05:00
zhangguoqiang	e7a3cb079b	[Enhance](regression) docker hive s3 file address is determined based on the configuration (#25905 ) The docker hive s3 file address is determined based on the configuration in custom_settings.env	2023-10-26 11:58:33 +08:00
zhangdong	ce18f1148a	[improvement](catalog) compatible with paimon 0.5 (#24985 ) Compatible with paimon 0.5. Add p0 for paimon; need to set enablePaimonTest=true	2023-10-17 22:07:13 +08:00
zhangguoqiang	dc0c39f1d8	[Enhance](external)change hive docker to host network and add hive case (#24401 ) 1. Change the external hive docker network mode from the bridge mode to the host mode to support the external test of the multi-node doris cluster 2. Added more hive test data in various formats 3. Added a test case with hive	2023-09-15 17:46:24 +08:00
daidai	657e927d50	[fix](json) Fix out-of-bounds access when reading json files (#23411 )	2023-09-02 01:11:37 +08:00
bobhan1	4c00b1760b	[feature](partial update) Support partial update for broker load (#22970 )	2023-08-29 14:41:01 +08:00
Ashin Gau	23094a01d4	[fix](test) load data inpath will remove the data in hdfs (#22908 ) Loading data from hdfs in hive moves the source directory into the table's location directory, leading to errors like "Can not get first file, please check uri" in the tvf test.	2023-08-12 15:12:00 +08:00
Qi Chen	124516c1ea	[Fix](orc-reader) Fix `Wrong data type for column` error when column order in hive table differs from the orc file schema. (#21306 ) The `Wrong data type for column` error occurs when the column order in the hive table differs from the orc file schema. The root cause is the logic that handles the following case: orc-format tables from Hive 1.x may have system column names such as `_col0`, `_col1`, `_col2`... in the underlying orc file schema, which must be mapped using the column names in the hive table. ### Solution This case is now handled only when the hive version is set to 1.x.x in the hive catalog configuration. ```sql CREATE CATALOG hive PROPERTIES ( 'hive.version' = '1.x.x' ); ```	2023-07-03 09:32:55 +08:00
Qi Chen	73ad885e19	[Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679 ) Following support for insert-only transactional hive tables (#19518, #19419), this PR supports transactional hive full acid tables. Hive3 transactional full acid tables are supported; Hive2 transactional full acid tables need to run major compactions first.	2023-06-13 08:55:16 +08:00
Qi Chen	4faee4d8fd	[Fix](multi-catalog) Fix BE crash when querying a hive table after a schema change (new column added). (#20537 ) Fix BE crash when querying a hive table after a schema change (new column added). Regression Test: test_hive_schema_evolution.groovy	2023-06-08 18:10:36 +08:00
Weijie Guo	9535ed01aa	[feature](tvf) Support compress file for tvf hdfs() and s3() (#19530 ) Support this by adding a new property for tvf, like: `select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)` Users can either specify the compression explicitly by setting `"compress_type" = "xxx"`, or let Doris infer the compression type from the file name suffix (e.g. `file1.gz`). Currently, only reading compressed files in `csv` format is supported. The BE side already supports this, so all that is needed is to analyze `"compress_type"` on the FE side and pass it to the BE.	2023-05-16 08:50:43 +08:00
Qi Chen	4418eb36a3	[Fix](multi-catalog) Fix some hive partition issues. (#19513 ) Fix some hive partition issues. 1. Fix BE crash when using hive partition fields of `date`, `timestamp`, or `decimal` type. 2. Fix hdfs uri decode error with `timestamp` partition fields, whose special characters are url-encoded (e.g. `:` is encoded as `%3A`).	2023-05-11 07:49:46 +08:00
zhangguoqiang	32ccf0c68d	[test](case)add external hive parquet case 0328 #18169 add case about external hive parquet	2023-03-29 09:13:03 +08:00
Mingyu Chen	500c7fb702	[improvement](multi-catalog) support unsupported column type (#15660 ) When creating an external catalog, Doris automatically syncs table schemas from the external catalog. But some column types, such as struct and map, are not yet supported by Doris. Previously, Doris threw an exception when it met these unsupported columns, and the corresponding table could not be synced, even though users may just want to query the other, supported columns. This PR adds a new column type, UNSUPPORTED, currently used only for external table schema sync: unsupported columns are synced as columns of UNSUPPORTED type. When querying such a table, there are several situations: select * from table: throws error Unsupported type 'UNSUPPORTED_TYPE' xxx; select k1 from table (k1 has a supported type): query OK; select * except(k2) from table (k2 has an unsupported type): query OK.	2023-01-08 10:07:10 +08:00
Mingyu Chen	dd7ec8f4ca	[improvement](test) add tpch1 orc for hive catalog and refactor some test dir (#14669 ) Add tpch 1g orc test case in hive docker. Refactor some suite dirs of catalog test cases. Add "-internal" to the dlf endpoint to support accessing oss within aliyun vpc.	2022-11-30 10:03:58 +08:00
Mingyu Chen	064b8d2aa6	[fix](multi-catalog) fix coredump when querying partitioned hive table with text format (#14604 ) BE crashes when querying a partitioned hive table in text format with a partition column first in the select list. 1. FE should use file slots to set the column mapping index of the csv file. 2. BE should use the block's `get_by_name` to get the right column in the csv reader.	2022-11-26 11:42:40 +08:00


56 Commits