Optimize zstd decompression by using block decompression via `ZSTD_decompressDCtx()` instead of streaming decompression.
This improves performance at the cost of higher memory consumption.
Test result:
- env: 1 node (16 cores, 64 GB).
- parquet column: 100 million rows of a char(255) column.
- result: 5.2 -> 4.6.
Fix a null map issue in the parquet reader that caused incorrect results for functions such as `min()` and `max()`.
The null map is shared between the converted parquet source column and the destination column to avoid copying. The tricky part is that this calls the mutable function `doris_nullable_column->get_null_map_column_ptr()`, which sets `_need_update_has_null = true`; this matters because some operations such as aggregation call `has_null()`, which sets `_need_update_has_null = false`.
Add TPC-DS sf100 hive shapes.
Temporarily disable query64 because it is not the same as on the EMR cluster after collecting metadata with `analyze table xxx`.
The root cause still needs to be analyzed; query64 will be re-enabled in a future PR.
1. Change the external hive docker network mode from bridge to host to support external tests against a multi-node Doris cluster.
2. Add more hive test data in various formats.
3. Add a test case for hive.
Loading data from HDFS in hive moves the source directory into the table's location directory, which leads to errors like `Can not get first file, please check uri` in the tvf test.
Fix the `Wrong data type for column` error that occurs when the column order in the hive table is not the same as in the orc file schema.
The root cause is the handling of the following case:
an orc-format table from Hive 1.x may have system-generated column names such as `_col0`, `_col1`, `_col2`... in the underlying orc file schema, which requires mapping them to the column names in the hive table.
### Solution
Currently, this issue is fixed by only handling the above case when the hive version is specified as 1.x.x in the hive catalog configuration:
```sql
CREATE CATALOG hive PROPERTIES (
    'hive.version' = '1.x.x'
);
```
Following the support for insert-only transactional hive tables in #19518 and #19419, this PR supports transactional hive full acid tables.
Hive3 transactional full acid tables are supported.
Hive2 transactional full acid tables need major compactions to be run first; see the Hive-side sketch below.
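For reference, a major compaction can be triggered on the Hive side with a statement like the following sketch (the table and partition names are hypothetical):

```sql
-- Run in Hive (not Doris): trigger a major compaction on a Hive2 full acid table.
ALTER TABLE acid_tbl COMPACT 'major';

-- For a partitioned table, compact the relevant partition, e.g.:
ALTER TABLE acid_tbl PARTITION (dt = '2023-01-01') COMPACT 'major';
```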
We can support this by adding a new property for tvf, like:
`select * from hdfs("uri" = "xxx", ..., "compress_type" = "lz4", ...)`
The user can either:
- specify the compression explicitly by setting `"compress_type" = "xxx"`, or
- let Doris infer the compression type from the file name suffix (e.g. `file1.gz`).
Currently, compressed files are only supported in `csv` format, and the BE side already supports reading them.
All that is needed is to analyze `"compress_type"` on the FE side and pass it to the BE; a usage sketch follows.
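A minimal usage sketch, assuming a csv file on hdfs (the uri, `fs.defaultFS`, and file names are hypothetical placeholders):

```sql
-- Explicitly specify the compression type:
SELECT * FROM hdfs(
    "uri" = "hdfs://nameservice/path/data.csv.lz4",
    "fs.defaultFS" = "hdfs://nameservice",
    "format" = "csv",
    "compress_type" = "lz4"
);

-- Or let Doris infer the compression type from the file suffix:
SELECT * FROM hdfs(
    "uri" = "hdfs://nameservice/path/file1.gz",
    "fs.defaultFS" = "hdfs://nameservice",
    "format" = "csv"
);
```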
Fix some hive partition issues.
1. Fix a BE crash when the hive partition field is of `date`, `timestamp`, or `decimal` type.
2. Fix an hdfs uri decode error when using a `timestamp` partition field, whose value contains url-encoded special characters, e.g. `%3A` encodes `:` (see the example below).
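For illustration, a hypothetical query on a hive table partitioned by a `timestamp` column, whose partition path contains url-encoded characters:

```sql
-- The partition directory on hdfs looks like: .../ts_part=2023-01-01 12%3A00%3A00/
-- (%3A is the url-encoded form of ':')
SELECT * FROM hive_tbl WHERE ts_part = '2023-01-01 12:00:00';
```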
When creating an external catalog, Doris automatically syncs the table schemas from the external catalog.
But some column types, such as struct and map, are not yet supported by Doris.
Previously, when meeting such an unsupported column, Doris threw an exception and the corresponding
table could not be synced. But the user may just want to query the other, supported columns.
This PR adds a new column type: UNSUPPORTED. For now it is only used for external table schema sync.
When an unsupported column is met, it is synced as a column with UNSUPPORTED type.
When querying such a table, there are several situations (illustrated below):
- `select * from table`: throws the error `Unsupported type 'UNSUPPORTED_TYPE' xxx`.
- `select k1 from table`: k1 is of a supported type, the query is OK.
- `select * except(k2)`: k2 is of an unsupported type, the query is OK.
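A hypothetical example, where `ext_tbl` has a supported column `k1` and a map column `k2` that is synced as UNSUPPORTED:

```sql
SELECT * FROM ext_tbl;             -- error: Unsupported type 'UNSUPPORTED_TYPE' ...
SELECT k1 FROM ext_tbl;            -- OK, only supported columns are read
SELECT * EXCEPT(k2) FROM ext_tbl;  -- OK, the unsupported column is excluded
```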
Add a tpch 1g orc test case in the hive docker environment.
Refactor the suites directory of some catalog test cases.
Also add "-internal" to the dlf endpoint, to support accessing oss within an aliyun vpc.
Fix a BE crash when querying a partitioned hive table in text format
with a partition column put first in the select items (a query sketch is shown after the fix notes).
1. FE should use file slots to set the column mapping index of the csv file.
2. BE should use the block's `get_by_name` to get the right column in the csv reader.
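A sketch of the query shape that used to trigger the crash (table and column names are hypothetical):

```sql
-- hive_text_tbl is a text-format hive table partitioned by part_col;
-- putting the partition column first in the select items used to crash BE.
SELECT part_col, c1, c2 FROM hive_text_tbl;
```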
Add a regression test for external hive orc tables. This PR generates all basic types supported by hive orc and creates a hive external table to cover them in the docker environment.
Functions to be tested:
1. Ensure that all types are parsed correctly
2. Ensure that the null maps of all types are parsed correctly
3. Ensure that the `SearchArgument` of `OrcReader` works well
4. Only select partition columns
Issue Number: close #12574
This PR adds `NewJsonReader`, which implements the `GenericReader` interface to support reading json format files.
TODO:
1. Modify `_scann_eof` later.
2. Rename `NewJsonReader` to `JsonReader` when `JsonReader` is deleted.
1. Modify the default behavior of `build.sh`
`BUILD_JAVA_UDF` now defaults to ON, so a JVM is required for both compilation and runtime.
2. Add docker-compose for MySQL 5.7, PostgreSQL 14 and Hive 2
See `docker/thirdparties/docker-compose`.
3. Add some regression test cases for jdbc query on MySQL, PG and Hive Catalog
The default is `false`; if set to `true`, you need to first start the docker containers for MySQL/PG/Hive.
4. Support `if not exists` and `if exists` for create/drop resource and create/drop encryptkey (see the sketch below).
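A minimal sketch of the new syntax; the resource and key names are hypothetical, and a real jdbc resource needs more properties than shown:

```sql
CREATE RESOURCE IF NOT EXISTS "jdbc_res" PROPERTIES (
    "type" = "jdbc"
    -- other required jdbc properties omitted
);
DROP RESOURCE IF EXISTS "jdbc_res";

CREATE ENCRYPTKEY IF NOT EXISTS my_key AS "ABCD123456789";
DROP ENCRYPTKEY IF EXISTS my_key;
```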