Commit Graph

64 Commits

Author SHA1 Message Date
f7068b5658 [cherry-pick](branch-2.1) Make doris read hive text table parameters and behavior consistent with hive (#37840)
## Proposed changes

pick from master https://github.com/apache/doris/pull/37638

2024-07-16 22:24:50 +08:00
81360cf897 [opt](test) shorten the external p0 running time (#37320) (#37473)
bp #37320
2024-07-09 15:35:15 +08:00
55636e8035 [test](migrate) move 3 cases from p2 to p0 (#36957) (#37264)
bp #36957

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-07-04 20:09:59 +08:00
3613413a54 [fix](hive) support find serde info from both tbl properties and serde properties (#37043) (#37188)
bp #37043
2024-07-04 13:55:38 +08:00
bf3ea1839c [test]Mv external p2 test case to p0. (#37070) (#37140)
backport: https://github.com/apache/doris/pull/37070
2024-07-04 11:19:31 +08:00
a9f9113c48 [branch-2.1][test](external)move hive cases from p2 to p0 (#37149)
pk (#36855)
test_hive_same_db_table_name
test_hive_special_char_partition
test_complex_types
test_wide_table
2024-07-03 19:44:52 +08:00
e5695e058f [test](migrate) move 2 cases from p2 to p0 (#36935) (#37200)
bp #36935

Co-authored-by: zhangdong <493738387@qq.com>
2024-07-03 17:29:01 +08:00
e857680661 [Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. (#37175)
Backport #36989.
2024-07-03 11:08:49 +08:00
e7e1e967cf [test](migrate) move 2 cases from p2 to p0 for 2.1 (#37139)
pick #37004
2024-07-02 22:50:53 +08:00
74086189d3 [test](tvf) move p2 tvf tests from p2 to p0 (#36871) (#37150)
bp: #36871
2024-07-02 22:37:43 +08:00
cf86eb8647 [test](migrate) move test_hive_text_complex_type from p2 to p0 (#37007) (#37123)
bp: #37007
2024-07-02 17:36:37 +08:00
fcc26cc671 [test](migrate) move some cases from p2 to p0 (#36750)(#36787) (#36922)
bp #36750 and #36787
2024-06-27 20:59:50 +08:00
bc062a2595 [fix](orc)fix orc reader missing column. (#35735)
## Proposed changes
bp #35583 
2024-05-31 22:51:44 +08:00
7381cd56b0 [docker](hive) sync for hive initializing (#35479)
Add health checks for hive2 and hive3.
2024-05-29 15:03:06 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
For SQL statements like the following, where a dictionary-encoded column is wrapped in a null-related function, the results are incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
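
A minimal sketch of how this kind of bug can arise, assuming the reader decides which rows to keep by evaluating the predicate against the column's dictionary entries. All names and data below are hypothetical and are not taken from the Doris code. The point is only that NULL never appears as a dictionary entry, so a predicate that can be true for NULL input cannot be decided from the dictionary alone.

```python
# Hypothetical model of a dictionary-encoded string column.
# Each row stores an index into `dictionary`, or None for NULL.
dictionary = ["1-URGENT", "2-HIGH", "3-MEDIUM"]
rows = [0, None, 2, None, 1]


def ifnull(value, default):
    """Mimic SQL IFNULL(value, default)."""
    return default if value is None else value


def predicate(value):
    """Model of: IFNULL(o_orderpriority, 'null') = 'null'."""
    return ifnull(value, "null") == "null"


def filter_by_dict_only(rows, dictionary, pred):
    """Naive dict filtering: test the predicate on dictionary entries only."""
    matching_codes = {i for i, v in enumerate(dictionary) if pred(v)}
    return [r for r in rows if r is not None and r in matching_codes]


def filter_row_by_row(rows, dictionary, pred):
    """Reference behavior: decode every row (NULL included), then test."""
    decoded = [None if r is None else dictionary[r] for r in rows]
    return [v for v in decoded if pred(v)]


naive = filter_by_dict_only(rows, dictionary, predicate)
expected = filter_row_by_row(rows, dictionary, predicate)
```

In this sketch the dictionary-only path keeps no rows, while the row-by-row path keeps the two NULL rows.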
2024-05-27 15:25:29 +08:00
4ecc3edc21 [test](hive)revert hive container to host mode (#34322)
Revert the hive container to host mode to fix a pipeline problem.
2024-05-07 10:36:01 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
acc2b532e7 [Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. (#34026) 2024-04-26 15:05:47 +08:00
7f4b7b04ad [test](hive)add subnet for hive docker compose (#34000) (#34157)
bp #34000
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:49:33 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
1c025c0488 [docker](hive) add hive3 docker compose and modify scripts (#33115)
add hive3 docker compose from:
big-data-europe/docker-hive#56
2024-04-17 23:42:13 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
73de61ed84 [opt](hive) skip hidden file and dir (#32412)
When querying a hive table, all hidden directories and files should be skipped, for example:
```
/visible/.hidden/path
/visible/.hidden.txt
```
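
The rule can be sketched as follows, assuming "hidden" means that some path component starts with a dot, as in the two examples above. The function names are hypothetical and the actual check in Doris may cover more cases.

```python
from pathlib import PurePosixPath


def is_hidden(path: str) -> bool:
    """Return True if any component of `path` starts with a dot."""
    return any(part.startswith(".") for part in PurePosixPath(path).parts)


def visible_files(paths):
    """Keep only paths with no hidden directory or file component."""
    return [p for p in paths if not is_hidden(p)]


candidates = [
    "/visible/.hidden/path",
    "/visible/.hidden.txt",
    "/visible/data/part-0000",
]
kept = visible_files(candidates)
```

Checking every component, not just the file name, is what makes a file under a hidden directory get skipped as well.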
2024-03-21 14:07:24 +08:00
926908ece2 [fix](hive) fix spelling mistakes for "separatorChar" #32061 2024-03-12 14:20:18 +08:00
248ea20901 Revert "[test](regression) add regression test for schema change of complex …" (#31660)
This reverts commit dcd2afdb4e857791fed66a46f28ab3adc25494e1.
Reverts #31207
2024-03-01 19:06:59 +08:00
e3b4b83bca [test](regression) add regression test for schema change of complex type (#31207)
Add regression test for #31128
2024-02-22 19:50:07 +08:00
92cad69fc4 [Fix](parquet-reader) Fix reading fixed length byte array decimal in parquet reader. (#30535) 2024-01-31 23:53:40 +08:00
658c869aac [improvement](mtmv)mtmv support partition by hms table (#29989) 2024-01-29 19:02:46 +08:00
7da86c37ec [fix](hive) add support for quoteChar and seperatorChar for hive (#28613)
Add support for quoteChar and seperatorChar.
2023-12-19 19:35:03 +08:00
01c94a554d [fix](autoinc) Fix broker load when target table has autoinc column (#28402) 2023-12-14 18:02:54 +08:00
a271fee3c5 [test](statistics)Add external empty table test case. (#28267) 2023-12-13 21:48:01 +08:00
60bc3be8a2 [Opt](Compression) Opt zstd block decompression by ZSTD_decompressDCtx(). (#27534)
Optimize zstd block decompression by using `ZSTD_decompressDCtx()` instead of streaming decompression.
This improves performance but consumes more memory.

Test result: 
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
2023-12-01 09:10:32 +08:00
e4149c6e4c [Fix](parquet-reader) Fix null map issue in parquet reader. (#27777)
Fix a null map issue in the parquet reader that causes incorrect results for functions such as `min()` and `max()`.

The null map is shared between the parquet-converted src column and the dst column to avoid copying. This is tricky: sharing it calls the mutable function `doris_nullable_column->get_null_map_column_ptr()`, which sets `_need_update_has_null = true`, while some operations such as agg call `has_null()`, which sets `_need_update_has_null = false`.
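
One way to picture the interaction between the two calls is the sketch below. It is a simplified, hypothetical model: the class and method names are invented for illustration and this is not the Doris implementation. It assumes only what the description states, namely that mutable access marks the cached flag as dirty and that `has_null()` clears the dirty mark.

```python
class NullableColumn:
    """Hypothetical stand-in for a nullable column with a cached has_null flag."""

    def __init__(self, null_map):
        self._null_map = list(null_map)  # 1 means NULL, 0 means not NULL
        self._has_null = False
        self._need_update_has_null = True

    def get_null_map_mut(self):
        """Mutable access: the caller may write, so mark the cache dirty."""
        self._need_update_has_null = True
        return self._null_map

    def has_null(self):
        """Recompute the cached flag only when it is marked dirty."""
        if self._need_update_has_null:
            self._has_null = any(self._null_map)
            self._need_update_has_null = False
        return self._has_null


dst = NullableColumn([0, 0, 0])
shared = dst.get_null_map_mut()  # reference kept to avoid copying
first = dst.has_null()           # caches False and clears the dirty mark
shared[1] = 1                    # later write through the shared reference
stale = dst.has_null()           # still returns the cached False
shared = dst.get_null_map_mut()  # mutable access marks the cache dirty again
fresh = dst.has_null()           # recomputed from the null map
```

In this model, a write through the shared reference after `has_null()` has run is not seen until the cache is marked dirty again.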
2023-11-30 13:55:37 +08:00
cc395f5428 [Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27563) 2023-11-25 10:29:39 +08:00
3585c7e216 [test](parquet)append parquet reader byte_array_decimal and rle_bool case (#26751) 2023-11-14 15:05:10 +08:00
22bf2889e5 [feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236)
Support avro's enum, record, union data types
2023-11-09 13:58:49 +08:00
80f654ec2a [Fix](statistics)Fix analyze min max sql syntax error. #26240 2023-11-02 09:22:32 +08:00
78204f7c92 [Fix](statistics)Fix external couldn't analyze database bug (#26025) 2023-10-31 11:32:47 +08:00
501c6096dd Revert "[Test](multi-catalog) Add tpcds sf100 hive shape. (#25639)" (#26069)
This reverts commit 3beba1764c01b6712b108556433c96429c59cc45.
2023-10-29 12:45:32 +08:00
3beba1764c [Test](multi-catalog) Add tpcds sf100 hive shape. (#25639)
Add tpcds sf100 hive shapes.

Disable query64 temporarily because it does not match the emr cluster after collecting metadata with analyze table xxx.
The root cause still needs to be analyzed; the query will be enabled in a future PR.
2023-10-27 18:39:29 +08:00
c86fad7cbd [Fix](orc-reader) Fix orc decimal128 scale issue. (#25977) 2023-10-26 08:50:18 -05:00
e7a3cb079b [Enhance](regression)docker hive s3 file address is determined based on the configuration (#25905)
The docker hive s3 file address is determined by the configuration in custom_settings.env.
2023-10-26 11:58:33 +08:00
ce18f1148a [improvement](catalog)compatible with paimon 0.5 (#24985)
Make the catalog compatible with paimon 0.5.
Add p0 cases for paimon; running them requires setting enablePaimonTest=true.
2023-10-17 22:07:13 +08:00
dc0c39f1d8 [Enhance](external)change hive docker to host network and add hive case (#24401)
1. Change the external hive docker network mode from the bridge mode to the host mode to support the external test of the multi-node doris cluster
2. Added more hive test data in various formats
3. Added a test case with hive
2023-09-15 17:46:24 +08:00
657e927d50 [fix](json)Fix the bug that read json file Out of bounds access (#23411) 2023-09-02 01:11:37 +08:00
4c00b1760b [feature](partial update) Support partial update for broker load (#22970) 2023-08-29 14:41:01 +08:00
23094a01d4 [fix](test) load data inpath will remove the data in hdfs (#22908)
Loading data from hdfs in hive moves the source directory into the table's location directory, which leads to errors such as `Can not get first file, please check uri` in the tvf test.
2023-08-12 15:12:00 +08:00
124516c1ea [Fix](orc-reader) Fix Wrong data type for column error when column order in hive table is not same in orc file schema. (#21306)
A `Wrong data type for column` error occurs when the column order in the hive table differs from the column order in the orc file schema.

The root cause lies in the logic that handles the following case:

Tables in orc format written by Hive 1.x may have system column names such as `_col0`, `_col1`, `_col2`... in the underlying orc file schema, which must be mapped to the column names of the hive table.

### Solution
This issue is currently fixed by handling that case when the hive version is specified as 1.x.x in the hive catalog configuration:

```sql
CREATE CATALOG hive PROPERTIES (
    'hive.version' = '1.x.x'
);
```
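
A rough sketch of the two mapping strategies, assuming that `_colN` refers to the N-th column of the hive table and that files with real column names are matched by name regardless of order. The function and its version check are hypothetical illustrations, not the logic used in Doris.

```python
def map_orc_columns(orc_names, table_names, hive_version):
    """Return a dict from orc file column name to hive table column name."""

    def is_system_name(name):
        return name.startswith("_col") and name[len("_col"):].isdigit()

    if hive_version.startswith("1.") and all(is_system_name(n) for n in orc_names):
        # Positional mapping: `_colN` is taken to be the N-th table column.
        return {n: table_names[int(n[len("_col"):])] for n in orc_names}
    # Otherwise match by name, so column order in the file does not matter.
    return {n: n for n in orc_names if n in table_names}


hive1_mapping = map_orc_columns(["_col0", "_col1"], ["id", "name"], "1.2.1")
reordered_mapping = map_orc_columns(["name", "id"], ["id", "name"], "3.1.2")
```

Applying positional mapping to a file whose columns are named but ordered differently would pair columns of different types.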
2023-07-03 09:32:55 +08:00
722839e118 [Fix](multi-catalog) Fix hive transaction table regression test by adding hive-docker missing configurations. (#20832)
Fix the hive transactional table regression test test_transactional_hive by adding the hive-docker configurations that were missing from #20679. Hive needs these configurations to be set in order to run compaction.
2023-06-16 13:08:24 +08:00
73ad885e19 [Feature][Fix](multi-catalog) Implements transactional hive full acid tables. (#20679)
Following support for insert-only transactional hive tables in #19518 and #19419, this PR adds support for transactional hive full acid tables.

Support hive3 transactional hive full acid tables.
Hive2 transactional hive full acid tables need to run major compactions.
2023-06-13 08:55:16 +08:00