Commit Graph

66 Commits

Author SHA1 Message Date
0b4552f74b [cherry-pick](branch-2.1) pick hive text write from master (#40537)
## Proposed changes
pick prs:
https://github.com/apache/doris/pull/38549
https://github.com/apache/doris/pull/40183
https://github.com/apache/doris/pull/40315

---------

Co-authored-by: Calvin Kirs <kirs@apache.org>
2024-09-27 20:57:07 +08:00
7bb9ca91c8 [branch-2.1](fix) adjust data download url about hive docker (#40846)
## Proposed changes

fix paimon regression test

Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
2024-09-14 23:19:54 +08:00
ca07a00c93 Revert "[branch-2.1](hive) support hive write text table (#38549) (#4… (#40157)
…0063)"

This reverts commit c6df7c21a3c09ae1664deabacb88dfcea9d94b68.

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-08-30 10:25:38 +08:00
c6df7c21a3 [branch-2.1](hive) support hive write text table (#38549) (#40063)
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd

pick from https://github.com/apache/doris/pull/38549
2024-08-29 16:50:40 +08:00
b9da934b16 [fix](hive) report error with escape char and null format (#39700) (#39869)
bp #39700

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-08-24 09:23:03 +08:00
3da2d1c9d6 [bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. (#38718) (#39192)
bp #38718
2024-08-11 20:37:40 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values ​​into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
pick pr: #38575  and fix this pr bug :  #38245
2024-08-05 09:13:08 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables to read the table after rename column in `Hive`.

These two session variables are referenced from
`parquet_use_column_names` and `orc_use_column_names` of `Trino` hive
connector.

By default, these two session variables are true. When they are set to
false, reading orc/parquet will access the columns according to the
ordinal position in the Hive table definition.

For example:
```mysql
in Hive :
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

in Doris :
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

You can use `set
parquet.column.index.access/orc.force.positional.evolution = true/false`
in hive 3 to control the results of reading the table like these two
session variables. However, for the rename struct inside column parquet
table, the effects of hive and doris are different.
2024-08-05 09:06:49 +08:00
f7068b5658 [cherry-pick](branch-2.1) Make doris read hive text table parameters and behavior consistent with hive (#37840)
## Proposed changes

pick from master https://github.com/apache/doris/pull/37638

<!--Describe your changes.-->
2024-07-16 22:24:50 +08:00
81360cf897 [opt](test) shorten the external p0 running time (#37320) (#37473)
bp #37320
2024-07-09 15:35:15 +08:00
55636e8035 [test](migrate) move 3 cases from p2 to p0 (#36957) (#37264)
bp #36957

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-07-04 20:09:59 +08:00
3613413a54 [fix](hive) support find serde info from both tbl properties and serde properties (#37043) (#37188)
bp #37043
2024-07-04 13:55:38 +08:00
bf3ea1839c [test]Mv external p2 test case to p0. (#37070) (#37140)
backport: https://github.com/apache/doris/pull/37070
2024-07-04 11:19:31 +08:00
a9f9113c48 [branch-2.1][test](external)move hive cases from p2 to p0 (#37149)
pk (#36855)
test_hive_same_db_table_name
test_hive_special_char_partition
test_complex_types
test_wide_table
2024-07-03 19:44:52 +08:00
e5695e058f [test](migrate) move 2 cases from p2 to p0 (#36935) (#37200)
bp #36935

Co-authored-by: zhangdong <493738387@qq.com>
2024-07-03 17:29:01 +08:00
e857680661 [Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. (#37175)
Backport #36989.
2024-07-03 11:08:49 +08:00
e7e1e967cf [test](migrate) move 2 cases from p2 to p0 for 2.1 (#37139)
pick #37004
2024-07-02 22:50:53 +08:00
74086189d3 [test](tvf) move p2 tvf tests from p2 to p0 (#36871) (#37150)
bp: #36871
2024-07-02 22:37:43 +08:00
cf86eb8647 [test](migrate) move test_hive_text_complex_type from p2 to p0 (#37007) (#37123)
bp: #37007
2024-07-02 17:36:37 +08:00
fcc26cc671 [test](migrate) move some cases from p2 to p0 (#36750)(#36787) (#36922)
bp #36750 and #36787
2024-06-27 20:59:50 +08:00
bc062a2595 [fix](orc)fix orc reader missing column. (#35735)
## Proposed changes
bp #35583 
Issue Number: close #xxx

<!--Describe your changes.-->
2024-05-31 22:51:44 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
The following sql and when the dictionary column contains functions related to null, the results will be incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
2024-05-27 15:25:29 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
acc2b532e7 [Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. (#34026) 2024-04-26 15:05:47 +08:00
1c025c0488 [docker](hive) add hive3 docker compose and modify scripts (#33115)
add hive3 docker compose from:
big-data-europe/docker-hive#56
2024-04-17 23:42:13 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
73de61ed84 [opt](hive) skip hidden file and dir (#32412)
When query hive table, we should skip all hidden dirs and files, like:
```
/visible/.hidden/path
/visible/.hidden.txt
```
2024-03-21 14:07:24 +08:00
926908ece2 [fix](hive) fix spelling mistakes for "separatorChar" #32061 2024-03-12 14:20:18 +08:00
248ea20901 Revert "[test](regression) add regression test for schange change of complex …" (#31660)
This reverts commit dcd2afdb4e857791fed66a46f28ab3adc25494e1.
Reverts #31207
2024-03-01 19:06:59 +08:00
e3b4b83bca [test](regression) add regression test for schange change of complex type (#31207)
Add regression test for #31128
2024-02-22 19:50:07 +08:00
92cad69fc4 [Fix](parquet-reader) Fix reading fixed length byte array decimal in parquet reader. (#30535) 2024-01-31 23:53:40 +08:00
658c869aac [improvement](mtmv)mtmv support partition by hms table (#29989) 2024-01-29 19:02:46 +08:00
7da86c37ec [fix](hive) add support for quoteChar and seperatorChar for hive (#28613)
add support for quoteChar and seperatorChar .
2023-12-19 19:35:03 +08:00
01c94a554d [fix](autoinc) Fix broker load when target table has autoinc column (#28402) 2023-12-14 18:02:54 +08:00
a271fee3c5 [test](statistics)Add external empty table test case. (#28267) 2023-12-13 21:48:01 +08:00
60bc3be8a2 [Opt](Compression) Opt zstd block decompression by ZSTD_decompressDCtx(). (#27534)
Opt zstd block decompression by `ZSTD_decompressDCtx()` to replace streaming decompression.
It will improve performance but consume more memory. 

Test result: 
- env: 1 node(16 cores, 64G).
- parquet column: 100 million rows of char(255) column.
- result: 5.2 -> 4.6.
2023-12-01 09:10:32 +08:00
e4149c6e4c [Fix](parquet-reader) Fix null map issue in parquet reader. (#27777)
Fix null map issue in parquet reader which cause result incorrect such as `min()`, `max()`.

In order to share null map between parquet converted src column and dst column to avoid copying. It is very tricky that will call mutable function `doris_nullable_column->get_null_map_column_ptr()` which will set `_need_update_has_null = true`. Because some operations such as agg will call `has_null()` to set `_need_update_has_null = false`.
2023-11-30 13:55:37 +08:00
cc395f5428 [Fix](hive-transactional-table) Fix NPE when query empty hive transactional table. (#27563) 2023-11-25 10:29:39 +08:00
3585c7e216 [test](parquet)append parquet reader byte_array_decimal and rle_bool case (#26751) 2023-11-14 15:05:10 +08:00
22bf2889e5 [feature](tvf)(jni-avro)jni-avro scanner add complex data types (#26236)
Support avro's enum, record, union data types
2023-11-09 13:58:49 +08:00
80f654ec2a [Fix](statistics)Fix analyze min max sql syntax error. #26240 2023-11-02 09:22:32 +08:00
78204f7c92 [Fix](statistics)Fix external couldn't analyze database bug (#26025) 2023-10-31 11:32:47 +08:00
501c6096dd Revert "[Test](multi-catalog) Add tpcds sf100 hive shape. (#25639)" (#26069)
This reverts commit 3beba1764c01b6712b108556433c96429c59cc45.
2023-10-29 12:45:32 +08:00
3beba1764c [Test](multi-catalog) Add tpcds sf100 hive shape. (#25639)
Add tpcds sf100 hive shapes.

Disable query64 temporarily because it is not same with emr cluster after collecting metadata by analyze table xxx.
And the root cause need to analyze, will enable in future PR.
2023-10-27 18:39:29 +08:00
c86fad7cbd [Fix](orc-reader) Fix orc decimal128 scale issue. (#25977) 2023-10-26 08:50:18 -05:00
e7a3cb079b [Enhance](regression)docker hive s3 file address is determined based on the configuration (#25905)
docker hive s3 file address is determined based on the configuration custom_settings.env
2023-10-26 11:58:33 +08:00
ce18f1148a [improvement](catalog)compatible with paimon 0.5 (#24985)
compatible with paimon 0.5
add p0 for paimon,need set enablePaimonTest=true
2023-10-17 22:07:13 +08:00
dc0c39f1d8 [Enhance](external)change hive docker to host network and add hive case (#24401)
1. Change the external hive docker network mode from the bridge mode to the host mode to support the external test of the multi-node doris cluster
2. Added more hive test data in various formats
3. Added a test case with hive
2023-09-15 17:46:24 +08:00
657e927d50 [fix](json)Fix the bug that read json file Out of bounds access (#23411) 2023-09-02 01:11:37 +08:00
4c00b1760b [feature](partial update) Support partial update for broker load (#22970) 2023-08-29 14:41:01 +08:00