256 Commits

Author SHA1 Message Date
3da2d1c9d6 [bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. (#38718) (#39192)
bp #38718
2024-08-11 20:37:40 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
Pick PR #38575 and fix the bug in PR #38245.
2024-08-05 09:13:08 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables so that a table can still be read after a column has
been renamed in `Hive`.

These two session variables are modeled on `parquet_use_column_names`
and `orc_use_column_names` in the `Trino` hive connector.

Both session variables default to true, which matches columns by name.
When they are set to false, ORC/Parquet columns are read according to
their ordinal position in the Hive table definition.

For example:
```mysql
-- In Hive:
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

-- In Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

In Hive 3, `set parquet.column.index.access = true/false` and
`set orc.force.positional.evolution = true/false` control how the table
is read in a similar way to these two session variables. However, for a
Parquet table in which a field inside a struct column has been renamed,
Hive and Doris behave differently.
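
The difference between the two modes can be illustrated with a minimal Python sketch. It is only a model of name-based versus position-based column resolution, not the Doris reader itself; `read_row` and the in-memory "files" are hypothetical and exist purely to reproduce the `tmp` example above.

```python
from typing import Dict, List, Optional


def read_row(file_columns: List[str], file_values: List[object],
             table_columns: List[str],
             use_column_names: bool) -> Dict[str, Optional[object]]:
    """Map one stored row onto the current table schema (hypothetical helper)."""
    row: Dict[str, Optional[object]] = {}
    for position, column in enumerate(table_columns):
        if use_column_names:
            # Name-based: a renamed column is missing from older files, so it reads as NULL.
            if column in file_columns:
                row[column] = file_values[file_columns.index(column)]
            else:
                row[column] = None
        else:
            # Position-based: take whatever is stored at the same ordinal position.
            row[column] = file_values[position] if position < len(file_values) else None
    return row


table = ["new_a", "b"]                  # table schema after the rename
old_file = (["a", "b"], [1, "2"])       # written before the rename
new_file = (["new_a", "b"], [2, "4"])   # written after the rename

by_name = [read_row(c, v, table, True) for c, v in (old_file, new_file)]
by_position = [read_row(c, v, table, False) for c, v in (old_file, new_file)]
```

With `use_column_names=True` the first row has no value for `new_a`, as in the first query result; with `False` both rows are filled by position, as in the second.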
2024-08-05 09:06:49 +08:00
c0caca7c55 [fix](ES Catalog)Fix unstable test test_es_query (#38801) (#38802)
## Proposed changes

bp #38801
2024-08-03 23:49:00 +08:00
b0943064e0 [fix](kerberos)fix and refactor ugi login for kerberos and simple authentication (#38607)
pick from  (#37301)
2024-08-01 14:01:32 +08:00
41fa7bc9fd [bugfix](paimon)Fixed the reading of timestamp with time zone type data for 2.1 (#37716) (#38592)
bp: #37716
2024-08-01 10:23:06 +08:00
ef8a1918c3 [case][fix](iceberg)move rest cases from p2 to p0 and fix iceberg version issue for 2.1 (#37898) (#38589)
bp: #37898
2024-07-31 22:41:56 +08:00
86dd2d24ce [fix](test) Modify SQLServer image to custom hub (#38515) (#38613)
pick from master #38515

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-07-31 19:21:28 +08:00
c011060e4f [chore](ci) adjust thirdparty docker image source for easy management… (#38558)
… (#37307)

pick from master #37307

Co-authored-by: stephen <hello-stephen@qq.com>
2024-07-31 14:47:16 +08:00
f7068b5658 [cherry-pick](branch-2.1) Make doris read hive text table parameters and behavior consistent with hive (#37840)
## Proposed changes

pick from master https://github.com/apache/doris/pull/37638

2024-07-16 22:24:50 +08:00
bdf3e3a17e [test](docker) change the default region for docker compose (#37768) (#37813)
bp #37768
2024-07-15 22:18:33 +08:00
e5339a4014 [feature](ES Catalog)Support control scroll level by config #37180 (#37290)
## Proposed changes

backport #37180
2024-07-15 16:41:38 +08:00
ea12114549 [fix](dockerfile) Switch repos to point to vault.centos.org because CentOS 7 is EOL (#37568) (#37763)
bp #37568
2024-07-15 15:57:56 +08:00
16de141743 [regression](kerberos)add hive kerberos docker regression env (#37657)
## Proposed changes
pick:
[regression](kerberos)fix regression pipeline env when write hosts (#37057)
[regression](kerberos)add hive kerberos docker regression env (#36430)
2024-07-15 09:35:39 +08:00
56a207c3f0 [case](paimon/iceberg)move cases from p2 to p0 (#37276) (#37738)
bp #37276

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-07-13 10:01:05 +08:00
81360cf897 [opt](test) shorten the external p0 running time (#37320) (#37473)
bp #37320
2024-07-09 15:35:15 +08:00
f8cee439b6 [feature](ES Catalog) map nested/object type in ES to JSON type in Doris (#37101) (#37182)
backport #37101
2024-07-05 10:48:32 +08:00
55636e8035 [test](migrate) move 3 cases from p2 to p0 (#36957) (#37264)
bp #36957

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-07-04 20:09:59 +08:00
3613413a54 [fix](hive) support find serde info from both tbl properties and serde properties (#37043) (#37188)
bp #37043
2024-07-04 13:55:38 +08:00
bf3ea1839c [test]Mv external p2 test case to p0. (#37070) (#37140)
backport: https://github.com/apache/doris/pull/37070
2024-07-04 11:19:31 +08:00
a9f9113c48 [branch-2.1][test](external)move hive cases from p2 to p0 (#37149)
pick (#36855)
test_hive_same_db_table_name
test_hive_special_char_partition
test_complex_types
test_wide_table
2024-07-03 19:44:52 +08:00
e5695e058f [test](migrate) move 2 cases from p2 to p0 (#36935) (#37200)
bp #36935

Co-authored-by: zhangdong <493738387@qq.com>
2024-07-03 17:29:01 +08:00
e857680661 [Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. (#37175)
Backport #36989.
2024-07-03 11:08:49 +08:00
e7e1e967cf [test](migrate) move 2 cases from p2 to p0 for 2.1 (#37139)
pick #37004
2024-07-02 22:50:53 +08:00
74086189d3 [test](tvf) move p2 tvf tests from p2 to p0 (#36871) (#37150)
bp: #36871
2024-07-02 22:37:43 +08:00
cf86eb8647 [test](migrate) move test_hive_text_complex_type from p2 to p0 (#37007) (#37123)
bp: #37007
2024-07-02 17:36:37 +08:00
4dcceaefea [test](ES Catalog) Add test cases for ES 5.x (#34441) (#36993)
backport #34441
2024-06-28 16:58:07 +08:00
46eef9d948 [build](docker) add repo for new version of git (#35892) (#36909)
bp #35892
2024-06-27 21:00:14 +08:00
fcc26cc671 [test](migrate) move some cases from p2 to p0 (#36750)(#36787) (#36922)
bp #36750 and #36787
2024-06-27 20:59:50 +08:00
26b1ef428a [branch-2.1](doris compose) fix docker start failed (#36534)
2024-06-20 20:14:17 +08:00
ac0f6e75d2 [bugfix](iceberg)Read error when timestamp does not have time zone for 2.1 (#36435)
bp: #36141
2024-06-20 18:32:31 +08:00
9e972cb0b9 [bugfix](iceberg)Fix the datafile path error issue for 2.1 (#36066)
bp: #35957
2024-06-08 21:51:46 +08:00
a42b06a168 [branch-2.1][test](jdbc catalog) Change the db2 image address and repair test (#35967)
2024-06-06 17:21:40 +08:00
bc062a2595 [fix](orc)fix orc reader missing column. (#35735)
## Proposed changes
bp #35583 
2024-05-31 22:51:44 +08:00
7381cd56b0 [docker](hive) sync for hive initializing (#35479)
Add health checks for hive2 and hive3
2024-05-29 15:03:06 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
When a null-related function is applied to a dictionary-encoded string column, as in the following queries, the results are incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
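
One plausible reading of why such queries go wrong is sketched below in Python. It assumes that dictionary filtering evaluates the predicate only against dictionary entries; NULL rows have no dictionary entry, so a predicate that turns NULL into a matching value is never seen. This is an illustrative model with hypothetical names and sample values, not the actual Doris reader code.

```python
from typing import Callable, List, Optional


def ifnull(value: Optional[str], default: str) -> str:
    """Model of IFNULL(value, default)."""
    return default if value is None else value


def scan_with_dict_filter(dictionary: List[str], codes: List[Optional[int]],
                          predicate: Callable[[Optional[str]], bool]) -> List[int]:
    """Evaluate the predicate once per dictionary entry, then keep rows by code."""
    matching = {code for code, value in enumerate(dictionary) if predicate(value)}
    # NULL rows carry no code (None here), so they can never be selected.
    return [row for row, code in enumerate(codes) if code in matching]


def scan_row_by_row(dictionary: List[str], codes: List[Optional[int]],
                    predicate: Callable[[Optional[str]], bool]) -> List[int]:
    """Decode every row first, then evaluate the predicate on the decoded value."""
    values = [None if code is None else dictionary[code] for code in codes]
    return [row for row, value in enumerate(values) if predicate(value)]


dictionary = ["1-URGENT", "2-HIGH"]   # sample dictionary entries (assumed)
codes = [0, None, 1, None]            # rows 1 and 3 are NULL


def predicate(value: Optional[str]) -> bool:
    # Models: IFNULL(o_orderpriority, 'null') = 'null'
    return ifnull(value, "null") == "null"


filtered = scan_with_dict_filter(dictionary, codes, predicate)
expected = scan_row_by_row(dictionary, codes, predicate)
```

In this model the dictionary-based scan returns no rows, while the row-by-row scan returns the two NULL rows. That is the kind of mismatch the queries above exercise.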
2024-05-27 15:25:29 +08:00
50f50cf8cc Revert "[fix][docker] fix kafka test scripts (#33417)" (#35229)
This reverts commit c35b2becdd08ab9255b3a0c2a19d74970f621388.
2024-05-22 20:33:14 +08:00
bc70968019 [chore](regression) Modify character encoding to be consistent with Doris (#35228)
2024-05-22 20:04:50 +08:00
4ecc3edc21 [test](hive)revert hive container to host mode (#34322)
Revert hive container to host mode to fix pipeline problem
2024-05-07 10:36:01 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00
acc2b532e7 [Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. (#34026)
2024-04-26 15:05:47 +08:00
7f4b7b04ad [test](hive)add subnet for hive docker compose (#34000) (#34157)
bp #34000
Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:49:33 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
1c025c0488 [docker](hive) add hive3 docker compose and modify scripts (#33115)
add hive3 docker compose from:
big-data-europe/docker-hive#56
2024-04-17 23:42:13 +08:00
87e6c94851 [docker](script)add --grace to be_prestop.sh (#33599)
2024-04-17 23:42:12 +08:00
cc103920d1 [k8s](improve)add docker resource script for k8s (#33329)
2024-04-17 23:42:00 +08:00
8c66915bb5 [fix](doris compose) Fix not show ms recycler .out log in cloud mode (#33489)
2024-04-12 15:09:25 +08:00
4963d60a07 [Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. (#32721) (#33446)
backport #32721.
2024-04-10 11:42:22 +08:00
c35b2becdd [fix][docker] fix kafka test scripts (#33417)
Co-authored-by: 胥剑旭 <xujianxu@xujianxudeMacBook-Pro.local>
2024-04-09 16:11:09 +08:00
2c87238504 [enhance](S3) Print the oss request id for each error s3 request (#32491)
2024-03-21 14:07:50 +08:00