Commit Graph

266 Commits

Author SHA1 Message Date
8104b992d1 [fix](ES Catalog)Do not extract doc_values of field with ignore_above setting (#40314) (#40464)
bp #40314
2024-09-06 16:25:30 +08:00
ca07a00c93 Revert "[branch-2.1](hive) support hive write text table (#38549) (#4… (#40157)
…0063)"

This reverts commit c6df7c21a3c09ae1664deabacb88dfcea9d94b68.

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-08-30 10:25:38 +08:00
c6df7c21a3 [branch-2.1](hive) support hive write text table (#38549) (#40063)
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd

pick from https://github.com/apache/doris/pull/38549
2024-08-29 16:50:40 +08:00
b9da934b16 [fix](hive) report error with escape char and null format (#39700) (#39869)
bp #39700

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-08-24 09:23:03 +08:00
cf698fb615 [fix](regression) fix some jdbc datasource docker health check (#39141) (#39872) 2024-08-24 03:29:18 +08:00
508c7a7040 [fix](hive)Modify the Hive notification event processing method when using meta cache and add parameters to the Hive catalog. (#39239) (#39865)
bp #39239

Co-authored-by: daidai <2017501503@qq.com>
2024-08-23 23:21:02 +08:00
40a58b9e42 [branch-2.1][regression test](jdbc catalog) Enable CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS for clickhouse docker (#39667)
pick (#39425) #39693
2024-08-23 09:59:03 +08:00
27ba2542e2 [case](iceberg)append iceberg schema change case. (#38766) (#39630)
bp #38766
2024-08-21 09:17:12 +08:00
2948b5ea2b [branch-2.1][fix](jdbc scan) Remove the conjuncts.remove call in JdbcScan (#39407)
pick (#39180)

In #37565, due to the change in the calling order of finalize, the final
generated Plan will be missing the PREDICATES that have been pushed down
in Jdbc. Although this behavior is correct, before perfectly handling
the push down of various PREDICATES, we need to keep all conjuncts to
ensure that we can still filter data normally when the data returned by
Jdbc is a superset.
2024-08-16 19:01:40 +08:00
43cc8d648d [fix](ES Catalog)Check isArray before parse json to array (#39104) (#39273)
## Proposed changes

bp #39104
2024-08-13 15:13:40 +08:00
3da2d1c9d6 [bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. (#38718) (#39192)
bp #38718
2024-08-11 20:37:40 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values ​​into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
pick pr: #38575  and fix this pr bug :  #38245
2024-08-05 09:13:08 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables to read the table after rename column in `Hive`.

These two session variables are referenced from
`parquet_use_column_names` and `orc_use_column_names` of `Trino` hive
connector.

By default, these two session variables are true. When they are set to
false, reading orc/parquet will access the columns according to the
ordinal position in the Hive table definition.

For example:
```mysql
in Hive :
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

in Doris :
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

You can use `set
parquet.column.index.access/orc.force.positional.evolution = true/false`
in hive 3 to control the results of reading the table like these two
session variables. However, for the rename struct inside column parquet
table, the effects of hive and doris are different.
2024-08-05 09:06:49 +08:00
c0caca7c55 [fix](ES Catalog)Fix unstable test test_es_query (#38801) (#38802)
## Proposed changes

bp #38801
2024-08-03 23:49:00 +08:00
b0943064e0 [fix](kerberos)fix and refactor ugi login for kerberos and simple authentication (#38607)
pick from  (#37301)
2024-08-01 14:01:32 +08:00
41fa7bc9fd [bugfix](paimon)Fixed the reading of timestamp with time zone type data for 2.1 (#37716) (#38592)
bp: #37716
2024-08-01 10:23:06 +08:00
ef8a1918c3 [case][fix](iceberg)move rest cases from p2 to p0 and fix iceberg version issue for 2.1 (#37898) (#38589)
bp: #37898
2024-07-31 22:41:56 +08:00
86dd2d24ce [fix](test) Modify SQLServer image to custom hub (#38515) (#38613)
pick from master #38515

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-07-31 19:21:28 +08:00
c011060e4f [chore](ci) adjust thirdparty docker image source for easy management… (#38558)
… (#37307)



pick from master #37307

Co-authored-by: stephen <hello-stephen@qq.com>
2024-07-31 14:47:16 +08:00
f7068b5658 [cherry-pick](branch-2.1) Make doris read hive text table parameters and behavior consistent with hive (#37840)
## Proposed changes

pick from master https://github.com/apache/doris/pull/37638

<!--Describe your changes.-->
2024-07-16 22:24:50 +08:00
bdf3e3a17e [test](docker) change the default region for docker compose (#37768) (#37813)
bp #37768
2024-07-15 22:18:33 +08:00
e5339a4014 [feature](ES Catalog)Support control scroll level by config #37180 (#37290)
## Proposed changes

backport #37180
2024-07-15 16:41:38 +08:00
ea12114549 [fix](dockerfile) Switch repos to point to to vault.centos.org because CentOS 7 is EOL (#37568) (#37763)
bp #37568
2024-07-15 15:57:56 +08:00
16de141743 [regression](kerberos)add hive kerberos docker regression env (#37657)
## Proposed changes
pick:
[regression](kerberos)fix regression pipeline env when write hosts 
(#37057)
[regression](kerberos)add hive kerberos docker regression env (#36430)
2024-07-15 09:35:39 +08:00
56a207c3f0 [case](paimon/iceberg)move cases from p2 to p0 (#37276) (#37738)
bp #37276

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-07-13 10:01:05 +08:00
81360cf897 [opt](test) shorten the external p0 running time (#37320) (#37473)
bp #37320
2024-07-09 15:35:15 +08:00
f8cee439b6 [feature](ES Catalog) map nested/object type in ES to JSON type in Doris (#37101) (#37182)
backport #37101
2024-07-05 10:48:32 +08:00
55636e8035 [test](migrate) move 3 cases from p2 to p0 (#36957) (#37264)
bp #36957

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-07-04 20:09:59 +08:00
3613413a54 [fix](hive) support find serde info from both tbl properties and serde properties (#37043) (#37188)
bp #37043
2024-07-04 13:55:38 +08:00
bf3ea1839c [test]Mv external p2 test case to p0. (#37070) (#37140)
backport: https://github.com/apache/doris/pull/37070
2024-07-04 11:19:31 +08:00
a9f9113c48 [branch-2.1][test](external)move hive cases from p2 to p0 (#37149)
pk (#36855)
test_hive_same_db_table_name
test_hive_special_char_partition
test_complex_types
test_wide_table
2024-07-03 19:44:52 +08:00
e5695e058f [test](migrate) move 2 cases from p2 to p0 (#36935) (#37200)
bp #36935

Co-authored-by: zhangdong <493738387@qq.com>
2024-07-03 17:29:01 +08:00
e857680661 [Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. (#37175)
Backport #36989.
2024-07-03 11:08:49 +08:00
e7e1e967cf [test](migrate) move 2 cases from p2 to p0 for 2.1 (#37139)
pick #37004
2024-07-02 22:50:53 +08:00
74086189d3 [test](tvf) move p2 tvf tests from p2 to p0 (#36871) (#37150)
bp: #36871
2024-07-02 22:37:43 +08:00
cf86eb8647 [test](migrate) move test_hive_text_complex_type from p2 to p0 (#37007) (#37123)
bp: #37007
2024-07-02 17:36:37 +08:00
4dcceaefea [test](ES Catalog) Add test cases for ES 5.x (#34441) (#36993)
backport #34441
2024-06-28 16:58:07 +08:00
46eef9d948 [build](docker) add repo for new version of git (#35892) (#36909)
bp #35892
2024-06-27 21:00:14 +08:00
fcc26cc671 [test](migrate) move some cases from p2 to p0 (#36750)(#36787) (#36922)
bp #36750 and #36787
2024-06-27 20:59:50 +08:00
26b1ef428a [branch-2.1](doris compose) fix docker start failed (#36534) 2024-06-20 20:14:17 +08:00
ac0f6e75d2 [bugfix](iceberg)Read error when timestamp does not have time zone for 2.1 (#36435)
bp: #36141
2024-06-20 18:32:31 +08:00
9e972cb0b9 [bugfix](iceberg)Fix the datafile path error issue for 2.1 (#36066)
bp: #35957
2024-06-08 21:51:46 +08:00
a42b06a168 [branch-2.1][test](jdbc catalog) Change the db2 image address and repair test (#35967) 2024-06-06 17:21:40 +08:00
bc062a2595 [fix](orc)fix orc reader missing column. (#35735)
## Proposed changes
bp #35583 
Issue Number: close #xxx

<!--Describe your changes.-->
2024-05-31 22:51:44 +08:00
7381cd56b0 [docker](hive) sync for hive initializing (#35479)
Add healthy checking for hive2 and hive3
2024-05-29 15:03:06 +08:00
68eda58a8c [Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335)
The following sql and when the dictionary column contains functions related to null, the results will be incorrect.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
2024-05-27 15:25:29 +08:00
50f50cf8cc Revert "[fix][docker] fix kafka test scritps (#33417)" (#35229)
This reverts commit c35b2becdd08ab9255b3a0c2a19d74970f621388.
2024-05-22 20:33:14 +08:00
bc70968019 [chore](regression) Modify character encoding to be consistent with Doris (#35228) 2024-05-22 20:04:50 +08:00
4ecc3edc21 [test](hive)revert hive container to host mode (#34322)
Revert hive container to host mode to fix pipeline problem
2024-05-07 10:36:01 +08:00
99af54f779 [Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146) (#34248)
backport #34146
2024-04-28 19:43:57 +08:00