eddea8b309
[opt](hive docker)Parallel put hive data ( #46571 ) ( #46682 )
...
Problem Summary:
Upload the `tpch1.db`, `paimon1` and `tvf_data` hive data in parallel. This reduces the
time cost from 22m to 16m on a 16C machine.
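A minimal sketch of the idea, assuming the three datasets can be uploaded independently. The real change lives in the hive docker scripts, which are not shown here; `put_dataset` is a hypothetical stand-in for the actual upload command.

```python
from concurrent.futures import ThreadPoolExecutor

# Dataset names come from the commit message above.
DATASETS = ["tpch1.db", "paimon1", "tvf_data"]


def put_dataset(name: str) -> str:
    """Hypothetical placeholder for uploading one dataset to HDFS."""
    # A real script would run its upload command here.
    return name


def put_all_parallel(datasets, workers=3):
    """Upload every dataset concurrently and return the finished names."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as its input.
        return list(pool.map(put_dataset, datasets))


uploaded = put_all_parallel(DATASETS)
```

With independent uploads, the wall-clock time approaches that of the slowest dataset rather than the sum of all three.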
Change-Id: Ib75c57d397ce1f96d5108d4b570bcb215f31d421
2025-01-09 14:08:35 +08:00
5d2930e783
[fix](shellcheck) fix hive-metastore and enable shellcheck in docker ( #46496 ) ( #46574 )
...
cherry-pick (#46496 )
Co-authored-by: Socrates <suyiteng@selectdb.com >
2025-01-08 11:10:34 +08:00
d8c94d6392
branch-2.1: [fix](regression)fix hive translation unstable case. #46385 ( #46409 )
...
Cherry-picked from #46385
Co-authored-by: daidai <changyuwei@selectdb.com >
2025-01-04 08:59:56 +08:00
02239e4fb2
branch-2.1: [chore](regression) do not hard code S3 bucket and endpoint of hive t… #46159 ( #46169 )
...
Cherry-picked from #46159
Co-authored-by: zgxme <zhenggaoxiong@selectdb.com >
2024-12-31 11:44:36 +08:00
a380f5d222
[enchement](utf8)import enable_text_validate_utf8 session var ( #45537 ) ( #46070 )
...
bp #45537
2024-12-28 10:05:03 +08:00
303557ac70
[fix](hive)fix hive insert only translaction table. ( #45753 )
...
### What problem does this PR solve?
bp #44001, but without the hive4 ACID table.
Problem Summary:
1. Fixed an issue where reading insert-only transactional tables
performed no ACID check, which caused data to be read multiple times
(i.e., data from a previous base_n was also read).
2. Creating, inserting data into, and deleting ACID tables are now forbidden.
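The duplicate-read problem in item 1 can be illustrated with a simplified sketch. It assumes the common Hive ACID directory layout (`base_N` and `delta_X_Y`) and ignores details such as valid write-id lists and visibility suffixes. `select_acid_dirs` is a hypothetical helper, not the Doris implementation.

```python
import re


def select_acid_dirs(dir_names):
    """Pick the newest base_N plus the delta dirs written after it."""
    bases = []
    deltas = []
    for name in dir_names:
        m = re.fullmatch(r"base_(\d+)", name)
        if m:
            bases.append((int(m.group(1)), name))
            continue
        m = re.fullmatch(r"delta_(\d+)_(\d+)", name)
        if m:
            deltas.append((int(m.group(1)), int(m.group(2)), name))
    # Only the base with the highest write id is current; older bases
    # hold data that the newest base has already replaced.
    best_base = max(bases) if bases else None
    base_id = best_base[0] if best_base else 0
    selected = [best_base[1]] if best_base else []
    # Keep only deltas whose first write id is newer than the chosen base.
    selected += [name for lo, hi, name in sorted(deltas) if lo > base_id]
    return selected


dirs = ["base_2", "delta_1_1", "delta_3_3", "base_5", "delta_6_6"]
chosen = select_acid_dirs(dirs)
```

Reading every directory in `dirs` instead of only `chosen` would return rows from `base_2` and the older deltas a second time.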
2024-12-22 21:23:21 +08:00
7d32e4f71f
branch-2.1: [Fix](ORC) Not push down fixed char type in orc reader #45484 ( #45525 )
...
cherry-pick #45484
2024-12-19 14:06:00 +08:00
702abbff0f
[Opt](orc)Optimize the merge io when orc reader read multiple tiny stripes. ( #42004 ) ( #44239 )
...
bp #42004
Co-authored-by: kaka11chen <kaka11.chen@gmail.com >
2024-11-22 11:01:41 +08:00
3136fa48a6
branch-2.1: [chore](ci) adjust some invalid url #44261 ( #44270 )
...
Cherry-picked from #44261
Co-authored-by: Dongyang Li <lidongyang@selectdb.com >
2024-11-19 19:28:04 +08:00
48e33bfb2a
branch-2.1: [fix](hive)Fixed the issue of reading hive table with empty lzo files #43979 ( #44063 )
...
Cherry-picked from #43979
Co-authored-by: wuwenchi <wuwenchi@selectdb.com >
2024-11-16 16:14:50 +08:00
4531cd86e3
branch-2.1: [fix](regression-test) add checks for existence and successful upload of data files in hive-metastore.sh #43853 ( #43888 )
...
Cherry-picked from #43853
Co-authored-by: Socrates <suyiteng@selectdb.com >
2024-11-14 11:23:23 +08:00
a1ff02288f
branch-2.1: [fix](hive) support query hive view created by spark ( #43553 )
...
Cherry-picked from #43530
Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com >
Co-authored-by: morningman <yunyou@selectdb.com >
2024-11-11 23:28:53 +08:00
cdd32d9582
[enhance](hive) support reading hive table with OpenCSVSerde #42257 ( #42940 )
...
cherry pick from #42257
Co-authored-by: Socrates <suxiaogang223@icloud.com >
2024-10-31 11:12:07 +08:00
fce4695f37
[Configuration](transactional-hive) Add skip_checking_acid_version_file session var to skip checking acid version file in some hive envs. ( #42111 )( #42225 ) ( #42939 )
...
cherry-pick (#42111 )(#42225 )
---------
Co-authored-by: Qi Chen <kaka11.chen@gmail.com >
2024-10-31 09:52:20 +08:00
157d67e7ca
[enhance](hive) Add regression-test cases for hive text ddl and hive text insert and fix reading null string bug #42200 ( #42273 )
...
cherry pick from #42200
Co-authored-by: Socrates <suxiaogang223@icloud.com >
2024-10-22 23:56:57 +08:00
38e529cd29
[cherry-pick](branch-2.1) support decimal256 for parquet reader ( #42241 )
...
## Proposed changes
pick pr: https://github.com/apache/doris/pull/41526
2024-10-22 19:42:09 +08:00
a32ad0b1f7
[cherry-pick](branch-2.1) support reading brotli compressed parquet file ( #42162 )
...
pick pr: https://github.com/apache/doris/pull/41875
2024-10-21 16:48:09 +08:00
1b901f6fcc
[cherry-pick](branch-2.1) add parquet tvf cases and fix some parquet bug ( #41931 )
...
## Proposed changes
pick pr:
https://github.com/apache/doris/pull/41683
https://github.com/apache/doris/pull/41506
https://github.com/apache/doris/pull/41338
https://github.com/apache/doris/pull/39326
---------
Co-authored-by: morningman <morningman@163.com >
2024-10-17 14:20:58 +08:00
4888c632f4
[cherry-pick](branch2.1) support escape.delim and serialization.null.format for hive text ( #41684 )
...
## Proposed changes
pick from master:
https://github.com/apache/doris/pull/40291
2024-10-15 00:08:23 +08:00
0b4552f74b
[cherry-pick](branch-2.1) pick hive text write from master ( #40537 )
...
## Proposed changes
pick prs:
https://github.com/apache/doris/pull/38549
https://github.com/apache/doris/pull/40183
https://github.com/apache/doris/pull/40315
---------
Co-authored-by: Calvin Kirs <kirs@apache.org >
2024-09-27 20:57:07 +08:00
7bb9ca91c8
[branch-2.1](fix) adjust data download url about hive docker ( #40846 )
...
## Proposed changes
fix paimon regression test
Co-authored-by: Dongyang Li <hello_stephen@qq.com >
Co-authored-by: stephen <hello-stephen@qq.com >
2024-09-14 23:19:54 +08:00
ca07a00c93
Revert "[branch-2.1](hive) support hive write text table ( #38549 ) (#4… ( #40157 )
...
…0063)"
This reverts commit c6df7c21a3c09ae1664deabacb88dfcea9d94b68.
## Proposed changes
Co-authored-by: yiguolei <yiguolei@gmail.com >
2024-08-30 10:25:38 +08:00
c6df7c21a3
[branch-2.1](hive) support hive write text table ( #38549 ) ( #40063 )
...
1. Support writing hive text tables
2. Add SessionVariable `hive_text_compression` to write compressed hive
text tables
3. Supported compression types: gzip, bzip2, snappy, lz4, zstd
pick from https://github.com/apache/doris/pull/38549
2024-08-29 16:50:40 +08:00
b9da934b16
[fix](hive) report error with escape char and null format ( #39700 ) ( #39869 )
...
bp #39700
Co-authored-by: Socrates <suxiaogang223@icloud.com >
2024-08-24 09:23:03 +08:00
3da2d1c9d6
[bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. ( #38718 ) ( #39192 )
...
bp #38718
2024-08-11 20:37:40 +08:00
607c0b82a9
[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. ( #37377 ) ( #38245 ) ( #38810 )
...
## Proposed changes
pick pr: #38575 and fix this pr bug : #38245
2024-08-05 09:13:08 +08:00
5d02c48715
[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. ( #38432 ) ( #38809 )
...
bp #38432
## Proposed changes
Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables to read a table after a column has been renamed in `Hive`.
These two session variables are modeled on
`parquet_use_column_names` and `orc_use_column_names` of the `Trino` hive
connector.
By default, both session variables are true. When they are set to
false, the orc/parquet reader accesses columns by their
ordinal position in the Hive table definition.
For example:
```mysql
-- In Hive:
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp change column a new_a int;
hive> insert into table tmp values(2,"4");

-- In Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```
In hive 3, `set parquet.column.index.access = true/false` and
`set orc.force.positional.evolution = true/false` control how the table
is read in the same way as these two session variables. However, for a
parquet table in which a field inside a struct column has been renamed,
hive and doris behave differently.
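The difference between name-based and position-based access can be reproduced with a small simulation. This is an illustrative sketch only: `read_rows` is a hypothetical function, and each file is modeled as a pair of its own column names and its rows.

```python
def read_rows(table_columns, files, use_column_names):
    """Resolve table columns against each file by name or by position."""
    rows = []
    for file_columns, file_rows in files:
        for values in file_rows:
            if use_column_names:
                # Look up each table column by name in the file schema;
                # a column the file does not know about reads as NULL.
                record = dict(zip(file_columns, values))
                row = tuple(record.get(col) for col in table_columns)
            else:
                # Ignore file column names and take values by ordinal.
                row = tuple(
                    values[i] if i < len(values) else None
                    for i in range(len(table_columns))
                )
            rows.append(row)
    return rows


# Table schema after `alter table tmp change column a new_a int`.
table_columns = ["new_a", "b"]
files = [
    (["a", "b"], [(1, "2")]),      # written before the rename
    (["new_a", "b"], [(2, "4")]),  # written after the rename
]
by_name = read_rows(table_columns, files, use_column_names=True)
by_position = read_rows(table_columns, files, use_column_names=False)
```

Here `by_name` yields NULL for `new_a` in the older file, while `by_position` recovers the value 1, matching the two query results in the example.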
2024-08-05 09:06:49 +08:00
f7068b5658
[cherry-pick](branch-2.1) Make doris read hive text table parameters and behavior consistent with hive ( #37840 )
...
## Proposed changes
pick from master https://github.com/apache/doris/pull/37638
2024-07-16 22:24:50 +08:00
81360cf897
[opt](test) shorten the external p0 running time ( #37320 ) ( #37473 )
...
bp #37320
2024-07-09 15:35:15 +08:00
55636e8035
[test](migrate) move 3 cases from p2 to p0 ( #36957 ) ( #37264 )
...
bp #36957
Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com >
2024-07-04 20:09:59 +08:00
3613413a54
[fix](hive) support find serde info from both tbl properties and serde properties ( #37043 ) ( #37188 )
...
bp #37043
2024-07-04 13:55:38 +08:00
bf3ea1839c
[test]Mv external p2 test case to p0. ( #37070 ) ( #37140 )
...
backport: https://github.com/apache/doris/pull/37070
2024-07-04 11:19:31 +08:00
a9f9113c48
[branch-2.1][test](external)move hive cases from p2 to p0 ( #37149 )
...
pick (#36855 )
test_hive_same_db_table_name
test_hive_special_char_partition
test_complex_types
test_wide_table
2024-07-03 19:44:52 +08:00
e5695e058f
[test](migrate) move 2 cases from p2 to p0 ( #36935 ) ( #37200 )
...
bp #36935
Co-authored-by: zhangdong <493738387@qq.com >
2024-07-03 17:29:01 +08:00
e857680661
[Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. ( #37175 )
...
Backport #36989 .
2024-07-03 11:08:49 +08:00
e7e1e967cf
[test](migrate) move 2 cases from p2 to p0 for 2.1 ( #37139 )
...
pick #37004
2024-07-02 22:50:53 +08:00
74086189d3
[test](tvf) move p2 tvf tests from p2 to p0 ( #36871 ) ( #37150 )
...
bp: #36871
2024-07-02 22:37:43 +08:00
cf86eb8647
[test](migrate) move test_hive_text_complex_type from p2 to p0 ( #37007 ) ( #37123 )
...
bp: #37007
2024-07-02 17:36:37 +08:00
fcc26cc671
[test](migrate) move some cases from p2 to p0 ( #36750 )( #36787 ) ( #36922 )
...
bp #36750 and #36787
2024-06-27 20:59:50 +08:00
bc062a2595
[fix](orc)fix orc reader missing column. ( #35735 )
...
## Proposed changes
bp #35583
2024-05-31 22:51:44 +08:00
68eda58a8c
[Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. ( #35335 )
...
Queries like the following, which apply null-related functions to a dictionary column, return incorrect results.
```
select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null';
```
```
select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
```
select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null';
```
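One way such a wrong result can arise is sketched below. This is an assumption made for illustration, not a description of the Doris reader: a dictionary filter that tests the predicate only against the dictionary entries never sees NULL, so a predicate that is true only for NULL rows prunes everything. All names are hypothetical.

```python
def ifnull(value, default):
    """Python stand-in for SQL IFNULL."""
    return default if value is None else value


def predicate(value):
    # Mirrors: IFNULL(o_orderpriority, 'null') = 'null'
    return ifnull(value, "null") == "null"


def dict_may_match(dictionary, has_nulls, check_null):
    """Decide from the dictionary alone whether any row could match."""
    candidates = list(dictionary)
    if check_null and has_nulls:
        # NULL is not stored in the dictionary, so test it explicitly.
        candidates.append(None)
    return any(predicate(v) for v in candidates)


dictionary = ["1-URGENT", "2-HIGH"]
column = ["1-URGENT", None, "2-HIGH"]

expected = [v for v in column if predicate(v)]
naive = dict_may_match(dictionary, has_nulls=True, check_null=False)
null_aware = dict_may_match(dictionary, has_nulls=True, check_null=True)
```

In this sketch the naive check reports that no row can match and would skip the data, even though one NULL row satisfies the predicate.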
2024-05-27 15:25:29 +08:00
99af54f779
[Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. ( #34146 ) ( #34248 )
...
backport #34146
2024-04-28 19:43:57 +08:00
acc2b532e7
[Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. ( #34026 )
2024-04-26 15:05:47 +08:00
1c025c0488
[docker](hive) add hive3 docker compose and modify scripts ( #33115 )
...
add hive3 docker compose from:
big-data-europe/docker-hive#56
2024-04-17 23:42:13 +08:00
4963d60a07
[Fix](multi-catalog)Fix the issue of not initializing the writer caused by refactoring and add hive writing regression test. ( #32721 ) ( #33446 )
...
backport #32721 .
2024-04-10 11:42:22 +08:00
73de61ed84
[opt](hive) skip hidden file and dir ( #32412 )
...
When querying a hive table, we should skip all hidden dirs and files, such as:
```
/visible/.hidden/path
/visible/.hidden.txt
```
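A minimal sketch of such a check, assuming a path counts as hidden when any of its components starts with a dot. `is_hidden_path` is a hypothetical helper, not the function used in the commit, and it expects normalized absolute paths.

```python
def is_hidden_path(path: str) -> bool:
    """Return True if any component of the path starts with a dot."""
    # Empty components (from leading or doubled slashes) are skipped.
    return any(part.startswith(".") for part in path.split("/") if part)


paths = [
    "/visible/.hidden/path",
    "/visible/.hidden.txt",
    "/visible/data.txt",
]
visible = [p for p in paths if not is_hidden_path(p)]
```

Checking every component, not only the file name, also covers files that sit inside a hidden directory.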
2024-03-21 14:07:24 +08:00
926908ece2
[fix](hive) fix spelling mistakes for "separatorChar" #32061
2024-03-12 14:20:18 +08:00
248ea20901
Revert "[test](regression) add regression test for schange change of complex …" ( #31660 )
...
This reverts commit dcd2afdb4e857791fed66a46f28ab3adc25494e1.
Reverts #31207
2024-03-01 19:06:59 +08:00
e3b4b83bca
[test](regression) add regression test for schange change of complex type ( #31207 )
...
Add regression test for #31128
2024-02-22 19:50:07 +08:00
92cad69fc4
[Fix](parquet-reader) Fix reading fixed length byte array decimal in parquet reader. ( #30535 )
2024-01-31 23:53:40 +08:00