doris

Author	SHA1	Message	Date
github-actions[bot]	5c344ea043	branch-2.1: [opt](docker) add a script flag to control load data or not #51065 (#51083 ) Cherry-picked from #51065 Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>	2025-05-21 12:09:07 +08:00
github-actions[bot]	13fbc9efa6	branch-2.1: [fix](hive) fix write hive partition by Doris #50864 (#50921 ) Cherry-picked from #50864 Co-authored-by: Socrates <suxiaogang223@icloud.com>	2025-05-17 16:14:23 +08:00
Socrates	0710d9b2d6	branch-2.1: [fix](orc) Should not pass selection vector when decode child column of List or Map #50136 (#50316 ) bp: #50136	2025-04-25 09:04:06 +08:00
Socrates	94986fc574	branch-2.1: [fix](multi-catalog) Fix bug: "Can not create a Path from an empty string" (#49382 ) (#49641 ) ### What problem does this PR solve? Problem Summary: In HiveMetaStoreCache, the function FileInputFormat.setInputPaths is used to set input paths. However, this overload splits its argument on commas, which is not the expected behavior here. As a result, partition values that contain commas are parsed into incorrect paths and can cause errors. ```java public static void setInputPaths(JobConf conf, String commaSeparatedPaths) { setInputPaths(conf, StringUtils.stringToPath( getPathStrings(commaSeparatedPaths))); } ``` To prevent FileInputFormat.setInputPaths from splitting paths by commas, we use another overloaded version of the method. Instead of passing a comma-separated string, we explicitly pass a Path object, ensuring that partition values containing commas are handled correctly. ```java public static void setInputPaths(JobConf conf, Path... inputPaths) { Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]); StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString())); for(int i = 1; i < inputPaths.length;i++) { str.append(StringUtils.COMMA_STR); path = new Path(conf.getWorkingDirectory(), inputPaths[i]); str.append(StringUtils.escapeString(path.toString())); } conf.set(org.apache.hadoop.mapreduce.lib.input.FileInputFormat.INPUT_DIR, str.toString()); } ``` ### Release note None	2025-03-29 09:13:43 +08:00
github-actions[bot]	226f848ad8	branch-2.1: [fix](hive docker)Table `partition_location_1` miss data #47539 (#47559 ) Cherry-picked from #47539 Co-authored-by: Thearas <gaozifeng@selectdb.com>	2025-02-07 11:21:47 +08:00
github-actions[bot]	af55eba242	branch-2.1: [opt](hive docker)Exit on creating table failed #47390 (#47453 )	2025-01-26 17:28:20 +08:00
Thearas	7c9d64d79a	[opt](iceberg docker)Add health check for iceberg rest container (#46767 ) (#47422 )	2025-01-25 09:04:27 +08:00
Thearas	eddea8b309	[opt](hive docker)Parallel put hive data (#46571 ) (#46682 ) Problem Summary: Put the `tpch1.db`, `paimon1` and `tvf_data` hive data in parallel. This reduces the time cost from 22m to 16m on a 16C machine. Change-Id: Ib75c57d397ce1f96d5108d4b570bcb215f31d421	2025-01-09 14:08:35 +08:00
Mingyu Chen (Rayner)	5d2930e783	[fix](shellcheck) fix hive-metastore and enable shellcheck in docker (#46496 ) (#46574 ) cherry-pick (#46496) Co-authored-by: Socrates <suyiteng@selectdb.com>	2025-01-08 11:10:34 +08:00
github-actions[bot]	d8c94d6392	branch-2.1: [fix](regression)fix hive translation unstable case. #46385 (#46409 ) Cherry-picked from #46385 Co-authored-by: daidai <changyuwei@selectdb.com>	2025-01-04 08:59:56 +08:00
github-actions[bot]	02239e4fb2	branch-2.1: [chore](regression) do not hard code S3 bucket and endpoint of hive t… #46159 (#46169 ) Cherry-picked from #46159 Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>	2024-12-31 11:44:36 +08:00
daidai	a380f5d222	[enhancement](utf8)import enable_text_validate_utf8 session var (#45537 ) (#46070 ) bp #45537	2024-12-28 10:05:03 +08:00
daidai	303557ac70	[fix](hive)fix hive insert only transaction table. (#45753 ) ### What problem does this PR solve? bp #44001 , but no hive4 acid table. Problem Summary: 1. Fixed the issue that reading insert-only transactional tables performed no ACID check, which caused data to be read multiple times (i.e., data from the previous base_n was read again). 2. Creating, inserting data into, and deleting ACID tables is now forbidden.	2024-12-22 21:23:21 +08:00
Socrates	7d32e4f71f	branch-2.1: [Fix](ORC) Not push down fixed char type in orc reader #45484 (#45525 ) cherry-pick #45484	2024-12-19 14:06:00 +08:00
daidai	702abbff0f	[Opt](orc)Optimize the merge io when orc reader read multiple tiny stripes. (#42004 ) (#44239 ) bp #42004 Co-authored-by: kaka11chen <kaka11.chen@gmail.com>	2024-11-22 11:01:41 +08:00
github-actions[bot]	3136fa48a6	branch-2.1: [chore](ci) adjust some invalid url #44261 (#44270 ) Cherry-picked from #44261 Co-authored-by: Dongyang Li <lidongyang@selectdb.com>	2024-11-19 19:28:04 +08:00
github-actions[bot]	48e33bfb2a	branch-2.1: [fix](hive)Fixed the issue of reading hive table with empty lzo files #43979 (#44063 ) Cherry-picked from #43979 Co-authored-by: wuwenchi <wuwenchi@selectdb.com>	2024-11-16 16:14:50 +08:00
github-actions[bot]	4531cd86e3	branch-2.1: [fix](regression-test) add checks for existence and successful upload of data files in hive-metastore.sh #43853 (#43888 ) Cherry-picked from #43853 Co-authored-by: Socrates <suyiteng@selectdb.com>	2024-11-14 11:23:23 +08:00
github-actions[bot]	a1ff02288f	branch-2.1: [fix](hive) support query hive view created by spark (#43553 ) Cherry-picked from #43530 Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com> Co-authored-by: morningman <yunyou@selectdb.com>	2024-11-11 23:28:53 +08:00
Mingyu Chen (Rayner)	cdd32d9582	[enhance](hive) support reading hive table with OpenCSVSerde #42257 (#42940 ) cherry pick from #42257 Co-authored-by: Socrates <suxiaogang223@icloud.com>	2024-10-31 11:12:07 +08:00
Mingyu Chen (Rayner)	fce4695f37	[Configuration](transactional-hive) Add `skip_checking_acid_version_file` session var to skip checking acid version file in some hive envs. (#42111 )(#42225 ) (#42939 ) cherry-pick (#42111)(#42225) --------- Co-authored-by: Qi Chen <kaka11.chen@gmail.com>	2024-10-31 09:52:20 +08:00
Rayner Chen	157d67e7ca	[enhance](hive) Add regression-test cases for hive text ddl and hive text insert and fix reading null string bug #42200 (#42273 ) cherry pick from #42200 Co-authored-by: Socrates <suxiaogang223@icloud.com>	2024-10-22 23:56:57 +08:00
Socrates	38e529cd29	[cherry-pick](branch-2.1) support decimal256 for parquet reader (#42241 ) ## Proposed changes pick pr: https://github.com/apache/doris/pull/41526	2024-10-22 19:42:09 +08:00
Socrates	a32ad0b1f7	[cherry-pick](branch-2.1) support reading brotli compressed parquet file (#42162 ) pick pr: https://github.com/apache/doris/pull/41875	2024-10-21 16:48:09 +08:00
Socrates	1b901f6fcc	[cherry-pick](branch-2.1) add parquet tvf cases and fix some parquet bug (#41931 ) ## Proposed changes pick pr: https://github.com/apache/doris/pull/41683 https://github.com/apache/doris/pull/41506 https://github.com/apache/doris/pull/41338 https://github.com/apache/doris/pull/39326 --------- Co-authored-by: morningman <morningman@163.com>	2024-10-17 14:20:58 +08:00
Socrates	4888c632f4	[cherry-pick](branch2.1) support escape.delim and serialization.null.format for hive text (#41684 ) ## Proposed changes pick from master: https://github.com/apache/doris/pull/40291	2024-10-15 00:08:23 +08:00
Socrates	0b4552f74b	[cherry-pick](branch-2.1) pick hive text write from master (#40537 ) ## Proposed changes pick prs: https://github.com/apache/doris/pull/38549 https://github.com/apache/doris/pull/40183 https://github.com/apache/doris/pull/40315 --------- Co-authored-by: Calvin Kirs <kirs@apache.org>	2024-09-27 20:57:07 +08:00
Socrates	7bb9ca91c8	[branch-2.1](fix) adjust data download url about hive docker (#40846 ) ## Proposed changes fix paimon regression test Co-authored-by: Dongyang Li <hello_stephen@qq.com> Co-authored-by: stephen <hello-stephen@qq.com>	2024-09-14 23:19:54 +08:00
yiguolei	ca07a00c93	Revert "[branch-2.1](hive) support hive write text table (#38549 ) (#40063)" (#40157 ) This reverts commit c6df7c21a3c09ae1664deabacb88dfcea9d94b68. Co-authored-by: yiguolei <yiguolei@gmail.com>	2024-08-30 10:25:38 +08:00
Socrates	c6df7c21a3	[branch-2.1](hive) support hive write text table (#38549 ) (#40063 ) 1. Support write hive text table 2. Add SessionVariable `hive_text_compression` to write compressed hive text table 3. Supported compression type: gzip, bzip2, snappy, lz4, zstd pick from https://github.com/apache/doris/pull/38549	2024-08-29 16:50:40 +08:00
Mingyu Chen	b9da934b16	[fix](hive) report error with escape char and null format (#39700 ) (#39869 ) bp #39700 Co-authored-by: Socrates <suxiaogang223@icloud.com>	2024-08-24 09:23:03 +08:00
daidai	3da2d1c9d6	[bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. (#38718 ) (#39192 ) bp #38718	2024-08-11 20:37:40 +08:00
daidai	607c0b82a9	[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377 ) (#38245 ) (#38810 ) ## Proposed changes pick pr: #38575 and fix this pr bug : #38245	2024-08-05 09:13:08 +08:00
daidai	5d02c48715	[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432 ) (#38809 ) bp #38432 ## Proposed changes Add `hive_parquet_use_column_names` and `hive_orc_use_column_names` session variables to read a table after a column has been renamed in `Hive`. These two session variables are modeled on `parquet_use_column_names` and `orc_use_column_names` of the `Trino` hive connector. By default, both session variables are true. When they are set to false, the orc/parquet reader accesses columns by their ordinal position in the Hive table definition. For example: ```mysql in Hive : hive> create table tmp (a int , b string) stored as parquet; hive> insert into table tmp values(1,"2"); hive> alter table tmp change column a new_a int; hive> insert into table tmp values(2,"4"); in Doris : mysql> set hive_parquet_use_column_names=true; Query OK, 0 rows affected (0.00 sec) mysql> select * from tmp; +-------+------+ | new_a | b | +-------+------+ | NULL | 2 | | 2 | 4 | +-------+------+ 2 rows in set (0.02 sec) mysql> set hive_parquet_use_column_names=false; Query OK, 0 rows affected (0.00 sec) mysql> select * from tmp; +-------+------+ | new_a | b | +-------+------+ | 1 | 2 | | 2 | 4 | +-------+------+ 2 rows in set (0.02 sec) ``` In hive 3, `set parquet.column.index.access/orc.force.positional.evolution = true/false` controls the read results in the same way as these two session variables. However, for parquet tables in which a field inside a struct column has been renamed, hive and doris behave differently.	2024-08-05 09:06:49 +08:00
苏小刚	f7068b5658	[cherry-pick](branch-2.1) Make doris read hive text table parameters and behavior consistent with hive (#37840 ) ## Proposed changes pick from master https://github.com/apache/doris/pull/37638	2024-07-16 22:24:50 +08:00
Mingyu Chen	81360cf897	[opt](test) shorten the external p0 running time (#37320 ) (#37473 ) bp #37320	2024-07-09 15:35:15 +08:00
Mingyu Chen	55636e8035	[test](migrate) move 3 cases from p2 to p0 (#36957 ) (#37264 ) bp #36957 Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>	2024-07-04 20:09:59 +08:00
Mingyu Chen	3613413a54	[fix](hive) support find serde info from both tbl properties and serde properties (#37043 ) (#37188 ) bp #37043	2024-07-04 13:55:38 +08:00
Jibing-Li	bf3ea1839c	[test]Mv external p2 test case to p0. (#37070 ) (#37140 ) backport: https://github.com/apache/doris/pull/37070	2024-07-04 11:19:31 +08:00
zy-kkk	a9f9113c48	[branch-2.1][test](external)move hive cases from p2 to p0 (#37149 ) pk (#36855) test_hive_same_db_table_name test_hive_special_char_partition test_complex_types test_wide_table	2024-07-03 19:44:52 +08:00
Mingyu Chen	e5695e058f	[test](migrate) move 2 cases from p2 to p0 (#36935 ) (#37200 ) bp #36935 Co-authored-by: zhangdong <493738387@qq.com>	2024-07-03 17:29:01 +08:00
Qi Chen	e857680661	[Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. (#37175 ) Backport #36989.	2024-07-03 11:08:49 +08:00
wuwenchi	e7e1e967cf	[test](migrate) move 2 cases from p2 to p0 for 2.1 (#37139 ) pick #37004	2024-07-02 22:50:53 +08:00
Tiewei Fang	74086189d3	[test](tvf) move p2 tvf tests from p2 to p0 (#36871 ) (#37150 ) bp: #36871	2024-07-02 22:37:43 +08:00
Ashin Gau	cf86eb8647	[test](migrate) move test_hive_text_complex_type from p2 to p0 (#37007 ) (#37123 ) bp: #37007	2024-07-02 17:36:37 +08:00
Mingyu Chen	fcc26cc671	[test](migrate) move some cases from p2 to p0 (#36750 )(#36787 ) (#36922 ) bp #36750 and #36787	2024-06-27 20:59:50 +08:00
daidai	bc062a2595	[fix](orc)fix orc reader missing column. (#35735 ) ## Proposed changes bp #35583	2024-05-31 22:51:44 +08:00
Qi Chen	68eda58a8c	[Fix](multi-catalog) Fix string dict filtering when use null related function in parquet and orc reader. (#35335 ) When a dictionary column is wrapped in a null-related function, as in the following sql, the results will be incorrect. ``` select * from ( select IF(o_orderpriority IS NULL, 'null', o_orderpriority) AS o_orderpriority from test_string_dict_filter_orc ) as A where o_orderpriority = 'null'; ``` ``` select * from ( select IFNULL(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null' ``` ``` select * from ( select COALESCE(o_orderpriority, 'null') AS o_orderpriority from test_string_dict_filter_parquet ) as A where o_orderpriority = 'null'; ```	2024-05-27 15:25:29 +08:00
Qi Chen	99af54f779	[Fix](orc-reader) Fix the issue when string col has mixed plain and dict encoding in different stripes. (#34146 ) (#34248 ) backport #34146	2024-04-28 19:43:57 +08:00
Qi Chen	acc2b532e7	[Test](hive-writer) Adjust test_hive_write_partitions regression test to resolve special characters issue with git on windows. (#34026 )	2024-04-26 15:05:47 +08:00

92 Commits