Commit Graph

302 Commits

Author SHA1 Message Date
5d2930e783 [fix](shellcheck) fix hive-metastore and enable shellcheck in docker (#46496) (#46574)
cherry-pick (#46496)

Co-authored-by: Socrates <suyiteng@selectdb.com>
2025-01-08 11:10:34 +08:00
d8c94d6392 branch-2.1: [fix](regression)fix hive translation unstable case. #46385 (#46409)
Cherry-picked from #46385

Co-authored-by: daidai <changyuwei@selectdb.com>
2025-01-04 08:59:56 +08:00
02239e4fb2 branch-2.1: [chore](regression) do not hard code S3 bucket and endpoint of hive t… #46159 (#46169)
Cherry-picked from #46159

Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>
2024-12-31 11:44:36 +08:00
6dd92be33d [feature](statistics)Support get row count for pg and sql server. (#42674) (#46131)
backport: https://github.com/apache/doris/pull/42674
2024-12-29 19:37:21 +08:00
a380f5d222 [enchement](utf8)import enable_text_validate_utf8 session var (#45537) (#46070)
bp #45537
2024-12-28 10:05:03 +08:00
e8be023b35 [typo](docker) Adjust the indentation format of the init_be and entry_point scripts, as well as the duration of loop execution(Merge-2.1). (#45860)
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #xxx

Master PR: #45308 

Problem Summary:

Adjust the indentation format of the `init_be` and `entry_point`
scripts, as well as the duration of loop execution.
Adjust the smallest unit of all indentations to a single tab character,
and modify the loop duration when checking the BE startup status,
changing both from 300 seconds to 30 seconds to speed up the overall
Docker startup time.
2024-12-25 12:16:42 +08:00
980f28f9e2 [feat](docker)Modify the init_be and start_be scripts to meet the requirements for rapid Docker startup(Merge 2.1). (#45858)
### What problem does this PR solve?

Issue Number: close #xxx

Related PR: #45267

Master PR: #45269

Problem Summary:

To meet the needs of rapid Docker startup, I have made adjustments to
two related scripts in the Docker startup process. First, I added a env
`SKIP_CHECK_ULIMIT` to the `start_be.sh` script, which will skip the
size checks for `swap`, `ulimit`, and `max_map_count`. At the same time,
I used `--console` to start the process and print logs. The reason why I
did not use the `--daemon` daemon command to execute is that starting
with a foreground log printing method in a Docker container is the
correct and reliable approach.

At the same time, I added a check logic for a `be.conf` configuration
item in the `init_be.sh` script: if it is the first time starting,
append the export `SKIP_CHECK_ULIMIT=true` to skip the `ulimit` value
check in the BE process. In summary, these adjustments can meet the
basic requirements for rapid Docker startup usage.
2024-12-25 12:11:14 +08:00
303557ac70 [fix](hive)fix hive insert only translaction table. (#45753)
### What problem does this PR solve?
bp #44001 , but no hive4 acid table.

Problem Summary:
1. Fixed the issue that when reading insert translaction only tables,
there was no acid check, which caused multiple data reads (i.e., reading
data from the previous base_n).
2. Forbidden to create, insert data, and delete aicd tables.
2024-12-22 21:23:21 +08:00
19c0e89da7 [enchement](iceberg)support read iceberg partition evolution table. (#45367) (#45569)
cherry-pick #45367

Co-authored-by: daidai <changyuwei@selectdb.com>
2024-12-20 08:56:51 +08:00
7d32e4f71f branch-2.1: [Fix](ORC) Not push down fixed char type in orc reader #45484 (#45525)
cherry-pick #45484
2024-12-19 14:06:00 +08:00
ea24410faf [enhancement][docker] fix kafka docker issue (#45091) 2024-12-06 14:36:57 +08:00
11c517fe1e [enhancement][docker]update routine docker file (#45048) 2024-12-05 17:27:44 +08:00
702abbff0f [Opt](orc)Optimize the merge io when orc reader read multiple tiny stripes. (#42004) (#44239)
bp #42004

Co-authored-by: kaka11chen <kaka11.chen@gmail.com>
2024-11-22 11:01:41 +08:00
3136fa48a6 branch-2.1: [chore](ci) adjust some invalid url #44261 (#44270)
Cherry-picked from #44261

Co-authored-by: Dongyang Li <lidongyang@selectdb.com>
2024-11-19 19:28:04 +08:00
83b74827aa branch-2.1: [fix](iceberg)Fix count(*) error with dangling delete problem #44039 (#44101)
Cherry-picked from #44039

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2024-11-19 17:19:25 +08:00
efb3bdd96e [fix](test) fix clickhouse jdbc catalog func push down case #43196 (#44151)
cherry pick from #43196

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-11-18 18:03:10 +08:00
48e33bfb2a branch-2.1: [fix](hive)Fixed the issue of reading hive table with empty lzo files #43979 (#44063)
Cherry-picked from #43979

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2024-11-16 16:14:50 +08:00
4531cd86e3 branch-2.1: [fix](regression-test) add checks for existence and successful upload of data files in hive-metastore.sh #43853 (#43888)
Cherry-picked from #43853

Co-authored-by: Socrates <suyiteng@selectdb.com>
2024-11-14 11:23:23 +08:00
a1ff02288f branch-2.1: [fix](hive) support query hive view created by spark (#43553)
Cherry-picked from #43530

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
Co-authored-by: morningman <yunyou@selectdb.com>
2024-11-11 23:28:53 +08:00
1ac00ea983 branch-2.1: [feat](doris compose) Copy lastest compose code from master branch (#43464)
Copy lastest code from master branch to support run docker suites
without external doris cluster, enable jvm debug port, ..., etc.
2024-11-08 09:47:19 +08:00
cdd32d9582 [enhance](hive) support reading hive table with OpenCSVSerde #42257 (#42940)
cherry pick from #42257

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-10-31 11:12:07 +08:00
fce4695f37 [Configuration](transactional-hive) Add skip_checking_acid_version_file session var to skip checking acid version file in some hive envs. (#42111)(#42225) (#42939)
cherry-pick (#42111)(#42225)

---------

Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-10-31 09:52:20 +08:00
2defa90be7 [test](ES Catalog)Add mapping _routing test case (#42074) (#42282)
## Proposed changes

bp #42074
2024-10-23 10:14:12 +08:00
157d67e7ca [enhance](hive) Add regression-test cases for hive text ddl and hive text insert and fix reading null string bug #42200 (#42273)
cherry pick from #42200

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-10-22 23:56:57 +08:00
38e529cd29 [cherry-pick](branch-2.1) support decimal256 for parquet reader (#42241)
## Proposed changes
pick pr: https://github.com/apache/doris/pull/41526
2024-10-22 19:42:09 +08:00
c1d2b8d548 [2.1][improvement](jdbc catalog) Disallow non-constant type conversion pushdown and implicit conversion pushdown (#42242)
pick (#42102)

Add a variable `enable_jdbc_cast_predicate_push_down`, the default value
is false, which prohibits the pushdown of non-constant predicates with
type conversion and all predicates with implicit conversion. This change
can prevent the wrong predicates from being pushed down to the Jdbc data
source, resulting in query data errors, because the predicates with cast
were not correctly pushed down to the data source before. If you find
that the data is read correctly and the performance is better before
this change, you can manually set this variable to true

```
| Expression                                          | Can Push Down |
|-----------------------------------------------------|---------------|
| column type equals const type                       | Yes           |
| column type equals cast const type                  | Yes           |
| cast column type equals const type                  | No            |
| cast column type equals cast const type             | No            |
| column type not equals column type                  | No            |
| column type not equals cast const type              | No            |
| cast column type not equals const type              | No            |
| cast column type not equals cast const type         | No            |

```
2024-10-22 17:27:29 +08:00
a32ad0b1f7 [cherry-pick](branch-2.1) support reading brotli compressed parquet file (#42162)
pick pr: https://github.com/apache/doris/pull/41875
2024-10-21 16:48:09 +08:00
a150d160ea [fix](jdbc catalog) fix and add mysql and doris extremum test #41679 (#42122)
cherry pick from #41679

---------

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-10-21 16:39:40 +08:00
1b901f6fcc [cherry-pick](branch-2.1) add parquet tvf cases and fix some parquet bug (#41931)
## Proposed changes
pick pr:
  https://github.com/apache/doris/pull/41683
  https://github.com/apache/doris/pull/41506
  https://github.com/apache/doris/pull/41338
  https://github.com/apache/doris/pull/39326

---------

Co-authored-by: morningman <morningman@163.com>
2024-10-17 14:20:58 +08:00
4888c632f4 [cherry-pick](branch2.1) support escape.delim and serialization.null.format for hive text (#41684)
## Proposed changes
pick from master:
https://github.com/apache/doris/pull/40291
2024-10-15 00:08:23 +08:00
4f81fc474c [bugfix](paimon)Get the file format by file name (#41020) (#41487)
bp #41020
2024-09-30 15:46:13 +08:00
0b4552f74b [cherry-pick](branch-2.1) pick hive text write from master (#40537)
## Proposed changes
pick prs:
https://github.com/apache/doris/pull/38549
https://github.com/apache/doris/pull/40183
https://github.com/apache/doris/pull/40315

---------

Co-authored-by: Calvin Kirs <kirs@apache.org>
2024-09-27 20:57:07 +08:00
c744eb87c5 [fix](regression)fix some regression test (#40928) (#41046)
bp #40928
2024-09-20 18:17:44 +08:00
5f583fa329 [branch-2.1][test](jdbc catalog) add oceanbase ce jdbc catalog test (#40978)
pick #34972)
2024-09-19 22:11:24 +08:00
7bb9ca91c8 [branch-2.1](fix) adjust data download url about hive docker (#40846)
## Proposed changes

fix paimon regression test

Co-authored-by: Dongyang Li <hello_stephen@qq.com>
Co-authored-by: stephen <hello-stephen@qq.com>
2024-09-14 23:19:54 +08:00
8708fae420 [fix](ES Catalog)Support parse single value for array column (#40614) (#40660)
bp #40614
2024-09-11 17:26:48 +08:00
8104b992d1 [fix](ES Catalog)Do not extract doc_values of field with ignore_above setting (#40314) (#40464)
bp #40314
2024-09-06 16:25:30 +08:00
ca07a00c93 Revert "[branch-2.1](hive) support hive write text table (#38549) (#4… (#40157)
…0063)"

This reverts commit c6df7c21a3c09ae1664deabacb88dfcea9d94b68.

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-08-30 10:25:38 +08:00
c6df7c21a3 [branch-2.1](hive) support hive write text table (#38549) (#40063)
1. Support write hive text table
2. Add SessionVariable `hive_text_compression` to write compressed hive
text table
3. Supported compression type: gzip, bzip2, snappy, lz4, zstd

pick from https://github.com/apache/doris/pull/38549
2024-08-29 16:50:40 +08:00
b9da934b16 [fix](hive) report error with escape char and null format (#39700) (#39869)
bp #39700

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-08-24 09:23:03 +08:00
cf698fb615 [fix](regression) fix some jdbc datasource docker health check (#39141) (#39872) 2024-08-24 03:29:18 +08:00
508c7a7040 [fix](hive)Modify the Hive notification event processing method when using meta cache and add parameters to the Hive catalog. (#39239) (#39865)
bp #39239

Co-authored-by: daidai <2017501503@qq.com>
2024-08-23 23:21:02 +08:00
40a58b9e42 [branch-2.1][regression test](jdbc catalog) Enable CLICKHOUSE_ALWAYS_RUN_INITDB_SCRIPTS for clickhouse docker (#39667)
pick (#39425) #39693
2024-08-23 09:59:03 +08:00
27ba2542e2 [case](iceberg)append iceberg schema change case. (#38766) (#39630)
bp #38766
2024-08-21 09:17:12 +08:00
2948b5ea2b [branch-2.1][fix](jdbc scan) Remove the conjuncts.remove call in JdbcScan (#39407)
pick (#39180)

In #37565, due to the change in the calling order of finalize, the final
generated Plan will be missing the PREDICATES that have been pushed down
in Jdbc. Although this behavior is correct, before perfectly handling
the push down of various PREDICATES, we need to keep all conjuncts to
ensure that we can still filter data normally when the data returned by
Jdbc is a superset.
2024-08-16 19:01:40 +08:00
43cc8d648d [fix](ES Catalog)Check isArray before parse json to array (#39104) (#39273)
## Proposed changes

bp #39104
2024-08-13 15:13:40 +08:00
3da2d1c9d6 [bug](parquet)Fix the problem that the parquet reader reads the missing sub-columns of the struct and fails. (#38718) (#39192)
bp #38718
2024-08-11 20:37:40 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values ​​into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
pick pr: #38575  and fix this pr bug :  #38245
2024-08-05 09:13:08 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables to read the table after rename column in `Hive`.

These two session variables are referenced from
`parquet_use_column_names` and `orc_use_column_names` of `Trino` hive
connector.

By default, these two session variables are true. When they are set to
false, reading orc/parquet will access the columns according to the
ordinal position in the Hive table definition.

For example:
```mysql
in Hive :
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

in Doris :
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

You can use `set
parquet.column.index.access/orc.force.positional.evolution = true/false`
in hive 3 to control the results of reading the table like these two
session variables. However, for the rename struct inside column parquet
table, the effects of hive and doris are different.
2024-08-05 09:06:49 +08:00
c0caca7c55 [fix](ES Catalog)Fix unstable test test_es_query (#38801) (#38802)
## Proposed changes

bp #38801
2024-08-03 23:49:00 +08:00