Commit Graph

227 Commits

Author SHA1 Message Date
3a1e95c6c2 branch-2.1: [improvement](jdbc catalog) Optimize the acquisition of identity type in SQLServer (#51659)
pick #51285
2025-06-16 16:50:37 +08:00
ac65fed0ed branch-2.1: [fix](jdbc test) Add more connections to mysql docker #50970 (#51210)
Cherry-picked from #50970

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2025-05-24 17:46:23 +08:00
5c344ea043 branch-2.1: [opt](docker) add a script flag to control load data or not #51065 (#51083)
Cherry-picked from #51065

Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>
2025-05-21 12:09:07 +08:00
13fbc9efa6 branch-2.1: [fix](hive) fix write hive partition by Doris #50864 (#50921)
Cherry-picked from #50864

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2025-05-17 16:14:23 +08:00
48778eab4d branch-2.1: [fix](iceberg)Fix the inconsistency between the data in pg and the data in MinIO. #50578 (#50641)
Cherry-picked from #50578

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2025-05-07 23:15:02 +08:00
4aff17f355 branch-2.1: [fix](docker hive3) hive server oom and not auto-restart #50456 (#50507)
Cherry-picked from #50456

Co-authored-by: Thearas <gaozifeng@selectdb.com>
2025-05-03 22:44:29 +08:00
0710d9b2d6 branch-2.1: [fix](orc) Should not pass selection vector when decode child column of List or Map #50136 (#50316)
bp: #50136
2025-04-25 09:04:06 +08:00
94986fc574 branch-2.1: [fix](multi-catalog) Fix bug: "Can not create a Path from an empty string" (#49382) (#49641)
### What problem does this PR solve?
Problem Summary:
In HiveMetaStoreCache, the function FileInputFormat.setInputPaths is
used to set input paths. However, this function splits paths using
commas, which is not the expected behavior. As a result, when partition
values contain commas, it leads to incorrect path parsing and potential
errors.
```java
  public static void setInputPaths(JobConf conf, String commaSeparatedPaths) {
    setInputPaths(conf, StringUtils.stringToPath(
                        getPathStrings(commaSeparatedPaths)));
  }
```
To prevent FileInputFormat.setInputPaths from splitting paths by commas,
we use another overloaded version of the method. Instead of passing a
comma-separated string, we explicitly pass a Path object, ensuring that
partition values containing commas are handled correctly.
```java
  public static void setInputPaths(JobConf conf, Path... inputPaths) {
    Path path = new Path(conf.getWorkingDirectory(), inputPaths[0]);
    StringBuffer str = new StringBuffer(StringUtils.escapeString(path.toString()));
    for(int i = 1; i < inputPaths.length;i++) {
      str.append(StringUtils.COMMA_STR);
      path = new Path(conf.getWorkingDirectory(), inputPaths[i]);
      str.append(StringUtils.escapeString(path.toString()));
    }
    conf.set(org.apache.hadoop.mapreduce.lib.input.
      FileInputFormat.INPUT_DIR, str.toString());
  }
```
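
The difference between the two overloads can be illustrated without Hadoop on the classpath. The sketch below is a simplified stand-in, not Hadoop's implementation: `CommaPathSketch`, `splitOnCommas`, `escape`, `splitEscaped` and the sample directory are hypothetical names chosen for illustration. It shows how a plain comma split breaks a partition directory whose value contains a comma, and how escaping the comma first keeps the path whole.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified illustration only; these helpers are assumptions, not Hadoop APIs.
class CommaPathSketch {
    // Splits on every comma, which is roughly what passing a
    // comma-separated string implies for a path that contains commas.
    static List<String> splitOnCommas(String joined) {
        List<String> parts = new ArrayList<>();
        int start = 0;
        for (int i = 0; i < joined.length(); i++) {
            if (joined.charAt(i) == ',') {
                parts.add(joined.substring(start, i));
                start = i + 1;
            }
        }
        parts.add(joined.substring(start));
        return parts;
    }

    // Escapes backslashes and commas so a comma inside a path is not a separator.
    static String escape(String path) {
        return path.replace("\\", "\\\\").replace(",", "\\,");
    }

    // Splits only on unescaped commas and removes the escape characters.
    static List<String> splitEscaped(String joined) {
        List<String> parts = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (int i = 0; i < joined.length(); i++) {
            char ch = joined.charAt(i);
            if (ch == '\\' && i + 1 < joined.length()) {
                current.append(joined.charAt(++i)); // keep the escaped character literally
            } else if (ch == ',') {
                parts.add(current.toString());
                current.setLength(0);
            } else {
                current.append(ch);
            }
        }
        parts.add(current.toString());
        return parts;
    }

    public static void main(String[] args) {
        String partitionDir = "/warehouse/t/city=a,b"; // hypothetical partition path
        System.out.println(splitOnCommas(partitionDir).size());         // prints 2
        System.out.println(splitEscaped(escape(partitionDir)).size());  // prints 1
    }
}
```

In this sketch a trailing comma also yields an empty piece, which is the kind of input a path constructor would reject.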

### Release note

None
2025-03-29 09:13:43 +08:00
676b868d99 branch-2.1:[opt](docker) Add ranger docker component (#47697) (#48359)
### What problem does this PR solve?
bp  https://github.com/apache/doris/pull/47697

### Release note

None

### Check List (For Author)

- Test <!-- At least one of them must be included. -->
    - [ ] Regression test
    - [ ] Unit Test
    - [ ] Manual test (add detailed scripts or steps below)
    - [x] No need to test or manual test. Explain why:
        - [ ] This is a refactor/code format and no logic has been changed.
        - [x] Previous test can cover this change.
        - [ ] No code files have been changed.
        - [ ] Other reason <!-- Add your reason?  -->

- Behavior changed:
    - [x] No.
    - [ ] Yes. <!-- Explain the behavior change -->

- Does this need documentation?
    - [x] No.
    - [ ] Yes. <!-- Add document PR link here. eg:
https://github.com/apache/doris-website/pull/1214 -->

### Check List (For Reviewer who merge this PR)

- [ ] Confirm the release note
- [ ] Confirm test cases
- [ ] Confirm document
- [ ] Add branch pick label <!-- Add branch pick label that this PR
should merge into -->
2025-02-27 09:47:25 +08:00
fb31586612 branch-2.1: [test](jdbc catalog) add more jdbc catalog extreme test (#47799)
cherry-pick (#47525)
2025-02-14 17:03:49 +08:00
226f848ad8 branch-2.1: [fix](hive docker)Table partition_location_1 miss data #47539 (#47559)
Cherry-picked from #47539

Co-authored-by: Thearas <gaozifeng@selectdb.com>
2025-02-07 11:21:47 +08:00
af55eba242 branch-2.1: [opt](hive docker)Exit on creating table failed #47390 (#47453) 2025-01-26 17:28:20 +08:00
7c9d64d79a [opt](iceberg docker)Add health check for iceberg rest container (#46767) (#47422) 2025-01-25 09:04:27 +08:00
5f2438aeab branch-2.1: [opt](docker)Add healthy check for ES and Kafka #47362 (#47414)
Cherry-picked from #47362

Co-authored-by: Thearas <gaozifeng@selectdb.com>
2025-01-25 09:00:50 +08:00
407d04fab5 branch-2.1: [opt](docker)Replace healthy container with --wait #47357 (#47421)
Cherry-picked from #47357

Co-authored-by: Thearas <gaozifeng@selectdb.com>
2025-01-25 08:31:15 +08:00
baaf026e82 [fix](hive docker)Reserve host port for hive2 namenode and datanode (#47262) (#47354)
Problem Summary:

The [External hive CI](http://43.132.222.7:8111/buildConfiguration/Doris_External_Regression/612304?buildTab=log&linesState=3650&logView=flowAware)
failed because of a `namenode` error (port 50070 already in use). Docker
logs:
```txt
2025-01-21T04:22:37.955682469Z java.net.BindException: Port in use: 0.0.0.0:50070
2025-01-21T04:22:37.955686106Z 	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:940)
2025-01-21T04:22:37.955689402Z 	at org.apache.hadoop.http.HttpServer2.start(HttpServer2.java:876)
2025-01-21T04:22:37.955692708Z 	at org.apache.hadoop.hdfs.server.namenode.NameNodeHttpServer.start(NameNodeHttpServer.java:142)
2025-01-21T04:22:37.955697828Z 	at org.apache.hadoop.hdfs.server.namenode.NameNode.startHttpServer(NameNode.java:760)
2025-01-21T04:22:37.955701444Z 	at org.apache.hadoop.hdfs.server.namenode.NameNode.initialize(NameNode.java:639)
2025-01-21T04:22:37.955704831Z 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:819)
2025-01-21T04:22:37.955708237Z 	at org.apache.hadoop.hdfs.server.namenode.NameNode.<init>(NameNode.java:803)
2025-01-21T04:22:37.955711674Z 	at org.apache.hadoop.hdfs.server.namenode.NameNode.createNameNode(NameNode.java:1500)
2025-01-21T04:22:37.955715090Z 	at org.apache.hadoop.hdfs.server.namenode.NameNode.main(NameNode.java:1566)
2025-01-21T04:22:37.955718446Z Caused by: java.net.BindException: Address already in use
2025-01-21T04:22:37.955722013Z 	at sun.nio.ch.Net.bind0(Native Method)
2025-01-21T04:22:37.955725460Z 	at sun.nio.ch.Net.bind(Net.java:433)
2025-01-21T04:22:37.955729227Z 	at sun.nio.ch.Net.bind(Net.java:425)
2025-01-21T04:22:37.955733074Z 	at sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
2025-01-21T04:22:37.955736600Z 	at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
2025-01-21T04:22:37.955740197Z 	at org.mortbay.jetty.nio.SelectChannelConnector.open(SelectChannelConnector.java:216)
2025-01-21T04:22:37.955743884Z 	at org.apache.hadoop.http.HttpServer2.openListeners(HttpServer2.java:934)
2025-01-21T04:22:37.955747391Z 	... 8 more
2025-01-21T04:22:37.961686454Z 25/01/21 04:22:37 INFO util.ExitUtil: Exiting with status 1
```

The best option would be to keep the services off ports in the
`/proc/sys/net/ipv4/ip_local_port_range` range (32768-60999). But since
the namenode [hardcodes the exposed port `50070` in the docker
image](https://hub.docker.com/layers/bde2020/hadoop-datanode/2.0.0-hadoop2.7.4-java8/images/sha256-5623fca5e36d890983cdc6cfd29744d1d65476528117975b3af6a80d99b3c62f),
we add the port to `net.ipv4.ip_local_reserved_ports` and introduce a
new flag `--reserve-ports` to control it (default false, because not
everyone wants to modify the system's reserved ports).
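
The collision can be checked mechanically. The sketch below is a minimal illustration that assumes the range file holds two whitespace-separated bounds, as in the 32768-60999 example above. `PortRangeSketch`, `parseRange` and `isEphemeral` are hypothetical names, and port 8020 is used only as an arbitrary out-of-range comparison.

```java
// Minimal sketch; the class and method names are assumptions for illustration.
class PortRangeSketch {
    // Parses the two whitespace-separated bounds of an ip_local_port_range value.
    static int[] parseRange(String content) {
        String[] fields = content.trim().split("\\s+");
        if (fields.length != 2) {
            throw new IllegalArgumentException("expected two bounds, got: " + content);
        }
        return new int[] {Integer.parseInt(fields[0]), Integer.parseInt(fields[1])};
    }

    // True when the port may be handed out by the kernel as a local (ephemeral) port.
    static boolean isEphemeral(int port, int[] range) {
        return port >= range[0] && port <= range[1];
    }

    public static void main(String[] args) {
        int[] range = parseRange("32768\t60999");
        System.out.println(isEphemeral(50070, range)); // prints true
        System.out.println(isEphemeral(8020, range));  // prints false
    }
}
```

Because 50070 falls inside the range, any outgoing connection on the host can occupy it before the namenode binds, which is why reserving the port helps.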

Change-Id: I03a81e9931cb555695199436b6f0517cccf83588
2025-01-24 16:12:03 +08:00
3aad9e5f67 [opt](oceanbase docker)Use LTS docker image and print unhealthy docker logs (#46647) (#47349)
### What problem does this PR solve?

Problem Summary:
The Oceanbase container sometimes fails to start.
<img width="653" alt="image" src="https://github.com/user-attachments/assets/d95c66cf-7e04-4179-a565-9b9dd8b87128" />

We do two things:
1. Print the last 100 lines of docker logs of the unhealthy container for debugging
2. Upgrade the Oceanbase docker image to the newest `4.2.1-lts`, since it is
7 months newer than `4.2.1` and more stable
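
Keeping only the last lines of a log can be sketched with a bounded queue. This is an illustration of the idea, not the script's actual mechanism, which is not shown here (on the command line, `docker logs --tail 100 <container>` gives the same result). `LogTailSketch` and `lastLines` are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Minimal sketch; names are assumptions for illustration.
class LogTailSketch {
    // Returns at most `limit` trailing lines, in their original order.
    static List<String> lastLines(Iterable<String> lines, int limit) {
        if (limit <= 0) {
            return new ArrayList<>();
        }
        Deque<String> window = new ArrayDeque<>();
        for (String line : lines) {
            if (window.size() == limit) {
                window.removeFirst(); // drop the oldest line once the window is full
            }
            window.addLast(line);
        }
        return new ArrayList<>(window);
    }

    public static void main(String[] args) {
        List<String> log = List.of("boot", "init", "error: timeout", "exit 1");
        System.out.println(lastLines(log, 2)); // prints [error: timeout, exit 1]
    }
}
```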
2025-01-24 11:22:02 +08:00
4bd55b2f8b branch-2.1: [Opt](external-docker) Modify kerberos network mode to host #47043 (#47095)
Cherry-picked from #47043

Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>
2025-01-16 23:12:05 +08:00
13fa4ea2ee branch-2.1 [Opt](docker) kerberos docker healthy check (#46662) (#46858)
#46662

Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>
2025-01-13 15:38:17 +08:00
c016eb49c5 [enhance](mtmv)When obtaining the partition list fails, treat the paimon table as an unpartitioned table (#46641) (#46708)

pick: https://github.com/apache/doris/pull/46641
2025-01-10 10:46:09 +08:00
72cdedc47f branch-2.1: [opt](iceberg docker)Use PostgreSQL as the backend for the Iceberg REST server. #46289 (#46576)
Cherry-picked from #46289

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2025-01-09 22:30:03 +08:00
eddea8b309 [opt](hive docker)Parallel put hive data (#46571) (#46682)
Problem Summary:
Put the `tpch1.db`, `paimon1` and `tvf_data` hive data in parallel. This
reduces the time cost from 22m to 16m on a 16C machine.
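
The pattern of starting independent uploads together and waiting for all of them can be sketched as follows. The real change lives in the docker setup scripts, so this is only an illustration of the idea. `ParallelPutSketch`, `putDataset` and `putAll` are hypothetical, and `putDataset` is a placeholder that performs no upload.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Minimal sketch; names are assumptions for illustration.
class ParallelPutSketch {
    // Placeholder for uploading one dataset; returns the name when "done".
    static String putDataset(String name) {
        return name;
    }

    // Submits every dataset at once, then waits for all of them to finish.
    static List<String> putAll(List<String> datasets) {
        ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, datasets.size()));
        try {
            List<Future<String>> pending = new ArrayList<>();
            for (String name : datasets) {
                Callable<String> task = () -> putDataset(name);
                pending.add(pool.submit(task));
            }
            List<String> done = new ArrayList<>();
            for (Future<String> future : pending) {
                done.add(future.get()); // blocks until this upload has finished
            }
            return done;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException("a dataset upload failed", e);
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(putAll(List.of("tpch1.db", "paimon1", "tvf_data")));
    }
}
```

Waiting on every future means a failed upload surfaces as an exception instead of being silently skipped.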

Change-Id: Ib75c57d397ce1f96d5108d4b570bcb215f31d421
2025-01-09 14:08:35 +08:00
3bc70876c4 branch-2.1: [fix](test) Optimize the health check after oceanbase docker starts #46434 (#46599)
Cherry-picked from #46434

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2025-01-08 20:29:40 +08:00
4d0037a928 branch-2.1: [fix](ES catalog)Fix query long value exception with doc_value #46554 (#46581)
Cherry-picked from #46554

Co-authored-by: qiye <luen@selectdb.com>
2025-01-08 15:26:58 +08:00
5d2930e783 [fix](shellcheck) fix hive-metastore and enable shellcheck in docker (#46496) (#46574)
cherry-pick (#46496)

Co-authored-by: Socrates <suyiteng@selectdb.com>
2025-01-08 11:10:34 +08:00
d8c94d6392 branch-2.1: [fix](regression)fix hive translation unstable case. #46385 (#46409)
Cherry-picked from #46385

Co-authored-by: daidai <changyuwei@selectdb.com>
2025-01-04 08:59:56 +08:00
02239e4fb2 branch-2.1: [chore](regression) do not hard code S3 bucket and endpoint of hive t… #46159 (#46169)
Cherry-picked from #46159

Co-authored-by: zgxme <zhenggaoxiong@selectdb.com>
2024-12-31 11:44:36 +08:00
6dd92be33d [feature](statistics)Support get row count for pg and sql server. (#42674) (#46131)
backport: https://github.com/apache/doris/pull/42674
2024-12-29 19:37:21 +08:00
a380f5d222 [enhancement](utf8)import enable_text_validate_utf8 session var (#45537) (#46070)
bp #45537
2024-12-28 10:05:03 +08:00
303557ac70 [fix](hive)fix hive insert only transaction table. (#45753)
### What problem does this PR solve?
bp #44001 , but no hive4 acid table.

Problem Summary:
1. Fixed the issue that reading insert-only transactional tables skipped
the acid check, which caused data to be read multiple times (i.e., data
from the previous base_n was read as well).
2. Creating, inserting data into, and deleting acid tables is now forbidden.
2024-12-22 21:23:21 +08:00
19c0e89da7 [enhancement](iceberg)support read iceberg partition evolution table. (#45367) (#45569)
cherry-pick #45367

Co-authored-by: daidai <changyuwei@selectdb.com>
2024-12-20 08:56:51 +08:00
7d32e4f71f branch-2.1: [Fix](ORC) Not push down fixed char type in orc reader #45484 (#45525)
cherry-pick #45484
2024-12-19 14:06:00 +08:00
ea24410faf [enhancement][docker] fix kafka docker issue (#45091) 2024-12-06 14:36:57 +08:00
702abbff0f [Opt](orc)Optimize the merge io when orc reader read multiple tiny stripes. (#42004) (#44239)
bp #42004

Co-authored-by: kaka11chen <kaka11.chen@gmail.com>
2024-11-22 11:01:41 +08:00
3136fa48a6 branch-2.1: [chore](ci) adjust some invalid url #44261 (#44270)
Cherry-picked from #44261

Co-authored-by: Dongyang Li <lidongyang@selectdb.com>
2024-11-19 19:28:04 +08:00
83b74827aa branch-2.1: [fix](iceberg)Fix count(*) error with dangling delete problem #44039 (#44101)
Cherry-picked from #44039

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2024-11-19 17:19:25 +08:00
efb3bdd96e [fix](test) fix clickhouse jdbc catalog func push down case #43196 (#44151)
cherry pick from #43196

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-11-18 18:03:10 +08:00
48e33bfb2a branch-2.1: [fix](hive)Fixed the issue of reading hive table with empty lzo files #43979 (#44063)
Cherry-picked from #43979

Co-authored-by: wuwenchi <wuwenchi@selectdb.com>
2024-11-16 16:14:50 +08:00
4531cd86e3 branch-2.1: [fix](regression-test) add checks for existence and successful upload of data files in hive-metastore.sh #43853 (#43888)
Cherry-picked from #43853

Co-authored-by: Socrates <suyiteng@selectdb.com>
2024-11-14 11:23:23 +08:00
a1ff02288f branch-2.1: [fix](hive) support query hive view created by spark (#43553)
Cherry-picked from #43530

Co-authored-by: Mingyu Chen (Rayner) <morningman@163.com>
Co-authored-by: morningman <yunyou@selectdb.com>
2024-11-11 23:28:53 +08:00
cdd32d9582 [enhance](hive) support reading hive table with OpenCSVSerde #42257 (#42940)
cherry pick from #42257

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-10-31 11:12:07 +08:00
fce4695f37 [Configuration](transactional-hive) Add skip_checking_acid_version_file session var to skip checking acid version file in some hive envs. (#42111)(#42225) (#42939)
cherry-pick (#42111)(#42225)

---------

Co-authored-by: Qi Chen <kaka11.chen@gmail.com>
2024-10-31 09:52:20 +08:00
2defa90be7 [test](ES Catalog)Add mapping _routing test case (#42074) (#42282)
## Proposed changes

bp #42074
2024-10-23 10:14:12 +08:00
157d67e7ca [enhance](hive) Add regression-test cases for hive text ddl and hive text insert and fix reading null string bug #42200 (#42273)
cherry pick from #42200

Co-authored-by: Socrates <suxiaogang223@icloud.com>
2024-10-22 23:56:57 +08:00
38e529cd29 [cherry-pick](branch-2.1) support decimal256 for parquet reader (#42241)
## Proposed changes
pick pr: https://github.com/apache/doris/pull/41526
2024-10-22 19:42:09 +08:00
c1d2b8d548 [2.1][improvement](jdbc catalog) Disallow non-constant type conversion pushdown and implicit conversion pushdown (#42242)
pick (#42102)

Add a variable `enable_jdbc_cast_predicate_push_down`, default false,
which prohibits the pushdown of non-constant predicates with type
conversion and of all predicates with implicit conversion. Before this
change, predicates with cast were not pushed down to the Jdbc data
source correctly, which could produce wrong query results; the new
default prevents that. If you find that the data was read correctly and
the performance was better before this change, you can manually set this
variable to true.

```
| Expression                                          | Can Push Down |
|-----------------------------------------------------|---------------|
| column type equals const type                       | Yes           |
| column type equals cast const type                  | Yes           |
| cast column type equals const type                  | No            |
| cast column type equals cast const type             | No            |
| column type not equals const type                   | No            |
| column type not equals cast const type              | No            |
| cast column type not equals const type              | No            |
| cast column type not equals cast const type         | No            |

```
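
One reading of the table is that, with the variable off, a predicate is pushed down only when the column itself is not wrapped in a cast and its type equals the type of the constant after any cast on the constant. The sketch below encodes that reading. It is an interpretation of the table, not the Doris implementation: `CastPushDownSketch` and `canPushDown` are hypothetical names, and treating the enabled variable as "always push down" is an assumption.

```java
// Interpretation of the table above; names and the flag semantics are assumptions.
class CastPushDownSketch {
    static boolean canPushDown(boolean enableCastPushDown,
                               boolean columnIsCast,
                               boolean typesEqual) {
        if (enableCastPushDown) {
            return true; // assumed: opting in restores the earlier, permissive behavior
        }
        // Default: no cast on the column side, and matching types on both sides.
        return !columnIsCast && typesEqual;
    }

    public static void main(String[] args) {
        System.out.println(canPushDown(false, false, true));  // column type equals const type: true
        System.out.println(canPushDown(false, true, true));   // cast column type equals const type: false
        System.out.println(canPushDown(false, false, false)); // column type not equals const type: false
    }
}
```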
2024-10-22 17:27:29 +08:00
a32ad0b1f7 [cherry-pick](branch-2.1) support reading brotli compressed parquet file (#42162)
pick pr: https://github.com/apache/doris/pull/41875
2024-10-21 16:48:09 +08:00
a150d160ea [fix](jdbc catalog) fix and add mysql and doris extremum test #41679 (#42122)
cherry pick from #41679

---------

Co-authored-by: zy-kkk <zhongyk10@gmail.com>
2024-10-21 16:39:40 +08:00
1b901f6fcc [cherry-pick](branch-2.1) add parquet tvf cases and fix some parquet bug (#41931)
## Proposed changes
pick pr:
  https://github.com/apache/doris/pull/41683
  https://github.com/apache/doris/pull/41506
  https://github.com/apache/doris/pull/41338
  https://github.com/apache/doris/pull/39326

---------

Co-authored-by: morningman <morningman@163.com>
2024-10-17 14:20:58 +08:00
4888c632f4 [cherry-pick](branch2.1) support escape.delim and serialization.null.format for hive text (#41684)
## Proposed changes
pick from master:
https://github.com/apache/doris/pull/40291
2024-10-15 00:08:23 +08:00