Commit Graph

22 Commits

e898dbbba0 branch-2.1: [fix](mc) Fix an issue where the MaxCompute catalog could read only part of the timestamp data #49600 (#49706)
Cherry-picked from #49600

Co-authored-by: daidai <changyuwei@selectdb.com>
2025-04-01 17:09:15 +08:00
738dbf0347 branch-2.1: [enhancement](maxcompute) Support the MaxCompute TIMESTAMP column type. #48768 (#48842)
Cherry-picked from #48768

Co-authored-by: daidai <changyuwei@selectdb.com>
2025-03-10 17:38:20 +08:00
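A minimal sketch related to the timestamp support in the two commits above, assuming the MaxCompute TIMESTAMP arrives as an epoch-nanosecond value; the conversion helper and the UTC zone are illustrative assumptions, not Doris internals.

```java
import java.time.Instant;
import java.time.LocalDateTime;
import java.time.ZoneOffset;

public class TimestampConvert {
    // Convert an epoch-nanosecond timestamp to a LocalDateTime.
    // floorDiv/floorMod keep pre-1970 (negative) values correct.
    public static LocalDateTime fromEpochNanos(long nanos) {
        long seconds = Math.floorDiv(nanos, 1_000_000_000L);
        long nanoAdjustment = Math.floorMod(nanos, 1_000_000_000L);
        return LocalDateTime.ofInstant(Instant.ofEpochSecond(seconds, nanoAdjustment), ZoneOffset.UTC);
    }

    public static void main(String[] args) {
        // Prints 2023-11-14T22:13:20.123456789
        System.out.println(fromEpochNanos(1_700_000_000_123_456_789L));
    }
}
```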
6f8d2bfa88 branch-2.1: [enhancement](mc) Optimize MaxCompute Arrow reads by skipping repeated isNull checks #45989 (#46023)
Cherry-picked from #45989

Co-authored-by: daidai <changyuwei@selectdb.com>
2024-12-26 20:54:19 +08:00
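A minimal sketch of the optimization in the commit above, assuming Apache Arrow's Java API: when a vector reports zero nulls, take a fast path that skips the per-value isNull() lookup entirely. The sum method is illustrative.

```java
import org.apache.arrow.vector.IntVector;

public class SkipRepeatedNullCheck {
    public static long sum(IntVector vector) {
        long total = 0;
        if (vector.getNullCount() == 0) {
            // Fast path: no value can be null, so skip the validity-buffer lookup.
            for (int i = 0; i < vector.getValueCount(); i++) {
                total += vector.get(i);
            }
        } else {
            // Slow path: check the validity buffer for each value.
            for (int i = 0; i < vector.getValueCount(); i++) {
                if (!vector.isNull(i)) {
                    total += vector.get(i);
                }
            }
        }
        return total;
    }
}
```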
b332217584 [enhancement](mc) Add network config to the MC catalog (#44194) (#45149)
bp #44194
2024-12-07 09:52:19 -08:00
8c0f73cb90 [Enhancement](MaxCompute) Refactor the MaxCompute catalog using the Storage API (#40225, #40888, #41386) (#41610)
bp #40225, #40888, #41386

## Proposed changes
Among them, #40225 introduces the new MaxCompute Storage API,
#40888 fixes a bug in null handling between the new and old APIs,
and #41386 provides compatibility between the new and old versions.
2024-10-11 11:55:41 +08:00
e4c48cfec5 [fix](multi-catalog) Fix reading MaxCompute tables with null partitions (#40046) (#40179)
bp #40046

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-08-30 22:56:08 +08:00
67a8099991 [fix](multi-catalog) Fix MaxCompute array and map type read offsets (#39822)
bp #39680
2024-08-23 16:53:52 +08:00
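A minimal sketch of the kind of offset handling the fix above concerns, assuming Arrow's Java ListVector: the element range for row i must come from the offset buffer (offsets[i] to offsets[i+1]), not from a running counter, otherwise a null or empty row shifts every later read.

```java
import org.apache.arrow.vector.IntVector;
import org.apache.arrow.vector.complex.ListVector;

public class ListOffsets {
    // Print the elements of one list row, using the vector's own offsets.
    public static void printRow(ListVector list, int row) {
        int start = list.getElementStartIndex(row); // offsets[row]
        int end = list.getElementEndIndex(row);     // offsets[row + 1]
        IntVector data = (IntVector) list.getDataVector();
        for (int i = start; i < end; i++) {
            Integer value = data.isNull(i) ? null : data.get(i);
            System.out.println(value);
        }
    }
}
```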
9c6180d9ba [revert](jni) revert part of #32455 #32904 2024-03-27 20:45:44 +08:00
c0d7a5660e [fix](paimon) support paimon with hive2 (#32455)
To support Paimon with Hive 2, we need to modify the original HiveMetastoreClient.java
to make it compatible with both Hive 2 and Hive 3.
This modified HiveMetastoreClient must be at the front of the CLASSPATH so that
it overrides the HiveMetastoreClient in the hadoop jar (see the sketch after this entry).

This PR mainly changes:

1. Copy HiveMetastoreClient.java from FE to BE's preload jar.

2. Split the original `preload-extensions-jar-with-dependencies.jar` into 2 jars:
    1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient.
    2. `preload-extensions-jar-with-dependencies.jar`, which contains other dependency jars.

3. Modify `start_be.sh` so that `preload-extensions-project.jar` is loaded first.

4. Change the way the JNI scanner jar is assembled.
    Only the project jar needs to be assembled, without other dependencies,
    because we only use classes under the `org.apache.doris` package.
    Removing the unused dependency jars also reduces the BE output size.

5. Fix a bug: the prefix of Paimon properties should be `paimon.`, not `paimon`.

6. Support Paimon with Hive 2.
    Users can set `hive.version` in the Paimon catalog properties to specify the Hive version.
2024-03-26 15:31:07 +08:00
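A small way to verify the CLASSPATH ordering described above: the JVM resolves a class from the first classpath entry that contains it, so printing where HiveMetaStoreClient was loaded from shows whether the modified copy in `preload-extensions-project.jar` actually won. The checker class itself is an illustrative assumption.

```java
import java.net.URL;

public class ClasspathOrder {
    public static void main(String[] args) {
        // The class loader returns the first matching entry on the CLASSPATH,
        // so this prints the jar the client class was actually loaded from
        // (or null if it is not on the CLASSPATH at all).
        String resource = "org/apache/hadoop/hive/metastore/HiveMetaStoreClient.class";
        URL url = ClasspathOrder.class.getClassLoader().getResource(resource);
        System.out.println("HiveMetaStoreClient resolved from: " + url);
    }
}
```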
4c8aaa156a [fix](jni) remove 'push_down_predicates' and fix BE crash with decimal predicate (#32253) (#32599) 2024-03-21 14:07:50 +08:00
0b5b7175d6 [fix](multi-catalog) add max compute custom odps and tunnel url (#31390)
Add custom MaxCompute ODPS and Tunnel URLs.
2024-02-29 16:44:40 +08:00
1706699e7e [fix](multi-catalog)support the max compute partition prune (#27154)
1. MaxCompute partition prune.
Previously we only supported filtering MC partitions by '=', which selects just one partition.
To filter multiple partitions and support range operators ('>', '<', '>=', ...), partition pruning is needed (see the sketch after this entry).

2. Add a MaxCompute row-count cache and a partitionValues cache.

3. Add a MaxCompute regression case.
2023-12-01 22:28:26 +08:00
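A hypothetical sketch of the range-based pruning described in item 1; the names are illustrative, not Doris internals. Instead of matching a single partition with '=', evaluate a predicate over all partition values and keep the survivors.

```java
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

public class PartitionPrune {
    // Keep only the partition values that satisfy the predicate.
    public static List<String> prune(List<String> partitionValues, Predicate<String> predicate) {
        return partitionValues.stream().filter(predicate).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> parts = List.of("dt=20231101", "dt=20231115", "dt=20231201");
        // A range predicate like dt >= '20231115' keeps two of the three partitions;
        // lexicographic comparison works here because the dates are fixed-width.
        System.out.println(prune(parts, p -> p.substring(3).compareTo("20231115") >= 0));
    }
}
```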
add6bdb240 [fix](multi-catalog)add the max compute fe ut and fix download expired (#27007)
1. Add MaxCompute FE unit tests and fix expired downloads.
2. Fix a memory leak when the allocator closes (see the sketch after this entry).
3. Report correct partition row counts.
2023-11-20 10:42:07 +08:00
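A minimal sketch of the allocator-lifetime fix in item 2, assuming Arrow's Java allocator: close vectors and allocators with try-with-resources so buffers are released even on error paths; RootAllocator.close() throws if any buffer is still outstanding.

```java
import org.apache.arrow.memory.BufferAllocator;
import org.apache.arrow.memory.RootAllocator;
import org.apache.arrow.vector.IntVector;

public class AllocatorLifetime {
    public static void main(String[] args) {
        // Resources close in reverse order: the vector releases its buffers
        // before the allocator closes, so close() finds nothing outstanding.
        try (BufferAllocator allocator = new RootAllocator();
             IntVector vector = new IntVector("v", allocator)) {
            vector.allocateNew(4);
            vector.setSafe(0, 42);
            vector.setValueCount(1);
        }
    }
}
```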
18c2a13e09 [fix](multi-catalog)fix maxcompute partition filter and session creation (#24911)
Add MaxCompute partition support.
Fix the MaxCompute partition filter.
Modify the MaxCompute session creation method.
2023-10-17 22:36:10 +08:00
4816ca6679 [fix](multi-catalog)fix mc decimal type parse, fix wrong obj location (#24242)
1. The MC DECIMAL type needs to be parsed correctly via the Arrow vector method (see the sketch after this entry).
2. Fix wrong object locations when using oss, obs, or cosn.

Test cases will be added in another PR.
2023-09-15 17:44:56 +08:00
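A minimal sketch of item 1, assuming Arrow's Java DecimalVector: read DECIMAL values through the vector's accessor, which applies the column's scale, rather than hand-parsing the raw bytes. The helper is illustrative.

```java
import java.math.BigDecimal;
import org.apache.arrow.vector.DecimalVector;

public class ReadDecimal {
    // getObject() decodes the fixed-width value with the vector's scale applied.
    public static BigDecimal at(DecimalVector vector, int index) {
        return vector.isNull(index) ? null : vector.getObject(index);
    }
}
```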
fca34ec337 [fix](multi-catalog) Support the bit type and hide the MC secret key (#24124)
Support the MaxCompute bit type and mask the MC secret key.
The BOOLEAN type uses the Arrow bit vector (see the sketch after this entry).
The secret key should be masked: closes #24019.
2023-09-12 10:36:48 +08:00
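A minimal sketch of the bit-vector handling mentioned above, assuming Arrow's Java BitVector backs the MC BOOLEAN column: get() returns 0 or 1, so map it to a boolean explicitly. The helper is illustrative.

```java
import org.apache.arrow.vector.BitVector;

public class ReadBool {
    // BitVector stores booleans as single bits; get() yields 0 or 1.
    public static Boolean at(BitVector vector, int index) {
        return vector.isNull(index) ? null : (vector.get(index) == 1);
    }
}
```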
7d488688b4 [fix](multi-catalog)fix minio default region and throw minio error msg, support s3 bucket root path (#21994)
1. Check the MinIO region, set a default region if the user does not provide one, and surface the MinIO error message (see the sketch after this entry).
2. Support reading the bucket root path, e.g. s3://bucket1.
3. Fix MaxCompute public access.
2023-07-20 20:48:55 +08:00
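A minimal sketch of the defaulting in item 1; the `us-east-1` fallback is an assumption based on MinIO's conventional default, and the helper name is illustrative.

```java
public class MinioRegion {
    // Fall back to a default region when the user leaves it unset,
    // instead of failing the client build with a confusing error.
    public static String resolveRegion(String userRegion) {
        return (userRegion == null || userRegion.isEmpty()) ? "us-east-1" : userRegion;
    }
}
```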
9adbca685a [opt](hudi) use spark bundle to read hudi data (#21260)
Use the spark-bundle to read Hudi data instead of the hive-bundle.

**Advantages** of using the spark-bundle to read Hudi data:
1. The spark-bundle performs more than twice as fast as the hive-bundle.
2. The spark-bundle's `UnsafeRow` reduces data copying and JVM GC time.
3. The spark-bundle supports `Time Travel`, `Incremental Read`, and `Schema Change`; these features can be quickly ported to Doris.

**Disadvantages** of using the spark-bundle to read Hudi data:
1. More dependencies make hudi-dependency.jar much heavier (from 138M to 300M).
2. The spark-bundle only provides an `RDD` interface and cannot be used directly.
2023-07-04 17:04:49 +08:00
a6b51ec19a [Feature](avro) Support Apache Avro file format (#19990)
Support reading Avro files via `hdfs()` or `s3()`:
```sql
select * from s3(
         "uri" = "http://127.0.0.1:9312/test2/person.avro",
         "ACCESS_KEY" = "ak",
         "SECRET_KEY" = "sk",
         "FORMAT" = "avro");
+--------+--------------+-------------+-----------------+
| name   | boolean_type | double_type | long_type       |
+--------+--------------+-------------+-----------------+
| Alyssa |            1 |     10.0012 | 100000000221133 |
| Ben    |            0 |    5555.999 |      4009990000 |
| lisi   |            0 | 5992225.999 |      9099933330 |
+--------+--------------+-------------+-----------------+

select * from hdfs(
                "uri" = "hdfs://127.0.0.1:9000/input/person2.avro",
                "fs.defaultFS" = "hdfs://127.0.0.1:9000",
                "hadoop.username" = "doris",
                "format" = "avro");
+--------+--------------+-------------+-----------+
| name   | boolean_type | double_type | long_type |
+--------+--------------+-------------+-----------+
| Alyssa |            1 |  8888.99999 |  89898989 |
+--------+--------------+-------------+-----------+
```

The current Avro reader only supports common data types; complex data types will be supported later.
2023-06-28 21:15:35 +08:00
2b3c82f57a [fix](multi-catalog)fix max compute scanner OOM and datetime (#20957)
1. Fix the MC JNI scanner OOM.
2. Add a second datetime type for the MC SDK timestamp.
3. Make the s3 URI case insensitive along the way (see the sketch after this entry).
4. Optimize the MaxCompute scanner parallelism model.
2023-06-26 13:53:29 +08:00
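A minimal sketch of item 3, assuming the table-function properties are held in a map; the structure is illustrative, not Doris internals. A case-insensitive map makes keys like `ACCESS_KEY`/`access_key` interchangeable, matching the mixed key casing in the `s3()` example above.

```java
import java.util.Map;
import java.util.TreeMap;

public class CaseInsensitiveProps {
    public static void main(String[] args) {
        Map<String, String> props = new TreeMap<>(String.CASE_INSENSITIVE_ORDER);
        props.put("ACCESS_KEY", "ak");
        props.put("URI", "s3://bucket1/person.avro");
        // Lookups ignore key case, so "uri" and "URI" resolve identically.
        System.out.println(props.get("access_key")); // ak
        System.out.println(props.get("uri"));        // s3://bucket1/person.avro
    }
}
```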
923f7edad0 [opt](hudi) using native reader to read the base file with no log file (#20988)
Two optimizations:
1. Insert string bytes directly, removing the decoding and encoding step.
2. Use the native reader to read a Hudi base file when it has no log file; use `explain` to show how many splits are read natively.
2023-06-20 11:20:21 +08:00
57656b2459 [Enhancement](java-udf) java-udf module split to sub modules (#20185)
The java-udf module has become increasingly large and difficult to manage, making it inconvenient to package and use as needed. It needs to be split into multiple sub-modules, such as java-common, java-udf, jdbc-scanner, hudi-scanner, and paimon-scanner.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-06-13 09:41:22 +08:00