2724 Commits

Author SHA1 Message Date
1f1932c6b7 [enhancement](nereids)add some date functions for constant fold (#32772) 2024-04-10 11:34:30 +08:00
814e4ed3ec [fix](nereids)partition prune should consider <=> operator (#32965) 2024-04-10 11:34:30 +08:00
97a2977f2a [improvement](executor)Add tag property for workload group #32874 2024-04-10 11:34:29 +08:00
e980cd3e7f [feature](Nereids): add ColumnPruningPostProcessor. (#32800) 2024-04-10 11:34:29 +08:00
26e86d53a4 [enhance](mtmv)support olap table partition column is null (#32698) 2024-04-10 11:34:29 +08:00
bb8bc75af4 [feature](agg) add aggregate function sum0 (#32541) 2024-04-10 11:34:29 +08:00
2a0644f442 [Fix](function) Fix unix_timestamp core for string input (#32871) 2024-04-09 12:48:35 +08:00
ebbfb06162 [Bug](array) fix array column core dump in get_shrinked_column as not check type (#33295)
* [Bug](array) fix array column core dump in get_shrinked_column as not check type

* add function could_shrinked_column
2024-04-08 07:27:40 +08:00
1b3e4322e8 [improvement](serde) Handle NaN values in number for MySQL result write (#33227) 2024-04-07 23:24:23 +08:00
fae55e0e46 [Feature](information_schema) add processlist table for information_schema db (#32511) 2024-04-07 23:24:22 +08:00
29556f758e [fix](parquet) fix time zone error in parquet reader (#33217)
`isAdjustedToUTC` was handled in exactly the opposite way in the parquet reader (https://github.com/apache/parquet-format/blob/master/LogicalTypes.md); as a result, the time with `isAdjustedToUTC=true` is increased by eight hours (UTC+8).

The parquet with `isAdjustedToUTC=true` can be produced by spark-sql with the following configuration:
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=TIMESTAMP_MICROS
```

However, with the following configuration there is no logical or converted type in the parquet metadata, so the time read by Doris is also increased by eight hours (UTC+8). Users need to set the time zone to UTC themselves in Doris (https://doris.apache.org/docs/dev/advanced/time-zone/)
```
--conf spark.sql.session.timeZone=UTC
--conf spark.sql.parquet.outputTimestampType=INT96
```
2024-04-07 23:24:22 +08:00
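The Parquet logical types document linked in the commit above distinguishes two timestamp semantics: with `isAdjustedToUTC=true` the stored value is an instant normalized to UTC, and with `isAdjustedToUTC=false` it is a local wall-clock time that should be displayed unchanged. The following is a minimal sketch of that distinction only, not Doris's reader code; the function names `to_micros` and `decode_timestamp_micros` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_micros(naive_utc):
    """Hypothetical helper: microseconds since the epoch for a naive datetime read as UTC."""
    delta = naive_utc.replace(tzinfo=timezone.utc) - EPOCH_UTC
    return delta // timedelta(microseconds=1)


def decode_timestamp_micros(micros, is_adjusted_to_utc, session_tz):
    """Return the wall-clock datetime a reader would display in session_tz."""
    if is_adjusted_to_utc:
        # Instant semantics: the value is relative to the UTC epoch,
        # so it is converted into the session time zone for display.
        instant = EPOCH_UTC + timedelta(microseconds=micros)
        return instant.astimezone(session_tz).replace(tzinfo=None)
    # Local semantics: the value is already a wall-clock time and is
    # displayed identically in every session time zone.
    return datetime(1970, 1, 1) + timedelta(microseconds=micros)


if __name__ == "__main__":
    utc8 = timezone(timedelta(hours=8))
    stored = to_micros(datetime(2024, 4, 7, 0, 0, 0))
    print(decode_timestamp_micros(stored, True, utc8))
    print(decode_timestamp_micros(stored, False, utc8))
```

Under these assumptions, the same stored value is displayed eight hours later in a UTC+8 session only when it carries instant semantics.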
d9d950d98e [fix](iceberg) fix iceberg predicate conversion bug (#33283)
Follow-up to #32923

Some cases are not covered in #32923
2024-04-07 22:12:38 +08:00
190763e301 [bugfix](iceberg)Convert the datetime type in the predicate according to the target column (#32923)
Convert the datetime type in the predicate according to the target column.
And add a testcase for #32194
related #30478 #30162
2024-04-07 22:12:33 +08:00
62699c8eea [improve](function) the offset params in lead/lag function could use 0 (#33174) 2024-04-07 12:58:03 +08:00
797b8fa456 [FIX](agg) fix vertical_compaction_reader for agg table with array/map type (#33130) 2024-04-03 18:09:45 +08:00
425c00a0d1 [fix](agg) incorrect result with having conjuncts and limit (#33040) 2024-03-30 10:14:44 +08:00
9d6fb39573 [regression-test](Variant) add order by to make test stable (#33014) (#33039) 2024-03-29 17:25:26 +08:00
5d576b41d7 [opt](invert index) use lowercase by default #32405 (#32940) 2024-03-29 14:37:40 +08:00
3a196c8b0f [Pick](Variant) pick 2 prs about bugfix of variant (#33011)
* [Fix](Variant) forbid table with variant type doing segment compaction temporarily

TODO fix this correctly in later works

* [Bug](Variant) use lower case name for variant's root, since backend treats parent column as lower case

This PR addresses the problem below:
```
errCode = 2, detailMessage = (172.16.56.137)[CANCELLED]failed to initialize storage reader. tablet=17136, res=[INTERNAL_ERROR]Not found field_name, field_name:Tags.tag_key1, schema:[Thread(8), Tags(9), Source(5), tags.tag_key1(-1), Title(6), Level(3), Time(2), CreateDate(1), Message(7), IP(4), AppId(0)]

```
2024-03-29 11:12:28 +08:00
71e16e6f35 [fix](iceberg) fix iceberg catalog bug and p2 test cases (#32898)
1. Fix iceberg catalog bug

    PR #30198 changed the logic of `IcebergHMSExternalCatalog.java`
    to get locationUrl by calling hive metastore's `getCatalog()` method.
    But this method only exists in hive 3+, so it fails when using hive 2.x.

    This logic is temporarily removed, because it is only used for iceberg table writing,
    which is still under development. We will rethink this logic later.

2. Fix test cases

    Some P2 test cases were missing `order_qt`. And because the output format of the floating point
    type has changed, some results in the `out` files need to be regenerated.
2024-03-27 20:44:38 +08:00
Pxl
ca64fa7954 [Bug](materialized-view) do not check key/value column when index is dup or mow (#32695)
do not check key/value column when index is dup or mow
2024-03-27 02:56:11 +08:00
ad688171f5 [fix](Nereids): handle distinct and nullable property when rewriting sum literal (#32778)
Handle distinct aggregation and nullable values when rewriting the sum of 'v + 1' over table 't' into a sum and a count of 'v', for example:

select sum(v + 1), sum(distinct v + 1) from t
=>
select sum(v) + count(v), sum(distinct v) + count(distinct v) from t
2024-03-26 20:36:07 +08:00
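The distinct half of the rewrite in ad688171f5 can be checked on a small column. This is a minimal sketch that models SQL aggregates over a Python list, with `None` standing in for NULL; the helper names are hypothetical and this is not the Nereids rule itself.

```python
def sum_distinct(values):
    """SQL SUM(DISTINCT ...): drops NULLs and duplicates; NULL if nothing is left."""
    kept = {v for v in values if v is not None}
    return sum(kept) if kept else None


def count_distinct(values):
    """SQL COUNT(DISTINCT ...): number of distinct non-NULL values."""
    return len({v for v in values if v is not None})


def count_all(values):
    """SQL COUNT(col): number of non-NULL values, duplicates included."""
    return sum(1 for v in values if v is not None)


def add_nullable(left, right):
    """SQL '+': the result is NULL if either operand is NULL."""
    return None if left is None or right is None else left + right


v = [1, 1, None, 3]
# Left side of the rewrite: sum(distinct v + 1)
direct = sum_distinct([None if x is None else x + 1 for x in v])
# Right side of the rewrite: sum(distinct v) + count(distinct v)
rewritten = add_nullable(sum_distinct(v), count_distinct(v))
# What would happen if the count lost its DISTINCT: sum(distinct v) + count(v)
mixed = add_nullable(sum_distinct(v), count_all(v))
print(direct, rewritten, mixed)
```

The two sides agree because adding a constant keeps distinct values distinct, whereas pairing `sum(distinct v)` with a plain `count(v)` counts the duplicate once too often.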
b5a1914740 [Fix](nereids) Fix deletestmt getting catalog (#32701) 2024-03-26 20:29:03 +08:00
1b6b92a19d [improvement](mtmv) Support hll function roll up when query rewrite by materialized view (#32431)
Support HLL roll up. The supported HLL functions are as follows:

+-----------------------------------------------------------------------------------------------------------------------------------------------------+
|                      in query                    |                          in materialized view                           |       rolled up        |
+ ------------------------------------------------ + ----------------------------------------------------------------------- + ---------------------- +
| HLL_UNION_AGG(hll column)                        | hll_union(column) or hll_raw_agg(column) as column1                     | HLL_UNION_AGG(column1) |
| HLL_RAW_AGG(hll column) or HLL_UNION(hll column) |                                                                         | HLL_UNION(column)      |
| approx_count_distinct(not hll column)            | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 | HLL_UNION_AGG(column1) |
| HLL_UNION_AGG(HLL_HASH(column))                  |                                                                         | HLL_UNION_AGG(column)  |
| hll_cardinality(hll_union(HLL_HASH(column)))     | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 |                        |
| hll_cardinality(hll_raw_agg(HLL_HASH(column)))   | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 |                        |
| HLL_RAW_AGG(HLL_HASH(column))                    | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 | HLL_RAW_AGG(column1)   |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
2024-03-26 20:26:16 +08:00
0655d49a21 [enhancement](nereids) only push having as agg's parent if having just use slots from agg's output (#32414)
1. only push having as agg's parent if having just use slots from agg's output
2. show user friendly error message when item in select list but not in aggregate node's output
2024-03-26 20:26:04 +08:00
60c3372d8e [fix](Nereids): fix eliminate join by pkfk when there are multiple joins (#32703) 2024-03-26 20:24:39 +08:00
Pxl
fdf6b8fe8d [Bug](repeat) fix core dump because output slot order on repeat node does not match pre-repeat exprs (#32662)
fix core dump because the output slot order on the repeat node does not match the pre-repeat exprs
2024-03-26 20:22:20 +08:00
0a44de67bf [bug](distinct agg) fix distinct streaming agg not output all data (#32760)
fix distinct streaming agg not output all data
2024-03-26 20:19:36 +08:00
ad2d20348a [fix](pipeline) fix use error row desc when origin block clear #32803 (#32849)
* fix

* add case
2024-03-26 20:02:46 +08:00
c0d7a5660e [fix](paimon) support paimon with hive2 (#32455)
In order to support paimon with hive2, we need to modify the original HiveMetastoreClient.java
to make it compatible with both hive2 and hive3.
This modified HiveMetastoreClient should be at the front of the CLASSPATH, so that
it takes precedence over the HiveMetastoreClient in the hadoop jar.

This PR mainly changes:

1. Copy HiveMetastoreClient.java in FE to BE's preload jar.

2. Split the origin `preload-extensions-jar-with-dependencies.jar` into 2 jars
    1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient.
    2. `preload-extensions-jar-with-dependencies.jar`, which contains other dependency jars.

3. Modify the `start_be.sh`, to let `preload-extensions-project.jar` be loaded first.

4. Change the way the jni scanner jar is assembled
    Only the project jar needs to be assembled, without other dependencies,
    because we actually only use classes under the `org.apache.doris` package.
    Removing the other unused dependency jars also reduces the output size of BE.

5. fix bug that the prefix of paimon properties should be `paimon.`, not `paimon`

6. Support paimon with hive2
    User can set `hive.version` in paimon catalog properties to specify the hive version.
2024-03-26 15:31:07 +08:00
ec43f65235 [feature](hudi) support hudi incremental read (#32052)
* [feature](hudi) support incremental read for hudi table

* fix jdk17 java options
2024-03-26 15:31:07 +08:00
7b94cfdba1 Revert "[Fix](tests) add regression tests for trino-connector (#32552)"
This reverts commit 3fc3a4650681cb519405730899a2f22f268b38c1.
2024-03-25 22:38:21 +08:00
ff0da8108b [fix](RF) fix 'Invalid value' error of RF of decimal type (#32749) 2024-03-25 22:34:19 +08:00
3fc3a46506 [Fix](tests) add regression tests for trino-connector (#32552) 2024-03-25 22:31:55 +08:00
b9788e5e37 [fix] (Nereids) fix date function rewrite on datetimev1 column bug (#32569) 2024-03-24 08:07:01 +08:00
3d14d9e379 [fix](Nereids) fix bind having aggregate failed again (#32687)
follow up #32490

add more tests and fix some cases, because some SQL statements are valid in mysql but failed in doris
2024-03-24 08:06:13 +08:00
35e580ec7a [fix](RF) fix 'Invalid value' error of RF of datetimev2 type for max value (#32649) 2024-03-22 22:29:50 +08:00
326a264fcd [Improvement](executor)Add spill property for workload group #32554 2024-03-22 16:38:19 +08:00
f443d6de85 [Fix](variant) filter with variant access may lead to partition/tablet prune fall through (#32560)
A query like `select * from ut_p partitions(p2) where cast(var['a'] as int)  > 0` will fall through partition/tablet pruning, since its plan looks like
```
mysql> explain analyzed plan select * from ut_p where id = 3 and cast(var['a'] as int) = 789;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalResultSink[26] ( outputExprs=[id#0, var#1] )                                                                                                                        |
| +--LogicalProject[25] ( distinct=false, projects=[id#0, var#1], excepts=[] )                                                                                               |
|    +--LogicalFilter[24] ( predicates=((cast(var#4 as INT) = 789) AND (id#0 = 3)) )                                                                                         |
|       +--LogicalFilter[23] ( predicates=(0 = __DORIS_DELETE_SIGN__#2) )                                                                                                    |
|          +--LogicalProject[22] ( distinct=false, projects=[id#0, var#1, __DORIS_DELETE_SIGN__#2, __DORIS_VERSION_COL__#3, element_at(var#1, 'a') AS `var`#4], excepts=[] ) |
|             +--LogicalOlapScan ( qualified=regression_test_variant_p0.ut_p, indexName=<index_not_selected>, selectedIndexId=10145, preAgg=ON )                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
```
with an extra LogicalProject on top of LogicalOlapScan, so we should handle this case in order to prune partitions/tablets
2024-03-22 16:38:19 +08:00
6812b575b2 [fix](Nereids) fix bind having aggregate failed (#32490)
fix bind having aggregate failed, keep the behavior like mysql
2024-03-22 16:36:46 +08:00
01a5413e45 [fix](Nereids) filter-limit-project translate to wrong plan (#32496) 2024-03-22 16:35:47 +08:00
4de8775e17 [feat](Nereids): rewrite sum literal to sum and count (#32244)
sum(v + 2) => sum(v) + 2*count(v)
sum(v - 2) => sum(v) - 2*count(v)
2024-03-22 16:35:47 +08:00
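The identity behind 4de8775e17 is that every non-NULL row contributes the literal exactly once, so the literal can be factored out as `literal * count(v)`. A minimal sketch that checks this on a small nullable column, with `None` standing in for NULL; the helper names are hypothetical and the code only models SQL aggregate semantics, it is not the optimizer rule.

```python
def sql_sum(values):
    """SQL SUM: skips NULLs; returns NULL (None) when no non-NULL value exists."""
    kept = [v for v in values if v is not None]
    return sum(kept) if kept else None


def sql_count(values):
    """SQL COUNT(col): number of non-NULL values."""
    return sum(1 for v in values if v is not None)


def sum_with_literal_direct(column, k):
    """Evaluate sum(v + k) row by row; NULL + k stays NULL."""
    return sql_sum([None if v is None else v + k for v in column])


def sum_with_literal_rewritten(column, k):
    """Evaluate sum(v) + k * count(v); NULL + anything stays NULL."""
    total = sql_sum(column)
    return None if total is None else total + k * sql_count(column)


if __name__ == "__main__":
    column = [1, None, 4, 4]
    for k in (2, -2):
        print(k, sum_with_literal_direct(column, k), sum_with_literal_rewritten(column, k))
```

Using `count(v)` rather than `count(*)` matters here: NULL rows are skipped by `sum`, so they must not contribute the literal either.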
3f36aa2d48 [chore](Nereids) remove ensure project on top join (#32562) 2024-03-22 16:35:47 +08:00
39382a9774 [fix](Nereids): just pull up alias project above join through topn (#32305) 2024-03-22 16:35:47 +08:00
66336e59e6 [fix](join) the result of left semi join with empty right side should be false, not null (#32477) 2024-03-22 16:35:43 +08:00
0f343e914a Revert "[case](Cloud) Add ssb case for hdfs vault (#32567)"
This reverts commit 60a673979e5b2f98e9cc66c77fa0d6c61b8ed0f7.
2024-03-22 15:26:09 +08:00
ebfb3418f9 [test](mtmv)Add tpch test cherry pick to branch 21 (#32611)
* [test](nereids) Add tpch test for query rewrite by materialized view (#30870)

The tpch queries that can be rewritten by materialized view are as follows:
q1, q5, q6, q8, q9, q12, q14
The others are not supported yet and will be supported later

* change code usage
2024-03-22 15:20:38 +08:00
ea71472d64 [fix](build index) fix core when build index for a new column which without data (#32550) (#32669)
Co-authored-by: Luennng <luennng@gmail.com>
Co-authored-by: Tanya-W <tanya1218w@163,com>
2024-03-22 15:05:19 +08:00
23c12fd68f [fix](join) core caused by null-safe-equal join (#32623) 2024-03-22 08:53:47 +08:00
d3bdda6071 [fix](partial update) fix data correctness risk when load delete sign data into a table with sequence col (#32574) 2024-03-22 08:52:38 +08:00