Commit Graph

7972 Commits

Author SHA1 Message Date
Pxl
ca64fa7954 [Bug](materialized-view) do not check key/value column when index is dup or mow (#32695)
2024-03-27 02:56:11 +08:00
ad688171f5 [fix](Nereids): handle distinct and nullable property when rewriting sum literal (#32778)
Handle DISTINCT and nullable inputs when rewriting the sum of a column plus a literal, for example on table `t`:

select sum(v + 1), sum(distinct v + 1) from t
=>
select sum(v) + count(v), sum(distinct v) + count(distinct v) from t
2024-03-26 20:36:07 +08:00
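The identity behind ad688171f5 can be checked with a small Python model. It assumes standard SQL semantics: `SUM` and `COUNT(expr)` skip NULLs, `SUM` over no non-null input is NULL, and `+` propagates NULL. The helpers `sql_sum`, `sql_count`, `add`, and `plus_one` are hypothetical, and `None` stands for NULL.

```python
from typing import List, Optional

Value = Optional[int]  # None models SQL NULL


def sql_sum(values: List[Value], distinct: bool = False) -> Value:
    """SUM(expr): skips NULLs and returns NULL when no non-null input exists."""
    non_null = [v for v in values if v is not None]
    if distinct:
        non_null = list(set(non_null))
    return sum(non_null) if non_null else None


def sql_count(values: List[Value], distinct: bool = False) -> int:
    """COUNT(expr): counts non-null values only."""
    non_null = [v for v in values if v is not None]
    return len(set(non_null)) if distinct else len(non_null)


def add(a: Value, b: Value) -> Value:
    """SQL '+': NULL if either operand is NULL."""
    return None if a is None or b is None else a + b


def plus_one(values: List[Value]) -> List[Value]:
    """Evaluate v + 1 row by row."""
    return [None if v is None else v + 1 for v in values]


v = [1, 2, 2, None]

# sum(v + 1)  ->  sum(v) + count(v)
print(sql_sum(plus_one(v)), add(sql_sum(v), sql_count(v)))  # 8 8

# sum(distinct v + 1)  ->  sum(distinct v) + count(distinct v)
print(sql_sum(plus_one(v), distinct=True),
      add(sql_sum(v, distinct=True), sql_count(v, distinct=True)))  # 5 5
```

Using `count(v)` rather than `count(*)` is what keeps the rewrite correct for a nullable `v`: with the data above `count(*)` is 4 and the result would be 9. When every `v` is NULL, both sides are NULL, because `sum(v)` is NULL and NULL + 0 stays NULL.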
b5a1914740 [Fix](nereids) Fix deletestmt getting catalog (#32701) 2024-03-26 20:29:03 +08:00
820bf7e590 [fix](mtmv) use min value to check the first partition (#32765) 2024-03-26 20:28:52 +08:00
3658dfd500 [enhance](auth)node priv can show proc (#32751) 2024-03-26 20:28:00 +08:00
1b6b92a19d [improvement](mtmv) Support hll function roll up when query rewrite by materialized view (#32431)
Support HLL roll-up. The supported HLL functions are as follows:

+-----------------------------------------------------------------------------------------------------------------------------------------------------+
|                      in query                    |                          in materialized view                           |      rolled up to      |
+ ------------------------------------------------ + ----------------------------------------------------------------------- + ---------------------- +
| HLL_UNION_AGG(hll column)                        | hll_union(column) or hll_raw_agg(column) as column1                     | HLL_UNION_AGG(column1) |
| HLL_RAW_AGG(hll column) or HLL_UNION(hll column) |                                                                         | HLL_UNION(column)      |
| approx_count_distinct(not hll column)            | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 | HLL_UNION_AGG(column1) |
| HLL_UNION_AGG(HLL_HASH(column))                  |                                                                         | HLL_UNION_AGG(column)  |
| hll_cardinality(hll_union(HLL_HASH(column)))     | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 |                        |
| hll_cardinality(hll_raw_agg(HLL_HASH(column)))   | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 |                        |
| HLL_RAW_AGG(HLL_HASH(column))                    | hll_union(HLL_HASH(column)) or hll_raw_agg(HLL_HASH(column)) as column1 | HLL_RAW_AGG(column1)   |
+-----------------------------------------------------------------------------------------------------------------------------------------------------+
2024-03-26 20:26:16 +08:00
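The roll-up in 1b6b92a19d relies on HLL sketches being mergeable: the union of two sketches is their register-wise maximum, which equals the sketch built from the union of the inputs. The toy Python sketch below illustrates only that property. It is not Doris's HLL storage format, and the precision `P = 4` and the SHA-1-based hashing are arbitrary assumptions.

```python
import hashlib
from typing import Iterable, List

P = 4          # assumed precision: 2**P registers
M = 1 << P


def _hash64(value: str) -> int:
    """Deterministic 64-bit hash: the first 8 bytes of SHA-1."""
    return int.from_bytes(hashlib.sha1(value.encode()).digest()[:8], "big")


def hll_build(values: Iterable[str]) -> List[int]:
    """Each register keeps the largest rank (leading zeros + 1) seen in its bucket."""
    regs = [0] * M
    for v in values:
        h = _hash64(v)
        idx = h >> (64 - P)                      # top P bits pick the register
        rest = h & ((1 << (64 - P)) - 1)         # remaining 64 - P bits
        rank = (64 - P) - rest.bit_length() + 1  # leading zeros in `rest`, plus one
        regs[idx] = max(regs[idx], rank)
    return regs


def hll_union(a: List[int], b: List[int]) -> List[int]:
    """Merging two sketches is a register-wise maximum."""
    return [max(x, y) for x, y in zip(a, b)]


# A materialized view might keep one sketch per day; a query over both days
# can union the stored sketches instead of rescanning the raw values.
day1 = hll_build(["u1", "u2", "u3"])
day2 = hll_build(["u3", "u4"])
assert hll_union(day1, day2) == hll_build(["u1", "u2", "u3", "u3", "u4"])
```

Because the merge is a plain `max`, it is associative and commutative, so sketches can be combined in any grouping and the cardinality estimate is computed only once, from the merged registers.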
0655d49a21 [enhancement](nereids) only push having as agg's parent if having just use slots from agg's output (#32414)
1. Only push HAVING down as the aggregate's parent if HAVING uses only slots from the aggregate's output.
2. Show a user-friendly error message when an item in the select list is not in the aggregate node's output.
2024-03-26 20:26:04 +08:00
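The first rule in 0655d49a21 amounts to a subset check: a HAVING predicate may sit directly above the aggregate only if every slot it references is produced by the aggregate. A minimal sketch, assuming slots are represented by plain names; `can_push_having` is a hypothetical helper, not a Nereids API.

```python
from typing import Set


def can_push_having(having_slots: Set[str], agg_output: Set[str]) -> bool:
    """True when the HAVING predicate references only the aggregate's output slots."""
    return having_slots <= agg_output


# select a, sum(b) as s from t group by a having s > 10: HAVING only uses agg output
print(can_push_having({"s"}, {"a", "s"}))  # True
# a HAVING that references slot c, which the aggregate does not output
print(can_push_having({"c"}, {"a", "s"}))  # False
```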
60c3372d8e [fix](Nereids): fix eliminating join by PK-FK when there are multiple joins (#32703) 2024-03-26 20:24:39 +08:00
2d398dfe1f [fix](planner) fix wrong precision and scale of decimal types in set operations (#32787) 2024-03-26 20:22:29 +08:00
Pxl
fdf6b8fe8d [Bug](repeat) fix core dump caused by the output slot order on the repeat node not matching the pre-repeat exprs (#32662)
2024-03-26 20:22:20 +08:00
efe684572e [Enhancement](Load) Nereids supports http_stream and group_commit with stream load (#31259) 2024-03-26 17:22:42 +08:00
34cb83fb6e [feature](merge-cloud) Add init CloudEnv and CloudInternalCatalog (#29962) 2024-03-26 17:11:05 +08:00
f9ae03ac3c [feature](Nereids) support data masking policy (#32526)

note:
If a user sends the query
```sql
select name from tbl limit 1
```
and the user has a row policy on `tbl.name` with the filter `name = 'Beijing'` and a data masking policy on `tbl.name` with the masking `concat(substring(name, 1, 4), '****')`, we will rewrite the query to
```sql
select concat(substring(name, 1, 4), '****') as name
from tbl
where name = 'Beijing' -- note that this name is from tbl, not from the alias in the select list
limit 1
```

the result would be `Beij****`
2024-03-26 15:31:08 +08:00
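The rewrite in f9ae03ac3c can be modeled in a few lines: the row policy filters on the stored column value, and only the projected value is masked. The sketch assumes an in-memory list of rows with made-up contents, and uses Python slicing to stand in for SQL's 1-based `substring(name, 1, 4)`.

```python
rows = [{"name": "Beijing"}, {"name": "Shanghai"}]  # made-up table contents


def mask(name: str) -> str:
    """concat(substring(name, 1, 4), '****'); SQL positions 1..4 = Python [0:4]."""
    return name[0:4] + "****"


def run_query(rows, limit=1):
    # Row policy: WHERE name = 'Beijing' is evaluated on the stored column value.
    filtered = [r for r in rows if r["name"] == "Beijing"]
    # Masking policy: only the projected value is masked.
    return [{"name": mask(r["name"])} for r in filtered][:limit]


print(run_query(rows))  # [{'name': 'Beij****'}]
```

If the filter were evaluated on the masked alias instead, `'Beij****' = 'Beijing'` would never hold and the query would return no rows, which is why `name` in the WHERE clause must refer to the table column.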
0a2d7379fc [enhance](auth)row policy supports catalog and matches by name instead of id (#32310)
Follow up #32137

Store names instead of IDs in the metadata, so that dropping and re-creating a table (which changes its ID) does not break the policy.
2024-03-26 15:31:08 +08:00
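Why 0a2d7379fc keys policies on names can be shown with a toy policy store. The `Table` type, the `RowPolicyStore` class, and the IDs are hypothetical; they only illustrate that a re-created table gets a new ID but keeps its catalog, database, and table names.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple

Key = Tuple[str, str, str]  # (catalog, db, table)


@dataclass(frozen=True)
class Table:
    table_id: int
    catalog: str
    db: str
    name: str


class RowPolicyStore:
    """Hypothetical store that keys row policies by names, not by table ID."""

    def __init__(self) -> None:
        self._by_name: Dict[Key, str] = {}

    def add(self, table: Table, filter_sql: str) -> None:
        self._by_name[(table.catalog, table.db, table.name)] = filter_sql

    def lookup(self, table: Table) -> Optional[str]:
        return self._by_name.get((table.catalog, table.db, table.name))


store = RowPolicyStore()
store.add(Table(10001, "internal", "db1", "tbl"), "name = 'Beijing'")

# DROP TABLE + CREATE TABLE: same names, new ID; the policy still applies.
recreated = Table(10042, "internal", "db1", "tbl")
print(store.lookup(recreated))  # name = 'Beijing'
```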
c0d7a5660e [fix](paimon) support paimon with hive2 (#32455)
In order to support paimon with hive2, we need to modify the original HiveMetastoreClient.java
to make it compatible with both hive2 and hive3.
This modified HiveMetastoreClient must be at the front of the CLASSPATH so that
it overrides the HiveMetastoreClient in the hadoop jar.

This PR mainly changes:

1. Copy HiveMetastoreClient.java in FE to BE's preload jar.

2. Split the origin `preload-extensions-jar-with-dependencies.jar` into 2 jars
    1. `preload-extensions-project.jar`, which contains the modified HiveMetastoreClient.
    2. `preload-extensions-jar-with-dependencies.jar`, which contains other dependency jars.

3. Modify the `start_be.sh`, to let `preload-extensions-project.jar` be loaded first.

4. Change the way the JNI scanner jar is assembled
    Only the project jar needs to be assembled, without other dependencies,
    because we only use classes under the `org.apache.doris` package.
    Removing the other unused dependency jars also reduces the output size of BE.

5. Fix a bug where the prefix of paimon properties was `paimon` instead of `paimon.`

6. Support paimon with hive2
    Users can set `hive.version` in paimon catalog properties to specify the hive version.
2024-03-26 15:31:07 +08:00
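Item 5 of c0d7a5660e matters because a bare `paimon` prefix both matches unrelated keys and leaves a leading dot after stripping. A minimal sketch of prefix extraction, assuming catalog properties are a flat string map; apart from `hive.version`, the keys and values are illustrative and not a list of supported options.

```python
from typing import Dict


def extract_prefixed(props: Dict[str, str], prefix: str) -> Dict[str, str]:
    """Return the properties that start with `prefix`, with the prefix removed."""
    return {k[len(prefix):]: v for k, v in props.items() if k.startswith(prefix)}


props = {
    "paimon.warehouse": "hdfs://nn/warehouse",  # illustrative key and value
    "paimonx.unrelated": "1",                   # illustrative unrelated key
    "hive.version": "2.3.7",                    # illustrative version value
}

print(extract_prefixed(props, "paimon"))
# {'.warehouse': 'hdfs://nn/warehouse', 'x.unrelated': '1'}  (wrong)
print(extract_prefixed(props, "paimon."))
# {'warehouse': 'hdfs://nn/warehouse'}
```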
37c8cc040d [feature-wip](ranger)support datamask and row filter (#32137)
Doris Ranger supports data masking and row filters.

Hive Ranger supports row filters.
2024-03-26 15:31:07 +08:00
ec43f65235 [feature](hudi) support hudi incremental read (#32052)
* [feature](hudi) support incremental read for hudi table

* fix jdk17 java options
2024-03-26 15:31:07 +08:00
8714dde34f [minor] remove unused cluster code (#31360)
2024-03-26 15:31:07 +08:00
983e5df812 Fix compile (#32818)
[fix](compile) fix code style (#32819)

* Fix compile

* fix style
2024-03-26 15:02:46 +08:00
6f47055f5a [opt](profile) Disable show query/load profile stmt (#32467) (#32813) 2024-03-26 13:56:46 +08:00
6457a9a642 [fix](Nereids) system default decimalv3 scale should be 9 (#32754)
select round('1.1234', 2) should return 1.12, not 1
2024-03-26 10:43:49 +08:00
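The symptom fixed by 6457a9a642 can be reproduced with Python's `decimal` module, assuming the string literal is first cast to a DECIMALV3 type whose scale is the system default: with scale 0 the value is already `1` before `round` runs, while scale 9 keeps enough digits for `round(..., 2)`. Both helpers are hypothetical, and the rule that the result keeps at most the input's scale is an assumption made for illustration.

```python
from decimal import Decimal, ROUND_HALF_UP


def cast_to_decimal(s: str, scale: int) -> Decimal:
    """Model CAST(s AS DECIMALV3(_, scale)), rounding half up."""
    return Decimal(s).quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


def sql_round(d: Decimal, digits: int) -> Decimal:
    """Assumed: the result keeps at most the input's scale."""
    scale = min(-d.as_tuple().exponent, digits)
    return d.quantize(Decimal(1).scaleb(-scale), rounding=ROUND_HALF_UP)


print(sql_round(cast_to_decimal("1.1234", 0), 2))  # 1     (digits lost by the cast)
print(sql_round(cast_to_decimal("1.1234", 9), 2))  # 1.12
```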
552bf5b41a [opt](jdbc catalog) close when jdbcClient is not empty (#32747) 2024-03-25 22:33:33 +08:00
0eb9256274 [Fix](TransientTask)Export tasks should only be run on the master node (#32700)
* [Fix](TransientTask)Export tasks should only be run on the master node
Add thread name

Export Task runs only on the master node, so it is necessary to explicitly start the corresponding resources. At the same time, refactor some code to avoid circular dependencies.

* TransientTaskManager is initialized twice. Therefore, the second initialization needs to be deleted.
2024-03-25 22:33:02 +08:00
0f5e13e17c [fix](Nereids) IGNORE_STORAGE_DATA_DISTRIBUTION should not block generating filter for nested loop join (#32653) 2024-03-25 22:33:02 +08:00
97a48ca6b7 [fix](mtmv)fix mysql MTMV do not automatically refresh even if the refresh method is COMPLETE (#32683) 2024-03-25 22:32:18 +08:00
b9ac5b3d5b [bugfix](hive)use originHiveKeys for hive partitionvalue (#32664) 2024-03-25 22:31:38 +08:00
1b0cd4a4db [branch-2.1](routine-load) self-adaption backoff timeout (#32734)
* [opt](routine-load) self-adaption backoff timeout (#32227)

* [fix](routine-load) fix timeout backoff can not work (#32661)
2024-03-24 18:39:58 +08:00
b9788e5e37 [fix] (Nereids) fix date function rewrite on datetimev1 column bug (#32569) 2024-03-24 08:07:01 +08:00
5c3fc818bb [Fix](Export) fix variable name of code #32637 2024-03-24 08:07:01 +08:00
5077f7dd9d [conf](mysql) opt mysql network timeout to 600s #32545 2024-03-24 08:07:01 +08:00
1d4e5a1c58 [enhance](auth)enable col auth (#32659) 2024-03-24 08:07:01 +08:00
bd2e7d0ec8 [feature](hive)support insert overwrite (#32610)
support insert overwrite for unpartitioned table and partitioned table.

issue: #31442
2024-03-24 08:06:13 +08:00
ac8ba43d8d [fix](multi-catalog)resolve hive meta store compatibility for different version issues (#32551)
Fix hive list partition at different version.
Only Hive3 uses the prependCatalogToDbName() to wrap db_name.
2024-03-24 08:06:13 +08:00
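A version switch of the kind described in ac8ba43d8d can be sketched as follows. The `@<catalog>#<db>` encoding is assumed to mirror what Hive 3's `prependCatalogToDbName()` produces and should be treated as an assumption; `db_name_for_metastore` is a hypothetical helper.

```python
def db_name_for_metastore(hive_major_version: int, catalog: str, db: str) -> str:
    """Wrap the database name only for Hive 3 metastores.

    Per the commit message, only Hive 3 uses prependCatalogToDbName(); the
    '@<catalog>#<db>' format used here is an assumption. Older metastores get
    the plain database name.
    """
    if hive_major_version == 3:
        return f"@{catalog}#{db}"
    return db


print(db_name_for_metastore(2, "hive", "sales"))  # sales
print(db_name_for_metastore(3, "hive", "sales"))  # @hive#sales
```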
f62cdecc08 [fix](Nereids) do not push down topn-filter through right/full outer join if the first orderkey is nulls first (#32633)
* do not push down topn-filter through right/full outer join if the first order key is nulls first
2024-03-24 08:06:13 +08:00
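The reasoning behind f62cdecc08 can be seen in a small simulation. Pruning rows on the NULL-supplying (left) side of a right outer join can turn matched right rows into NULL-padded rows. In this example, with NULLS LAST those rows sort after the rows already in the top N, but with NULLS FIRST they move to the front and change the result. The join, the sort, and the filter threshold below are simplified, hypothetical stand-ins, not the Nereids implementation.

```python
from typing import List, Optional, Tuple

Row = Tuple[Optional[int], int]  # (left.k, right.k); None models NULL


def right_outer_join(left: List[int], right: List[int]) -> List[Row]:
    """Keep every right row; pad unmatched ones with NULL on the left side."""
    out: List[Row] = []
    for r in right:
        matches = [lk for lk in left if lk == r]
        if matches:
            out.extend((lk, r) for lk in matches)
        else:
            out.append((None, r))
    return out


def top_n(rows: List[Row], n: int, nulls_first: bool) -> List[Row]:
    """ORDER BY left.k ASC NULLS FIRST/LAST LIMIT n."""
    def key(row: Row):
        k = row[0]
        if k is None:
            return (0, 0) if nulls_first else (1, 0)
        return (1, k) if nulls_first else (0, k)
    return sorted(rows, key=key)[:n]


left, right = [1, 5], [1, 5]
# Hypothetical top-n filter "left.k <= 1", derived once the heap holds k = 1,
# pushed down to the left child of the join.
pruned_left = [k for k in left if k <= 1]

for nulls_first in (True, False):
    expected = top_n(right_outer_join(left, right), 1, nulls_first)
    pushed = top_n(right_outer_join(pruned_left, right), 1, nulls_first)
    print(nulls_first, expected, pushed)
# True [(1, 1)] [(None, 5)]   <- wrong result with NULLS FIRST
# False [(1, 1)] [(1, 1)]
```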
3d14d9e379 [fix](Nereids) fix bind having aggregate failed again (#32687)
follow up #32490

Add more tests and fix some cases where SQL statements that are valid in MySQL failed in Doris.
2024-03-24 08:06:13 +08:00
c223d9e7d0 [Fix](Test)Reduce the Sleep time and ensure that the corresponding thread resources are released after the case is executed. (#32665)
* [Fix](Test)Reduce the Sleep time and ensure that the corresponding thread resources are released after the case is executed.
* Fix load job id error
2024-03-24 08:06:13 +08:00
8b960beaec [refactor](nereids)unify outputTupleDesc and projection (#32093)
* unify join node output project
2024-03-24 08:05:42 +08:00
8b34915518 [Fix](compress) Fix occasional crashes when serializing blocks (#32672) 2024-03-23 06:20:45 +08:00
8514fabe16 Revert "[fix](routine-load) fix timeout backoff can not work (#32661)" (#32709)
This reverts commit 0d0f787d3e9901192a403d5eb61ea58c8ea17a8e.

Co-authored-by: stephen <hello-stephen@qq.com>
2024-03-22 22:27:37 +08:00
8af666b192 [branch-2.1](routine-load) enhance auto resume to keep routine load stable (#32689)
* enhance auto resume to keep routine load stable

* do not auto resume if job cannot resume definitely (#32419)
2024-03-22 18:07:12 +08:00
62c7d0a421 [Fix](point query) add query options for short circuit queries (#32530) (#32684)
Some options, such as `be_exec_version`, are needed by functions.
2024-03-22 18:03:18 +08:00
0d0f787d3e [fix](routine-load) fix timeout backoff can not work (#32661) 2024-03-22 16:38:52 +08:00
326a264fcd [Improvement](executor)Add spill property for workload group #32554 2024-03-22 16:38:19 +08:00
f443d6de85 [Fix](variant) filter with variant access may cause partition/tablet pruning to fall through (#32560)
A query like `select * from ut_p partitions(p2) where cast(var['a'] as int)  > 0` falls through partition/tablet pruning, since its plan looks like
```
mysql> explain analyzed plan select * from ut_p where id = 3 and cast(var['a'] as int) = 789;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner)                                                                                                                                            |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalResultSink[26] ( outputExprs=[id#0, var#1] )                                                                                                                        |
| +--LogicalProject[25] ( distinct=false, projects=[id#0, var#1], excepts=[] )                                                                                               |
|    +--LogicalFilter[24] ( predicates=((cast(var#4 as INT) = 789) AND (id#0 = 3)) )                                                                                         |
|       +--LogicalFilter[23] ( predicates=(0 = __DORIS_DELETE_SIGN__#2) )                                                                                                    |
|          +--LogicalProject[22] ( distinct=false, projects=[id#0, var#1, __DORIS_DELETE_SIGN__#2, __DORIS_VERSION_COL__#3, element_at(var#1, 'a') AS `var`#4], excepts=[] ) |
|             +--LogicalOlapScan ( qualified=regression_test_variant_p0.ut_p, indexName=<index_not_selected>, selectedIndexId=10145, preAgg=ON )                             |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
```
with an extra LogicalProject on top of LogicalOlapScan, so this case must be handled to prune partitions/tablets.
2024-03-22 16:38:19 +08:00
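Handling the extra project in f443d6de85 amounts to looking through it: aliases such as `var#4` are replaced by the expressions they stand for before searching for predicates that only touch pruning columns. A minimal sketch, assuming predicates and projections are plain strings and that `id#0` is the pruning column; real Nereids expressions are trees, and `substitute` and `pruning_predicates` are hypothetical helpers.

```python
import re
from typing import Dict, List, Set

SLOT = re.compile(r"\w+#\d+")  # slot references such as id#0 or var#4


def substitute(pred: str, project: Dict[str, str]) -> str:
    """Replace project aliases with the expressions they stand for."""
    return SLOT.sub(lambda m: project.get(m.group(0), m.group(0)), pred)


def pruning_predicates(preds: List[str], project: Dict[str, str],
                       prune_slots: Set[str]) -> List[str]:
    """Predicates that, after looking through the project, use only pruning slots."""
    resolved = [substitute(p, project) for p in preds]
    return [p for p in resolved if set(SLOT.findall(p)) <= prune_slots]


project = {"var#4": "element_at(var#1, 'a')"}     # alias from LogicalProject[22]
preds = ["id#0 = 3", "cast(var#4 as INT) = 789"]  # conjuncts of LogicalFilter[24]
print(pruning_predicates(preds, project, {"id#0"}))  # ['id#0 = 3']
```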
e41311d77d [bug](fold) fix fold constant core dump with variant type (#32265)
1. The variant type core dumps when calling the `get_data_at` function, because this function is not implemented for it.
2. Some cases could not pass with the old planner when fold_constant_by_be = on.
3. Set enable_fold_constant_by_be = true.
2024-03-22 16:37:33 +08:00
8a6fc79797 [fix](routine-load) avoid routine load pause for check transaction status fail (#32638) 2024-03-22 16:36:49 +08:00
6812b575b2 [fix](Nereids) fix bind having aggregate failed (#32490)
Fix binding HAVING over aggregates, and keep the behavior consistent with MySQL.
2024-03-22 16:36:46 +08:00
1c521cd94e [fix](backup) clear snapshotInfos and backupMeta when cancel (#32646) 2024-03-22 16:36:46 +08:00
a10466598b [fix](jdbc catalog) Fix query errors without jdbc pool default value on only BE upgrade (#32618) 2024-03-22 16:36:22 +08:00
01a5413e45 [fix](Nereids) filter-limit-project translate to wrong plan (#32496) 2024-03-22 16:35:47 +08:00