Commit Graph

4203 Commits

Author SHA1 Message Date
fefc0d6814 [Fix](planner)fix create view ignore order by info bug. (#18197) 2023-03-30 20:17:46 +08:00
ce79ff947a [Enhancement](spark load)Support spark time out config (#17108) 2023-03-30 20:12:46 +08:00
99bd5ec022 [fix](Nereids) fix some bugs in Subquery to window rule (#18233)
we introduce this rule by PR #17968, but some corner case do
not be processed correctly. This PR fix these bugs:
1. fix window function generation method, replace inner slot with
   equivalent outer slot
2. forbid below scenes
    a. inner has a mapping project
    b. inner has an unexpected filter
    c. outer has a mapping project
    d. outer has an unexpected filter
    e. outer has additional table
    f. outer has same table
    g. outer and inner with different join condition
    h. outer and inner has same table with different join condition
2023-03-30 16:09:16 +08:00
ea41d94582 [Improve](complex-type) Support Count(complexType) (#17868)
Support count function for ARRAY/MAP/STRUCT type
2023-03-30 15:43:32 +08:00
b7af110f61 [Bug](bloomfilter) Fix bloom filter for date type (#18205) 2023-03-30 14:15:06 +08:00
Pxl
cec983b7ef [Chore](materialized-view) forbiden create mv with where clause contained aggregate column (#18168)
forbiden create mv with where clause contained aggregate column

create table a_table(
	k1 int null,
	k2 int not null,
    k3 bigint null,
	k4 bigint sum null,
    k5 bitmap bitmap_union null,
    k6 hll hll_union null
)
aggregate key (k1,k2,k3)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");
create materialized view where_1 as select k1,k4 from a_table where k4 =1; // invalid, mv on agg table need group by
create materialized view where_2 as select k1,sum(k4) from a_table where k4 =1 group by k1; // invalid, k4 is agg column
create materialized view where_2 as select k1,sum(k4) from a_table where k1+k4 =1 group by k1; // invalid, k4 is agg column
2023-03-30 13:03:03 +08:00
Pxl
c8ad62a3cd [Enchancement](materialized-view) enchance materialized view where clause match (#18179)
enchance materialized view where clause match
2023-03-30 13:02:21 +08:00
c8ea5bff1d [Fix](planner) fix nested udf bind arguments exception (#18188)
nested alias function will cause bind argument exception, sql like:
``` sql
CREATE ALIAS FUNCTION f1(DATETIMEV2(3), INT)
            with PARAMETER (datetime1, int1) as date_trunc(days_sub(datetime1, int1), 'day')

CREATE ALIAS FUNCTION f2(DATETIMEV2(3), int)
            with PARAMETER (datetime1, int1) as DATE_FORMAT(HOURS_ADD(
                date_trunc(datetime1, 'day'),
                add(multiply(floor(divide(HOUR(datetime1), divide(24,int1))), 1), 1)
            ), '%Y%m%d:%H')

select f2(f1(now(3), 2), 3)
```

bug in FunctionCallExpr#rewriteExpr(), the retExpr will be replaced to originExpr to change the alias function to builtin function, but the retExpr.fn is not null, so when return to outer scope, the fn will be covered. That's the example:
```
f1(f1()) -> date_trunc(days_sub(date_trunc(days_sub()))) is correct and
f1(f1()) -> date_trunc(days_sub(days_sub())) is bug.
``` 

we fix it.
2023-03-30 11:39:02 +08:00
b3657959c9 [fix](planner )need add LateralViewRef's id into TableRef's allTableRefIds (#18220)
1. add LateralViewRef's id into TableRef's allTableRefIds, so the caller won't miss LateralViewRef when trying to get all the tableref ids.
2. TableFunctionNode should use child node's output tuple id as the input tuple id
2023-03-30 11:32:18 +08:00
9c1aad06ea [Improve](query) improve column match performance by introducing a column name map in MaterializedIndexMeta (#18203)
improve column match performance by introducing a column name map in `MaterializedIndexMeta`
`getColumnByName` is slow due to the linear search process, using a map to speed up search.
2023-03-30 11:24:51 +08:00
525f15dddf [vectorized](function) support array_sortby function (#18071) 2023-03-30 11:07:49 +08:00
c0d55b54c0 [Improvement](statistics) Support for statistics removing and incremental collection (#18069)
* Support for removing statistics and incremental collection

* Fix syntax
2023-03-30 10:40:43 +08:00
58bc18af54 [Improvement](log) avoid warn log in specializeTemplateFunction at initialization (#18070)
avoid warn log in specializeTemplateFunction at initialization like this:

```
2023-03-23 21:17:54,931 INFO (leaderCheckpointer|89) [FunctionSet.specializeTemplateFunction():1337] specializeTemplateFunction exception at initialize
org.apache.doris.catalog.TypeException: ARRAY<DECIMALV3(9, 0)> is not MapType
        at org.apache.doris.catalog.MapType.specializeTemplateType(MapType.java:137) ~[fe-common-1.2-SNAPSHOT.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.specializeTemplateFunction(FunctionSet.java:1321) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.getFunction(FunctionSet.java:1251) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.getFunction(FunctionSet.java:1216) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.addBuiltinBothScalaAndVectorized(FunctionSet.java:1449) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.addScalarAndVectorizedBuiltin(FunctionSet.java:1432) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.builtins.ScalarBuiltins.initBuiltins(ScalarBuiltins.java:108) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.FunctionSet.init(FunctionSet.java:87) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.<init>(Env.java:585) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.catalog.Env.getCurrentEnv(Env.java:665) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.doCheckpoint(Checkpoint.java:143) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.master.Checkpoint.runAfterCatalogReady(Checkpoint.java:77) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.MasterDaemon.runOneCycle(MasterDaemon.java:58) ~[doris-fe.jar:1.2-SNAPSHOT]
        at org.apache.doris.common.util.Daemon.run(Daemon.java:116) ~[doris-fe.jar:1.2-SNAPSHOT]
```
2023-03-30 10:17:21 +08:00
55bf38dbab [feature-wip](MTMV) Use SSB ddl to test (#18150)
Add regression tests for MTMV.
2023-03-30 00:11:38 +08:00
58de8ec2df [enhance](Nereids): add variable to enable Bushy Tree (#18202) 2023-03-29 21:53:24 +08:00
6964d9f99c [fix](function) resubmit-fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17907)
* Revert "[fix](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)"

This reverts commit 397cc011c4f1ba5a25c770258c13f1cd3f28b47d.

* [fix-resubmit](function) fix AES/SM3/SM4 encrypt/ decrypt algorithm initialization vector bug (#17420)

ECB algorithm, block_encryption_mode does not take effect, it only takes effect when init vector is provided.
Solved: 192/256 supports calculation without init vector

For other algorithms, an error should be reported when there is no init vector

Initialization Vector. The default value for the block_encryption_mode system variable is aes-128-ecb, or ECB mode, which does not require an initialization vector. The alternative permitted block encryption modes CBC, CFB1, CFB8, CFB128, and OFB all require an initialization vector.

Reference: https://dev.mysql.com/doc/refman/8.0/en/encryption-functions.html#function_aes-decrypt

Note: This fix does not support smooth upgrades. during upgrade process, query may report error: funciton not found
2023-03-29 21:13:01 +08:00
f24174ebf1 when estimated rowCount is 0, adjust to 1 (#18174) 2023-03-29 19:03:53 +08:00
b92087dee8 [Fix](Nereids) ReorderJoin rule cannot process MarkJoin correctly (#18159)
Fix two problems,
1. The logical join containing the MarkJoinSlotRefrance column will generate a plan->MarkJoinSlotreference structure when reorderJoin is executed, and the MarkJoinSlotreference column will be restored after the reorder is completed. But when filter+crossJoin exists, it will be transformed into innerJoin in the rules, causing the map to fail, and the corresponding plan cannot be found, thus losing the MarkJoinSlotreference column.
2. Originally, the MarkJoinSlotReference column was used as the NonUserVisibleOutput of logicalJoin. At the same time, when logicalApply was generated, the added logicalProject did not include the MarkJoinSlotReference column, and the invalid logicalProject was deleted based on other rules, so as to ensure that LogicalApply was under the logicalFilter and could recognize the MarkJoinSlotReference column. But there will be problems if logicalProject cannot be deleted.

Repair method
1. For logicalJoin containing MarkJoinSlotreference, the rules of reorderJoin are not executed.
2. Use MarkJoinSlotreference as the output of logicalJoin and also as the output of LogicalApply.
3. When generating LogicalApply, if MarkJoinSlotreference is included, you need to add an additional logicalProject to logicalFilter, and remove the MarkJoinSlotreference column.

eg
```
logicalFilter(subquery with disconjunct)

after SubqueryToApply

logicalProject(without markJoinSlotReference)
+-- logicalFilter(markJoinSlotReference)
    +-- logicalProject(with markJoinSlotReference)
        +-- logicalApply()
```

```
SELECT * FROM sub_query_correlated_subquery1 WHERE k1 IN (SELECT k1 FROM sub_query_correlated_subquery3) OR k1 < 10;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String                                                                                                                                                                                     |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalProject[60] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                        |
| +--LogicalProject[59] ( distinct=false, projects=[k1#0, k2#1], excepts=[], canEliminate=true )                                                                                                     |
|    +--LogicalFilter[58] ( predicates=($c$1#7#false OR (k1#0 < 10)) )                                                                                                                               |
|       +--LogicalProject[57] ( distinct=false, projects=[k1#0, k2#1, $c$1#7#false], excepts=[], canEliminate=true )                                                                                 |
|          +--LogicalApply ( correlationSlot=[], correlationFilter=Optional.empty, isMarkJoin=true, MarkJoinSlotReference=$c$1#7#false, scalarSubCorrespondingSlot=empty )                           |
|             |--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery1, indexName=<index_not_selected>, selectedIndexId=63105, preAgg=ON )    |
|             +--LogicalProject[34] ( distinct=false, projects=[k1#2], excepts=[], canEliminate=true )                                                                                               |
|                +--LogicalOlapScan ( qualified=default_cluster:regression_test_nereids_syntax_p0.sub_query_correlated_subquery3, indexName=<index_not_selected>, selectedIndexId=63115, preAgg=ON ) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
```
2023-03-29 16:12:42 +08:00
Pxl
503c6bf38e [Chore](materialized-view) forbiden create mv with some constant expr and curdate() (#18145)
forbiden create mv with some constant expr and curdate()
2023-03-29 16:08:48 +08:00
f7f7958d35 [fix](bdbje) fix handle bdb RollbackException incorrectly (#17483) 2023-03-29 16:02:55 +08:00
545160a343 [refactor](planner) Separate the planning process for the legacy planner and Nereids (#17991)
1. separate the planning process for legacy planner and Nereids in StmtExecutor
2. add forward to master logic to Nereids
3. refactor Command process for Nereids, add run interface to Command
4. internal query could run on Nereids as normal query
5. fix CreatePolicyCommand syntax, let it exactlly same with legacy planner
6. let Nereids session variables forward to master
2023-03-29 11:36:38 +08:00
db25165498 [fix](nereids)move AdjustAggregateNullableForEmptySet before NormalizeAggregate (#18147) 2023-03-29 11:34:01 +08:00
Pxl
0c01df6bb2 [Bug](view) fix AES_ENCRYPT have wrong result on view (#18034) 2023-03-29 10:49:39 +08:00
Pxl
fd18e34c0c [Chore](planner) add error information for OnClause contain ExistsPredicates (#18090) 2023-03-29 10:47:41 +08:00
7e9e02a173 [Enhancement](auth)Desc table check col auth (#18114)
1.Change permission exception format
2.when desc table ,we show different cols by auth
3.delete unused code
2023-03-29 10:42:18 +08:00
05db6e9b55 [refactor](file-system)(step-2) remove env, file_utils and filesystem_utils (#18009)
Follow #17586.
This PR mainly changes:

Remove env/
Remove FileUtils/FilesystemUtils
Some methods are moved to LocalFileSystem
Remove olap/file_cache
Add s3 client cache for s3 file system
In my test, the time of open s3 file can be reduced significantly
Fix cold/hot separation bug for s3 fs.
This is the last PR of #17764.
After this, all IO operation should be in io/fs.

Except for tests in #17586, I also tested some case related to fs io:

clone
concurrency query on local/s3/hdfs
load error log create and clean
disk metrics
2023-03-29 09:00:52 +08:00
c3fe113894 rename PaloFe to DorisFE (#18167) 2023-03-29 00:30:16 +08:00
5d218388f3 [enhancement](stmt-forward) make fe follower err msg shown to client be consistent with master (#18180)
Found that RPC timeout is too short that RPC client will close before execute result is return.

Therefore, use a coefficient to prolong the RPC client timeout, so that it can wait for the real cause to be recieved.
2023-03-28 21:27:45 +08:00
012f7bd031 [feature](function)Add ST_Area function (#18138) 2023-03-28 19:36:09 +08:00
cff6a7195b [feature](Nereids): add bushy tree rule; (#18130) 2023-03-28 19:32:53 +08:00
b9161295b7 [Fix](plan) fix bug that the case sensibility of column name may impact join method (#17904)
Issue Number: close #17876
2023-03-28 15:18:30 +08:00
6bd2609294 [Enhancement](multi-catalog) add config for external meta cache loade… (#18117)
Add config for external cache-loader's max thread-pool size.
2023-03-28 15:10:19 +08:00
d7dcdfcba9 [Fix](Create View) support create view from tvf (#18087)
Support create view as select * from tvf()
2023-03-28 15:07:32 +08:00
d6339b36a4 [fix](Nereids): correct the order of pushdown semi rules. (#18148) 2023-03-28 14:20:07 +08:00
1956f04aa2 [feature](multi-catalog) add specified_database_list PROPERTY for jdbc/hms/iceberg catalog (#17803)
add specified_database_list PROPERTY for jdbc catalog, user can use many database specified by jdbc catalog
2023-03-28 14:04:41 +08:00
daeaa91dd6 [feature](function) support variadic template type in SQL function (#17985)
Inspired by c++ function `std::vector::emplace_back()`, we can use variadic template for this issue.

e.g.

```
[['struct'], 'STRUCT<TYPES>', ['TYPES'], 'ALWAYS_NOT_NULLABLE', ['TYPES...']]
```

`...TYPES` in template_types defines a variadic template `TYPE`.  Then the variadic template will be expanded to multiple normal templates based on actual input arguments at runtime in FE.

But make sure `TYPES...` is placed on the last position in all template type arguments.

BTW, the origin template function logic is not affected.
2023-03-28 11:08:24 +08:00
Pxl
9c1e86f84f [Bug](materialized-view) add some limit for create mv on aggregate table (#18141)
add some limit for create mv on aggregate table.
```sql
CREATE TABLE t1 (   
p1 INT,   
p2 INT,   
p3 INT,   
v1 INT SUM,   
v2 INT MAX,   
v3 INT MIN ) AGGREGATE KEY (p1, p2, p3) DISTRIBUTED BY HASH (p1) BUCKETS 1 PROPERTIES ('replication_num' = '1');


CREATE MATERIALIZED VIEW mv_1 AS SELECT p1, SUM(v3) FROM t1 GROUP BY p1;  // invalid aggregate type
CREATE MATERIALIZED VIEW mv_2 AS SELECT p1, MIN(v3+v3) FROM t1 GROUP BY p1; // invalid expression calculate on aggregate column
CREATE MATERIALIZED VIEW mv_3 AS SELECT p1, SUM(v1) FROM t1 GROUP BY p1; // cast v1 as bigint, ok
CREATE MATERIALIZED VIEW mv_4 AS SELECT p1, SUM(abs(v1)) FROM t1 GROUP BY p1; // invalid expression calculate on aggregate column

```
2023-03-28 10:28:29 +08:00
84c6f47e4f [Feature](Nereids) add WinMagic rule to rewrite scalar sub-query to window function (#17968)
refer paper: WinMagic - Subquery Elimination Using Window Aggregation

SQL like TPC-H Q2 and Q17, which contains a correlated sub-query with only one aggregation function output, we can eliminate the sub-query and transform it to window function. For example, TPC-H Q17 is

```sql
select
        sum(l_extendedprice) / 7.0 as avg_yearly
    from
        lineitem,
        part
    where
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container = 'MED BOX'
        and l_quantity < (
            select
                0.2 * avg(l_quantity)
            from
                lineitem
            where
                l_partkey = p_partkey
        );
```

we rewrite it to

```sql
select
        sum(l_extendedprice) / 7.0 as avg_yearly
    from (
    select
        l_extendedprice, l_quantity, avg(l_quantity) over(partition by l_partkey) avg_l_quantity
    from 
        lineitem,
        part
    where
        p_partkey = l_partkey
        and p_brand = 'Brand#23'
        and p_container = 'MED BOX' )
    where l_quantity < 0.2 * avg_l_quantity
```

now the rule can only handle: where conjuncts in outer scope contain one sub-query and the conjunct contain sub-query is a comparison-predicate, we will support compound-predicate and more than one conjuncts containing sub-query later.
2023-03-27 23:58:41 +08:00
785e3e3bca [Enhancement](multi catalog) Support hive meta cache TTL (#18102)
Currently, if user modify the file on hdfs directly, no through hive. The changes of file will not be noticed by Doris and user
will get wrong data. Support the TTL(Time-to-Live) config of File Cache, so that the stale file info will be invalidated automatically after expiring.

1.Add a parameter configuration to set file cache ttl. "file.meta.cache.ttl-second".
2.Set the value corresponding to guava expireAfterAccess to the configuration value.

Co-authored-by: lexluo <lexluo@tencent.com>
2023-03-27 19:19:31 +08:00
c8e4684578 [enhancement](nereids)support topN opt in nereids (#17741)
1. support topN opt in nereids
2. pushdown limit->proj->sort
2023-03-27 18:57:56 +08:00
894f38a517 [fix](planner) fix conjunct planned on exchange node (#18042)
sql like: 
select k5, k6, SUM(k3) AS k3 
from ( 
    select
        k5,
        date_format(k6, '%Y-%m-%d') as k6,
        count(distinct k3) as k3 
    from t 
    group by k5, k6
) AS temp where 1=1
group by k5, k6;

will throw exception since conjuncts planned on exchange node, because exchange node cannot handle conjuncts, now we skip exchange node when planning conjuncts, which fixes the bug. 
notice: the bug occurs iff the conjunct is always true like 1=1 above.
2023-03-27 17:50:52 +08:00
902629adb6 [fix](planner) fix targetTypeDef NPE when value is null (#18072)
sql like:
select * from (select *, null as top from v1)t where top = 5;
select * from (select *, null as top from v1)t where top is not null;
will cause NPE because targetTypeDef is null when value is null. Now we use cast target type to the targetTypeDef.
2023-03-27 17:29:14 +08:00
cd85b5b262 [conf](nereids) disable new cost model since it hurts performance (#18127) 2023-03-27 16:12:15 +08:00
da8c53a831 [feat](Nereids): pushdown semijoin through agg. (#18105) 2023-03-27 15:27:44 +08:00
642c378fc7 [feature](table-valued-function) add Backends table-valued-function (#17667)
This pr implement a new Metadata TVF called backends. And the implement process tutorial is in #17974.
2023-03-27 15:18:31 +08:00
1576130094 [ehancement](stats) Tune stats framework (#18118) 2023-03-27 14:38:10 +08:00
dc7b2015f5 eh (#18122) 2023-03-27 11:09:35 +08:00
bcf95cd920 [feature](function)Add ST_Angle_Sphere function (#17919) 2023-03-27 10:14:46 +08:00
c2dd005efb [fix](chore) fix BE compile and FE protoc artifact issue (#18120)
add <optional> head to solve the compilation issue
use 3.12.9 as the protoc.artifact's version, because there is no 3.12.21
See: https://repo.maven.apache.org/maven2/com/google/protobuf/protoc/
Remove --show-progress arguments of wget because it is not supported in low version wget
2023-03-27 08:53:42 +08:00
1027dd52ba [feature](Nereids): Pushdown SemiJoin in RBO. (#18099) 2023-03-26 20:58:43 +08:00