Commit Graph

3759 Commits

Author SHA1 Message Date
36955a6769 [regression-test](dynamic-table) add regression test for dynamic table (#16656) 2023-02-14 00:03:19 +08:00
5014ad03e7 [feature](cooldown) Auto delete unused remote files (#16588) 2023-02-13 23:59:39 +08:00
77a3288ce7 [feature](Nereids) support window function (#14397) 2023-02-13 21:20:56 +08:00
ded698127e [fix](planner) fix bug for missing slot (#16601)
In previous version, if the output slot of analyticExpr is not materialized, the analyticExpr is pruned.
But there are some cases that it cannot be pruned.
For example:

                   SELECT
                        count(*)
                    FROM T1,
                        (SELECT dd
                        FROM (
                            SELECT
                                1.1 as cc,
                                ROW_NUMBER() OVER() as dd
                            FROM T2
                            ) V1
                        ORDER BY cc DESC
                        limit 1
                        ) V2;

 analyticExpr(ROW_NUMBER() OVER() as dd) is not materialized, but we have to generate
 WindowGroup for it.
 tmp.dd is used by upper count(*), we have to generate data for tmp.dd

In this fix, if an inline view only output one column(in this example, the 'dd'), we materialize this column.

TODO:
 In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule of choosing a column
 for count(*). (refer to SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar)
 V2 can be transformed to
                        
       SELECT cc
        FROM (
            SELECT
                1.1 as cc,
                ROW_NUMBER() OVER() as dd
            FROM T2
            ) V1
        ORDER BY cc DESC
        limit 1
        ) V2;

Except the byte size of cc and dd, we need to consider the cost to generate cc and dd.
2023-02-13 15:27:47 +08:00
77be0d13c3 [BugFix](Load) Add a secure path for MySql Load to load local file from fe node (#16653)
MySql load can load fe server node, but it will cause secure issue that user use it to detect the fe node local file.

For this reason, add a configuration named mysql_load_server_secure_path to set a secure path to load data.

By default, load fe local file feature is disabled by this configuration.
2023-02-13 14:39:51 +08:00
a2b9b9edd7 [fix](planner) fix bug in agg on constant column (#16442)
For performance reason, we want to remove constant column from groupingExprs.
For example:
                `select sum(T.A) from T group by T.B, 'xyz'` is equivalent to `select sum(T.A) from T group by T.B`
We can remove constant column `abc` from groupingExprs.

But there is an exception when all groupingExpr are constant
For example:

                sql1: `select 'abc' from t group by 'abc'`
                 is not equivalent to
                sql2: `select 'abc' from t`

                sql3: `select 'abc', sum(a) from t group by 'abc'`
                 is not equivalent to
                sql4: `select 1, sum(a) from t`
                (when t is empty, sql3 returns 0 tuple, sql4 return 1 tuple)

We need to keep some constant columns if all groupingExpr are constant.

Consider sql5 `select a from (select "abc" as a, 'def' as b) T group by b, a;`
if the constant column `a` is in select list, this column should not be removed.
sql5 is transformed to 
sql6 `select a from (select "abc" as a, 'def' as b) T group by a;`
2023-02-13 11:26:08 +08:00
46dd887ae2 [fix](nereids) make slot binding compatible to original planner (#16612)
SELECT a,2 as a FROM (SELECT '1' as a) b HAVING a=1

in original planner, having clause binding failed. Make Nereids failed too.
2023-02-13 11:14:17 +08:00
f41a2055d3 [feature](Load)Remove user/password in properties for mysql load to avoid double auth. (#16073)
Use FE cluster token to auth stream load.
This auth is only open for be, and fe auth still only support http basic auth.

I will use this auth for mysql load to build a no-auth stream load from fe to be.
And this will avoid double auth in mysql load.
More information to see the design doc.
2023-02-13 10:00:08 +08:00
80c1a99ef6 [enhance](Nereids): refactor JoinReorder code. (#16477)
* [enhance](Nereids): refactor JoinReorder code.

* apply nullable

* checkstyle

* set enableDPHypOptimizer default false
2023-02-13 09:08:58 +08:00
cf739e7496 [Enhancement](Stmt) Set insert_into timeout session variable separately (#16343) 2023-02-12 16:56:10 +08:00
78a958467f [improvement](Load) Make broker load support the properties of trim_double_quotes and skip_lines (#16622)
`trim_double_quotes` and `skip_lines` were supported in stream load.
So make it support broker load too.
2023-02-12 16:52:59 +08:00
4350c98b02 [improve](dynamic-table) change addColumns RPC interface fields from required to optional and and config doc (#16632) 2023-02-11 20:57:10 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.
2023-02-11 13:37:50 +08:00
e99202754e [UT-Fix](MTMV) Fix MTMV FE UT bugs (#16513) 2023-02-11 11:00:20 +08:00
e6abfed6d1 [fix](dlf) Support DLF by catalog properties and update the doc (#16573)
1. Add default credential provider list
2. Support create DLF catalog from catalog properties
3. Update the doc
2023-02-10 20:43:58 +08:00
f95dc28719 [fix](auth)(meta) fix auto info missing when upgrading from 1.1 to 1.2 (#16595)
When upgrading from 1.1.x to 1.2.x, the ADMIN_PRIV of normal user maybe missing.
This PR fix it
2023-02-10 20:34:56 +08:00
3c3110b253 [Fix](Jdbc Catalog) jdbc catalog support to connect to doris database (#16527)
Doris can use mysql-jdbc-jar to connect doris database, but doris has some data type that mysql without.
Such as DecimalV3 and Date/DatetimeV2
I add some case judgments in `Mysql Catalog` , so that Jdbc catalog can identify the data type of DORIS
2023-02-10 20:24:40 +08:00
3929e8214d [improvement](filecache) Use consistent hash to assign the same scan range into the same backend among different queries (#16574)
When file cache enabled, running the same query for the second time may be still slow, for `FE` will assign the same 
scan range into different backends among different queries, and the former cached data in `BE` will be useless if the scan range is changed.

So, this PR introduce consistent hash to assign the same scan range into the same backend among different queries.
2023-02-10 19:49:33 +08:00
ad141747b4 [fix](inverted index) fix array type inverted index query error (#16582) 2023-02-10 17:57:15 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00
1f631c388d [enhance](cooldown)accelerate cooldown task produce efficiency (#16089) 2023-02-10 16:58:27 +08:00
c08c643ca0 [fix](test) disable failed ut 'SelectRollupIndexTest#testPreAggHint' temporarily (#16593)
UT 'SelectRollupIndexTest#testPreAggHint' failed caused by #16286
Disable it temporarily to avoid block CI/CD
2023-02-10 16:36:15 +08:00
b99e2dc727 [bug](jdbc) fix jdbc can't get object of PGobject (#16496)
when pg table have some  unsupported column type like: point, polygon, jsonb......
jdbc catalog will convert it to string type in doris. but get result set in java is org.postgresql.util.PGobject
 
Some test need this pr: #16442
2023-02-10 16:19:02 +08:00
ae325f546a [refactor](Nereids): mv AggregateStrategies to implementation rules (#16551) 2023-02-10 14:10:59 +08:00
d9924c9b8e [Improvement](topn) add limit threashold session variable and fuzzy for topn optimizations (#16514)
1. add limit threshold for topn runtime pushdown and key topn optimization
2. use unified session variable topn_opt_limit_threshold for all topn optimizations
3. add fuzzy support for topn_opt_limit_threshold
2023-02-10 12:56:33 +08:00
8758cd412f [feature](auth)Implementing privilege management with rbac model (#16091)
change implement of auth to rbac

each user has one default role which can not be drop;

if you grant priv to user,it will grant to default role ,

In the current pr, the user can still only have one role other than the default role, but in the future, the user and role will be many-to-many

rename PaloRole,PaloAuth,PaloPrivilege to Role,Auth,Privilege
2023-02-10 12:30:49 +08:00
e9cd1d64ed (fix)[multi-catalog][nereids] Reset ExternalFileScanNode required slots after Nereids planner do projection. #16549
The new Nereids planner do column projection after creating scan node. For ExternalFileScanNode, this may cause the columns in required_slots mismatch with the slots after projection. This pr is to reset the required_slots after projection.
2023-02-10 11:28:01 +08:00
1b3902baa2 [Feature](Complex-type) Add struct and map type to Doris (#16444)
This commit support:
1、Insert + select for struct/map type
2、Json stream load for struct type
3、m[key] function for map type

How to use:
Set the fe config to create table for struct and map type
1、admin set frontend config("enable_struct_type" = "true");
2、admin set frontend config("enable_map_type" = "true");

#16547

Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
2023-02-10 11:00:33 +08:00
0c20c607b2 fix stats (#16556) 2023-02-10 11:00:01 +08:00
885fe1516f [refactor](datev2) refine logics of auto conversion (#16552)
* [refactor](datev2) refine logics of auto conversion

* uodate

* update

* Revert "uodate"

This reverts commit 2609a13b4022b4a603bf992fad64c133def266e0.
2023-02-10 10:06:47 +08:00
48780dcea0 [BugFix](cooldown) push correct cooldownttl to be (#16553)
There were cooldownttl and cooldownttlms in StoragePolicy, it's so error-prone because they served nearly the same.
For example, the init function would only assign the ttl timestamp to cooldownttl, which would end up pushing cooldownttl 0 to be.
2023-02-10 08:45:04 +08:00
438daaaf1c [enchancement](mv) forbidden craete useless mv in fe (#16286)
forbidden create useless mv in fe
2023-02-09 23:00:09 +08:00
ab34f418c3 [bugfix](information schema) sometimes fe throw thrift_rpc_error (#16555)
mysql> SELECT TABLE_NAME, CHECK_OPTION, IS_UPDATABLE, SECURITY_TYPE, DEFINER FROM INFORMATION_SCHEMA.VIEWS WHERE TABLE_SCHEMA = 'test' ORDER BY TABLE_NAME ASC;
ERROR 2006 (HY000): MySQL server has gone away
No connection. Trying to reconnect...
Connection id: 0
Current database: *** NONE ***

ERROR 1105 (HY000): RpcException, msg: org.apache.doris.rpc.RpcException: failed to call frontend service/n @ 0x563a2b11b6ea doris::Status::ConstructErrorStatus()
@ 0x563a2bcd638f doris::ThriftRpcHelper::rpc<>()
@ 0x563a2b78b777 doris::SchemaHelper::list_table_status()
@ 0x563a2b7a0972 doris::SchemaViewsScanner::get_new_table()
@ 0x563a2b7a0b00 doris::SchemaViewsScanner::get_next_row()
@ 0x563a2ccd0c93 doris::vectorized::VSchemaScanNode::get_next()
@ 0x563a2b7450d6
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2023-02-09 21:49:33 +08:00
05ed1f751b [fix](planner)(Nereids) add date and datev2 signature to greatest and least function (#16565) 2023-02-09 21:36:53 +08:00
ab4c718478 [fix](iceberg) remove s3 default temporary credentials #16543
remove TemporaryAWSCredentialsProvider in global s3 source

Co-authored-by: jinzhe <jinzhe@selectdb.com>
2023-02-09 15:36:35 +08:00
ba4b6aa0c0 [hot-fix](cooldown) Fix unknown module cooldownJob when load fe image #16545 2023-02-09 15:36:01 +08:00
4b093d1ef6 [Bug](point query) when prepared statement used lazyEvaluateRangeLocations should clear bucketSeq2locations to avoid memleak (#16531)
When JDBC client enable server side prepared statement, it will cache OlapScanNode and reuse it for performance, but each
time call `addScanRangeLocations` will add new item to `bucketSeq2locations`, so the `bucketSeq2locations` lead to a memleak if OlapScanNode
cached in memory
2023-02-09 14:41:07 +08:00
531616b8ee [Fix](bucket)fix partition with no history data && AutoBucketUtilsTest (#16516)
fix partition with no history data && AutoBucketUtilsTest (#16515)
2023-02-09 10:17:25 +08:00
e1f1386395 [fix](cooldown) Rewrite update cooldown conf (#16488)
Remove error-prone CooldownJob, and use CooldownConfHandler to update Tablet's cooldown conf.
Some bug fix about cooldown.
2023-02-09 09:12:55 +08:00
d1c6b81140 [Bug](log) add some log to find out bug (#16518) 2023-02-08 21:23:02 +08:00
f0b0eedbc5 [fix](planner)group_concat lost order by info in second phase merge agg (#16479) 2023-02-08 20:48:52 +08:00
a512469537 [fix](planner) cannot process more than one subquery in disjunct (#16506)
before this PR, Doris cannot process sql like that
```sql
CREATE TABLE `test_sq_dj1` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

CREATE TABLE `test_sq_dj2` (
    `c1` int(11) NULL,
    `c2` int(11) NULL,
    `c3` int(11) NULL
) ENGINE=OLAP
DUPLICATE KEY(`c1`)
COMMENT 'OLAP'
DISTRIBUTED BY HASH(`c1`) BUCKETS 3
PROPERTIES (
    "replication_allocation" = "tag.location.default: 1",
    "in_memory" = "false",
    "storage_format" = "V2",
    "disable_auto_compaction" = "false"
);

insert into test_sq_dj1 values(1, 2, 3), (10, 20, 30), (100, 200, 300);
insert into test_sq_dj2 values(10, 20, 30);

-- core
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 < 10;

-- invalid slot
SELECT * FROM test_sq_dj1 WHERE c1 IN (SELECT c1 FROM test_sq_dj2) OR c1 IN (SELECT c2 FROM test_sq_dj2) OR c1 < 10;
```

there are two problems:
1. we should remove redundant sub-query in one conjuncts to avoid generate useless join node
2. when we have more than one sub-query in one disjunct. we should put the conjunct contains the disjunct at the top node of the set of mark join nodes. And pop up the mark slot to the top node.
2023-02-08 18:46:06 +08:00
bb334de00f [enhancement](load) Change transaction limit from global level to db level (#15830)
Add transaction size quota for database

Co-authored-by: wuhangze <wuhangze@jd.com>
2023-02-08 18:04:26 +08:00
666f7096f2 [Fix](multi catalog)(planner) Fix external table statistic collection bug (#16486)
Add index id to column statistic id. Refresh statistic cache after analyze.
2023-02-08 16:51:30 +08:00
b06e6b25c9 [improvement](fuzzy) print fuzzy session variable in FE audit log (#16493)
* [improvement](fuzzy) print fuzzy session variable in FE audit log
2023-02-08 16:38:04 +08:00
e11437d1fe [fix](planner) npe in RewriteBinaryPredicatesRule (#16401)
RewriteBinaryPredicatesRule rewrite expression like 
`cast(A decimal) > decimal` to `A > some_other_bigint`
in order to:
1. push down the rewrite predicate 
2. avoid convert column A to decimal

We get the datatype of `A` by `expr0.getSrcSlotRef().getColumn().getType()`.
However, when A is result of a function from sub-query, this rule is not applicable.
For example:
```
select * 
from (
       select TIMESTAMPDIFF(MINUTE,startTime,endTime) AS timediff 
        from CNC_SliceSate) T  
where timediff > 5.0;
```
we cannot push predicate down to OlapScan(CNC_SliceSate) to save effort.
2023-02-08 15:57:35 +08:00
2883f67042 [fix](iceberg) update iceberg docs and add credential properties (#16429)
Update iceberg docs
Add new s3 credential and properties
2023-02-08 13:53:01 +08:00
41947c73eb [Feature](array-function) Support array functions for nested type datev2 and datetimev2 (#16382) 2023-02-08 12:51:07 +08:00
98c741d664 [fix](Nereids): FilterOrSelf shouldn't And all predicates.. (#16491) 2023-02-08 12:42:22 +08:00
583001bd92 [Bug](share hash table) Support shared hash table on Nereids (#16474) 2023-02-08 11:51:27 +08:00