8289 Commits

Author SHA1 Message Date
b35998a3b7 [Bug](datetimev2) Support cast datetimev2 to datetimev2 with different precision #16826 2023-02-17 08:42:36 +08:00
4c7f19ab02 [enhancement](nereids) add eliminate left nullaware anti join rule (#16774)
If no join conjunct is nullable, the left null-aware anti join can be converted to a left anti join.
2023-02-16 21:54:14 +08:00
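The rewrite in #16774 can be illustrated with a simplified single-column model. This is only a sketch: the function names are hypothetical, Python `None` stands in for SQL NULL, and the real rule inspects the nullability of join conjuncts in the plan rather than looking at data.

```python
def left_anti_join(left, right):
    """Plain left anti join on equality: keep a left value if no right value equals it.

    A NULL (None) never compares equal, so a NULL left value is always kept.
    """
    right_values = {r for r in right if r is not None}
    return [l for l in left if l is None or l not in right_values]


def null_aware_left_anti_join(left, right):
    """Null-aware variant, modelling `l NOT IN (right)` under three-valued logic."""
    if not right:
        # NOT IN over an empty set is TRUE for every left value.
        return list(left)
    if any(r is None for r in right):
        # A NULL on the right makes NOT IN evaluate to NULL or FALSE for every row.
        return []
    right_values = set(right)
    # A NULL left value against a non-empty right side yields NULL, so it is dropped.
    return [l for l in left if l is not None and l not in right_values]
```

When neither side can contain NULL, both functions return the same rows, which is the condition under which the cheaper plain anti join is a safe replacement.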
407ccaaff7 [FIx](planner) create table as select with null_type select item cause be core bug (#16778)
SQL: `create table t as select null as k` can sometimes make BE core dump.
The null_type item is now changed to nullable tinyint to avoid this.
2023-02-16 20:01:13 +08:00
292926e5aa [Fix](multi catalog)Fix partition case bug (#16763)
Convert column names parsed from the path to lower case when matching is case-insensitive.
This applies to Iceberg columns from path. Iceberg columns are case sensitive,
which may cause errors for tables with partitions.
2023-02-16 15:47:23 +08:00
fa052b1a87 [fix](Stmt)pre-block create stmt with column type ALL (#16757) 2023-02-16 15:05:13 +08:00
105a4fb41a [regression](fuzzy) Make pipeline engine fuzzy test mode (#16807) 2023-02-16 15:02:27 +08:00
b6f2dfa994 [test](Nereids) add not nullable test for scalar functions (#16498) 2023-02-16 11:57:19 +08:00
0bb6005143 [Improvement](thrift) optimize thrift messages (#16383)
Currently we send one thrift message per fragment instance. However, many parts of these messages are identical across the instances of a fragment. This PR extracts the identical parts so that they only need to be sent once per fragment.
2023-02-16 11:07:46 +08:00
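The idea behind #16383 can be sketched with plain dictionaries standing in for thrift structs. The function name and the dict-based representation are assumptions made for illustration; the actual thrift definitions are not shown in this log.

```python
def extract_common(instance_params):
    """Split per-instance parameter dicts into one shared part and per-instance remainders."""
    if not instance_params:
        return {}, []
    first = instance_params[0]
    # A field is "common" only if every instance carries it with the same value.
    common = {
        key: value
        for key, value in first.items()
        if all(key in p and p[key] == value for p in instance_params[1:])
    }
    per_instance = [
        {k: v for k, v in p.items() if k not in common} for p in instance_params
    ]
    return common, per_instance
```

The shared part would then be sent once for the fragment, with only the small per-instance remainders repeated.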
118ce9cb16 [Enhance](ComputeNode) change logic of BeSelectionPolicy.getCandidateBackends (#16737)
The previous logic only limited how many cn (compute nodes) could be returned at most. Instead,
if the number of cn is less than expectBeNum, mix nodes are now used to fill in
until the number of candidates equals expectBeNum or the mix nodes are also used up.
2023-02-16 10:31:24 +08:00
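The selection logic described for #16737 might look roughly like the sketch below. The function and parameter names are hypothetical, and capping the compute nodes at `expect_be_num` is an assumption; the real code lives in the Java class BeSelectionPolicy.

```python
def get_candidate_backends(compute_nodes, mix_nodes, expect_be_num):
    """Prefer compute nodes; top up with mix nodes until expect_be_num is reached."""
    # Assumption: never return more than expect_be_num compute nodes.
    candidates = list(compute_nodes[:expect_be_num])
    for node in mix_nodes:
        if len(candidates) >= expect_be_num:
            break
        candidates.append(node)
    # May still be shorter than expect_be_num if the mix nodes run out.
    return candidates
```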
958aee38e9 [fix](Nereids): fix Master Bors problem. (#16794) 2023-02-16 01:53:53 +08:00
ecadd4b392 [feature](Nereids): add OuterJoinAssoc rule (#16676)
* move isIntersecting.

* [feature](Nereids): add OuterJoinAssoc rule

* fix bug

* fix
2023-02-15 19:19:28 +08:00
Pxl
f4ed52906a [Feature](Materialized-View) change mv rewrite from bottom up to up bottom && Compatible with old … (#16750)
1. change mv rewrite from bottom-up to top-down
2. compatible with old version mv
3. restore some ut code (but keep it disabled)
4. fix some ut issues introduced by [fix](planner)fix bug for missing slot #16601 and [Feature](Materialized-View) support multiple slot on one column in materialized view #16378
2023-02-15 17:24:46 +08:00
0c56a4622c [Feature](struct-type) Add implicitly cast for struct-type (#16613)
Inserting {1, 'a'} into struct<f1:tinyint, f2:varchar(20)> is currently not supported.
This commit adds implicit casts for struct types, so that the char type inside the struct is implicitly cast to varchar.
2023-02-15 16:55:00 +08:00
a6bda81dba [Fix](profile) fix /query_profile action. (#16540)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-02-15 14:27:21 +08:00
ad46e529d8 [feature](Nereids): Infer isNotNull from filter and eliminate OuterJoin (#16411) 2023-02-15 13:33:21 +08:00
13134c1bfe [fix](fe)should check slot from both lhs and rhs of outputSmap of join node for colocate join (#16738)
Colocate join depends on whether both sides of the join conjuncts are simple columns with the same distribution policy, etc. So the key is to figure out the original source column in the scan node, if there is one. To do that, we should check the slot from both lhs and rhs of outputSmap in the join node.
2023-02-15 12:44:20 +08:00
69c70d27bd [Refactor](auth) Add AccessController to support customized authorization (#16679)
In the current implementation, the class Auth is used to:

1. Manage all authentication and authorization info such as user, role, password, privileges.
2. Provide an interface for privilege checking.

Some users may want to integrate an external access management system such as Apache Ranger.
So we should provide a way to let users set their own access controller.

This PR mainly changes:

A new class SystemAccessController
This access controller is used to check the global level privileges and resource privileges.

A new interface CatalogAccessController
This interface is used to check catalog/database/tbl level privileges.
Its default implementation is InternalCatalogAccessController.

All privilege checking methods are moved from Auth to either SystemAccessController or
InternalCatalogAccessController

A new class AccessControllerManager
This is the entry point of privilege authentication. All methods previously called from Auth
are now called from AccessControllerManager.

Now, users can implement the interface CatalogAccessController to use their own access controller.
And when creating an external catalog, users can specify the access controller class name, so that
different external catalogs can use different access controllers.
2023-02-15 11:40:44 +08:00
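The structure introduced by #16679 can be sketched as follows. This is a Python stand-in for Java code: the class names CatalogAccessController, InternalCatalogAccessController and AccessControllerManager come from the commit message, while the method names, signatures and the grant representation are hypothetical.

```python
from abc import ABC, abstractmethod


class CatalogAccessController(ABC):
    """Checks catalog/database/table level privileges."""

    @abstractmethod
    def check_tbl_priv(self, user, catalog, db, tbl, priv):
        """Return True if `user` holds `priv` on catalog.db.tbl."""


class InternalCatalogAccessController(CatalogAccessController):
    """Default controller, backed here by a simple in-memory set of grants."""

    def __init__(self, grants):
        self._grants = set(grants)  # tuples of (user, catalog, db, tbl, priv)

    def check_tbl_priv(self, user, catalog, db, tbl, priv):
        return (user, catalog, db, tbl, priv) in self._grants


class AccessControllerManager:
    """Entry point: routes each check to the controller registered for the catalog."""

    def __init__(self, default_controller):
        self._default = default_controller
        self._by_catalog = {}

    def register(self, catalog, controller):
        # Lets one external catalog use its own controller (e.g. a Ranger-backed one).
        self._by_catalog[catalog] = controller

    def check_tbl_priv(self, user, catalog, db, tbl, priv):
        controller = self._by_catalog.get(catalog, self._default)
        return controller.check_tbl_priv(user, catalog, db, tbl, priv)
```

Routing by catalog name is what allows different external catalogs to plug in different controllers while internal tables keep the default behaviour.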
db9319b881 [refactor](Nereids) add two phase sort (#16586)
1. Add a rule that generates two-phase sort and one-phase sort
2. Add phase for PhysicalSort

TODO: I'll remove PhysicalLocalSort in next PR.
2023-02-15 10:40:57 +08:00
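The general shape of a two-phase sort, as referenced in #16586, is a local sort per partition followed by a merge of the sorted runs. The sketch below shows only that general technique; it does not model Nereids' PhysicalSort or its phases, and the function name is hypothetical.

```python
import heapq


def two_phase_sort(partitions):
    """Phase 1: sort each partition locally. Phase 2: merge the sorted runs."""
    local_runs = [sorted(partition) for partition in partitions]
    # heapq.merge lazily merges already-sorted inputs into one sorted stream.
    return list(heapq.merge(*local_runs))
```

A one-phase sort would instead gather all rows first and sort once; which plan is cheaper depends on data volume and distribution.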
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPV6 in Apache Doris, the main changes are:
1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string
2. BRPC and HTTP support binding to IPV6 address
3. BRPC and HTTP support visiting IPV6 Services
2023-02-14 21:43:10 +08:00
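Point 1 of #14063 hinges on detecting whether the configured network priority contains an IPv6 CIDR. A minimal sketch of such a check is shown below; the function name and the semicolon-separated format of the setting are assumptions, not taken from this log.

```python
import ipaddress


def contains_ipv6_cidr(priority_networks):
    """Return True if any CIDR in a ';'-separated list is an IPv6 network."""
    for cidr in priority_networks.split(";"):
        cidr = cidr.strip()
        if not cidr:
            continue
        # strict=False accepts CIDRs whose host bits are set, e.g. 10.1.2.3/8.
        if ipaddress.ip_network(cidr, strict=False).version == 6:
            return True
    return False
```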
acf5540a9f [fix](planner)Fix colocate query failed #16459
Issue Number: close #16458
Co-authored-by: wangbo36@meituan.com <wangbo36@meituan.com>
2023-02-14 18:51:28 +08:00
4444abc828 avoid construct groupExpr in graph-simplifier (#16436)
Signed-off-by: xiejiann <jianxie0@gmail.com>
2023-02-14 17:03:21 +08:00
Pxl
ea78184551 [Feature](Materialized-View) support multiple slot on one column in materialized view (#16378) 2023-02-14 16:10:50 +08:00
5e80823c86 [improvement](dynamic-partition) add storage_medium property for dynamic partition (#16648) 2023-02-14 15:14:52 +08:00
0d9714b179 [Fix](multi catalog)Support read hive1.x orc file. (#16677)
Hive 1.x may write orc files with internal column names (_col0, _col1, _col2...).
This causes query results to be NULL because the column names in the orc file don't match
the column names in the Doris table schema. This PR supports querying Hive orc files with internal column names.

For now, we haven't seen any problem with Parquet files; a new PR will be sent to fix Parquet if any problem shows up in the future.
2023-02-14 14:32:27 +08:00
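One way to handle the internal column names described in #16677 is to map `_colN` to the N-th column of the table schema. The sketch below assumes that positional mapping; the commit message does not spell out the exact strategy, and the function name is hypothetical.

```python
import re

_INTERNAL_NAME = re.compile(r"_col(\d+)$")


def resolve_orc_columns(file_columns, table_columns):
    """Map each table column name to the column name to read from the orc file."""
    if file_columns and all(_INTERNAL_NAME.match(c) for c in file_columns):
        # Hive 1.x style file: assume _colN corresponds to the N-th table column.
        mapping = {}
        for name in file_columns:
            index = int(_INTERNAL_NAME.match(name).group(1))
            if index < len(table_columns):
                mapping[table_columns[index]] = name
        return mapping
    # Regular file: match by name and skip table columns missing from the file.
    present = set(file_columns)
    return {name: name for name in table_columns if name in present}
```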
af5dc7565e [bug](udf) fix udf return type of decimal check scale must is 9 (#16497) 2023-02-14 10:53:53 +08:00
bceb0b58a1 [fix](udf) fix create udf function with uppercase database name can't recognize (#16410) 2023-02-14 10:52:11 +08:00
69d3878d9b [Bug](CTAS): Ctas rollback ignore some case (#16255)
Currently, some errors occur because the table cannot be dropped when executing CTAS.
This adds a session variable to control whether or not to drop the table.
2023-02-14 09:19:37 +08:00
de85c57715 [Improve](point query) support retry different backends in PointQueryExecutor (#16380) 2023-02-14 07:31:31 +08:00
f3ab55d27d [Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569) 2023-02-14 00:12:45 +08:00
90af1b0113 [fix](subquery) fix bug of using constexpr and some agg func(like count,max) as subquery's output (#16579)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-02-14 00:11:56 +08:00
36955a6769 [regression-test](dynamic-table) add regression test for dynamic table (#16656) 2023-02-14 00:03:19 +08:00
5014ad03e7 [feature](cooldown) Auto delete unused remote files (#16588) 2023-02-13 23:59:39 +08:00
77a3288ce7 [feature](Nereids) support window function (#14397) 2023-02-13 21:20:56 +08:00
ded698127e [fix](planner) fix bug for missing slot (#16601)
In previous version, if the output slot of analyticExpr is not materialized, the analyticExpr is pruned.
But there are some cases that it cannot be pruned.
For example:

                   SELECT
                        count(*)
                    FROM T1,
                        (SELECT dd
                        FROM (
                            SELECT
                                1.1 as cc,
                                ROW_NUMBER() OVER() as dd
                            FROM T2
                            ) V1
                        ORDER BY cc DESC
                        limit 1
                        ) V2;

 analyticExpr(ROW_NUMBER() OVER() as dd) is not materialized, but we have to generate
 WindowGroup for it.
 tmp.dd is used by upper count(*), we have to generate data for tmp.dd

In this fix, if an inline view outputs only one column (in this example, 'dd'), we materialize this column.

TODO:
 In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule of choosing a column
 for count(*). (refer to SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar)
 V2 can be transformed to
                        
       (SELECT cc
        FROM (
            SELECT
                1.1 as cc,
                ROW_NUMBER() OVER() as dd
            FROM T2
            ) V1
        ORDER BY cc DESC
        limit 1
        ) V2;

Besides the byte size of cc and dd, we also need to consider the cost of generating cc and dd.
2023-02-13 15:27:47 +08:00
77be0d13c3 [BugFix](Load) Add a secure path for MySql Load to load local file from fe node (#16653)
MySQL load can load files from the FE server node, but this causes a security issue: a user can use it to probe local files on the FE node.

For this reason, add a configuration named mysql_load_server_secure_path to set a secure path to load data from.

By default, loading FE local files is disabled by this configuration.
2023-02-13 14:39:51 +08:00
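A check of the kind #16653 describes can be sketched as a path-containment test. The function name is hypothetical, and treating an empty mysql_load_server_secure_path as "feature disabled" is an assumption based on the statement that the feature is off by default.

```python
import os


def is_allowed_load_path(file_path, secure_path):
    """Allow a local load only if file_path resolves to a location under secure_path."""
    if not secure_path:
        # Assumption: an unset secure path disables loading FE-local files.
        return False
    base = os.path.realpath(secure_path)
    target = os.path.realpath(file_path)
    # realpath resolves '..' and symlinks, so traversal out of base is rejected.
    return os.path.commonpath([base, target]) == base
```

Resolving both paths before comparing them is what prevents a request such as `<secure_path>/../etc/passwd` from slipping through a naive prefix check.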
a2b9b9edd7 [fix](planner) fix bug in agg on constant column (#16442)
For performance reason, we want to remove constant column from groupingExprs.
For example:
                `select sum(T.A) from T group by T.B, 'xyz'` is equivalent to `select sum(T.A) from T group by T.B`
We can remove constant column `'xyz'` from groupingExprs.

But there is an exception when all groupingExpr are constant
For example:

                sql1: `select 'abc' from t group by 'abc'`
                 is not equivalent to
                sql2: `select 'abc' from t`

                sql3: `select 'abc', sum(a) from t group by 'abc'`
                 is not equivalent to
                sql4: `select 1, sum(a) from t`
                (when t is empty, sql3 returns 0 tuples, sql4 returns 1 tuple)

We need to keep some constant columns if all groupingExpr are constant.

Consider sql5 `select a from (select "abc" as a, 'def' as b) T group by b, a;`
if the constant column `a` is in select list, this column should not be removed.
sql5 is transformed to 
sql6 `select a from (select "abc" as a, 'def' as b) T group by a;`
2023-02-13 11:26:08 +08:00
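The pruning rule described in #16442 can be sketched over expression strings. The function name and the representation of expressions as strings are hypothetical, and keeping the first grouping expression when everything would otherwise be pruned is an assumption about which constant is retained.

```python
def prune_constant_grouping(grouping_exprs, select_exprs, constants):
    """Drop constant grouping expressions, but never end up with an empty GROUP BY."""
    kept = [
        expr
        for expr in grouping_exprs
        # Keep non-constant exprs, and constants that the select list refers to.
        if expr not in constants or expr in select_exprs
    ]
    if not kept and grouping_exprs:
        # All exprs were constant: keep one so that an empty input still yields
        # zero groups instead of a single aggregate row.
        kept = [grouping_exprs[0]]
    return kept
```

For sql5 above, calling it with grouping expressions `["b", "a"]`, select list `["a"]` and constants `{"a", "b"}` keeps only `a`, matching sql6.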
46dd887ae2 [fix](nereids) make slot binding compatible to original planner (#16612)
SELECT a,2 as a FROM (SELECT '1' as a) b HAVING a=1

In the original planner, binding the having clause fails for this query. Make Nereids fail too.
2023-02-13 11:14:17 +08:00
f41a2055d3 [feature](Load)Remove user/password in properties for mysql load to avoid double auth. (#16073)
Use the FE cluster token to authenticate stream load.
This auth is only open for BE; FE auth still only supports HTTP basic auth.

I will use this auth for MySQL load to build a no-auth stream load from FE to BE.
This avoids double auth in MySQL load.
See the design doc for more information.
2023-02-13 10:00:08 +08:00
80c1a99ef6 [enhance](Nereids): refactor JoinReorder code. (#16477)
* [enhance](Nereids): refactor JoinReorder code.

* apply nullable

* checkstyle

* set enableDPHypOptimizer default false
2023-02-13 09:08:58 +08:00
cf739e7496 [Enhancement](Stmt) Set insert_into timeout session variable separately (#16343) 2023-02-12 16:56:10 +08:00
78a958467f [improvement](Load) Make broker load support the properties of trim_double_quotes and skip_lines (#16622)
`trim_double_quotes` and `skip_lines` are already supported in stream load,
so support them in broker load too.
2023-02-12 16:52:59 +08:00
4350c98b02 [improve](dynamic-table) change addColumns RPC interface fields from required to optional and and config doc (#16632) 2023-02-11 20:57:10 +08:00
37d1519316 [WIP](dynamic-table) support dynamic schema table (#16335)
Issue Number: close #16351

A dynamic schema table is a special type of table whose schema changes during the loading procedure. We implemented this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.
2023-02-11 13:37:50 +08:00
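The schema-extension idea behind #16335 can be sketched by inferring a column type per JSON key and widening it when later documents disagree. The type names, the widening rules and the function names below are all assumptions for illustration; they are not Doris's actual inference rules.

```python
import json


def json_type(value):
    """Map a decoded JSON value to a hypothetical column type name."""
    if isinstance(value, bool):  # bool must be tested before int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    if isinstance(value, str):
        return "STRING"
    return "JSON"  # null, arrays and nested objects


def merge_types(current, incoming):
    """Widen two types to one that can hold both (assumed rules)."""
    if current == incoming:
        return current
    if {current, incoming} == {"BIGINT", "DOUBLE"}:
        return "DOUBLE"
    return "STRING"


def extend_schema(schema, line):
    """Update `schema` in place from one JSON document and return it."""
    for key, value in json.loads(line).items():
        inferred = json_type(value)
        schema[key] = merge_types(schema[key], inferred) if key in schema else inferred
    return schema
```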
e99202754e [UT-Fix](MTMV) Fix MTMV FE UT bugs (#16513) 2023-02-11 11:00:20 +08:00
e6abfed6d1 [fix](dlf) Support DLF by catalog properties and update the doc (#16573)
1. Add default credential provider list
2. Support create DLF catalog from catalog properties
3. Update the doc
2023-02-10 20:43:58 +08:00
f95dc28719 [fix](auth)(meta) fix auto info missing when upgrading from 1.1 to 1.2 (#16595)
When upgrading from 1.1.x to 1.2.x, the ADMIN_PRIV of a normal user may be missing.
This PR fixes it.
2023-02-10 20:34:56 +08:00
3c3110b253 [Fix](Jdbc Catalog) jdbc catalog support to connect to doris database (#16527)
Doris can use mysql-jdbc-jar to connect to a Doris database, but Doris has some data types that MySQL lacks,
such as DecimalV3 and Date/DatetimeV2.
I added some case judgments in `Mysql Catalog`, so that the Jdbc catalog can identify the data types of Doris.
2023-02-10 20:24:40 +08:00
3929e8214d [improvement](filecache) Use consistent hash to assign the same scan range into the same backend among different queries (#16574)
When file cache is enabled, running the same query a second time may still be slow, because `FE` assigns the same
scan range to different backends across queries, and the data previously cached in `BE` is useless if the scan range assignment changes.

So this PR introduces consistent hashing to assign the same scan range to the same backend across different queries.
2023-02-10 19:49:33 +08:00
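Consistent hashing, as used in #16574, can be sketched with a sorted ring of virtual nodes. The class name, the virtual-node count, the choice of SHA-256 and the string form of a scan range are assumptions for illustration, not details of the Doris implementation.

```python
import bisect
import hashlib


class ConsistentHashRing:
    """Maps scan ranges to backends so that the same range lands on the same backend."""

    def __init__(self, backends, virtual_nodes=64):
        # Each backend is placed on the ring several times to even out the load.
        self._ring = sorted(
            (self._hash(f"{backend}#{i}"), backend)
            for backend in backends
            for i in range(virtual_nodes)
        )
        self._keys = [point for point, _ in self._ring]

    @staticmethod
    def _hash(key):
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big")

    def backend_for(self, scan_range):
        if not self._ring:
            raise ValueError("no backends available")
        # Pick the first virtual node clockwise from the key, wrapping around.
        index = bisect.bisect_right(self._keys, self._hash(scan_range)) % len(self._ring)
        return self._ring[index][1]
```

Because the assignment depends only on the hash of the scan range and the set of backends, repeated queries hit the same backend's cache, and removing one backend only remaps the ranges that were assigned to it.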
ad141747b4 [fix](inverted index) fix array type inverted index query error (#16582) 2023-02-10 17:57:15 +08:00
43eca4f209 [Feature-WIP](inverted index) Implementation for alter inverted index. (#16371)
implementation for add/drop inverted index.
2023-02-10 17:56:17 +08:00