Commit Graph

3803 Commits

Author SHA1 Message Date
97230a54fb [Refactor](auth)(step-2) Add AccessController to support customized authorization (#16802)
Support specifying AccessControllerFactory when creating catalog

create catalog hive properties(
...
"access_controller.class" = "org.apache.doris.mysql.privilege.RangerAccessControllerFactory",
"access_controller.properties.prop1" = "xxx",
"access_controller.properties.prop2" = "yyy",
...
)
So that users can specify their own access controller, such as RangerAccessController.

Add interface to check column level privilege

A new method of CatalogAccessController: checkColsPriv(),
for checking column level privileges.

TODO:
Support grant column level privileges statements in Doris

Add TestExternalCatalog/Database/Table/ScanNode

These classes are used for FE unit tests. In a unit test you can run

create catalog test1 properties(
    "type" = "test",
    "catalog_provider.class" = "org.apache.doris.datasource.ColumnPrivTest$MockedCatalogProvider",
    "access_controller.class" = "org.apache.doris.mysql.privilege.TestAccessControllerFactory",
    "access_controller.properties.key1" = "val1",
    "access_controller.properties.key2" = "val2"
);
to create a test catalog, and specify catalog_provider to mock database/table/schema metadata.

Set roles in current user identity in connection context

The roles can be used for authorization in access controller.
2023-02-20 10:32:48 +08:00
5291f14aff [vectorized](udf) java udf support array type (#16841) 2023-02-20 10:00:25 +08:00
1c6c28b8fb [Enhance](ComputeNode) K8sDeployManager support domain (#16897)
1. DeployManager can now obtain domain names from third-party systems.

2. When DeployManager determines whether a node exists, it also compares domain names.

3. Rename Backend.getHost() to getIp().

4. Delete the logic for handling UnknownHostException in FQDNManager, because UnknownHostException has two causes. If it occurs temporarily, the check can wait for the next detection. If the node has been deleted, the handling can be left to DeployManager.
2023-02-19 21:30:18 +08:00
73f7979b73 [fix](struct-type) forbid struct-type to be distributed key/aggregation key and add more tests (#16626)
This commit forbids struct and map types from being used as a distribution key or aggregation key.

A SQL statement such as:

select distinct struct_col from struct_table

will report an error.
2023-02-19 15:16:36 +08:00
96a3c60d3b [feature-wip](MTMV) Support alter statement (#16817)
Steps:
1. drop the old MTMV jobs
2. clear the old task records and clean the running and pending tasks
3. set the new scheduler info in MTMV and replay it in followers.
4. create a job in the master node.

Note that if you change the refresh info of MTMV, the old MTMV tasks will be cleaned.
2023-02-19 12:15:17 +08:00
d4cebb39ba [fix](Nereids): fix SemiJoinLogicalJoinTransposeProject. (#16883) 2023-02-18 23:12:34 +08:00
e2e6a0dd83 [Feature](load) Support mutable property for partition (#16036)
The background is described in issue #15723,
where users previously used Apache Druid to satisfy such lambda requirements.
We will not make Doris automatically drop data that does not belong to the current time window, as Druid does,
because that is not flexible. Instead we need the ability to mark a partition as mutable or immutable. The PR works this way:

1. Support a mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE during a load.
3. If a record's partition is immutable, the row is marked as "unselected" and is not included in the computation of 'max_filter_ratio',
   so data written to an immutable partition is ignored and does not cause the load to fail.

Use Example:

1. Add an immutable partition or modify a partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');

2. Write 5 records into the table, two of them belonging to an immutable partition
2023-02-18 23:09:34 +08:00
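The row accounting described in the mutable-partition change above can be sketched as follows. This is a minimal illustration in Python, not the Doris implementation: the function name `classify_rows`, the row representation, and the exact ratio formula (filtered rows divided by filtered plus loaded rows) are assumptions made for the example.

```python
def classify_rows(rows, immutable_partitions, max_filter_ratio):
    """Classify load rows and decide whether the load passes.

    rows: iterable of (partition_name, is_valid) tuples (hypothetical shape).
    immutable_partitions: set of partition names marked 'mutable' = 'false'.
    """
    unselected = filtered = loaded = 0
    for partition, is_valid in rows:
        if partition in immutable_partitions:
            # Rows for immutable partitions are skipped and do not count
            # toward the filter ratio.
            unselected += 1
        elif not is_valid:
            filtered += 1
        else:
            loaded += 1
    counted = filtered + loaded
    # Assumed formula: only rows that were actually considered are counted.
    ratio = filtered / counted if counted else 0.0
    return {
        "unselected": unselected,
        "filtered": filtered,
        "loaded": loaded,
        "ok": ratio <= max_filter_ratio,
    }


# Five records, two of which target an immutable partition.
rows = [("p1", True), ("p1", True), ("p2", True), ("p2", True), ("p2", True)]
result = classify_rows(rows, immutable_partitions={"p1"}, max_filter_ratio=0.0)
```

With `max_filter_ratio` set to zero the load still passes in this sketch, because the two rows aimed at the immutable partition are counted as unselected rather than filtered.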
45427b86be [regression](struct-type) add more regression tests for struct and map type (#16790)
This commit forbids struct and map columns in materialized views and adds more regression tests.
2023-02-18 20:42:17 +08:00
861e4bc64a [fix](planner) Nullable of slot descriptor is mistaken and cause BE crash #16862 2023-02-18 20:39:56 +08:00
070f42c463 [Enhancement](Es): Support config like whether push down to es (#16800)
Support configuring whether LIKE predicates are pushed down to ES, and refactor some code.
A LIKE predicate is transformed into a wildcard query and pushed down to ES, which increases CPU consumption on ES,
so this adds a switch to control it.
2023-02-17 21:56:11 +08:00
fd5d7d6097 [refactor](Nereids) remove local sort (#16819)
After adding a phase to sort, localSort is no longer needed.
Change the order of sortPhase in the constructor.
2023-02-17 18:52:41 +08:00
6a1e3d3435 [fix](cooldown)Fix bug for single cooldown compaction, add remote meta (#16812)
* fix bug, add remote meta for compaction
2023-02-17 15:13:06 +08:00
6acee1ce88 [Fix](topn opt) double check plan From OriginalPlanner to make sure optimized SQL is a general topn query (#16848)
Under the original logic, a query such as `select * from a where exists (select * from b order by 1) order by 1 limit 1` contains a subquery,
but the top-level query still passes `checkEnableTwoPhaseRead` and sets `isTwoPhaseOptEnabled=true`. So we need to double-check that the plan is a general topn query plan, and roll back the needMaterialize flag set by the previous `analyze`.
2023-02-17 10:59:35 +08:00
1fc5023d97 [Enhance](ComputeNode) K8sDeployManager support computeNode (#16789)
1. Allow having no ELECTABLE or BACKEND nodes
2. Add the cn NodeType
3. Delete deprecated code
2023-02-17 09:08:14 +08:00
b35998a3b7 [Bug](datetimev2) Support cast datetimev2 to datetimev2 with different precision #16826 2023-02-17 08:42:36 +08:00
4c7f19ab02 [enhancement](nereids) add eliminate left nullaware anti join rule (#16774)
If no join conjunct is nullable, the left null-aware anti join can be converted to a left anti join.
2023-02-16 21:54:14 +08:00
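The condition in the null-aware anti join rule above can be sketched as a small function. This is only an illustration under assumptions: the join type names and the function `eliminate_null_aware_anti_join` are hypothetical, and each conjunct is reduced to a single boolean saying whether it is nullable.

```python
NULL_AWARE_LEFT_ANTI = "NULL_AWARE_LEFT_ANTI_JOIN"  # hypothetical label
LEFT_ANTI = "LEFT_ANTI_JOIN"                        # hypothetical label


def eliminate_null_aware_anti_join(join_type, conjunct_nullable):
    """Return the join type after applying the rewrite.

    conjunct_nullable: one bool per join conjunct, True if that conjunct
    is nullable.
    """
    if join_type == NULL_AWARE_LEFT_ANTI and not any(conjunct_nullable):
        # No conjunct can produce NULL, so null-aware handling is unnecessary.
        return LEFT_ANTI
    return join_type


rewritten = eliminate_null_aware_anti_join(NULL_AWARE_LEFT_ANTI, [False, False])
unchanged = eliminate_null_aware_anti_join(NULL_AWARE_LEFT_ANTI, [False, True])
```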
407ccaaff7 [FIx](planner) create table as select with null_type select item cause be core bug (#16778)
The SQL `create table t as select null as k` sometimes causes a BE core dump.
We now change the null_type to nullable tinyint to avoid it.
2023-02-16 20:01:13 +08:00
292926e5aa [Fix](multi catalog)Fix partition case bug (#16763)
Set column names from path to lower case when column names are case-insensitive.
This applies to Iceberg columns from path. Iceberg columns are case sensitive,
which may cause errors for tables with partitions.
2023-02-16 15:47:23 +08:00
fa052b1a87 [fix](Stmt)pre-block create stmt with column type ALL (#16757) 2023-02-16 15:05:13 +08:00
105a4fb41a [regression](fuzzy) Make pipeline engine fuzzy test mode (#16807) 2023-02-16 15:02:27 +08:00
b6f2dfa994 [test](Nereids) add not nullable test for scalar functions (#16498) 2023-02-16 11:57:19 +08:00
0bb6005143 [Improvement](thrift) optimize thrift messages (#16383)
Currently we send one thrift message per fragment instance. However, many parts of these messages are identical across the instances of a fragment. This PR extracts the identical parts so that they are sent only once per fragment.
2023-02-16 11:07:46 +08:00
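The idea in the thrift optimization above, sending the shared part of the per-instance messages once per fragment, can be illustrated with plain dictionaries. This is a sketch under assumptions, not the actual thrift structures: `split_common_params` and the field names in the usage example are hypothetical.

```python
def split_common_params(instance_params):
    """Split per-instance parameter dicts into a shared part and local parts.

    instance_params: list of dicts, one per fragment instance.
    Returns (common, per_instance), where `common` holds every key whose
    value is identical in all instances.
    """
    if not instance_params:
        return {}, []
    first = instance_params[0]
    common = {
        key: value
        for key, value in first.items()
        if all(key in p and p[key] == value for p in instance_params)
    }
    per_instance = [
        {key: value for key, value in p.items() if key not in common}
        for p in instance_params
    ]
    return common, per_instance


common, per_instance = split_common_params([
    {"plan": "scan->agg", "query_options": "default", "instance_id": 1},
    {"plan": "scan->agg", "query_options": "default", "instance_id": 2},
])
```

Only `instance_id` differs between the two instances here, so the plan and the query options end up in the shared part and each instance keeps a single field of its own.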
118ce9cb16 [Enhance](ComputeNode) change logic of BeSelectionPolicy.getCandidateBackends (#16737)
Previously, expectBeNum was the maximum number of cn that could be returned. Now,
if the number of cn is less than expectBeNum, mix nodes are used to fill in
until the number of nodes equals expectBeNum or the mix nodes are also used up.
2023-02-16 10:31:24 +08:00
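The fill-in behaviour described for BeSelectionPolicy.getCandidateBackends above can be sketched like this. It is a simplified Python illustration, not the Java implementation: nodes are plain strings, and the sketch leaves the compute-node list untouched when it already meets `expect_be_num`, which is an assumption since the commit message does not say whether the list is truncated in that case.

```python
def get_candidate_backends(compute_nodes, mix_nodes, expect_be_num):
    """Return compute nodes, topped up with mix nodes if there are too few."""
    candidates = list(compute_nodes)
    for node in mix_nodes:
        if len(candidates) >= expect_be_num:
            break
        # Not enough compute nodes yet: borrow a mix node.
        candidates.append(node)
    return candidates


filled = get_candidate_backends(["cn1"], ["mix1", "mix2", "mix3"], 3)
exhausted = get_candidate_backends(["cn1"], ["mix1"], 3)
```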
958aee38e9 [fix](Nereids): fix Master Bors problem. (#16794) 2023-02-16 01:53:53 +08:00
ecadd4b392 [feature](Nereids): add OuterJoinAssoc rule (#16676)
* move isIntersecting.

* [feature](Nereids): add OuterJoinAssoc rule

* fix bug

* fix
2023-02-15 19:19:28 +08:00
Pxl
f4ed52906a [Feature](Materialized-View) change mv rewrite from bottom up to up bottom && Compatible with old … (#16750)
1. Change MV rewrite from bottom-up to top-down
2. Stay compatible with old-version MVs
3. Restore some UT code (still disabled)
4. Fix some UT issues introduced by [fix](planner)fix bug for missing slot #16601 and [Feature](Materialized-View) support multiple slot on one column in materialized view #16378
2023-02-15 17:24:46 +08:00
0c56a4622c [Feature](struct-type) Add implicitly cast for struct-type (#16613)
Currently, inserting {1, 'a'} into struct<f1:tinyint, f2:varchar(20)> is not supported.
This commit adds implicit casts for struct-type, so that the char type in the struct is implicitly cast to varchar.
2023-02-15 16:55:00 +08:00
a6bda81dba [Fix](profile) fix /query_profile action. (#16540)
Co-authored-by: wangxiangyu@360shuke.com <wangxiangyu@360shuke.com>
2023-02-15 14:27:21 +08:00
ad46e529d8 [feature](Nereids): Infer isNotNull from filter and eliminate OuterJoin (#16411) 2023-02-15 13:33:21 +08:00
13134c1bfe [fix](fe)should check slot from both lhs and rhs of outputSmap of join node for colocate join (#16738)
Whether a join can be colocated depends on whether both sides of the join conjuncts are simple columns with the same distribution policy, among other conditions. So the key is to find the original source column in the scan node, if there is one. To do that, we should check the slot from both the lhs and the rhs of outputSmap in the join node.
2023-02-15 12:44:20 +08:00
69c70d27bd [Refactor](auth) Add AccessController to support customized authorization (#16679)
In the current implementation, the class Auth is used to:

1. Manage all authentication and authorization info such as users, roles, passwords and privileges.
2. Provide an interface for privilege checking.

Some users may want to integrate an external access management system such as Apache Ranger,
so we should provide a way to let users set their own access controller.

This PR mainly changes:

A new class SystemAccessController
This access controller is used to check the global level privileges and resource privileges.

A new interface CatalogAccessController
This interface is used to check catalog/database/tbl level privileges.
It has a default implementation, InternalCatalogAccessController.

All privilege checking methods are moved from Auth to either SystemAccessController or
InternalCatalogAccessController

A new class AccessControllerManager
This is the entry point of privilege authentication. All methods previously called from Auth
now are called from AccessControllerManager

Now users can implement the interface CatalogAccessController to use their own access controller.
When creating an external catalog, users can specify the access controller class name, so that
different external catalogs can use different access controllers.
2023-02-15 11:40:44 +08:00
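The structure described in the AccessController refactor above, a manager that dispatches privilege checks to a per-catalog controller with an internal default, can be sketched in a few lines. The class names come from the commit message; everything else is an assumption for illustration: the method names `check_tbl_priv` and `register`, the grant tuples, and `AllowAllAccessController` are hypothetical, and the real code is Java rather than Python.

```python
from abc import ABC, abstractmethod


class CatalogAccessController(ABC):
    """Interface for catalog/database/table level privilege checks."""

    @abstractmethod
    def check_tbl_priv(self, user, db, tbl, priv):
        ...


class InternalCatalogAccessController(CatalogAccessController):
    """Default implementation backed by an in-memory set of grants."""

    def __init__(self, grants):
        self._grants = set(grants)

    def check_tbl_priv(self, user, db, tbl, priv):
        return (user, db, tbl, priv) in self._grants


class AllowAllAccessController(CatalogAccessController):
    """Hypothetical user-supplied controller for an external catalog."""

    def check_tbl_priv(self, user, db, tbl, priv):
        return True


class AccessControllerManager:
    """Entry point that routes each check to the catalog's controller."""

    def __init__(self, default_controller):
        self._default = default_controller
        self._by_catalog = {}

    def register(self, catalog, controller):
        self._by_catalog[catalog] = controller

    def check_tbl_priv(self, user, catalog, db, tbl, priv):
        controller = self._by_catalog.get(catalog, self._default)
        return controller.check_tbl_priv(user, db, tbl, priv)


manager = AccessControllerManager(
    InternalCatalogAccessController({("alice", "db1", "t1", "SELECT")})
)
manager.register("hive", AllowAllAccessController())
```

Catalogs without a registered controller fall back to the default, which mirrors the statement that different external catalogs can use different access controllers.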
db9319b881 [refactor](Nereids) add two phase sort (#16586)
1. Add a rule that generates two-phase sort and one-phase sort
2. Add phase for PhysicalSort

TODO: I'll remove PhysicalLocalSort in next PR.
2023-02-15 10:40:57 +08:00
d013d529c8 [Feature](ipv6)Support IPV6 (#14063)
Support IPV6 in Apache Doris, the main changes are:
1. enable binding to IPV6 address if network priority in config file contains an IPV6 CIDR string
2. BRPC and HTTP support binding to IPV6 address
3. BRPC and HTTP support visiting IPV6 Services
2023-02-14 21:43:10 +08:00
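The first point of the IPv6 change above, deciding whether to bind to IPv6 based on the configured priority networks, can be illustrated with Python's standard `ipaddress` module. This is a sketch under assumptions: the function name is hypothetical, and the configuration value is assumed to be a semicolon-separated list of CIDR strings.

```python
import ipaddress


def has_ipv6_priority_network(priority_networks):
    """Return True if any CIDR in the semicolon-separated list is IPv6."""
    for cidr in priority_networks.split(";"):
        cidr = cidr.strip()
        if not cidr:
            continue
        # strict=False accepts CIDRs whose host bits are set, e.g. fd00::1/64.
        if ipaddress.ip_network(cidr, strict=False).version == 6:
            return True
    return False


mixed = has_ipv6_priority_network("10.10.10.0/24;fd00::1/64")
ipv4_only = has_ipv6_priority_network("10.10.10.0/24")
```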
acf5540a9f [fix](planner)Fix colocate query failed #16459
Issue Number: close #16458
Co-authored-by: wangbo36@meituan.com <wangbo36@meituan.com>
2023-02-14 18:51:28 +08:00
4444abc828 avoid contruct groupExpr in graph-simplifier (#16436)
Signed-off-by: xiejiann <jianxie0@gmail.com>
2023-02-14 17:03:21 +08:00
Pxl
ea78184551 [Feature](Materialized-View) support multiple slot on one column in materialized view (#16378) 2023-02-14 16:10:50 +08:00
5e80823c86 [improvement](dynamic-partition) add storage_medium property for dynamic partition (#16648) 2023-02-14 15:14:52 +08:00
0d9714b179 [Fix](multi catalog)Support read hive1.x orc file. (#16677)
Hive 1.x may write ORC files with internal column names (_col0, _col1, _col2...).
This causes query results to be NULL because the column names in the ORC file do not match
the column names in the Doris table schema. This PR supports querying Hive ORC files with internal column names.

So far we have not seen this problem in Parquet files. We will send a new PR to fix Parquet if any problem shows up in the future.
2023-02-14 14:32:27 +08:00
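One way to picture the Hive 1.x ORC problem above is as a column-name mapping step. The sketch below is an assumption about how such a mapping could work, not the approach taken in the PR: it maps `_colN` to the N-th column of the table schema by position, and the function name `map_orc_columns` is hypothetical.

```python
import re

_INTERNAL_NAME = re.compile(r"^_col(\d+)$")


def map_orc_columns(orc_column_names, table_column_names):
    """Return {table column name: ORC file column name}.

    If every ORC column uses the internal _colN naming, map by position;
    otherwise map by matching names.
    """
    if orc_column_names and all(_INTERNAL_NAME.match(n) for n in orc_column_names):
        mapping = {}
        for name in orc_column_names:
            index = int(_INTERNAL_NAME.match(name).group(1))
            if index < len(table_column_names):
                mapping[table_column_names[index]] = name
        return mapping
    present = set(orc_column_names)
    return {col: col for col in table_column_names if col in present}


by_position = map_orc_columns(["_col0", "_col1"], ["id", "name"])
by_name = map_orc_columns(["id", "name"], ["id", "name", "age"])
```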
af5dc7565e [bug](udf) fix udf return type of decimal check scale must is 9 (#16497) 2023-02-14 10:53:53 +08:00
bceb0b58a1 [fix](udf) fix create udf function with uppercase database name can't recognize (#16410) 2023-02-14 10:52:11 +08:00
69d3878d9b [Bug](CTAS): Ctas rollback ignore some case (#16255)
Currently, some errors occur because the table cannot be dropped when executing CTAS.
This adds a session variable to control whether the table is dropped.
2023-02-14 09:19:37 +08:00
de85c57715 [Improve](point query) support retry different backends in PointQueryExecutor (#16380) 2023-02-14 07:31:31 +08:00
f3ab55d27d [Optimization](index) Optimization for no need to read raw data for index column that only in where clause (#16569) 2023-02-14 00:12:45 +08:00
90af1b0113 [fix](subquery) fix bug of using constexpr and some agg func(like count,max) as subquery's output (#16579)
Signed-off-by: nextdreamblue <zxw520blue1@163.com>
2023-02-14 00:11:56 +08:00
36955a6769 [regression-test](dynamic-table) add regression test for dynamic table (#16656) 2023-02-14 00:03:19 +08:00
5014ad03e7 [feature](cooldown) Auto delete unused remote files (#16588) 2023-02-13 23:59:39 +08:00
77a3288ce7 [feature](Nereids) support window function (#14397) 2023-02-13 21:20:56 +08:00
ded698127e [fix](planner) fix bug for missing slot (#16601)
In the previous version, if the output slot of analyticExpr is not materialized, the analyticExpr is pruned.
But there are some cases in which it cannot be pruned.
For example:

    SELECT
        count(*)
    FROM T1,
        (SELECT dd
        FROM (
            SELECT
                1.1 as cc,
                ROW_NUMBER() OVER() as dd
            FROM T2
            ) V1
        ORDER BY cc DESC
        limit 1
        ) V2;

 analyticExpr(ROW_NUMBER() OVER() as dd) is not materialized, but we have to generate
 WindowGroup for it.
 tmp.dd is used by upper count(*), we have to generate data for tmp.dd

In this fix, if an inline view outputs only one column (in this example, 'dd'), we materialize this column.

TODO:
 In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule of choosing a column
 for count(*). (refer to SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar)
 V2 can be transformed to

       (SELECT cc
        FROM (
            SELECT
                1.1 as cc,
                ROW_NUMBER() OVER() as dd
            FROM T2
            ) V1
        ORDER BY cc DESC
        limit 1
        ) V2;

Besides the byte size of cc and dd, we need to consider the cost of generating cc and dd.
2023-02-13 15:27:47 +08:00
77be0d13c3 [BugFix](Load) Add a secure path for MySql Load to load local file from fe node (#16653)
MySQL load can load files from the FE server node, but this raises a security issue: a user could use it to probe local files on the FE node.

For this reason, add a configuration named mysql_load_server_secure_path to set a secure path from which data can be loaded.

By default, this configuration disables loading FE local files.
2023-02-13 14:39:51 +08:00
a2b9b9edd7 [fix](planner) fix bug in agg on constant column (#16442)
For performance reasons, we want to remove constant columns from groupingExprs.
For example:
                `select sum(T.A) from T group by T.B, 'xyz'` is equivalent to `select sum(T.A) from T group by T.B`
We can remove the constant column `'xyz'` from groupingExprs.

But there is an exception when all groupingExprs are constant.
For example:

                sql1: `select 'abc' from t group by 'abc'`
                 is not equivalent to
                sql2: `select 'abc' from t`

                sql3: `select 'abc', sum(a) from t group by 'abc'`
                 is not equivalent to
                sql4: `select 1, sum(a) from t`
                (when t is empty, sql3 returns 0 tuples, sql4 returns 1 tuple)

We need to keep some constant columns if all groupingExprs are constant.

Consider sql5 `select a from (select "abc" as a, 'def' as b) T group by b, a;`
if the constant column `a` is in the select list, this column should not be removed.
sql5 is transformed to
sql6 `select a from (select "abc" as a, 'def' as b) T group by a;`
2023-02-13 11:26:08 +08:00
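The pruning rule described in the constant-column fix above can be sketched as follows. This is an illustration reconstructed from the commit message, not the planner code: expressions are plain strings, the function name `prune_constant_grouping_exprs` is hypothetical, and keeping the first constant when everything would otherwise be removed is an assumption about which constant is kept.

```python
def prune_constant_grouping_exprs(grouping_exprs, constant_exprs, select_exprs):
    """Drop constant GROUP BY expressions that are safe to remove.

    grouping_exprs: ordered list of GROUP BY expressions.
    constant_exprs: set of expressions known to be constant.
    select_exprs: set of expressions that appear in the select list.
    """
    kept = [
        expr for expr in grouping_exprs
        if expr not in constant_exprs or expr in select_exprs
    ]
    # If every grouping expression was removed, keep one so the query is
    # still a GROUP BY query (an empty input must still yield zero rows).
    if not kept and grouping_exprs:
        kept = [grouping_exprs[0]]
    return kept


# select sum(T.A) from T group by T.B, 'xyz'
mixed = prune_constant_grouping_exprs(["T.B", "'xyz'"], {"'xyz'"}, {"sum(T.A)"})
# sql5: select a from (...) T group by b, a  (a and b are both constant)
sql5 = prune_constant_grouping_exprs(["b", "a"], {"a", "b"}, {"a"})
# All grouping expressions constant and none in the select list
all_constant = prune_constant_grouping_exprs(["'x'", "'y'"], {"'x'", "'y'"}, {"sum(a)"})
```

The second call reproduces the sql5 to sql6 rewrite: `b` is dropped and `a` stays because it appears in the select list.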