Support specifying AccessControllerFactory when creating catalog
create catalog hive properties(
...
"access_controller.class" = "org.apache.doris.mysql.privilege.RangerAccessControllerFactory",
"access_controller.properties.prop1" = "xxx",
"access_controller.properties.prop2" = "yyy",
...
)
So that users can specify their own access controller, such as RangerAccessController.
Add interface to check column level privilege
A new method, checkColsPriv(), is added to CatalogAccessController
for checking column-level privileges.
TODO:
Support GRANT statements for column-level privileges in Doris
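As a rough illustration of the kind of check checkColsPriv() enables, here is a minimal sketch; the interface and signature below are illustrative assumptions, not the exact Doris CatalogAccessController API.

```java
import java.util.Set;

// Illustrative only: the real CatalogAccessController.checkColsPriv() signature may differ.
interface ColumnLevelChecker {
    // Return true if `user` may access every column in `cols` of ctl.db.tbl.
    boolean checkColsPriv(String user, String ctl, String db, String tbl, Set<String> cols);
}

class DenyAllColumnChecker implements ColumnLevelChecker {
    @Override
    public boolean checkColsPriv(String user, String ctl, String db, String tbl, Set<String> cols) {
        // Trivial policy: deny all column-level access until column-level grants are supported.
        return false;
    }
}
```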
Add TestExternalCatalog/Database/Table/ScanNode
These classes are used for FE unit tests. In a unit test you can run
create catalog test1 properties(
"type" = "test",
"catalog_provider.class" = "org.apache.doris.datasource.ColumnPrivTest$MockedCatalogProvider",
"access_controller.class" = "org.apache.doris.mysql.privilege.TestAccessControllerFactory",
"access_controller.properties.key1" = "val1",
"access_controller.properties.key2" = "val2"
);
to create a test catalog, with catalog_provider specifying the class that mocks database/table/schema metadata.
Set roles in the current user identity in the connection context.
The roles can be used for authorization in the access controller.
1. DeployManager adds the ability to obtain domain names from third-party systems.
2. When DeployManager determines whether a node exists, also check by domain name.
3. Rename Backend.getHost() to getIp().
4. Remove the logic for handling UnknownHostException in FQDNManager, because there are two cases of UnknownHostException: if it occurs temporarily, we can wait for the next detection; if the node has been deleted, the handling can be left to DeployManager.
This commit forbids struct and map types from being used as distribution keys or aggregation keys.
SQL such as:
select distinct struct_col from struct_table
will report an error.
Steps:
1. Drop the old MTMV jobs.
2. Clear the old task records and clean up the running and pending tasks.
3. Set the new scheduler info in the MTMV and replay it on the followers.
4. Create a job on the master node.
Note that if you change the refresh info of an MTMV, the old MTMV tasks will be cleaned up.
The background is described in issue #15723,
where users previously used Apache Druid to satisfy such lambda-architecture requirements.
We will not make Doris automatically drop data that does not belong to the current time window, as Druid does,
because that approach is not flexible. Instead, we need the ability to support mutable/immutable partitions. This PR works as follows:
1. Support a mutable property for a partition.
2. The mutable property of a partition is passed from FE to BE during a load.
3. If a record's partition is immutable, we mark the row as "unselected", so it is not included in the computation of 'max_filter_ratio';
data written to an immutable partition is therefore ignored and does not cause the load to fail.
Use Example:
1. Add a partition with a mutable property, or modify a partition to be immutable:
- alter table test_tbl add [temporary] partition xxx values less than ('xxx') ('mutable' = 'true');
- alter table test_tbl modify partition xx set ('mutable' = 'false');
2. Write 5 records into the table, two of them belonging to an immutable partition (see the sketch below).
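A minimal sketch of the filter-ratio arithmetic described above (hypothetical names, not the actual BE code): rows routed to an immutable partition are counted as unselected and excluded from the ratio.

```java
class LoadFilterRatio {
    // Illustrative arithmetic only (not the actual BE code): rows routed to an
    // immutable partition are marked "unselected" and excluded from max_filter_ratio.
    static boolean exceedsMaxFilterRatio(long totalRows, long unselectedRows,
                                         long filteredRows, double maxFilterRatio) {
        long consideredRows = totalRows - unselectedRows;
        if (consideredRows <= 0) {
            return false;
        }
        return (double) filteredRows / consideredRows > maxFilterRatio;
    }
}
```

With the example above, the 2 rows written to the immutable partition are unselected, so only the other 3 rows are considered for max_filter_ratio and the write to the immutable partition does not fail the load.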
Support a config for whether to push down to ES (e.g. LIKE transformed to a wildcard query), and refactor some code
Transforming a predicate into a wildcard query and pushing it down to ES increases CPU consumption on the ES side,
so I add a switch to control it.
* fix bug, add remote meta for compaction
In the original logic, a query like `select * from a where exists (select * from b order by 1) order by 1 limit 1` contains a subquery,
but the top-level query still passes `checkEnableTwoPhaseRead` and sets `isTwoPhaseOptEnabled=true`. So we need to double-check that the plan is a general top-n query plan, and roll back the needMaterialize flag set by the previous `analyze`.
Set column names from the path to lower case when matching is case-insensitive.
This is for Iceberg columns parsed from the path. Iceberg column names are case sensitive,
which may cause errors for tables with partitions.
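A minimal sketch of the normalization step, assuming a hypothetical helper (the actual change lives in the file-scan path):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

class PartitionColumnNormalizer {
    // Lower-case partition column names parsed from the file path so they match the
    // case-insensitive Doris schema (illustrative helper, not the actual scan-path code).
    static List<String> toLowerCase(List<String> columnsFromPath) {
        List<String> normalized = new ArrayList<>(columnsFromPath.size());
        for (String col : columnsFromPath) {
            normalized.add(col.toLowerCase(Locale.ROOT));
        }
        return normalized;
    }
}
```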
Now we use one thrift message per fragment instance. However, many of those messages are identical across instances of the same fragment. This PR extracts the shared parts so the common thrift message only needs to be sent once per fragment (a rough sketch follows).
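As a rough illustration of the idea (all names below are hypothetical, not the actual thrift structs): fragment-level fields that are identical across instances are carried once, and only instance-specific fields repeat.

```java
import java.util.List;

// Hypothetical illustration, not the actual thrift structs.
class FragmentExecRequest {
    String fragmentPlan;                 // shared by all instances, serialized once
    String descriptorTable;              // shared by all instances, serialized once
    List<InstanceParams> instances;      // one small entry per instance

    static class InstanceParams {
        String instanceId;               // differs per instance
        int senderId;                    // differs per instance
    }
}
```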
The previous logic returned at most the number of available compute nodes (cn). Instead,
if the number of cn is less than expectBeNum, we need to fill in with mix nodes
until the total reaches expectBeNum or the mix nodes are also used up, as sketched below.
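A hedged sketch of that fill-in policy (hypothetical names; the real scheduler code differs):

```java
import java.util.ArrayList;
import java.util.List;

class NodeSelector {
    // Prefer compute nodes; if fewer than expectBeNum are available,
    // top up with mix nodes until the quota is met or both lists are exhausted.
    static List<String> select(List<String> computeNodes, List<String> mixNodes, int expectBeNum) {
        List<String> selected = new ArrayList<>();
        for (String cn : computeNodes) {
            if (selected.size() >= expectBeNum) {
                return selected;
            }
            selected.add(cn);
        }
        for (String mix : mixNodes) {
            if (selected.size() >= expectBeNum) {
                break;
            }
            selected.add(mix);
        }
        return selected;
    }
}
```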
1. Change mv rewrite from bottom-up to top-down.
2. Keep compatibility with old-version mv.
3. Restore some UT code (but keep it disabled).
4. Fix some UTs broken by [fix](planner) fix bug for missing slot #16601 and [Feature](Materialized-View) support multiple slot on one column in materialized view #16378.
Currently, inserting {1, 'a'} into struct<f1:tinyint, f2:varchar(20)> is not supported.
This commit supports implicitly casting the char type inside the struct to varchar,
i.e. it adds implicit casts for the struct type (see the sketch below).
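As a small illustration of the idea (hypothetical helper, not the Doris analyzer code), an implicit cast can be wrapped around each struct field whose literal type differs from the target field type, e.g. the char literal 'a' cast to varchar(20):

```java
// Hypothetical sketch, not the Doris analyzer: wrap a field expression in a CAST
// when its type differs from the target struct field type (e.g. char -> varchar(20)).
class StructFieldImplicitCast {
    static String castIfNeeded(String fieldExpr, String sourceType, String targetType) {
        if (sourceType.equalsIgnoreCase(targetType)) {
            return fieldExpr;
        }
        return "CAST(" + fieldExpr + " AS " + targetType + ")";
    }

    public static void main(String[] args) {
        // {1, 'a'} into struct<f1:tinyint, f2:varchar(20)>: the char literal gets cast.
        System.out.println(castIfNeeded("'a'", "char", "varchar(20)")); // CAST('a' AS varchar(20))
    }
}
```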
Colocated join depends on whether both sides of the join conjuncts are simple columns with the same distribution policy, etc. So the key is to figure out the original source column in the scan node, if there is one. To do that, we should check the slots from both the lhs and rhs of outputSmap in the join node, as sketched below.
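A conceptual sketch of that lookup (a plain Java Map stands in for the join node's outputSmap; not the planner's actual API): follow the substitution chain from an output slot until reaching a slot produced directly by a scan node.

```java
import java.util.Map;

class SourceColumnResolver {
    // Follow the mapping until the slot no longer maps to anything, i.e. it is the
    // original column produced by a scan node (conceptual illustration only).
    static String resolveOriginalColumn(String slot, Map<String, String> outputSmap) {
        String current = slot;
        while (outputSmap.containsKey(current)) {
            current = outputSmap.get(current);
        }
        return current;
    }
}
```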
In the current implementation, the class Auth is used to:
- manage all authentication and authorization info such as users, roles, passwords, and privileges;
- provide an interface for privilege checking.
Some users may want to integrate an external access management system such as Apache Ranger,
so we should provide a way to let users set their own access controller.
This PR mainly changes:
A new class SystemAccessController
This access controller is used to check the global level privileges and resource privileges.
A new interface CatalogAccessController
This interface is used to check catalog/database/table-level privileges.
It has a default implementation, InternalCatalogAccessController.
All privilege checking methods are moved from Auth to either SystemAccessController or
InternalCatalogAccessController
A new class AccessControllerManager
This is the entry point for privilege checks. All methods previously called on Auth
are now called through AccessControllerManager.
Now, users can implement the CatalogAccessController interface to use their own access controller.
And when creating an external catalog, users can specify the access controller class name, so that
different external catalogs can use different access controllers (a rough sketch follows).
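As a rough sketch of how such a pluggable controller could look; the interface and factory method names below are assumptions for illustration, not the exact Doris API.

```java
import java.util.Map;

// Illustrative only: method names and signatures are assumptions,
// not the exact Doris CatalogAccessController / factory interfaces.
interface MyCatalogAccessController {
    boolean checkDbPriv(String user, String catalog, String db, String wanted);
    boolean checkTblPriv(String user, String catalog, String db, String tbl, String wanted);
}

interface MyAccessControllerFactory {
    // Receives the "access_controller.properties.*" values from the catalog definition.
    MyCatalogAccessController createAccessController(Map<String, String> properties);
}

class RangerLikeFactory implements MyAccessControllerFactory {
    @Override
    public MyCatalogAccessController createAccessController(Map<String, String> properties) {
        // A real controller would use the properties to connect to the external policy system.
        return new MyCatalogAccessController() {
            public boolean checkDbPriv(String user, String catalog, String db, String wanted) {
                return true; // delegate to the external policy engine in a real implementation
            }
            public boolean checkTblPriv(String user, String catalog, String db, String tbl, String wanted) {
                return true;
            }
        };
    }
}
```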
Support IPv6 in Apache Doris. The main changes are:
1. Enable binding to an IPv6 address if the network priority in the config file contains an IPv6 CIDR string.
2. BRPC and HTTP support binding to an IPv6 address.
3. BRPC and HTTP support accessing IPv6 services.
Hive 1.x may write ORC files with internal column names (_col0, _col1, _col2, ...).
This causes query results to be NULL because the column names in the ORC file do not match
the column names in the Doris table schema. This PR supports querying Hive ORC files with internal column names.
For now, we have not seen this problem with Parquet files; we will send a new PR to fix Parquet if any problem shows up in the future.
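A minimal sketch of the positional-mapping idea (hypothetical helper, not the actual ORC reader code): internal names are resolved back to the Doris schema by ordinal position.

```java
import java.util.List;

class OrcInternalColumnMapper {
    // Hive 1.x ORC files may store internal names _col0, _col1, ...
    // Map them back to the Doris schema by ordinal position (illustrative sketch).
    static String resolve(String orcColumnName, List<String> dorisSchemaColumns) {
        if (orcColumnName.matches("_col\\d+")) {
            int ordinal = Integer.parseInt(orcColumnName.substring(4));
            if (ordinal < dorisSchemaColumns.size()) {
                return dorisSchemaColumns.get(ordinal);
            }
        }
        return orcColumnName;
    }
}
```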
In the previous version, if the output slot of an analyticExpr was not materialized, the analyticExpr was pruned.
But there are some cases where it cannot be pruned.
For example:
SELECT
count(*)
FROM T1,
(SELECT dd
FROM (
SELECT
1.1 as cc,
ROW_NUMBER() OVER() as dd
FROM T2
) V1
ORDER BY cc DESC
limit 1
) V2;
The analyticExpr (ROW_NUMBER() OVER() as dd) is not materialized, but we still have to generate a
WindowGroup for it:
tmp.dd is used by the upper count(*), so we have to generate data for tmp.dd.
In this fix, if an inline view only outputs one column (in this example, 'dd'), we materialize this column.
TODO:
In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule of choosing a column
for count(*). (refer to SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar)
V2 can be transformed to
(SELECT cc
FROM (
SELECT
1.1 as cc,
ROW_NUMBER() OVER() as dd
FROM T2
) V1
ORDER BY cc DESC
limit 1
) V2;
Besides the byte size of cc and dd, we need to consider the cost of generating cc and dd.
MySQL load can read files local to the FE server node, but this causes a security issue: a user could use it to probe the FE node's local files.
For this reason, add a configuration named mysql_load_server_secure_path to set a secure path from which data can be loaded.
By default, the feature of loading FE local files is disabled by this configuration (a sketch of the check follows).
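A minimal sketch of the path check this config enables (illustrative, not the actual FE code): an empty value disables FE-local loads entirely, and any requested file must stay inside the configured secure directory.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class MysqlLoadPathChecker {
    // Illustrative sketch: mysql_load_server_secure_path gates which FE-local files
    // may be read by MySQL load. An empty value disables the feature entirely.
    static boolean isAllowed(String securePath, String requestedFile) {
        if (securePath == null || securePath.isEmpty()) {
            return false; // feature disabled by default
        }
        Path root = Paths.get(securePath).toAbsolutePath().normalize();
        Path file = Paths.get(requestedFile).toAbsolutePath().normalize();
        return file.startsWith(root); // must stay inside the secure directory
    }
}
```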
For performance reasons, we want to remove constant columns from groupingExprs.
For example:
`select sum(T.A) from T group by T.B, 'xyz'` is equivalent to `select sum(T.A) from T group by T.B`,
so we can remove the constant column 'xyz' from groupingExprs.
But there is an exception when all groupingExprs are constant.
For example:
sql1: `select 'abc' from t group by 'abc'`
is not equivalent to
sql2: `select 'abc' from t`
sql3: `select 'abc', sum(a) from t group by 'abc'`
is not equivalent to
sql4: `select 1, sum(a) from t`
(when t is empty, sql3 returns 0 tuples, sql4 returns 1 tuple)
We need to keep some constant columns if all groupingExprs are constant.
Consider sql5: `select a from (select "abc" as a, 'def' as b) T group by b, a;`.
If the constant column `a` is in the select list, this column should not be removed.
sql5 is transformed to
sql6: `select a from (select "abc" as a, 'def' as b) T group by a;`