Convert column names from path to lower case when column names are case-insensitive.
This applies to Iceberg columns from path. Iceberg column names are case sensitive,
which may cause errors for partitioned tables.
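A minimal sketch of the normalization. It assumes the path columns arrive as a list of names and that a flag says whether names are case-sensitive. The class `PathColumns` and method `normalize` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not the real Doris code: normalizes column names parsed
// from a file path (e.g. "Dt" from ".../Dt=2023-01-01/...").
final class PathColumns {
    static List<String> normalize(List<String> columnsFromPath, boolean caseSensitive) {
        if (caseSensitive) {
            return columnsFromPath; // keep names exactly as written in the path
        }
        List<String> result = new ArrayList<>(columnsFromPath.size());
        for (String name : columnsFromPath) {
            // Locale.ROOT avoids locale-specific rules such as the Turkish dotless i
            result.add(name.toLowerCase(Locale.ROOT));
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(normalize(List.of("Dt", "REGION"), false)); // [dt, region]
    }
}
```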
Currently we send one Thrift message per fragment instance. However, the instances of a fragment share much of the same content. This PR extracts the shared parts so that they only need to be sent once per fragment.
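The idea can be sketched as splitting each message into a part shared by all instances and a small per-instance part. The classes below are hypothetical stand-ins for the Thrift structs, not the real Doris definitions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical: fields that are identical for every instance of a fragment.
final class SharedFragmentParams {
    final String planDescription;           // e.g. the serialized plan fragment
    final Map<String, String> queryOptions; // e.g. timeouts, memory limits
    SharedFragmentParams(String planDescription, Map<String, String> queryOptions) {
        this.planDescription = planDescription;
        this.queryOptions = queryOptions;
    }
}

// Hypothetical: fields that differ between instances.
final class InstanceParams {
    final int instanceId;
    final List<Integer> scanRangeIds;
    InstanceParams(int instanceId, List<Integer> scanRangeIds) {
        this.instanceId = instanceId;
        this.scanRangeIds = scanRangeIds;
    }
}

// One message per fragment: the shared part appears once, followed by
// the per-instance parts.
final class FragmentMessage {
    final SharedFragmentParams shared;
    final List<InstanceParams> instances = new ArrayList<>();
    FragmentMessage(SharedFragmentParams shared) { this.shared = shared; }
}

final class FragmentMessages {
    static FragmentMessage build(SharedFragmentParams shared, List<List<Integer>> rangesPerInstance) {
        FragmentMessage msg = new FragmentMessage(shared);
        for (int i = 0; i < rangesPerInstance.size(); i++) {
            msg.instances.add(new InstanceParams(i, rangesPerInstance.get(i)));
        }
        return msg;
    }
}
```

With N instances, the shared part is serialized once instead of N times.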
Previously, the logic returned at most as many nodes as there were compute nodes (cn). Now,
if the number of cn is less than expectBeNum, mix nodes are used to fill the gap,
until the number of selected nodes equals expectBeNum or the mix nodes are used up.
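A minimal sketch of the fill-in rule. `expectBeNum` comes from the text. The class `NodeSelector`, the method `select`, and representing nodes as strings are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class NodeSelector {
    // Prefer compute nodes; if there are fewer than expectBeNum of them, fill
    // the gap with mix nodes until expectBeNum is reached or mix nodes run out.
    static List<String> select(List<String> computeNodes, List<String> mixNodes, int expectBeNum) {
        List<String> selected = new ArrayList<>();
        for (String cn : computeNodes) {
            if (selected.size() >= expectBeNum) {
                return selected;
            }
            selected.add(cn);
        }
        for (String mix : mixNodes) {
            if (selected.size() >= expectBeNum) {
                break;
            }
            selected.add(mix);
        }
        return selected;
    }
}
```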
1. Change MV rewrite from bottom-up to top-down.
2. Keep compatibility with materialized views created by old versions.
3. Restore some unit test code (disabled for now).
4. Fix some unit tests introduced by [fix](planner)fix bug for missing slot #16601 and [Feature](Materialized-View) support multiple slot on one column in materialized view #16378.
Currently, inserting {1, 'a'} into a column of type struct<f1:tinyint, f2:varchar(20)> is not supported.
This commit supports implicitly casting the char type inside a struct to varchar.
It adds implicit casts for the struct type.
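A sketch of the char-to-varchar rewrite inside a (possibly nested) struct type. It uses a tiny hypothetical type model (`SqlType`, `ScalarType`, `StructType`) instead of Doris's real type classes. It only shows the rewrite step, not the full cast to the target column type.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Hypothetical, simplified type model for illustration only.
abstract class SqlType {}

final class ScalarType extends SqlType {
    final String name; // "TINYINT", "CHAR", "VARCHAR", ...
    final int length;  // > 0 only for CHAR/VARCHAR
    ScalarType(String name, int length) { this.name = name; this.length = length; }
    @Override public String toString() {
        String base = name.toLowerCase(Locale.ROOT);
        return length > 0 ? base + "(" + length + ")" : base;
    }
}

final class StructField {
    final String name;
    final SqlType type;
    StructField(String name, SqlType type) { this.name = name; this.type = type; }
}

final class StructType extends SqlType {
    final List<StructField> fields;
    StructType(List<StructField> fields) { this.fields = fields; }
    @Override public String toString() {
        List<String> parts = new ArrayList<>();
        for (StructField f : fields) parts.add(f.name + ":" + f.type);
        return "struct<" + String.join(", ", parts) + ">";
    }
}

final class StructCasts {
    // Recursively replaces CHAR(n) with VARCHAR(n) in every struct field.
    static SqlType charToVarchar(SqlType type) {
        if (type instanceof ScalarType) {
            ScalarType s = (ScalarType) type;
            return s.name.equals("CHAR") ? new ScalarType("VARCHAR", s.length) : s;
        }
        if (type instanceof StructType) {
            List<StructField> rewritten = new ArrayList<>();
            for (StructField f : ((StructType) type).fields) {
                rewritten.add(new StructField(f.name, charToVarchar(f.type)));
            }
            return new StructType(rewritten);
        }
        return type;
    }
}
```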
Whether a colocated join can be used depends on whether both sides of the join conjuncts are simple columns with the same distribution policy, among other conditions. So the key is to find the original source column in the scan node, if there is one. To do that, we should look up the slot in both the lhs and the rhs of the outputSmap in the join node.
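A simplified sketch of the lookup: a slot is resolved against both sides of a join's outputSmap before moving further down. The classes `JoinOutputSmap` and `ColocateHelper` and the string slot names are hypothetical, not the real planner data structures.

```java
import java.util.List;
import java.util.Map;

// Hypothetical: a join node's outputSmap split into the mappings that come
// from its left child (lhs) and its right child (rhs).
final class JoinOutputSmap {
    final Map<String, String> lhs; // output slot -> left child slot
    final Map<String, String> rhs; // output slot -> right child slot
    JoinOutputSmap(Map<String, String> lhs, Map<String, String> rhs) {
        this.lhs = lhs;
        this.rhs = rhs;
    }

    // Checks both sides; returns the slot unchanged if neither side maps it.
    String resolve(String slot) {
        if (lhs.containsKey(slot)) return lhs.get(slot);
        if (rhs.containsKey(slot)) return rhs.get(slot);
        return slot;
    }
}

final class ColocateHelper {
    // Walks down through nested joins (outermost first) to find the column
    // the slot originally came from in the scan node.
    static String sourceColumn(String slot, List<JoinOutputSmap> joinsTopDown) {
        String current = slot;
        for (JoinOutputSmap smap : joinsTopDown) {
            current = smap.resolve(current);
        }
        return current;
    }
}
```

Checking only the lhs would leave a slot produced by the right child unresolved, so the colocate check could not see its distribution.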
In the current implementation, the class Auth is used to:
1. Manage all authentication and authorization info, such as users, roles, passwords, and privileges.
2. Provide an interface for privilege checking.
Some users may want to integrate an external access management system such as Apache Ranger,
so we should provide a way for users to set their own access controller.
The main changes in this PR are:
1. A new class SystemAccessController.
This access controller is used to check global-level privileges and resource privileges.
2. A new interface CatalogAccessController.
This interface is used to check catalog/database/table-level privileges.
Its default implementation is InternalCatalogAccessController.
All privilege-checking methods are moved from Auth to either SystemAccessController or
InternalCatalogAccessController.
3. A new class AccessControllerManager.
This is the entry point of privilege authentication. All methods previously called on Auth
are now called on AccessControllerManager.
Users can now implement the CatalogAccessController interface to use their own access controller.
When creating an external catalog, users can specify the access controller class name, so that
different external catalogs can use different access controllers.
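A minimal sketch of how these pieces could fit together. The class and interface names come from the description above. All method signatures, the string-based grant store, and registering a controller instance directly (instead of loading it by class name) are hypothetical simplifications.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

// Checks catalog/database/table-level privileges (signature is hypothetical).
interface CatalogAccessController {
    boolean checkTblPriv(String user, String db, String tbl, String priv);
}

// Default implementation backed by a toy set of "user:db.tbl:priv" grants.
final class InternalCatalogAccessController implements CatalogAccessController {
    private final Set<String> grants;
    InternalCatalogAccessController(Set<String> grants) { this.grants = grants; }
    @Override
    public boolean checkTblPriv(String user, String db, String tbl, String priv) {
        return grants.contains(user + ":" + db + "." + tbl + ":" + priv);
    }
}

// Checks global-level privileges (resource privileges omitted here).
final class SystemAccessController {
    private final Set<String> globalAdmins;
    SystemAccessController(Set<String> globalAdmins) { this.globalAdmins = globalAdmins; }
    boolean checkGlobalPriv(String user) { return globalAdmins.contains(user); }
}

// Entry point: dispatches each check to the right controller.
final class AccessControllerManager {
    private final SystemAccessController systemController;
    private final CatalogAccessController internalController;
    private final Map<String, CatalogAccessController> catalogControllers = new HashMap<>();

    AccessControllerManager(SystemAccessController systemController,
                            CatalogAccessController internalController) {
        this.systemController = systemController;
        this.internalController = internalController;
    }

    // An external catalog may bring its own controller (e.g. one backed by Ranger).
    void registerCatalog(String catalogName, CatalogAccessController controller) {
        catalogControllers.put(catalogName, controller);
    }

    boolean checkGlobalPriv(String user) {
        return systemController.checkGlobalPriv(user);
    }

    boolean checkTblPriv(String user, String catalog, String db, String tbl, String priv) {
        return catalogControllers.getOrDefault(catalog, internalController)
                .checkTblPriv(user, db, tbl, priv);
    }
}
```

Callers only talk to the manager, so swapping a catalog's controller does not change any call sites.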
Support IPv6 in Apache Doris. The main changes are:
1. Enable binding to an IPv6 address if the network priority setting in the config file contains an IPv6 CIDR string.
2. BRPC and HTTP support binding to IPv6 addresses.
3. BRPC and HTTP support accessing IPv6 services.
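A sketch of CIDR matching that works for both IPv4 and IPv6, plus the "bind to IPv6 if any priority CIDR is IPv6" rule from item 1. The class `Cidr` and its methods are hypothetical. Only `java.net.InetAddress` is a real API here.

```java
import java.net.InetAddress;
import java.net.UnknownHostException;
import java.util.List;

final class Cidr {
    private final byte[] network; // 4 bytes for IPv4, 16 for IPv6
    private final int prefixLen;

    Cidr(String cidr) {
        int slash = cidr.indexOf('/');
        if (slash < 0) throw new IllegalArgumentException("not a CIDR: " + cidr);
        this.network = toBytes(cidr.substring(0, slash));
        this.prefixLen = Integer.parseInt(cidr.substring(slash + 1));
        if (prefixLen < 0 || prefixLen > network.length * 8) {
            throw new IllegalArgumentException("bad prefix length: " + cidr);
        }
    }

    boolean isIpv6() { return network.length == 16; }

    // Compares the first prefixLen bits of the address with the network.
    boolean contains(String address) {
        byte[] addr = toBytes(address);
        if (addr.length != network.length) return false; // IPv4 vs IPv6 mismatch
        int fullBytes = prefixLen / 8;
        for (int i = 0; i < fullBytes; i++) {
            if (addr[i] != network[i]) return false;
        }
        int remainingBits = prefixLen % 8;
        if (remainingBits == 0) return true;
        int mask = (0xFF << (8 - remainingBits)) & 0xFF;
        return (addr[fullBytes] & mask) == (network[fullBytes] & mask);
    }

    // Bind to IPv6 if any configured priority CIDR is an IPv6 CIDR.
    static boolean shouldBindIpv6(List<String> priorityCidrs) {
        for (String c : priorityCidrs) {
            if (new Cidr(c).isIpv6()) return true;
        }
        return false;
    }

    // Only pass IP literals: a host name would trigger a DNS lookup.
    private static byte[] toBytes(String literal) {
        try {
            return InetAddress.getByName(literal).getAddress();
        } catch (UnknownHostException e) {
            throw new IllegalArgumentException("not an IP literal: " + literal, e);
        }
    }
}
```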
Hive 1.x may write ORC files with internal column names (_col0, _col1, _col2, ...).
This causes query results to be NULL, because the column names in the ORC file don't match
the column names in the Doris table schema. This PR adds support for querying Hive ORC files with internal column names.
So far we haven't seen this problem with Parquet files; we will send a new PR to fix Parquet if the problem shows up in the future.
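One way to handle this is to detect internal names and fall back to matching columns by position. The sketch below assumes the table's data columns (excluding partition columns) are in the same order as the file's columns. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class OrcColumnResolver {
    private static final Pattern INTERNAL_NAME = Pattern.compile("_col\\d+");

    // True when every top-level column in the file uses Hive's internal naming.
    static boolean usesInternalNames(List<String> fileColumns) {
        if (fileColumns.isEmpty()) return false;
        for (String name : fileColumns) {
            if (!INTERNAL_NAME.matcher(name).matches()) return false;
        }
        return true;
    }

    // For each table column, returns the file column to read, or null if the
    // column is missing (which is read as NULL). Matching is by name normally,
    // and by position when the file uses internal names.
    static List<String> resolve(List<String> tableColumns, List<String> fileColumns) {
        boolean byPosition = usesInternalNames(fileColumns);
        List<String> result = new ArrayList<>();
        for (int i = 0; i < tableColumns.size(); i++) {
            if (byPosition) {
                result.add(i < fileColumns.size() ? fileColumns.get(i) : null);
            } else {
                String name = tableColumns.get(i);
                result.add(fileColumns.contains(name) ? name : null);
            }
        }
        return result;
    }
}
```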
In previous versions, if the output slot of an analyticExpr was not materialized, the analyticExpr was pruned.
But in some cases it cannot be pruned.
For example:
SELECT
count(*)
FROM T1,
(SELECT dd
FROM (
SELECT
1.1 as cc,
ROW_NUMBER() OVER() as dd
FROM T2
) V1
ORDER BY cc DESC
limit 1
) V2;
The analyticExpr (ROW_NUMBER() OVER() as dd) is not materialized, but we still have to generate
a WindowGroup for it:
V2.dd is used by the outer count(*), so we have to generate data for V2.dd.
In this fix, if an inline view outputs only one column (in this example, 'dd'), we materialize that column.
TODO:
In order to prune 'ROW_NUMBER() OVER() as dd', we need to rethink the rule for choosing a column
for count(*) (see SingleNodePlanner.materializeTableResultForCrossJoinOrCountStar).
V2 can be rewritten to output cc instead of dd, so the query becomes:
SELECT
count(*)
FROM T1,
(SELECT cc
FROM (
SELECT
1.1 as cc,
ROW_NUMBER() OVER() as dd
FROM T2
) V1
ORDER BY cc DESC
limit 1
) V2;
Besides the byte size of cc and dd, we also need to consider the cost of generating cc and dd.
MySQL load can load files from the FE server node, but this causes a security issue: users could use it to probe local files on the FE node.
For this reason, add a configuration named mysql_load_server_secure_path to set a secure path from which data can be loaded.
By default, this configuration disables loading local files on the FE.
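A sketch of a path check that such a configuration could drive. It assumes that an empty value means the feature is disabled. The parameter `securePath` stands in for mysql_load_server_secure_path. The class `LocalLoadGuard` is hypothetical.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

final class LocalLoadGuard {
    // Returns true only if requestedFile lies under securePath.
    static boolean isAllowed(String securePath, String requestedFile) {
        if (securePath == null || securePath.isEmpty()) {
            return false; // not configured: loading FE-local files is disabled
        }
        Path root = Paths.get(securePath).toAbsolutePath().normalize();
        // normalize() collapses "..", so "/data/load/../../etc/passwd" cannot escape root
        Path file = Paths.get(requestedFile).toAbsolutePath().normalize();
        // Path.startsWith compares whole name elements: "/data/loader" is not under "/data/load"
        return file.startsWith(root);
    }
}
```

This check does not resolve symbolic links. `Path.toRealPath()` would resolve them, but it requires the file to exist.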
For performance reasons, we want to remove constant columns from groupingExprs.
For example:
`select sum(T.A) from T group by T.B, 'xyz'` is equivalent to `select sum(T.A) from T group by T.B`
We can remove the constant column `'xyz'` from groupingExprs.
But there is an exception when all groupingExprs are constant.
For example:
sql1: `select 'abc' from t group by 'abc'`
is not equivalent to
sql2: `select 'abc' from t`
sql3: `select 'abc', sum(a) from t group by 'abc'`
is not equivalent to
sql4: `select 'abc', sum(a) from t`
(when t is empty, sql3 returns no tuples, while sql4 returns one tuple)
So if all groupingExprs are constant, we need to keep some of the constant columns.
Consider sql5 `select a from (select "abc" as a, 'def' as b) T group by b, a;`:
the constant column `a` is in the select list, so it should not be removed.
sql5 is transformed to
sql6 `select a from (select "abc" as a, 'def' as b) T group by a;`
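A sketch of the pruning rule on a simplified expression model. When every grouping expression is constant, it keeps the constant columns that appear in the select list. If none appear there, it keeps one, so that an empty input still yields zero rows. That fallback is an assumption of this sketch, not something stated above. `GroupingExpr` and `ConstantGroupByPruner` are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

final class GroupingExpr {
    final String text;
    final boolean constant;
    GroupingExpr(String text, boolean constant) { this.text = text; this.constant = constant; }
}

final class ConstantGroupByPruner {
    static List<GroupingExpr> prune(List<GroupingExpr> groupingExprs, Set<String> selectList) {
        List<GroupingExpr> nonConstant = new ArrayList<>();
        for (GroupingExpr e : groupingExprs) {
            if (!e.constant) nonConstant.add(e);
        }
        // group by T.B, 'xyz'  ->  group by T.B
        if (!nonConstant.isEmpty() || groupingExprs.isEmpty()) {
            return nonConstant;
        }
        // All constant: keep those referenced by the select list (sql5 -> sql6).
        List<GroupingExpr> kept = new ArrayList<>();
        for (GroupingExpr e : groupingExprs) {
            if (selectList.contains(e.text)) kept.add(e);
        }
        // Assumption: keep at least one so an empty table still returns no rows.
        if (kept.isEmpty()) kept.add(groupingExprs.get(0));
        return kept;
    }
}
```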
Use the FE cluster token to authenticate stream load.
This auth is only enabled on BE; FE still supports only HTTP basic auth.
I will use this auth in MySQL load to build a stream load from FE to BE that needs no user credentials.
This avoids authenticating twice in MySQL load.
See the design doc for more information.
Issue Number: close#16351
A dynamic schema table is a special type of table whose schema changes during loading. We currently implement this feature mainly for semi-structured data such as JSON: since JSON is self-describing, we can extract schema info from the original documents and infer the final type information. This special table reduces manual schema change operations, makes it easy to import semi-structured data, and extends its schema automatically.
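A toy sketch of inferring a schema from already-parsed JSON documents. Each value gets a type and types are widened across documents (BIGINT < DOUBLE < STRING). The three-type lattice, `InferredType`, and `SchemaInference` are hypothetical simplifications. Doris's real inference supports more types.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Declaration order is the widening order.
enum InferredType { BIGINT, DOUBLE, STRING }

final class SchemaInference {
    static InferredType typeOf(Object value) {
        if (value instanceof Integer || value instanceof Long) return InferredType.BIGINT;
        if (value instanceof Float || value instanceof Double) return InferredType.DOUBLE;
        return InferredType.STRING;
    }

    static InferredType widen(InferredType a, InferredType b) {
        return a.ordinal() >= b.ordinal() ? a : b;
    }

    // New keys extend the schema; repeated keys widen the existing type.
    static Map<String, InferredType> infer(List<Map<String, Object>> documents) {
        Map<String, InferredType> schema = new LinkedHashMap<>();
        for (Map<String, Object> doc : documents) {
            for (Map.Entry<String, Object> e : doc.entrySet()) {
                if (e.getValue() == null) continue; // null carries no type information
                schema.merge(e.getKey(), typeOf(e.getValue()), SchemaInference::widen);
            }
        }
        return schema;
    }
}
```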
Doris can connect to a Doris database through the MySQL JDBC driver jar, but Doris has some data types that MySQL lacks,
such as DecimalV3 and Date/DatetimeV2.
I added some case checks in `Mysql Catalog` so that the JDBC catalog can identify Doris data types.
When file cache is enabled, running the same query a second time may still be slow, because `FE` may assign the same
scan range to different backends across queries, and the data previously cached in a `BE` is useless once the scan range moves to another backend.
So this PR introduces consistent hashing to assign the same scan range to the same backend across queries.
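A minimal consistent-hash ring with virtual nodes. It shows why a scan range keeps landing on the same backend. The class name, the virtual-node count, MD5 as the hash, and using a string such as a file path plus offset as the scan-range key are all assumptions of this sketch.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class ConsistentHashRing {
    private final TreeMap<Long, String> ring = new TreeMap<>();

    // Each backend is placed on the ring several times to even out the load.
    ConsistentHashRing(List<String> backends, int virtualNodes) {
        for (String be : backends) {
            for (int i = 0; i < virtualNodes; i++) {
                ring.put(hash(be + "#" + i), be);
            }
        }
    }

    // The owner is the first virtual node clockwise from the key's hash.
    String backendFor(String scanRangeKey) {
        if (ring.isEmpty()) throw new IllegalStateException("no backends");
        Map.Entry<Long, String> e = ring.ceilingEntry(hash(scanRangeKey));
        return (e != null ? e : ring.firstEntry()).getValue();
    }

    // First 8 bytes of MD5 as a long; deterministic across queries and processes.
    private static long hash(String key) {
        try {
            byte[] d = MessageDigest.getInstance("MD5").digest(key.getBytes(StandardCharsets.UTF_8));
            long h = 0;
            for (int i = 0; i < 8; i++) h = (h << 8) | (d[i] & 0xFF);
            return h;
        } catch (NoSuchAlgorithmException ex) {
            throw new IllegalStateException(ex);
        }
    }
}
```

Compared with assigning by `hash % backendCount`, removing one backend only remaps the scan ranges that backend owned. The cache on the remaining backends stays useful.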