```
GRANT USAGE_PRIV ON RESOURCE * TO user;
```
After this grant, the user can see all databases.
Set a dedicated PrivPredicate for SHOW RESOURCES and remove USAGE from the SHOW PrivPredicate.
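A minimal sketch of the intended behavior after the fix (the user name is a placeholder):
```
GRANT USAGE_PRIV ON RESOURCE * TO test_user;
-- test_user can still use the resources, but SHOW DATABASES should now
-- list only databases test_user actually has privileges on, not all of them.
SHOW DATABASES;
```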
1. In vertical compaction, segments are loaded for every column group, so
we should cache the segment pointers to avoid too much repeated IO.
2. Fix the vertical compaction data size bug.
PR #14381 limited the `ExtractCommonFactorsRule` to handling only `WHERE` predicates,
but predicates in the `ON` clause should also be considered. For example:
```
CREATE TABLE `nation` (
  `n_nationkey` int(11) NOT NULL,
  `n_name` varchar(25) NOT NULL,
  `n_regionkey` int(11) NOT NULL,
  `n_comment` varchar(152) NULL
)
DUPLICATE KEY(`n_nationkey`)
DISTRIBUTED BY HASH(`n_nationkey`) BUCKETS 1
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
select * from
nation n1 join nation n2
on (n1.n_name = 'FRANCE' and n2.n_name = 'GERMANY')
or (n1.n_name = 'GERMANY' and n2.n_name = 'FRANCE')
```
Each scan node should get the following predicates:
```
PREDICATES: `n1`.`n_name` IN ('FRANCE', 'GERMANY')
PREDICATES: `n2`.`n_name` IN ('FRANCE', 'GERMANY')
```
This PR fixes the issue by removing that restriction from `ExtractCommonFactorsRule`.
1. Spark DPP
Move `DppResult` and `EtlJobConfig` to the sparkdpp package in the `fe-common` module,
so that `fe-core` no longer depends on the `spark-dpp` module and `spark-dpp.jar`
is no longer copied into `fe/lib`, which reduces the size of the FE output.
2. Modify start_fe.sh
Modify the CLASSPATH to make sure doris-fe.jar comes first, so that when classes
with the same qualified name exist, they are loaded from doris-fe.jar first.
3. Upgrade Hadoop and Hive versions
hadoop: 2.10.2 -> 3.3.3
hive: 2.3.7 -> 3.1.3
4. Override the IHiveMetastoreClient implementations from dependencies:
`ProxyMetaStoreClient.java` for Aliyun DLF.
`HiveMetaStoreClient.java` for the original Apache Hive metastore.
This is needed because some of their methods must be modified to make them
compatible with different versions of Hive.
5. Exclude some unused dependencies to reduce the size of the FE output.
It is now only about 370 MB (previously 600 MB).
6. Upgrade aws-java-sdk version to 1.12.31
7. Support AWS Glue Data Catalog
8. Remove HudiScanNode (no longer supported)
1. Add TypeCoercion for (string, decimal) and (date, decimal).
2. The equality check of LogicalProject nodes should consider their children in some cases.
3. Don't push down join conditions like `t1 join t2 on true/false` (see the sketch after this list).
4. Add PUSH_DOWN_FILTERS after FindHashConditionForJoin.
5. Nested loop join should support all kinds of joins.
6. The intermediate tuple should contain slots from both children of the nested loop join.
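A sketch of the queries items 3 and 5 are about, with hypothetical tables `t1` and `t2`: a join whose condition is a constant has no hash condition, so it must run as a nested loop join, and the constant condition must not be pushed down to either side.
```
-- No usable hash condition: planned as a nested loop join, and the
-- constant condition stays on the join instead of being pushed down.
SELECT * FROM t1 JOIN t2 ON TRUE;
-- The same applies to other join types, which is why the nested loop
-- join needs to support all of them.
SELECT * FROM t1 LEFT JOIN t2 ON FALSE;
```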
1. Support a row format using the JSONB codec.
2. Short-path optimization for point queries (see the sketch after this list).
3. Support prepared statements for point queries.
4. Support the MySQL binary protocol format.
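A hedged sketch of what a short-path point query looks like, assuming a unique-key table with the row store enabled (the table name and the `store_row_column` property shown here are illustrative, not taken from this PR):
```
CREATE TABLE user_profile (
    user_id BIGINT,
    name VARCHAR(64),
    city VARCHAR(64)
)
UNIQUE KEY(user_id)
DISTRIBUTED BY HASH(user_id) BUCKETS 1
PROPERTIES ("store_row_column" = "true");

-- An equality lookup on the full key can take the short path and, with the
-- row format above, be answered from the JSONB-encoded row store:
SELECT * FROM user_profile WHERE user_id = 1001;
```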
Support Iceberg schema evolution for the parquet file format.
Iceberg uses a unique ID for each column to support schema evolution.
To support this feature in Doris, the FE side needs to get the current column ID of each column and send the IDs to the BE side.
The BE reads the column IDs from the parquet key_value_metadata, renames the changed columns in the Block to match the names in the parquet file before reading data, and sets the names back after reading.
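An illustrative scenario (catalog, database, table, and column names are made up): because the mapping is by column ID rather than by name, a column renamed in Iceberg can still be resolved against old parquet files.
```
-- In the Iceberg table (e.g. via Spark), rename a column:
--   ALTER TABLE iceberg_db.events RENAME COLUMN ts TO event_time;
-- In Doris, through an Iceberg catalog, old parquet files written before
-- the rename should still be readable under the new column name:
SELECT event_time FROM iceberg_catalog.iceberg_db.events LIMIT 10;
```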
1. Add DELETE_SIGN_COLUMN to the non-visible columns in LogicalOlapScan.
2. When the table has a delete sign, add a filter `delete_sign_column = 0` (see the sketch after this list).
3. Use both output slots and non-visible slots to bind slots.
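A sketch of the effect of item 2: scanning a table that has a delete sign behaves as if the filter on the hidden column (named `__DORIS_DELETE_SIGN__` in Doris) were written explicitly.
```
-- Equivalent form of a plain `SELECT * FROM t` on a table with a delete sign:
SELECT * FROM t WHERE __DORIS_DELETE_SIGN__ = 0;
```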
close #16099
1. Make the ES resource compatible with the `username` property. Keep the same behavior as the ES catalog.
2. Change the ES catalog property `username` to `user` to avoid confusion (see the example after this list).
3. Add logging in ESRestClient to make debugging easier.
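A hedged example of creating an ES catalog with the renamed property (hosts and credentials are placeholders):
```
CREATE CATALOG es_catalog PROPERTIES (
    "type" = "es",
    "hosts" = "http://127.0.0.1:9200",
    "user" = "elastic",
    "password" = "******"
);
```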
We found a problem with the inverted index when parser=english:
if a column contained NULLs while its inverted index was being flushed, CLucene could throw an exception.
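A minimal repro sketch, assuming a table with an english-parser inverted index (names and properties are illustrative):
```
CREATE TABLE t (
    id INT,
    msg VARCHAR(100) NULL,
    INDEX idx_msg (msg) USING INVERTED PROPERTIES ("parser" = "english")
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_allocation" = "tag.location.default: 1");

-- Flushing the index for a column containing NULLs could make CLucene throw:
INSERT INTO t VALUES (1, 'hello world'), (2, NULL);
```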
The BE storage engine has a bug in Date comparison, so if we push down predicates like Date'x' < Date'y', we get wrong results.
This PR simply converts expressions like Date'x' < Date'y' to DateTime'x' < DateTime'y'.
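An illustrative before/after of the rewrite (table `t` and DATE column `d` are hypothetical):
```
-- Before: a pure DATE comparison, which hits the buggy BE comparison path
SELECT * FROM t WHERE d < CAST('2022-11-11' AS DATE);
-- After: both sides are compared as DATETIME instead
SELECT * FROM t WHERE CAST(d AS DATETIME) < CAST('2022-11-11 00:00:00' AS DATETIME);
```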
TODO:
Does the storage engine support comparing a date slot with a datetime?
If it does, we could avoid adding the cast on the slot,
and this expression could then be pushed down to the storage engine.
1. `uncheckedCastChild` may generate a redundant `CastExpr` like `cast(cast(XXX as Date) as Date)`.
2. Generate a DateLiteral to replace `cast(IntLiteral as Date)`.
A child's slot with the same name as a slot in the output expressions would be discarded, which caused the bind to fail, since the slots in the group by expressions could not find the corresponding bound slots in the child's output.
For example, in the following case, the `date` in the having clause should be bound to the alias with the same name, instead of the `date` field of the relation:
```
SELECT date_format(date, '%x%v') AS `date` FROM `tb_holiday` WHERE `date` between 20221111 AND 20221116 HAVING date = 202245 ORDER BY date;
```
1. Signatures without the order element are wrong.
2. The signature with one argument is missing (see the examples after this list).
3. group_concat should be a NullableAggregateFunction.
4. Constant folding on the FE should not fold a NullableAggregateFunction with a null argument.
TODO
1. Reorder the rewrite rules, and then only forbid constant folding on NullableAggregateFunction with alwaysNullable == true.
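For reference, the group_concat call shapes the signature fixes are about (the table and column are made up):
```
SELECT group_concat(name) FROM t;                    -- one-arg form, was missing
SELECT group_concat(name, ',') FROM t;               -- with a separator
SELECT group_concat(name, ',' ORDER BY name) FROM t; -- with the order element
```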
This PR optimizes topn queries like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.
A TopN plan is composed of a SortNode and a ScanNode. When the user's table is wide (100+ columns) while the order by clause uses only a few of them, the ScanNode still has to scan all the data from the storage engine even if the limit is very small. This can lead to a lot of read amplification. So in this PR I divide the TopN query into two phases:
1. In the first phase we only read `columnA`'s data from the storage engine, along with an extra RowId column called `__DORIS_ROWID_COL__`. The other columns are pruned from the ScanNode.
2. The second phase is placed in the ExchangeNode, because it is the central node for the topn nodes in the cluster. The ExchangeNode spawns RPCs to the other nodes using the RowIds (sorted and limited by the SortNode) read in the first phase, and reads the rows from the storage engine row by row.
After the second phase, the Block contains all the data needed for the query.
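Conceptually (this is the effect, not syntax the user writes), the two phases behave like:
```
-- Phase 1: read only the sort column plus the hidden row id, then sort/limit
SELECT __DORIS_ROWID_COL__, columnA FROM tableX ORDER BY columnA LIMIT N;
-- Phase 2: at the ExchangeNode, fetch the remaining columns for just the N
-- winning rows by row id, via RPC back to the nodes that hold the rows.
```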
The broker implements the interface to JuiceFS. It supports loading data from JuiceFS into Doris through the broker.
It also implements multi-catalog support to read Hive data stored in JuiceFS.
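A hedged sketch of a broker load from JuiceFS (the broker name, label, path, and connection properties are placeholders, not taken from this PR):
```
LOAD LABEL example_db.label_jfs
(
    DATA INFILE("jfs://myjfs/path/to/file.csv")
    INTO TABLE t1
    COLUMNS TERMINATED BY ","
)
WITH BROKER "broker_juicefs"
(
    "fs.jfs.impl" = "io.juicefs.JuiceFileSystem",
    "juicefs.meta" = "redis://127.0.0.1:6379/1"
);
```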