Add new FE config `ignore_unknown_metadata_module`. Default is false.
If set to true, when reading metadata image file, and there are unknown modules, these modules
will be ignored and skipped.
This is mainly used in downgrade operation, old version can be compatible with new version Image file.
Hive partition columns' stats could be calculated from hive metastore data. Doesn't need to execute sql to get the stats.
This PR is using hive partition metadata to collect partition column stats.
### Before:
return errors when tvf queries an empty file or an error uri:
1. get parsed schema failed, empty csv file
2. Can not get first file, please check uri.
### Now:
we just return empty set when tvf queries an empty file or an error uri.
```sql
mysql> select * from s3(
"uri" = "https://error_uri/exp_1.csv",
"s3.access_key"= "xx",
"s3.secret_key" = "yy",
"format" = "csv") limit 10;
Empty set (1.29 sec)
```
Support transforming trino dialect SQL to logical plan (#21854)
## Proposed changes
Issue Number: #21854
Use io.trino.sql.tree.AstVisitor as vistor, visit coorresponding trino node and transform it to doris logical plan.
## Further comments
Here are some examples for function transforming as following:
**ascii('a')** function is in doris and **codepoint('a')** funtion in trino, they have the same feature and have the same method signature, so we can use [TrinoFnCallTransformer](3b37b76886/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/trino/TrinoFnCallTransformer.java) to handle them.
another example for ComplexTransformer as following:
**date_diff('second', TIMESTAMP '2020-12-25 22:00:00', TIMESTAMP '2020-12-25 21:00:00')"** fuction in trino
and **seconds_diff(2020-12-25 22:00:00, 2020-12-25 21:00:00)")** fuction in doris. They have different method signature, we cant not handle it by TrinoFnCallTransformer simply and we should handle it by individual complex transformer [DateDiffFnCallTransformer](3b37b76886/fe/fe-core/src/main/java/org/apache/doris/nereids/parser/trino/DateDiffFnCallTransformer.java).
At present, `load_to_singlt_tablet` import implementation refers to simple random number remainder, which cannot achieve true averaging. This will lead to uneven disk IO and uneven use of cluster resources. To solve this problem, we are preparing to implement round-robin for each partition tablet imported each time, in order to achieve average load to each tablet.
When generating the load query plan, the tablet index record currently imported is passed to BE.
Add a deamon task in FE to regularly clean up the `loadTabletRecordMap`. The map will get the bucket_number of the partition and update the `load_tablet_index` when `getCurrentLoadTabletIndex`.
Disable infer expr column name when query on old optimizer.
This bug is be brought in #24990
if your query SQL is
select id, name, sum(target) FROM db_test.table_test2 group by id, name;
the result column name when query is as following:
|id|name |sum(cast(target as DOUBLE))|
when you create view as following:
CREATE VIEW v1 as select id, name, sum(target) FROM db_test.table_test2 group by id, name;
then query the view v1, the result is as following:
|id|name |__sum_2|
Problem:
when running pipeline, we get randomly failed of test_leading
Reason:
physical distribute was generated and choosed to be the best plan because we can not get any statistic information of empty table. So we would get some unexpect result because we can not expect the order in memo
Solved:
Add statistic of columns used in test_leading, try repeatly in pipeline
The user has configured the parameter lower_case_table_names, which ignores the case of the table name. When executed on the SQL client, the table name can be queried in both case.
But when using Connector to read doris data, the table names must be in the same case, otherwise an error will be reported.
I want to use Doris Multi-catalog to accelerate HMS query. My organization has custom distributed file system, and we think wrapping the fs access difference into broker (listLocatedFiles, openReader..) would be a elegant approach.
This pr introduce HMS catalog conf `bind.broker.name`. If we set this conf, file split, query scan operation will send to broker.
usage:
create a hms catalog with broker usage
```
CREATE CATALOG hive_catalog_broker PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://xxx',
'broker.name' = 'hdfs_broker'
);
```
When we try to query from this catalog, file split and query scan request will send to broker `hdfs_broker`.
More details about this pr:
1. Introduce HMS catalog proporty `bind.broker.name` to specify broker name to do remote path work. When `broker.name` is set, `enable.self.splitter` must be `true` to ensure file splitting process is executed in Fe
2. Introduce 2 more interfaces to broker service:
- `TBrokerIsSplittableResponse isSplittable(1: TBrokerIsSplittableRequest request)`, helps to invoke input format `isSplitable` interface.
- `TBrokerListResponse listLocatedFiles(1: TBrokerListPathRequest request)`, helps to do `listFiles` or `listLocatedStatus` for remote file system
3. 3 parts of whole processing will be executed in broker:
- Check whether the path with specified input format name `isSplittable`
- `listLocatedFiles` of table / partition locations.
- `OpenReader` for specified file splits.
Co-authored-by: chenlinzhong <490103404@qq.com>
In previous, if user property `'resource_tags.location'` is not set, the can use Backends with any resource tag.
It may confuse that when the DBA set part of Backends to resource group A, then the current existing user
should not be able to use this group A util it's `'resource_tags.location'` is set.
So in this PR, I change the behavior, that if user property `'resource_tags.location'` is not set, it can only use the
Backends with `default` tag.
- db support replication_allocation,when create table,if not set `replication_num` or `replication_allocation `,will use it in db
- fix partition property will disappear when table partition is not null