Problem:
when running the pipeline, test_leading fails randomly
Reason:
a physical distribute plan was generated and chosen as the best plan because no statistics are available for an empty table, so we get unexpected results since the order of plans in the memo is not deterministic
Solution:
Add statistics for the columns used in test_leading; verified by retrying the pipeline repeatedly
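A minimal sketch of injecting column statistics so the optimizer no longer sees a statistics-free empty table; the table and column names, the numbers, and the stats-injection DDL are assumptions for illustration rather than the exact statements in the case:
```
-- Hypothetical: seed statistics for a column used by test_leading so plan
-- selection in the memo becomes deterministic. Names and values are placeholders.
ALTER TABLE t1 MODIFY COLUMN c1 SET STATS (
    'row_count' = '10000',
    'ndv'       = '100',
    'min_value' = '1',
    'max_value' = '10000'
);
```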
this PR
1. fix a heap-use-after-free: calling PODArray `push_back()` with a reference obtained from `back()` of the same array can access freed memory when the PODArray has reached capacity and reallocates its buffer
2. add test cases for the CSV format with nested types; the CSV file covers the two representations, values without quotes and values written as JSON-like text
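For context, a minimal sketch of a table with a nested-type column used to exercise CSV import; the table name and the two CSV encodings shown in the comments are my reading of the description above, not taken from the PR:
```
-- Hypothetical table with a nested (ARRAY) column for the CSV cases.
CREATE TABLE csv_nested_demo (
    id  INT,
    arr ARRAY<INT>
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ('replication_num' = '1');
-- The two CSV spellings referred to above (assumed interpretation):
--   1,[1,2,3]        -- array written without quotes
--   2,"[1,2,3]"      -- array written as quoted, JSON-like text
```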
The user has configured the parameter lower_case_table_names, which makes table names case-insensitive. When executing SQL through a client, the table can be queried using either case of its name.
But when using the Connector to read Doris data, the table name must match the stored case exactly, otherwise an error is reported.
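A small illustration of the behavior described above (the table name is a placeholder): a SQL client can use either spelling, while the Connector must match the stored case.
```
-- Both statements succeed from a SQL client when lower_case_table_names
-- ignores case (MyTable is a placeholder name):
SELECT * FROM MyTable;
SELECT * FROM mytable;
-- A Connector read, however, must reference the table with its exact stored case.
```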
[fix](case) change dynamic_partition.time_unit from day to month to avoid the error that the inserted data falls outside the existing partitions
Co-authored-by: stephen <hello-stephen@qq.com>
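A minimal sketch of a dynamic partition table using time_unit = 'month', as in the case fix above; the table and column names and property values are placeholders, assuming the standard dynamic partition properties:
```
-- Hypothetical example: with time_unit = 'month' the dynamic partition window
-- spans whole months, so freshly inserted rows still fall into an existing partition.
CREATE TABLE dp_demo (
    dt DATE,
    v  INT
)
DUPLICATE KEY(dt)
PARTITION BY RANGE(dt) ()
DISTRIBUTED BY HASH(dt) BUCKETS 1
PROPERTIES (
    'dynamic_partition.enable'    = 'true',
    'dynamic_partition.time_unit' = 'month',
    'dynamic_partition.start'     = '-3',
    'dynamic_partition.end'       = '3',
    'dynamic_partition.prefix'    = 'p',
    'dynamic_partition.buckets'   = '1',
    'replication_num'             = '1'
);
```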
By utilizing the zonemap index of the segment, we can ascertain if a predicate is always true. For example, if the segment’s maximum value is 100 and the predicate is col < 101, then this predicate is always true for this segment.
I want to use Doris Multi-catalog to accelerate HMS queries. My organization has a custom distributed file system, and we think wrapping the fs access differences into the broker (listLocatedFiles, openReader, ...) would be an elegant approach.
This PR introduces the HMS catalog conf `bind.broker.name`. If this conf is set, file split and query scan operations will be sent to the broker.
Usage:
Create an HMS catalog bound to a broker:
```
CREATE CATALOG hive_catalog_broker PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://xxx',
'broker.name' = 'hdfs_broker'
);
```
When we query this catalog, file split and query scan requests will be sent to the broker `hdfs_broker`.
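For example, a query like the following (the database and table names are placeholders) has its file listing and scan routed through the broker:
```
-- Hypothetical query against the broker-backed catalog.
SELECT count(*) FROM hive_catalog_broker.hive_db.hive_table;
```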
More details about this PR:
1. Introduce the HMS catalog property `bind.broker.name` to specify which broker performs remote path work. When `broker.name` is set, `enable.self.splitter` must be `true` to ensure the file splitting process is executed in the FE.
2. Introduce two more interfaces to the broker service:
- `TBrokerIsSplittableResponse isSplittable(1: TBrokerIsSplittableRequest request)`, which invokes the input format's `isSplitable` interface.
- `TBrokerListResponse listLocatedFiles(1: TBrokerListPathRequest request)`, which performs `listFiles` or `listLocatedStatus` on the remote file system.
3. Three parts of the whole process are executed in the broker:
- Check whether the path, with the specified input format name, is splittable (`isSplittable`).
- `listLocatedFiles` of table / partition locations.
- `OpenReader` for specified file splits.
Co-authored-by: chenlinzhong <490103404@qq.com>
Previously, if the user property `'resource_tags.location'` was not set, the user could use Backends with any resource tag.
This can be confusing: when the DBA assigns part of the Backends to resource group A, existing users
should not be able to use group A until their `'resource_tags.location'` is set.
So in this PR, I change the behavior so that if the user property `'resource_tags.location'` is not set, the user can only use
Backends with the `default` tag.
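A minimal sketch of how a DBA grants a user access to a tagged resource group under the new behavior (the user name and tag value are placeholders):
```
-- Hypothetical: until this property is set, the user is restricted to
-- Backends carrying the default tag.
SET PROPERTY FOR 'some_user' 'resource_tags.location' = 'group_a';
```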
- db supports replication_allocation: when creating a table, if neither `replication_num` nor `replication_allocation` is set, the database-level value is used
- fix the issue that partition properties disappear when the table partition is not null
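A sketch of the database-level default described above; the database, table, and tag names are placeholders, and the property key follows the wording of this PR:
```
-- Hypothetical: the database carries a default replication_allocation.
CREATE DATABASE demo_db
PROPERTIES ('replication_allocation' = 'tag.location.default: 3');

-- No replication property is set on the table, so it inherits the database default.
CREATE TABLE demo_db.t (
    k INT
)
DISTRIBUTED BY HASH(k) BUCKETS 1;
```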