[fix](case) change dynamic_partition.time_unit from day to month to avoid the error that the intert data not in partition
Co-authored-by: stephen <hello-stephen@qq.com>
By utilizing the zonemap index of the segment, we can ascertain if a predicate is always true. For example, if the segment’s maximum value is 100 and the predicate is col < 101, then this predicate is always true for this segment.
I want to use Doris Multi-catalog to accelerate HMS query. My organization has custom distributed file system, and we think wrapping the fs access difference into broker (listLocatedFiles, openReader..) would be a elegant approach.
This pr introduce HMS catalog conf `bind.broker.name`. If we set this conf, file split, query scan operation will send to broker.
usage:
create a hms catalog with broker usage
```
CREATE CATALOG hive_catalog_broker PROPERTIES (
'type'='hms',
'hive.metastore.uris' = 'thrift://xxx',
'broker.name' = 'hdfs_broker'
);
```
When we try to query from this catalog, file split and query scan request will send to broker `hdfs_broker`.
More details about this pr:
1. Introduce HMS catalog proporty `bind.broker.name` to specify broker name to do remote path work. When `broker.name` is set, `enable.self.splitter` must be `true` to ensure file splitting process is executed in Fe
2. Introduce 2 more interfaces to broker service:
- `TBrokerIsSplittableResponse isSplittable(1: TBrokerIsSplittableRequest request)`, helps to invoke input format `isSplitable` interface.
- `TBrokerListResponse listLocatedFiles(1: TBrokerListPathRequest request)`, helps to do `listFiles` or `listLocatedStatus` for remote file system
3. 3 parts of whole processing will be executed in broker:
- Check whether the path with specified input format name `isSplittable`
- `listLocatedFiles` of table / partition locations.
- `OpenReader` for specified file splits.
Co-authored-by: chenlinzhong <490103404@qq.com>
In previous, if user property `'resource_tags.location'` is not set, the can use Backends with any resource tag.
It may confuse that when the DBA set part of Backends to resource group A, then the current existing user
should not be able to use this group A util it's `'resource_tags.location'` is set.
So in this PR, I change the behavior, that if user property `'resource_tags.location'` is not set, it can only use the
Backends with `default` tag.
- db support replication_allocation,when create table,if not set `replication_num` or `replication_allocation `,will use it in db
- fix partition property will disappear when table partition is not null
When executing broker load in ASAN mode, BE may crash with error:
```
F20231010 18:18:17.044978 185490 block.cpp:694] Check failed: d.column->use_count() == 1 (3 vs. 1)
*** Check failure stack trace: ***
@ 0x55e9d94c4e46 google::LogMessage::SendToLog()
@ 0x55e9d94c1410 google::LogMessage::Flush()
@ 0x55e9d94c5689 google::LogMessageFatal::~LogMessageFatal()
@ 0x55e9c509f80d doris::vectorized::Block::clear_column_data()
@ 0x55e9b6c170b3 doris::PlanFragmentExecutor::get_vectorized_internal()
@ 0x55e9b6c147e6 doris::PlanFragmentExecutor::open_vectorized_internal()
@ 0x55e9b6c12d9a doris::PlanFragmentExecutor::open()
@ 0x55e9b6c18426 doris::PlanFragmentExecutor::execute()
@ 0x55e9b6945cca doris::FragmentMgr::_exec_actual()
@ 0x55e9b696456c doris::FragmentMgr::exec_plan_fragment()::$_0::operator()()
```
It may happen when there is column maping like:
```
(k1,v2,v3,v4,v5,v6,v7,v8)
set (k2=v4,k3=v4,k4=v4)
```
in load stmt.
Case is covered by Baidu test cases
this PR #25193 have achieve about FE.
eg: select count() from lineorder join supplier on lo_partkey < s_suppkey;
will have a max filter after build hash table , so could use it to filter probe table data.
While doing sample analyze, the result of row count, null number and datasize need to multiply a coefficient based on
the sample percent/rows. This pr is mainly to calculate the coefficient according to the sampled file size over total size.
Fix work load group GC, add cancel load and add logs.
Unify the format and change all to lowercase of GC logs, avoid unnecessary trouble when grep or less
Add logs to help locate the cause of slow GC.