`iceberg-hive-metastore` and `hive-storage-api` are already bundled in `hive-catalog-shade`,
and some classes in the shade have been renamed, so we cannot declare them again;
the classes in the shade should be kept.
The `hive-metastore-api` used by `ranger` can also come from the jar in the shade:
since only the utility classes used inside `hive` are renamed, this has no effect on it.
JDBC may currently have a problem where too many connections are created and never released,
so change the datasource properties: init = 1, min = 1, max = 100, with an idle time of 10 minutes.
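A minimal sketch of these settings, assuming an Alibaba Druid `DruidDataSource` (the concrete pool class is an assumption; the connector may use a different datasource implementation):

```java
import com.alibaba.druid.pool.DruidDataSource;

public class JdbcPoolConfig {
    // Sketch of the adjusted pool properties; DruidDataSource is an
    // assumption, the actual datasource class may differ.
    public static DruidDataSource buildDataSource(String jdbcUrl) {
        DruidDataSource ds = new DruidDataSource();
        ds.setUrl(jdbcUrl);
        ds.setInitialSize(1);                              // init = 1
        ds.setMinIdle(1);                                  // min = 1
        ds.setMaxActive(100);                              // max = 100
        ds.setMinEvictableIdleTimeMillis(10 * 60 * 1000L); // release idle connections after 10 minutes
        return ds;
    }
}
```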
Previously in Doris FE there was no dedicated thread pool for the grpc-client-channel;
by default the underlying netty logic used a dynamic, unbounded cached thread pool,
so the workload of this gRPC thread pool was not observable.
Use ThreadpoolMgr to create a customized thread pool that exposes Prometheus-compatible metric data.
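A rough illustration of the idea, assuming grpc-java's `NettyChannelBuilder`; the fixed pool below merely stands in for the pool that ThreadpoolMgr would create and register metrics for:

```java
import java.util.concurrent.Executor;
import java.util.concurrent.Executors;

import io.grpc.ManagedChannel;
import io.grpc.netty.NettyChannelBuilder;

public class GrpcChannelPool {
    // A bounded, named pool instead of netty's default unbounded cached pool.
    // In the real code the pool would come from the thread-pool manager so its
    // queue size and active-thread count show up as Prometheus metrics; the
    // fixed pool below is only an illustration.
    private static final Executor GRPC_EXECUTOR =
            Executors.newFixedThreadPool(16, r -> {
                Thread t = new Thread(r, "grpc-client-channel");
                t.setDaemon(true);
                return t;
            });

    public static ManagedChannel newChannel(String host, int port) {
        return NettyChannelBuilder.forAddress(host, port)
                .executor(GRPC_EXECUTOR)   // replace the default dynamic pool
                .usePlaintext()
                .build();
    }
}
```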
Because of a limitation of ProjectPlanner, we have to keep aggregate functions materialized if there are any virtual slots in the group-by list, such as `GROUPING_ID`.
In the previous implementation, Doris persisted only one task to track the status of an analysis job. After this PR, each column-analysis task is persisted, and a record with task_id = -1 is stored as the job for the user-submitted AnalyzeStmt (a sketch follows the diagram below):
AnalyzeStmt <---1-1---> AnalysisJob
AnalysisJob <---1-n---> AnalysisTask
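An illustrative model of the persisted records (class and field names are hypothetical, not the actual FE classes):

```java
import java.util.List;

// Illustrative model of the new persistence layout: one job-level record per
// AnalyzeStmt (stored with task_id = -1) plus one record per column task.
class AnalysisRecord {
    long jobId;
    long taskId;      // -1 marks the job-level record for the AnalyzeStmt
    String column;    // null for the job-level record
    String state;     // e.g. PENDING / RUNNING / FINISHED / FAILED
}

class AnalysisJob {
    AnalysisRecord jobRecord;          // the record with task_id == -1
    List<AnalysisRecord> taskRecords;  // one record per analyzed column
}
```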
Refresh the table object while refreshing an external table, including:
refresh catalog, refresh database, and refresh table.
Before visiting a database, we must guarantee that the catalog has been initialized.
Before visiting a table, we must guarantee that both the catalog and the database have been initialized, as sketched below.
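A sketch of the ordering guarantee; the interfaces and the `makeSureInitialized` helpers are hypothetical stand-ins for the real catalog objects:

```java
import java.util.Optional;

// Hypothetical interfaces standing in for the real catalog objects.
interface ExternalCatalog { void makeSureInitialized(); }
interface ExternalDatabase {
    void makeSureInitialized();
    Optional<ExternalTable> getTable(String name);
}
interface ExternalTable { void refresh(); }

class RefreshOrder {
    // The ordering guarantee: catalog before database, database before table.
    static void refreshTable(ExternalCatalog catalog, ExternalDatabase db, String tableName) {
        catalog.makeSureInitialized();                            // 1. catalog first
        db.makeSureInitialized();                                 // 2. then database
        db.getTable(tableName).ifPresent(ExternalTable::refresh); // 3. finally the table
    }
}
```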
Consider the SQL:
select table_B_alias.b from table_B_alias where table_B_alias.b in ( select a from table_A_alias );
If table_B_alias.b is INT and table_A_alias.a is BIGINT,
we should cast(b as bigint) so that both sides of the InSubquery have the same data type (see the toy model below).
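A toy model of the coercion (the types below are hypothetical; the real planner works on its own expression tree):

```java
// Hypothetical mini-model: both sides of the InSubquery must share one
// type, so the narrower outer side is wrapped in a cast.
enum DataType { INT, BIGINT }

record Expr(String sql, DataType type) {}

class InSubqueryCoercion {
    static Expr castIfNeeded(Expr outer, DataType subqueryType) {
        if (outer.type() == subqueryType) {
            return outer;
        }
        // e.g. cast(b as bigint) when b is INT and the subquery yields BIGINT
        return new Expr("cast(" + outer.sql() + " as "
                + subqueryType.name().toLowerCase() + ")", subqueryType);
    }
}
```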
The goal of the `ColumnStatistic#coverage` function is to determine whether the build side range is completely enclosed by the range of the probe side; if so, as the comment of `RuntimeFilterPruner` explains, the corresponding runtime filter can be considered useless and pruned.
However, the original logic of this method was quite confusing.
Simplify it with this formula:
```java
// true when the other range is completely enclosed by this range
this.minValue <= other.minValue && this.maxValue >= other.maxValue
```
1. Support prefetching some column stats when FE boots: it loads the column stats that were updated recently, according to the comment from @morrySnow on PR #18460.
2. Refactor the stats cache: split the histogram cache from the column stats cache, so that we can avoid redundant queries against the column statistics table when updating only the histogram or only the column stats. In the previous implementation, a single unified cache loader sent query requests to both the column stats table and the histogram table (see the sketch after this list).
3. Extract some common logic into StatsUtil.
4. Remove some useless code in unit tests; that code was hard to maintain, and it is not a good way to test the accuracy of stats estimation, per the advice from @englefly.
5. Add a field type restriction when creating analysis tasks to avoid unnecessary failures.
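A minimal sketch of the cache split described in point 2, assuming Caffeine async loading caches; the key/value types and the loader stubs are illustrative:

```java
import java.util.concurrent.CompletableFuture;

import com.github.benmanes.caffeine.cache.AsyncLoadingCache;
import com.github.benmanes.caffeine.cache.Caffeine;

public class StatsCaches {
    // Illustrative key/value types.
    record StatsId(long tableId, String column) {}
    record ColumnStats(double ndv, double nullCount, double min, double max) {}
    record Histogram(double[] bucketBounds) {}

    // Two independent caches: refreshing a histogram no longer issues a
    // redundant query against the column statistics table, and vice versa.
    final AsyncLoadingCache<StatsId, ColumnStats> columnStatsCache = Caffeine.newBuilder()
            .maximumSize(100_000)
            .buildAsync((key, executor) -> queryColumnStats(key));

    final AsyncLoadingCache<StatsId, Histogram> histogramCache = Caffeine.newBuilder()
            .maximumSize(100_000)
            .buildAsync((key, executor) -> queryHistogram(key));

    // Stubs standing in for the queries against the internal stats tables.
    CompletableFuture<ColumnStats> queryColumnStats(StatsId key) {
        return CompletableFuture.completedFuture(new ColumnStats(0, 0, 0, 0));
    }

    CompletableFuture<Histogram> queryHistogram(StatsId key) {
        return CompletableFuture.completedFuture(new Histogram(new double[0]));
    }
}
```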
`Hive 3` uses the `thrift-0.9.3` package, and `Doris` uses the `thrift-0.16.0` package.
These two packages are not compatible, so we use the `hive-shade` package to manage hive dependencies
in a unified way. This jar package renames the `thrift` classes, so the conflict can be resolved.
Split ExternalFileScanNode into FileQueryScanNode and FileLoadScanNode.
Remove some useless code in FileLoadScanNode.
Remove unused config items: `enable_vectorized_load` and `enable_new_load_scan_node`.
1. remove TypeCoercion and CharacterLiteralTypeCoercion
2. Nereids Cast no longer relies on the legacy planner's analyze()
3. fix the following problems in the legacy planner; after this PR:
a. BOOLEAN can be cast to DECIMALV2 explicitly
b. comparing BOOLEAN with DATE casts both sides to DOUBLE
c. HLL cannot be implicitly cast to any other type
When the group-by keys do not contain a unique column:
1. without distinct: we prefer two-phase aggregation to one-phase aggregation
2. with distinct: we prefer three-phase aggregation to two-phase aggregation
Steps to reproduce:
1. create any catalog `re` [OK]
2. switch to catalog `re` [OK]
3. show catalogs [OK]
4. drop catalog `re` [OK]
5. show catalogs [FAIL with "Current catalog is not exist, please switch catalog."]
Expected:
show catalogs should always be OK and not depend on the current catalog.
If an inner join is implemented by NLJ, the runtime filter generation phase is terminated and the children are not traversed. We fix it by adjusting the order: traverse the children first, then handle the node itself (see the schematic below).
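A schematic of the fix (the plan-node interface here is illustrative): children are traversed before the node itself is handled, so an NLJ inner join no longer cuts off the whole subtree:

```java
import java.util.List;

class RuntimeFilterGen {
    interface PlanNode {
        List<PlanNode> getChildren();
        boolean isNestedLoopInnerJoin();
    }

    // Before the fix, hitting a nested-loop inner join returned before the
    // children were traversed, so no filters were generated below it.
    // After: traverse the children first, then decide about the node itself.
    void generate(PlanNode node) {
        for (PlanNode child : node.getChildren()) {
            generate(child);                 // children are always visited
        }
        if (node.isNestedLoopInnerJoin()) {
            return;                          // only this node is skipped now
        }
        generateForNode(node);
    }

    void generateForNode(PlanNode node) { /* build runtime filters here */ }
}
```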
1. If we set the hadoop user property along with kerberos info, authentication will fail.
2. Fix some minor issues of the local fs, follow-up to #18397.
3. Add KW_HOSTNAME to the keywords region, follow-up to #17329.
4. Fix tvf not working with the pipeline engine, follow-up to #18376.
Introduced by #17884.
When replaying the catalog from an image, we should not call `catalog.getProperties()`,
because it visits the resource mgr, which has not been replayed yet at that point.