Currently, Doris only supports filter stats estimation for the date type, not the datetime type.
As a result, a filter on a datetime column whose bounds share the same date but differ in time is computed with an inaccurate selectivity and yields a wrong row count estimate, for example:
`where o.book_time >= '2020-03-01 00:00:00.0' and o.book_time <= '2020-03-01 23:59:59.0';`
This PR adds filter estimation for the datetime type (hh:mm:ss granularity only) and improves the row count estimation for cases like the one above.
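As a hedged illustration of the idea (not Doris's actual stats code), the sketch below maps datetime bounds to epoch seconds and assumes a uniform value distribution; all class and method names are hypothetical:

```java
import java.time.LocalDateTime;
import java.time.ZoneOffset;

// Illustrative sketch: estimate the selectivity of `low <= col <= high` by
// mapping datetime values to second precision, assuming a uniform distribution.
public class DatetimeSelectivitySketch {
    static double estimate(LocalDateTime colMin, LocalDateTime colMax,
                           LocalDateTime low, LocalDateTime high) {
        long min = colMin.toEpochSecond(ZoneOffset.UTC);
        long max = colMax.toEpochSecond(ZoneOffset.UTC);
        long lo = Math.max(min, low.toEpochSecond(ZoneOffset.UTC));
        long hi = Math.min(max, high.toEpochSecond(ZoneOffset.UTC));
        if (hi < lo || max <= min) {
            return 0.0;
        }
        return (double) (hi - lo) / (double) (max - min);
    }

    public static void main(String[] args) {
        // One day out of a 30-day column range is roughly 1/30, rather than
        // the degenerate estimate a date-only estimator would produce for
        // bounds that share the same date.
        double sel = estimate(
                LocalDateTime.parse("2020-03-01T00:00:00"),
                LocalDateTime.parse("2020-03-31T00:00:00"),
                LocalDateTime.parse("2020-03-01T00:00:00"),
                LocalDateTime.parse("2020-03-01T23:59:59"));
        System.out.printf("selectivity = %.4f%n", sel);
    }
}
```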
Fix a bug where auto analyze did not filter out columns of unsupported types.
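A hypothetical sketch of the kind of type filter this implies; the type enum and unsupported set below are illustrative, not Doris's real type system:

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical sketch of skipping unsupported column types when building
// auto analyze jobs; the type names are illustrative stand-ins.
public class AnalyzeTypeFilterSketch {
    enum ColType { INT, BIGINT, VARCHAR, DATE, DATETIME, HLL, BITMAP }

    // Complex types that cannot be analyzed are dropped up front.
    static final Set<ColType> UNSUPPORTED = Set.of(ColType.HLL, ColType.BITMAP);

    static List<ColType> analyzableColumns(List<ColType> columns) {
        return columns.stream()
                .filter(t -> !UNSUPPORTED.contains(t))
                .collect(Collectors.toList());
    }
}
```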
Catch Throwable in the auto analyze thread for each database; otherwise the thread quits when one database fails to create jobs, and none of the remaining databases get analyzed.
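A minimal sketch of the isolation this describes; `createAnalyzeJobs` and the database list are stand-ins for the real FE code:

```java
import java.util.List;
import java.util.logging.Logger;

// Sketch of per-database failure isolation: an error while creating jobs
// for one database must not kill the whole analyze thread.
public class AutoAnalyzeLoopSketch {
    private static final Logger LOG = Logger.getLogger("AutoAnalyze");

    static void analyzeAll(List<String> databases) {
        for (String db : databases) {
            try {
                createAnalyzeJobs(db);
            } catch (Throwable t) {
                // Catch Throwable (not just Exception) so one bad database
                // only skips itself; the loop continues with the others.
                LOG.warning("failed to create analyze jobs for " + db + ": " + t);
            }
        }
    }

    static void createAnalyzeJobs(String db) { /* build analyze jobs for db */ }
}
```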
Rename the FE config item `full_auto_analyze_simultaneously_running_task_num` to `auto_analyze_simultaneously_running_task_num`.
Creating Hudi and Iceberg tables has been disallowed since v1.2, as these features are fully covered by the Hudi/Iceberg catalogs. The corresponding code should be removed in v2.1.
The PR mainly changes:
1. Remove the code for Hudi/Iceberg external tables.
2. Remove the code for Iceberg databases.
3. Disallow the creation of Hive external tables.
4. Disable ODBC, MySQL, and Broker external tables by default, and add the FE config `disable_odbc_mysql_broker_table` to control this (see the sketch after this list).
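As a hedged sketch of item 4, the config gate might look roughly like this; `Config.disable_odbc_mysql_broker_table` mirrors the item added in this PR, while the surrounding class, enum, and error message are simplified stand-ins:

```java
// Illustrative sketch of gating table creation on an FE config switch.
public class CreateTableGateSketch {
    static class Config {
        // Defaults to true in this PR: ODBC/MySQL/Broker tables are disabled.
        static boolean disable_odbc_mysql_broker_table = true;
    }

    enum EngineType { OLAP, ODBC, MYSQL, BROKER }

    static void checkEngine(EngineType engine) throws Exception {
        boolean deprecated = engine == EngineType.ODBC
                || engine == EngineType.MYSQL
                || engine == EngineType.BROKER;
        if (deprecated && Config.disable_odbc_mysql_broker_table) {
            throw new Exception("Creating tables with engine " + engine
                    + " is disabled; set disable_odbc_mysql_broker_table=false to enable");
        }
    }
}
```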
Introduction to the main classes (a wiring sketch follows the list):
- MTMVService: MTMV services for other modules to call
- MTMVHookService: all operations that affect the MTMV
- MTMVJobManager: all operations that affect MTMV jobs
- MTMVCacheManager: all operations that affect the MTMV cache
- MTMVTask & MTMVJob: inherit from the job framework
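A minimal, assumed wiring sketch of how these classes might relate (the interfaces and method names below are illustrative, not the real Doris API):

```java
// MTMVService is the entry point other modules call; it delegates to the
// hook, job, and cache managers described above.
public class MtmvWiringSketch {
    interface MTMVHookService { void onTableChanged(String table); }
    interface MTMVJobManager { void createJob(String mtmvName); }
    interface MTMVCacheManager { void invalidate(String mtmvName); }

    static class MTMVService {
        private final MTMVHookService hooks;
        private final MTMVJobManager jobs;
        private final MTMVCacheManager cache;

        MTMVService(MTMVHookService hooks, MTMVJobManager jobs, MTMVCacheManager cache) {
            this.hooks = hooks;
            this.jobs = jobs;
            this.cache = cache;
        }

        // Called by other modules when an MTMV is created: schedule its
        // refresh job and drop any stale cache entry.
        void createMtmv(String name) {
            jobs.createJob(name);
            cache.invalidate(name);
        }

        // Called when a base table changes, so hooks can react.
        void baseTableChanged(String table) {
            hooks.onTableChanged(table);
        }
    }
}
```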
For Hive versions lower than 2.3.7, the enum value ClientCapability.INSERT_ONLY_TABLES does not exist.
If we send this enum to the server side, the server receives a null,
which causes undefined behavior, e.g., failing to fetch table info from HMS.
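A sketch of the version-guarded capability list this implies: only advertise INSERT_ONLY_TABLES to HMS servers new enough to know the enum. The class and method here are simplified stand-ins, not Doris's actual HMS client code:

```java
import java.util.ArrayList;
import java.util.List;

// Build the capability list based on the Hive server version; older servers
// deserialize the unknown enum as null, leading to undefined behavior such
// as failing to list tables, so we skip it there.
public class HmsCapabilitySketch {
    enum ClientCapability { TEST_CAPABILITY, INSERT_ONLY_TABLES }

    static List<ClientCapability> capabilitiesFor(int major, int minor, int patch) {
        List<ClientCapability> caps = new ArrayList<>();
        boolean atLeast237 = major > 2
                || (major == 2 && (minor > 3 || (minor == 3 && patch >= 7)));
        if (atLeast237) {
            caps.add(ClientCapability.INSERT_ONLY_TABLES);
        }
        return caps;
    }
}
```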
Normally, an mv column's data type should be the same as the base table's. This PR acts as a fail-safe: if an mv column's data type accidentally differs from the base table's, the planner falls back to selecting from the base table so the query still works.
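A hedged sketch of such a fail-safe check during mv selection (names are illustrative; returning empty means scanning the base table):

```java
import java.util.Map;
import java.util.Optional;

// Reject any mv candidate whose column type no longer matches the base
// table, so the query falls back to the base table and stays correct.
public class MvFallbackSketch {
    record ColumnDef(String name, String type) {}

    // Returns the mv to scan, or empty to fall back to the base table.
    static Optional<String> chooseMv(Map<String, ColumnDef> baseCols,
                                     Map<String, ColumnDef> mvCols,
                                     String mvName) {
        for (ColumnDef mvCol : mvCols.values()) {
            ColumnDef baseCol = baseCols.get(mvCol.name());
            // Type mismatch (e.g. after accidental schema drift): do not
            // use this mv.
            if (baseCol == null || !baseCol.type().equals(mvCol.type())) {
                return Optional.empty();
            }
        }
        return Optional.of(mvName);
    }
}
```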
### How to reproduce
1. Create a database db1 and a table tbl1.
2. Insert some data and export it with label L1.
3. Drop db1 and tbl1, then recreate them with the same names.
4. Insert some data and export it with the same label L1.

Expected: the export succeeds.
Actual: error: Label L1 has already been used.

This PR fixes it.
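One plausible shape of the fix (an assumption, not the actual patch) is to scope used export labels to the database id instead of its name, and to clear them when the database is dropped:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: track used export labels per database *id* rather
// than per database *name*, so dropping and recreating a database with the
// same name does not inherit the old labels.
public class ExportLabelSketch {
    private final Map<Long, Set<String>> labelsByDbId = new HashMap<>();

    synchronized void checkAndRecordLabel(long dbId, String label) throws Exception {
        Set<String> labels = labelsByDbId.computeIfAbsent(dbId, k -> new HashSet<>());
        if (!labels.add(label)) {
            throw new Exception("Label " + label + " has already been used");
        }
    }

    // Called when a database is dropped, so its labels do not leak into a
    // recreated database of the same name.
    synchronized void onDropDatabase(long dbId) {
        labelsByDbId.remove(dbId);
    }
}
```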
Previously, the auto analyze job's start time was set to the job's creation time rather than the time it actually started executing, which was inaccurate. This PR changes the start time to the moment the first task starts executing.
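A minimal sketch of the timing change, with illustrative names; the first task to run claims the start time atomically:

```java
import java.util.concurrent.atomic.AtomicLong;

// Record the job's start time when the first task begins executing,
// not when the job object is created.
public class AnalyzeJobTimingSketch {
    private final long createTime = System.currentTimeMillis(); // kept for reference
    private final AtomicLong startTime = new AtomicLong(-1);    // unset until a task runs

    // Called by each task right before it executes; only the first caller wins.
    void onTaskStart() {
        startTime.compareAndSet(-1, System.currentTimeMillis());
    }

    long getStartTime() {
        // Before the fix this effectively returned createTime, which could
        // be much earlier than when the job actually began running.
        return startTime.get();
    }
}
```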
On huge datasets with many databases, tables, and columns, the auto collector might submit too many jobs at once and occupy too much FE memory.
This PR limits each round to submitting at most 5 jobs.
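A minimal sketch of the throttle, assuming a simple per-round cap; the constant and method names are illustrative:

```java
import java.util.List;

// Each collection round submits at most a fixed number of jobs and leaves
// the rest for later rounds, capping FE memory use.
public class AutoCollectorThrottleSketch {
    private static final int JOBS_PER_ROUND_LIMIT = 5;

    static void submitRound(List<Runnable> pendingJobs) {
        int submitted = 0;
        for (Runnable job : pendingJobs) {
            if (submitted >= JOBS_PER_ROUND_LIMIT) {
                break; // remaining jobs wait for the next round
            }
            job.run(); // stand-in for handing the job to the scheduler
            submitted++;
        }
    }
}
```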