1. Added test p0 for sampling collection statistics
2. Modify the uniqueKeys of table analysis_jobs for deletion based on relevant conditions
3. Solve the problem that incremental statistics p0 is less stable
This pr mainly optimizes the following items:
- the collection of statistics: clear up invalid historical statistics before collecting them, so as not to affect the final table statistics.
- the incremental collection of statistics: in the case of incremental collection, only the corresponding partition statistics need to be collected.
TODO: Supports incremental collection of materialized view statistics.
Fix 2 bugs of show load profile:
For broker load, the second level should be the sub task' id.
show load profile stmt should be forwarded to master FE to execute.
1. fix constant folding failed on decimalv3 type
2. support reduce decimalv3 literal precision in comparison predicate
3. support fe config enable_decimal_conversion
since we cannot do stats derive and cost estimate on agg very good.
this PR remove some aggregate pattern that usually not good.
1. one stage agg after exchange. this pattern is good only when process very few rows.
2. three stage distinct agg with gather middle merge.
This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2.
we can use drop table force stmt to fast drop tablets, no need to check tablet dropped state in every report
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
Hive support create partition with a specific location. In this case, the file path for the create partition may not contain the partition name and value. Which will cause Doris fail to query the the hive partition.
This pr is to fix this bug.
```sql
select if(
date_format(CONCAT_WS('', '9999-07', '-26'), '%Y-%m') = DATE_FORMAT(curdate(), '%Y-%m'),
curdate(),
DATE_FORMAT(DATE_SUB(month_ceil(CONCAT_WS('', '9999-07', '-26')), 1), '%Y-%m-%d')
)
```
return null when construct new children of if(), we find that the the more than "0" index in result map doesn't replace the const map caused by incorrect value-assignment in code.
Add this option in conf:
/**
* If set false, user couldn't submit analyze SQL and FE won't allocate any related resources.
*/
@ConfField
public static boolean enable_stats = true;
It will be checked during analyze of analyze related stmt and init analyze manager
Fix bug when reading array type in parquet file:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` still calls `ColumnSelectVector::set_run_length_null_map` to initialize select vector, but `ScalarColumnReader::_read_nested_column` hasn't do this, making the number of values wrong.
The situation where this error occurs is particularly extreme: The column pages have remaining values to be read,
but all of them are null values at ancestor level, so there's no actual read operation, just skipping null values at ancestor level.
Refactor FileQueryScanNode init and finalize methods.
Handle schema related initialization in init method, handle scan range generation in finalize method.
Introduced from #18609.
When setting global variables from Non Master FE, there will be error like:
`Variable 'password_history' is a GLOBAL variable and should be set with SET GLOBAL`
Because when setting global variables from Non Master FE, Doris will do following step:
1. forward this SetStmt to Master FE to execute.
2. Change this SetStmt to "SESSION" level, and execute it again on this Non Master FE.
But for "GLOBAL only" variable, such ash "password_history", it doesn't allow to set on SESSION level.
So when doing step 2, "set password_history=xxx" without "GLOBAL" keywords will throw exception.
So in this case, we should just ignore this exception and return.