Since we cannot derive statistics or estimate costs for aggregations very well, this PR removes some aggregate patterns that are usually not good:
1. One-stage aggregation after an exchange. This pattern is good only when processing very few rows.
2. Three-stage distinct aggregation with a gather as the middle merge stage.
Some syntax errors produce an unclear error message (an NPE): symbol.value is null, which causes an NPE when toLowerCase() is called on it. We fix this by checking whether the value is null and returning early.
This reverts commit 296b0c92f702675b92eee3c8af219f3862802fb2.
We can use the DROP TABLE FORCE statement to drop tablets quickly, so there is no need to check the tablet-dropped state in every report.
Co-authored-by: caiconghui1 <caiconghui1@jd.com>
Hive supports creating a partition with a specific location. In this case, the file path of the created partition may not contain the partition name and value, which causes Doris to fail to query the Hive partition.
This PR fixes this bug.
segcompaction_p1 contains fairly large load jobs, which will exceed the memory limit or time out in pipeline runs under such heavy loads.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
```sql
select if(
date_format(CONCAT_WS('', '9999-07', '-26'), '%Y-%m') = DATE_FORMAT(curdate(), '%Y-%m'),
curdate(),
DATE_FORMAT(DATE_SUB(month_ceil(CONCAT_WS('', '9999-07', '-26')), 1), '%Y-%m-%d')
)
```
This query returns NULL. When constructing the new children of if(), we found that results at indexes greater than 0 in the result map do not replace the corresponding entries in the const map, caused by an incorrect value assignment in the code.
Add this option in conf:
```java
/**
 * If set false, user couldn't submit analyze SQL and FE won't allocate any related resources.
 */
@ConfField
public static boolean enable_stats = true;
```
It is checked during analysis of analyze-related statements and during initialization of the analyze manager.
We found that qt_q11 in the regression test test_external_catalog_hive is very slow.
The result is only one record, so the other data should be filtered out by parquet lazy read.
We then found that the parquet reader currently reads many records, because data can only be skipped at parquet page granularity, and in order to skip a page we currently need to read the page header, which triggers a data prefetch. Prefetching data in this case may therefore be harmful.
So there are two issues:
1. The whole row group should be skipped in this case.
2. Prefetching data in this case may be harmful; this needs improvement.
This PR resolves issue 1, as sketched below.
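A rough illustration of the idea behind issue 1 (the type and field names below are hypothetical, not Doris's actual parquet reader API): row-group-level min/max statistics are already available from the file footer, so a row group whose value range cannot intersect the predicate can be discarded before any page header is read, and no prefetch is triggered for it.
```cpp
#include <cstdint>

// Hypothetical, simplified stand-ins for footer metadata and a range
// predicate; the real reader works on parquet thrift structures.
struct ColumnChunkStats {
    int64_t min_value; // decoded min for this column in the row group
    int64_t max_value; // decoded max for this column in the row group
};

struct RangePredicate {
    int64_t low;  // predicate keeps rows with value in [low, high]
    int64_t high;
};

// True if the whole row group can be skipped: the column's [min, max]
// interval does not intersect the predicate interval, so no row matches.
bool can_skip_row_group(const ColumnChunkStats& stats, const RangePredicate& pred) {
    return stats.max_value < pred.low || stats.min_value > pred.high;
}
```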
1. Refactor the file cache. Before the refactor, the file cache config format was "[{"path":"/path/to/file_cache","normal":21474836480,"persistent":10737418240,"query_limit":10737418240}]"; it now changes to "[{"path":"/mnt/disk3/selectdb_cloud/file_cache","total_size":21474836480,"query_limit":10737418240}]", which is simpler than before.
2. Support more strategies: file cache priority. The file cache now has three queues, named 'index'/'normal'/'disposable', which prevents higher-priority data from being evicted by lower-priority data, as sketched below.
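A minimal sketch of the priority idea (hypothetical types and names, not the actual BE implementation): eviction always scans the queues from lowest to highest priority, so 'disposable' blocks are reclaimed before 'normal' ones, and 'normal' before 'index'.
```cpp
#include <array>
#include <list>
#include <string>

// Hypothetical sketch of the three priority queues. Index 0 is
// 'disposable', 1 is 'normal', 2 is 'index', so a lower array index
// means lower priority.
struct FileCacheQueues {
    std::array<std::list<std::string>, 3> queues; // per-queue LRU list of block keys

    // Evict one block, always from the lowest-priority non-empty queue,
    // so 'index' data is only touched once 'disposable' and 'normal'
    // are both empty.
    bool evict_one() {
        for (auto& q : queues) {
            if (!q.empty()) {
                q.pop_back(); // drop the least recently used block
                return true;
            }
        }
        return false; // nothing left to evict
    }
};
```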
This will cause FE to fail to start.
1. Docs under sql-manual need a strict format.
2. Change the rule of the GitHub checks to run FE UT if docs under sql-manual are changed.
```cpp
int64_t months = _year * 12 + _month - 1 + sign * (12 * interval.year + interval.month);
_year = months / 12;
if (_year > 9999) {
    return false;
}
_month = (months % 12) + 1;
if (_day > s_days_in_month[_month]) {
    _day = s_days_in_month[_month];
    if (_month == 2 && doris::is_leap(_year)) {
        _day++;
    }
}
```
The variable months may be negative. In C++ the remainder keeps the sign of the dividend, so (months % 12) may also be negative (for example, months = -5 gives _month = (-5 % 12) + 1 = -4), and s_days_in_month[_month] then becomes an out-of-bounds array access.
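A minimal sketch of one way to fix this (not necessarily the actual patch): shift the quotient and remainder to floor semantics before using the remainder as a month index.
```cpp
#include <cstdint>

// Normalize quotient and remainder to floor semantics so the month
// index stays in [1, 12] even when months is negative. In C++, integer
// division truncates toward zero (-5 / 12 == 0, -5 % 12 == -5), so both
// values must be shifted together when the remainder is negative.
void split_months(int64_t months, int64_t* year, int64_t* month) {
    int64_t y = months / 12;
    int64_t r = months % 12;
    if (r < 0) {
        --y;
        r += 12;
    }
    *year = y;
    *month = r + 1; // always in [1, 12]
}
```
With this, months = -5 yields year -1 and month 8 instead of a negative month index.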