1. Refactor the file cache. Before the refactor, the file cache config format was `[{"path":"/path/to/file_cache","normal":21474836480,"persistent":10737418240,"query_limit":10737418240}]`; it is now `[{"path":"/mnt/disk3/selectdb_cloud/file_cache","total_size":21474836480,"query_limit":10737418240}]`, which is simpler than before.
2. Support more strategies: file cache priority. The file cache now has three queues, named 'index'/'normal'/'disposable', so that higher-priority data cannot be evicted by lower-priority data. A toy sketch of this idea is shown after this list.
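The following is a toy C++ sketch of one way such a priority guarantee could work (illustrative only; the type and queue names are hypothetical, not the actual BE implementation): eviction always starts from the lowest-priority queue, so 'index' entries are only removed once the lower queues are empty.

```cpp
#include <list>
#include <string>

// Toy model of priority-aware eviction across the three queues.
// The real file cache eviction is more involved (sizes, TTLs, per-queue limits);
// this only shows that lower-priority data is evicted first.
struct FileCacheQueues {
    std::list<std::string> index_queue;      // highest priority
    std::list<std::string> normal_queue;     // medium priority
    std::list<std::string> disposable_queue; // lowest priority

    // Evict a single entry, preferring the lowest-priority non-empty queue,
    // so higher-priority data is never pushed out by lower-priority data.
    bool evict_one() {
        for (auto* queue : {&disposable_queue, &normal_queue, &index_queue}) {
            if (!queue->empty()) {
                queue->pop_back();
                return true;
            }
        }
        return false; // cache is empty, nothing to evict
    }
};
```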
This will cause FE startup to fail:
1. Docs under sql-manual need a strict format.
2. Change the GitHub checks rule to run FE UT when docs under sql-manual are changed.
```cpp
int64_t months = _year * 12 + _month - 1 + sign * (12 * interval.year + interval.month);
_year = months / 12;
if (_year > 9999) {
    return false;
}
_month = (months % 12) + 1;
if (_day > s_days_in_month[_month]) {
    _day = s_days_in_month[_month];
    if (_month == 2 && doris::is_leap(_year)) {
        _day++;
    }
}
```
The variable "months" may be negative. Taking modulus with it (_month) may also result in a negative value, which can cause an array access overflow.
Fix a bug when reading array types in parquet files:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = [INTERNAL_ERROR]Read parquet file xxx failed,
reason = [IO_ERROR]Decode too many values in current page
```
When reading normal columns, `ScalarColumnReader::_read_values` still calls `ColumnSelectVector::set_run_length_null_map` to initialize the select vector, but `ScalarColumnReader::_read_nested_column` does not do this, so the number of values becomes wrong.
The situation where this error occurs is quite extreme: the column pages still have remaining values to be read,
but all of them are null at an ancestor level, so there is no actual read operation, only the skipping of those ancestor-level nulls.
Refactor FileQueryScanNode init and finalize methods.
Handle schema-related initialization in the init method, and scan range generation in the finalize method.
Introduced in #18609.
When setting global variables from a non-Master FE, there will be an error like:
`Variable 'password_history' is a GLOBAL variable and should be set with SET GLOBAL`
This is because, when setting global variables from a non-Master FE, Doris does the following steps:
1. Forward the SetStmt to the Master FE and execute it there.
2. Change the SetStmt to "SESSION" level and execute it again on this non-Master FE.
But a "GLOBAL only" variable such as "password_history" is not allowed to be set at SESSION level,
so in step 2, "set password_history=xxx" without the "GLOBAL" keyword throws an exception.
In this case, we should just ignore the exception and return.
FE will clear the transaction info when a transaction times out, but the calc-delete-bitmap
related logic in DeltaWriter::close_wait will continue. In set_txn_related_delete_bitmap,
we now return directly in such a case.
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to remove them. By applying a strict include-what-you-use policy, we get benefits such as shorter rebuild times and clearer header dependencies. A toy illustration is shown below.
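A toy example (not from the Doris codebase) of the kind of include that include-what-you-use flags:

```cpp
#include <map>     // flagged: nothing in this file uses std::map
#include <vector>  // kept: std::vector is used directly below

int sum(const std::vector<int>& values) {
    int total = 0;
    for (int v : values) {
        total += v;
    }
    return total;
}
```

Removing such includes means fewer translation units have to be recompiled when a header changes.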
1. Remove the unnecessary project node above the scan node.
2. Fix a bug where an IN subquery may be recognized as a scalar subquery.
3. Fix the return type of some Quantile-related functions.
PR1: add a new storage file system template and move the old storage to the new package.
PR2: extract some methods from the old storage into the new file system.
PR3: use storages to access remote object storage, and use file systems to access files in local or remote locations. Will add some unit tests. A hypothetical sketch of this split is shown below.
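A hypothetical C++ sketch of the intended split (class and method names are illustrative, not the actual Doris interfaces): storages wrap remote object stores, while file systems expose file-style access over a local or remote location.

```cpp
#include <string>

// Hypothetical interfaces, only to illustrate the storage / file system split.
class ObjectStorage {
public:
    virtual ~ObjectStorage() = default;
    // Object-level access against a remote object store.
    virtual bool put_object(const std::string& key, const std::string& data) = 0;
    virtual bool get_object(const std::string& key, std::string* data) = 0;
};

class FileSystem {
public:
    virtual ~FileSystem() = default;
    // File-level access; an implementation may be local, or backed by an ObjectStorage.
    virtual bool exists(const std::string& path) = 0;
    virtual bool read_file(const std::string& path, std::string* content) = 0;
    virtual bool write_file(const std::string& path, const std::string& content) = 0;
};
```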
---------
Co-authored-by: jinzhe <jinzhe@selectdb.com>