Fix for SQL with array column:
delete from tbl where c_array is null;
For more information, please refer to #13360.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
When reducing memory usage by flushing memtables, we should also account for memory that is already being flushed from memtables to disk. Otherwise, we might not release as much memory as expected. For example, if many large memtables are already in flush, the reduce_mem_usage method picks some small memtables to flush; this cannot release enough memory and can also generate many small segments, which can cause a -238 error.
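The selection idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the actual reduce_mem_usage code: the function name, its parameters, and the "largest first" policy are all hypothetical. It treats memory that is already in flush as memory that will be released anyway, and only picks additional memtables if that is not enough.

```python
def pick_memtables_to_flush(active_sizes, flushing_sizes, mem_limit, target):
    """Return indices of active memtables to flush (hypothetical policy).

    active_sizes:   sizes of memtables that are still being written to
    flushing_sizes: sizes of memtables that are already being flushed
    mem_limit:      usage above which we try to reduce memory
    target:         usage we want to fall back to
    """
    in_flush = sum(flushing_sizes)
    total = sum(active_sizes) + in_flush
    if total <= mem_limit:
        return []

    # Memory already in flush will be released without any new action,
    # so it is subtracted from the amount that still has to be freed.
    need = total - target - in_flush
    if need <= 0:
        return []

    picked = []
    # Prefer large memtables so that few, large segments are produced.
    order = sorted(range(len(active_sizes)),
                   key=lambda i: active_sizes[i], reverse=True)
    for idx in order:
        if need <= 0:
            break
        picked.append(idx)
        need -= active_sizes[idx]
    return picked
```

With large memtables already in flush, the sketch picks nothing instead of flushing small memtables that would not free much memory.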
Override the PATH environment variable so that binutils from Homebrew is not used when building third-party libraries, since it may cause compilation errors such as:
Error: building for macOS-x86_64 but attempting to link with file built for unknown-unsupported file format
Currently, ExprId in Nereids is generated by a global generator that is shared by all statements. This causes three problems:
1. ExprId could go out of bounds.
2. It is hard to debug.
3. A bitset cannot be used to represent a set of ExprIds.
This PR solves these problems by creating a new ID generator for each statement. After this PR, ExprId always starts from 0 for each statement.
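A minimal sketch of the per-statement generator idea, written in Python for brevity rather than in the FE's own language. `StatementContext` is the name used above; `IdGenerator`, `next_expr_id`, and `to_bitset` are hypothetical names introduced only for this illustration.

```python
import itertools


class IdGenerator:
    """Hands out consecutive integer IDs starting from 0."""

    def __init__(self):
        self._counter = itertools.count()

    def next_id(self):
        return next(self._counter)


class StatementContext:
    """Per-statement state; each statement owns its own generator."""

    def __init__(self):
        self._expr_ids = IdGenerator()

    def next_expr_id(self):
        return self._expr_ids.next_id()


def to_bitset(expr_ids):
    """Pack small, dense IDs into an int that is used as a bitset."""
    bits = 0
    for expr_id in expr_ids:
        bits |= 1 << expr_id
    return bits
```

Because every statement restarts at 0, the IDs stay small and dense, which is what makes a bitset representation practical.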
TODO:
1. Refactor all places in the test code that create a new StatementContext to ensure the logic is the same as in the main code.
1. Remove the FE config `enable_array_type`.
2. Limit the nested depth of arrays on the FE side.
3. Fix a bug where the decimal type is treated as bigint when loading arrays from parquet.
4. Fix loading arrays from csv (vec-engine): handle null and "null".
5. Change the csv array loading behavior: if the array string format in the csv is invalid, it is converted to null.
6. Remove `check_array_format()`, because its logic is wrong and meaningless.
7. Add stream load csv test cases and more parquet broker load tests.
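The "invalid array string becomes null" behavior from item 5 can be illustrated with a small sketch. It assumes JSON array syntax as a stand-in for the real array literal grammar, which is not described here; `parse_csv_array` is a hypothetical helper, not a Doris function.

```python
import json


def parse_csv_array(field):
    """Parse one csv field as an array; return None (null) if it is invalid.

    Assumption: JSON array syntax approximates the array string format.
    """
    if field is None:
        return None
    try:
        value = json.loads(field)
    except ValueError:
        # Malformed input is converted to null instead of failing the load.
        return None
    # Anything that parses but is not an array is also treated as null.
    return value if isinstance(value, list) else None
```

For example, `parse_csv_array("[1, 2")` returns `None`, while a well-formed field that contains a null element keeps that element.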
# Proposed changes
Implement predicate pushdown in `OrcReader` by converting the Doris `ColumnValueRange` to the ORC `SearchArgument`.
## Remaining problems
1. ORC supports `not in`, which may affect bloom filters. However, the Doris `ScanNode` does not push `not in` down to the file scanner.
2. ORC supports `is null`, and the row range has a `hasNull` identifier. However, `_contain_null` in `ColumnValueRange` is ambiguous: `_contain_null = true` only means that the value can be null, not that it equals null.
3. `DateTimeV2` loses microsecond precision in `ColumnValueRange`, which may cause filtering errors when a min-max value equals the predicate value.
4. `DateTimeV1` is not accurate enough: it is only stored to second precision.
5. ORC supports predicate pushdown for the `float` and `double` types, but Doris does not push down `float` and `double` predicates for precision reasons.
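The conversion can be sketched roughly as below. This is an illustration under stated assumptions, not the `OrcReader` code: `ValueRange` is a simplified, hypothetical stand-in for `ColumnValueRange`, the result is a plain nested tuple instead of a real `SearchArgument`, and the sketch assumes the target only offers less-than style leaves, so lower bounds are written as negations.

```python
from dataclasses import dataclass, field
from typing import Any, List, Optional

# Types that are not pushed down, for precision reasons (see item 5).
UNSUPPORTED_TYPES = {"float", "double"}


@dataclass
class ValueRange:
    """Hypothetical, simplified value range for one column."""
    column: str
    col_type: str
    low: Optional[Any] = None
    low_inclusive: bool = True
    high: Optional[Any] = None
    high_inclusive: bool = True
    fixed_values: List[Any] = field(default_factory=list)


def to_search_argument(r):
    """Return a nested-tuple predicate, or None if nothing is pushed down."""
    if r.col_type in UNSUPPORTED_TYPES:
        return None
    if r.fixed_values:
        if len(r.fixed_values) == 1:
            return ("equals", r.column, r.fixed_values[0])
        return ("in", r.column, sorted(r.fixed_values))

    leaves = []
    if r.low is not None:
        # col >= low  is  NOT (col < low);  col > low  is  NOT (col <= low)
        op = "lessThan" if r.low_inclusive else "lessThanEquals"
        leaves.append(("not", (op, r.column, r.low)))
    if r.high is not None:
        op = "lessThanEquals" if r.high_inclusive else "lessThan"
        leaves.append((op, r.column, r.high))

    if not leaves:
        return None
    return leaves[0] if len(leaves) == 1 else ("and", leaves)
```

Returning `None` for unsupported types means the scan falls back to reading everything and filtering later, which is slower but never drops matching rows.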
Some problems have been found when parallel_fragment_exec_instance_num is set to a value greater than 1.
This change sets a random parallel_fragment_exec_instance_num value for each query to cover more situations.
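One way a test harness could do this is sketched below. The helper name and the 1 to 8 range are assumptions made for this illustration; the change itself may pick values differently.

```python
import random


def random_parallel_setting(rng, low=1, high=8):
    """Pick a random instance count and build the matching SET statement.

    The [low, high] range is a hypothetical default for this sketch.
    """
    n = rng.randint(low, high)  # inclusive on both ends
    return n, f"SET parallel_fragment_exec_instance_num = {n};"
```

Passing in a seeded `random.Random` makes a failing run reproducible, because the same seed yields the same sequence of settings.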
* [bugfix](VecDateTimeValue) eat the value of microsecond in function from_date_format_str
* add sql based regression test
Co-authored-by: xiaojunjie <xiaojunjie@baidu.com>
Problem:
The IColumn::is_date property is lost after ColumnDate::clone is called.
Fix:
Also set IColumn::is_date whenever a ColumnDate is created.
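The shape of the fix can be illustrated with a small sketch, written in Python rather than in the BE's own language. `Column` and `DateColumn` are hypothetical stand-ins for `IColumn` and `ColumnDate`. If the flag were set by the caller only after construction, a clone that builds a fresh object would lose it; setting it at creation time means every clone carries it.

```python
class Column:
    """Hypothetical stand-in for a generic column."""

    def __init__(self, data):
        self.data = list(data)
        self.is_date = False

    def clone(self):
        # Builds a fresh object of the same type; only the data is copied.
        return type(self)(self.data)


class DateColumn(Column):
    """Hypothetical stand-in for a date column."""

    def __init__(self, data):
        super().__init__(data)
        # Set the flag at creation so that clones are marked as dates too.
        self.is_date = True
```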
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>