This PR makes the following changes to JdbcOracleClient:
#### Bug fixes:
1. Fix the problem that, during schema synchronization, a table whose name contains `/` characters and a similarly named table can have their synchronized columns mixed up
2. Fix an NPE during metadata synchronization when the `lower_case_table_names` configuration is enabled
#### Improvements:
1. Change how the Oracle User to Doris Database mapping is synchronized: use `metadata.getSchemas` instead of `SELECT DISTINCT OWNER FROM all_tables`
2. When synchronizing metadata, pass `conn.getCatalog()` instead of `null` at the catalog level
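For reference, a minimal JDBC sketch (not the actual JdbcOracleClient code; the connection string and `SOME_SCHEMA` are placeholders) of the two improvements above: enumerating Oracle users via `DatabaseMetaData.getSchemas()` and passing `conn.getCatalog()` instead of `null`:
```java
// Minimal JDBC sketch, not the JdbcOracleClient implementation itself.
import java.sql.Connection;
import java.sql.DatabaseMetaData;
import java.sql.DriverManager;
import java.sql.ResultSet;

public class OracleSchemaSyncSketch {
    public static void main(String[] args) throws Exception {
        // Connection details are placeholders.
        try (Connection conn = DriverManager.getConnection(
                "jdbc:oracle:thin:@//host:1521/service", "user", "password")) {
            DatabaseMetaData metadata = conn.getMetaData();
            // Each schema corresponds to an Oracle user; map it to a Doris database.
            try (ResultSet schemas = metadata.getSchemas()) {
                while (schemas.next()) {
                    System.out.println("would map schema: " + schemas.getString("TABLE_SCHEM"));
                }
            }
            // Use the real catalog instead of null when fetching table metadata.
            try (ResultSet tables = metadata.getTables(
                    conn.getCatalog(), "SOME_SCHEMA", "%", new String[] {"TABLE"})) {
                while (tables.next()) {
                    System.out.println("table: " + tables.getString("TABLE_NAME"));
                }
            }
        }
    }
}
```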
A row of a complex type may be stored across two (or more) pages, and the parameter `align_rows` indicates whether the reader should read the remaining values of the last row in the previous page.
`ParquetReader` confuses the logical, physical, and slot ids of columns. When reading only scalar types nothing goes wrong, but when reading complex types, `RowGroup` and `PageIndex` get the wrong statistics. Therefore, if a query contains complex types and pushed-down predicates, the result set is likely to be incorrect.
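A hedged sketch with made-up types (not the `ParquetReader` API) of the mapping the reader needs: statistics in `RowGroup`/`PageIndex` are stored per physical leaf column, so for a complex column the slot id must first be translated to the physical leaf index:
```java
// Hypothetical types for illustration only; not Doris code.
import java.util.List;
import java.util.Map;

public class ColumnIdSketch {
    record LeafStats(long nullCount, byte[] min, byte[] max) {}

    // Returns the statistics of one leaf column belonging to the given slot.
    static LeafStats statsForSlot(int slotId, int leafOrdinal,
                                  Map<Integer, List<Integer>> slotToPhysicalLeaves,
                                  List<LeafStats> rowGroupLeafStats) {
        // Resolve the physical leaf index first. Indexing rowGroupLeafStats with the
        // slot id directly is the confusion described above: for nested types it
        // returns the statistics of an unrelated column.
        int physicalIndex = slotToPhysicalLeaves.get(slotId).get(leafOrdinal);
        return rowGroupLeafStats.get(physicalIndex);
    }
}
```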
FEATURE:
1. enable array type in Nereids
2. support generics in function signatures
3. support array and map types in type coercion and type checking
4. add element_at and element_slice syntax in the Nereids parser
REFACTOR:
1. remove AbstractDataType
BUG FIX:
1. remove FROM from nonReserved keyword list
TODO:
1. support lambda expressions
2. use Nereids' way to do function type coercion
3. use castIfnotSame when doing implicit casts on BoundFunction
4. let AnyDataType type coercion do the same thing as function type coercion
5. add the array functions below:
- array_apply
- array_concat
- array_filter
- array_sortby
- array_exists
- array_first_index
- array_last_index
- array_count
- array_shuffle
- shuffle
- array_pushfront
- array_pushback
- array_repeat
- array_zip
- reverse
- concat_ws
- split_by_string
- explode
- bitmap_from_array
- bitmap_to_array
- multi_search_all_positions
- multi_match_any
- tokenize
If we write SQL like `select cast(array() as array<varchar(10)>)`,
CastExpr in FE will call `analyze()` with `Type.matchExactType(childType, type, true);`.
Here the array type only checks `contains_null`, but it should also check the inner item type so that `matchExactType` is correct for arrays.
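A simplified sketch with stand-in classes (not Doris's actual Type hierarchy) showing why the exact match must recurse into the item type rather than compare only `contains_null`:
```java
// Stand-in classes for illustration only.
public class ArrayMatchSketch {
    static class ScalarType {
        final String name;
        final int len;
        ScalarType(String name, int len) { this.name = name; this.len = len; }
        boolean matchExact(ScalarType other) {
            return name.equals(other.name) && len == other.len;
        }
    }

    static class ArrayType {
        final ScalarType itemType;
        final boolean containsNull;
        ArrayType(ScalarType itemType, boolean containsNull) {
            this.itemType = itemType;
            this.containsNull = containsNull;
        }
        boolean matchExact(ArrayType other) {
            // Comparing only containsNull is the bug; the item type must match too.
            return containsNull == other.containsNull
                    && itemType.matchExact(other.itemType);
        }
    }

    public static void main(String[] args) {
        ArrayType a = new ArrayType(new ScalarType("VARCHAR", 10), true);
        ArrayType b = new ArrayType(new ScalarType("VARCHAR", 65533), true);
        System.out.println(a.matchExact(b)); // false once the item type is compared
    }
}
```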
CTAS with an outer join like:
```sql
create table a (
id int not null,
name varchar(20) not null
)
distributed by hash(id) buckets 4
properties (
"replication_num"="1"
);
create table b (
id int not null,
age int not null
)
distributed by hash(id) buckets 4
properties (
"replication_num"="1"
);
insert into a values(1, 'ww'), (2, 'zs');
insert into b values(1, 22);
create table c properties("replication_num"="1") as select a.id, a.name, b.age from a left join b on a.id = b.id;
```
The column `age` in `c` is NOT NULL, but nullable is expected. We fix this by using the nullable mode of the outputs of the root plan fragment.
If we write SQL like `select array(1.0, 2.0, null, null, 2.0)`,
the FE passes the argument type UInt8 (for the null literals) to the BE, which does not match the `array()` function signature with decimal and makes the BE core dump. So the BE should cast these arguments to the decimal type while preserving the null flags.
[Fix](orc-reader) Fix filling partition or missing columns with an incorrect row count.
`_row_reader->nextBatch` returns the number of rows read. When ORC lazy materialization is turned on, the number of rows read includes filtered rows, so the caller must look at `numElements` in the row batch to determine how many rows were not filtered and will be filled into the block.
In this case, partition and missing columns were filled using an incorrect row count, which caused the BE to crash on `filter.size() != offsets.size()` in the filter-column step.
When ORC lazy materialization is turned off, add a call to `_convert_dict_cols_to_string_cols(block, nullptr)` when `block->rows() == 0`.
1. The effective value of the BE config `file_cache_max_file_reader_cache_size` will be 1/3 of the process's max open file limit.
2. Use a thread pool to create or initialize the file caches concurrently.
This solves the issue that, when there are many files in the file cache directories, BE startup is very slow because it traverses all file cache directories sequentially.
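The BE is C++; the sketch below only illustrates the concurrency pattern in Java, with placeholder directory names and init logic, assuming each cache directory can be initialized independently:
```java
// Illustration of the pattern only; directory names and the init body are placeholders.
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class FileCacheInitSketch {
    public static void main(String[] args) throws InterruptedException {
        List<String> cacheDirs = List.of("/cache/dir1", "/cache/dir2", "/cache/dir3");
        ExecutorService pool = Executors.newFixedThreadPool(
                Math.min(cacheDirs.size(), Runtime.getRuntime().availableProcessors()));
        for (String dir : cacheDirs) {
            // Each directory is scanned and its cache metadata rebuilt on the pool,
            // instead of traversing all directories sequentially at startup.
            pool.submit(() -> System.out.println("initializing file cache at " + dir));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }
}
```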
We should use `getHashJoinConjuncts()` and `getOtherJoinConjuncts()` to get the hash and other conjuncts of a hash join node, instead of categorizing them by checking whether each conjunct is an `EqualTo` expression.
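A hedged sketch: `HashJoinNodeLike` and `Expr` are hypothetical stand-ins, with the two accessors named above; the point is to reuse the join node's own categorization instead of re-deriving it with an `EqualTo` check:
```java
// Hypothetical stand-in types; only the two accessor names come from the PR text.
import java.util.ArrayList;
import java.util.List;

public class JoinConjunctSketch {
    interface Expr {}
    interface HashJoinNodeLike {
        List<Expr> getHashJoinConjuncts();
        List<Expr> getOtherJoinConjuncts();
    }

    // Preferred: trust the conjunct lists the join node already categorized.
    static List<Expr> collectAllJoinConjuncts(HashJoinNodeLike join) {
        List<Expr> all = new ArrayList<>(join.getHashJoinConjuncts());
        all.addAll(join.getOtherJoinConjuncts());
        return all;
    }

    // The replaced approach re-checked "is this an EqualTo expression?" for each
    // conjunct, which can misclassify conjuncts (for example, an equality whose
    // operands both come from the same child is not a hash-join conjunct).
}
```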