This PR updates the filter_by_select logic and changes the DELETE limitations:
- Only scalar types are supported in the WHERE condition of a DELETE statement.
- Only nullable columns and predicate columns support the filter_by_select logic, because non-scalar types cannot be pushed down to the storage layer and packed into a predicate column; they can only go through filter logic.
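For illustration, a minimal sketch of what these limits mean (table and column names are hypothetical):

```sql
-- k1 INT and v1 VARCHAR are scalar types, so this WHERE condition is allowed
-- in a DELETE statement.
DELETE FROM t1 WHERE k1 = 1 AND v1 = 'abc';

-- A condition on a non-scalar column, e.g. a1 ARRAY<INT>, cannot be packed
-- into a predicate column, so a statement like the following is rejected:
-- DELETE FROM t1 WHERE a1 = [1, 2];
```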
## Proposed changes
Infer the column name when creating a view if the column is an expression.
## Further comments
The expression column name inference strategy is as follows:
| expr           | example                                  | column name (before)                     | inferred column name (if the expression is at position 2) |
| -------------- | ---------------------------------------- | ---------------------------------------- | ---------------------------------------------------------- |
| function       | dayofyear()                              | dayofyear()                              | `__dayofyear_1`                                             |
| cast           | cast(1 as bigint)                        | CAST(1 AS BIGINT)                        | `__cast_1`                                                  |
| analyticExpr   | min()                                    | min()                                    | `__min_1`                                                   |
| predicate      | 1 in (1,2,3,4)                           | 1 IN (1, 2, 3, 4)                        | `__in_predicate_1`                                          |
| literal        | 1 or 'string_var_name'                   | 1 or 'string_var_name'                   | `__literal_1`                                               |
| arithmeticExpr | &                                        | ... & ...                                | `__arithmetic_expr_1`                                       |
| identifier     | a or b                                   | a or b                                   | a or b                                                      |
| case           | CASE WHEN remark = 's' THEN 1 ELSE 2 END | CASE WHEN remark = 's' THEN 1 ELSE 2 END | `__case_1`                                                  |
| window         | min(timestamp) OVER (...)                | min(timestamp) OVER (...)                | `__min_1`                                                   |
Example SQL:
```sql
CREATE VIEW v1 AS
SELECT
    error_code,
    1,
    'string',
    now(),
    dayofyear(op_time),
    cast(source AS BIGINT),
    min(`timestamp`) OVER (
        ORDER BY op_time DESC
        ROWS BETWEEN UNBOUNDED PRECEDING AND 1 FOLLOWING
    ),
    1 > 2,
    2 + 3,
    1 IN (1, 2, 3, 4),
    remark LIKE '%like',
    CASE WHEN remark = 's' THEN 1 ELSE 2 END,
    TRUE | FALSE
FROM
    db_test.table_test1
```
The output column names are as follows:
```
error_code
__literal_1
__literal_2
__now_3
__dayofyear_4
__cast_expr_5
__min_6
__binary_predicate_7
__arithmetic_expr_8
__in_predicate_9
__like_predicate_10
__case_expr_11
__arithmetic_expr_12
```
Push down limit-distinct through left/right outer join or cross join, e.g. `select t1.c1 from t1 left join t2 on t1.c1 = t2.c1 order by t1.c1 limit 1;`.
1. Add support for Doris to parse ES date fields without time zone info, e.g. `2023-04-17T23:01:18.151`. Such a time is treated as UTC, since ES assumes the time zone of time fields without time zone info is UTC.
2. Change the local time zone conversion from the system local time zone to the session variable time zone.
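A minimal sketch of the intended behavior (the ES external table and column are hypothetical):

```sql
-- An ES date field stored as 2023-04-17T23:01:18.151 (no time zone info) is
-- parsed as UTC, then converted to the session time zone set below rather
-- than to the system local time zone.
SET time_zone = 'Asia/Shanghai';
SELECT event_time FROM es_table_test;  -- e.g. 2023-04-18 07:01:18.151
```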
1. To ensure compatibility with the original optimizer, expose the non-lambda signature of high-order functions externally.
2. Fix some bugs in the toSql function in the original optimizer.
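For reference, a high-order function takes a lambda argument, as in the illustrative call below; the non-lambda signature is what the original optimizer, which cannot resolve lambdas, sees externally.

```sql
-- array_map applies the lambda x -> x + 1 to each element of the array.
SELECT array_map(x -> x + 1, array(1, 2, 3));
```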
`ExternalFileTableValuedFunction` now has 3 derived classes:
- LocalTableValuedFunction
- HdfsTableValuedFunction
- S3TableValuedFunction
All these TVFs read data from files; the difference is where the file is read from, e.g. from HDFS or from the local filesystem.
So I refined the fields and methods of these classes.
There are now 3 kinds of properties for these TVFs:
1. File format properties
   Properties such as `format` and `column_separator`, which are common to all these TVFs.
   So they should be analyzed in the parent class `ExternalFileTableValuedFunction`.
2. URI or file path
   The URI or file path property indicates the file location. The format of the URI differs between storages.
   So it should be analyzed in each derived class.
3. Other properties
   All other properties that are specific to a certain TVF.
   So they should be analyzed in each derived class.
There are 2 new classes:
- `FileFormatConstants`: defines some common property names and variables related to file formats.
- `FileFormatUtils`: defines some util methods related to file formats.
After this PR, if we want to add a common property for all these TVFs, it only needs to be handled in `ExternalFileTableValuedFunction`, which avoids missing it in any one of them.
### Behavior change
1. Remove the `fs.defaultFS` property from `hdfs()`; it can be derived from the `uri`.
2. Use `\t` as the default column separator for the csv format, the same as stream load.
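A minimal usage sketch under these changes (the URI is hypothetical):

```sql
-- "fs.defaultFS" no longer needs to be passed: the filesystem is derived
-- from "uri". For csv, the column separator now defaults to \t.
SELECT * FROM hdfs(
    "uri" = "hdfs://127.0.0.1:8020/path/to/file.csv",
    "format" = "csv"
);
```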
When we do NormalizeToSlot, we push complex expressions down and keep only their slots: we collect each alias and its child, compute the child in a bottom project, and keep the resulting slot in the current node. For example,

Window(max(...), c1 as a1)

becomes, after normalization,

Window(max(...), a1)
+-- Project(..., c1 as a1)

But in some cases we removed SlotReferences by mistake. For example,

Window(max(...), c1, c1 as a1)

became, after normalization,

Window(max(...), a1)
+-- Project(..., c1 as a1)

and we lost the SlotReference c1. This PR fixes the problem; now we get

Window(max(...), c1, a1)
+-- Project(..., c1, c1 as a1)
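For illustration, a query shape that produces such a window node might look like this (hypothetical table `t`):

```sql
-- c1 appears both as a bare slot and under an alias next to the window
-- function; before this fix, normalization could drop the bare c1.
SELECT c1, c1 AS a1, max(c2) OVER (ORDER BY c1) FROM t;
```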
Problem:
BE cores during bitmap calculation.
Reason:
When a BE check fails, it cores directly.
Example:
`SELECT id_bitmap FROM test_bitmap WHERE id_bitmap IN (NULL) LIMIT 20;`
Solution:
Forbid this kind of expression in the FE during analysis, and also forbid bitmap-type comparisons in other unsupported expressions.
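For illustration, other comparisons on bitmap columns are likewise rejected after this change (hypothetical predicate on the same table):

```sql
-- Bitmap values have no comparison semantics; the FE now rejects such
-- predicates during analysis instead of letting the BE hit a failed check.
SELECT id_bitmap FROM test_bitmap WHERE id_bitmap = to_bitmap(1);
```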
Support complex types in the JNI framework, and successfully run end-to-end on Hudi.
### How to Use
Other scanners only need to implement three methods in `ColumnValue`:
```java
// Get array elements and append into values
void unpackArray(List<ColumnValue> values);
// Get map key array&value array, and append into keys&values
void unpackMap(List<ColumnValue> keys, List<ColumnValue> values);
// Get the struct fields specified by `structFieldIndex`, and append into values
void unpackStruct(List<Integer> structFieldIndex, List<ColumnValue> values);
```
Developers can take `HudiColumnValue` as an example.
Co-authored-by: sohardforaname <organic_chemistry@foxmail.com>
TODO:
1. Support the agg_state type
2. Support the implicit cast literal exception
3. Use Nereids to execute DML for these regression cases:
- test_agg_state_nereids (for TODO 1)
- test_array_insert_overflow (for TODO 2)
- nereids_p0/json_p0/test_json_load_and_function (for TODO 2)
- nereids_p0/json_p0/test_json_unique_load_and_function (for TODO 2)
- nereids_p0/jsonb_p0/test_jsonb_load_and_function (for TODO 2)
- nereids_p0/jsonb_p0/test_jsonb_unique_load_and_function (for TODO 2)
- json_p0/test_json_load_and_function (for TODO 2)
- json_p0/test_json_unique_load_and_function (for TODO 2)
- jsonb_p0/test_jsonb_load_and_function (for TODO 2)
- jsonb_p0/test_jsonb_unique_load_and_function (for TODO 2)
- test_multi_partition_key (for TODO 2)