When enable two level hash table , if there is zero value in the existing one level hash table, it will cause dead loop when converting to two level hash table, because the PartitionedHashTable::_is_partitioned flag is not set correctly when doing the converting.
Hive escapes some special characters in partition value to %XX, for example, / is escaped to %2F.
Doris didn't handle this case which will cause doris failed to list the files under partition with special characters.
This pr is to fix this bug.
`Analyze database db_name ` command couldn't use current catalog, it is always using the internal catalog. This will cause the command failed to find the db. This pr is to fix this bug.
Support alias function, Java UDF, Java UDAF for Nereids.
Implementation:
UDFs(alias function, Java UD(A)F) are saved in database object, we get it by FunctionDesc, which requires function name and arg types. So firstly we bind expressions of its children so that we can get the return type of args. Then we get the best selection.
Secondly:
For alias function:
The original function of the alias function is represented as original planner-style function, it's too hard to translate it to nereids-style expression hence we transfer it to the corresponding sql and parse it. Now we get the nereids-style function, and try to bind the function.
the bound function will also change the type by add cast node of its children to its expecting input types, so that if we travel a bound function more than one times, the cast node will be different. To solve the problem, we add a flag isAnalyzedFunction. it's set false by default and will be set true when return from the visitor function. If the flag is true, it will return immediately in visitor function.
Now we can ensure that the bound functions in children will be the same though we travel it more than one time. we can replace the alias function to its original function and bind the unbound functions.
For JavaUDF and JavaUDAF
JavaUDF and JavaUDAF can be recognized as a catalog function and hard to be entirely translated to Nereids-style function, we create a nereids expression object JavaUdf and JavaUdaf to wrap it.
All in all, now Nereids support UDFs and nesting them.
In parquet, min and max statistics may not be able to handle UTF8 correctly.
Current processing method is using min_value and max_value statistics introduced by PARQUET-1025 if they are used.
If not, current processing method is temporarily ignored. A better way is try to read min and max statistics if it contains
only ASCII characters. I will improve it in the future PR.
* process differently for normal bloom filter index and ngram bf index
* fix review comments for readbility
* add test case
* add testcase for delete condition
Add new rule named "OrToIn", used to convert multi equalTo which has same slot and compare to a literal of disjunction to a InPredicate so that it could be pushdown to storage engine.
for example:
```sql
col1 = 1 or col1 = 2 or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 = 1 or col1 = 2) and (col2 = 3 or col2 = 4)
```
would be converted to
```sql
col1 in (1, 2) or col1 = 3 and (col2 = 4)
col1 = 1 and col1 = 3 and col2 = 3 or col2 = 4
(col1 in (1, 2) and (col2 in (3, 4)))
```
1. expand the semantics of variable strict_mode to control the behavior for stream load: if strict_mode is true, the stream load can only update existing rows; if strict_mode is false, the stream load can insert new rows if the key is not present in the table
2. when inserting a new row in non-strict mode stream load, the unmentioned columns should have default value or be nullable
1. forbid all candidates that need to gather process except must do it
2. forbid do local agg after reshuffle of two phase agg of distinct
3. forbid one phase agg after reshuffle
4. forbid three or four phase agg for distinct if any stage need reshuffle
5. forbid multi distinct for one distinct agg if do not need reshuffle
For file scan node, this is a special field `requiredSlot`, this field is set depends on the `isMaterialized` info of slot.
But `isMaterialized` info can be changed during the plan process, so we must update the `requiredSlot`
in `finalize` phase of scan node, otherwise, it may causing BE crash due to mismatching slot info.
1. allow cast boolean as date like type in nereids, the result is null
2. PruneOlapScanTablet rule can prune tablet even if a mv index is selected.
3. constant conjunct should not be pushed through agg node in old planner
Problem:
Select list should be non const when from list have tables or multiple tuples. Or upper query will regard wrong of isConstant
And make wrong constant folding
For example: when using nullif funtion with subquery which result in two alternative constant, planner would treat it as constant expr. So analyzer would report an error of order by clause can not be constant
Solusion:
Change inline view output to non constant, because (select 1 a from table) as view , a in output is no constant when we see
view.a outside
Current runtime filter pushing down to cte internal, we construct the runtime filter expr_order with incremental number, which is not correct. For cte internal rf pushing down, the join node will be always different, the expr_order should be fixed as 0 without incrementation, otherwise, it will lead the checking for expr_order and probe_expr_size illegal or wrong query result.
This pr will revert 2827bc1 temporarily, it will break the cte rf pushing down plan pattern.
Co-authored-by: Jerry Hu <mrhhsg@gmail.com>
1. Filtering is done at the sending end rather than the receiving end
2. Projection is done at the sending end rather than the receiving end
3. Each sender can use different shuffle policies to send data
after running EliminateNotNull rule, the join conjuncts may be removed from inner join node.
So need run ConvertInnerOrCrossJoin rule to convert inner join with no join conjuncts to cross join node.
Issue Number: close#20948
Fix read error in mixed partition locations(for example, some partitions locations are on s3, other are on hdfs) by `getLocationType` of file split level instead of the table level.