Support alias functions, Java UDFs, and Java UDAFs in Nereids.
Implementation:
UDFs (alias functions and Java UD(A)Fs) are stored as database objects; we look them up by FunctionDesc, which requires the function name and argument types. So firstly we bind the expressions of the children so that we can get the return types of the arguments, and then we pick the best-matching function.
Secondly:
For alias functions:
The original function of an alias function is stored as a legacy planner-style function, which is hard to translate into a Nereids-style expression directly, so we convert it to the corresponding SQL and parse it. Now we have the Nereids-style function and try to bind it.
Binding a function also changes the types of its children by adding cast nodes to the expected input types, so if we traverse a bound function more than once, the cast nodes will differ. To solve this, we add a flag isAnalyzedFunction. It is false by default and set to true when the visitor function returns; if the flag is already true, the visitor function returns immediately.
Now we can ensure that the bound functions in the children stay the same even if we traverse them more than once. We can then replace the alias function with its original function and bind the remaining unbound functions.
For Java UDF and Java UDAF:
A Java UDF or UDAF is a catalog function and hard to translate entirely into a Nereids-style function, so we create the Nereids expression objects JavaUdf and JavaUdaf to wrap it.
All in all, Nereids now supports UDFs, including nesting them.
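A minimal sketch of the flag-guarded binding described above, using simplified BoundFunction/FunctionBinder stand-ins rather than the real Nereids classes:
```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch only; not the actual Nereids classes.
class BoundFunction {
    final String name;
    final List<BoundFunction> children;
    // Marks that this function has already been analyzed, so a second visit
    // will not wrap its children in another layer of cast nodes.
    boolean isAnalyzedFunction = false;

    BoundFunction(String name, List<BoundFunction> children) {
        this.name = name;
        this.children = new ArrayList<>(children);
    }
}

class FunctionBinder {
    BoundFunction visit(BoundFunction fn) {
        if (fn.isAnalyzedFunction) {
            // Already analyzed: return immediately so repeated traversals
            // produce exactly the same tree.
            return fn;
        }
        // Analyze the children first; this is where casts to the expected
        // input types would be inserted.
        fn.children.replaceAll(this::visit);
        fn.isAnalyzedFunction = true;
        return fn;
    }
}
```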
For cast(A as date), where A is a string column, the min/max of the result column stats should be calculated like this:
convert A.minExpr to a date dateA, and then take the double value of dateA.
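A hedged sketch of that conversion; the helper below is illustrative, not the actual Doris statistics code, and assumes the min literal is an ISO date string and that a date's double value is its epoch-day count:
```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

public class DateStatsSketch {
    // Convert a string min/max literal (e.g. A.minExpr) to a date first, then
    // to the double value used by column statistics.
    static double dateLiteralToDouble(String literal) {
        LocalDate date = LocalDate.parse(literal, DateTimeFormatter.ISO_LOCAL_DATE);
        // Assumed encoding: days since epoch; the real code may encode dates
        // into a double differently.
        return (double) date.toEpochDay();
    }

    public static void main(String[] args) {
        System.out.println(dateLiteralToDouble("2023-07-12")); // 19550.0
    }
}
```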
add "explain memo plan select ..." to print memo from mysql client
Dump column stats for FileScanNode, used for data lake tables.
PushdownJoinOtherCondition pushes expressions from the join condition down into a project, which blocks JoinReorder, so we also need to push the project down to help JoinReorder.
After mergeGroup, the children of the plan may differ from those of the GroupExpression. To avoid optimizing an out-dated group, we should construct the new plan with the GroupExpression's children rather than the plan's children.
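A sketch of that reconstruction under assumed memo interfaces (Plan, Group, and GroupExpression here are simplified stand-ins, not the real Nereids types):
```java
import java.util.List;
import java.util.stream.Collectors;

// Illustrative memo types; not the actual Nereids classes.
interface Plan {
    Plan withChildren(List<Plan> children);
}

interface Group {
    Plan getLogicalPlan();
}

interface GroupExpression {
    List<Group> children();
    Plan getPlan();
}

class RebuildAfterMergeGroup {
    // Rebuild the plan from the GroupExpression's children, not from the
    // (possibly stale) plan's own children, so we never keep optimizing an
    // out-dated group.
    static Plan rebuild(GroupExpression groupExpression) {
        List<Plan> children = groupExpression.children().stream()
                .map(Group::getLogicalPlan)
                .collect(Collectors.toList());
        return groupExpression.getPlan().withChildren(children);
    }
}
```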
Use the file system type and the Conf as the key to cache remote file systems.
This avoids creating a new file system for each external table partition's location.
The time to fetch 100000 partitions with 1 file per partition is reduced from about 15 minutes to 22s.
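A minimal sketch of such a cache, assuming the key can be reduced to the file system type plus a digest of the relevant conf entries (all names below are illustrative):
```java
import java.util.Objects;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.function.Function;

// Illustrative sketch: cache remote file systems by (type, conf fingerprint)
// instead of creating one per partition location.
class RemoteFileSystemCache<FS> {
    static final class Key {
        final String fsType;     // e.g. "hdfs", "s3"
        final String confDigest; // a stable digest of the relevant conf entries

        Key(String fsType, String confDigest) {
            this.fsType = fsType;
            this.confDigest = confDigest;
        }

        @Override
        public boolean equals(Object o) {
            if (!(o instanceof Key)) {
                return false;
            }
            Key k = (Key) o;
            return fsType.equals(k.fsType) && confDigest.equals(k.confDigest);
        }

        @Override
        public int hashCode() {
            return Objects.hash(fsType, confDigest);
        }
    }

    private final ConcurrentMap<Key, FS> cache = new ConcurrentHashMap<>();

    // All partition locations that share the same fs type and conf reuse one instance.
    FS getOrCreate(Key key, Function<Key, FS> factory) {
        return cache.computeIfAbsent(key, factory);
    }
}
```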
Arrow does not support null elements in a map key column, but a Doris map key column is nullable by default, so we need to handle Doris map rows whose key column contains null elements: in that case we put null into Arrow.
In Parquet, the legacy min and max statistics may not handle UTF8 correctly.
The current approach is to use the min_value and max_value statistics introduced by PARQUET-1025 when they are present.
If they are not, the statistics are ignored for now. A better way would be to read the legacy min and max statistics when they contain
only ASCII characters. I will improve this in a future PR.
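A hedged sketch of the selection logic, with a simplified stand-in for the Parquet column statistics fields (the real reader works on the actual Parquet metadata):
```java
import java.nio.charset.StandardCharsets;
import java.util.Optional;

// Simplified, illustrative stand-in for the Parquet column statistics fields.
class ParquetColumnStats {
    byte[] min;      // legacy statistics: may be ordered incorrectly for UTF8
    byte[] max;
    byte[] minValue; // min_value/max_value introduced by PARQUET-1025
    byte[] maxValue;
}

class StatsSelector {
    // Prefer min_value/max_value; otherwise ignore the statistics for string
    // columns for now.
    static Optional<String> selectMin(ParquetColumnStats stats) {
        if (stats.minValue != null) {
            return Optional.of(new String(stats.minValue, StandardCharsets.UTF_8));
        }
        // Legacy min/max are ignored here. A possible future improvement is to
        // fall back to them when isAscii(stats.min) holds, since byte order and
        // UTF8 order agree for pure-ASCII values.
        return Optional.empty();
    }

    static boolean isAscii(byte[] bytes) {
        for (byte b : bytes) {
            if ((b & 0x80) != 0) {
                return false;
            }
        }
        return true;
    }
}
```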
Support Hudi time travel for external tables:
```
select * from hudi_table for time as of '20230712221248';
```
PR https://github.com/apache/doris/pull/15418 supports using either a timestamp or a version as the snapshot ID in Iceberg, but Hudi only supports a timestamp as the snapshot ID. Therefore, when querying a Hudi table with `for version as of`, an error is thrown like:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = Hudi table only supports timestamp as snapshot ID
```
The supported timestamp formats in Hudi are 'yyyy-MM-dd HH:mm:ss[.SSS]', 'yyyy-MM-dd', and 'yyyyMMddHHmmss[SSS]', consistent with [time-travel-query](https://hudi.apache.org/docs/quick-start-guide#time-travel-query).
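As an illustration of those formats, here is a hedged sketch of normalizing the user-supplied timestamp (illustrative names only; not the actual parsing code used by Doris or Hudi):
```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;

class HudiTimeTravelSketch {
    // The patterns mirror the format list above.
    private static final DateTimeFormatter[] FORMATS = {
            DateTimeFormatter.ofPattern("yyyy-MM-dd HH:mm:ss[.SSS]"),
            DateTimeFormatter.ofPattern("yyyyMMddHHmmss[SSS]"),
    };

    static LocalDateTime parse(String ts) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                return LocalDateTime.parse(ts, f);
            } catch (DateTimeParseException ignored) {
                // try the next format
            }
        }
        // 'yyyy-MM-dd' has no time part, so parse it as a date at midnight.
        return LocalDate.parse(ts, DateTimeFormatter.ofPattern("yyyy-MM-dd")).atStartOfDay();
    }

    public static void main(String[] args) {
        System.out.println(parse("20230712221248")); // 2023-07-12T22:12:48
    }
}
```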
## Partitioning Strategies
Before this PR, Hudi's partitions had to be synchronized to Hive through [hive-sync-tool](https://hudi.apache.org/docs/syncing_metastore/#hive-sync-tool), or by setting very complex synchronization parameters in [spark conf](https://hudi.apache.org/docs/syncing_metastore/#sync-template). These processes are exceptionally complex and unnecessary unless you want to query Hudi data through Hive.
In addition, partitions change under time travel, so we cannot guarantee the correctness of time travel through partition synchronization.
So this PR obtains partitions directly by reading Hudi meta information, caches and updates table partition information keyed by the Hudi instant timestamp, and reuses Doris' partition pruning.
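A minimal sketch of that caching idea, with hypothetical names (the real code reads the Hudi timeline/meta information and feeds Doris' partition pruning):
```java
import java.util.List;
import java.util.Objects;

// Illustrative sketch: keep one cached partition list per table and refresh it
// whenever the Hudi table advances to a newer instant timestamp.
class HudiPartitionSnapshot {
    private String cachedInstant;          // last instant the cache was built for
    private List<String> cachedPartitions; // partition values read from Hudi meta

    interface HudiMetaReader {
        String latestInstant(String table);
        List<String> listPartitions(String table, String instant);
    }

    synchronized List<String> partitionsFor(String table, HudiMetaReader reader) {
        String latest = reader.latestInstant(table);
        if (!Objects.equals(latest, cachedInstant)) {
            // The table has new commits: rebuild the partition list so
            // partition pruning sees an up-to-date view.
            cachedPartitions = reader.listPartitions(table, latest);
            cachedInstant = latest;
        }
        return cachedPartitions;
    }
}
```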
* process normal bloom filter indexes and ngram bf indexes differently
* fix review comments for readability
* add test case
* add test case for delete condition
* [Fix](rowset) When a rowset is cooled down, it is directly deleted. This can result in data query misses in the second phase of a two-phase query.
Related PR: #20732
There are two reasons for moving the delayed-deletion logic from the Tablet to the StorageEngine. The first is to consolidate the logic and unify the delayed operations. The second is that delayed garbage collection during queries can leave rowsets in the "stale rowsets" state, preventing the timely deletion of rowset metadata, which may cause the rowset metadata to grow too large.
* not use unused rowsets
Problem:
When executing `show export`, the limit was applied before the sort (before this change), so the result could be unexpected: the limit always cut the result set first, and we could not get the rows we wanted.
Example:
We have export job1 and job2 with JobId1 > JobId2, and we want the job with JobId1:
`show export from db order by JobId desc limit 1;`
Since limit 1 was applied first, we would probably get job2, because JobIds are assigned from small to large.
Solution:
We must not cut the result set first when there is an order by clause; instead, cut the result set after sorting.
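A small sketch of the fixed ordering under assumed types (illustrative only): sort the full result set first, then apply the limit.
```java
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;

// With an ORDER BY, sort the full job list first and apply the LIMIT
// afterwards, so `order by JobId desc limit 1` returns the job with the
// largest JobId instead of whichever job happened to be cut first.
class ShowExportSketch {
    static class ExportJob {
        final long jobId;
        ExportJob(long jobId) {
            this.jobId = jobId;
        }
        @Override
        public String toString() {
            return "ExportJob(" + jobId + ")";
        }
    }

    static List<ExportJob> orderByJobIdDescThenLimit(List<ExportJob> jobs, long limit) {
        return jobs.stream()
                .sorted(Comparator.comparingLong((ExportJob j) -> j.jobId).reversed())
                .limit(limit)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<ExportJob> jobs = Arrays.asList(new ExportJob(1), new ExportJob(2));
        System.out.println(orderByJobIdDescThenLimit(jobs, 1)); // [ExportJob(2)]
    }
}
```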
Currently we find that MetastoreEventsProcessor cannot keep up with the event production rate in our cluster, so we need to merge some HMS events in every round.
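A hedged sketch of one possible merging strategy, assuming events carry a type and a table identifier; the real MetastoreEventsProcessor decides which event kinds can safely be merged.
```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch: within one processing round, collapse consecutive
// events of the same kind on the same table into the latest one, so the
// processor can keep up with the HMS event production rate.
class HmsEventMerger {
    static class Event {
        final String type;  // e.g. "ALTER_PARTITION"
        final String table; // db.table
        final long eventId;
        Event(String type, String table, long eventId) {
            this.type = type;
            this.table = table;
            this.eventId = eventId;
        }
    }

    static List<Event> merge(List<Event> batch) {
        Deque<Event> merged = new ArrayDeque<>();
        for (Event e : batch) {
            Event last = merged.peekLast();
            if (last != null && last.type.equals(e.type) && last.table.equals(e.table)) {
                merged.pollLast(); // keep only the newest event of this kind per table
            }
            merged.addLast(e);
        }
        return new ArrayList<>(merged);
    }
}
```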
Add session variable MAX_JOIN_NUMBER_BUSHY_TREE, default 5.
If the number of tables in a join cluster is less than MAX_JOIN_NUMBER_BUSHY_TREE, Nereids tries a bushy tree; otherwise it uses a zigzag tree.
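A minimal sketch of the threshold check (the class and method names are illustrative; only the session variable name and default come from this PR):
```java
// Small join clusters explore bushy trees, larger ones fall back to zigzag trees.
class JoinShapeChooser {
    enum TreeShape { BUSHY, ZIGZAG }

    // maxJoinNumberBushyTree corresponds to the session variable
    // MAX_JOIN_NUMBER_BUSHY_TREE (default 5).
    static TreeShape choose(int tableCountInJoinCluster, int maxJoinNumberBushyTree) {
        return tableCountInJoinCluster < maxJoinNumberBushyTree
                ? TreeShape.BUSHY
                : TreeShape.ZIGZAG;
    }

    public static void main(String[] args) {
        System.out.println(choose(4, 5)); // BUSHY
        System.out.println(choose(6, 5)); // ZIGZAG
    }
}
```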