1. Refactor in-predicate filter estimation.
Example: A in (1, 2, 3, 4)
After the in-predicate filter, A.stats.max <= 4 and A.stats.min >= 1.
2. Maintain minExpr and maxExpr when deriving in-predicate stats.
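The range narrowing in the example above can be sketched roughly as follows. This is a minimal illustration, not the optimizer's actual code: `ColumnStats`, `estimate_in_predicate`, and the uniform-distribution selectivity formula are all assumptions made for the sketch.

```python
from dataclasses import dataclass


@dataclass
class ColumnStats:
    # Hypothetical, simplified column statistics (not the real optimizer class).
    min_value: float
    max_value: float
    ndv: float        # number of distinct values
    row_count: float


def estimate_in_predicate(stats: ColumnStats, in_values) -> ColumnStats:
    """Derive stats for `col IN (in_values)` from the input column stats."""
    # Only literals inside the column's current [min, max] range can match.
    valid = [v for v in set(in_values)
             if stats.min_value <= v <= stats.max_value]
    if not valid:
        # Nothing can match: keep the range, report zero rows.
        return ColumnStats(stats.min_value, stats.max_value, 0.0, 0.0)

    # The new range is bounded by the smallest and largest matching literal.
    new_min = min(valid)
    new_max = max(valid)
    # At most len(valid) distinct values survive the filter.
    new_ndv = min(stats.ndv, float(len(valid)))
    # Assumption: values are uniformly distributed over the distinct values.
    selectivity = new_ndv / stats.ndv if stats.ndv > 0 else 1.0
    return ColumnStats(new_min, new_max, new_ndv,
                       stats.row_count * selectivity)


before = ColumnStats(min_value=0, max_value=100, ndv=100, row_count=1000)
after = estimate_in_predicate(before, [1, 2, 3, 4])
```

With these inputs, `after.min_value` is 1 and `after.max_value` is 4, matching the bounds stated in the example.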
Currently, the new optimizer does not handle partial updates at all.
This PR adds the ability to convert a delete statement into a partial-update insert statement
for merge-on-write unique tables.
Support the following grammar:
ANALYZE TABLE test WITH CRON "* * * * * ?"
Such a job is scheduled as the cron expression specifies, but only minute-level scheduling is natively supported.
The cases named in the title fail in a multi-BE environment because the BE being queried does not contain the outfile data. To fix this, we copy the outfile to every instance.
1. If only the partition columns are read, `JniConnector` produces empty required fields, so `HudiJniScanner` should read at least the "_hoodie_record_key" field to know how many rows are in the current hoodie split. Even if `JniConnector` doesn't read this field, the call to `releaseTable` in `JniConnector` will reclaim the resource.
2. To prevent BE from failing and exiting, `JniConnector` should call the release methods only after `HudiJniScanner` has been initialized. Note that `VectorTable` is created lazily in `JniScanner`, so there is no resource to reclaim when `HudiJniScanner` fails to initialize.
## Remaining work
Other JNI readers such as `paimon` and `maxcompute` may encounter the same problem. Each JNI reader needs to handle this abnormal situation on its own; currently this fix only ensures that BE will not exit.
* [Fix](multi-catalog) Not throw exceptions when file not exists for query of hms catalog.
---------
Co-authored-by: 王翔宇 <wangxiangyu@360shuke.com>
Upgrade the hudi version from 0.13.0 to 0.13.1, and keep the hudi version of the jni scanner the same as that of FE.
This may fix the bug where the table schema differs from the parquet schema.
MySQL does not have a boolean type; its boolean type is actually tinyint(1). In the previous logic, we forced tinyint(1) to be a boolean by passing tinyInt1isBit=true, which causes an error if a tinyint(1) value is not 0 or 1. Therefore, we need to map tinyint(1) to tinyint instead of boolean. This change does not affect the correctness of where k = 1 or where k = true queries.
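The difference between the two mappings can be illustrated with a small sketch. The helper names below are hypothetical and only model the behavior described above; they are not the JDBC driver's or the catalog's real code.

```python
def read_as_boolean(raw: int) -> bool:
    """Old behavior (modeled): tinyint(1) is forced into a boolean."""
    if raw not in (0, 1):
        # A tinyint(1) column may legally hold values such as 2 or -1.
        raise ValueError(f"cannot convert tinyint(1) value {raw} to boolean")
    return bool(raw)


def read_as_tinyint(raw: int) -> int:
    """New behavior (modeled): tinyint(1) is kept as a signed 8-bit integer."""
    if not -128 <= raw <= 127:
        raise ValueError(f"value {raw} is out of range for tinyint")
    return raw


# Hypothetical column contents; the value 2 breaks the boolean mapping.
column_k = [0, 1, 2, 1]
as_tinyint = [read_as_tinyint(v) for v in column_k]

# `k = 1` and `k = true` select the same rows, because true compares as 1.
rows_eq_one = [v for v in as_tinyint if v == 1]
rows_eq_true = [v for v in as_tinyint if v == True]  # noqa: E712
```

Here `read_as_boolean(2)` raises, while the tinyint mapping reads every row and both filters return the same result.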
This bug was introduced by #21771.
The fileType field of TFileScanRangeParams is missing, so the delete files of iceberg v2 are treated as local files
and fail to be read.
The current rf pushdown framework doesn't handle the cte sender correctly. On the cte consumer, it just returns false, which causes the rf to be generated at the wrong place and the expr_order check to fail; the rf should actually be pushed down on the cte sender. Also, set operation pushdown is unreachable if the outer stmt uses the alias of the set operation's output before probeSlot's translation. Both issues are fixed in this PR.
In the current cte multicast fragment param computing logic in the coordinator, if the shared hash table for bc is enabled, the number of destinations equals the number of BE hosts. But the condition for entering the shared hash table bc code path is wrong: when a multicast's targets include both bc and partition, the first bc info overwrites the following partition's, i.e., the destination info is at host level when it should be per instance. This causes the hash partition part to hang.
Problem:
The minidump unit test failed because column statistic deserialization needs a new column schema field that was not added to the minidump unit test file.
Solved:
Add the last update time to the unit test input file.
1. Cancel the future on timeout and add a config to modify the rpc timeout.
2. Add a config to modify the number of BackendServiceProxy instances, since the GRPC channel will be blocked under highly concurrent workloads.
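The cancel-on-timeout pattern in item 1 can be sketched as below. This is an analogy in Python's `asyncio`, not the FE code: `fake_rpc`, `call_with_timeout`, and `RPC_TIMEOUT_SECONDS` are hypothetical names standing in for the real RPC call and the new config.

```python
import asyncio

# Hypothetical stand-in for the configurable rpc timeout.
RPC_TIMEOUT_SECONDS = 0.05


async def fake_rpc(delay: float) -> str:
    """Simulates a backend RPC that takes `delay` seconds to answer."""
    await asyncio.sleep(delay)
    return "ok"


async def call_with_timeout(delay: float,
                            timeout: float = RPC_TIMEOUT_SECONDS):
    """Run the RPC; on timeout, make sure the pending call is cancelled."""
    task = asyncio.ensure_future(fake_rpc(delay))
    try:
        result = await asyncio.wait_for(task, timeout)
        return result, task
    except asyncio.TimeoutError:
        # wait_for cancels the awaited task when the timeout expires,
        # so the call does not keep holding resources in the background.
        return None, task


slow_result, slow_task = asyncio.run(call_with_timeout(1.0))
fast_result, fast_task = asyncio.run(call_with_timeout(0.0))
```

Without the cancellation, a timed-out call would keep running and could continue to occupy the channel, which is the situation the change is meant to avoid.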