This PR fixes the alignment process in the publish phase when a conflict occurs during concurrent partial updates. If we encounter a row with the same key and a larger value in the sequence column, it means another load that introduced that row was published successfully after the commit phase of the current load. We should act as follows:
- If the columns we update include the sequence column, we should delete the current row, because the partial update on the current row has been overwritten by the other row with the larger sequence column value.
- Otherwise, we should combine the values of the missing columns from the other row and the values of the included columns from the current row into a new row.
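The two cases above can be sketched as follows. This is a minimal illustration, not the actual implementation: rows are modeled as plain dicts, and `align_row`, `updated_cols`, and `seq_col` are hypothetical names chosen for the example.

```python
from typing import Optional


def align_row(current: dict, other: dict, updated_cols: set, seq_col: str) -> Optional[dict]:
    """Resolve a conflict found during the publish phase (illustrative only).

    current      -- row written by this load; only updated_cols are meaningful
    other        -- row with the same key and a larger sequence value,
                    already published by another load
    updated_cols -- columns included in this partial update
    seq_col      -- name of the sequence column
    Returns None when the current row should be deleted, else the merged row.
    """
    if seq_col in updated_cols:
        # The other row wins on sequence value, so this update is discarded.
        return None
    # Start from the other row (this supplies the missing columns),
    # then overlay the columns that this load actually updated.
    merged = dict(other)
    for col in updated_cols:
        merged[col] = current[col]
    return merged
```

For example, with `other = {"k": 1, "a": 1, "b": 2, "seq": 9}` and a partial update of `{"k", "a"}`, the merged row keeps `b` and `seq` from the other row and takes `a` from the current row.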
Use the unified JNI framework to refactor Java UDF.
The unified JNI framework uses VectorTable as the container to transfer data between C++ and Java, and hides the details of data format conversion.
In addition, the unified framework supports complex and nested types.
Performance for basic types remains unchanged, while string types improve by 30% and complex types improve by an order of magnitude.
Add a new FE config `ignore_unknown_metadata_module`. The default is false.
If set to true, unknown modules encountered while reading the metadata image file
are ignored and skipped.
This is mainly used for downgrade operations, so that an old version can read an image file written by a new version.
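The intended behavior can be sketched roughly as below. This is only an illustration under assumptions: the image is modeled as a list of `(module_name, payload)` pairs, and `KNOWN_MODULES`, `load_image`, and the module names are hypothetical, not the FE's real identifiers.

```python
# Hypothetical set of modules understood by the running (older) version.
KNOWN_MODULES = {"module_a", "module_b"}


def load_image(modules, ignore_unknown_metadata_module=False):
    """Load known modules; skip or reject unknown ones depending on the flag."""
    loaded = []
    for name, payload in modules:
        if name not in KNOWN_MODULES:
            if ignore_unknown_metadata_module:
                # Downgrade case: the image was written by a newer version.
                continue
            raise ValueError(f"unknown metadata module: {name}")
        loaded.append((name, payload))
    return loaded
```

With the default of false, an unknown module still stops loading, which keeps the existing behavior unless the config is enabled explicitly.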
It had the following problems before this PR:
- Use count(*) to check whether all columns have been analyzed.
- Return directly when the FE count > 1.
Co-authored-by: AKIHA <cyborgz1999@example.com>
Stats for Hive partition columns can be calculated from Hive metastore data, so there is no need to execute SQL to get them.
This PR uses Hive partition metadata to collect partition column stats.
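As a rough sketch of the idea, column stats for a single partition column can be derived from a list of partition values and per-partition row counts. The input shape, the function name `partition_column_stats`, and the returned fields are assumptions for illustration; the sketch also assumes NULL partition values appear under Hive's default partition name `__HIVE_DEFAULT_PARTITION__`.

```python
HIVE_DEFAULT_PARTITION = "__HIVE_DEFAULT_PARTITION__"


def partition_column_stats(partitions):
    """Derive stats for one partition column from partition metadata.

    partitions -- list of (partition_value, row_count) pairs, as might be
                  read from the metastore (hypothetical input shape)
    """
    row_count = sum(n for _, n in partitions)
    # Rows in the default partition have a NULL partition value.
    num_nulls = sum(n for v, n in partitions if v == HIVE_DEFAULT_PARTITION)
    values = {v for v, _ in partitions if v != HIVE_DEFAULT_PARTITION}
    return {
        "row_count": row_count,
        "ndv": len(values),
        "num_nulls": num_nulls,
        # Values are compared as strings here; a real implementation would
        # parse them according to the column type first.
        "min": min(values) if values else None,
        "max": max(values) if values else None,
    }
```

No table scan is involved: every field comes from metadata that already exists per partition.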
`_do_evaluate` adds a temporary result column to the original table block. To convert only the correct columns to nullable, `convert_block_to_null` must be called before `_do_evaluate`.
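The effect of the call order can be illustrated with simplified stand-ins. The functions below only mimic the behavior described above (they are not the real `convert_block_to_null` or `_do_evaluate`), and the block is modeled as a list of dicts.

```python
def convert_block_to_null(block):
    """Stand-in: mark every column currently in the block as nullable."""
    converted = []
    for col in block:
        if not col["nullable"]:
            col["nullable"] = True
            converted.append(col["name"])
    return converted


def do_evaluate(block):
    """Stand-in: append a temporary result column to the block."""
    block.append({"name": "tmp_result", "nullable": False})


def run(convert_first):
    block = [{"name": "c1", "nullable": False}, {"name": "c2", "nullable": False}]
    if convert_first:
        converted = convert_block_to_null(block)
        do_evaluate(block)
    else:
        do_evaluate(block)
        converted = convert_block_to_null(block)
    return converted
```

Here `run(True)` converts only `c1` and `c2`, while `run(False)` also converts the temporary `tmp_result` column, which is the situation the fix avoids.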
### Before:
Errors were returned when a TVF queried an empty file or an invalid URI:
1. get parsed schema failed, empty csv file
2. Can not get first file, please check uri.
### Now:
An empty set is returned when a TVF queries an empty file or an invalid URI.
```sql
mysql> select * from s3(
"uri" = "https://error_uri/exp_1.csv",
"s3.access_key"= "xx",
"s3.secret_key" = "yy",
"format" = "csv") limit 10;
Empty set (1.29 sec)
```