**Optimize**
PR #14470 used `Expr` to filter the delete rows that match the current data file,
but the rows in a delete file are [sorted by file_path then position](https://iceberg.apache.org/spec/#position-delete-files)
precisely to make filtering efficient during scans, so this PR removes `Expr` and uses binary search to filter delete rows.
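A minimal C++ sketch of the binary-search idea, assuming the delete rows have already been read into memory as `(file_path, pos)` pairs; `PositionDelete` and `deleted_positions` are hypothetical names, not Doris APIs. Because the rows are sorted by `file_path` and then `pos`, the rows for one data file form a contiguous range that `std::lower_bound`/`std::upper_bound` can locate in O(log n):

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical in-memory row of a position delete file: (file_path, pos).
struct PositionDelete {
    std::string file_path;
    int64_t pos;
};

// Returns the deleted positions for `data_file`.
// Assumes `deletes` is sorted by (file_path, pos), so all rows for one
// data file are contiguous and already ordered by pos.
std::vector<int64_t> deleted_positions(const std::vector<PositionDelete>& deletes,
                                       const std::string& data_file) {
    // First row whose file_path is not less than data_file.
    auto lo = std::lower_bound(deletes.begin(), deletes.end(), data_file,
                               [](const PositionDelete& d, const std::string& path) {
                                   return d.file_path < path;
                               });
    // First row whose file_path is greater than data_file.
    auto hi = std::upper_bound(lo, deletes.end(), data_file,
                               [](const std::string& path, const PositionDelete& d) {
                                   return path < d.file_path;
                               });
    std::vector<int64_t> positions;
    positions.reserve(static_cast<size_t>(hi - lo));
    for (auto it = lo; it != hi; ++it) {
        positions.push_back(it->pos);
    }
    return positions;
}
```

Each lookup costs two binary searches plus a copy of the matching range, instead of evaluating a predicate on every delete row.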
In addition, delete files are likely to be dictionary-encoded, and decoding the `file_path`
column into `ColumnString` is time-consuming, so this PR uses `ColumnDictionary` to read the `file_path` column.
In our tests, merge-on-read (MOR) performance for Iceberg v2 improved by more than 30%.
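The dictionary trick can be sketched as follows, assuming a column that stores one integer code per row plus a dictionary of distinct strings; `DictStringColumn` and `match_rows` are hypothetical and only stand in for the idea behind `ColumnDictionary`:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical dictionary-encoded string column: each row stores a code
// that indexes into `dict` instead of a full copy of the string.
// Assumes the entries of `dict` are distinct.
struct DictStringColumn {
    std::vector<std::string> dict;  // distinct values
    std::vector<int32_t> codes;     // one code per row
};

// Marks rows whose value equals `target`. Strings are compared once per
// dictionary entry; each row then needs only an integer comparison, so the
// column is never decoded into full strings.
std::vector<bool> match_rows(const DictStringColumn& col, const std::string& target) {
    int32_t target_code = -1;
    for (size_t i = 0; i < col.dict.size(); ++i) {
        if (col.dict[i] == target) {
            target_code = static_cast<int32_t>(i);
            break;
        }
    }
    std::vector<bool> matched(col.codes.size(), false);
    if (target_code < 0) {
        return matched;  // value not in the dictionary: no row matches
    }
    for (size_t r = 0; r < col.codes.size(); ++r) {
        matched[r] = (col.codes[r] == target_code);
    }
    return matched;
}
```

Only the dictionary, which is usually far smaller than the number of rows, is compared as strings.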
**Fix Bug**
The lazy-read block may not contain the filter column if the whole group is filtered out by `Expr`
and `batch_eof` is produced by the next batch.
Support returning bitmap data in SELECT statements in vectorized mode.
In user-segmentation scenarios that use Bitmap to select groups of people, users need to return the Bitmap results to the upper layer, which parses the Bitmap contents to serve high-QPS queries.
Losing the segment ID information corrupts the `_segment_id_to_value_in_dict_flags` map
in `InListPredicate`, causing two distinct segments to collide and eventually crashing
the BE.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
BE crashes when querying a partitioned Hive table in text format
with a partition column placed first among the select items.
1. FE should use file slots to set the column mapping index of the CSV file.
2. In the CSV reader, BE should use the block's `get_by_name` to get the right column from a block.
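A minimal sketch of why name-based lookup matters, using hypothetical `MiniBlock`/`fill_csv_row` types rather than Doris's `Block`: when a partition column (here a made-up `dt`) comes first in the block, writing CSV field `i` into block column `i` would put the first file field into `dt`, but looking columns up by name avoids that.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical minimal block of named string columns.
struct NamedColumn {
    std::string name;
    std::vector<std::string> values;
};

struct MiniBlock {
    std::vector<NamedColumn> columns;

    // Position-independent lookup, similar in spirit to Block::get_by_name.
    NamedColumn& get_by_name(const std::string& name) {
        for (auto& c : columns) {
            if (c.name == name) return c;
        }
        throw std::out_of_range("no column named " + name);
    }
};

// Appends one parsed CSV line. `file_slot_names` lists the columns that
// exist in the file, in file order; partition columns are not in the file
// and may sit anywhere in the block, so targets are found by name.
void fill_csv_row(MiniBlock& block, const std::vector<std::string>& file_slot_names,
                  const std::vector<std::string>& fields) {
    for (size_t i = 0; i < file_slot_names.size() && i < fields.size(); ++i) {
        block.get_by_name(file_slot_names[i]).values.push_back(fields[i]);
    }
}
```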
Fix for SQL on an array column, such as:
`delete from tbl where c_array is null;`
For more information, refer to #13360.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Problem:
The `IColumn::is_date` property is lost after `ColumnDate::clone` is called.
Fix:
After a `ColumnDate` is created, also set `IColumn::is_date`.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
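A simplified, hypothetical class hierarchy showing the shape of the fix: because the flag lives on the base class, every path that creates a date column, including `clone`, has to set it. `IColumnLike` and `ColumnDateLike` are illustrative stand-ins, not the real Doris classes:

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <vector>

// Hypothetical base class carrying a flag that copies must preserve.
class IColumnLike {
public:
    virtual ~IColumnLike() = default;
    virtual std::unique_ptr<IColumnLike> clone() const = 0;
    bool is_date = false;
};

class ColumnDateLike : public IColumnLike {
public:
    // Set the flag whenever a date column is created, so clone() gets it too.
    ColumnDateLike() { is_date = true; }

    std::unique_ptr<IColumnLike> clone() const override {
        auto copy = std::make_unique<ColumnDateLike>();  // constructor sets is_date
        copy->data = data;
        return copy;
    }

    std::vector<int64_t> data;
};
```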
As mentioned in #13074, there is a problem in `ColumnVector<int>::insert_many_in_copy_way`.
`Column::insert_xxx` functions append data, so they should reserve or resize before appending.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
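A minimal sketch of the rule, using a hypothetical `ColumnVectorLike` built on `std::vector` rather than Doris's `ColumnVector`: each bulk insert first grows the buffer with `resize` and only then writes into the new tail, so `memcpy` never writes past the end of the allocation.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical simplified column: grow first, then write.
template <typename T>
struct ColumnVectorLike {
    std::vector<T> data;

    // Appends `n` copies of `value`.
    void insert_many_copies(const T& value, size_t n) {
        const size_t old_size = data.size();
        data.resize(old_size + n);  // make room before writing
        for (size_t i = 0; i < n; ++i) data[old_size + i] = value;
    }

    // Appends `n` raw elements from `src` with memcpy.
    void insert_many_raw(const T* src, size_t n) {
        static_assert(std::is_trivially_copyable<T>::value, "memcpy needs trivial types");
        const size_t old_size = data.size();
        data.resize(old_size + n);  // without this, memcpy would write past the end
        if (n > 0) std::memcpy(data.data() + old_size, src, n * sizeof(T));
    }
};
```

With `std::vector`, `reserve` alone only allocates capacity; writing through `data[i]` or `memcpy` beyond `size()` still requires `resize`, or appending with `push_back`/`insert`.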
Add the `JSON` data type. This PR implements the following features:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type values stored as `String`, which are represented in a binary format (AKA `JSONB`) at BE
3. `SELECT` JSON columns
For the detailed design, refer to [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type)
* add JSONB data storage format type
* fix JsonLiteral resolve bug
* add DataTypeJson case in data_type_factory
* add JSON syntax check in FE
* add operators for jsonb_document; comparison between JSON type values is not currently supported
* add ColumnJson and DataTypeJson
* add JsonField to store JsonValue
* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral
* add push_json for MysqlResultWriter
* JSON column need no zone_map_index
* Revert "JSON column need no zone_map_index"
This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.
* add JSON writer and reader, ignore zone-map for JSON column
* add json_to_string for DataTypeJson
* add olap_data_convertor for JSON type
* add some enum
* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions
* fix column_json offsets overflow bug, format code
* remove useless TODOs, add CmpType cases for JSON type
* add license header
* format license
* format BE code
* resolve conflicts from rebasing onto master
* fix bugs for CREATE and meta related code
* refactor JsonValue constructors, add FE JSON cases, fix some bugs, and reformat code
* modify BE code according to code review advice
* fix rebase conflicts with master
* add unit test for json_value and column_json
* fix rebase error
* rename json to jsonb
* fix some data conversion bugs, set MySQL type to JSON
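For readers unfamiliar with offset-based columns, here is a hypothetical sketch of a variable-length column that stores all values back to back in one byte buffer plus an end offset per row, assuming `ColumnJson` follows a similar chars-plus-offsets scheme. `BinaryColumnLike` is made up, stores bytes as-is, and does not implement the JSONB encoding. Offsets are 64-bit here so that end positions cannot wrap once the total payload grows past 4 GiB, as a 32-bit offset would.

```cpp
#include <cassert>
#include <cstdint>
#include <string_view>
#include <vector>

// Hypothetical variable-length binary column. Value i occupies
// chars_[begin, offsets_[i]), where begin is 0 for i == 0 and
// offsets_[i - 1] otherwise.
class BinaryColumnLike {
public:
    void insert(std::string_view value) {
        chars_.insert(chars_.end(), value.begin(), value.end());
        offsets_.push_back(chars_.size());  // end of this value
    }

    std::string_view at(size_t i) const {
        const uint64_t begin = (i == 0) ? 0 : offsets_[i - 1];
        const uint64_t end = offsets_[i];
        return std::string_view(chars_.data() + begin, end - begin);
    }

    size_t size() const { return offsets_.size(); }

private:
    std::vector<char> chars_;       // all values concatenated
    std::vector<uint64_t> offsets_; // 64-bit end offsets, one per row
};
```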