When compaction, if some segments miss variant root, there is chance to get emtpy root variant.So add some defence
code in OlapColumnDataConvertorVariant to prevent from accessing null root
```
5# doris::vectorized::ColumnObject::Subcolumn::get_finalized_column_ptr() const at /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src/vec/columns/column_object.cpp:556
6# doris::vectorized::OlapBlockDataConvertor::OlapColumnDataConvertorVariant::set_source_column(doris::vectorized::ColumnWithTypeAndName const&, unsigned long, unsigned long) at /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src
/vec/olap/olap_data_convertor.cpp:1076
7# doris::vectorized::OlapBlockDataConvertor::set_source_content(doris::vectorized::Block const*, unsigned long, unsigned long) at /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src/vec/olap/olap_data_convertor.cpp:207
8# doris::segment_v2::SegmentWriter::append_block(doris::vectorized::Block const*, unsigned long, unsigned long) at /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src/olap/rowset/segment_v2/segment_writer.cpp:727
9# doris::VerticalBetaRowsetWriter::add_columns(doris::vectorized::Block const*, std::vector > const&, bool, unsigned int) at /mnt/ssd01/selectdb-doris-package/enterprise-core/be/src/olap/row
set/vertical_beta_rowset_writer.cpp:125
```
before the agg_state type only support with datatype string,
But with some agg functions, eg: avg,sum,mix...
those functions need serialize type is fixed length object type
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
The offset of _nullmap and _value are inconsistent in OlapDataConvertor, so the obtained null flag is incorrect when calling get_ data_ at function. When the key column or sequence column has null values, the encoding of the short key index or primary key index may be wrong.
This was introduced by #10883#10925.
When compaction case, memory map offsets coming to same olap convertor which is from 0 to 0+size
but it should be continue in different pages when in one segment writer .
eg :
last block with map offset : [3, 6, 8, ... 100]
this block with map offset : [5, 10, 15 ..., 100]
the same convertor should record last offset to make later coming offset followed last offset.
so after convertor :
the current offset should [105, 110, 115, ... 200], then column writer just call append_data() to make the right offset data append pages
* [Feature](vectorized)(quantile_state): support vectorized quantile state functions
1. now quantile column only support not nullable
2. add up some regression test cases
3. set default enable_quantile_state_type = true
---------
Co-authored-by: spaces-x <weixiang06@meituan.com>
This commit support:
1、Insert + select for struct/map type
2、Json stream load for struct type
3、m[key] function for map type
How to use:
Set the fe config to create table for struct and map type
1、admin set frontend config("enable_struct_type" = "true");
2、admin set frontend config("enable_map_type" = "true");
#16547
Co-authored-by: xy720 <xuyang25@baidu.com>
Co-authored-by: amory <wangqiannan@selectdb.com>
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Co-authored-by: hucheng01 <hucheng01@baidu.com>
1. support row format using codec of jsonb
2. short path optimize for point query
3. support prepared statement for point query
4. support mysql binary format
When schema change and compaction is executing simutaneously, both
nullable and not nullable data can be read for the same column, need to
reset _nullmap for each Block when converting Block data, or else Column
case will be wrong.
Add `JSON` datatype, following features are implemented by this PR:
1. `CREATE` tables with `JSON` type columns
2. `INSERT` values containing `JSON` type value stored in `String`, which is represented as binary format(AKA `JSONB`) at BE
3. `SELECT` JSON columns
Detail design refers [DSIP-016: Support JSON type](https://cwiki.apache.org/confluence/display/DORIS/DSIP-016%3A+Support+JSON+type)
* add JSONB data storage format type
* fix JsonLiteral resolve bug
* add DataTypeJson case in data_type_factory
* add JSON syntax check in FE
* add operators for jsonb_document, currently not support comparison between any JSON type value
* add ColumnJson and DataTypeJson
* add JsonField to store JsonValue
* add JsonValue to convert String JSON to BINARY JSON and JsonLiteral case for vliteral
* add push_json for MysqlResultWriter
* JSON column need no zone_map_index
* Revert "JSON column need no zone_map_index"
This reverts commit f71d1ce1ded9dbae44a5d58abcec338816b70d79.
* add JSON writer and reader, ignore zone-map for JSON column
* add json_to_string for DataTypeJson
* add olap_data_convertor for JSON type
* add some enum
* add OLAP_FIELD_TYPE_JSON type, FieldTypeTraits for it and corresponding cases or functions
* fix column_json offsets overflow bug, format code
* remove useless TODOs, add CmpType cases for JSON type
* add license header
* format license
* format be codes
* resolve rebase master conflicts
* fix bugs for CREATE and meta related code
* refactor JsonValue constructors, add fe JSON cases and fix some bugs, reformat codes
* modification be codes along code review advice
* fix rebase conflicts with master
* add unit test for json_value and column_json
* fix rebase error
* rename json to jsonb
* fix some data convert bugs, set Mysql type to JSON
get_data_at should use offset - offsets[start_index] since
start_index may be changed after OlapColumnDataConvertorArray::set_source_column.
Using just offset may access the memory out of _item_convertor's data range,