1. remove phmap for padding rows
2. add SimpleFieldVisitorToScarlarType for short circuit type deducing
3. correct type coercion for conflict types bettween integers
4. improve nullable column performance
5. remove shared_ptr dependancy for DataType use TypeIndex instead
6. Optimization by caching the order of fields (which is almost always the same)
and a quick check to match the next expected field, instead of searching the hash table.
benchmark:
In clickbench data, load performance:
12m36.799s ->7m10.934s about 43% latency reduce
In variant_p2/performance.groovy:
3min44s20 -> 1min15s80 about 66% latency reducy
* [regression-test](Variant) Add more cases related to schema changes
And fix bugs about schema change for variant:
fix bug schema change crash on doing schema change with tablet schema that contains extracted columns
1. limit the column count to default 2048
2. fix get_inverted_index return nullptr when variant's unique id is -1, using it's parent unique id instead
3. avoid add same path subcolumn duplicately in tablet schema
4. make extracted column unique id -1
* [Improve](dynamic schema) support filtering invalid data
1. Support dynamic schema to filter illegal data.
2. Expand the regular expression for ColumnName to support more column names.
3. Be compatible with PropertyAnalyzer and support legacy tables.
4. Default disable parse multi dimenssion array, since some bug unresolved
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
rpc common is duplicate, all its method is included in function rpc. So that I remove it.
get_field_type is never used, remove it.
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Before this PR when encountering null values with some columns which is specified as `NOT NULL`, null values will not be filtered,thi behavior does not match with the original load behavior.
Second column alignment logic has bug :
```
template <typename ColumnInserterFn>
void align_variant_by_name_and_type(ColumnObject& dst, const ColumnObject& src, size_t row_cnt,
ColumnInserterFn inserter) {
CHECK(dst.is_finalized() && src.is_finalized());
// Use rows() here instead of size(), since size() will check_consistency
// but we could not check_consistency since num_rows will be upgraded even
// if src and dst is empty, we just increase the num_rows of dst and fill
// num_rows of default values when meet new data
size_t num_rows = dst.rows();
```
1. introduce a new type `VARIANT` to encapsulate dynamic generated columns for hidding the detail of types and names of newly generated columns
2. introduce a new expression `SchemaChangeExpr` for doing schema change for extensibility
Issue Number: close#16351
Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.