1. remove phmap for padding rows
2. add SimpleFieldVisitorToScarlarType for short circuit type deducing
3. correct type coercion for conflict types bettween integers
4. improve nullable column performance
5. remove shared_ptr dependancy for DataType use TypeIndex instead
6. Optimization by caching the order of fields (which is almost always the same)
and a quick check to match the next expected field, instead of searching the hash table.
benchmark:
In clickbench data, load performance:
12m36.799s ->7m10.934s about 43% latency reduce
In variant_p2/performance.groovy:
3min44s20 -> 1min15s80 about 66% latency reducy
Add variant type for metadata Add persistent information for variant, including the path of variant sub-columns, persisting them to the segment footer and tablet schema of the rowset.
1. make ColumnObject exception safe
2. introduce FlushContext and construct schema at memtable flush stage to make segment independent from dynamic schema
3. add more test cases
Currently, there are some useless includes in the codebase. We can use a tool named include-what-you-use to optimize these includes. By using a strict include-what-you-use policy, we can get lots of benefits from it.
1. change PipelineTaskState to enum class
2. remove some row-based code on FoldConstantExecutor::_get_result
3. reduce memcpy on minmax runtime filter function(Now we can guarantee that the input data is aligned)
4. add Wunused-template check, and remove some unused function, change some static function to inline function.
Issue Number: close#16351
Dynamic schema table is a special type of table, it's schema change with loading procedure.Now we implemented this feature mainly for semi-structure data such as JSON, since JSON is schema self-described we could extract schema info from the original documents and inference the final type infomation.This speical table could reduce manual schema change operation and easily import semi-structure data and extends it's schema automatically.