…avior (#38847)
https://github.com/apache/doris/pull/38847
## Proposed changes
There are two issues here. First, the results of casting are
inconsistent between FE and BE .
```
FE
mysql [(none)]>select cast('3.000' as int);
+----------------------+
| cast('3.000' as INT) |
+----------------------+
| 3 |
+----------------------+
mysql [(none)]>set debug_skip_fold_constant = true;
BE
mysql [(none)]>select cast('3.000' as int);
+----------------------+
| cast('3.000' as INT) |
+----------------------+
| NULL |
+----------------------+
```
The second issue is that casting on BE converts '3.0' to null. Here, the
casting logic for FE and BE has been unified
<!--Describe your changes.-->
## Proposed changes
Issue Number: close #xxx
<!--Describe your changes.-->
---------
Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
pick from master #36478
intro a new rule VARIANT_SUB_PATH_PRUNING to prune variant sub path.
for example, variant slot v in table t has two sub path: 'c1' and 'c2',
after this rule, select v['c1'] from t will only scan one sub path 'c1'
of v to reduce scan time.
This rule accomplishes all the work using two components. The Collector
traverses from the top down, collecting all the element_at functions on
the variant types, and recording the required path from the original
variant slot to the current element_at. The Replacer traverses from the
bottom up, generating the slots for the required sub path on scan,
union, and cte consumer. Then, it replaces the element_at with the
corresponding slot.
* [Fix](Variant Type) forbit distribution info contains variant columns (#33707)
* [Fix](Variant) VariantRootColumnIterator::read_by_rowids with wrong null map size (#33734)
insert_range_from should start from `size` with `count` elements for null map
* [Fix](Variant) check column index validation for extracted columns (#33766)
* [Fix](Variant) forbit table with variant type doing segment compaction temporarily
TODO fix this corretly in later works
* [Bug](Variant) use lower case name for variant's root, since backend treat parent column as lower case
This PR address the problem as blow:
```
errCode = 2, detailMessage = (172.16.56.137)[CANCELLED]failed to initialize storage reader. tablet=17136, res=[INTERNAL_ERROR]Not found field_name, field_name:Tags.tag_key1, schema:[Thread(8), Tags(9), Source(5), tags.tag_key1(-1), Title(6), Level(3), Time(2), CreateDate(1), Message(7), IP(4), AppId(0)]
```
Query like `select * from ut_p partitions(p2) where cast(var['a'] as int) > 0` will fall through parition/tablet prunning since it's plan like
```
mysql> explain analyzed plan select * from ut_p where id = 3 and cast(var['a'] as int) = 789;
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| LogicalResultSink[26] ( outputExprs=[id#0, var#1] ) |
| +--LogicalProject[25] ( distinct=false, projects=[id#0, var#1], excepts=[] ) |
| +--LogicalFilter[24] ( predicates=((cast(var#4 as INT) = 789) AND (id#0 = 3)) ) |
| +--LogicalFilter[23] ( predicates=(0 = __DORIS_DELETE_SIGN__#2) ) |
| +--LogicalProject[22] ( distinct=false, projects=[id#0, var#1, __DORIS_DELETE_SIGN__#2, __DORIS_VERSION_COL__#3, element_at(var#1, 'a') AS `var`#4], excepts=[] ) |
| +--LogicalOlapScan ( qualified=regression_test_variant_p0.ut_p, indexName=<index_not_selected>, selectedIndexId=10145, preAgg=ON ) |
+----------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
6 rows in set (0.01 sec)
```
with an extra LogicalProject on top of LogicalOlapScan, so we should handle such case to prune parition/tablet
* [Fix](Variant) support materialize view for variant and accessing variant subcolumns
1. fix schema change with path lost and lead to invalid data read
2. support element_at function in BE side and use simdjson to parse data
3. fix multi slot expression
* [Nereids](Variant) Implement variant type in Variant and support new sub column access method
The query SELECT v["a"]["b"] from simple_var WHERE cast(v["a"]["b"] as int) = 1
1. During the binding stage, the expression element_at(var, "xxx") is transformed into a SlotReference with a specified path. This conversion is tracked in the StatementContext, where the parent slot is the primary key and the paths are secondary keys. This structure, known as subColumnSlotRefMap in the StatementContext, helps to eliminate duplicates of the same slot derived from identical paths.
2. A new rule, BindSlotWithPaths, is introduced in the analysis stage. This rule is responsible for converting slots with paths into their respective slot suppliers. To ensure that slots with paths are correctly associated with the appropriate LogicalOlapScan, an additional mapping, slotToRelation, is added to the StatementContext. This mapping links the top-level slot to its corresponding relation (i.e., LogicalOlapScan). Consequently, subsequent slots with paths can determine the correct LogicalOlapScan to merge with and modify accordingly.
* [Feature](Variant) Implement variant new sub column access method
The query SELECT v["a"]["b"] from simple_var WHERE cast(v["a"]["b"] as int) = 1 encompasses three primary testing scenarios:
```
1. A basic test involving the variant data type.
2. A scenario dealing with GitHub event data in the context of a variant.
3. A case related to the TPC-H benchmark using a variant.
```
using the name without paths info will lead to wrong In plan, e.g.
```
where cast(v:a as text) = 'hello' or cast(v:b as text) = 'world'
```
will be rewrite to:
```
where cast(v as text) in ('hello', 'world')
``
This is wrong, because they are different slots