[fix](Nereids) string literal coercion of in predicate ( #35337 )
...
pick from master #35200
Description:
The sql execute much slow when the literal value with string format in `in predicate`; and the real data is integral type。
```
mysql> set enable_nereids_planner = false;
Query OK, 0 rows affected (0.03 sec)
mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+---------------+
| id | sum(`clicks`) |
+------------+---------------+
| 787934713 | 2838 |
| 306960695 | 339 |
+------------+---------------+
2 rows in set (1.81 sec)
mysql> set enable_nereids_planner = true;
Query OK, 0 rows affected (0.02 sec)
mysql> select id,sum(clicks) from a_table where id in ('787934713', '306960695') group by id limit 10;
+------------+-------------+
| id | sum(clicks) |
+------------+-------------+
| 787934713 | 2838 |
| 306960695 | 339 |
+------------+-------------+
2 rows in set (28.14 sec)
```
Reason:
In legacy planner, the string literal with convert to integral value, but in the nereids planner do not do this convert and with do string matching in BE。
Solved:
do process string literal with numeric in `in predicate` like in `comparison predicate`;
test table:
```
create table a_table(
k1 BIGINT NOT NULL,
k2 VARCHAR(100) NOT NULL,
v1 INT SUM NULL DEFAULT "0"
) ENGINE=OLAP
AGGREGATE KEY(k1,k2)
distributed BY hash(k1) buckets 2
properties("replication_num" = "1");
insert into a_table values (10, 'name1', 10),(20, 'name2', 10);
explain plan select * from a_table where k1 in ('10', '20001');
```
before optimize:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 1ms) ========== |
| UnboundResultSink[4] ( ) |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) |
| +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) |
| +--LogicalCheckPolicy ( ) |
| +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) |
| |
| ========== ANALYZED PLAN (time: 2ms) ========== |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) |
| +--LogicalFilter[11] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) |
| |
| ========== REWRITTEN PLAN (time: 6ms) ========== |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalFilter[43] ( predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
| |
| ========== OPTIMIZED PLAN (time: 6ms) ========== |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--PhysicalDistribute[87]@1 ( stats=0.33, distributionSpec=DistributionSpecGather ) |
| +--PhysicalFilter[84]@1 ( stats=0.33, predicates=cast(k1#0 as TEXT) IN ('10001', '20001') ) |
| +--PhysicalOlapScan[a_table]@0 ( stats=1 ) |
+--------------------------------------------------------------------------------------------------------------------------------------+
```
after optimize:
```
+--------------------------------------------------------------------------------------------------------------------------------------+
| Explain String(Nereids Planner) |
+--------------------------------------------------------------------------------------------------------------------------------------+
| ========== PARSED PLAN (time: 15ms) ========== |
| UnboundResultSink[4] ( ) |
| +--LogicalProject[3] ( distinct=false, projects=[*], excepts=[] ) |
| +--LogicalFilter[2] ( predicates='k1 IN ('10001', '20001') ) |
| +--LogicalCheckPolicy ( ) |
| +--UnboundRelation ( id=RelationId#0, nameParts=a_table ) |
| |
| ========== ANALYZED PLAN (time: 11ms) ========== |
| LogicalResultSink[15] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalProject[13] ( distinct=false, projects=[k1#0, k2#1, v1#2], excepts=[] ) |
| +--LogicalFilter[11] ( predicates=k1#0 IN (10001, 20001) ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=<index_not_selected>, selectedIndexId=12003, preAgg=UNSET ) |
| |
| ========== REWRITTEN PLAN (time: 12ms) ========== |
| LogicalResultSink[45] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--LogicalFilter[43] ( predicates=k1#0 IN (10001, 20001) ) |
| +--LogicalOlapScan ( qualified=internal.db.a_table, indexName=a_table, selectedIndexId=12003, preAgg=OFF, No aggregate on scan. ) |
| |
| ========== OPTIMIZED PLAN (time: 4ms) ========== |
| PhysicalResultSink[90] ( outputExprs=[k1#0, k2#1, v1#2] ) |
| +--PhysicalDistribute[87]@1 ( stats=0, distributionSpec=DistributionSpecGather ) |
| +--PhysicalFilter[84]@1 ( stats=0, predicates=k1#0 IN (10001, 20001) ) |
| +--PhysicalOlapScan[a_table]@0 ( stats=2 ) |
+--------------------------------------------------------------------------------------------------------------------------------------+
```