1. replace all boost::shared_ptr to std::shared_ptr
2. replace all boost::scopted_ptr to std::unique_ptr
3. replace all boost::scoped_array to std::unique<T[]>
4. replace all boost:thread to std::thread
Remove part of dynamic_cast, reduce the overhead caused by type conversion,
and probably reduce the cpu consumption of parquet file import by about 10%
fix: https://github.com/apache/incubator-doris/issues/3984
1. add `conjunct.size` checking and `slot_desc nullptr` checking logic
2. For historical reasons, the function predicates are added one by one, I just refactor the processing make thelogic for function predicate processing more clearly
[Doris On ES] Skip function_call expr when process predicate
Fixed#3801
Do not push-down function_call such as split_xxx when process predicate, Doris BE is responsible for processing these predicate
All rows in table:
```
+------+------+------+------------+------------+
| k1 | k2 | k3 | UpdateTime | ArriveTime |
+------+------+------+------------+------------+
| NULL | NULL | kkk1 | 123456789 | NULL |
| kkk1 | NULL | NULL | 123456789 | NULL |
| NULL | kkk2 | NULL | 123456789 | NULL |
+------+------+------+------------+------------+
```
The following predicate could not push down to ES.
```
SQL 1:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk is not null;
+------+
| kk |
+------+
| kkk |
+------+
1 row in set (0.02 sec)
SQL 2:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > 'a';
+------+
| kk |
+------+
| kkk |
+------+
SQL 3:
mysql> select * from (select split_part(k1, "1", 1) as kk from case_replay_for_milimin) t where t.kk > '2';
+------+
| kk |
+------+
| kkk |
+------+
1 row in set (0.03 sec)
```
Process castexpr, such as: k (float) > 2.0, k(int) > 3.2, Doris On Es should ignore this doris native cast transformation for every row's col value, we push down this `cast semantic` to Elasticsearch.
I believe in this `predicate` situation, would decrease the mount of data for transmission。
k1 is float:
````
k1 >= 5
````
push-down filter:
```
{"range":{"k1":{"gte":"5.000000"}}}
```
k2 is int :
```
k2 > 3.2
```
push-down filter:
```
{"range":{"k2":{"gte":"3.2"}}}
```
This PR is just a transitional way,but it is better to move the predicates transformation from Doris BE to Doris BE, in this way, Doris BE is responsible for fetching data from ES.
Add a `enable_keyword_sniff ` configuration item in creating External Elasticsearch Table ,it default to true , would to sniff the `keyword` type on the `text analyzed` Field and return the `json_path` which substitute the origin col name.
```
CREATE EXTERNAL TABLE `test` (
`k1` varchar(20) COMMENT "",
`create_time` datetime COMMENT ""
) ENGINE=ELASTICSEARCH
PROPERTIES (
"hosts" = "http://10.74.167.16:8200",
"user" = "root",
"password" = "root",
"index" = "test",
"type" = "doc",
"enable_keyword_sniff" = "true"
);
```
note: `enable_keyword_sniff` default to "true"
run this SQL:
```
select * from test where k1 = "wu yun feng"
```
Output predicate DSL:
```
{"term":{"k1.keyword":"wu yun feng"}}
```
and in this PR, I remove the elasticsearch version detected logic for now this is useless, maybe future is needed.
Relate Issue: https://github.com/apache/incubator-doris/issues/3248
SQL:
```
select * from test where (k2 = 6 and k3 = 1) or (k2 = 2 and k3 =3 and k4 = 'beijing');
```
Output filter:
```
((#k2:[6 TO 6] #k3:[1 TO 1]) (#(#k2:[2 TO 2] #k3:[3 TO 3]) #k4:beijing))~1
```
SQL:
```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and k3 =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```
Output filter:
```
(k2:[6 TO 6] k3:[7 TO 7] (#(#k2:[2 TO 2] #k3:[3 TO 3]) #((k4:beijing k4:zhaochun)~1)))~1
```
SQL:
```
select * from test where (k2 = 6 or k3 = 7) or (k2 = 2 and abs(k3) =3 and (k4 = 'beijing' or k4 = 'zhaochun'));
```
Output filter (`abs` can not be pushed down to es, so doris on es would not process this scenario ):
```
match_all
```