* [refactor](Nereids)refactor runtime filter generator (#34275)
1. unify the process of generating rf for hash join and for nested loop join
2. fix some bugs in generating rf
3. remove some duplicated checks
(cherry picked from commit 07267faac0d9c6ef3bb1fd4ee101b4c761c8a2f2)
* [refactor](nereids) do not deny a runtime filter by removing an entry in aliasMap (#34559)
In the current version, there are 2 approaches to verify whether a join condition can be used to generate a runtime filter:
1. remove the output slot from aliasMap
2. pushDownVisitor.visit(...) returns false
The 1st approach has some drawbacks, so we prefer the 2nd approach.
In this PR, all cases are handled by the 2nd approach, and the code related to the 1st approach is removed.
(cherry picked from commit a29082bf31e66efa2df193b38347e610f2bf7464)
* rebase
We have an order-preserving mapping from string to double.
For a string column A, we have double values for A.min and A.max.
When estimating A<"abc", A.min/max can be used to judge whether "abc" is between A.min and A.max, but they cannot be used for range estimation. Suppose "abc" is mapped to double x. If we compute selectivity by the formula "sel = (x-A.min)/(A.max-A.min)", we are likely to obtain extreme values.
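The effect can be illustrated with a small sketch. The mapping below (`string_to_double`, first 8 bytes read as a big-endian integer) is a hypothetical stand-in and not necessarily the mapping Doris uses; the toy column is also made up. It only shows that a bounds check works while the linear formula can be far off on skewed string data.

```python
def string_to_double(s: str, width: int = 8) -> float:
    """Hypothetical order-preserving mapping: first `width` bytes as a big-endian integer."""
    data = s.encode("utf-8")[:width].ljust(width, b"\x00")
    return float(int.from_bytes(data, "big"))


# Skewed toy column: 999 values starting with "a" and a single value "z".
values = ["a%03d" % i for i in range(999)] + ["z"]
a_min = string_to_double(min(values))
a_max = string_to_double(max(values))

literal = "b"
x = string_to_double(literal)

# The bounds check is still meaningful: "b" lies between A.min and A.max.
in_range = a_min <= x <= a_max

# Linear interpolation over the mapped doubles versus the real fraction of rows.
naive_sel = (x - a_min) / (a_max - a_min)
actual_sel = sum(v < literal for v in values) / len(values)

print(in_range, naive_sel, actual_sel)
```

Here almost every row satisfies A<"b", yet the interpolated selectivity is close to zero, because the mapped doubles are dominated by the leading byte.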
The legacy planner encounters issues when handling filters such as: c1(boolean type)=0.0(decimalv3).
The literal 0.0 is interpreted as decimalv3(1,1), and the boolean type c1 is coerced to decimalv3(1,1).
decimalv3(1,1) can only retain values in the range [0,1), while the boolean true is represented as 1, exceeding the upper bound, thus causing an overflow problem.
This pull request addresses the issue by treating the boolean type as decimalv3(1,0), so that both c1 and 0.0 are cast to decimal(2,1).
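A minimal sketch of the arithmetic, assuming the usual decimal widening rule (keep the larger scale and the larger number of integer digits). The helper names `common_decimal_type` and `fits` are hypothetical and are not Doris functions.

```python
def common_decimal_type(a, b):
    """Widen two (precision, scale) pairs: keep max integer digits and max scale."""
    (p1, s1), (p2, s2) = a, b
    scale = max(s1, s2)
    int_digits = max(p1 - s1, p2 - s2)
    return (int_digits + scale, scale)


def fits(value, dtype):
    """True if `value` has few enough integer digits for decimal(precision, scale)."""
    precision, scale = dtype
    return abs(value) < 10 ** (precision - scale)


literal_type = (1, 1)   # 0.0 is interpreted as decimalv3(1,1)
old_target = (1, 1)     # before the fix, c1 was coerced to decimalv3(1,1)

# After the fix, boolean is treated as decimalv3(1,0) before widening.
new_target = common_decimal_type((1, 0), literal_type)

print(old_target, fits(1, old_target))   # boolean true (1) does not fit
print(new_target, fits(1, new_target))   # boolean true (1) fits
```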
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
In NormalizeRepeat, three parts of the outputExpressions of LogicalRepeat need to be pushed down and output by the bottom project: flattenGroupingSetExpr, argumentsOfGroupingScalarFunction, argumentsOfAggregateFunction.
The original code uses these three parts to rewrite the outputExpressions of LogicalRepeat to slots. This can cause problems in some cases, for example:
```sql
SELECT
ROUND( SUM(pk + 1) - 3) col_alias1,
pk + 1 AS col_alias3
FROM
table_20_undef_partitions2_keys3_properties4_distributed_by53
GROUP BY
GROUPING SETS ((pk), ()) ;
```
The expressions that need to be pushed down are: pk, pk+1. The original code uses pk+1 to rewrite pk + 1 AS col_alias3 to a slot. But pk+1 is not in the list of grouping outputs, so an error is reported.
This PR changes the rewrite process and divides the expressions that need to be pushed down into 2 parts: one is (flattenGroupingSetExpr) and the other is (argumentsOfGroupingScalarFunction, argumentsOfAggregateFunction).
flattenGroupingSetExpr is used to rewrite all LogicalRepeat outputExpressions, while argumentsOfGroupingScalarFunction and argumentsOfAggregateFunction are used to rewrite only the arguments of aggregate functions and grouping scalar functions.
So, in the above SQL, pk + 1 AS col_alias3 will not be rewritten to a slot, and can be computed.
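The two-part rewrite can be sketched on a toy expression tree. This is only an illustration under stated assumptions: expressions are nested tuples, slot names such as `slot#pk` are made up, and the functions `rewrite` and `rewrite_output` are hypothetical, not the NormalizeRepeat code.

```python
AGG_FUNCS = {"sum"}  # toy set of aggregate function names


def rewrite(expr, mapping):
    """Replace every sub-expression found in `mapping` with its slot."""
    if expr in mapping:
        return mapping[expr]
    if isinstance(expr, tuple):
        return (expr[0],) + tuple(rewrite(arg, mapping) for arg in expr[1:])
    return expr


def rewrite_output(expr, grouping_map, agg_arg_map):
    """Grouping exprs are rewritten everywhere; agg arguments only inside agg functions."""
    if expr in grouping_map:
        return grouping_map[expr]
    if isinstance(expr, tuple):
        op, args = expr[0], expr[1:]
        if op in AGG_FUNCS:
            both = {**grouping_map, **agg_arg_map}
            return (op,) + tuple(rewrite(arg, both) for arg in args)
        return (op,) + tuple(rewrite_output(arg, grouping_map, agg_arg_map) for arg in args)
    return expr


pk_plus_1 = ("+", "pk", 1)
grouping_map = {"pk": "slot#pk"}           # from flattenGroupingSetExpr
agg_arg_map = {pk_plus_1: "slot#pk+1"}     # from argumentsOfAggregateFunction

col_alias1 = ("round", ("-", ("sum", pk_plus_1), 3))
col_alias3 = pk_plus_1

# Old behavior: one combined mapping applied to every output expression.
old_alias3 = rewrite(col_alias3, {**grouping_map, **agg_arg_map})

# New behavior: the two parts are applied separately.
new_alias1 = rewrite_output(col_alias1, grouping_map, agg_arg_map)
new_alias3 = rewrite_output(col_alias3, grouping_map, agg_arg_map)

print(old_alias3)
print(new_alias1)
print(new_alias3)
```

With the combined mapping, col_alias3 collapses to a slot that is not a grouping output. With the split mapping, it stays as an expression over the grouping slot.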
This PR is based on @sjyango's work in #32326.
The intent was to merge #32326 into the master branch, but it is still a draft and has not been maintained for a long time, so this new PR was opened.
Co-authored-by: sjyango <sjyang2022@zju.edu.cn>
This PR supports a Table Value Function called `Query`. It can push a query directly to the catalog source for execution by specifying `catalog` and `query`, without the query being parsed by Doris. Doris only receives the results returned by the query.
Currently only JDBC Catalog is supported.
Example:
```
Doris > desc function query('catalog' = 'mysql','query' = 'select count(*) as cnt from test.test');
+-------+--------+------+------+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-------+--------+------+------+---------+-------+
| cnt | BIGINT | Yes | true | NULL | NONE |
+-------+--------+------+------+---------+-------+
Doris > select * from query('catalog' = 'mysql','query' = 'select count(*) as cnt from test.test');
+----------+
| cnt |
+----------+
| 30000000 |
+----------+
```
In MySQL, it's common to use a simplified syntax like `SELECT constant FROM dual`
which is equivalent to just `SELECT constant`.
This syntax is often used by BI tools when utilizing MySQL connectors to verify connection validity.
To enhance compatibility and ensure seamless integration with such tools,
we have now implemented this feature in Doris.
### Key Changes:
- Doris now interprets `SELECT constant FROM dual` as `SELECT constant`, aligning with MySQL's behavior.
- This update ensures that BI tools can use standard MySQL connectors without modifications or errors when connecting to Doris.
In SQL syntax, `col != ''` is equivalent to `col.length() > 0`.
It means that the column must exist in the ES doc fields and its content must not be empty.
In this PR, we add a special translation for this binary predicate to keep the two behaviors consistent.
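One way to express "field exists and is not the empty string" in the Elasticsearch query DSL is a `bool` query combining `exists` with a negated `term`. The sketch below is an assumption about the shape of such a translation, not the exact query this PR generates; `not_empty_string_query` is a hypothetical helper.

```python
import json


def not_empty_string_query(field: str) -> dict:
    """Build an ES bool query: the field must exist and must not equal ''."""
    return {
        "bool": {
            "must": [{"exists": {"field": field}}],
            "must_not": [{"term": {field: ""}}],
        }
    }


query = not_empty_string_query("col")
print(json.dumps(query, indent=2))
```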
---------
Co-authored-by: Luennng <luennng@gmail.com>
Expand the bucket shuffle downgrade condition, which originally required a single partition after pruning, a basic table, and bucket number < para number. Now we expect this option to be usable for disabling bucket shuffle more efficiently, without the above restrictions.
Co-authored-by: zhongjian.xzj <zhongjian.xzj@zhongjianxzjdeMacBook-Pro.local>
We save a bi-map in the CTE consumer to get the mapping between producer and consumer.
The consumer's output is decided by the map in it.
So, the CTE consumer should be output prunable, and should remove useless entries from the map when doing column pruning.
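A minimal sketch of the idea, assuming the bi-map is kept as two plain dictionaries. The class `CteConsumer` and the slot names are hypothetical and only illustrate how pruning the map also prunes the consumer's output.

```python
class CteConsumer:
    """Toy CTE consumer that keeps a producer <-> consumer slot bi-map."""

    def __init__(self, producer_to_consumer):
        self.producer_to_consumer = dict(producer_to_consumer)
        self.consumer_to_producer = {c: p for p, c in self.producer_to_consumer.items()}

    def output(self):
        # The consumer's output is exactly the consumer side of the map.
        return list(self.consumer_to_producer)

    def prune_outputs(self, required):
        # Keep only entries whose consumer slot is still required by the parent.
        kept = {p: c for p, c in self.producer_to_consumer.items() if c in required}
        return CteConsumer(kept)


consumer = CteConsumer({"p.a": "c.a", "p.b": "c.b", "p.c": "c.c"})
pruned = consumer.prune_outputs({"c.a", "c.c"})

print(consumer.output())
print(pruned.output())
```

Returning a new consumer instead of mutating the existing one is a choice made for this sketch; it keeps the original plan node unchanged.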