1. `uncheckedCastChild` may generate redundant `CastExpr` like `cast( cast(XXX as Date) as Date)`
2. generate DateLiteral to replace cast(IntLiteral as Date)
Child's slot with same name to the slots in the outputexpression would be discarded which would cause the bind failed, since the slots in the group by expressions cannot find the corresponding bound slots from the child's output
For example, in this case, the `date` in having clause should be bind to alias which has same name, instead of `date` field of the relation
SELECT date_format(date, '%x%v') AS `date` FROM `tb_holiday` WHERE `date` between 20221111 AND 20221116 HAVING date = 202245 ORDER BY date;
1. signatures without order element are wrong
2. signature with one arg is miss
3. group_concat should be NullableAggregateFunction
4. fold constant on fe should not fold NullableAggregateFunction with null arg
TODO
1. reorder rewrite rules, and then only forbid fold constant on NullableAggregateFunction with alwaysNullable == true
This PR optimize topn query like `SELECT * FROM tableX ORDER BY columnA ASC/DESC LIMIT N`.
TopN is is compose of SortNode and ScanNode, when user table is wide like 100+ columns the order by clause is just a few columns.But ScanNode need to scan all data from storage engine even if the limit is very small.This may lead to lots of read amplification.So In this PR I devide TopN query into two phase:
1. The first phase we just need to read `columnA`'s data from storage engine along with an extra RowId column called `__DORIS_ROWID_COL__`.The other columns are pruned from ScanNode.
2. The second phase I put it in the ExchangeNode beacuase it's the central node for topn nodes in the cluster.The ExchangeNode will spawn a RPC to other nodes using the RowIds(sorted and limited from SortNode) read from the first phase and read row by row from storage engine.
After the second phase read, Block will contain all the data needed for the query
The broker implements the interface to juicefs,It supports loading data from juicefs to doris through broker.
At the same time, it also implements the multi catalog to read the hive data stored in juicefs
Fix bug that when create jdbc resource with only jdbc driver file name, it will failed to do checksum
This is because we forgot the pass the full driver url to JdbcClient.
Add ResultSet.FETCH_FORWARD and set AutoCommit to false to jdbc connection, so to avoid OOM when fetching large amount of data
set useCursorFetch in jdbc url for both MySQL and PostgreSQL.
Fix some p2 external datasource bug
in previous [PR 12872](https://github.com/apache/doris/pull/12872), we convert multi equals on same slot into `in predicate`. for example, `a =1 or a = 2` => `a in (1, 2)`
This pr makes 4 changes about convert or to in:
1. fix a bug: `Not IN` is merged with equal. `a =1 or a not in (2, 3)` => `a in (1, 2, 3)`
2. extends this rule on more cases
- merge for more than one slot: 'a =1 or a = 2 or b = 3 or b = 4' => `a in (1, 2) or b in (3, 4)`
- merge skip not-equal and not-in: 'a =1 or a = 2 or b !=3 or c not in (1, 2)' => 'a in (1, 2) or b!=3 or c not in (1,2)`
3. rewrite recursively.
4. OrToIn is implemented in ExtractCommonFactorsRule. This rule will generate new exprs. OrToIn should apply on such generated exprs. for example `(a=1 and b=2) or (a=3 and b=4)` => `(a=1 or a=3) and (b=2 or b=4) and [(a=1 and b=2) or (a=3 and b=4)]` => `a in (1,3) and b in (2 ,4) and [(a=1 and b=2) or (a=3 and b=4)]`
In addition, this pr add toString() for some Expr.
if a materialized view is selected, the olap scan node's NonUserVisibleOutput property may contains column from other materialized view. This pr remove invalid column
1. Remove some redundant code.
2. Fix the issue with the state of MTMV task.
3. Fix the case - test_create_mtmv.
## Problem summary
1. We used a retry policy to re-run the failed MTMV tasks, but we set the state to `FAILURE` during re-running the tasks.
We should do this after all the retry runs fail.
2. There are some redundant code can be removed.
3. In the case test_create_mtmv, we created many background tasks to refresh the data. Some task may fail due to the concurrency and cause the test fail. Actually, we only need single one task to verify the functionality.
For now, type information of child expr which is NullLiteral would get lost in the CastExpr#getResultValue, this will produce a NullLiteral with Null type which cause BE core when doing cast
we have some rules that change output's nullable in rewrite step. So we need a rule to adjust nullable at the end of rewrite step.
TODO
- remove the output slot map
- add nullable compare into slot reference
- use exprid to compare two slot if do not need to compare nullable
- merge all rules into one to adjust all type plans
example:
unbound slot k
bounded [k, t.k]
In previous binding algorithm, there are 2 candidate bindings,
in which bounded k is exactly matched unbound slot k, it has higher priority than that of t1.k