Fix bug that when create jdbc resource with only jdbc driver file name, it will failed to do checksum
This is because we forgot the pass the full driver url to JdbcClient.
Add ResultSet.FETCH_FORWARD and set AutoCommit to false to jdbc connection, so to avoid OOM when fetching large amount of data
set useCursorFetch in jdbc url for both MySQL and PostgreSQL.
Fix some p2 external datasource bug
in previous [PR 12872](https://github.com/apache/doris/pull/12872), we convert multi equals on same slot into `in predicate`. for example, `a =1 or a = 2` => `a in (1, 2)`
This pr makes 4 changes about convert or to in:
1. fix a bug: `Not IN` is merged with equal. `a =1 or a not in (2, 3)` => `a in (1, 2, 3)`
2. extends this rule on more cases
- merge for more than one slot: 'a =1 or a = 2 or b = 3 or b = 4' => `a in (1, 2) or b in (3, 4)`
- merge skip not-equal and not-in: 'a =1 or a = 2 or b !=3 or c not in (1, 2)' => 'a in (1, 2) or b!=3 or c not in (1,2)`
3. rewrite recursively.
4. OrToIn is implemented in ExtractCommonFactorsRule. This rule will generate new exprs. OrToIn should apply on such generated exprs. for example `(a=1 and b=2) or (a=3 and b=4)` => `(a=1 or a=3) and (b=2 or b=4) and [(a=1 and b=2) or (a=3 and b=4)]` => `a in (1,3) and b in (2 ,4) and [(a=1 and b=2) or (a=3 and b=4)]`
In addition, this pr add toString() for some Expr.
if a materialized view is selected, the olap scan node's NonUserVisibleOutput property may contains column from other materialized view. This pr remove invalid column
For now, type information of child expr which is NullLiteral would get lost in the CastExpr#getResultValue, this will produce a NullLiteral with Null type which cause BE core when doing cast
we have some rules that change output's nullable in rewrite step. So we need a rule to adjust nullable at the end of rewrite step.
TODO
- remove the output slot map
- add nullable compare into slot reference
- use exprid to compare two slot if do not need to compare nullable
- merge all rules into one to adjust all type plans
example:
unbound slot k
bounded [k, t.k]
In previous binding algorithm, there are 2 candidate bindings,
in which bounded k is exactly matched unbound slot k, it has higher priority than that of t1.k
Because of the rollup has the same keys and the keys's order is same, BE will do linked schema change. The base tablet's segments will link to the new rollup tablet. But the unique id from the base tablet is starting from 0 and as the rollup tablet also. In this case, the unique id 4 in the base table is column 'city', but in the rollup tablet is 'cost'. It will decode the varcode page to bigint page so that be coredump. It needs to be rejected.
I think that if a rollup add by link schema change, it means this rollup is redundant. It brings no additional revenue and wastes storage space. So It needs to be rejected.
If block bytes are bigger than the corresponding block's rows, then the avg_size_per_row would be zero. Which would end up diving zero in the following logic.
## Proposed changes
1. Check the state of MTMV task as the loop condition.
2. Check the data in materialized view.
## Problem summary
There are some minor issues with #15546.
1. The case used a retry strategy as the loop condition, it may not be stable while the host machine is busy.
2. The case didn't check the final data in materialized view.
1. remove forcing nullable for slot on EmptySetNode.
2. order by xxx desc should use nulls last as default order.
3. don't create runtime filter if runtime filter mode is OFF.
4. group by constant value need check the corresponding expr shouldn't have any aggregation functions.
5. fix two left outer join reorder bug( A left join B left join C).
6. fix semi join and left outer join reorder bug.( A left join B semi join C ).
7. fix group by NULL bug.
8. change ceil and floor function to correct signature.
9. add literal comparasion for string and date type.
10. fix the getOnClauseUsedSlots method may not return valid value.
11. the tightness common type of string and date should be date.
12. the nullability of set operation node's result exprs is not set correctly.
13. Sort node should remove redundent ordering exprs.
Describe your changes.
Firstly having clause of Mysql is really very complex, we are hard to follow all rules, so we revert pr15143 to keep the logic the same as before.
Secondly the origin implementation has problem while having clause has multi-conditions.
For example:
case1: here v2 inside having clause use table column test_having_alias_tb.v2
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v2>1);
ERROR 1105 (HY000): errCode = 2, detailMessage = HAVING clause not produced by aggregation output (missing from GROUP BY clause?): (`v2` > 1)
case2: here v2 inside having clause use alias name v2 =sum(test_having_alias_tb.v2), another condition make logic of v2 differently.
SELECT id, v1-2 as v, sum(v2) v2 FROM test_having_alias_tb GROUP BY id,v having(v>0 AND v2>1) ORDER BY id,v;
+------+------+------+
| id | v | v2 |
+------+------+------+
| 2 | 1 | 3 |
+------+------+------+
So here we try to make the having clause rules simple:
Rule1: if alias name inside having clause is the same as column name, we use column name not alias name;
Rule2: if alias name inside having clause do not have same name as column name, we use alias name;
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Though this syntax doesn't get suppoted in many other systems since the order by clause here almost redandunt and useless but we have to keep consistent with the legacy doris syntax
Here is a example:
SELECT * FROM (SELECT k1, k3 FROM tbl1 ORDER BY k3 UNION ALL SELECT k1, k5 FROM tbl2) t;
when need to scan more than one olap table partition and it is not a colocate table or its colocate group is unstable, we need to make it as any distribution even if its distribution type is Hash