#31442
test in #34929
When null value is used as the partition value, BE will return the "null" string, so this string needs to be processed specially.
When `Export` statements are executed concurrently, the background uses
`Job schedule` to manage export tasks. Previously, the default value of
`async_task_consumer_thread_num` was 5, meaning that regardless of the
concurrency setting, a maximum of only 5 threads could execute
concurrently.
On the other hand, not only `Export` uses `Job schedule`, but other
scheduled tasks might also use `Job schedule`, leading to a shortage of
thread resources
Now, we have found that in many scenarios, `Export` needs to be set to a
high concurrency value and run concurrently according to that high
value. Clearly, `async_task_consumer_thread_num = 5` is no longer
sufficient, so we have changed the default value of
`async_task_consumer_thread_num` to 64
Issue Number: close#35024
This bug is because the fe incorrectly sets the update time of paimon
catalog, causing the be to be unable to update paimon's schema in time.
```c++
private void initTable() {
PaimonTableCacheKey key = new PaimonTableCacheKey(ctlId, dbId, tblId, paimonOptionParams, dbName, tblName);
TableExt tableExt = PaimonTableCache.getTable(key);
if (tableExt.getCreateTime() < lastUpdateTime) {
LOG.warn("invalidate cache table:{}, localTime:{}, remoteTime:{}", key, tableExt.getCreateTime(),
lastUpdateTime);
PaimonTableCache.invalidateTableCache(key);
tableExt = PaimonTableCache.getTable(key);
}
this.table = tableExt.getTable();
paimonAllFieldNames = PaimonScannerUtils.fieldNames(this.table.rowType());
if (LOG.isDebugEnabled()) {
LOG.debug("paimonAllFieldNames:{}", paimonAllFieldNames);
}
}
```
eliminate empty relations for following patterns:
topn->empty
sort->empty
distribute->empty
project->empty
(cherry picked from commit 8340f23946c0c8e40510ce937acd3342cb2e28b7)
## Proposed changes
Issue Number: close #xxx
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
## Proposed changes
Linked PR : #35389
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
backport https://github.com/apache/doris/pull/34672
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
backport https://github.com/apache/doris/pull/33836
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
pick from master #35463
commit id 0632309209cc3f9b6523ef7054eb1abdb9d0e7d8
when consumer side eliminate some consumers from plan, the size of
consumers is wrong. so we cannot push down some filter in producer side.
this PR fix this problem by update consumer set after rewrite outer side
backport https://github.com/apache/doris/pull/34433
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
Add id to statistics map in statement context for cost estimation later
this helps to improve the probability to use materialized view when
query a single table with aggregate and many filter
example:
filter (y=1)
+-- window( ... partition by x)
+-- project( A as x, A as y)
filter(y=1) is equivalent to filter(x=1),
because x and y are in the same equal-set in window#logicalProperties.
And hence we could push filter(y=1) through window operator
bitmap filter is implemented before mark-join. When support mark-join, we forgot to update the bitmap-filter branch.
when convert a bitmap-apply-in to join, we should set markjoinReference to the join if there are markJoinRefereneces
introduced by #31811
sql like this:
select col1, col2 from (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;
Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:
LogicalAggregate
+-- LogicalPrject
+-- LogicalAggregate
After Transformation:
LogicalProject
+-- LogicalAggregate
Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.
Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.
Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
Previously, the limitation on whether operations can be performed on materialized views was to determine `opType`.
Now, a `allowOpMTMV()` method is implemented through various `clauses`.
Because some operations have the same `opType`, but some operations allow and some do not.
For example, the `opType` for both `add column` and `create index` is `SCHEMA-CHANGE`, but `add column` is not allowed and `create index` is allowed.
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.
The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.
commitid: 806e241
pr: #34768
Table id may be the same but actually they are different tables. so we optimize the
org.apache.doris.nereids.rules.exploration.mv.mapping.RelationMapping#getTableQualifier with following code:
Objects.hash(table.getDatabase().getCatalog().getId(), table.getDatabase().getId(), table.getId())
table id is long, we identify the table used in mv rewrite is bitSet. the bitSet can only use int, so we mapping the long id to init id in every query when mv rewrite
if query and mv def is as following:
def mv1_1 = """
select t1.L_LINENUMBER,t2.l_extendedprice, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
def query1_1 = """
select t1.L_LINENUMBER, t2.L_ORDERKEY
from lineitem t1
inner join lineitem t2 on t1.L_ORDERKEY = t2.L_ORDERKEY;
"""
this will generate relation mapping by Cartesian, if the num of self join is too much, this will cause the performance problem
so we add `materialized_view_relation_mapping_max_count` session varaible, default 8. if actual num is greater than the value, the excess relation mapping is discarded.
* [Feat](nereids) when dealing insert into stmt with empty table source, fe returns directly (#34418)
When a LogicalOlapScan has no partitions, transform it to a LogicalEmptyRelation.
When dealing insert into stmt with empty table source, fe returns directly.
* [Fix](nereids) fix when insert into select empty table
---------
Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
The output slot's nullable info is not correctly calculated in union node.
Because old code only get correct result if union node has children.
But the union node may have no children but only have constantExprList.
So in that case, we should calculate output's nullable info byboth children and constantExprList.
agg_state is agg intermediate state, detail see
state combinator: https://doris.apache.org/zh-CN/docs/dev/sql-manual/sql-functions/combinators/state
this support agg function roll up as following
+---------------------+---------------------------------------------+---------------------+
| query | materialized view | roll up |
| ------------------- | ------------------------------------------- | ------------------- |
| agg_funtion() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_merge() |
| agg_funtion_unoin() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_union() |
| agg_funtion_merge() | agg_funtion_unoin() or agg_funtion_state() | agg_funtion_merge() |
+---------------------+---------------------------------------------+---------------------+
for example which can be rewritten by mv sucessfully as following
MV defination is
```
select
o_orderstatus,
l_partkey,
l_suppkey,
sum_union(sum_state(o_shippriority)),
group_concat_union(group_concat_state(l_shipinstruct)),
avg_union(avg_state(l_linenumber)),
max_by_union(max_by_state(l_shipmode, l_suppkey)),
count_union(count_state(l_orderkey)),
multi_distinct_count_union(multi_distinct_count_state(l_shipmode))
from lineitem
left join orders
on lineitem.l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_partkey,
l_suppkey;
```
Query is
```
select
o_orderstatus,
l_suppkey,
sum(o_shippriority),
group_concat(l_shipinstruct),
avg(l_linenumber),
max_by(l_shipmode,l_suppkey),
count(l_orderkey),
multi_distinct_count(l_shipmode)
from lineitem
left join orders
on l_orderkey = o_orderkey and l_shipdate = o_orderdate
group by
o_orderstatus,
l_suppkey;
```
support count(*) used for window function
CREATE TABLE `t1` (
`id` INT NULL,
`dt` TEXT NULL
)
DISTRIBUTED BY HASH(`id`) BUCKETS 10
PROPERTIES (
"replication_allocation" = "tag.location.default: 1"
);
select *, count(*) over() from t1;