1. get mv statistics, the relation id should be getted from scan plan,
fix it.
2. optimize the `isGroupEquals` method, add `MaterializationContext`
param which maybe used to control the decide group by equals logic.
3. remove `set enable_nereids_timeout = false;` setting in mv rewrite
regression test.
this pr improves #34542, when the real data type is date-like type.
Some users are likely to define date(datetime) column as Varchar type.
when estimating the selectivity of predicate like A>'2020-01-01', if
nereids regards A and '2020-01-01' as date type, the sel is more
accurate than that as string type.
Add synchronized for analysis related maps to avoid
ConcurrentModificationException. For example, modify the map while
writing image will throw ConcurrentModificationException.
```
Caused by: java.lang.NullPointerException
at java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936) ~[?:1.8.0_301]
at org.apache.doris.catalog.ResourceMgr.getResource(ResourceMgr.java:166) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.CatalogProperty.catalogResource(CatalogProperty.java:67) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.CatalogProperty.getOrDefault(CatalogProperty.java:77) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.ExternalCatalog.setDefaultPropsIfMissing(ExternalCatalog.java:173) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.hive.HMSExternalCatalog.setDefaultPropsIfMissing(HMSExternalCatalog.java:238) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.datasource.ExternalCatalog.gsonPostProcess(ExternalCatalog.java:687) ~[doris-fe.jar:1.2-SNAPSHOT]
at org.apache.doris.persist.gson.GsonUtils$PostProcessTypeAdapterFactory$1.read(GsonUtils.java:640) ~[doris-fe.jar:1.2-SNAPSHOT]
at com.google.gson.TypeAdapter.fromJsonTree(TypeAdapter.java:299) ~[gson-2.10.1.jar:?]
at org.apache.doris.persist.gson.RuntimeTypeAdapterFactory$1.read(RuntimeTypeAdapterFactory.java:289) ~[doris-fe.jar:1.2-SNAPSHOT]
```
Introduced from #33610.
When read meta image, the `resource` maybe null, we should ignore it.
## Proposed changes
1. Add a new global variable: `audit_plugin_load_timeout`, default is
600 sec
Avoid using default stream load, which is 4 hours, too long for audit
log load.
2. Modify the label of audit log load
Add millisecond to avoid same label from different FE.
The backup finished state in the local repo is not logged. So if there
exist multiple local repo backups, and FE was restarted, then these
backup records will be reset to the UPLOAD_INFO state.
bug introduced by #18747
getTableInMinidumpCache use wrong way to compare table's qualified name.
we remove it temporary since it not use in productive env anymore
after we refactor set oepration, let it has projection ability, its
output size may diff with its child's output size. so we should use it
output size as nullable flag list size to ensure it has same size with
set operation's output.
Fix getting related partition table wrongly when multi base partition
table exists
such as base table def is as following:
CREATE TABLE `test1` (
`pre_batch_no` VARCHAR(100) NULL COMMENT 'pre_batch_no',
`batch_no` VARCHAR(100) NULL COMMENT 'batch_no',
`vin_type1` VARCHAR(50) NULL COMMENT 'vin',
`upgrade_day` date COMMENT 'upgrade_day'
) ENGINE=OLAP
unique KEY(`pre_batch_no`,`batch_no`, `vin_type1`, `upgrade_day`)
COMMENT 'OLAP'
PARTITION BY RANGE(`upgrade_day`)
(
FROM ("2024-03-20") TO ("2024-03-31") INTERVAL 1 DAY
)
DISTRIBUTED BY HASH(`vin_type1`) BUCKETS 10
PROPERTIES (
"replication_num" = "1"
);
CREATE TABLE `test2` (
`batch_no` VARCHAR(100) NULL COMMENT 'batch_no',
`vin_type2` VARCHAR(50) NULL COMMENT 'vin',
`status` VARCHAR(50) COMMENT 'status',
`upgrade_day` date not null COMMENT 'upgrade_day'
) ENGINE=OLAP
Duplicate KEY(`batch_no`,`vin_type2`)
COMMENT 'OLAP'
PARTITION BY RANGE(`upgrade_day`)
(
FROM ("2024-01-01") TO ("2024-01-10") INTERVAL 1 DAY
)
DISTRIBUTED BY HASH(`vin_type2`) BUCKETS 10
PROPERTIES (
"replication_num" = "1"
);
if you create partition mv which partition by ` t1.upgrade_day` as
following it will be successful
select
t1.upgrade_day,
t1.batch_no,
t1.vin_type1
from
(
SELECT
batch_no,
vin_type1,
upgrade_day
FROM
test1
where
batch_no like 'c%'
group by
batch_no,
vin_type1,
upgrade_day
) t1
left join (
select
batch_no,
vin_type2,
status
from
test2
group by
batch_no,
vin_type2,
status
) t2 on t1.vin_type1 = t2.vin_type2;
this is brought by https://github.com/apache/doris/pull/33800
if mv is partitioned materialzied view,
the data will be wrong by using the hited materialized view when the
paritions in related base partiton table are deleted, created and so on.
this fix the problem.
if **SET enable_materialized_view_union_rewrite=true;** this will use
the materializd view and make sure the data is corrent
if **SET enable_materialized_view_union_rewrite=false;** this will query
base table directly to make sure the data is right
pick from master #35504
if window expression is not required by its parent, we should prune this
column. If all window expressions of window operator are pruned, we
remove this window operator directly.
#31442
test in #34929
When null value is used as the partition value, BE will return the "null" string, so this string needs to be processed specially.
Issue Number: close#35024
This bug is because the fe incorrectly sets the update time of paimon
catalog, causing the be to be unable to update paimon's schema in time.
```c++
private void initTable() {
PaimonTableCacheKey key = new PaimonTableCacheKey(ctlId, dbId, tblId, paimonOptionParams, dbName, tblName);
TableExt tableExt = PaimonTableCache.getTable(key);
if (tableExt.getCreateTime() < lastUpdateTime) {
LOG.warn("invalidate cache table:{}, localTime:{}, remoteTime:{}", key, tableExt.getCreateTime(),
lastUpdateTime);
PaimonTableCache.invalidateTableCache(key);
tableExt = PaimonTableCache.getTable(key);
}
this.table = tableExt.getTable();
paimonAllFieldNames = PaimonScannerUtils.fieldNames(this.table.rowType());
if (LOG.isDebugEnabled()) {
LOG.debug("paimonAllFieldNames:{}", paimonAllFieldNames);
}
}
```
eliminate empty relations for following patterns:
topn->empty
sort->empty
distribute->empty
project->empty
(cherry picked from commit 8340f23946c0c8e40510ce937acd3342cb2e28b7)
## Proposed changes
Issue Number: close #xxx
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
## Proposed changes
Linked PR : #35389
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
backport https://github.com/apache/doris/pull/34672
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
backport https://github.com/apache/doris/pull/33836
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
pick from master #35463
commit id 0632309209cc3f9b6523ef7054eb1abdb9d0e7d8
when consumer side eliminate some consumers from plan, the size of
consumers is wrong. so we cannot push down some filter in producer side.
this PR fix this problem by update consumer set after rewrite outer side
backport https://github.com/apache/doris/pull/34433
<!--Describe your changes.-->
## Further comments
If this is a relatively large or complex change, kick off the discussion
at [dev@doris.apache.org](mailto:dev@doris.apache.org) by explaining why
you chose the solution you did and what alternatives you considered,
etc...
Add id to statistics map in statement context for cost estimation later
this helps to improve the probability to use materialized view when
query a single table with aggregate and many filter
example:
filter (y=1)
+-- window( ... partition by x)
+-- project( A as x, A as y)
filter(y=1) is equivalent to filter(x=1),
because x and y are in the same equal-set in window#logicalProperties.
And hence we could push filter(y=1) through window operator
bitmap filter is implemented before mark-join. When support mark-join, we forgot to update the bitmap-filter branch.
when convert a bitmap-apply-in to join, we should set markjoinReference to the join if there are markJoinRefereneces
introduced by #31811
sql like this:
select col1, col2 from (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;
Transformation Description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern:
Before Transformation:
LogicalAggregate
+-- LogicalPrject
+-- LogicalAggregate
After Transformation:
LogicalProject
+-- LogicalAggregate
Before the transformation, the projection in the LogicalProject was a AS col1, a AS col2, and the outer aggregate group by keys were col1, col2. After the transformation, the aggregate group by keys became a, a, and the projection remained a AS col1, a AS col2.
Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.
Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
Previously, the limitation on whether operations can be performed on materialized views was to determine `opType`.
Now, a `allowOpMTMV()` method is implemented through various `clauses`.
Because some operations have the same `opType`, but some operations allow and some do not.
For example, the `opType` for both `add column` and `create index` is `SCHEMA-CHANGE`, but `add column` is not allowed and `create index` is allowed.
In this pull request, we optimize the ordering of children in the Intersect operator to improve query performance. The proposed change is to place the child with the least row count in the first position of the Intersect operator.
The rationale behind this optimization is that the Intersect operator works by first evaluating the leftmost child and then iterating through the results of the other children to find matching rows. By placing the child with the least row count first, we can minimize the number of iterations required to find the matching rows, thereby reducing the overall execution time of the query.