#31442
Test in #34929.
When a null value is used as the partition value, BE returns the string "null", so this string needs special handling.
## Proposed changes
pick from master [#34276](https://github.com/apache/doris/pull/34276)
- Add 2 new BE configs: `s3_read_base_wait_time_ms` and `s3_read_max_wait_time_ms`.
  When an S3 GET request hits a 429 (Too Many Requests) error, the request sleeps for `s3_read_base_wait_time_ms` multiplied by the attempt number (×1, ×2, ×3, ×4, ...) and is retried. The sleep time is capped at `s3_read_max_wait_time_ms`, and the number of retries is capped at `max_s3_client_retry` (see the sketch after this list).
- Add more metrics for the S3 file reader:
  - `s3_file_reader_too_many_request`: counter of 429 errors.
  - `s3_file_reader_s3_get_request`: QPS of S3 GET requests.
  - `TotalGetRequest`: GET request counter in the profile.
  - `TooManyRequestErr`: 429 error counter in the profile.
  - `TooManyRequestSleepTime`: sum of sleep time after 429 errors in the profile.
  - `TotalBytesRead`: total bytes read from S3 in the profile.
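
For illustration, a minimal sketch of the backoff described above as a standalone C++ function; the config defaults, the `do_get` callable, and the function name are assumptions, not the actual Doris `S3FileReader` code:

```cpp
#include <algorithm>
#include <chrono>
#include <cstdint>
#include <functional>
#include <thread>

// Stand-ins for the BE configs named above; real default values may differ.
static int64_t s3_read_base_wait_time_ms = 100;
static int64_t s3_read_max_wait_time_ms = 800;
static int max_s3_client_retry = 10;

// do_get is a hypothetical callable that performs one GET and returns its HTTP status.
// Returns true if the GET succeeded within the retry budget.
bool get_with_backoff(const std::function<int()>& do_get) {
    for (int attempt = 1; attempt <= max_s3_client_retry; ++attempt) {
        int http_code = do_get();
        if (http_code != 429) {
            return http_code == 200;
        }
        // On 429, sleep base * attempt (x1, x2, x3, ...), capped at the max wait time, then retry.
        int64_t sleep_ms = std::min(s3_read_base_wait_time_ms * attempt,
                                    s3_read_max_wait_time_ms);
        std::this_thread::sleep_for(std::chrono::milliseconds(sleep_ms));
    }
    return false;
}
```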
Do not process the runtime filter (rf) in `HashJoinBuildSinkLocalState::close` when the query has failed; the crash below was observed:
```cpp
*** Query id: ee97f0c64a76436b-babc251c7d6702fb ***
*** is nereids: 1 ***
*** tablet id: 0 ***
*** Aborted at 1716780426 (unix time) try "date -d @1716780426" if you are using GNU date ***
*** Current BE git commitID: 813074b ***
*** SIGSEGV address not mapped to object (@0x0) received by PID 12924 (TID 15847 OR 0x7efbe5aa5700) from PID 0; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris/be/src/common/signal_handler.h:421
1# PosixSignals::chained_handler(int, siginfo_t*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
3# 0x00007F064FF1C090 in /lib/x86_64-linux-gnu/libc.so.6
4# doris::BloomFilterFuncBase::merge(doris::BloomFilterFuncBase*) at /root/doris/be/src/exprs/bloom_filter_func.h:169
5# doris::RuntimePredicateWrapper::merge(doris::RuntimePredicateWrapper const*) at /root/doris/be/src/exprs/runtime_filter.cpp:507
6# doris::IRuntimeFilter::merge_from(doris::RuntimePredicateWrapper const*) at /root/doris/be/src/exprs/runtime_filter.cpp:1497
7# doris::IRuntimeFilter::publish(bool)::$_2::operator()() const in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
8# doris::IRuntimeFilter::publish(bool) at /root/doris/be/src/exprs/runtime_filter.cpp:1015
9# doris::VRuntimeFilterSlots::publish(bool) at /root/doris/be/src/exprs/runtime_filter_slots.h:137
10# doris::pipeline::HashJoinBuildSinkLocalState::close(doris::RuntimeState*, doris::Status) in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
11# doris::pipeline::DataSinkOperatorXBase::close(doris::RuntimeState*, doris::Status) at /root/doris/be/src/pipeline/exec/operator.h:491
12# doris::pipeline::PipelineTask::close(doris::Status) at /root/doris/be/src/pipeline/pipeline_task.cpp:436
13# doris::pipeline::_close_task(doris::pipeline::PipelineTask*, doris::Status) at /root/doris/be/src/pipeline/task_scheduler.cpp:88
14# doris::pipeline::TaskScheduler::_do_work(unsigned long) in /home/work/unlimit_teamcity/TeamCity/Agents/20240527104837agent_172.16.0.93_1/work/60183217f6ee2a9c/output/be/lib/doris_be
15# doris::ThreadPool::dispatch_thread() at /root/doris/be/src/util/threadpool.cpp:551
16# doris::Thread::supervise_thread(void*) at /root/doris/be/src/util/thread.cpp:499
17# start_thread at /build/glibc-SzIz7B/glibc-2.31/nptl/pthread_create.c:478
18# __clone at ../sysdeps/unix/sysv/linux/x86_64/clone.S:97
```
When `Export` statements are executed concurrently, the background uses `Job schedule` to manage the export tasks. Previously, the default value of `async_task_consumer_thread_num` was 5, meaning that regardless of the concurrency setting, at most 5 threads could execute concurrently.

Moreover, `Export` is not the only user of `Job schedule`; other scheduled tasks may use it as well, which can lead to a shortage of thread resources.

We have found that in many scenarios `Export` needs to be set to, and actually run at, a high concurrency. Clearly `async_task_consumer_thread_num = 5` is no longer sufficient, so we have changed the default value of `async_task_consumer_thread_num` to 64.
Issue Number: close #35024
This bug occurs because the FE incorrectly sets the update time of the Paimon catalog, so the BE cannot refresh Paimon's schema in time.
```java
private void initTable() {
    PaimonTableCacheKey key = new PaimonTableCacheKey(ctlId, dbId, tblId, paimonOptionParams, dbName, tblName);
    TableExt tableExt = PaimonTableCache.getTable(key);
    if (tableExt.getCreateTime() < lastUpdateTime) {
        LOG.warn("invalidate cache table:{}, localTime:{}, remoteTime:{}", key, tableExt.getCreateTime(),
                lastUpdateTime);
        PaimonTableCache.invalidateTableCache(key);
        tableExt = PaimonTableCache.getTable(key);
    }
    this.table = tableExt.getTable();
    paimonAllFieldNames = PaimonScannerUtils.fieldNames(this.table.rowType());
    if (LOG.isDebugEnabled()) {
        LOG.debug("paimonAllFieldNames:{}", paimonAllFieldNames);
    }
}
```
The file list is obtained from the external meta cache, and a file may already have been removed from storage.
We should ignore files that are not found and let the query continue.
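A minimal, self-contained sketch of that behavior (a standalone illustration only; the real Doris scanner works against remote storage, not `std::filesystem`):

```cpp
#include <filesystem>
#include <iostream>
#include <string>
#include <vector>

// Given a possibly stale cached file list, skip files that have already been
// removed from storage instead of failing the whole query.
std::vector<std::string> keep_existing_files(const std::vector<std::string>& cached_files) {
    std::vector<std::string> readable;
    for (const auto& path : cached_files) {
        if (!std::filesystem::exists(path)) {
            std::cerr << "file " << path << " not found, ignore it and continue\n";
            continue;
        }
        readable.push_back(path);
    }
    return readable;
}
```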
## Proposed changes
Error message before:
```
Failed to submit scanner to scanner pool
```
Error message after:
```
Failed to submit scanner to scanner pool reason:Scan thread pool had shutdown|type 1
```
Eliminate empty relations for the following patterns:
- topn -> empty
- sort -> empty
- distribute -> empty
- project -> empty

(cherry picked from commit 8340f23946c0c8e40510ce937acd3342cb2e28b7)
## Proposed changes
backport #35347
## Proposed changes
Linked PR: #35389
pick from master #35445
backport https://github.com/apache/doris/pull/34672
backport https://github.com/apache/doris/pull/33836
pick from master #35463
commit id 0632309209cc3f9b6523ef7054eb1abdb9d0e7d8
When the consumer side eliminates some consumers from the plan, the recorded set of consumers becomes wrong, so some filters cannot be pushed down on the producer side. This PR fixes the problem by updating the consumer set after the outer side is rewritten.
backport https://github.com/apache/doris/pull/34433
1. In the past, if the error code was not OK and the status was then fetched, the status might still be OK, so some DCHECKs could fail.
This PR uses a std::mutex to make this behavior stable.
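A minimal sketch of the idea, assuming a simplified holder class (not the actual Doris code): both the error code and the status message are updated and read under one `std::mutex`, so a reader can never see a non-OK code paired with an OK status.

```cpp
#include <mutex>
#include <string>
#include <utility>

class QueryStatusHolder {
public:
    // Record the first error; code and message are set atomically.
    void set_error(int code, std::string msg) {
        std::lock_guard<std::mutex> lock(_mutex);
        if (_code != 0) {
            return;  // keep the first error
        }
        _code = code;
        _msg = std::move(msg);
    }

    // Read code and message together, consistent with set_error().
    std::pair<int, std::string> get() const {
        std::lock_guard<std::mutex> lock(_mutex);
        return {_code, _msg};
    }

private:
    mutable std::mutex _mutex;
    int _code = 0;  // 0 means OK
    std::string _msg;
};
```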
---------
Co-authored-by: yiguolei <yiguolei@gmail.com>
Add the id to the statistics map in the statement context for later cost estimation.
This helps improve the probability of using a materialized view when querying a single table with an aggregate and many filters.
Example:
filter(y=1)
+-- window(... partition by x)
    +-- project(A as x, A as y)

filter(y=1) is equivalent to filter(x=1) because x and y are in the same equal-set in window#logicalProperties, and hence we can push filter(y=1) through the window operator.
## Proposed changes
Some operators have a limit condition; the source operator should notify the sink operator when the limit is reached. Although the FE has limit logic, it does not always send it.
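A hedged sketch of that notification path (the struct and function names here are illustrative, not the real pipeline operator API): the source side counts the rows it has produced and, once the limit is reached, sets a flag that the sink side checks so it can finish early even without an explicit signal from the FE.

```cpp
#include <atomic>
#include <cstdint>

// Shared between the source and sink sides of an operator pair.
struct SharedLimitState {
    int64_t limit = -1;                     // -1 means no limit
    std::atomic<int64_t> rows_returned{0};
    std::atomic<bool> reached_limit{false};
};

// Called by the source side after it emits a block of `rows` rows.
inline void on_rows_returned(SharedLimitState& state, int64_t rows) {
    if (state.limit < 0) {
        return;
    }
    if (state.rows_returned.fetch_add(rows) + rows >= state.limit) {
        state.reached_limit.store(true);    // the sink side checks this and stops early
    }
}
```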
The bitmap filter was implemented before mark-join, and when mark-join support was added we forgot to update the bitmap-filter branch: when converting a bitmap apply-in to a join, we should set the mark-join reference on the join if there are markJoinReferences.
Introduced by #31811.
SQL like this:
select col1, col2 from (select a as col1, a as col2 from mal_test1 group by a) t group by col1, col2 ;
Transformation description:
In the process of optimizing the query, an agg-project-agg pattern is transformed into a project-agg pattern.
Before the transformation:
LogicalAggregate
+-- LogicalProject
    +-- LogicalAggregate
After the transformation:
LogicalProject
+-- LogicalAggregate
Before the transformation, the projection in the LogicalProject was `a AS col1, a AS col2`, and the outer aggregate group by keys were `col1, col2`. After the transformation, the aggregate group by keys became `a, a`, and the projection remained `a AS col1, a AS col2`.
Problem:
When building the project projections, the group by key a, a needed to be transformed to a AS col1, a AS col2. The old code had a bug where it used the slot as the map key and the alias in the projections as the map value. This approach did not account for the situation where aliases might have the same slot.
Solution:
The new code fixes this issue by using the original outer aggregate group by expression's exprId. It searches within the original project projections to find the NamedExpression that has the same exprId. These expressions are then placed into the new projections. This method ensures that the correct aliases are maintained, resolving the bug.
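A small standalone C++ illustration of the mapping problem described above (the actual fix lives in the Nereids Java planner; the names here are made up): keying by the underlying slot collapses `a AS col1` and `a AS col2` into one entry, while keying by each expression's exprId keeps both.

```cpp
#include <cstdint>
#include <iostream>
#include <string>
#include <unordered_map>
#include <vector>

// A stand-in for a NamedExpression such as `a AS col1`.
struct Alias {
    int64_t exprId;    // unique per NamedExpression
    std::string slot;  // underlying slot, e.g. "a"
    std::string name;  // output name, e.g. "col1"
};

int main() {
    std::vector<Alias> projections = {{1, "a", "col1"}, {2, "a", "col2"}};

    // Buggy approach: slot as the map key -> the second alias overwrites the first.
    std::unordered_map<std::string, Alias> by_slot;
    for (const auto& p : projections) by_slot[p.slot] = p;
    std::cout << "keyed by slot: " << by_slot.size() << " entry\n";        // prints 1

    // Fixed approach: exprId as the key -> both aliases are preserved.
    std::unordered_map<int64_t, Alias> by_expr_id;
    for (const auto& p : projections) by_expr_id[p.exprId] = p;
    std::cout << "keyed by exprId: " << by_expr_id.size() << " entries\n"; // prints 2
    return 0;
}
```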