cherry-pick: #39723#41164#41331#41546 because later problem is intro by prev one, so put them together
when using fold constant by be,
the return type of substring('123456',1, 3) would changed to be text, which we want it to be 3 remove windowframe in window expression to avoid folding constant on be
## Proposed changes
upgrade commons-configuration2 to 2.11.0
upgrade logging-interceptor to 4.12.0
upgrade commons-compress to 1.27.1
upgrade jetty-bom to 9.4.56.v20240826
upgrade azure-sdk to 1.2.27
Iceberg depends on configuration2, and configuration2 relies on a newer
version of commons-lang3. However, there were significant breaking
changes in commons-lang3, which made it
incompatible.https://issues.apache.org/jira/browse/LANG-1705 As a
result, I rewrote the clone method.
(cherry picked from commit 945edf8dbffaa25c987bcefad59b6cde52772d4f)
## Proposed changes
Issue Number: close #xxx
<!--Describe your changes.-->
pick: #35925#39715#40167#40958
Add feat of force use/nouse cbo rule hint and fix pr
introduce
when not using this hint, cbo rules like INFER_SET_OPERATOR_DISTINCT
would generate two plans and compare their cost
and nereids optimizer would decide which is better. But when we want to
control the behavior of cbo rules we could use this force cbo rule hint
usage example
explain shape plan
select /*+ USE_CBO_RULE(INFER_SET_OPERATOR_DISTINCT) */
*
from t1
union
select * from t2;
the USE_CBO_RULE(INFER_SET_OPERATOR_DISTINCT) hint would force rule
INFER_SET_OPERATOR_DISTINCT to be used
and generate plan like, which hashAgg below union is generated by this
rule:
-- !with_hint_union_distinct --
----hashAgg[GLOBAL]
--------hashAgg[LOCAL]
----------PhysicalUnion
--------------hashAgg[LOCAL]
----------------PhysicalOlapScan[t1]
--------------hashAgg[LOCAL]
----------------PhysicalOlapScan[t2]
Hint log:
Used: INFER_SET_OPERATOR_DISTINCT
UnUsed:
SyntaxError:
When we want to force disable this rule, we could use
explain shape plan select /*+
NO_USE_CBO_RULE(INFER_SET_OPERATOR_DISTINCT) */ * from t1 union select *
from t2;
which would generate plan with this rule:
-- !with_hint_no_union_distinct --
----hashAgg[GLOBAL]
--------hashAgg[LOCAL]
----------PhysicalUnion
--------------PhysicalOlapScan[t1]
--------------PhysicalOlapScan[t2]
Hint log:
Used: NO_INFER_SET_OPERATOR_DISTINCT
UnUsed:
SyntaxError:
change sessionvariable enableNereidsRules to varType.remove
## Proposed changes
The JOB's execution SQL is currently defined by an older CUP file, which
causes some issues with lexical analysis in the new optimizer as it
doesn't pass under the old optimizer. Since the JOB's underlying
execution already uses the new optimizer, we're planning to fully
migrate to ANTLR4 for consistency.
(cherry picked from commit 334b473deb5ff2e5c29c5eedcfac95dd806ae622)
#41391
pick from master #41719
just like previous PR #41548
this PR process union node to ensure not require any column from its
children when it is required by its parent with empty slot set
bp #41956
This PR #40225 try to pass time zone info from BE to JNI, and it use
`_state->timezone_obj().name()`
to get the timezone name.
But when we do some rolling upgrade of BE, it may coredump like:
```
*** SIGSEGV address not mapped to object (@0x610) received by PID 72661 (TID 73538 OR 0x7f2e898d1640) from PID 1552; stack trace: ***
0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/common/signal_handler.h:421
1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
2# JVM_handle_linux_signal in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
4# 0x00007F3070D3E520 in /lib/x86_64-linux-gnu/libc.so.6
5# cctz::time_zone::name[abi:cxx11]() const in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
6# doris::vectorized::JniConnector::open(doris::RuntimeState*, doris::RuntimeProfile*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/exec/jni_connector.cpp:87
7# doris::vectorized::AvroJNIReader::init_fetch_table_schema_reader() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/exec/format/avro/avro_jni_reader.cpp:119
8# std::_Function_handler::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
9# doris::WorkThreadPool::work_thread(int) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/work_thread_pool.hpp:159
10# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
11# start_thread at ./nptl/pthread_create.c:442
12# 0x00007F3070E22850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
172.20.50.206 last coredump sql: 2024-10-13 04:12:23,985 [query]
```
This PR use another method: `_state->timezone()`, which just return a
string, instead of reading and initializing
time zone info file, to avoid potential coredump.
bp #41659
## Proposed changes
Because ExternalTable will initialize the previously uninitialized table
when `getCachedRowCount()`, which is unnecessary. So for the
uninitialized table, we directly return -1.
This will increase the speed of our query `information_schema.tables`.
- Insert overwrite on NEREIDS can automatically clean up garbage
temporary partitions after restart, which is not available on old
optimizers
- When insert fails, no longer throw nereids exceptions