Commit Graph

19361 Commits

Author SHA1 Message Date
de61887cdc [chore](log) reduce print warning msg during be starting up #36710 (#37780)
cherry pick from #36710
2024-07-15 14:46:54 +08:00
7bd6818350 [branch-2.1][improvement](jdbc catalog) Added support for Oracle Raw type (#37776)
pick (#37078)
In previous versions, we adopted the strategy of reading the object
address for Oracle's raw type, which would lead to unstable and
meaningless results. Here I changed it to read hexadecimal or UTF8
2024-07-15 14:43:05 +08:00
79f6b647d5 [FIX] should check fe host standing when coordinator is not found. (#37772)
fix https://github.com/apache/doris/pull/37707
2024-07-15 12:27:31 +08:00
232202b71f [improve](load) reduce memory reserved in memtable limiter (#37511) (#37699)
cherry-pick #37511
2024-07-15 11:09:09 +08:00
2759383365 [branch-2.1](timezone) refactor tzdata load to accelerate and unify timezone parsing (#37062) (#37269)
pick https://github.com/apache/doris/pull/37062

1. revert https://github.com/apache/doris/pull/25097. We decided to rely
on the OS and no longer maintain an independent tzdata, to keep results
consistent.
2. refactor timezone load. removed rwlock.

before:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (6.88 sec)
```
now:
```sql
mysql [optest]>select count(convert_tz(d, 'Asia/Shanghai', 'America/Los_Angeles')), count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) from dates;
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
| count(convert_tz(cast(d as DATETIMEV2(6)), 'Asia/Shanghai', 'America/Los_Angeles')) | count(convert_tz(dt, 'America/Los_Angeles', '+00:00')) |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
|                                                                            16000000 |                                               16000000 |
+-------------------------------------------------------------------------------------+--------------------------------------------------------+
1 row in set (2.61 sec)
```
3. timezone offset format strings like 'UTC+8' are no longer supported, as
already stated in
https://doris.apache.org/docs/dev/query/query-variables/time-zone/#usage
4. support case-insensitive timezone parsing in nereids.
5. fix a bug when parsing timezones in nereids: DST should be checked
against the input time, but was wrongly checked against the current time.
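
Items 3 and 4 can be illustrated with a small sketch. It is not the Doris implementation: `normalize_timezone`, `is_offset_format` and the three-entry zone table are hypothetical names chosen for illustration, and a real system would build the table from the OS tzdata. It only shows one way to accept region names case-insensitively and `+HH:MM` offsets while rejecting strings such as 'UTC+8'.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <map>
#include <optional>
#include <string>

// Hypothetical helper: lower-case an ASCII string.
inline std::string to_lower_ascii(std::string s) {
    std::transform(s.begin(), s.end(), s.begin(),
                   [](unsigned char c) { return static_cast<char>(std::tolower(c)); });
    return s;
}

// True only for offsets written exactly as +HH:MM or -HH:MM.
inline bool is_offset_format(const std::string& s) {
    auto digit = [](char c) { return std::isdigit(static_cast<unsigned char>(c)) != 0; };
    return s.size() == 6 && (s[0] == '+' || s[0] == '-') && digit(s[1]) && digit(s[2]) &&
           s[3] == ':' && digit(s[4]) && digit(s[5]);
}

// Returns the canonical zone name, the offset unchanged, or nullopt if the
// input is neither (for example "UTC+8").
inline std::optional<std::string> normalize_timezone(const std::string& input) {
    // Illustrative table keyed by lower-cased name (an assumption of this sketch).
    static const std::map<std::string, std::string> kKnownZones = {
            {"asia/shanghai", "Asia/Shanghai"},
            {"america/los_angeles", "America/Los_Angeles"},
            {"utc", "UTC"},
    };
    if (is_offset_format(input)) return input;
    auto it = kKnownZones.find(to_lower_ascii(input));
    if (it == kKnownZones.end()) return std::nullopt;
    return it->second;
}
```

With this sketch, `normalize_timezone("asia/SHANGHAI")` yields `Asia/Shanghai`, `"+00:00"` passes through unchanged, and `"UTC+8"` is rejected.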

doc pr: https://github.com/apache/doris-website/pull/810
2024-07-15 10:56:48 +08:00
351ba4aeb2 [opt](spill) handle oom exception in spill tasks (#35025) (#35171) 2024-07-15 10:33:33 +08:00
31b3afa2c8 [fix](pipeline) fix exception safety issue in MultiCastDataStreamer (#36814)
## Proposed changes

pick #36748

```cpp
RETURN_IF_ERROR(vectorized::MutableBlock(block).merge(*pos_to_pull->_block))
```
This line may throw an exception (cannot allocate).

```
*** Query id: b7b80bfd76cc42a5-a9916f8364d5a4d3 ***
*** tablet id: 0 ***
*** Aborted at 1719187603 (unix time) try "date -d @1719187603" if you are using GNU date ***
*** Current BE git commitID: a8c48f5328 ***
*** SIGSEGV address not mapped to object (@0x47) received by PID 1197117 (TID 1197376 OR 0x7f49a25e4640) from PID 71; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /root/doris_branch-2.0/doris/be/src/common/signal_handler.h:417
 1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 4# 0x00007F4ABB927520 in /lib/x86_64-linux-gnu/libc.so.6
 5# std::default_delete<doris::vectorized::Block>::operator()(doris::vectorized::Block*) const at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/unique_ptr.h:85
 6# doris::pipeline::MultiCastDataStreamer::close_sender(int) at /root/doris_branch-2.0/doris/be/src/pipeline/exec/multi_cast_data_streamer.cpp:60
 7# doris::pipeline::MultiCastDataStreamerSourceOperator::close(doris::RuntimeState*) at /root/doris_branch-2.0/doris/be/src/pipeline/exec/multi_cast_data_stream_source.cpp:120
 8# doris::pipeline::PipelineTask::close() at /root/doris_branch-2.0/doris/be/src/pipeline/pipeline_task.cpp:334
 9# doris::pipeline::TaskScheduler::_try_close_task(doris::pipeline::PipelineTask*, doris::pipeline::PipelineTaskState) at /root/doris_branch-2.0/doris/be/src/pipeline/task_scheduler.cpp:353
10# doris::pipeline::TaskScheduler::_do_work(unsigned long) in /mnt/disk1/STRESS_ENV/be/lib/doris_be
11# doris::ThreadPool::dispatch_thread() in /mnt/disk1/STRESS_ENV/be/lib/doris_be
12# doris::Thread::supervise_thread(void*) at /root/doris_branch-2.0/doris/be/src/util/thread.cpp:499
13# start_thread at ./nptl/pthread_create.c:442
14# 0x00007F4ABBA0B850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
```
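
A minimal sketch of the general technique, assuming a simplified `Status` type and a hypothetical `catch_as_status` wrapper (neither is the Doris code): a call that may throw on allocation failure is wrapped so the exception becomes an error status and the caller's normal cleanup path still runs.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <new>
#include <string>
#include <utility>

// Hypothetical minimal status type standing in for an engine's error type.
struct Status {
    bool ok;
    std::string msg;
    static Status OK() { return Status{true, ""}; }
    static Status Error(std::string m) { return Status{false, std::move(m)}; }
};

// Runs `fn` and converts any standard exception into a Status, so the error
// is reported through the return value instead of unwinding past the caller.
inline Status catch_as_status(const std::function<Status()>& fn) {
    try {
        return fn();
    } catch (const std::bad_alloc&) {
        return Status::Error("memory allocation failed");
    } catch (const std::exception& e) {
        return Status::Error(e.what());
    }
}
```

A caller would pass the throwing operation as a lambda and check the returned `Status` before touching any partially built state.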

2024-07-15 10:32:20 +08:00
3da5b17abf [branch-2.1](timezone) make TimeUtils formatter use correct time_zone (#37465) (#37652)
All timestamp/datetime parsing in Doris is controlled by the session
variable `time_zone`.
Apply it also to interface of `TimeUtils` in FE.

pick https://github.com/apache/doris/pull/37465
2024-07-15 10:23:38 +08:00
9556c07a16 [mac](compile) fix compile error on mac (#37726) 2024-07-15 10:19:42 +08:00
8de13c5cc8 [fix](function) error scale set in unix_timestamp (#36110) (#37619)
## Proposed changes

```
mysql [test]>set DEBUG_SKIP_FOLD_CONSTANT = true;
Query OK, 0 rows affected (0.00 sec)

mysql [test]>select cast(unix_timestamp("2024-01-01",'yyyy-MM-dd') as bigint);
+------------------------------------------------------------+
| cast(unix_timestamp('2024-01-01', 'yyyy-MM-dd') as BIGINT) |
+------------------------------------------------------------+
|                                           1704038400000000 |
+------------------------------------------------------------+
```
now
```
mysql [test]>select cast(unix_timestamp("2024-01-01",'yyyy-MM-dd') as bigint);
+------------------------------------------------------------+
| cast(unix_timestamp('2024-01-01', 'yyyy-MM-dd') as BIGINT) |
+------------------------------------------------------------+
|                                                 1704038400 |
+------------------------------------------------------------+
1 row in set (0.01 sec)
```

The column does not have a scale set, but the cast uses the scale to
perform the cast.
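
One plausible reading of the two outputs above, sketched under the assumption that the value is held as a fixed-point integer `raw = value * 10^scale`: dividing by `10^6` gives seconds, while treating the scale as 0 leaves the microsecond-scaled number untouched. `fixed_point_to_bigint` and `pow10_i64` are hypothetical helpers, not Doris functions.

```cpp
#include <cassert>
#include <cstdint>

// Returns 10^scale for scales that fit in a signed 64-bit integer.
inline int64_t pow10_i64(int scale) {
    assert(scale >= 0 && scale <= 18);
    int64_t p = 1;
    for (int i = 0; i < scale; ++i) p *= 10;
    return p;
}

// Converts a fixed-point raw value to an integer by dropping `scale`
// fractional digits (truncating toward zero).
inline int64_t fixed_point_to_bigint(int64_t raw, int scale) {
    return raw / pow10_i64(scale);
}
```

With `raw = 1704038400000000`, a scale of 6 yields `1704038400`, and a scale of 0 yields the raw value unchanged, matching the two results shown.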

2024-07-15 10:00:04 +08:00
b55dd6f644 [fix](delete) fix the error message for valid decimal data for 2.1 (#37710)
## Proposed changes

cherry-pick : #36802
2024-07-15 09:54:42 +08:00
16de141743 [regression](kerberos)add hive kerberos docker regression env (#37657)
## Proposed changes
pick:
[regression](kerberos)fix regression pipeline env when write hosts (#37057)
[regression](kerberos)add hive kerberos docker regression env (#36430)
2024-07-15 09:35:39 +08:00
8f39143c14 [test](fix) replace hardcode s3BucketName (#37750)
## Proposed changes

pick from master #37739 

---------

Co-authored-by: stephen <hello-stephen@qq.com>
2024-07-14 18:38:52 +08:00
747172237a [branch-2.1](memory) Pick some memory GC patch (#37725)
pick
#36768
#37164
#37174
#37525
2024-07-14 15:19:40 +08:00
ec8467f57b [fix](auto bucket) Fix hit not support alter estimate_partition_size #33670 (#37633)
cherry pick from #33670
2024-07-13 22:12:38 +08:00
00a5718541 [fix](test) fix test typo (#37741) 2024-07-13 20:00:56 +08:00
5162789234 [Refactor](Variant) make many interfaces exception safe (#37640) (#37719) 2024-07-13 16:52:10 +08:00
8930df3b31 [Feature](iceberg-writer) Implements iceberg partition transform. (#37692)
## Proposed changes

Cherry-pick iceberg partition transform functionality. #36289 #36889

---------

Co-authored-by: kang <35803862+ghkang98@users.noreply.github.com>
Co-authored-by: lik40 <lik40@chinatelecom.cn>
Co-authored-by: github-actions[bot] <41898282+github-actions[bot]@users.noreply.github.com>
Co-authored-by: Mingyu Chen <morningman@163.com>
2024-07-13 16:07:50 +08:00
56a207c3f0 [case](paimon/iceberg)move cases from p2 to p0 (#37276) (#37738)
bp #37276

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-07-13 10:01:05 +08:00
d91376cd52 [bugfix](paimon)adding dependencies for clang #37512 (#37737)
cherry pick from #37512

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-07-13 09:59:35 +08:00
7887f51e9b [fix](partial update) fix a mem leak issue (#37706) (#37730)
cherry-pick #37706
2024-07-13 09:20:01 +08:00
20758576b2 [fix](split) remove retry when fetch split batch failed (#37637)
bp: #37636
2024-07-12 22:46:03 +08:00
019cd9b4ec [fix](hudi) return empty if there is no commit implemented (#37703)
bp: #37702
2024-07-12 22:44:58 +08:00
f2556ba182 [feature](insert)support external hive truncate table DDL (#37659)
pick: #36801
2024-07-12 22:37:47 +08:00
326b40cde2 [branch-2.1](memory) Add HTTP API to clear data cache (#37704)
pick #36599

Co-authored-by: Gabriel <gabrielleebuaa@gmail.com>
2024-07-12 17:21:52 +08:00
a61030215e [branch-2.1](memory) Support make all memory snapshots (#37705)
pick #36679
2024-07-12 16:21:37 +08:00
ab8204b33b [fix](regression) fix multi replica case not executed (#37425) (#37698)
cherry-pick #37425
2024-07-12 15:53:20 +08:00
035027f831 [fix](query cancel) Fix query is cancelled when it comes from follower FE #37662 (#37707)
cherry pick from #37662
2024-07-12 15:50:45 +08:00
259d28407e [improvement](statistics)Enable estimate hive table row count using file size. (#37218) (#37694)
backport: https://github.com/apache/doris/pull/37218
2024-07-12 13:47:27 +08:00
37583d2d0a [test](statistics)Add test case for set global variables. (#37582) (#37691)
backport: https://github.com/apache/doris/pull/37582
2024-07-12 13:46:53 +08:00
ef031c5fb2 [branch-2.1](memory) Fix reserve memory compatible with memory GC and logging (#37682)
pick
#36307
#36412
2024-07-12 11:43:26 +08:00
ffa9e49bc7 [feature](mtmv) pick some mtmv pr from master (#37651)
cherry-pick from master
pr: #36318
commitId: c1999479

pr: #36111
commitId: 35ebef62

pr: #36175
commitId: 4c8e66b4

pr: #36414
commitId: 5e009b5a

pr: #36770
commitId: 19e2126c

pr: #36567
commitId: 3da83514
2024-07-12 10:35:54 +08:00
6214d6421f [Fix](planner) fix bug of char(255) toSql (#37340) (#37671)
cherry-pick #37340 from master
2024-07-12 10:33:24 +08:00
4dc933bb28 [cherry-pick] (branch-2.1) fix query errors caused by ignore_above (#37685)
## Proposed changes
pick from master #37679
2024-07-12 09:31:45 +08:00
87912de93f [fix](scan) catch exceptions thrown in scanner (#36101) (#37408)
## Proposed changes

pick #36101

The uncaught exceptions thrown in the scanner will cause the BE to
crash.
2024-07-12 08:49:39 +08:00
79a208259e [cherry-pick] (branch-2.1) Remove the check for inverted index file exists #36945 (#37423) 2024-07-11 21:35:52 +08:00
217eac790b [pick](Variant) pick some refactor and fix #34925 #36317 #36201 #36793 (#37526) 2024-07-11 21:25:34 +08:00
cf2fb6945a [branch-2.1](memory) Refactor LRU cache policy memory tracking (#37658)
pick 
#36235
#35965
2024-07-11 21:04:01 +08:00
62e0230523 [branch-2.1](memory) Add ThreadMemTrackerMgr BE UT (#37654)
## Proposed changes

pick #35518
2024-07-11 21:03:49 +08:00
fed632bf4a [fix](move-memtable) check segment num when closing each tablet (#36753) (#37536)
cherry-pick #36753 and #37660
2024-07-11 20:33:44 +08:00
fdf21ec251 [fix](readconsistency) avoid table not exist error (#37593) (#37641)
A query issued right after creating a table could throw a table-not-exist error.

For example:
t1: the client issues a create table statement to the master FE
t2: the client issues a query to an observer FE; the query fails in the
plan phase because the table does not exist there yet
t3: the observer FE receives the edit log that creates the table from the master FE

After this PR:
the query at t2 waits until the observer FE has received the latest edit
log from the master FE.
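
The waiting behavior can be sketched as follows. This is an illustration only, not the FE implementation (which is not C++): `ReplayTracker`, `on_replayed` and `wait_until_replayed` are hypothetical names, and the sketch assumes edit-log entries carry a monotonically increasing id.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <cstdint>
#include <mutex>

// Tracks the highest edit-log id replayed locally and lets readers wait for it.
class ReplayTracker {
public:
    // Called by the replay thread after applying an edit-log entry.
    void on_replayed(int64_t journal_id) {
        {
            std::lock_guard<std::mutex> lock(mu_);
            if (journal_id > replayed_id_) replayed_id_ = journal_id;
        }
        cv_.notify_all();
    }

    // Blocks until `target_id` has been replayed or the timeout expires.
    // Returns true if the target was reached.
    bool wait_until_replayed(int64_t target_id, std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock(mu_);
        return cv_.wait_for(lock, timeout, [&] { return replayed_id_ >= target_id; });
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    int64_t replayed_id_ = 0;
};
```

In this sketch the query path would first learn the master's latest id, then call `wait_until_replayed` before planning, so the timeline's t2 effectively happens after t3.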

pick #37593
2024-07-11 18:57:53 +08:00
cee3cf8499 [fix](statistics)Fix column cached stats size bug. (#37545) (#37667)
backport: https://github.com/apache/doris/pull/37545
2024-07-11 18:53:12 +08:00
d7cae940d2 [fix](test) fix case conflict between test_tvf_based_broker_load and test_broker_load #37622 (#37631)
cherry pick from #37622
2024-07-11 17:52:21 +08:00
8a0d940914 [fix](publish) Pick Fix publish failed because task is null (#37546)
## Proposed changes

Pick https://github.com/apache/doris/pull/37531

This PR catches the exception so that the failed txn does not block the
other txns.
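
The idea can be sketched generically, assuming each txn is represented by a callable (`run_all_isolated` is a hypothetical name, and the actual fix lives in the FE rather than in C++): an exception from one task is caught inside the loop, so the remaining tasks still run.

```cpp
#include <cassert>
#include <exception>
#include <functional>
#include <stdexcept>
#include <vector>

// Runs every task in order. A task that throws is counted as failed and
// skipped; later tasks are still executed. Returns the number of failures.
inline int run_all_isolated(const std::vector<std::function<void()>>& tasks) {
    int failed = 0;
    for (const auto& task : tasks) {
        try {
            task();
        } catch (const std::exception&) {
            ++failed;
        }
    }
    return failed;
}
```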
2024-07-11 15:22:04 +08:00
39ded1f649 [branch-2.1][improvement](jdbc catalog) Change JdbcExecutor's error reporting from UDF to JDBC (#37635)
pick (#35692)

In the initial version, JdbcExecutor directly used UdfRuntimeException,
which could lead to misunderstanding of the exception. Therefore, I
created a separate Exception for JdbcExecutor to help us view the
exception more clearly.
2024-07-11 15:11:41 +08:00
ef754487d9 [branch-2.1][improvement](jdbc catalog) Catch AbstractMethodError in getColumnValue Method and Suggest Updating to ojdbc8+ (#37634)
pick (#37608)

Catch AbstractMethodError in getColumnValue method. Provide a clear
error message suggesting the use of ojdbc8 or higher versions to avoid
compatibility issues.
2024-07-11 15:10:47 +08:00
e66ffc1b6d [branch-2.1](arrow-flight-sql) Fix pipelineX Unknown result sink type (#37540)
pick #35804
2024-07-11 12:30:46 +08:00
1eb04cf538 [feature](mtmv) Support query rewrite by materialized view when query is aggregate and materialized view has no aggregate (#36278) (#37497)
cherry-pick from master
pr: #36278
commitId: 649f9bc6
2024-07-11 10:54:50 +08:00
e6b8ebc847 [Fix](Short Circuit) fix no project list in OlapScanNode (#37121) (#37504)
pick from #37121
2024-07-11 10:04:28 +08:00
e1cb568d11 [Optimize] Add session variable `max_fetch_remote_schema_tablet_count… (#37505)
pick from #37217
2024-07-11 10:04:20 +08:00