Commit Graph

8244 Commits

Author SHA1 Message Date
b4875c2789 [fix](jni)fix jni use timezone_obj get timezone be core. (#41956) (#42003)
bp #41956 

PR #40225 tried to pass time zone info from BE to JNI, and it used
`_state->timezone_obj().name()`
to get the time zone name.
But during a rolling upgrade of BE, this may core dump like:

```
*** SIGSEGV address not mapped to object (@0x610) received by PID 72661 (TID 73538 OR 0x7f2e898d1640) from PID 1552; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/common/signal_handler.h:421
 1# os::Linux::chained_handler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 3# signalHandler(int, siginfo_t*, void*) in /usr/lib/jvm/java-8-openjdk-amd64/jre/lib/amd64/server/libjvm.so
 4# 0x00007F3070D3E520 in /lib/x86_64-linux-gnu/libc.so.6
 5# cctz::time_zone::name[abi:cxx11]() const in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
 6# doris::vectorized::JniConnector::open(doris::RuntimeState*, doris::RuntimeProfile*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/exec/jni_connector.cpp:87
 7# doris::vectorized::AvroJNIReader::init_fetch_table_schema_reader() at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/exec/format/avro/avro_jni_reader.cpp:119
 8# std::_Function_handler::_M_invoke(std::_Any_data const&) at /var/local/ldb-toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/std_function.h:291
 9# doris::WorkThreadPool::work_thread(int) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/work_thread_pool.hpp:159
10# execute_native_thread_routine at ../../../../../libstdc++-v3/src/c++11/thread.cc:84
11# start_thread at ./nptl/pthread_create.c:442
12# 0x00007F3070E22850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
172.20.50.206 last coredump sql: 2024-10-13 04:12:23,985 [query] 
```

This PR uses another method, `_state->timezone()`, which just returns a
string instead of reading and initializing the
time zone info file, to avoid a potential core dump.
2024-10-17 14:47:33 +08:00
67d057a711 [cherry-pick](branch-21) fix conv function parser string failure return wrong result (#40530) (#41964)
## Proposed changes

Issue Number: close #39618
cherry-pick from master (#40530)
2024-10-17 14:45:46 +08:00
0b41cd2472 [fix](serde)fix the bug in DataTypeNullableSerDe.deserialize_column_from_fixed_json (#41217) (#41960)
bp #41217 

2024-10-17 14:36:01 +08:00
968e33f07e [cherry-pick](branch-21) pick (#39057) (#41352) (#41958)
## Proposed changes

pick from master (#39057) (#41352)


---------

Co-authored-by: Mryange <59914473+Mryange@users.noreply.github.com>
2024-10-17 14:30:40 +08:00
1b901f6fcc [cherry-pick](branch-2.1) add parquet tvf cases and fix some parquet bug (#41931)
## Proposed changes
pick pr:
  https://github.com/apache/doris/pull/41683
  https://github.com/apache/doris/pull/41506
  https://github.com/apache/doris/pull/41338
  https://github.com/apache/doris/pull/39326

---------

Co-authored-by: morningman <morningman@163.com>
2024-10-17 14:20:58 +08:00
b8214952a1 [branch-2.1] Fix is_partial_update parameter is not set in append_block_with_partial_content() (#41865)
https://github.com/apache/doris/pull/41439 forgot to set the
`is_partial_update` parameter for `Tablet::lookup_row_key()` in
`append_block_with_partial_content()`.
2024-10-17 12:44:41 +08:00
19784d420c [opt](inverted index) Improved top-N optimization by refining the sorting column check. (#39496) (#41954)
https://github.com/apache/doris/pull/39496
2024-10-17 11:31:11 +08:00
0b6447faeb [Fix](SchemaChange) refactor variant root column iterator to make row… (#41941)
pick #41700
2024-10-17 10:39:07 +08:00
7d99d5fcc4 [fix](analytic) Fix data distribution after analytic operator (#41902) (#41949)
Fix data distribution after analytic operator

pick #41902
2024-10-16 18:41:56 +08:00
5bd33fc88c [pick](branch-2.1) pick #41292 #41350 #41589 #41628 #41743 #41601 #41667 #41751 (#41927)
## Proposed changes

pick #41292 #41350 #41589 #41628 #41743 #41601 #41667 #41751


---------

Co-authored-by: Pxl <pxl290@qq.com>
2024-10-16 15:41:28 +08:00
e56216211e [pick](branch-2.1) pick #40667 #40714 (#41905)
pick
#40667
#40714

---------

Co-authored-by: wangbo <wangbo@apache.org>
2024-10-16 14:09:03 +08:00
e6545a36a3 [improvement](iceberg)Parallelize splits for count(*) for 2.1 (#41169) (#41880)
bp: #41169
2024-10-16 10:52:06 +08:00
b185dfcbf6 [pick](branch-2.1) pick #41676 #41740 #41857 (#41904)
pick #41676 #41740 #41857
2024-10-15 22:41:17 +08:00
b91d8e2327 [Improvement](minor) Reduce locking scope (#41845) (#41844)
pick #41845
2024-10-15 18:39:53 +08:00
78b6157aa9 [fix](ip/variant) fix information meta (#41871)
fix datatype information meta for ip/variant (#41666)

2024-10-15 18:01:14 +08:00
abcba778ff [fix](cancel) Fix cancel msg on branch-2.1 (#41798)
Make sure the cancel reason can be told apart:
1. user cancel
2. timeout
3. others

```text
mysql [demo]>set query_timeout=1;
--------------
set query_timeout=1
--------------

Query OK, 0 rows affected (0.00 sec)

mysql [demo]>select sleep(5);
--------------
select sleep(5)
--------------

ERROR 1105 (HY000): errCode = 2, detailMessage = Timeout

mysql [demo]>select sleep(5);
--------------
select sleep(5)
--------------

^C^C -- sending "KILL QUERY 0" to server ...
^C -- query aborted
ERROR 1105 (HY000): errCode = 2, detailMessage = cancel query by user from 127.0.0.1:64208
```
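
The three cases above can be pictured as a small classifier over the `detailMessage` text. This is only a sketch based on the two messages shown in the transcript; `CancelReason` and `classify_cancel` are hypothetical names, and the real BE/FE code may carry the reason as a status code rather than by matching strings.

```python
from enum import Enum


class CancelReason(Enum):
    """Hypothetical enum mirroring the three cases listed in the commit."""
    USER_CANCEL = "user cancel"
    TIMEOUT = "timeout"
    OTHER = "others"


def classify_cancel(detail_message: str) -> CancelReason:
    """Map a detailMessage string to a cancel reason.

    Assumption: the message texts are exactly the ones in the transcript
    ("Timeout" and "cancel query by user from <host:port>").
    """
    if detail_message.startswith("cancel query by user"):
        return CancelReason.USER_CANCEL
    if detail_message == "Timeout":
        return CancelReason.TIMEOUT
    return CancelReason.OTHER
```

Anything that does not match the two known messages falls through to `OTHER`, which matches the "others" bucket in the list.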
2024-10-15 17:15:05 +08:00
77fbe6397a [fix](http) Remove file if downloading file is failed #41778 (#41827)
cherry pick from #41778
2024-10-15 15:30:29 +08:00
94687a2f3c [fix](array/map) fix resize impl in array/map (#41595) (#41699)
backport: https://github.com/apache/doris/pull/41595
2024-10-15 09:50:11 +08:00
d97642e9b5 [cherry-pick](branch-21) fix tablet sink shuffle without project not match the output tuple (#40299)(#41293) (#41327)
## Proposed changes

cherry-pick from master  (#40299)(#41293)

2024-10-15 00:12:23 +08:00
4888c632f4 [cherry-pick](branch2.1) support escape.delim and serialization.null.format for hive text (#41684)
## Proposed changes
pick from master:
https://github.com/apache/doris/pull/40291
2024-10-15 00:08:23 +08:00
ff52e73a07 [Fix](inverted index) fix match null for inverted index #41746 (#41787)
cherry pick from #41746
2024-10-14 14:45:36 +08:00
f112af0fd2 [pick](branch-2.1) pick #41555 #41592 #38204 (#41781)
pick #41555 #41592 #38204
2024-10-14 14:05:08 +08:00
e10458baad [enhancement](err-msg) Output column info when size invalid in block data convertor (#41535) (#41764)
## Proposed changes

pick: #41535

As title.
2024-10-12 21:08:04 +08:00
2ae37626bb [opt](index compaction)Use RAM dir to create tmp index_writer (#41371) (#41705)
## Proposed changes

bp #41371
2024-10-12 17:13:55 +08:00
90d6985f91 [Fix](bug) Is null predicate get error query result (#41704)
cherry-pick #41668
2024-10-12 13:18:14 +08:00
4ac07fe918 [Feature](json) Support json_search function in 2.1 (#41590)
cherry-pick #40948 

As in MySQL, json_search returns the path that points to a JSON string
which matches the pattern.
`SELECT JSON_SEARCH('["A",[{"B":"1"}],{"C":"AB"},{"D":"BC"}]', 'one',
'A_') as res;`
```
+----------+
| res      |
+----------+
| "$[2].C" |
+----------+
```
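
To make the lookup in the query above concrete, here is a minimal sketch of a `'one'`-mode search. It is not the Doris implementation: `like_to_regex` and `json_search_one` are hypothetical helpers, only the `%` and `_` wildcards are handled (no escape character), and object keys are assumed to need no quoting in the path.

```python
import json
import re


def like_to_regex(pattern: str) -> re.Pattern:
    """Translate a LIKE pattern: % matches any run, _ matches one character."""
    parts = []
    for ch in pattern:
        if ch == "%":
            parts.append(".*")
        elif ch == "_":
            parts.append(".")
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.DOTALL)


def json_search_one(doc: str, pattern: str):
    """Return the JSON-quoted path of the first matching string, or None."""
    regex = like_to_regex(pattern)

    def walk(node, path):
        if isinstance(node, str):
            return path if regex.fullmatch(node) else None
        if isinstance(node, list):
            for index, item in enumerate(node):
                found = walk(item, f"{path}[{index}]")
                if found is not None:
                    return found
        elif isinstance(node, dict):
            for key, value in node.items():
                found = walk(value, f"{path}.{key}")
                if found is not None:
                    return found
        return None

    found = walk(json.loads(doc), "$")
    return None if found is None else json.dumps(found)


result = json_search_one('["A",[{"B":"1"}],{"C":"AB"},{"D":"BC"}]', "A_")
```

In this sketch `"A"` is skipped because `A_` requires exactly two characters, so the first match is `"AB"` under key `C` of the third array element.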

Co-authored-by: liutang123 <liulijia@gmail.com>
2024-10-11 16:33:07 +08:00
e9cfbb56b3 [bugfix](becore) use after free problem when the segment is pop (#41685) (#41697)
## Proposed changes

pick #41685
introduced by #41608


Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-10-11 14:07:46 +08:00
8c0f73cb90 [Enhancement](MaxCompute)Refactoring maxCompute catalog using Storage API.(#40225 , #40888 ,#41386 ) (#41610)
bp #40225 , #40888 ,#41386

## Proposed changes
Of these, #40225 introduces the new MaxCompute (mc) API,
#40888 fixes a bug when reading nulls between the new and old
APIs, and
#41386 provides compatibility between the new and old versions.
2024-10-11 11:55:41 +08:00
b489cdf840 [opt](merge-on-write) avoid to check delete bitmap while lookup rowkey in some situation to reduce CPU cost (#41480) (#41439)
## Proposed changes

cherry-pick #41480
2024-10-11 10:15:39 +08:00
6dddd4c499 [function](cast)Make string casting to integers more like MySQL's beh… (#41541)
…avior (#38847)
https://github.com/apache/doris/pull/38847
## Proposed changes

There are two issues here. First, the results of casting are
inconsistent between FE and BE.
```
FE
mysql [(none)]>select cast('3.000' as int); 
+----------------------+
| cast('3.000' as INT) |
+----------------------+
|                    3 |
+----------------------+

mysql [(none)]>set debug_skip_fold_constant = true;

BE
mysql [(none)]>select cast('3.000' as int);
+----------------------+
| cast('3.000' as INT) |
+----------------------+
|                 NULL |
+----------------------+
```
The second issue is that casting on BE converts '3.0' to NULL. With this
change, the casting logic for FE and BE is unified.
<!--Describe your changes.-->

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->

---------

Co-authored-by: Xinyi Zou <zouxinyi02@gmail.com>
2024-10-11 09:32:00 +08:00
4c9ebbb3b9 [fix](cloud) cloud group commit should skip replay wal if label is already used and the txn state is committed or visible (#41262) (#41461)
pick https://github.com/apache/doris/pull/41262
2024-10-10 22:27:04 +08:00
f2ba1f2fb3 [bugfix](segmentload) should remove segment from segment cache if load segment failed (#41608) (#41660)
2024-10-10 19:40:22 +08:00
0fb42d3a48 [Enhancement](tvf)catalog tvf implements user permission checks and hides sensitive information (#41497) (#41604)
bp #41497 

before #21790
## Proposed changes
This PR unifies the duplicate parts of `catalog tvf` and `show
catalogs`, adds permission check when querying `catalog tvf`, and hides
sensitive information.
2024-10-10 17:55:40 +08:00
1db0aef9b7 [feature](array_agg) support array_agg with param is array/map/struct… (#41651)
… (#40697)

This PR adds support for array, map, and struct type parameters in the
array_agg function.

2024-10-10 17:54:54 +08:00
3120bfb6e3 [fix](pipelinex) fix fragment instance progress reports (part 2) (#40694) (#41641)
backport #40694
2024-10-10 17:49:41 +08:00
30492a2438 [opt](load) print more detailed log when stream load finished #41398 (#41639)
cherry pick from #41398
2024-10-10 17:47:48 +08:00
d32688e091 [Enhancement](multi-catalog) Set hdfs native client logger to glog and redirect jvm stdout/stderr logger to jni.log. (#41633)
Backport #39540.

Co-authored-by: Mingyu Chen <morningman@163.com>
2024-10-10 17:47:21 +08:00
a26079c09d [Opt](load) Optimize the error messages of -235 and -238 for loading #41048 (#41638)
cherry pick from #41048
2024-10-10 14:20:52 +08:00
33fad04341 [opt](Nereids) use 1 instead narrowest column when do column pruning (#41548) (#41627)
pick from master #41548
2024-10-10 14:02:23 +08:00
aa541fddf9 [fix](load) disable num segments check in compatibility mode (#41053) (#41552)
backport #41053
2024-10-10 11:20:16 +08:00
e218fd2314 [Fix](inverted index) add DATEV2 and DATETIMEV2 for inverted index reader #41565 (#41579)
cherry pick from #41565
2024-10-09 15:32:41 +08:00
31b506c8cc [Enhancement](inverted index) return OK instead of not supported in expr evaluate_inverted_index #41567 (#41578)
cherry pick from #41567
2024-10-09 15:14:38 +08:00
0185f8069f [fix](crash) fix be crash because of int overflow (#41554) (#41568)
2024-10-09 14:20:55 +08:00
9fe77b335c [Enhancement](inverted index) apply inverted index when has any #41547 (#41584)
cherry pick from #41547
2024-10-09 14:13:38 +08:00
afb477c66d [Fix](inverted index) Fix wrong need read data opt when enable_common_expr_pushdown is disabled #40689 (#41562)
cherry pick from #40689
2024-10-08 22:12:10 +08:00
c24ff2ff81 [fix](upgrade) fix version check failure of window_funnel when upgrading (#41542)
## Proposed changes

Fix version check failure of window_funnel when upgrading from 2.1.6
or a higher version to the latest branch-2.1.
```
02:49:13   F20240930 02:47:48.546983  7581 block.cpp:89] Check failed: BeExecVersionManager::check_be_exec_version(be_exec_version)
02:49:13   *** Check failure stack trace: ***
02:49:13       @     0x564640041856  google::LogMessage::SendToLog()
02:49:13       @     0x56464003e2a0  google::LogMessage::Flush()
02:49:13       @     0x564640042099  google::LogMessageFatal::~LogMessageFatal()
02:49:13       @     0x56463922d106  doris::vectorized::Block::deserialize()
02:49:13       @     0x5646390a82bf  doris::vectorized::WindowFunnelState<>::read()
02:49:13       @     0x5646390a6889  doris::vectorized::IAggregateFunctionDataHelper<>::deserialize_and_merge()
02:49:13       @     0x5646390acdc3  doris::vectorized::IAggregateFunctionHelper<>::deserialize_and_merge_from_column_range()
02:49:13       @     0x56463fa77152  doris::pipeline::AggSinkLocalState::_merge_without_key()
02:49:13       @     0x56463fa9d114  doris::pipeline::AggSinkLocalState::Executor<>::execute()
02:49:13       @     0x56463fa78569  doris::pipeline::AggSinkOperatorX::sink()
02:49:13       @     0x564640013296  doris::pipeline::PipelineXTask::execute()
02:49:13       @     0x56464001d41c  doris::pipeline::TaskScheduler::_do_work()
02:49:13       @     0x56463663e078  doris::ThreadPool::dispatch_thread()
02:49:13       @     0x564636634901  doris::Thread::supervise_thread()
02:49:13       @     0x7fb64cf58ac3  (unknown)
02:49:13       @     0x7fb64cfea850  (unknown)
02:49:13       @              (nil)  (unknown)
02:49:13   *** Query id: b0cd194940184766-961c310e833e92b1 ***
02:49:13   *** is nereids: 1 ***
02:49:13   *** tablet id: 0 ***
02:49:13   *** Aborted at 1727635668 (unix time) try "date -d @1727635668" if you are using GNU date ***
02:49:13   *** Current BE git commitID: 653e315ba5 ***
02:49:13   *** SIGABRT unknown detail explain (@0x1648) received by PID 5704 (TID 7581 OR 0x7fb354a9a640) from PID 5704; stack trace: ***
02:49:13    0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/common/signal_handler.h:421
02:49:13    1# 0x00007FB64CF06520 in /lib/x86_64-linux-gnu/libc.so.6
02:49:13    2# pthread_kill at ./nptl/pthread_kill.c:89
02:49:13    3# raise at ../sysdeps/posix/raise.c:27
02:49:13    4# abort at ./stdlib/abort.c:81
02:49:13    5# 0x000056464004C06D in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
02:49:13    6# 0x000056464003E76A in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
02:49:13    7# google::LogMessage::SendToLog() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
02:49:13    8# google::LogMessage::Flush() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
02:49:13    9# google::LogMessageFatal::~LogMessageFatal() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
02:49:13   10# doris::vectorized::Block::deserialize(doris::PBlock const&) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/core/block.cpp:113
02:49:13   11# doris::vectorized::WindowFunnelState<(doris::vectorized::TypeIndex)14, long>::read(doris::vectorized::BufferReadable&) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/aggregate_functions/aggregate_function_window_funnel.h:363
02:49:13   12# doris::vectorized::IAggregateFunctionDataHelper<doris::vectorized::WindowFunnelState<(doris::vectorized::TypeIndex)14, long>, doris::vectorized::AggregateFunctionWindowFunnel<(doris::vectorized::TypeIndex)14, long> >::deserialize_and_merge(char*, char*, doris::vectorized::BufferReadable&, doris::vectorized::Arena*) const at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/aggregate_functions/aggregate_function.h:517
02:49:13   13# doris::vectorized::IAggregateFunctionHelper<doris::vectorized::AggregateFunctionNullVariadicInline<doris::vectorized::AggregateFunctionWindowFunnel<(doris::vectorized::TypeIndex)14, long>, false> >::deserialize_and_merge_from_column_range(char*, doris::vectorized::IColumn const&, unsigned long, unsigned long, doris::vectorized::Arena*) const at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/vec/aggregate_functions/aggregate_function.h:465
02:49:13   14# doris::pipeline::AggSinkLocalState::_merge_without_key(doris::vectorized::Block*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/pipeline/exec/aggregation_sink_operator.cpp:389
02:49:13   15# doris::pipeline::AggSinkLocalState::Executor<true, true>::execute(doris::pipeline::AggSinkLocalState*, doris::vectorized::Block*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/pipeline/exec/aggregation_sink_operator.h:73
02:49:13   16# doris::pipeline::AggSinkOperatorX::sink(doris::RuntimeState*, doris::vectorized::Block*, bool) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/pipeline/exec/aggregation_sink_operator.cpp:744
02:49:13   17# doris::pipeline::PipelineXTask::execute(bool*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/pipeline/pipeline_x/pipeline_x_task.cpp:332
02:49:13   18# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/pipeline/task_scheduler.cpp:347
02:49:13   19# doris::ThreadPool::dispatch_thread() in /mnt/hdd01/ci/compatibility-deploy/be/lib/doris_be
02:49:13   20# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:499
02:49:13   21# start_thread at ./nptl/pthread_create.c:442
02:49:13   22# 0x00007FB64CFEA850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
02:49:13   [2024-09-30 02:49:13,147 __main__:796] [INFO]: 172.20.50.73 last coredump sql: 2024-09-30 02:48:18,328 [query] Query b0cd194940184766-961c310e833e92b1 1 times with new query id: 2e0e00de0e7548dd-95f9abc9d8d11c3a
```
2024-10-08 17:13:33 +08:00
cb24ccc112 [bugfix](brpc) Should use status to generate protobuf message, because it will encoding Backend Info (#41515) (#41522)
Should use status to generate the protobuf message, because it will encode
Backend info.

2024-10-04 23:03:55 +08:00
d33a3eb1c5 [cherry-pick](branch-2.1) Pick "[Fix](LZ4 compression) Fix wrong LZ4 compression max input size limit (#41239)" (#41505)
## Proposed changes

The maximum input size LZ4 compression supports is LZ4_MAX_INPUT_SIZE, which is
0x7E000000 (2,113,929,216 bytes). Doris checked against the wrong maximum, INT_MAX, which
is 2,147,483,647. If the input data size falls between these two values,
it passes the check but LZ4 compression fails.

This PR fixes it.
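
The arithmetic behind the two limits can be checked with a short sketch. The constants are the ones quoted above; `passes_old_check` and `passes_new_check` are hypothetical names, and whether the original comparison was strict or inclusive is an assumption.

```python
# Values quoted in the commit message.
LZ4_MAX_INPUT_SIZE = 0x7E000000   # 2,113,929,216 bytes
INT_MAX = 2**31 - 1               # 2,147,483,647


def passes_old_check(size: int) -> bool:
    """Overly permissive limit: anything up to INT_MAX is accepted."""
    return 0 <= size <= INT_MAX


def passes_new_check(size: int) -> bool:
    """Limit aligned with what LZ4 itself accepts."""
    return 0 <= size <= LZ4_MAX_INPUT_SIZE


# Number of input sizes that pass the old check but are too large for LZ4.
gap = INT_MAX - LZ4_MAX_INPUT_SIZE

# A size inside the problematic window.
size = 2_120_000_000
old_ok, new_ok = passes_old_check(size), passes_new_check(size)
```

Here `gap` works out to 33,554,431 bytes (0x01FFFFFF), which is the window of sizes that passed the old check and then failed inside LZ4.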

2024-10-01 22:43:11 +08:00
98a1311aa2 [Opt](scanner-scheduler) Opt scanner scheduler starvation issue. (#41484)
## Proposed changes

Backport #40641
2024-09-30 15:40:20 +08:00
2b9c963edb [fix](scanner) Check query status when iterating through rowsets and segments #41363 (#41452)
cherry pick from #41363
2024-09-30 09:49:46 +08:00