8105 Commits

Author SHA1 Message Date
749c9f7b56 [fix](group commit) fix replay wal check label status (#38883) (#38997)
pick https://github.com/apache/doris/pull/38883
2024-08-07 22:06:59 +08:00
773008d6fa [Fix](Json) fix some cast issue (#38683) (#39025)
#38683
2024-08-07 22:05:43 +08:00
91dcaaf7dd [fix](MoW) fix MoW & segcompaction conflict on cache of temp segment … (#38992)
…(#37760)

MoW updates the delete bitmap during load, and the page cache of
temporary segments could be modified by segcompaction. Disabling page
cache touches during segcompaction solves this problem.

Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Co-authored-by: zhengyu <freeman.zhang1992@gmail.com>
2024-08-07 21:18:10 +08:00
7e95d7cbec [bugfix](backup)(cooldown) cancel backup properly when be backup failed (#38724) (#38993)
Co-authored-by: zhangyuan <ayuanzhang@tencent.com>
2024-08-07 15:58:11 +08:00
7550fbaff7 [Fix](Exception) throw exception in defer may result std::terminate (… (#39007)
pick #38935
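
The title refers to a general C++ rule. Destructors are implicitly `noexcept`, so an exception thrown by a deferred callback that runs inside a destructor calls `std::terminate`. The same happens if the exception escapes during stack unwinding. Below is a minimal sketch, assuming a hypothetical `SafeDefer` guard (not the Doris defer utility), that catches the exception inside the destructor instead of letting it escape.

```cpp
#include <exception>
#include <functional>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical scope guard (not Doris code): runs a callback when it goes
// out of scope. Because destructors are implicitly noexcept, an exception
// escaping ~SafeDefer() would call std::terminate, so it is caught here.
class SafeDefer {
public:
    explicit SafeDefer(std::function<void()> fn) : fn_(std::move(fn)) {}
    SafeDefer(const SafeDefer&) = delete;
    SafeDefer& operator=(const SafeDefer&) = delete;

    ~SafeDefer() {
        try {
            fn_();
        } catch (const std::exception& e) {
            // Record the error instead of letting it propagate.
            last_error = e.what();
        } catch (...) {
            last_error = "unknown exception";
        }
    }

    // Last error swallowed by any SafeDefer (illustration only, not thread-safe).
    static inline std::string last_error;

private:
    std::function<void()> fn_;
};

// A scope whose deferred cleanup throws: it returns normally
// instead of terminating the process.
bool run_scope_with_throwing_cleanup() {
    SafeDefer guard([] { throw std::runtime_error("cleanup failed"); });
    return true;
}
```

Catching inside the destructor lets the process survive but stops the error from propagating. A real fix may instead record the error in a status that the caller checks.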
2024-08-07 13:46:23 +08:00
8cb5aa64f4 [test](inverted index) add an Inverted Index Testing Switch (#38077) (#38947)
https://github.com/apache/doris/pull/38077
2024-08-07 11:25:36 +08:00
fc0222a64c [opt](info) processlist schema table support show all fe (#38701) (#38953)
pick #38701
2024-08-07 11:01:46 +08:00
b856530b09 [fix](inverted index) disable range query in StringTypeInvertedIndexReader (#38218) (#38926)
## Proposed changes

pick from master #38218
2024-08-07 10:44:02 +08:00
e400859531 [fix](update null map) Fix update_null_map #38787 (#38920)
cherry pick from #38787
2024-08-07 10:21:41 +08:00
2543b569bb [Optimize](Row store) pick #37145, #38236 (#38932) 2024-08-07 09:55:42 +08:00
bc644cb253 [opt](catalog) merge scan range to avoid too many splits (#38311) (#38964)
bp #38311
2024-08-06 21:57:02 +08:00
3abb222064 [fix](group commit) Fix test_group_commit_async_wal_msg_fault_injection case (#35313) (#38911)
pick https://github.com/apache/doris/pull/35313
2024-08-06 17:57:22 +08:00
fe6ea3b8b5 [Fix](inverted index) fix missed array inverted index null bitmap #38907 (#38934)
cherry pick from #38907
2024-08-06 17:17:28 +08:00
21a67dba5d [fix](index) fix inverted index compound file entry size int32 overflow #38891 (#38928) 2024-08-06 15:57:09 +08:00
28c0510440 [fix](pipeline) Fix mem control in local exchanger (#38885) (#38910)
If a block larger than 128M is dequeued by the local exchange source
operator and it is the last block, both the source operators and the
sink operators hang. This PR fixes it.

pick #38885
2024-08-06 14:45:41 +08:00
ba5c6fba98 [scheduler](core) Use signed int as number of cores (#38514) (#38913)
pick #38514

*** is nereids: 0 ***
*** tablet id: 0 ***
*** Aborted at 1722279016 (unix time) try "date -d @1722279016" if you are using GNU date ***
*** Current BE git commitID: e9f12fac47e ***
*** SIGSEGV unknown detail explain (@0x0) received by PID 1116227 (TID 1116498 OR 0x7f009ac00640) from PID 0; stack trace: ***
 0# doris::signal::(anonymous namespace)::FailureSignalHandler(int, siginfo_t*, void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/common/signal_handler.h:421
 1# PosixSignals::chained_handler(int, siginfo*, void*) [clone .part.0] in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 2# JVM_handle_linux_signal in /usr/lib/jvm/java-17-openjdk-amd64/lib/server/libjvm.so
 3# 0x00007F01E49B0520 in /lib/x86_64-linux-gnu/libc.so.6
 4# pthread_mutex_lock at ./nptl/pthread_mutex_lock.c:80
 5# doris::pipeline::MultiCoreTaskQueue::take(unsigned long) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/pipeline/task_queue.cpp:154
 6# doris::pipeline::TaskScheduler::_do_work(unsigned long) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/pipeline/task_scheduler.cpp:268
 7# doris::ThreadPool::dispatch_thread() in /mnt/disk1/STRESS_ENV/be/lib/doris_be
 8# doris::Thread::supervise_thread(void*) at /home/zcp/repo_center/doris_branch-2.1/doris/be/src/util/thread.cpp:499
 9# start_thread at ./nptl/pthread_create.c:442
10# 0x00007F01E4A94850 at ../sysdeps/unix/sysv/linux/x86_64/clone3.S:83
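
The entry does not say which computation went wrong. As a generic illustration only (hypothetical helpers, not the Doris scheduler code), the sketch below shows why a signed type is safer for core counts and core indices. Unsigned subtraction wraps around instead of going negative.

```cpp
#include <cstddef>

// Hypothetical helpers (not Doris code): compute the index of the core
// "before" core_id, e.g. to pick a neighbouring queue.

// Unsigned version: for core_id == 0, core_id - 1 wraps around to
// SIZE_MAX instead of becoming -1, producing a huge out-of-range index.
std::size_t prev_core_unsigned(std::size_t core_id) {
    return core_id - 1;
}

// Signed version: the intermediate value may become negative, which can be
// detected and wrapped explicitly into [0, num_cores).
int prev_core_signed(int core_id, int num_cores) {
    int prev = core_id - 1;
    if (prev < 0) {
        prev += num_cores;
    }
    return prev;
}
```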
2024-08-06 14:44:59 +08:00
8ce30963cd [fix] (compaction) fix time series compaction policy (#38220) (#38917)
## Proposed changes

pick from #38220
2024-08-06 14:26:42 +08:00
ff6fa33021 [opt](inverted index) mow supports index optimization #(#38180)
## Proposed changes

https://github.com/apache/doris/pull/37428
https://github.com/apache/doris/pull/37429
2024-08-06 11:18:13 +08:00
bcea54147c [feature](inverted index) String type inverted index match function c… (#38872)
https://github.com/apache/doris/pull/38170
2024-08-06 09:06:05 +08:00
c7b59b38ef [fix](hist) Fix unstable result of aggregate function hist #38608 (#38893)
cherry pick from #38608
2024-08-06 08:52:03 +08:00
e9bf0776d7 [fix](parquet) disable parquet page index by default #38691 (#38901)
bp #38691
2024-08-06 08:51:39 +08:00
70a518e099 [Fix](multi-catalog) Fix not throw error when call close() in hive/iceberg writer. (#38902)
## Proposed changes
[Fix] (multi-catalog) Fix not throw error when call close() in
hive/iceberg writer.

When close() is called on the file writer, it syncs the buffer to
commit. Therefore, some data is written only when close() is called,
which can expose errors (for example, in hdfs_file_writer). These errors
need to be captured throughout the entire close process.
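
A minimal sketch of the pattern described above follows. It assumes a hypothetical `Status` struct and `BufferedFileWriter` class, which are not the Doris `Status` or `hdfs_file_writer` APIs. Buffered data is flushed only in `close()`, so the caller must propagate the status that `close()` returns.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical status type (Doris has its own Status class, not reproduced here).
struct Status {
    bool ok = true;
    std::string msg;
    static Status OK() { return {}; }
    static Status Error(std::string m) { return {false, std::move(m)}; }
};

// Hypothetical buffered writer: write() only appends to a buffer, so
// failures of the underlying storage surface when close() flushes it.
class BufferedFileWriter {
public:
    explicit BufferedFileWriter(bool storage_fails) : storage_fails_(storage_fails) {}

    Status write(const std::string& data) {
        buffer_.push_back(data);
        return Status::OK();
    }

    Status close() {
        if (!buffer_.empty() && storage_fails_) {
            return Status::Error("flush failed on close");
        }
        buffer_.clear();
        return Status::OK();
    }

private:
    bool storage_fails_;
    std::vector<std::string> buffer_;
};

// Close path of a hypothetical table writer: the status of close() is
// returned to the caller instead of being discarded.
Status finish_write(BufferedFileWriter& writer, const std::string& data) {
    Status st = writer.write(data);
    if (!st.ok) {
        return st;
    }
    return writer.close();  // must not be ignored
}
```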
2024-08-06 08:51:12 +08:00
0711423ee3 [Chore](pipeline) set PipelineFragmentContext::_timeout (#38890)
## Proposed changes

Doris uses `query_timeout` to set a timeout for queries, but the
pipelineX engine did not use it, so a query would not end before EOS.
This PR fixes it.

pick #35328
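
Below is a minimal sketch of a deadline check. It assumes a hypothetical `FragmentContext` class that stands in for `PipelineFragmentContext`. The member names and the use of `std::chrono::steady_clock` are assumptions, not the Doris implementation.

```cpp
#include <chrono>

// Hypothetical fragment context: remembers when the fragment started and
// the query timeout, and reports whether the deadline has passed.
class FragmentContext {
public:
    using Clock = std::chrono::steady_clock;

    FragmentContext(Clock::time_point start, std::chrono::seconds timeout)
        : start_(start), timeout_(timeout) {}

    // True once more than `timeout_` has elapsed since `start_`.
    bool is_timeout(Clock::time_point now) const { return now - start_ > timeout_; }

private:
    Clock::time_point start_;
    std::chrono::seconds timeout_;
};
```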
2024-08-05 21:47:08 +08:00
9d5af7febd [opt](inverted index) Optimization of the initialization process in topn (#38870)
pick https://github.com/apache/doris/pull/37722
2024-08-05 18:26:00 +08:00
bf1c7a1c15 [fix](clone) fix stale tablet report miss the new cloning replica #38695 (#38839)
cherry pick from #38695
2024-08-05 18:04:24 +08:00
0f69a2a47f [fix](compaction) fix mismatch between segment key and value column rows during compaction (#37960)(#38251)(#38356) (#38835)
pick master #37960 #38251 #38356
2024-08-05 16:48:08 +08:00
4c75fecea9 [fix](compile) be compile failed in mac due to std::max (#37238) (#38860)
cherry-pick #37238 to branch-2.1
2024-08-05 16:31:39 +08:00
bb962a8291 [minor](fix) Fix incorrect fmt arguments (#38840) (#38861)
pick #38840
2024-08-05 16:06:32 +08:00
65154f8abe [branch-2.1] (doris-future) Support auto partition name function (#38853)
cherry-pick https://github.com/apache/doris/pull/34258 to branch-2.1
2024-08-05 16:04:24 +08:00
Pxl
86ef0069ea [Feature](function) support group concat with distinct and order by (#38851)
pick from #38744 and #38776
2024-08-05 15:44:51 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
pick pr: #38575 and fix this pr bug: #38245
2024-08-05 09:13:08 +08:00
2653087843 [pick](array-funcs)fix array with empty arg in be behavior (#38708)
## Proposed changes
backport: https://github.com/apache/doris/pull/36845
2024-08-05 09:08:28 +08:00
1b3d4b4d31 [cherry-pick](branch-21)fix operator do_projections should use local_state intermediate_projections (#38612) (#38765)
## Proposed changes

cherry-pick from master https://github.com/apache/doris/pull/38612
2024-08-05 09:07:16 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables to read tables whose columns have been renamed in `Hive`.

These two session variables are modeled on
`parquet_use_column_names` and `orc_use_column_names` of the `Trino` Hive
connector.

By default, these two session variables are true. When they are set to
false, reading orc/parquet will access the columns according to the
ordinal position in the Hive table definition.

For example:
```mysql
in Hive :
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

in Doris :
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

In Hive 3, you can use
`set parquet.column.index.access/orc.force.positional.evolution = true/false`
to control the results of reading the table in the same way as these two
session variables. However, for Parquet tables with a renamed field inside
a struct column, Hive and Doris behave differently.
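
The two modes can be sketched as a column-resolution step. This assumes a hypothetical `resolve_columns` helper (not the Doris reader code) and exact, case-sensitive name matching. Take the table and the old file from the example above. Name-based access finds no file column for `new_a`, so it is read as NULL. Positional access maps `new_a` to the file's first column, `a`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <vector>

// Hypothetical resolution of table columns to columns of one ORC/Parquet file.
// For each table column, returns the index of the file column to read, or
// std::nullopt when nothing matches (the value is then read as NULL).
std::vector<std::optional<std::size_t>> resolve_columns(
        const std::vector<std::string>& table_columns,
        const std::vector<std::string>& file_columns,
        bool use_column_names) {
    std::vector<std::optional<std::size_t>> mapping;
    for (std::size_t i = 0; i < table_columns.size(); ++i) {
        if (use_column_names) {
            // By name: look for a file column with the same name.
            std::optional<std::size_t> found;
            for (std::size_t j = 0; j < file_columns.size(); ++j) {
                if (file_columns[j] == table_columns[i]) {
                    found = j;
                    break;
                }
            }
            mapping.push_back(found);
        } else {
            // By position: the i-th table column reads the i-th file column.
            mapping.push_back(i < file_columns.size() ? std::optional<std::size_t>(i)
                                                      : std::nullopt);
        }
    }
    return mapping;
}
```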
2024-08-05 09:06:49 +08:00
53773ae6b7 [opt](join) check datatype of intermediate slots in hash join (#38556) (#38792)
## Proposed changes

pick #38556
2024-08-05 09:03:21 +08:00
8fa0710cb3 [branch-2.1](load) fix miss writer in concurrency incremental open (#38605) (#38793)
pick https://github.com/apache/doris/pull/38605
2024-08-05 08:56:23 +08:00
6035edad0b [fix](multi table) fix single stream multi table memory leak (#38255) (#38824)
pick (#38255)

We ran into OOM when using single stream multi table load.


![image](https://github.com/user-attachments/assets/748e9914-d591-4f41-8b28-412d3cecc841)

There is a memory leak; the heap profile looks like this:


![image](https://github.com/user-attachments/assets/af30c593-88ea-44f6-bba1-82436b13f99f)

The stream load context is not released under some exception conditions,
for example when planning fails because high concurrency causes a timeout
while obtaining the read lock. The leak was introduced by
https://github.com/apache/doris/pull/35458

The effect of the fix is shown in the following figure: the load runs
stably with a small amount of memory.


![image](https://github.com/user-attachments/assets/4483e0a5-6c0c-4cdc-b8ed-3408da6a86b2)
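
The entry does not describe the fix in detail. Below is a minimal sketch of one common way to avoid this kind of leak. It assumes hypothetical `LoadRegistry` and `RegistrationGuard` types, not Doris's `StreamLoadContext` management. An RAII guard removes the context on every exit path, including when planning throws.

```cpp
#include <cstddef>
#include <memory>
#include <stdexcept>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical stand-in for a stream load context (not Doris's StreamLoadContext).
struct LoadContext {
    std::string label;
};

// Hypothetical registry that keeps in-flight load contexts alive.
class LoadRegistry {
public:
    void put(const std::string& label) {
        contexts_[label] = std::make_shared<LoadContext>(LoadContext{label});
    }
    void remove(const std::string& label) { contexts_.erase(label); }
    std::size_t size() const { return contexts_.size(); }

private:
    std::unordered_map<std::string, std::shared_ptr<LoadContext>> contexts_;
};

// RAII guard: unregisters the context on every exit path, including
// exceptions thrown while planning, so the context cannot leak.
class RegistrationGuard {
public:
    RegistrationGuard(LoadRegistry& registry, std::string label)
        : registry_(registry), label_(std::move(label)) {}
    RegistrationGuard(const RegistrationGuard&) = delete;
    RegistrationGuard& operator=(const RegistrationGuard&) = delete;
    ~RegistrationGuard() { registry_.remove(label_); }

private:
    LoadRegistry& registry_;
    std::string label_;
};

void run_load(LoadRegistry& registry, const std::string& label, bool plan_fails) {
    registry.put(label);
    RegistrationGuard guard(registry, label);
    if (plan_fails) {
        // e.g. a timeout while obtaining a read lock under high concurrency
        throw std::runtime_error("plan failed");
    }
    // ... execute the load ...
}
```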
2024-08-04 22:12:44 +08:00
0603ec1d9d [enhancement](compaction) optimizing memory usage for compaction (#37099) (#37486) 2024-08-04 10:49:18 +08:00
7bdc508ac7 [Bug](fix) fix coredump case in (not null, null) except (not null, not null) case (#38756)
## Proposed changes

Issue Number: close #38612
2024-08-04 10:44:10 +08:00
64b69ed1ba [branch-2.1] Picks "[opt](merge-on-write) Skip the alignment process of some rowsets in partial update #38487" (#38682)
## Proposed changes

picks https://github.com/apache/doris/pull/38487
2024-08-02 20:05:31 +08:00
556f0fc784 [pick](json-keys) support json_keys function (#38631)
## Proposed changes
backport: https://github.com/apache/doris/pull/36411
2024-08-02 19:10:00 +08:00
9b07cd2069 [pick](json-serde)pick jsonb string deserialize with spec char (#38711)
## Proposed changes
backport: https://github.com/apache/doris/pull/37176
2024-08-02 13:37:41 +08:00
b3f335ba5f [enhancement](index compaction) Enable index compaction by default (#36812) (#38676)
## Proposed changes

bp #36812
2024-08-02 12:03:57 +08:00
1d982ada45 [pick](array-funcs)pick array func array_enumerate_uniq bugfix (#38721)
## Proposed changes
backport: https://github.com/apache/doris/pull/38384
2024-08-02 11:25:17 +08:00
f5bc65989c [pick](array-range)improve array_range func for large param (#38707)
## Proposed changes
backport: https://github.com/apache/doris/pull/38284
2024-08-02 11:22:46 +08:00
b7e1588be9 [pick](upgrade)fix log message (#38710)
## Proposed changes
backport: https://github.com/apache/doris/pull/38254
2024-08-02 11:20:20 +08:00
327069fdbc [branch-2.1](log) add tablet clear cache log (#38713) 2024-08-02 08:40:02 +08:00
0da388ade5 [fix](inverted index) fix match_phrase_ edge query result error #38327 (#38740) 2024-08-01 23:17:53 +08:00
4d980b8235 [feature](http action)Add http action to show nested inverted index file (#38272) (#38672)
backport #38272
2024-08-01 19:30:59 +08:00
3e5255a862 [pipeline](fix) Fix blocking task which is not triggered by 2nd RPC (… (#38694)
…#38568)

Once a query is cancelled for any reason, the BE may not receive the 2nd
RPC from the FE. In that case, we must ensure the execution dependency is
ready so that tasks are not blocked.
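
Below is a minimal sketch of the idea. It assumes a hypothetical `Dependency` class that blocks a thread on a condition variable. The Doris pipeline engine schedules blocked tasks differently, so this only illustrates why cancellation must also mark the dependency ready.

```cpp
#include <condition_variable>
#include <mutex>

// Hypothetical execution dependency: tasks wait until it becomes ready.
class Dependency {
public:
    // Normal path: the expected RPC arrives and the dependency becomes ready.
    void set_ready() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            ready_ = true;
        }
        cv_.notify_all();
    }

    // Cancellation path: the RPC may never arrive, so mark the dependency
    // ready as well; woken tasks then observe the cancelled flag.
    void cancel() {
        {
            std::lock_guard<std::mutex> lock(mu_);
            cancelled_ = true;
            ready_ = true;
        }
        cv_.notify_all();
    }

    // Blocks until ready; returns false if the wake-up was due to cancellation.
    bool wait() {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return ready_; });
        return !cancelled_;
    }

private:
    std::mutex mu_;
    std::condition_variable cv_;
    bool ready_ = false;
    bool cancelled_ = false;
};
```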
2024-08-01 18:23:41 +08:00