Commit Graph

8079 Commits

Author SHA1 Message Date
4c75fecea9 [fix](compile) be compile failed in mac due to std::max (#37238) (#38860)
cherry-pick #37238 to branch-2.1
2024-08-05 16:31:39 +08:00
bb962a8291 [minor](fix) Fix incorrect fmt arguments (#38840) (#38861)
pick #38840
2024-08-05 16:06:32 +08:00
65154f8abe [branch-2.1] (doris-future) Support auto partition name function (#38853)
cherry-pick https://github.com/apache/doris/pull/34258 to branch-2.1
2024-08-05 16:04:24 +08:00
Pxl
86ef0069ea [Feature](function) support group concat with distinct and order by (#38851)
pick from #38744 and #38776
2024-08-05 15:44:51 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
pick PR #38575 and the fix for a bug in that PR: #38245
2024-08-05 09:13:08 +08:00
2653087843 [pick](array-funcs)fix array with empty arg in be behavior (#38708)
## Proposed changes
backport: https://github.com/apache/doris/pull/36845
2024-08-05 09:08:28 +08:00
1b3d4b4d31 [cherry-pick](branch-21)fix operator do_projections should use local_state intermediate_projections (#38612) (#38765)
## Proposed changes

cherry-pick from master https://github.com/apache/doris/pull/38612
2024-08-05 09:07:16 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables so that a table can still be read after a column is
renamed in `Hive`.

These two session variables are modeled on
`parquet_use_column_names` and `orc_use_column_names` in the `Trino` Hive
connector.

By default, both session variables are true. When they are set to
false, ORC/Parquet reads access columns by their ordinal position in
the Hive table definition.

For example:
```mysql
-- In Hive:
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp  change column  a new_a int;
hive> insert into table tmp values(2,"4");

-- In Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

In Hive 3, `set parquet.column.index.access = true/false` and
`set orc.force.positional.evolution = true/false` control how the table
is read in much the same way as these two session variables. However,
for a Parquet table in which a field inside a struct column has been
renamed, Hive and Doris behave differently.
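
The difference between the two modes can be illustrated with a small Python sketch. It is only a model of the behavior described above, not Doris code: `read_rows`, `scan`, and the in-memory "files" are hypothetical names introduced for illustration. Each data file keeps the column names it was written with, and the reader resolves table columns either by name or by ordinal position.

```python
from typing import Any, List, Optional, Tuple

def read_rows(table_columns: List[str],
              file_columns: List[str],
              file_rows: List[List[Any]],
              use_column_names: bool) -> List[List[Any]]:
    """Project the rows of one data file onto the current table schema."""
    projected_rows = []
    for row in file_rows:
        projected = []
        for position, name in enumerate(table_columns):
            idx: Optional[int]
            if use_column_names:
                # Resolve by name: a renamed column is missing in old files.
                idx = file_columns.index(name) if name in file_columns else None
            else:
                # Resolve by ordinal position in the table definition.
                idx = position if position < len(file_columns) else None
            projected.append(None if idx is None else row[idx])
        projected_rows.append(projected)
    return projected_rows

# Table schema after `alter table tmp change column a new_a int`.
table_columns = ["new_a", "b"]
# The first file was written before the rename, the second one after it.
files: List[Tuple[List[str], List[List[Any]]]] = [
    (["a", "b"], [[1, "2"]]),
    (["new_a", "b"], [[2, "4"]]),
]

def scan(use_column_names: bool) -> List[List[Any]]:
    rows: List[List[Any]] = []
    for file_columns, file_rows in files:
        rows.extend(read_rows(table_columns, file_columns, file_rows, use_column_names))
    return rows

by_name = scan(use_column_names=True)
by_position = scan(use_column_names=False)
print(by_name)      # the old row has no `new_a` column, so it reads as None
print(by_position)  # both rows are resolved by position
```

With name-based resolution the row written before the rename yields NULL for `new_a`, while position-based resolution returns the original value, matching the two query results shown in the example.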
2024-08-05 09:06:49 +08:00
53773ae6b7 [opt](join) check datatype of intermediate slots in hash join (#38556) (#38792)
## Proposed changes

pick #38556
2024-08-05 09:03:21 +08:00
8fa0710cb3 [branch-2.1](load) fix miss writer in concurrency incremental open (#38605) (#38793)
pick https://github.com/apache/doris/pull/38605
2024-08-05 08:56:23 +08:00
6035edad0b [fix](multi table) fix single stream multi table memory leak (#38255) (#38824)
pick (#38255)

We hit OOM when using single-stream multi-table load.


![image](https://github.com/user-attachments/assets/748e9914-d591-4f41-8b28-412d3cecc841)

There is a memory leak; the heap profile looks like this:


![image](https://github.com/user-attachments/assets/af30c593-88ea-44f6-bba1-82436b13f99f)

The stream load context is not released under some error conditions, for
example when planning fails because high concurrency causes a timeout
while obtaining the read lock. The leak was introduced by
https://github.com/apache/doris/pull/35458

The effect of the fix is shown in the following figure: the load runs
stably with a small amount of memory.


![image](https://github.com/user-attachments/assets/4483e0a5-6c0c-4cdc-b8ed-3408da6a86b2)
2024-08-04 22:12:44 +08:00
0603ec1d9d [enhancement](compaction) optimizing memory usage for compaction (#37099) (#37486)
2024-08-04 10:49:18 +08:00
7bdc508ac7 [Bug](fix) fix coredump case in (not null, null) execpt (not null, not null) case (#38756)
## Proposed changes

Issue Number: close #38612
2024-08-04 10:44:10 +08:00
64b69ed1ba [branch-2.1] Picks "[opt](merge-on-write) Skip the alignment process of some rowsets in partial update #38487" (#38682)
## Proposed changes

picks https://github.com/apache/doris/pull/38487
2024-08-02 20:05:31 +08:00
556f0fc784 [pick](json-keys) support json_keys function (#38631)
## Proposed changes
backport: https://github.com/apache/doris/pull/36411
2024-08-02 19:10:00 +08:00
9b07cd2069 [pick](json-serde)pick jsonb string deserialize with spec char (#38711)
## Proposed changes
backport: https://github.com/apache/doris/pull/37176
2024-08-02 13:37:41 +08:00
b3f335ba5f [enhancement](index compaction) Enable index compaction by default (#36812) (#38676)
## Proposed changes

bp #36812
2024-08-02 12:03:57 +08:00
1d982ada45 [pick](array-funcs)pick array func array_enumerate_uniq bugfix (#38721)
## Proposed changes
backport: https://github.com/apache/doris/pull/38384
2024-08-02 11:25:17 +08:00
f5bc65989c [pick](array-range)improve array_range func for large param (#38707)
## Proposed changes
backport: https://github.com/apache/doris/pull/38284
2024-08-02 11:22:46 +08:00
b7e1588be9 [pick](upgrade)fix log message (#38710)
## Proposed changes
backport: https://github.com/apache/doris/pull/38254
2024-08-02 11:20:20 +08:00
327069fdbc [branch-2.1](log) add tablet clear cache log (#38713)
2024-08-02 08:40:02 +08:00
0da388ade5 [fix](inverted index) fix match_phrase_ edge query result error #38327 (#38740)
2024-08-01 23:17:53 +08:00
4d980b8235 [feature](http action)Add http action to show nested inverted index file (#38272) (#38672)
backport #38272
2024-08-01 19:30:59 +08:00
3e5255a862 [pipeline](fix) Fix blocking task which is not triggered by 2nd RPC (… (#38694)
…#38568)

Once a query is cancelled for any reason, BE may never receive the 2nd
RPC from FE. In that case, we must ensure the execution dependency is
set ready so that tasks are not blocked.
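
The idea can be sketched in Python as follows. This is a simplified model under stated assumptions, not the BE implementation: `ExecutionDependency`, `Query`, `on_second_rpc`, and `run_task` are hypothetical names, and a `threading.Event` stands in for the real dependency object.

```python
import threading
from typing import List

class ExecutionDependency:
    """Hypothetical stand-in for a dependency that gates task execution."""
    def __init__(self) -> None:
        self._event = threading.Event()

    def set_ready(self) -> None:
        self._event.set()

    def wait(self, timeout: float) -> bool:
        return self._event.wait(timeout)

class Query:
    def __init__(self) -> None:
        self.execution_dependency = ExecutionDependency()
        self.cancelled = False

    def on_second_rpc(self) -> None:
        # Normal path: the 2nd RPC arrives and unblocks the tasks.
        self.execution_dependency.set_ready()

    def cancel(self) -> None:
        # Cancel path: the 2nd RPC may never arrive, so the dependency
        # is set ready here to let waiting tasks wake up and exit.
        self.cancelled = True
        self.execution_dependency.set_ready()

def run_task(query: Query, results: List[str]) -> None:
    if not query.execution_dependency.wait(timeout=5):
        results.append("blocked")
    elif query.cancelled:
        results.append("cancelled")
    else:
        results.append("executed")

query = Query()
results: List[str] = []
worker = threading.Thread(target=run_task, args=(query, results))
worker.start()
query.cancel()          # no 2nd RPC is ever delivered in this scenario
worker.join(timeout=10)
print(results)
```

Without the `set_ready()` call in `cancel()`, the task in this model would wait until its timeout instead of noticing the cancellation.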
2024-08-01 18:23:41 +08:00
82c681595e [fix](local exchange) Fix local exchange blocked by a huge data block… (#38693)
… (#38657)

If a huge block is pushed into the local exchanger, it will be blocked
due to a concurrency problem. This PR uses a unique lock to resolve it.
2024-08-01 18:04:19 +08:00
e8690b62ee [fix](group commit) Pick add debug log show why group commit not work; delete wal when replay success (#38611) (#38659)
Pick https://github.com/apache/doris/pull/38611
2024-08-01 16:59:54 +08:00
9d23ccf1f2 [Improvement](schema scan) Use async scanner for schema scanners (#38… (#38666)
…403)
2024-08-01 16:05:24 +08:00
4042cdf553 [Fix](memory) Fix allocator.h compiling failed on mac. (#38646)
Backport #38562. Fixes the allocator.h compile failure on mac, which was
introduced by #37257.
2024-08-01 13:56:53 +08:00
63a3ff570b [Opt](load) print tablet id when memtable flush coredump #38618 (#38656)
cherry pick from #38618
2024-08-01 13:52:50 +08:00
28998300d4 [Bug](fix) fix ubsan use int32_t pointer access bool value (#38621)
## Proposed changes

Issue Number: close #38617
2024-08-01 13:52:12 +08:00
338fa32303 [pick](simdjson) fix simdjson with object array when jsonroot is not empty (#38633)
## Proposed changes
backport: https://github.com/apache/doris/pull/38490
2024-08-01 11:04:54 +08:00
41fa7bc9fd [bugfix](paimon)Fixed the reading of timestamp with time zone type data for 2.1 (#37716) (#38592)
bp: #37716
2024-08-01 10:23:06 +08:00
184b8cbbe4 [pick](json)fix jsonb deseriaze (#38630)
## Proposed changes
backport: https://github.com/apache/doris/pull/37251
2024-08-01 10:18:27 +08:00
66ebf709ba [Fix](inverted index) fix fast execute for not_in expr #37745 (#38594)
cherry pick from #37745
2024-07-31 19:58:12 +08:00
7730aa2170 [Fix](inverted index) fix wrong no need read data when same column in inverted index and like function #36687 (#38581)
cherry pick from #36687
2024-07-31 19:41:39 +08:00
a75511ae08 [Feature](inverted index) add no need read data optimize config (#38584)
pick from #36686
2024-07-31 19:39:17 +08:00
232ee74566 [Fix](inverted index) fix memory leak for index compaction (#38586)
Pick from (#36209)
2024-07-31 19:19:38 +08:00
aed0cc8ba0 [Fix](inverted index) remove duplicate stats of inverted_index_query_cache_miss #36707 (#38580)
cherry pick from #36707
2024-07-31 19:18:58 +08:00
7357d7bd3b [Update](inverted index) Add column name to debug point for "no need to read data" optimization #37649 (#38579)
cherry pick from #37649
2024-07-31 19:17:46 +08:00
3b234cfab6 [performance](exec) Performance problem create too many scanner task (#38460)
## Proposed changes

cherry pick the pr: #38430
2024-07-31 14:34:01 +08:00
aa9bdd76d0 [Pick](Variant) pick some fix #38413 #38364 (#38512)
2024-07-31 11:03:31 +08:00
182bf4d323 [chore](fe) Returns dropped tables in GetMeta request (#38541)
Cherry-pick #38019
2024-07-31 10:57:00 +08:00
017dad8c54 [fix](type)support runtime predicate for time type (#38258) (#38465)
## Proposed changes
https://github.com/apache/doris/pull/38258
2024-07-31 10:27:36 +08:00
715bcd13f1 [opt](mow) opt mow lookup with sequence column (#38287) (#38406)
2024-07-30 09:46:09 +08:00
cefee4dbc0 [Pick 2.1](clucene) update clucene version (#38496)
## Proposed changes

backport #38482
2024-07-30 09:40:04 +08:00
17d351af80 [fix](csv reader) fix csv parser incorrect if enclosing line_delimiter (#38347) (#38445)
The CSV reader parses data incorrectly when an enclosed value contains
the line_delimiter. For example, with line_delimiter \n and enclose ',
given the following data:
```
'aaaaaaaaaaaa
bbbb'
```
it is parsed as two columns, `'aaaaaaaaaaaa` and `bbbb'`, rather
than as one column:
```
'aaaaaaaaaaaa
bbbb'
```

This happened because the CSV reader did not reset the result when the
enclose character was not matched within the current `output_buf_read`,
which caused an incorrect truncation.
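
The intended behavior can be shown with a minimal Python sketch. It is not the Doris CSV reader and does not model buffered reads via `output_buf_read`; `split_lines` is a hypothetical helper that assumes a single-character line delimiter and enclose character. It only contrasts naive splitting with splitting that tracks whether the scanner is inside an enclosed value.

```python
from typing import List

def split_lines(data: str, line_delimiter: str = "\n", enclose: str = "'") -> List[str]:
    """Split data into lines, ignoring delimiters inside an enclosed value."""
    lines: List[str] = []
    current: List[str] = []
    in_enclose = False
    for ch in data:
        if ch == enclose:
            # Toggle the enclosed state and keep the enclose character.
            in_enclose = not in_enclose
            current.append(ch)
        elif ch == line_delimiter and not in_enclose:
            # A delimiter outside an enclosed value ends the line.
            lines.append("".join(current))
            current = []
        else:
            current.append(ch)
    if current:
        lines.append("".join(current))
    return lines

data = "'aaaaaaaaaaaa\nbbbb'\n"
naive = data.split("\n")[:-1]   # splits inside the enclosed value
aware = split_lines(data)       # keeps the enclosed value in one piece
print(naive)
print(aware)
```

The naive split breaks the value at the embedded `\n`, while the enclose-aware version returns it as a single line.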

Co-authored-by: Xin Liao <liaoxinbit@126.com>
2024-07-29 14:55:45 +08:00
87cf2d1fb4 [fix](spill) Duplicate calls to Dependency::set_ready() in hash join(#37461) (#38399)
## Proposed changes

pick #37461
Calling `Dependency::set_ready()` more than once causes
pipeline tasks to be scheduled incorrectly.
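
One common way to guard against this class of problem is to make the ready transition idempotent. The Python sketch below illustrates that general technique only; it is an assumption for illustration and not necessarily how the PR fixes the issue (the fix may simply remove the duplicate call). The `Dependency` class and its `wakeups` counter here are hypothetical.

```python
import threading

class Dependency:
    """Hypothetical dependency whose ready transition happens at most once."""
    def __init__(self) -> None:
        self._lock = threading.Lock()
        self._ready = False
        self.wakeups = 0  # how many times waiting tasks were scheduled

    def set_ready(self) -> bool:
        """Mark the dependency ready; return True only on the first call."""
        with self._lock:
            if self._ready:
                # Already ready: do not schedule the waiting tasks again.
                return False
            self._ready = True
            self.wakeups += 1
            return True

dep = Dependency()
first = dep.set_ready()
second = dep.set_ready()
print(first, second, dep.wakeups)
```

In this model a second call is a no-op, so waiting tasks are woken exactly once.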
2024-07-29 09:44:48 +08:00
e9f12fac47 [fix](load) fix no error url for stream load #38325 (#38417)
cherry pick from #38325
2024-07-28 19:06:57 +08:00
d8744cd3d0 [Opt](load) don't print stack when some errors occur for stream load #38332 (#38418)
cherry pick from #38332
2024-07-28 19:04:24 +08:00
c93f3bd24e [Improvement](bloom filter) Forbid small bloom filter (#38349) (#38392)
A bloom filter reaches its expected filter ratio only when there is
enough data. This PR forbids bloom filters that are too small, because
their filter ratio is heavily biased.

pick #38349
2024-07-26 10:11:31 +08:00