ff6fa33021
[opt](inverted index) mow supports index optimization #( #38180 )
...
## Proposed changes
https://github.com/apache/doris/pull/37428
https://github.com/apache/doris/pull/37429
2024-08-06 11:18:13 +08:00
bcea54147c
[feature](inverted index) String type inverted index match function c… ( #38872 )
...
https://github.com/apache/doris/pull/38170
2024-08-06 09:06:05 +08:00
c7b59b38ef
[fix](hist) Fix unstable result of aggregate function hist #38608 ( #38893 )
...
cherry pick from #38608
2024-08-06 08:52:03 +08:00
e9bf0776d7
[fix](parquet) disable parquet page index by default #38691 ( #38901 )
...
bp #38691
2024-08-06 08:51:39 +08:00
70a518e099
[Fix](multi-catalog) Fix not throw error when call close() in hive/iceberg writer. ( #38902 )
...
## Proposed changes
[Fix](multi-catalog) Fix errors not being thrown when close() is called in
the hive/iceberg writer.
When a file writer's close() is called, it syncs its buffer to commit the
data. As a result, data is sometimes written only when close() is called,
which can surface errors at that point; hdfs_file_writer is one example.
These errors therefore need to be captured throughout the entire close process.
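The failure mode described above can be illustrated with a minimal Python sketch. It is not the Doris implementation: `BufferedWriter`, `failing_sink`, and both helper functions are hypothetical names chosen for illustration. The sketch only shows why an error raised during close() has to be reported to the caller rather than swallowed.

```python
class BufferedWriter:
    """Hypothetical writer that buffers data and only writes it on close()."""

    def __init__(self, sink):
        self._sink = sink          # callable that performs the real write
        self._buffer = []
        self.closed = False

    def write(self, chunk):
        # Nothing reaches the sink yet, so no write error can surface here.
        self._buffer.append(chunk)

    def close(self):
        try:
            # The real write happens only now, so errors appear only now.
            self._sink(b"".join(self._buffer))
        finally:
            self.closed = True


def failing_sink(data):
    # Assumed failure for the example, e.g. a remote file system error.
    raise IOError("disk full")


def close_ignoring_errors(writer):
    """Buggy pattern: the close() error is dropped and the load looks fine."""
    try:
        writer.close()
    except IOError:
        pass
    return "ok"


def close_and_report(writer):
    """Fixed pattern: the close() error is captured and reported."""
    try:
        writer.close()
    except IOError as e:
        return f"error: {e}"
    return "ok"
```

With a sink that fails, `close_ignoring_errors` still reports success even though no data was written, while `close_and_report` returns the error to the caller.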
2024-08-06 08:51:12 +08:00
0711423ee3
[Chore](pipeline) set PipelineFragmentContext::_timeout ( #38890 )
...
## Proposed changes
Doris uses `query_timeout` to set a timeout value for queries. However, the
pipelineX engine does not apply it, so a query will not end before
EOS. This PR fixes that.
pick #35328
2024-08-05 21:47:08 +08:00
9d5af7febd
[opt](inverted index) Optimization of the initialization process in topn ( #38870 )
...
pick https://github.com/apache/doris/pull/37722
2024-08-05 18:26:00 +08:00
bf1c7a1c15
[fix](clone) fix stale tablet report miss the new cloning replica #38695 ( #38839 )
...
cherry pick from #38695
2024-08-05 18:04:24 +08:00
0f69a2a47f
[fix](compaction) fix mismatch between segment key and value column rows during compaction ( #37960 )( #38251 )( #38356 ) ( #38835 )
...
pick master #37960 #38251 #38356
2024-08-05 16:48:08 +08:00
4c75fecea9
[fix](compile) be compile failed in mac due to std::max ( #37238 ) ( #38860 )
...
cherry-pick #37238 to branch-2.1
2024-08-05 16:31:39 +08:00
bb962a8291
[minor](fix) Fix incorrect fmt arguments ( #38840 ) ( #38861 )
...
pick #38840
2024-08-05 16:06:32 +08:00
65154f8abe
[branch-2.1] (doris-future) Support auto partition name function ( #38853 )
...
cherry-pick https://github.com/apache/doris/pull/34258 to branch-2.1
2024-08-05 16:04:24 +08:00
86ef0069ea
[Feature](function) support group concat with distinct and order by ( #38851 )
...
pick from #38744 and #38776
2024-08-05 15:44:51 +08:00
607c0b82a9
[opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. ( #37377 ) ( #38245 ) ( #38810 )
...
## Proposed changes
pick pr: #38575 and fix this pr bug : #38245
2024-08-05 09:13:08 +08:00
2653087843
[pick](array-funcs)fix array with empty arg in be behavior ( #38708 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/36845
2024-08-05 09:08:28 +08:00
1b3d4b4d31
[cherry-pick](branch-21)fix operator do_projections should use local_state intermediate_projections ( #38612 ) ( #38765 )
...
## Proposed changes
cherry-pick from master https://github.com/apache/doris/pull/38612
2024-08-05 09:07:16 +08:00
5d02c48715
[feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. ( #38432 ) ( #38809 )
...
bp #38432
## Proposed changes
Add `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables for reading a table after a column has been renamed in `Hive`.
These two session variables are modeled on
`parquet_use_column_names` and `orc_use_column_names` of the `Trino` hive
connector.
By default, both session variables are true. When they are set to
false, orc/parquet reads access columns by their
ordinal position in the Hive table definition.
For example:
```mysql
-- In Hive:
hive> create table tmp (a int , b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp change column a new_a int;
hive> insert into table tmp values(2,"4");

-- In Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)
mysql> select * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```
In Hive 3, you can use
`set parquet.column.index.access/orc.force.positional.evolution = true/false`
to control how the table is read, in the same way as these two
session variables. However, for a Parquet table in which a field inside a
struct column has been renamed, Hive and Doris behave differently.
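The difference between name-based and position-based column access can be sketched in a few lines of Python. This is only a model of the behaviour shown in the example, not Doris reader code; `read_rows` and the in-memory `files` layout are hypothetical. Each file carries the column names it was written with, and the table schema is the one after the rename.

```python
def read_rows(files, table_columns, use_column_names):
    """Resolve each table column against every file, by name or by position."""
    rows = []
    for file_columns, file_rows in files:
        for values in file_rows:
            row = []
            for position, name in enumerate(table_columns):
                if use_column_names:
                    # Name-based: a column missing from the file reads as NULL.
                    idx = file_columns.index(name) if name in file_columns else None
                else:
                    # Position-based: use the ordinal position in the table schema.
                    idx = position if position < len(file_columns) else None
                row.append(values[idx] if idx is not None else None)
            rows.append(tuple(row))
    return rows


# The first file was written before the rename, the second one after it.
files = [
    (["a", "b"], [(1, "2")]),
    (["new_a", "b"], [(2, "4")]),
]
table_columns = ["new_a", "b"]

by_name = read_rows(files, table_columns, use_column_names=True)
by_position = read_rows(files, table_columns, use_column_names=False)
print(by_name)      # [(None, '2'), (2, '4')]
print(by_position)  # [(1, '2'), (2, '4')]
```

Name-based lookup cannot find `new_a` in the older file and yields NULL, whereas positional lookup reads the first column regardless of its name, which matches the two result sets in the example.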
2024-08-05 09:06:49 +08:00
53773ae6b7
[opt](join) check datatype of intermediate slots in hash join ( #38556 ) ( #38792 )
...
## Proposed changes
pick #38556
2024-08-05 09:03:21 +08:00
8fa0710cb3
[branch-2.1](load) fix miss writer in concurrency incremental open ( #38605 ) ( #38793 )
...
pick https://github.com/apache/doris/pull/38605
2024-08-05 08:56:23 +08:00
6035edad0b
[fix](multi table) fix single stream multi table memory leak ( #38255 ) ( #38824 )
...
pick (#38255 )
We hit OOM when using single stream multi table load.
There is a memory leak, which shows up in the heap profile.
The stream load context is not released under some exception conditions,
for example when planning fails because high concurrency causes a timeout
while obtaining the read lock. This was introduced by https://github.com/apache/doris/pull/35458
With the fix, the load runs stably with a small amount of memory.

2024-08-04 22:12:44 +08:00
0603ec1d9d
[enhancement](compaction) optimizing memory usage for compaction ( #37099 ) ( #37486 )
2024-08-04 10:49:18 +08:00
7bdc508ac7
[Bug](fix) fix coredump case in (not null, null) except (not null, not null) case ( #38756 )
...
## Proposed changes
Issue Number: close #38612
2024-08-04 10:44:10 +08:00
64b69ed1ba
[branch-2.1] Picks "[opt](merge-on-write) Skip the alignment process of some rowsets in partial update #38487 " ( #38682 )
...
## Proposed changes
picks https://github.com/apache/doris/pull/38487
2024-08-02 20:05:31 +08:00
556f0fc784
[pick](json-keys) support json_keys function ( #38631 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/36411
2024-08-02 19:10:00 +08:00
9b07cd2069
[pick](json-serde)pick jsonb string deserialize with spec char ( #38711 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/37176
2024-08-02 13:37:41 +08:00
b3f335ba5f
[enhancement](index compaction) Enable index compaction by default ( #36812 ) ( #38676 )
...
## Proposed changes
bp #36812
2024-08-02 12:03:57 +08:00
1d982ada45
[pick](array-funcs)pick array func array_enumerate_uniq bugfix ( #38721 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/38384
2024-08-02 11:25:17 +08:00
f5bc65989c
[pick](array-range)improve array_range func for large param ( #38707 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/38284
2024-08-02 11:22:46 +08:00
b7e1588be9
[pick](upgrade)fix log message ( #38710 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/38254
2024-08-02 11:20:20 +08:00
327069fdbc
[branch-2.1](log) add tablet clear cache log ( #38713 )
2024-08-02 08:40:02 +08:00
0da388ade5
[fix](inverted index) fix match_phrase_edge query result error #38327 ( #38740 )
2024-08-01 23:17:53 +08:00
4d980b8235
[feature](http action)Add http action to show nested inverted index file ( #38272 ) ( #38672 )
...
backport #38272
2024-08-01 19:30:59 +08:00
3e5255a862
[pipeline](fix) Fix blocking task which is not triggered by 2nd RPC (… ( #38694 )
...
…#38568)
Once a query is cancelled for any reason, BE may not receive the 2nd RPC
from FE. If so, we must ensure the execution dependency is ready so that
tasks will not be blocked.
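The idea can be sketched with a small Python model. It is an illustration under assumptions, not the pipeline engine's code: `ExecutionDependency`, `Query`, and `run_task` are hypothetical names, and a `threading.Event` stands in for the dependency that the 2nd RPC would normally mark ready.

```python
import threading


class ExecutionDependency:
    """Hypothetical dependency that the 2nd RPC normally marks ready."""

    def __init__(self):
        self._ready = threading.Event()

    def set_ready(self):
        self._ready.set()

    def wait(self, timeout=None):
        return self._ready.wait(timeout)


class Query:
    def __init__(self):
        self.dependency = ExecutionDependency()
        self.cancelled = False

    def on_second_rpc(self):
        # Normal path: the 2nd RPC arrives and unblocks execution.
        self.dependency.set_ready()

    def cancel(self):
        self.cancelled = True
        # The 2nd RPC may never arrive after a cancel, so mark the
        # dependency ready here to keep waiting tasks from blocking forever.
        self.dependency.set_ready()


def run_task(query, results):
    query.dependency.wait()
    results.append("cancelled" if query.cancelled else "executed")


query = Query()
results = []
worker = threading.Thread(target=run_task, args=(query, results))
worker.start()
query.cancel()          # no 2nd RPC is ever delivered in this run
worker.join(timeout=5)
```

Without the `set_ready()` call inside `cancel()`, the worker would wait indefinitely for an RPC that never comes.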
2024-08-01 18:23:41 +08:00
82c681595e
[fix](local exchange) Fix local exchange blocked by a huge data block… ( #38693 )
...
… (#38657 )
If a huge block is pushed into the local exchanger, it will be blocked due to
concurrency problems. This PR uses a unique lock to resolve it.
2024-08-01 18:04:19 +08:00
e8690b62ee
[fix](group commit) Pick add debug log show why group commit not work; delete wal when replay success ( #38611 ) ( #38659 )
...
Pick https://github.com/apache/doris/pull/38611
2024-08-01 16:59:54 +08:00
9d23ccf1f2
[Improvement](schema scan) Use async scanner for schema scanners (#38… ( #38666 )
...
…403)
2024-08-01 16:05:24 +08:00
4042cdf553
[Fix](memory) Fix allocator.h compiling failed on mac. ( #38646 )
...
Backport #38562. Fix the allocator.h compile failure on Mac, which was
introduced by #37257.
2024-08-01 13:56:53 +08:00
63a3ff570b
[Opt](load) print tablet id when memtable flush coredump #38618 ( #38656 )
...
cherry pick from #38618
2024-08-01 13:52:50 +08:00
28998300d4
[Bug](fix) fix ubsan use int32_t pointer access bool value ( #38621 )
...
## Proposed changes
Issue Number: close #38617
2024-08-01 13:52:12 +08:00
338fa32303
[pick](simdjson) fix simdjson with object array when jsonroot is not empty ( #38633 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/38490
2024-08-01 11:04:54 +08:00
41fa7bc9fd
[bugfix](paimon)Fixed the reading of timestamp with time zone type data for 2.1 ( #37716 ) ( #38592 )
...
bp: #37716
2024-08-01 10:23:06 +08:00
184b8cbbe4
[pick](json)fix jsonb deserialize ( #38630 )
...
## Proposed changes
backport: https://github.com/apache/doris/pull/37251
2024-08-01 10:18:27 +08:00
66ebf709ba
[Fix](inverted index) fix fast execute for not_in expr #37745 ( #38594 )
...
cherry pick from #37745
2024-07-31 19:58:12 +08:00
7730aa2170
[Fix](inverted index) fix wrong no need read data when same column in inverted index and like function #36687 ( #38581 )
...
cherry pick from #36687
2024-07-31 19:41:39 +08:00
a75511ae08
[Feature](inverted index) add no need read data optimize config ( #38584 )
...
pick from #36686
2024-07-31 19:39:17 +08:00
232ee74566
[Fix](inverted index) fix memory leak for index compaction ( #38586 )
...
Pick from (#36209 )
2024-07-31 19:19:38 +08:00
aed0cc8ba0
[Fix](inverted index) remove duplicate stats of inverted_index_query_cache_miss #36707 ( #38580 )
...
cherry pick from #36707
2024-07-31 19:18:58 +08:00
7357d7bd3b
[Update](inverted index) Add column name to debug point for "no need to read data" optimization #37649 ( #38579 )
...
cherry pick from #37649
2024-07-31 19:17:46 +08:00
3b234cfab6
[performance](exec) Performance problem create too many scanner task ( #38460 )
...
## Proposed changes
cherry pick the pr: #38430
2024-07-31 14:34:01 +08:00
aa9bdd76d0
[Pick](Variant) pick some fix #38413 #38364 ( #38512 )
2024-07-31 11:03:31 +08:00