Commit Graph

19668 Commits

Author SHA1 Message Date
ff6fa33021 [opt](inverted index) mow supports index optimization #(#38180)
## Proposed changes

https://github.com/apache/doris/pull/37428
https://github.com/apache/doris/pull/37429

2024-08-06 11:18:13 +08:00
ab3057b2d4 [Feat](nereids) support date function in partition prune (#38743) (#38898)
cherry-pick #38743 to branch-2.1
2024-08-06 09:13:13 +08:00
bcea54147c [feature](inverted index) String type inverted index match function c… (#38872)
https://github.com/apache/doris/pull/38170
2024-08-06 09:06:05 +08:00
c7b59b38ef [fix](hist) Fix unstable result of aggregrate function hist #38608 (#38893)
cherry pick from #38608
2024-08-06 08:52:03 +08:00
e9bf0776d7 [fix](parquet) disable parquet page index by default #38691 (#38901)
bp #38691
2024-08-06 08:51:39 +08:00
70a518e099 [Fix](multi-catalog) Fix not throw error when call close() in hive/iceberg writer. (#38902)
## Proposed changes
[Fix](multi-catalog) Report errors raised by close() in the hive/iceberg
writer instead of ignoring them.

When a file writer calls close(), it syncs its buffer to commit the data.
Some writers, for example hdfs_file_writer, therefore write data only when
close() is called, so errors can first appear at that point. These errors
need to be caught throughout the close process.
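
A minimal sketch of the behaviour described above, assuming a toy writer that only buffers on write() and performs its first real I/O in close(). `BufferedWriter`, `failing_sink`, and `write_and_commit` are hypothetical names for illustration and are not Doris code.

```python
class BufferedWriter:
    """Toy writer: write() only buffers; close() flushes to the sink."""

    def __init__(self, sink):
        self._sink = sink      # hypothetical callable that persists bytes
        self._buffer = []
        self.closed = False

    def write(self, data: bytes) -> None:
        # Nothing reaches the sink yet, so sink errors cannot surface here.
        self._buffer.append(data)

    def close(self) -> None:
        # The first real I/O happens here; any exception propagates to the caller.
        try:
            self._sink(b"".join(self._buffer))
        finally:
            self.closed = True
            self._buffer.clear()


def failing_sink(payload: bytes) -> None:
    raise IOError("simulated storage failure")


def write_and_commit(sink, chunks) -> str:
    """Return 'committed' only if every step, including close(), succeeded."""
    writer = BufferedWriter(sink)
    try:
        for chunk in chunks:
            writer.write(chunk)
        writer.close()
    except IOError as exc:
        return f"aborted: {exc}"
    return "committed"
```

With `failing_sink`, every write() succeeds and the failure only appears in close(), which is why the caller has to treat close() as part of the commit path.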
2024-08-06 08:51:12 +08:00
3b9394a8c7 [improvement](tablet scheduler) Adjust tablet sched priority to help load data succ #38528 (#38884)
cherry pick from #38528
2024-08-06 02:13:47 +08:00
0711423ee3 [Chore](pipeline) set PipelineFragmentContext::_timeout (#38890)
## Proposed changes

`query_timeout` sets a timeout value for queries. The pipelineX engine
does not apply it, however, so a query never ends before EOS. This PR
fixes that.
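
As a rough illustration of what honouring a timeout means here, the sketch below checks a deadline between batches instead of running until the end of the stream. `run_fragment`, `QueryTimeout`, and the injectable `clock` are hypothetical. This is not how `PipelineFragmentContext::_timeout` is implemented.

```python
import time


class QueryTimeout(Exception):
    """Raised when a query runs past its deadline."""


def run_fragment(batches, timeout_s, clock=time.monotonic):
    """Consume batches until end of stream or until the deadline passes."""
    deadline = clock() + timeout_s
    rows = 0
    for batch in batches:
        # Without this check the loop would only stop at end of stream.
        if clock() >= deadline:
            raise QueryTimeout(f"query exceeded {timeout_s}s")
        rows += len(batch)
    return rows
```

Passing a fake `clock` makes the deadline behaviour easy to exercise without real waiting.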

pick #35328

2024-08-05 21:47:08 +08:00
9c020f9db1 [fix](fe) Fix the default value of ReplacePartitionClause.isStrictRange (#38688) (#38879) 2024-08-05 20:59:50 +08:00
ce75e6adfe [fix](group commit) Fix group commit debug log and improve performance (#38754) (#38841)
Pick https://github.com/apache/doris/pull/38754
2024-08-05 18:34:49 +08:00
0f0b0e9b37 [Feat](nereids) Support date_trunc function in partition prune (#38025) (#38849)
cherry-pick #38025 to branch-2.1
2024-08-05 18:29:10 +08:00
9d5af7febd [opt](inverted index) Optimization of the initialization process in topn (#38870)
pick https://github.com/apache/doris/pull/37722
2024-08-05 18:26:00 +08:00
40567b5d69 [fix](nereids)support group_concat with distinct and order by (#38871)
## Proposed changes

pick from master https://github.com/apache/doris/pull/38080

2024-08-05 18:23:55 +08:00
bf1c7a1c15 [fix](clone) fix stale tablet report miss the new cloning replica #38695 (#38839)
cherry pick from #38695
2024-08-05 18:04:24 +08:00
0f69a2a47f [fix](compaction) fix mismatch between segment key and value column rows during compaction (#37960)(#38251)(#38356) (#38835)
pick master #37960 #38251 #38356
2024-08-05 16:48:08 +08:00
994c56f914 [fix](txn) fix abortTxn by label does not acquire table write lock (#38777) (#38842)
pick https://github.com/apache/doris/pull/38777
2024-08-05 16:33:20 +08:00
7d4ff34d1f [fix](regression) fix test_primary_key_simple_case (#38798) (#38844)
pick https://github.com/apache/doris/pull/38798
2024-08-05 16:32:41 +08:00
4c75fecea9 [fix](compile) be compile failed in mac due to std::max (#37238) (#38860)
cherry-pick #37238 to branch-2.1
2024-08-05 16:31:39 +08:00
bb962a8291 [minor](fix) Fix incorrect fmt arguments (#38840) (#38861)
pick #38840
2024-08-05 16:06:32 +08:00
65154f8abe [branch-2.1] (doris-future) Support auto partition name function (#38853)
cherry-pick https://github.com/apache/doris/pull/34258 to branch-2.1
2024-08-05 16:04:24 +08:00
Pxl
86ef0069ea [Feature](function) support group concat with distinct and order by (#38851)
pick from #38744 and #38776
2024-08-05 15:44:51 +08:00
5dfc5d2c77 [enhancement](querycancel) print detail message when query is cancelled (#38859)
---------

Co-authored-by: yiguolei <yiguolei@gmail.com>
2024-08-05 14:47:03 +08:00
808397e0d2 [fix](testcase) add order by to fix unstable output of passwordLeaked #38813 (#38855)
cherry pick from #38813
2024-08-05 13:51:54 +08:00
aaee1d9bbd [fix](regression) fix prepare_insert when execute prepare stmt in observer fe (#38545) (#38850)
pick https://github.com/apache/doris/pull/38545
2024-08-05 13:45:13 +08:00
de9b9d6a39 [Fix](nereids) change char(0) to char(1), varchar(0) to varchar(65533) when create table (#38427) (#38530)
cherry-pick #38427 to branch-2.1
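
The rule in the commit title can be sketched as a small normalisation step. `normalize_length` is a hypothetical helper; only the two replacement values come from the title.

```python
def normalize_length(type_name: str, length: int) -> int:
    """Replace a zero length with the value named in the commit title."""
    replacements = {"char": 1, "varchar": 65533}
    key = type_name.lower()
    if length == 0 and key in replacements:
        return replacements[key]
    # Any other type or length is left unchanged.
    return length
```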

---------

Co-authored-by: morrySnow <101034200+morrySnow@users.noreply.github.com>
2024-08-05 09:18:18 +08:00
9430b27e68 [branch-2.1][improvement](jdbc catalog) improvement some jdbc catalog properties check order (#38770)
pick (#38439)

1. Move the execution of testJdbcConnection() to checkWhenCreating
instead of the constructor
2. Move the logic of renaming lower_case_table_names to
lower_case_meta_names to setDefaultPropsIfMissing
2024-08-05 09:14:04 +08:00
607c0b82a9 [opt](serde)Optimize the filling of fixed values into block columns without repeated deserialization. (#37377) (#38245) (#38810)
## Proposed changes
pick pr: #38575 and fix this pr bug: #38245
2024-08-05 09:13:08 +08:00
2653087843 [pick](array-funcs)fix array with empty arg in be behavior (#38708)
## Proposed changes
backport: https://github.com/apache/doris/pull/36845
2024-08-05 09:08:28 +08:00
1b3d4b4d31 [cherry-pick](branch-21)fix operator do_projections should use local_state intermediate_projections (#38612) (#38765)
## Proposed changes

cherry-pick from master https://github.com/apache/doris/pull/38612

2024-08-05 09:07:16 +08:00
5d02c48715 [feature](hive)Support reading renamed Parquet Hive and Orc Hive tables. (#38432) (#38809)
bp #38432 

## Proposed changes
Add the `hive_parquet_use_column_names` and `hive_orc_use_column_names`
session variables for reading `Hive` tables after a column has been renamed.

These two session variables are modeled on `parquet_use_column_names`
and `orc_use_column_names` of the `Trino` hive connector.

By default, these two session variables are true. When they are set to
false, reading orc/parquet accesses columns by their ordinal position
in the Hive table definition.

For example:
```mysql
-- In Hive:
hive> create table tmp (a int, b string) stored as parquet;
hive> insert into table tmp values(1,"2");
hive> alter table tmp change column a new_a int;
hive> insert into table tmp values(2,"4");

-- In Doris:
mysql> set hive_parquet_use_column_names=true;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|  NULL | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)

mysql> set hive_parquet_use_column_names=false;
Query OK, 0 rows affected (0.00 sec)

mysql> select  * from tmp;
+-------+------+
| new_a | b    |
+-------+------+
|     1 | 2    |
|     2 | 4    |
+-------+------+
2 rows in set (0.02 sec)
```

In Hive 3, you can use
`set parquet.column.index.access/orc.force.positional.evolution = true/false`
to control how the table is read, in the same way as these two session
variables. However, for a parquet table where a struct inside a column has
been renamed, Hive and Doris behave differently.
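
The difference between the two modes can be sketched with the `tmp` table from the example: one data file written before the rename (columns `a`, `b`) and one written after (columns `new_a`, `b`). `read_rows` and `scan` are hypothetical helpers that model only the column lookup, not the actual Parquet/ORC reader.

```python
def read_rows(table_columns, file_columns, file_rows, use_column_names):
    """Project rows from one data file onto the current table schema."""
    result = []
    for row in file_rows:
        projected = []
        for position, name in enumerate(table_columns):
            if use_column_names:
                # Match by name; a column absent from the file reads as NULL.
                idx = file_columns.index(name) if name in file_columns else None
            else:
                # Match by ordinal position in the table definition.
                idx = position if position < len(file_columns) else None
            projected.append(None if idx is None else row[idx])
        result.append(tuple(projected))
    return result


table = ["new_a", "b"]
old_file = (["a", "b"], [(1, "2")])      # written before the rename
new_file = (["new_a", "b"], [(2, "4")])  # written after the rename


def scan(use_column_names):
    rows = []
    for columns, data in (old_file, new_file):
        rows += read_rows(table, columns, data, use_column_names)
    return rows
```

`scan(True)` yields `(None, "2")` for the old file because it has no column named `new_a`, while `scan(False)` yields `(1, "2")`, matching the two result sets shown above.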
2024-08-05 09:06:49 +08:00
53773ae6b7 [opt](join) check datatype of intermediate slots in hash join (#38556) (#38792)
## Proposed changes

pick #38556
2024-08-05 09:03:21 +08:00
40767003c6 [Fix](ScanNode) Move the finalize phase of ScanNode to after the end of the Physical Translate phase (#38604)
bp: #37565

Currently, Doris first obtains splits and then performs projection.
After column pruning, it calls `updateRequiredSlots` to update the
scanRange information. However, the Trino connector's column pruning
pushdown needs to be completed before obtaining splits.

Therefore, we move the finalize phase of `ScanNode` to after the end of
the `Physical Translate` phase, so that `createScanRangeLocations` can
use the final columns after pruning.

2024-08-05 08:58:59 +08:00
8fa0710cb3 [branch-2.1](load) fix miss writer in concurrency incremental open (#38605) (#38793)
pick https://github.com/apache/doris/pull/38605
2024-08-05 08:56:23 +08:00
f76397277e [fix](routine load) fix show routine load task result incorrect (#38523) (#38826)
pick (#38523)

Create a job:
```
CREATE ROUTINE LOAD testShow ON test_show_routine_load
COLUMNS TERMINATED BY ","
PROPERTIES
(
"max_batch_interval" = "5",
"max_batch_rows" = "300000",
"max_batch_size" = "209715200"
)
FROM KAFKA
(
"kafka_broker_list" = "127.0.0.1:19092",
"kafka_topic" = "test_show_routine_load",
"property.kafka_default_offsets" = "OFFSET_BEGINNING"
);
```
show routine load task:
```
SHOW ROUTINE LOAD TASK WHERE JobName = "testShow";
```
result:
```
ERROR 1105 (HY000): errCode = 2, detailMessage = The job named testshowdoes not exists or job state is stopped or cancelled
```

The fix is to stop using the `toLowerCase` method.
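
A small sketch of why lowercasing breaks the lookup, assuming job names are stored exactly as created and compared case-sensitively. The `jobs` dict and both functions are hypothetical and stand in for the FE job lookup.

```python
jobs = {"testShow": "RUNNING"}  # job names are stored exactly as created


def find_job_lowercased(name):
    # Lowercasing the requested name turns "testShow" into "testshow",
    # which no longer matches the stored key.
    return jobs.get(name.lower())


def find_job(name):
    # Comparing the name as given finds the job.
    return jobs.get(name)
```

This mirrors the error message above, which reports the job as `testshow` even though it was created as `testShow`.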
2024-08-04 22:18:25 +08:00
79b07d0b8a [fix](routine load) fix enclose and escape can not set in routine load job (#38402) (#38825)
pick (#38402)
2024-08-04 22:17:12 +08:00
6035edad0b [fix](multi table) fix single stream multi table memory leak (#38255) (#38824)
pick (#38255)

We hit an OOM when using single stream multi table.


![image](https://github.com/user-attachments/assets/748e9914-d591-4f41-8b28-412d3cecc841)

There is a memory leak; the heap profile looks like this:


![image](https://github.com/user-attachments/assets/af30c593-88ea-44f6-bba1-82436b13f99f)

The stream load context is not released under some exception conditions,
for example when planning fails because high concurrency causes a timeout
while obtaining the read lock. The leak was introduced by
https://github.com/apache/doris/pull/35458
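
The leak pattern described above can be sketched as follows. This is an illustration only, not the actual fix: `ContextRegistry`, `plan`, `load_leaky`, and `load_safe` are hypothetical names, and the registry stands in for whatever holds stream load contexts.

```python
class ContextRegistry:
    """Tracks live contexts so a leak shows up as a non-zero count."""

    def __init__(self):
        self._live = {}

    def register(self, ctx_id, payload):
        self._live[ctx_id] = payload

    def release(self, ctx_id):
        self._live.pop(ctx_id, None)

    def live_count(self):
        return len(self._live)


def plan(should_fail):
    if should_fail:
        raise TimeoutError("timed out waiting for read lock")


def load_leaky(registry, ctx_id, should_fail):
    registry.register(ctx_id, bytearray(1024))
    plan(should_fail)          # an exception here skips the release below
    registry.release(ctx_id)


def load_safe(registry, ctx_id, should_fail):
    registry.register(ctx_id, bytearray(1024))
    try:
        plan(should_fail)
    finally:
        # Released on both the success path and the exception path.
        registry.release(ctx_id)
```

Under repeated planning failures, `load_leaky` keeps one context alive per failed call, while `load_safe` leaves the registry empty.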

The effect of the fix is shown in the following figure; it now runs
stably with a small amount of memory.


![image](https://github.com/user-attachments/assets/4483e0a5-6c0c-4cdc-b8ed-3408da6a86b2)
2024-08-04 22:12:44 +08:00
8e4fad99a1 [test](routine load) add routine load case with timestamp as offset(#38567) (#38822)
pick (#38567)
2024-08-04 22:05:19 +08:00
eef8c87fb5 [chore](test) disable fault injection to make pipeline task check happy (#38665) (#38821)
pick (#38665)

test_delta_writer_v2_back_pressure_fault_injection prevents the pipeline
task from finishing, so disable it temporarily to make the pipeline task
check pass.
2024-08-04 11:18:56 +08:00
7c70f75198 [Fix](Load)Audit logs avoid recording certain sensitive information #38769 (#38784)
…

## Proposed changes

#38769

2024-08-04 10:53:03 +08:00
0603ec1d9d [enhancement](compaction) optimizing memory usage for compaction (#37099) (#37486) 2024-08-04 10:49:18 +08:00
7bdc508ac7 [Bug](fix) fix coredump case in (not null, null) execpt (not null, not null) case (#38756)
## Proposed changes

Issue Number: close #38612

2024-08-04 10:44:10 +08:00
c0caca7c55 [fix](ES Catalog)Fix unstable test test_es_query (#38801) (#38802)
## Proposed changes

bp #38801
2024-08-03 23:49:00 +08:00
fe3e3d0fab [fix](test)Fix build index fault test (#38736) (#38762)
## Proposed changes

backport #38736
2024-08-03 23:48:29 +08:00
74908c123a [fix](test)Fix unstable test drop index fault #38768 (#38772)
## Proposed changes

bp #38768

2024-08-03 23:47:55 +08:00
64b69ed1ba [branch-2.1] Picks "[opt](merge-on-write) Skip the alignment process of some rowsets in partial update #38487" (#38682)
## Proposed changes

picks https://github.com/apache/doris/pull/38487
2024-08-02 20:05:31 +08:00
556f0fc784 [pick](json-keys) support json_keys function (#38631)
## Proposed changes
backport: https://github.com/apache/doris/pull/36411
2024-08-02 19:10:00 +08:00
2425730609 [enhance](auth)support cache ranger datamask and row filter (#37723) (#38575)
pick: https://github.com/apache/doris/pull/37723
2024-08-02 14:59:32 +08:00
f24d55fc94 [fix](syntax) multi statements must delim with semicolon (#38670) (#38753)
pick from master #38670
2024-08-02 14:49:51 +08:00
da7b2cf578 [refactor](catalog) set "use_meta_cache" default to true (#38244)(#38352)(#38619) (#38355)
bp #38244 #38352 #38619

---------

Co-authored-by: Yulei-Yang <yulei.yang0699@gmail.com>
2024-08-02 14:13:38 +08:00
9b07cd2069 [pick](json-serde)pick jsonb string deserialize with spec char (#38711)
## Proposed changes
backport: https://github.com/apache/doris/pull/37176
2024-08-02 13:37:41 +08:00