Commit Graph

19261 Commits

Author SHA1 Message Date
969f7532d6 [fix](deps) fix NoSuchMethodError: newInstanceFromKeytab when use kerberos (#37322) 2024-07-08 10:19:02 +08:00
97e4025ee0 [branch-2.1](routine-load) increase routine load job default max batch size and rows (#37388)
pick #36632

Most users only care about **max_batch_interval**, but to actually get an
interval-driven batching effect they also have to configure
**max_batch_rows** and **max_batch_size** according to the
characteristics of their data. By raising these two default values,
users no longer need to worry about this configuration in most scenarios.
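
A routine load job that sets these batch properties explicitly might look like the sketch below; the database, table, topic, and broker names are placeholders, and the property values are illustrative only, not the new defaults from this PR.

```sql
-- Hypothetical job: all object names and values are placeholders.
CREATE ROUTINE LOAD example_db.example_job ON example_tbl
PROPERTIES
(
    -- after this change, most users can rely on the defaults for
    -- max_batch_rows / max_batch_size and tune only the interval:
    "max_batch_interval" = "20",
    "max_batch_rows" = "300000",
    "max_batch_size" = "1073741824"
)
FROM KAFKA
(
    "kafka_broker_list" = "broker1:9092",
    "kafka_topic" = "example_topic"
);
```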

## Proposed changes

Issue Number: close #xxx

<!--Describe your changes.-->
2024-07-07 18:35:08 +08:00
a05406ecc9 [branch-2.1] Picks "[Fix](delete) Fix delete job timeout when executing delete from ... #37363" (#37374)
## Proposed changes

picks https://github.com/apache/doris/pull/37363
2024-07-07 18:33:17 +08:00
423483ed8f [branch-2.1](routine-load) optimize out of range error message (#37391)
## Proposed changes
pick #36450

before
```
ErrorReason{code=errCode = 105, msg='be 10002 abort task, task id: d846f3d3-7c9e-44a7-bee0-3eff8cd11c6f job id: 11310 with reason: [INTERNAL_ERROR]Offset out of range,

        0#  doris::Status doris::Status::Error<6, true>(std::basic_string_view<char, std::char_traits<char> >) at /mnt/disk1/laihui/doris/be/src/common/status.h:422
        1#  doris::Status doris::Status::InternalError<true>(std::basic_string_view<char, std::char_traits<char> >) at /mnt/disk1/laihui/doris/be/src/common/status.h:468
        2#  doris::KafkaDataConsumer::group_consume(doris::BlockingQueue<RdKafka::Message*>*, long) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/data_consumer.cpp:226
        3#  doris::KafkaDataConsumerGroup::actual_consume(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message*>*, long, std::function<void (doris::Status const&)>) at /mnt/disk1/laihui/doris/be/src/runtime/routine_load/data_consumer_group.cpp:200
        4#  void std::__invoke_impl<void, void (doris::KafkaDataConsumerGroup::*&)(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message*>*, long, std::function<void (doris::Status const&)>), doris::KafkaDataConsumerGroup*&, std::shared_ptr<doris::DataConsumer>&, doris::BlockingQueue<RdKafka::Message*>*&, long&, doris::KafkaDataConsumerGroup::start_all(std::shared_ptr<doris::StreamLoadContext>, std::shared_ptr<doris::io::KafkaConsumerPipe>)::$_0&>(std::__invoke_memfun_deref, void (doris::KafkaDataConsumerGroup::*&)(std::shared_ptr<doris::DataConsumer>, doris::BlockingQueue<RdKafka::Message*>*, long, std::function<void (doris::Status const&)>), doris::KafkaDataConsumerGroup*&, std::shared_ptr<doris::DataConsumer>&, doris::BlockingQueue<RdKafka::Message*>*&, long&, doris::KafkaDataConsumerGroup::start_all(std::shared_ptr<doris::StreamLoadContext>, std::shared_ptr<doris::io::KafkaConsumerPipe>)::$_0&) at /mnt/disk1/laihui/build/ldb_toolchain/bin/../lib/gcc/x86_64-linux-gnu/11/../../../../include/c++/11/bits/invoke.h:74
...
```

now
```
ErrorReason{code=errCode = 105, msg='be 10002 abort task, task id: 3ba0c0f4-d13c-4dfa-90ce-3df922fd9340 job id: 11310 with reason: [INTERNAL_ERROR]Offset out of range, consume partition 0, consume offset 100, the offset used by job does not exist in kafka, please check the offset, using the Alter ROUTINE LOAD command to modify it, and resume the job'}
```

2024-07-07 18:29:04 +08:00
89857d3780 [cherry-pick](branch-2.1) Pick "Use async group commit rpc call (#36499)" (#37380)
Pick #36499
2024-07-07 18:28:19 +08:00
7d423b3a6a [cherry-pick](branch-2.1) Pick "[Fix](group commit) Fix group commit block queue mem estimate fault" (#37379)
Pick [Fix](group commit) Fix group commit block queue mem estimate fault
#35314


**Problem:** When `group commit=async_mode` and NULL data is imported
into a `variant` type column, memory statistics for group commit
backpressure become incorrect, leading to a stuck issue.

**Cause:** In group commit mode, blocks are first added to a queue in
batches using `add block`, and then retrieved from the queue using `get
block`. To track memory usage during backpressure, we add the block size
to the memory statistics during `add block` and subtract it during `get
block`. However, for `variant` types, serialization during the `add
block` write to WAL can merge types (e.g., merging `int` and `bigint`
into `bigint`), thereby changing the block size. The block size seen by
`get block` then differs from the one recorded by `add block`, causing
the memory statistics to overflow.

**Solution:** Record the block size at the time of `add block` and use
this recorded size during `get block` instead of the actual block size.
This keeps the memory additions and subtractions consistent.
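
The fix described above can be sketched outside Doris as a queue that snapshots each block's size on enqueue. This is a minimal illustration, not the actual BE code; the class and method names are hypothetical.

```python
from collections import deque

class GroupCommitQueue:
    """Minimal sketch (not Doris code): memory accounting that records a
    block's size at add time and subtracts that recorded size at get time."""

    def __init__(self):
        self._queue = deque()
        self.mem_usage = 0  # bytes currently attributed to queued blocks

    def add_block(self, block):
        size = len(block)                # size measured once, at add time
        self._queue.append((block, size))
        self.mem_usage += size

    def get_block(self):
        block, recorded = self._queue.popleft()
        self.mem_usage -= recorded       # use the recorded size, not len(block)
        return block

q = GroupCommitQueue()
blk = bytearray(b"1234")
q.add_block(blk)         # mem_usage == 4
blk.extend(b"5678")      # block grows while queued, analogous to variant
                         # serialization merging types during the WAL write
q.get_block()
print(q.mem_usage)       # 0: additions and subtractions stay balanced
```

Had `get_block` subtracted `len(block)` instead of the recorded size, the counter would drift negative here, which mirrors the discrepancy the commit fixes.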

2024-07-07 18:27:49 +08:00
32529ecda2 [cherry-pick](branch-2.1) Pick "[Enhancement](partial update) Add partial update mix cases (#37113)" (#37384)
#37113
2024-07-07 18:26:46 +08:00
61bc624938 [branch-2.1](move-memtable) fix move memtable core when use multi table load (#37370)
## Proposed changes

pick https://github.com/apache/doris/pull/35458
2024-07-07 18:25:00 +08:00
f2693152bb [fix](multi-table-load) fix be core when multi table load pipe finish fail (#37383)
pick #36269
2024-07-07 18:24:16 +08:00
d08a418dd8 [branch-2.1](routine-load) optimize routine load job auto resume policy (#37373)
pick #35266
2024-07-07 18:16:56 +08:00
af960f7c70 [branch-2.1](routine-load) dealing with the high watermark of Kafka may fallback (#37372)
pick #35901
2024-07-07 18:15:54 +08:00
960b02f293 [branch-2.1](routine-load) add retry when get Kafka meta info (#37371)
pick #35376
2024-07-07 18:14:38 +08:00
c399a0e216 [opt](inverted index) reduce generation of the rowid_result if not necessary #35357 (#36569) 2024-07-06 21:33:03 +08:00
38b3870fe8 [branch-2.1] Picks "[fix](autoinc) Fix AutoIncrementGenerator and add more logs about auto-increment column #37306" (#37366)
## Proposed changes

picks https://github.com/apache/doris/pull/37306
2024-07-06 16:53:29 +08:00
ef59af8df0 [branch-2.1] Picks "[fix](regression) Fix p0 case test_modify_reorder_column #37256" (#37332)
## Proposed changes

picks https://github.com/apache/doris/pull/37256
2024-07-05 22:20:14 +08:00
7ce4a42d28 [fix](fe) Fix the sql of AddPartitionRecord (#37341)
Cherry-pick #37295

The range field is accidentally compared to DUMMY_ITEM.

It was introduced by #35461.
2024-07-05 22:05:12 +08:00
5de6aa74c0 [branch-2.1] Picks "[opt](autoinc) Forbid some schema change when the table has auto-increment column #37186" (#37331)
## Proposed changes

picks https://github.com/apache/doris/pull/37186
2024-07-05 21:59:30 +08:00
8a0d05d9b0 [opt](mtmv) Materialized view partition track supports date_trunc and optimize the fail reason (#35562) (#36947)
cherry pick from master #35562
commitId: 43d0f191
2024-07-05 15:12:43 +08:00
a803e1493a [pipeline](fix) Set upstream operators always runnable once source operator closed (#37297) (#37325)

Some kinds of source operators have a 1-1 relationship with a sink
operator (such as AnalyticOperator). We must ensure AnalyticSinkOperator
will not be blocked if AnalyticSourceOperator is already closed.

pick #37297
2024-07-05 13:54:34 +08:00
f8cee439b6 [feature](ES Catalog) map nested/object type in ES to JSON type in Doris (#37101) (#37182)
backport #37101
2024-07-05 10:48:32 +08:00
256221a574 [fix](Nereids) normalize aggregate should not push down lambda's param (#37109) (#37285)
pick from master #37109

ArrayItemSlot should not be an inputSlot
2024-07-05 09:33:57 +08:00
8373610281 [opt](ctas) add a variable to control varchar length in ctas (#37069) (#37284)
pick from master #37069

add a new session variable: use_max_length_of_varchar_in_ctas

In CTAS (Create Table As Select), if a CHAR/VARCHAR column does not
originate from the source table, this variable controls whether the
length of such a column is set to MAX, which is 65533. The default is true.
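
As a hedged illustration (the table and column names are hypothetical; only the session variable name comes from this commit), disabling the variable before a CTAS would look like:

```sql
-- Hypothetical tables; only use_max_length_of_varchar_in_ctas is from the commit.
SET use_max_length_of_varchar_in_ctas = false;

-- c3 does not originate from a source-table column, so with the variable
-- disabled its VARCHAR length is not forced to the maximum (65533).
CREATE TABLE t_new AS
SELECT concat(c1, '-', c2) AS c3 FROM t_src;
```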
2024-07-04 22:09:41 +08:00
4e4f3d204e [feat](Nereids) push down predicates with multi columns through LogicalWindow and LogicalPartitionTopN (#36828) (#36981)
cherry-pick #36828 to branch-2.1

The requirement for predicate pushdown through the window operator is
that the window's partition-by slots contain all slots in the predicate.
The original implementation in Doris only allowed predicate pushdown
with one slot. This PR relaxes that restriction and allows predicate
pushdown with multiple slots. The same applies to predicate pushdown
through the LogicalPartitionTopN operator. The following SQL is an example.

```
select *
from (
    select
        row_number() over(partition by id, value1 order by value1) as num,
        id,
        value1
    from push_down_multi_column_predicate_through_window_t
) t
where abs(id + value1) < 4
  and num <= 2;
```


Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-07-04 21:00:08 +08:00
c8978fc9d1 [fix](HadoopLz4BlockCompression)Fixed the bug that HadoopLz4BlockCompression creates _decompressor every time it decompresses.(#37187) (#37299)
bp : #37187
2024-07-04 20:22:27 +08:00
55636e8035 [test](migrate) move 3 cases from p2 to p0 (#36957) (#37264)
bp #36957

Co-authored-by: slothever <18522955+wsjz@users.noreply.github.com>
2024-07-04 20:09:59 +08:00
Pxl e2c2702dff [Bug](runtime-filter) fix some rf error problems (#37155)
## Proposed changes
pick from #37273
2024-07-04 20:03:46 +08:00
6ec0476412 [chore](Nereids) opt part not exists error msg in bind relation (#36792)(#37160) (#37280)
pick from master #36792 #37160

print table name when partition not exists in bind relation
2024-07-04 19:19:36 +08:00
c7ad1f3d21 [fix](Nereids) simplify window expression should inherit data type (#37061) (#37283)
pick from master #37061

After a window expression is rewritten to a literal, the literal's data
type should be the same as the original window expression's.
2024-07-04 19:19:05 +08:00
e4fb506c20 [fix](Nereids) null type in result set will be cast to tinyint (#37019) (#37281)
pick from master #37019
2024-07-04 19:18:35 +08:00
b272247a57 [pick]log thread num (#37258)
## Proposed changes

pick #37159
2024-07-04 15:27:52 +08:00
ceef9ee123 [feature](serde) support presto compatible output format (#37039) (#37253)
bp #37039
2024-07-04 13:56:05 +08:00
3613413a54 [fix](hive) support find serde info from both tbl properties and serde properties (#37043) (#37188)
bp #37043
2024-07-04 13:55:38 +08:00
5f3e1e44b2 [bugfix]thread pool resource leak for 2.1 #36990 (#37247)
bp: #36990
2024-07-04 11:23:47 +08:00
bf3ea1839c [test]Mv external p2 test case to p0. (#37070) (#37140)
backport: https://github.com/apache/doris/pull/37070
2024-07-04 11:19:31 +08:00
fb344b66ca [fix](hash join) fix numeric overflow when calculating hash table bucket size #37193 (#37213)
## Proposed changes

Bp #37193
2024-07-04 11:12:52 +08:00
4532ba990a [fix](pipeline) Avoid to close task twice (#36747) (#37115) 2024-07-04 10:02:56 +08:00
26be313d40 [mv](nereids) mv cost related PRs (#35652 #35701 #35864 #36368 #36789 #34970) (#37097)
## Proposed changes
pick from #35652 #35701 #35864 #36368 #36789 #34970

2024-07-04 09:42:11 +08:00
077fda4259 [enhance](mtmv)show create materialized view (#36188) (#37125)
pick: https://github.com/apache/doris/pull/36188
2024-07-03 22:48:43 +08:00
69aebc2d25 [branch-2.1] Picks "[Fix](schema change) Fix can't do reorder column schema change for MOW table and duplicate key table #37067" (#37226)
## Proposed changes

picks https://github.com/apache/doris/pull/37067
2024-07-03 22:42:51 +08:00
Pxl 70e1c563b3 [Chore](runtime-filter) enlarge sync filter size rpc timeout limit (#37103) (#37225)
pick from #37103
2024-07-03 21:02:26 +08:00
Pxl ffc57c9ef4 [Bug](runtime-filter) fix brpc ctrl use after free (#37223)
part of #35186
2024-07-03 21:01:50 +08:00
97945af947 [fix](merge-on-write) when full clone failed, duplicate key might occur (#37001) (#37229)
cherry-pick #37001
2024-07-03 19:48:10 +08:00
a9f9113c48 [branch-2.1][test](external)move hive cases from p2 to p0 (#37149)
pick (#36855)
test_hive_same_db_table_name
test_hive_special_char_partition
test_complex_types
test_wide_table
2024-07-03 19:44:52 +08:00
84f5bb73da [refactor](nereids) refactor analyze view (#37106) (#37163)
The Analyzer of NereidsPlanner uses different rules to analyze a normal plan and a view, to prevent the plans inside views from being analyzed multiple times, because some rules cannot be applied more than once: for example, if decimal type coercion is applied twice, it generates a wrong result.

But this design is tricky. Normally, after processing the LogicalView, the whole plan tree inside the LogicalView should contain no unbound plans, but that is not currently the case. This problem blocks development of some rules, so I refactored it:
1. the Analyzer no longer traverses the children of the LogicalView
2. after linking the LogicalView into the outer plan tree, the whole plan tree of the LogicalView contains no unbound plans
3. views and tables are analyzed with the same rules, keeping it simple
2024-07-03 19:09:49 +08:00
45fc1c7182 [opt](hive) save hive table schema in transaction for 2.1 (#37127)
## Proposed changes

pick #37008
2024-07-03 17:32:58 +08:00
e5695e058f [test](migrate) move 2 cases from p2 to p0 (#36935) (#37200)
bp #36935

Co-authored-by: zhangdong <493738387@qq.com>
2024-07-03 17:29:01 +08:00
b3f2bd20e3 [feat](nereids) support explain delete from clause #36782 (#37100)
## Proposed changes
pick from  #36782

support explain like:
explain delete from T where A=1


(cherry picked from commit dc369cd13096dbb90700f7fbf8f35a9059d9906f)

2024-07-03 15:08:08 +08:00
5969d6521f [branch-2.1](function) fix nereids fold constant wrong result of abs (#37065) (#37108)
pick https://github.com/apache/doris/pull/37065
2024-07-03 11:58:06 +08:00
fb642d0227 [Fix](hive-writer) Fixed the issue where uncompletedMpuPendingUploads did not remove objects correctly. (#37173)
Backport #36905.
2024-07-03 11:09:46 +08:00
e857680661 [Migrate-Test](multi-catalog) Migrate p2 tests from p2 to p0. (#37175)
Backport #36989.
2024-07-03 11:08:49 +08:00