abmdocrt 7d423b3a6a [chery-pick](branch-2.1) Pick "[Fix](group commit) Fix group commit block queue mem estimate fault" (#37379 )

Pick "[Fix](group commit) Fix group commit block queue mem estimate fault" (#35314)

## Proposed changes

Issue Number: close #xxx

**Problem:** When `group_commit=async_mode` is used and NULL data is imported
into a `variant` column, the memory statistics used for group commit
backpressure become incorrect, which can cause the load to get stuck.

**Cause:** In group commit mode, blocks are first added to a queue in
batches with `add block` and later retrieved from the queue with
`get block`. To track memory usage for backpressure, the block size is
added to the memory statistics in `add block` and subtracted in
`get block`. However, for `variant` columns, writing the block to the WAL
during `add block` serializes it, and serialization can merge types (for
example, `int` and `bigint` into `bigint`), which changes the block size.
The size subtracted in `get block` therefore differs from the size added
in `add block`, causing the memory statistics to overflow.

**Solution:** Record the block size at `add block` time and subtract that
recorded size in `get block` instead of the block's current size. This
keeps the memory additions and subtractions consistent.

2024-07-07 18:27:49 +08:00

common

…

conf

[test](ES Catalog) Add test cases for ES 5.x (#34441 ) (#36993 )

2024-06-28 16:58:07 +08:00

ctas_p0

…

data

[cherry-pick](branch-2.1) Pick "[Enhancement](partial update) Add partial update mix cases (#37113 )" (#37384 )

2024-07-07 18:26:46 +08:00

framework

[case](udf) Only one backend, skip scp udf file (#36810 ) (#36964 )

2024-06-28 16:31:30 +08:00

java-udf-src

…

pipeline

[pick21][opt](mow) reduce memory usage for mow table compaction (#36865 ) (#36968 )

2024-07-01 15:33:18 +08:00

plugins

[fix](httpapi) restore compaction/run_status api can show be's overall compaction status and refactor code (#35409 )

2024-05-28 09:43:43 +08:00

script

…

ssl_default_certificate

…

suites

[chery-pick](branch-2.1) Pick "[Fix](group commit) Fix group commit block queue mem estimate fault" (#37379 )

2024-07-07 18:27:49 +08:00

certificate.p12

…

README.md

[chore](case) update regression-test README #29031

2024-01-12 11:46:29 +08:00

README.md

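Notes on adding new cases

Declare variables with `def`; otherwise they become global variables, and when cases run in parallel they may be affected by other cases.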

Problematic code:
```
ret = ***
```
Correct code:
```
def ret = ***
```
Avoid setting session variables globally or modifying the cluster configuration in a case, as this may affect other cases.

Problematic code:
```
sql """set global enable_pipeline_x_engine=true;"""
```
Correct code:
```
sql """set enable_pipeline_x_engine=true;"""
```
If you must set a global variable or change the cluster configuration, you can mark the case to run in `nonConcurrent` mode.

Example

When a case involves time, use fixed values rather than dynamic ones such as `now()`; otherwise the case may start failing after some time has passed.

Problematic code:
```
sql """select count(*) from table where created < now();"""
```
Correct code:
```
sql """select count(*) from table where created < '2023-11-13';"""
```

After a stream load in a case, run `sync` before querying the data, to avoid unstable results in multi-FE environments.

Problematic code:
```
streamLoad { ... }
sql """select count(*) from table """
```
Correct code:
```
streamLoad { ... }
sql """sync"""
sql """select count(*) from table """
```