Commit Graph

212 Commits

Author SHA1 Message Date
0e63133c80 [Chore](job) Provides configuration of job execution queue size (#42253) (#42530)
When dealing with a large number of tasks, the default execution queue
size is 1024. This can lead to tasks being dropped if the queue becomes
full.
For example:

`dispatch instant task failed, job id is xxx`

To address this, you can add the parameters `insert_task_queue_size` and
`mtmv_task_queue_size` in the `fe.conf` configuration file. These
parameters must be set to a power of 2.

**Keep in mind, increasing this value is recommended only when thread
resources are limited; otherwise, you should consider increasing the
number of task execution threads.**
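
The power-of-2 requirement above can be checked before a value is written to `fe.conf`. The following is a minimal sketch, not code from this commit: the `QueueSizes` class and both method names are hypothetical and only illustrate the bit arithmetic involved.

```java
// Hypothetical helper, not part of Doris: validates or rounds up a candidate
// value for insert_task_queue_size / mtmv_task_queue_size.
final class QueueSizes {
    private QueueSizes() {}

    /** Returns true when n is a positive power of two (exactly one bit set). */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && (n & (n - 1)) == 0;
    }

    /** Returns the smallest power of two that is >= n, for 1 <= n <= 2^30. */
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("n out of range: " + n);
        }
        return n == 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }
}
```

With this helper, a desired capacity of 3000 would be rounded up to 4096 before being used as a queue size.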

(cherry picked from commit f9ea8f8229e9f5514c1773bd25c3cc11985c63fb)

## Proposed changes

Issue Number: #42253

2024-10-28 13:42:08 +08:00
2bb29d7f36 [fix](test) fix move memtable injection test may cause other case stuck #40356 (#42508)
cherry pick from #40356

---------

Co-authored-by: hui lai <1353307710@qq.com>
2024-10-26 23:11:45 +08:00
e7229c77c8 [fix](config) increase JVM memory of BE #42052 (#42194)
bp #42052
2024-10-21 20:14:43 +08:00
e56216211e [pick](branch-2.1) pick #40667 #40714 (#41905)
pick
#40667
#40714

---------

Co-authored-by: wangbo <wangbo@apache.org>
2024-10-16 14:09:03 +08:00
c744eb87c5 [fix](regression)fix some regression test (#40928) (#41046)
bp #40928
2024-09-20 18:17:44 +08:00
5f583fa329 [branch-2.1][test](jdbc catalog) add oceanbase ce jdbc catalog test (#40978)
pick #34972
2024-09-19 22:11:24 +08:00
7155711431 [cherry-pick](branch-2.1) Improve local shuffle strategy (#40030)
pick #34122 #35454 #35716 #37195
2024-08-29 14:16:16 +08:00
508c7a7040 [fix](hive)Modify the Hive notification event processing method when using meta cache and add parameters to the Hive catalog. (#39239) (#39865)
bp #39239

Co-authored-by: daidai <2017501503@qq.com>
2024-08-23 23:21:02 +08:00
e716658fba [branch-2.1](arrow-flight-sql) Fix exceed user property max connection cause Reach limit of connections (#39836)
pick #39127
pick #39802
2024-08-23 17:27:34 +08:00
baf5b71b39 [branch-2.1](memory) Modify the default JEMALLOC_CONF and support flush Jemalloc tcache (#39829)
pick #38185
2024-08-23 17:21:42 +08:00
30e2c3fb11 [feat](lock)add deadlock detection tool and monitored lock implementations #39015 (#39099)
## Proposed changes
#39015
### Description:

This issue proposes the addition of new features to the project,
including a deadlock detection tool and monitored lock implementations.
These features will help in identifying and debugging potential
deadlocks and monitoring lock usage.

Features:


#### AbstractMonitoredLock:

A monitored version of Lock that tracks and logs lock acquisition and
release times.

Functionality:
Overrides the `lock()`, `unlock()`, `tryLock()`, and `tryLock(long timeout,
TimeUnit unit)` methods. Logs information about lock acquisition time,
release time, and any failure to acquire the lock within the specified
timeout.

##### eg
```log
2024-08-07 12:02:59  [ Thread-2:2006 ] - [ WARN ]  Thread ID: 12, Thread Name: Thread-2 - Lock held for 1912 ms, exceeding hold timeout of 1000 ms 
Thread stack trace:
	at java.lang.Thread.getStackTrace(Thread.java:1564)
	at org.example.lock.AbstractMonitoredLock.afterUnlock(AbstractMonitoredLock.java:49)
	at org.example.lock.MonitoredReentrantLock.unlock(MonitoredReentrantLock.java:32)
	at org.example.ExampleService.timeout(ExampleService.java:17)
	at org.example.Main.lambda$test2$1(Main.java:39)
	at java.lang.Thread.run(Thread.java:750)
```

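The hold-time warning shown in the log above could be produced by a wrapper along the following lines. This is a minimal sketch under stated assumptions, not the `AbstractMonitoredLock` from this PR: the class name `HoldTimeLock`, the injectable clock, and the `slowHoldCount()` accessor are all hypothetical, and logging goes to `System.err` for brevity.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

// Illustrative only: HoldTimeLock is a hypothetical name, not the PR's class.
class HoldTimeLock extends ReentrantLock {
    private final long holdTimeoutMs;
    private final LongSupplier nanoClock; // injectable so tests need not sleep
    private final ThreadLocal<Long> acquiredAt = new ThreadLocal<>();
    private final AtomicInteger slowHolds = new AtomicInteger();

    HoldTimeLock(long holdTimeoutMs) {
        this(holdTimeoutMs, System::nanoTime);
    }

    HoldTimeLock(long holdTimeoutMs, LongSupplier nanoClock) {
        this.holdTimeoutMs = holdTimeoutMs;
        this.nanoClock = nanoClock;
    }

    @Override
    public void lock() {
        super.lock();
        afterLock();
    }

    @Override
    public boolean tryLock() {
        boolean acquired = super.tryLock();
        if (acquired) afterLock();
        return acquired;
    }

    @Override
    public boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException {
        boolean acquired = super.tryLock(timeout, unit);
        if (acquired) {
            afterLock();
        } else {
            System.err.println(Thread.currentThread().getName()
                    + " failed to acquire lock within " + timeout + " " + unit);
        }
        return acquired;
    }

    @Override
    public void unlock() {
        // Only the outermost unlock of a reentrant hold ends the measured interval.
        boolean outermost = getHoldCount() == 1;
        Long start = acquiredAt.get();
        super.unlock();
        if (outermost && start != null) {
            acquiredAt.remove();
            afterUnlock(start);
        }
    }

    /** Number of holds that exceeded the configured timeout so far. */
    int slowHoldCount() {
        return slowHolds.get();
    }

    private void afterLock() {
        // Record the start time only on the first (outermost) acquisition.
        if (getHoldCount() == 1) acquiredAt.set(nanoClock.getAsLong());
    }

    private void afterUnlock(long startNanos) {
        long heldMs = TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - startNanos);
        if (heldMs > holdTimeoutMs) {
            slowHolds.incrementAndGet();
            System.err.printf("Lock held for %d ms, exceeding hold timeout of %d ms%n",
                    heldMs, holdTimeoutMs);
        }
    }
}
```

The sketch keeps the start time in a `ThreadLocal`, which allocates on each outermost acquisition. `lockInterruptibly()` is not overridden here, so holds taken through it go unmeasured.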

#### DeadlockCheckerTool:

Uses ScheduledExecutorService for periodic deadlock checks. Logs
deadlock information including thread names, states, lock info, and
stack traces.

**ThreadMXBean accesses thread information in the local JVM, which is
already in memory, so accessing it is less expensive than fetching data
from external resources such as disk or network.**

**Thread state cache: The JVM typically maintains a cache of thread
states, reducing the need for real-time calculations or additional data
processing.**

##### eg
```log
Thread Name: Thread-0
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Lock Owner Name: Thread-1
Lock Owner Id: 12
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$3(Main.java:79)
	at org.example.Main$$Lambda$1/1221555852.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


2024-08-07 14:11:28  [ pool-1-thread-1:2001 ] - [ WARN ]  Deadlocks detected:
Thread Name: Thread-1
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Lock Owner Name: Thread-0
Lock Owner Id: 11
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$4(Main.java:93)
	at org.example.Main$$Lambda$2/1556956098.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


```
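A periodic check of this kind can be sketched with the standard `java.lang.management` API. The code below is an illustration under assumptions, not the `DeadlockCheckerTool` from this PR: the class name `DeadlockProbe`, its method names, and the output layout (modeled loosely on the log above) are hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Illustrative only: DeadlockProbe is a hypothetical name, not the PR's class.
final class DeadlockProbe {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    private DeadlockProbe() {}

    /** Returns info for all currently deadlocked threads, or an empty array. */
    static ThreadInfo[] findDeadlocks() {
        long[] ids = THREADS.findDeadlockedThreads(); // null when no deadlock exists
        if (ids == null) return new ThreadInfo[0];
        return THREADS.getThreadInfo(ids, true, true);
    }

    /** Formats one thread's state, the lock it waits on, and its stack trace. */
    static String describe(ThreadInfo info) {
        StringBuilder sb = new StringBuilder();
        sb.append("Thread Name: ").append(info.getThreadName()).append('\n');
        sb.append("Thread State: ").append(info.getThreadState()).append('\n');
        sb.append("Lock Name: ").append(info.getLockName()).append('\n');
        sb.append("Lock Owner Name: ").append(info.getLockOwnerName()).append('\n');
        sb.append("Lock Owner Id: ").append(info.getLockOwnerId()).append('\n');
        sb.append("Stack Trace:\n");
        for (StackTraceElement frame : info.getStackTrace()) {
            sb.append("\tat ").append(frame).append('\n');
        }
        return sb.toString();
    }

    /** Starts a daemon thread that checks for deadlocks at a fixed period. */
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "deadlock-probe");
            t.setDaemon(true);
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            ThreadInfo[] deadlocked = findDeadlocks();
            if (deadlocked.length > 0) {
                System.err.println("Deadlocks detected:");
                for (ThreadInfo info : deadlocked) {
                    if (info != null) System.err.println(describe(info)); // null if thread exited
                }
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

`findDeadlockedThreads()` covers both object monitors and ownable synchronizers such as `ReentrantLock`, which is the case shown in the log. The caller is expected to shut the returned scheduler down when monitoring is no longer needed.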
##### benchmark
```
    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(1)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   15889.407          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     678.061          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.000            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     668.249          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.080            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.075          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.006            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      20.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       6.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  103130.635          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2      ≈ 10⁻⁴          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁶            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts

    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(100)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   10994.606          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     488.508          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.002            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     481.390          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.163            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.020          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.002            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      18.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       9.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  558652.036          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2       0.016          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁴            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts
```
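To put the throughput figures side by side, the native-to-monitored ratio can be computed directly from the scores in the table above. The sketch below only does that arithmetic; the class and method names are hypothetical, and with just two measurement iterations and no error column the ratios should be read as rough indications.

```java
// Illustrative arithmetic on the benchmark scores above; names are hypothetical.
final class LockOverhead {
    private LockOverhead() {}

    /** How many times higher the native lock's throughput is than the monitored lock's. */
    static double slowdown(double nativeOpsPerMs, double monitoredOpsPerMs) {
        return nativeOpsPerMs / monitoredOpsPerMs;
    }

    public static void main(String[] args) {
        // Scores copied from the @Threads(1) and @Threads(100) runs.
        System.out.printf("1 thread:    %.1fx%n", slowdown(103130.635, 15889.407));
        System.out.printf("100 threads: %.1fx%n", slowdown(558652.036, 10994.606));
    }
}
```

On these numbers the monitored lock is roughly 6.5x slower with one thread and roughly 50.8x slower with 100 threads, and it allocates about 56 B per operation where the native lock allocates almost nothing.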

2024-08-08 21:15:49 +08:00
54772dc3d8 [fix](case) adjust ak sk for multi cloud test case (#38749) (#39070)
pick from master #38749

Co-authored-by: stephen <hello-stephen@qq.com>
2024-08-08 16:38:55 +08:00
eef8c87fb5 [chore](test) disable fault injection to make pipeline task check happy (#38665) (#38821)
pick (#38665)

test_delta_writer_v2_back_pressure_fault_injection can prevent pipeline
tasks from finishing, so disable it temporarily to make the pipeline task
check pass.
2024-08-04 11:18:56 +08:00
d17b196459 [regression](s3) add default conf for s3 related cases (#37952) (#38472)
replace COS with OSS in the TeamCity pipeline to improve stability

pick from master #37952
2024-07-29 18:01:27 +08:00
1f779ba9de [branch-2.1](arrow-flight-sql) Open regression-test/pipeline/p0/arrow_flight_sql (#37727)
pick #36854
2024-07-16 16:23:43 +08:00
16de141743 [regression](kerberos)add hive kerberos docker regression env (#37657)
## Proposed changes
pick:
[regression](kerberos)fix regression pipeline env when write hosts (#37057)
[regression](kerberos)add hive kerberos docker regression env (#36430)
2024-07-15 09:35:39 +08:00
56a207c3f0 [case](paimon/iceberg)move cases from p2 to p0 (#37276) (#37738)
bp #37276

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-07-13 10:01:05 +08:00
798d9d6fc6 [pick21][opt](mow) reduce memory usage for mow table compaction (#36865) (#36968)
cherry-pick https://github.com/apache/doris/pull/36865 to branch-2.1
2024-07-01 15:33:18 +08:00
4dcceaefea [test](ES Catalog) Add test cases for ES 5.x (#34441) (#36993)
backport #34441
2024-06-28 16:58:07 +08:00
73eda9bdb7 [fix](ci) external pipeline use regression-test/pipeline/external/conf/be.conf (#36139)
The external pipeline uses regression-test/pipeline/external/conf/be.conf instead of regression-test/pipeline/p0/conf/be.conf.
Related to master #36132
Co-authored-by: stephen <hello-stephen@qq.com>
2024-06-12 11:40:16 +08:00
d1318a7d08 [branch-2.1](jvm) disable BE's jvm metrics (#36009)
Disable BE's JVM metrics on external p0, because there is an issue
with ASAN when BE exits.
2024-06-07 15:28:02 +08:00
1d5b7cb559 [fix](branch-2.1)(jdbc catalog) fix mariadb test conf port (#35982) 2024-06-06 16:54:20 +08:00
cd808c3ea0 [fix](mtmv) Fix that the storage medium specified for the mtmv is SSD, but the partition storage medium for the mtmv is still HDD (#35644) (#35955)
pick from master:#35644
2024-06-06 15:36:49 +08:00
e755d64e62 [feature](be jvm monitor)append enable_jvm_monitor in be.conf to control jvm monitor. (#35608) (#35764)
bp #35608

Co-authored-by: daidai <2017501503@qq.com>
2024-06-02 00:18:44 +08:00
e3b4d4e630 Reset workload_group_max_num for regression test (#35430) 2024-05-27 14:10:25 +08:00
20e2d2e2f8 [Fix](executor)Fix workload thread start failed when follower convert to master 2024-05-12 09:30:14 +08:00
45556686ea [fix](test) fix some external test cases (#34209)
Fix some test cases and enable `test_information_schema_external` suite
2024-04-27 23:25:33 +08:00
50f9d47e96 [test](hive) run suite cases both in hive2 and hive3 (#33874) (#34156)
bp #33874

Co-authored-by: 苏小刚 <suxiaogang223@icloud.com>
2024-04-26 13:48:09 +08:00
47ded2c6a0 Revert "[fix](compile) fix two compile errors on MacOS (#33834) (#34005)"
This reverts commit 743fb62a2c42cc5cc662583c235f7336d5e6ddef.
2024-04-26 00:55:21 +08:00
743fb62a2c [fix](compile) fix two compile errors on MacOS (#33834) (#34005) 2024-04-25 19:39:35 +08:00
716c146750 [fix](insert)fix hive external return msgs and exception and pass all columns to BE (#32824)
2024-04-12 10:23:52 +08:00
f3a6132214 [chore] Format regression-conf.groovy (#32713) 2024-04-12 10:21:47 +08:00
3ee14a80ab [chore](ci) adjust ckb expect result (#32856)
Co-authored-by: stephen <hello-stephen@qq.com>
2024-04-10 11:34:30 +08:00
7b94cfdba1 Revert "[Fix](tests) add regression tests for trino-connector (#32552)"
This reverts commit 3fc3a4650681cb519405730899a2f22f268b38c1.
2024-03-25 22:38:21 +08:00
3fc3a46506 [Fix](tests) add regression tests for trino-connector (#32552) 2024-03-25 22:31:55 +08:00
0e493add69 [regression-test](case) forbid test_stream_stub_fault_injection (#32540) 2024-03-21 14:07:49 +08:00
ea8d4f2d0b [fix][regression]update ccr test project (#32445) 2024-03-21 14:07:24 +08:00
194f3432ab [Improvement](executor)Routine load support workload group #31671 2024-03-12 14:20:18 +08:00
1d094a46ec [regression-test](pipeline) remove sys_log_verbose_modules in pipeline #32015 2024-03-09 19:55:47 +08:00
794d9405de [ci](jdk17) adjust fe.conf (#31683) 2024-03-02 01:08:51 +08:00
07224686ef [feature](jdbc catalog) support db2 jdbc catalog (#31627) 2024-03-01 14:19:28 +08:00
1316ee4942 Add p1 debug log (#31560) 2024-02-29 12:38:03 +08:00
cffe79feba open workload group for broker load regression test (#30797) 2024-02-05 22:00:26 +08:00
d32292b292 [regression-test][conf] add master_sync_policy = WRITE_NO_SYNC replica_sync_policy = WRITE_NO_SYNC (#30494)
The regression tests have no power-off scenario, so adding these two settings has no side effects.
2024-02-04 22:21:16 +08:00
2ca911fb5d [revert](move-memtable) Revert enable brpc debug log in regression pipelines (#30389) (#30611)
This reverts commit 4bf47e229f930714572d8f91d6f9e94b4608bd20.
2024-02-02 13:31:47 +08:00
f0eeb45355 [chore](ci) trigger a must success pipeline (#30711)
Co-authored-by: stephen <hello-stephen@qq.com>
2024-02-01 23:14:14 +08:00
ea427e8c51 [fix](JDK17) It will report an exception when we start BE with JDK17 and query AVRO table : InaccessibleObjectException (#30541)
* [fix](JDK17) It will report an exception when we start BE with JDK17 and query AVRO table : InaccessibleObjectException (#30003)
2024-01-30 15:33:40 +08:00
904182685b [debug](move-memtable) enable brpc debug log in regression pipelines (#30389) 2024-01-27 09:10:41 +08:00
1aa006c80f [fix](ci) add single quote to the value of the session variables when setting it (#30295)
Co-authored-by: stephen <hello-stephen@qq.com>
2024-01-25 13:24:09 +08:00
8c6e5202d4 [chore](ci) remove some unused code (#30253)
Co-authored-by: stephen <hello-stephen@qq.com>
2024-01-24 09:58:31 +08:00