Commit Graph

517 Commits

Author SHA1 Message Date
c8fb934bc1 branch-2.1: [chore](config) disable restore_reset_index_id by default #46104 (#46127)
Cherry-picked from #46104

Co-authored-by: walter <maochuan@selectdb.com>
2024-12-30 11:52:58 +08:00
fca9c91193 [fix](restore) Add restore_reset_index_id config #45283 (#45574)
cherry pick from #45283
2024-12-18 22:45:53 +08:00
5517045598 branch-2.1: [chore](checkpoint) add enable_checkpoint config #45301 (#45328)
Cherry-picked from #45301

Co-authored-by: walter <maochuan@selectdb.com>
2024-12-17 20:11:51 +08:00
b6a1803127 [improve](backup) Add config ignore_backup_tmp_partitions #45240 (#45331)
cherry pick from #45240
2024-12-17 14:31:06 +08:00
36695e871a [feature](statistics)Support auto analyze columns that haven't been analyzed for a long time. #42399 (#45250)
backport: https://github.com/apache/doris/pull/42399
2024-12-12 01:57:44 +08:00
3714063975 branch-2.1: [feat](catalog)Replace HadoopUGI with HadoopKerberosAuthenticator to Support Kerberos Ticket Auto-Renewal #44916 (#45138)
Cherry-picked from #44916

Co-authored-by: Calvin Kirs <guoqiang@selectdb.com>
2024-12-06 22:13:31 -08:00
68e6cbf033 branch-2.1: [feat](backup) Add config backup_handler_update_interval_millis #44628 (#44640)
Cherry-picked from #44628

Co-authored-by: walter <maochuan@selectdb.com>
2024-11-28 18:43:48 +08:00
dd4708af47 branch-2.1: [fix](backup) Automatic adapt upload/download snapshot batch size #44560 (#44641)
Cherry-picked from #44560

Co-authored-by: walter <maochuan@selectdb.com>
2024-11-28 13:43:33 +08:00
28d7e9f357 branch-2.1: [fix](config) fe config sync_image_timeout_second should not be masterOnly #43954 (#44384)
Cherry-picked from #43954

Co-authored-by: nsn_huang <38585669+nsnhuang@users.noreply.github.com>
Co-authored-by: huangwenbo04 <huangwenbo04@meituan.com>
2024-11-22 23:14:45 +08:00
7d123edcf8 [fix](filesystem)Use simple authentication directly in S3FileSystem for 2.1 (#43636) (#44238)
bp: #43636
2024-11-22 11:45:56 +08:00
8da1e8c084 branch-2.1: [feat](catalog)Support Pre-Execution Authentication for HMS Type Iceberg Catalog Operations. #43445 (#44129)
Cherry-picked from #43445

Co-authored-by: Calvin Kirs <guoqiang@selectdb.com>
2024-11-18 14:28:25 +08:00
97ca90075c [chore](agent) log the binary message size of agent tasks #43239 (#43598)
cherry pick from #43239
2024-11-11 19:39:13 +08:00
14d511fe3a [feat](restore) Support compressed snapshot meta and job info #43516 (#43569)
cherry pick from #43516
2024-11-11 19:29:27 +08:00
80fd76677e branch-2.1: [Improvement](LDAP Auth)Enhance LDAP authentication with a configurable group filter (#43293)
Cherry-picked from #42038

Co-authored-by: nsivarajan <117266407+nsivarajan@users.noreply.github.com>
Co-authored-by: Sivarajan Narayanan <narayanan_sivarajan@apple.com>
2024-11-10 10:06:13 +08:00
fba06b33b9 [cherry-pick](branch-2.1)add SessionVariable for enableCooldownReplicaAffinity (#42675)
pick from master:https://github.com/apache/doris/pull/41741
2024-11-10 00:46:26 +08:00
2ba88ed2a8 [improve](report) split agent batch tasks automatically #43257 (#43365)
cherry pick from #43257
2024-11-08 18:59:53 +08:00
31480d11d7 [improve](task) Support splitting agent batch tasks automatically #42703 (#43483)
cherry pick from #42703
2024-11-08 15:51:04 +08:00
6006907c79 [improve](restore) Compress backup/restore job log size by compress (#42463)
ref #42459
2024-11-08 10:43:14 +08:00
ae88d032db [chore](ddl) support force_enable_feature_binlog #41796 (#42926)
cherry pick from #41796
2024-10-31 09:53:45 +08:00
85674814eb [fix](query-forward) Fix forward query exception or stuck or potential query result loss (#41303) (#42369)
## Proposed changes

1. Fix forward query exception if no status code is set in master
execution. EOF may result in this status.

2. Fix forward query stuck due to no result packet sent to mysql
channel. Should use result packets from master.

3. Fix potential forward query result loss if follower can read status
change during query process. Should judge by the status once before
execution.

4. Add assertion for regression test.
2024-10-28 17:39:57 +08:00
0e63133c80 [Chore](job) Provides configuration of job execution queue size (#42253) (#42530)
When dealing with a large number of tasks, the default execution queue
size is 1024. This can lead to tasks being dropped if the queue becomes
full.
For example:

`dispatch instant task failed, job id is xxx`

To address this, you can add the parameters `insert_task_queue_size` and
`mtmv_task_queue_size` in the `fe.conf` configuration file. These
parameters must be set to a power of 2.

**Keep in mind, increasing this value is recommended only when thread
resources are limited; otherwise, you should consider increasing the
number of task execution threads.**
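
The power-of-2 constraint above can be checked before editing `fe.conf`. The following is a minimal sketch, not Doris code: `QueueSizeCheck` and both of its methods are hypothetical helpers introduced here for illustration, and the value 1500 is an arbitrary example.

```java
/** Hypothetical helper for illustration; not part of Doris. */
final class QueueSizeCheck {
    private QueueSizeCheck() {}

    /** True when n is a positive power of two (exactly one bit set). */
    static boolean isPowerOfTwo(int n) {
        return n > 0 && Integer.bitCount(n) == 1;
    }

    /** Smallest power of two that is >= n, for 1 <= n <= 2^30. */
    static int nextPowerOfTwo(int n) {
        if (n < 1 || n > (1 << 30)) {
            throw new IllegalArgumentException("size out of range: " + n);
        }
        // highestOneBit(n - 1) is the largest power of two strictly below n
        // (for n > 1), so doubling it gives the smallest one that is >= n.
        return n == 1 ? 1 : Integer.highestOneBit(n - 1) << 1;
    }

    public static void main(String[] args) {
        int requested = 1500; // arbitrary illustrative value
        System.out.println(isPowerOfTwo(requested));   // prints "false"
        System.out.println(nextPowerOfTwo(requested)); // prints "2048"
    }
}
```

Under these assumptions, a requested size of 1500 would be rounded up to 2048 before being written as `insert_task_queue_size` or `mtmv_task_queue_size`.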

(cherry picked from commit f9ea8f8229e9f5514c1773bd25c3cc11985c63fb)

## Proposed changes

Issue Number: #42253

2024-10-28 13:42:08 +08:00
379e00f421 [improve](group commit) set internal group commit timeout (#41404) (#41688)
pick https://github.com/apache/doris/pull/41404
2024-10-11 17:55:43 +08:00
a0aed77218 [cherry-pick](branch2.1) fix hudi jni scanner (#41566)
pick from https://github.com/apache/doris/pull/41316
2024-10-09 10:31:50 +08:00
d659750fd9 [pick](Serde-2.1) fix variant serde may lost num_rows when subcolumns empty (#41438)
Serializing an object with empty subcolumns may lose num_rows, so record
num_rows during serialization and set it back during deserialization.

backport #38413
2024-09-29 09:45:37 +08:00
9a9226e541 [fix](block_rule) SQL block rule not working after FE restart (#41228) (#41250)
pick: https://github.com/apache/doris/pull/41228
2024-09-28 10:08:59 +08:00
11bad4cbc9 [opt](routine load) optimize routine load timeout logic (#40818) (#41135)
pick #40818

If IO/CPU resources are tight, a routine load task is likely to time out.
The current approach is self-adaptive backoff
(https://github.com/apache/doris/pull/32227), but it does some
ineffective work while searching for a suitable timeout. For a routine
load task, it is better to let the task finish than to retry it when
resources are tight. Therefore, this PR increases the timeout so that a
task always finishes, even if it is slow when resources are tight.
2024-09-25 14:14:02 +08:00
e1057ac26d [branch-2.1][fix](metadata)Add FE metadata-related file checks #40546 (#41113)
## Proposed changes

#40546
2024-09-23 17:13:35 +08:00
49dec9f39d [branch-2.1] Picks "[opt](merge-on-write) Reduce the version not continuous logs for merge-on-write table #40946" (#40996)
picks https://github.com/apache/doris/pull/40946
2024-09-19 23:58:05 +08:00
782973ee77 [fix](auth)Http api check auth (#40688) (#40865)
pick: https://github.com/apache/doris/pull/40688
2024-09-15 23:50:54 +08:00
51c8b62d1c [opt](Nereids) fix several insert into related issues (#40467) (#40755)
pick from master #40467

- http_stream TVF should always generate one fragment plan
- http_stream TVF plan should not check root as scan node
- distinguish group_commit TVF with normal insert statement
- index and generate slot should be based on the type cast base slot
- agg_state can be cast from nullable to non-nullable
- colocated and bucket scan range computation should apply only to scan nodes
2024-09-13 10:19:56 +08:00
92752b90e7 [feature](metacache) add system table catalog_meta_cache_statistics #40155 (#40210)
bp #40155
2024-09-02 23:23:35 +08:00
93da0ebaf4 [chore](backup) limit the involved tablets in a backup job #39987 (#40080)
cherry pick from #39987
2024-08-29 12:03:14 +08:00
460605ae3c [branch-2.1] pick some prs (#39860)
## Proposed changes

https://github.com/apache/doris/pull/38385 optimize datetime parsing
https://github.com/apache/doris/pull/38978 make the stream load failure
message clearer and disable the stack trace of some errors by default
https://github.com/apache/doris/pull/39255 fix random function coredump
https://github.com/apache/doris/pull/39324 fix function corr
inconsistency with doc
https://github.com/apache/doris/pull/39449 check auto partition nullity
when creating partition
https://github.com/apache/doris/pull/39695 make
DynamicPartitionScheduler immediately aware of interval changes
https://github.com/apache/doris/pull/39754 add some partition expr checks
on creating table
2024-08-24 17:26:42 +08:00
5ed56770d4 [bugfix](external) Prevent multiple fs from being generated (#39663) (#39870)
bp #39663

Co-authored-by: wuwenchi <wuwenchihdu@hotmail.com>
2024-08-24 14:17:26 +08:00
1c566253a8 [Pick][Improment]Query queued by be memory (#37559) (#39733)
pick #37559
2024-08-22 15:14:47 +08:00
dfd21bd2a0 [fix](fe-log) add position info in async mode #39419 (#39571)
pick part of #39419
2024-08-20 22:01:34 +08:00
0e21dba817 [opt](catalog) modify some meta cache logic (#38506) (#39628)
#38506
2024-08-20 21:57:55 +08:00
021678c7c3 [fix](window_funnel) fix wrong result of window_funnel #38954 (#39270)
## Proposed changes

BP #38954
2024-08-16 09:59:31 +08:00
4e889bbc6d [fix](Nereids) support implicit cast ip types to string (#39318) (#39440)
pick from master #39318
2024-08-16 09:57:02 +08:00
3c535e80dd [fix](compatibility) type toSql should return lowercase string (#38012) (#38517)
pick from master #38012

revert #25951
2024-08-09 11:35:42 +08:00
30e2c3fb11 [feat](lock)add deadlock detection tool and monitored lock implementations #39015 (#39099)
## Proposed changes
#39015
### Description:

This issue proposes the addition of new features to the project,
including a deadlock detection tool and monitored lock implementations.
These features will help in identifying and debugging potential
deadlocks and monitoring lock usage.

Features:

#### AbstractMonitoredLock:

A monitored version of Lock that tracks and logs lock acquisition and
release times.

Functionality: overrides the lock(), unlock(), tryLock(), and
tryLock(long timeout, TimeUnit unit) methods. Logs information about
lock acquisition time, release time, and any failure to acquire the lock
within the specified timeout.

##### eg
```log
2024-08-07 12:02:59  [ Thread-2:2006 ] - [ WARN ]  Thread ID: 12, Thread Name: Thread-2 - Lock held for 1912 ms, exceeding hold timeout of 1000 ms 
Thread stack trace:
	at java.lang.Thread.getStackTrace(Thread.java:1564)
	at org.example.lock.AbstractMonitoredLock.afterUnlock(AbstractMonitoredLock.java:49)
	at org.example.lock.MonitoredReentrantLock.unlock(MonitoredReentrantLock.java:32)
	at org.example.ExampleService.timeout(ExampleService.java:17)
	at org.example.Main.lambda$test2$1(Main.java:39)
	at java.lang.Thread.run(Thread.java:750)
```
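
The hold-time warning in the log above can be sketched as follows. This is an illustrative sketch only, not the code added by the PR: `HoldTimeLock`, its injectable `nanoClock`, and `slowHoldCount()` are hypothetical names, and the message format is only loosely modelled on the log line.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantLock;
import java.util.function.LongSupplier;

/** Hypothetical sketch; class and member names are not taken from Doris. */
class HoldTimeLock extends ReentrantLock {
    private final long holdTimeoutMs;
    private final transient LongSupplier nanoClock;
    // Per-thread timestamp of the outermost acquisition.
    private final transient ThreadLocal<Long> acquiredAt = new ThreadLocal<>();
    private final AtomicInteger slowHolds = new AtomicInteger();

    HoldTimeLock(long holdTimeoutMs) {
        this(holdTimeoutMs, System::nanoTime);
    }

    HoldTimeLock(long holdTimeoutMs, LongSupplier nanoClock) {
        this.holdTimeoutMs = holdTimeoutMs;
        this.nanoClock = nanoClock;
    }

    @Override
    public void lock() {
        super.lock();
        afterLock();
    }

    @Override
    public boolean tryLock() {
        boolean acquired = super.tryLock();
        if (acquired) {
            afterLock();
        }
        return acquired;
    }

    @Override
    public boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException {
        boolean acquired = super.tryLock(timeout, unit);
        if (acquired) {
            afterLock();
        } else {
            System.err.printf("Failed to acquire lock within %d %s%n", timeout, unit);
        }
        return acquired;
    }

    @Override
    public void unlock() {
        // Only the release that fully frees a reentrant lock ends the hold.
        boolean lastRelease = isHeldByCurrentThread() && getHoldCount() == 1;
        super.unlock();
        if (lastRelease) {
            afterUnlock();
        }
    }

    private void afterLock() {
        if (getHoldCount() == 1) {
            acquiredAt.set(nanoClock.getAsLong());
        }
    }

    private void afterUnlock() {
        Long start = acquiredAt.get();
        acquiredAt.remove();
        if (start == null) {
            return; // acquired through a path this sketch does not track
        }
        long heldMs = TimeUnit.NANOSECONDS.toMillis(nanoClock.getAsLong() - start);
        if (heldMs > holdTimeoutMs) {
            slowHolds.incrementAndGet();
            System.err.printf("Thread Name: %s - Lock held for %d ms, exceeding hold timeout of %d ms%n",
                    Thread.currentThread().getName(), heldMs, holdTimeoutMs);
        }
    }

    /** Number of holds that exceeded the timeout so far. */
    int slowHoldCount() {
        return slowHolds.get();
    }
}
```

The clock is injectable so the behaviour can be exercised without sleeping; in normal use the single-argument constructor falls back to `System.nanoTime`.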

#### DeadlockCheckerTool:

Uses ScheduledExecutorService for periodic deadlock checks. Logs
deadlock information including thread names, states, lock info, and
stack traces.

**ThreadMXBean accesses thread information in the local JVM, which is
already in memory, so accessing it is less expensive than fetching data
from external resources such as disk or network. Thread state cache: The
JVM typically maintains a cache of thread states, reducing the need for
real-time calculations or additional data processing.**

##### eg
```log
Thread Name: Thread-0
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Lock Owner Name: Thread-1
Lock Owner Id: 12
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$3(Main.java:79)
	at org.example.Main$$Lambda$1/1221555852.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


2024-08-07 14:11:28  [ pool-1-thread-1:2001 ] - [ WARN ]  Deadlocks detected:
Thread Name: Thread-1
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Lock Owner Name: Thread-0
Lock Owner Id: 11
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$4(Main.java:93)
	at org.example.Main$$Lambda$2/1556956098.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


```
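
A report like the one above can be produced with the standard `java.lang.management` API. The sketch below is an assumption-laden illustration, not the DeadlockCheckerTool from the PR: `DeadlockWatch`, `findDeadlocks()`, and `start()` are hypothetical names, and it assumes a JVM on which `ThreadMXBean.findDeadlockedThreads()` is supported.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

/** Hypothetical sketch; not the Doris DeadlockCheckerTool. */
final class DeadlockWatch {
    private static final ThreadMXBean THREADS = ManagementFactory.getThreadMXBean();

    private DeadlockWatch() {}

    /** Returns one textual report per deadlocked thread, or an empty list. */
    static List<String> findDeadlocks() {
        List<String> reports = new ArrayList<>();
        long[] ids = THREADS.findDeadlockedThreads(); // null when there is no deadlock
        if (ids == null) {
            return reports;
        }
        for (ThreadInfo info : THREADS.getThreadInfo(ids, Integer.MAX_VALUE)) {
            if (info == null) {
                continue; // the thread ended between the two calls
            }
            StringBuilder sb = new StringBuilder();
            sb.append("Thread Name: ").append(info.getThreadName()).append('\n');
            sb.append("Thread State: ").append(info.getThreadState()).append('\n');
            sb.append("Lock Name: ").append(info.getLockName()).append('\n');
            sb.append("Lock Owner Name: ").append(info.getLockOwnerName()).append('\n');
            sb.append("Lock Owner Id: ").append(info.getLockOwnerId()).append('\n');
            sb.append("Stack Trace:\n");
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
            reports.add(sb.toString());
        }
        return reports;
    }

    /** Schedules a periodic check on a daemon thread and returns the scheduler. */
    static ScheduledExecutorService start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "deadlock-watch");
            t.setDaemon(true); // do not keep the JVM alive just for monitoring
            return t;
        });
        scheduler.scheduleAtFixedRate(() -> {
            List<String> reports = findDeadlocks();
            if (!reports.isEmpty()) {
                System.err.println("Deadlocks detected:");
                reports.forEach(System.err::println);
            }
        }, periodSeconds, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }
}
```

Calling `DeadlockWatch.start(60)` would check once a minute; the returned scheduler can be stopped with `shutdownNow()`.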
##### benchmark
```
    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(1)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   15889.407          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     678.061          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.000            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     668.249          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.080            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.075          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.006            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      20.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       6.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  103130.635          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2      ≈ 10⁻⁴          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁶            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts

    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(100)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   10994.606          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     488.508          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.002            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     481.390          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.163            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.020          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.002            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      18.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       9.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  558652.036          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2       0.016          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁴            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts
```

2024-08-08 21:15:49 +08:00
6f37e483f8 [improve](config)del useless creation config for inverted index (#39005)
## Proposed changes
Delete the unused config `enable_create_inverted_index_for_array`.
backport: https://github.com/apache/doris/pull/39006
2024-08-07 17:13:05 +08:00
3b9394a8c7 [improvement](tablet scheduler) Adjust tablet sched priority to help load data succ #38528 (#38884)
cherry pick from #38528
2024-08-06 02:13:47 +08:00
2425730609 [enhance](auth)support cache ranger datamask and row filter (#37723) (#38575)
pick: https://github.com/apache/doris/pull/37723
2024-08-02 14:59:32 +08:00
b0943064e0 [fix](kerberos)fix and refactor ugi login for kerberos and simple authentication (#38607)
pick from (#37301)
2024-08-01 14:01:32 +08:00
6bd93b119f [pick](cast)Feature cast complexttype2 json (#38632)
## Proposed changes
backport: https://github.com/apache/doris/pull/36548
2024-08-01 09:18:15 +08:00
Pxl
b4e82d2322 [Improvement](rpc) set grpc channel's keepAliveTime and remove proxy … (#38381)
…on InterruptedExcep… (#37304)

## Proposed changes
1. set grpc channel's keepAliveTime
2. remove proxy on InterruptedException/TimeoutException to avoid
channel unavailable
pick from #37304
2024-07-25 22:11:23 +08:00
e396f853a0 Pick "[enhance](Cooldown) Use config to control whether use cooldown replica for scanning first" (#38322)
## Proposed changes

Same as master #37492
2024-07-25 12:17:38 +08:00
81a7542cae [pick]Add audit log event queue size limit (#37914)
## Proposed changes
pick #37786
2024-07-16 19:00:22 +08:00
63c2d22513 [cherry-pick](branch-2.1) Pick "[Fix](delete command) Mark delete sign when do delete command in MoW table (#35917)" (#37594)
Pick #35917 and #37151
2024-07-15 18:54:01 +08:00