Commit Graph

471 Commits

Author SHA1 Message Date
1c566253a8 [Pick][Improvement] Query queued by BE memory (#37559) (#39733)
pick #37559
2024-08-22 15:14:47 +08:00
dfd21bd2a0 [fix](fe-log) add position info in async mode #39419 (#39571)
pick part of #39419
2024-08-20 22:01:34 +08:00
0e21dba817 [opt](catalog) modify some meta cache logic (#38506) (#39628)
#38506
2024-08-20 21:57:55 +08:00
021678c7c3 [fix](window_funnel) fix wrong result of window_funnel #38954 (#39270)
## Proposed changes

BP #38954
2024-08-16 09:59:31 +08:00
4e889bbc6d [fix](Nereids) support implicit cast ip types to string (#39318) (#39440)
pick from master #39318
2024-08-16 09:57:02 +08:00
3c535e80dd [fix](compatibility) type toSql should return lowercase string (#38012) (#38517)
pick from master #38012

revert #25951
2024-08-09 11:35:42 +08:00
30e2c3fb11 [feat](lock)add deadlock detection tool and monitored lock implementations #39015 (#39099)
## Proposed changes
#39015
### Description:

This PR adds a deadlock detection tool and monitored lock implementations. These features help identify and debug potential deadlocks and monitor lock usage. Features:


#### AbstractMonitoredLock:

A monitored version of `Lock` that tracks and logs lock acquisition and release times.

Functionality: overrides `lock()`, `unlock()`, `tryLock()`, and `tryLock(long timeout, TimeUnit unit)`. Logs lock acquisition time, release time, and any failure to acquire the lock within the specified timeout.

##### Example
```log
2024-08-07 12:02:59  [ Thread-2:2006 ] - [ WARN ]  Thread ID: 12, Thread Name: Thread-2 - Lock held for 1912 ms, exceeding hold timeout of 1000 ms 
Thread stack trace:
	at java.lang.Thread.getStackTrace(Thread.java:1564)
	at org.example.lock.AbstractMonitoredLock.afterUnlock(AbstractMonitoredLock.java:49)
	at org.example.lock.MonitoredReentrantLock.unlock(MonitoredReentrantLock.java:32)
	at org.example.ExampleService.timeout(ExampleService.java:17)
	at org.example.Main.lambda$test2$1(Main.java:39)
	at java.lang.Thread.run(Thread.java:750)
```
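
A minimal sketch of a monitored lock along these lines (class name, logger, and threshold are illustrative assumptions, not the exact Doris implementation; only `lock()`, `tryLock(timeout)`, and `unlock()` are shown for brevity):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;
import java.util.logging.Logger;

// Sketch: a ReentrantLock that records how long it was held and warns when
// the hold time exceeds a threshold, in the spirit of AbstractMonitoredLock /
// MonitoredReentrantLock described above.
public class MonitoredReentrantLockSketch extends ReentrantLock {

    private static final Logger LOG = Logger.getLogger("lock-monitor");
    private static final long HOLD_TIMEOUT_MS = 1000;   // assumed threshold

    // Per-thread acquisition timestamp (simplified: ignores reentrant holds).
    private final ThreadLocal<Long> acquiredAtNanos = new ThreadLocal<>();

    @Override
    public void lock() {
        super.lock();
        acquiredAtNanos.set(System.nanoTime());
    }

    @Override
    public boolean tryLock(long timeout, TimeUnit unit) throws InterruptedException {
        boolean acquired = super.tryLock(timeout, unit);
        if (acquired) {
            acquiredAtNanos.set(System.nanoTime());
        } else {
            LOG.warning("Failed to acquire lock within " + timeout + " " + unit);
        }
        return acquired;
    }

    @Override
    public void unlock() {
        Long start = acquiredAtNanos.get();
        super.unlock();
        if (start != null) {
            long heldMs = TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start);
            if (heldMs > HOLD_TIMEOUT_MS) {
                LOG.warning(Thread.currentThread().getName()
                        + " - Lock held for " + heldMs + " ms, exceeding hold timeout of "
                        + HOLD_TIMEOUT_MS + " ms");
            }
        }
    }
}
```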

#### DeadlockCheckerTool:

Uses ScheduledExecutorService for periodic deadlock checks. Logs
deadlock information including thread names, states, lock info, and
stack traces.

**ThreadMXBean accesses thread information in the local JVM, which is already in memory, so accessing it is less expensive than fetching data from external resources such as disk or network. Thread state cache: the JVM typically maintains a cache of thread states, reducing the need for real-time calculations or additional data processing.**

##### Example
```log
Thread Name: Thread-0
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Lock Owner Name: Thread-1
Lock Owner Id: 12
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@1d653213
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$3(Main.java:79)
	at org.example.Main$$Lambda$1/1221555852.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


2024-08-07 14:11:28  [ pool-1-thread-1:2001 ] - [ WARN ]  Deadlocks detected:
Thread Name: Thread-1
Thread State: WAITING
Lock Name: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Lock Owner Name: Thread-0
Lock Owner Id: 11
Waited Time: -1
Blocked Time: -1
Lock Info: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Blocked by: java.util.concurrent.locks.ReentrantLock$NonfairSync@13a2dfcf
Stack Trace: 
	at sun.misc.Unsafe.park(Native Method)
	at java.util.concurrent.locks.LockSupport.park(LockSupport.java:175)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.parkAndCheckInterrupt(AbstractQueuedSynchronizer.java:836)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireQueued(AbstractQueuedSynchronizer.java:870)
	at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquire(AbstractQueuedSynchronizer.java:1199)
	at java.util.concurrent.locks.ReentrantLock$NonfairSync.lock(ReentrantLock.java:209)
	at java.util.concurrent.locks.ReentrantLock.lock(ReentrantLock.java:285)
	at org.example.lock.MonitoredReentrantLock.lock(MonitoredReentrantLock.java:22)
	at org.example.Main.lambda$testDeadLock$4(Main.java:93)
	at org.example.Main$$Lambda$2/1556956098.run(Unknown Source)
	at java.lang.Thread.run(Thread.java:750)


```
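
A minimal sketch of such a periodic ThreadMXBean-based check (scheduling interval, class name, and log formatting are assumptions, not the exact tool):

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadInfo;
import java.lang.management.ThreadMXBean;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.logging.Logger;

// Sketch of a DeadlockCheckerTool-style check: ThreadMXBean reports threads
// deadlocked on monitors or ownable synchronizers, and the result is logged
// with thread state, lock owner, and stack trace information.
public class DeadlockCheckerSketch {

    private static final Logger LOG = Logger.getLogger("deadlock-checker");

    public static void start(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(DeadlockCheckerSketch::checkOnce,
                periodSeconds, periodSeconds, TimeUnit.SECONDS);
    }

    private static void checkOnce() {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        long[] deadlocked = bean.findDeadlockedThreads();   // null when no deadlock
        if (deadlocked == null) {
            return;
        }
        StringBuilder sb = new StringBuilder("Deadlocks detected:\n");
        for (ThreadInfo info : bean.getThreadInfo(deadlocked, true, true)) {
            sb.append("Thread Name: ").append(info.getThreadName()).append('\n')
              .append("Thread State: ").append(info.getThreadState()).append('\n')
              .append("Lock Name: ").append(info.getLockName()).append('\n')
              .append("Lock Owner Name: ").append(info.getLockOwnerName()).append('\n');
            for (StackTraceElement frame : info.getStackTrace()) {
                sb.append("\tat ").append(frame).append('\n');
            }
        }
        LOG.warning(sb.toString());
    }

    public static void main(String[] args) {
        start(5);   // assumed 5-second check interval for the sketch
    }
}
```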
##### benchmark
```
    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(1)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   15889.407          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     678.061          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.000            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     668.249          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.080            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.075          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.006            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      20.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       6.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  103130.635          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2      ≈ 10⁻⁴          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁶            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts

    @Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
    @Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
    @Threads(100)

Benchmark                                                          Mode  Cnt       Score   Error   Units
LockBenchmark.testMonitoredLock                                   thrpt    2   10994.606          ops/ms
LockBenchmark.testMonitoredLock:·gc.alloc.rate                    thrpt    2     488.508          MB/sec
LockBenchmark.testMonitoredLock:·gc.alloc.rate.norm               thrpt    2      56.002            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space           thrpt    2     481.390          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Eden_Space.norm      thrpt    2      55.163            B/op
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space       thrpt    2       0.020          MB/sec
LockBenchmark.testMonitoredLock:·gc.churn.PS_Survivor_Space.norm  thrpt    2       0.002            B/op
LockBenchmark.testMonitoredLock:·gc.count                         thrpt    2      18.000          counts
LockBenchmark.testMonitoredLock:·gc.time                          thrpt    2       9.000              ms
LockBenchmark.testNativeLock                                      thrpt    2  558652.036          ops/ms
LockBenchmark.testNativeLock:·gc.alloc.rate                       thrpt    2       0.016          MB/sec
LockBenchmark.testNativeLock:·gc.alloc.rate.norm                  thrpt    2      ≈ 10⁻⁴            B/op
LockBenchmark.testNativeLock:·gc.count                            thrpt    2         ≈ 0          counts
```
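
A minimal JMH sketch of how such a monitored-vs-native comparison can be set up (class and lock names are placeholders; the original benchmark class is not shown here):

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Threads;
import org.openjdk.jmh.annotations.Warmup;

@State(Scope.Benchmark)
@BenchmarkMode(Mode.Throughput)
@OutputTimeUnit(TimeUnit.MILLISECONDS)      // ops/ms, matching the results above
@Warmup(iterations = 5, time = 1, timeUnit = TimeUnit.SECONDS)
@Measurement(iterations = 2, time = 2, timeUnit = TimeUnit.SECONDS)
@Threads(1)                                  // rerun with @Threads(100) for the contended case
public class LockBenchmark {

    private final ReentrantLock nativeLock = new ReentrantLock();
    // Stand-in for MonitoredReentrantLock: in the real benchmark this would be
    // the monitored wrapper that records timestamps around lock()/unlock().
    private final ReentrantLock monitoredLock = new ReentrantLock();

    @Benchmark
    public void testNativeLock() {
        nativeLock.lock();
        try {
            // empty critical section: measures pure lock/unlock overhead
        } finally {
            nativeLock.unlock();
        }
    }

    @Benchmark
    public void testMonitoredLock() {
        monitoredLock.lock();
        try {
            // with the real monitored lock this also pays the monitoring cost
        } finally {
            monitoredLock.unlock();
        }
    }
}
```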

2024-08-08 21:15:49 +08:00
6f37e483f8 [improve](config)del useless creation config for inverted index (#39005)
## Proposed changes
Delete the useless config `enable_create_inverted_index_for_array`.
backport: https://github.com/apache/doris/pull/39006
2024-08-07 17:13:05 +08:00
3b9394a8c7 [improvement](tablet scheduler) Adjust tablet sched priority to help load data succ #38528 (#38884)
cherry pick from #38528
2024-08-06 02:13:47 +08:00
2425730609 [enhance](auth)support cache ranger datamask and row filter (#37723) (#38575)
pick: https://github.com/apache/doris/pull/37723
2024-08-02 14:59:32 +08:00
b0943064e0 [fix](kerberos)fix and refactor ugi login for kerberos and simple authentication (#38607)
pick from  (#37301)
2024-08-01 14:01:32 +08:00
6bd93b119f [pick](cast)Feature cast complexttype2 json (#38632)
## Proposed changes
backport: https://github.com/apache/doris/pull/36548
2024-08-01 09:18:15 +08:00
Pxl
b4e82d2322 [Improvement](rpc) set grpc channel's keepAliveTime and remove proxy … (#38381)
…on InterruptedExcep… (#37304)

## Proposed changes
1. Set grpc channel's keepAliveTime.
2. Remove proxy on InterruptedException/TimeoutException to avoid the channel becoming unavailable (see the keep-alive sketch below).
pick from #37304
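
A minimal grpc-java sketch of enabling keep-alive on a channel (host, port, and durations are illustrative values, not Doris's actual settings):

```java
import java.util.concurrent.TimeUnit;

import io.grpc.ManagedChannel;
import io.grpc.ManagedChannelBuilder;

// Sketch: build a channel with keep-alive enabled so idle connections are
// probed instead of silently going stale.
public class GrpcKeepAliveExample {
    public static void main(String[] args) {
        ManagedChannel channel = ManagedChannelBuilder
                .forAddress("127.0.0.1", 8060)
                .usePlaintext()
                .keepAliveTime(60, TimeUnit.SECONDS)      // send keep-alive pings when idle
                .keepAliveTimeout(20, TimeUnit.SECONDS)   // how long to wait for the ping ack
                .keepAliveWithoutCalls(true)              // also ping when no RPCs are in flight
                .build();

        // ... use the channel; per the change above, the caller should not drop
        // the channel on InterruptedException/TimeoutException.
        channel.shutdown();
    }
}
```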
2024-07-25 22:11:23 +08:00
e396f853a0 Pick "[enhance](Cooldown) Use config to control whether use cooldown replica for scanning first" (#38322)
## Proposed changes

Same as master #37492
2024-07-25 12:17:38 +08:00
81a7542cae [pick]Add audit log event queue size limit (#37914)
## Proposed changes
pick #37786
2024-07-16 19:00:22 +08:00
63c2d22513 [cherry-pick](branch-2.1) Pick "[Fix](delete command) Mark delete sign when do delete command in MoW table (#35917)" (#37594)
Pick #35917 and #37151
2024-07-15 18:54:01 +08:00
16de141743 [regression](kerberos)add hive kerberos docker regression env (#37657)
## Proposed changes
pick:
[regression](kerberos)fix regression pipeline env when write hosts 
(#37057)
[regression](kerberos)add hive kerberos docker regression env (#36430)
2024-07-15 09:35:39 +08:00
259d28407e [improvement](statistics)Enable estimate hive table row count using file size. (#37218) (#37694)
backport: https://github.com/apache/doris/pull/37218
2024-07-12 13:47:27 +08:00
6214d6421f [Fix](planner) fix bug of char(255) toSql (#37340) (#37671)
cherry-pick #37340 from master
2024-07-12 10:33:24 +08:00
dd18652861 [branch-2.1](routine-load) make get Kafka meta timeout configurable (#37399)
pick #36619
2024-07-08 10:39:17 +08:00
d08a418dd8 [branch-2.1](routine-load) optimize routine load job auto resume policy (#37373)
pick #35266
2024-07-07 18:16:56 +08:00
b3eaf0e4d2 [bugfix](hive)Prevent multiple fs from being generated for 2.1 (#37142)
pick #36954
2024-07-02 22:54:40 +08:00
e25717458e [opt](catalog) add some profile for parquet reader and change meta cache config (#37040) (#37146)
bp #37040
2024-07-02 20:58:43 +08:00
3f382b797a [branch-2.1][improvement](sqlserver catalog) Configurable whether to use encrypt when connecting to SQL Server using the catalog (#36971)
pick (#36659)
pick #37015
In previous versions, we used Druid as the default JDBC connection pool,
which could use custom decryption to parse the certificate when SQL Server
encryption is turned on. However, in the new version, after switching to
HikariCP as the default connection pool, the SQL Server certificate
cannot be parsed, so encryption needs to be turned off for normal use.
Therefore, a parameter is added to decide whether to disable SQL Server
encryption. It is not disabled by default.
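
A minimal HikariCP sketch illustrating the connection-level effect (URL, credentials, and the `encrypt=false` JDBC property are illustrative; the actual Doris catalog property that toggles this is not shown here):

```java
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;

// Sketch: a HikariCP pool against SQL Server. Without Druid's custom
// certificate decryption hook, encryption is disabled in the JDBC URL
// (assumed here via the driver's 'encrypt' property) so connections succeed.
public class SqlServerHikariExample {
    public static void main(String[] args) {
        HikariConfig config = new HikariConfig();
        config.setDriverClassName("com.microsoft.sqlserver.jdbc.SQLServerDriver");
        config.setJdbcUrl("jdbc:sqlserver://127.0.0.1:1433;databaseName=demo;encrypt=false");
        config.setUsername("user");
        config.setPassword("password");

        try (HikariDataSource ds = new HikariDataSource(config)) {
            // hand the data source to the JDBC catalog / scanner code
        }
    }
}
```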
2024-07-02 10:14:43 +08:00
22cb7b8fcb [improvement](compaction) be do not compact invisible version to avoid query error -230 #28082 (#36222)
cherry pick from #28082
2024-06-27 13:45:21 +08:00
58cc1dca7f [improve](fe) Support to config max msg/frame size of the thrift server (#36594)
Cherry-pick #35845
2024-06-21 00:15:15 +08:00
74162a1b7e [enhancement](prepared statement) Handle unsigned numeric type in prepare statement (#36388)
## Proposed changes

Issue Number: bp #36133
2024-06-18 19:33:12 +08:00
7c0ec4ea2e [fix](autobucket) fix autobucket config masterOnly=true #36116 (#36286)
cherry pick from #36116
2024-06-14 14:26:23 +08:00
9708ca8fcb [Feature](Prepared Statment) Implement in nereids planner (#35318) (#36172) 2024-06-12 19:54:17 +08:00
b5a35b9cef [FIX] Pick array inverted index bugfix (#35837)
Here are some bug fixes for arrays with inverted index; see also:
https://github.com/apache/doris/pull/34766
https://github.com/apache/doris/pull/35086
https://github.com/apache/doris/pull/34683
https://github.com/apache/doris/pull/34076
2024-06-06 09:54:14 +08:00
5c8f87e01e [opt](log) refine the FE logger (#35679)
Previously, FE logs were written to files. The main FE logs include
fe.log, fe.warn.log, fe.audit.log, fe.out, and fe.gc.log.
In a K8s deployment environment, logs usually need to be output to
standard output, and then other components process the log stream.

This PR made the following changes:

1. Modified the log4j configuration template

- When started with `--daemon`, logs are still written to various files,
and the format remains unchanged.
- When started with `--console`, all logs are output to standard output
and marked with different prefixes:

		- `StdoutLogger`: logs for standard output
		- `StderrLogger`: logs for standard error output
		- `RuntimeLogger`: logs for fe.log or fe.warn.log
		- `AuditLogger:` logs for fe.audit.log
		- No prefix: logs for fe.gc.log

Examples are as follows:

```
RuntimeLogger 2024-06-03 14:54:51,229 INFO (binlog-gcer|62) [BinlogManager.gc():359] begin gc binlog
```

2. Added a new FE config: `enable_file_logger`

Defaults to true. Indicates that logs will be recorded to files
regardless of the startup method. For example, if it is started with
`--console`, the log will be output to both the file and the standard
output. If it is `false`, the log will not be recorded in the file
regardless of the startup method.

3. Optimized the log format of standard output

The byte streams of stdout and stderr are captured. Logs previously
output using `System.out` will be captured in fe.log for unified
management (a minimal sketch of this redirection follows below).
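
A minimal sketch of the general technique of redirecting `System.out` into a logger (uses `java.util.logging` for self-containment and assumes single-byte characters; this is not Doris's actual implementation):

```java
import java.io.OutputStream;
import java.io.PrintStream;
import java.util.logging.Logger;

// Sketch: wrap System.out in a PrintStream that forwards every completed line
// to a logger, so stray System.out.println() calls also land in the log file.
public final class StdoutCapture {

    public static void install() {
        final Logger logger = Logger.getLogger("stdout");
        final PrintStream realOut = System.out;

        System.setOut(new PrintStream(new OutputStream() {
            private final StringBuilder buf = new StringBuilder();

            @Override
            public void write(int b) {
                if (b == '\n') {
                    logger.info(buf.toString());   // record the line in the log
                    realOut.println(buf);          // still echo to the real stdout
                    buf.setLength(0);
                } else {
                    buf.append((char) b);
                }
            }
        }, true));
    }

    public static void main(String[] args) {
        install();
        System.out.println("hello");   // appears both on stdout and in the log
    }
}
```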
2024-06-04 18:20:30 +08:00
f94222a04e [fix](log) Support fe log rollover size strategy (#34446) 2024-06-04 18:18:16 +08:00
db3bbc2437 [feature](merge-cloud) Change fe log rolling max size (#32777) 2024-06-04 18:17:33 +08:00
bc6b316e87 [chore](index) add config enable_create_bitmap_index_as_inverted_index default true #33434 (#35521) 2024-06-04 12:07:03 +08:00
4f0365e0bf [fix](s3) move s3 providers to fe-common to be accessible for jni reader (#35779)
backport: #35690

`PropertyConverter.setS3FsAccess` has added customized S3 providers:
```
public static final List<String> AWS_CREDENTIALS_PROVIDERS = Arrays.asList(
            DataLakeAWSCredentialsProvider.class.getName(),
            TemporaryAWSCredentialsProvider.class.getName(),
            SimpleAWSCredentialsProvider.class.getName(),
            EnvironmentVariableCredentialsProvider.class.getName(),
            IAMInstanceCredentialsProvider.class.getName());
```
These providers are set as the value of
`fs.s3a.aws.credentials.provider`, which is used as configuration
to build the S3 reader in JNI readers. However,
`DataLakeAWSCredentialsProvider` is in `fe-core`, which the JNI readers
do not depend on, so we have to move the S3 providers to `fe-common`.
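
A minimal sketch of how a provider list like `AWS_CREDENTIALS_PROVIDERS` ends up in the Hadoop configuration consumed by JNI readers (the provider class names below are illustrative, not the exact Doris list):

```java
import java.util.Arrays;
import java.util.List;

import org.apache.hadoop.conf.Configuration;

// Sketch: join the provider class names and set them as the S3A credentials
// provider chain; JNI readers reusing this configuration resolve credentials
// through the same chain.
public class S3ProviderConfigExample {
    public static void main(String[] args) {
        List<String> providers = Arrays.asList(
                "org.apache.hadoop.fs.s3a.TemporaryAWSCredentialsProvider",
                "org.apache.hadoop.fs.s3a.SimpleAWSCredentialsProvider",
                "com.amazonaws.auth.EnvironmentVariableCredentialsProvider");

        Configuration conf = new Configuration();
        conf.set("fs.s3a.aws.credentials.provider", String.join(",", providers));

        System.out.println(conf.get("fs.s3a.aws.credentials.provider"));
    }
}
```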
2024-06-03 14:04:39 +08:00
d83c714824 [branch-2.1](routine-load) adjusting the default configuration of routine load (#35753)
#34898
2024-06-01 11:22:21 +08:00
fd23386ec5 [fix](auth)fix simple auth check and default username (#35620)
Fix simple auth check and default username: we should set simple auth to valid by default, and check whether to set the default username in `loginWithUGI`.
2024-05-30 19:59:37 +08:00
b0e2461181 [branch-2.1][improvement](JdbcScan) Change the mysql function that does not support pushdown in JdbcScan to Config (#35631)
pick #35196
2024-05-30 15:40:08 +08:00
bddaeb9261 [Fix](JobSchedule) Modify the default value of async_task_consumer_thread_num (#35456)
When `Export` statements are executed concurrently, the background uses
`Job schedule` to manage export tasks. Previously, the default value of
`async_task_consumer_thread_num` was 5, meaning that regardless of the
concurrency setting, a maximum of only 5 threads could execute
concurrently.

On the other hand, `Export` is not the only user of `Job schedule`; other
scheduled tasks may also use it, leading to a shortage of thread
resources.

Now we have found that, in many scenarios, `Export` needs to be set to a
high concurrency value and run concurrently at that value. Clearly,
`async_task_consumer_thread_num = 5` is no longer sufficient, so we have
changed the default value of `async_task_consumer_thread_num` to 64.
2024-05-28 18:54:06 +08:00
af7b16f213 [optimize](desc) display the correct data type of aggStateType (#34968)
If a table column is of AGG_STATE type, we can't get the clearly defined data type when using the `desc tbl` statement.

create table a_table(
    k1 int null,
    k2 agg_state<max_by(int not null,int)> generic,
    k3 agg_state<group_concat(string)> generic
)
aggregate key (k1)
distributed BY hash(k1) buckets 3
properties("replication_num" = "1");

before optimize:

mysql> desc a_table;
+-------+------------------------------------------------+------+-------+---------+---------+
| Field | Type                                           | Null | Key   | Default | Extra   |
+-------+------------------------------------------------+------+-------+---------+---------+
| k1    | INT                                            | Yes  | true  | NULL    |         |
| k2    | org.apache.doris.catalog.AggStateType@239f771c | No   | false | NULL    | GENERIC |
| k3    | org.apache.doris.catalog.AggStateType@2e535f50 | No   | false | NULL    | GENERIC |
+-------+------------------------------------------------+------+-------+---------+---------+
3 rows in set (0.00 sec)


after optimize:

mysql> desc a_table;
+-------+------------------------------------+------+-------+---------+---------+
| Field | Type                               | Null | Key   | Default | Extra   |
+-------+------------------------------------+------+-------+---------+---------+
| k1    | INT                                | Yes  | true  | NULL    |         |
| k2    | AGG_STATE<max_by(INT, INT NULL)>   | No   | false | NULL    | GENERIC |
| k3    | AGG_STATE<group_concat(TEXT NULL)> | No   | false | NULL    | GENERIC |
+-------+------------------------------------+------+-------+---------+---------+


Co-authored-by: duanxujian <duanxujian@jd.com>
2024-05-22 10:03:31 +08:00
5012ddd87a [fix](Nereids) fix sql cache return old value when truncate partition (#34698)
1. Fix sql cache returning old value when truncating a partition.
2. Use expire_sql_cache_in_fe_second to control the expire time of the sql cache in the NereidsSqlCacheManager.
2024-05-18 18:05:31 +08:00
1a24895257 [opt](routine-load) optimize routine load task thread pool and related param(#32282) (#34896) 2024-05-15 12:42:02 +08:00
f9c42f34dd [fix](auth)Compatible with previously enabled ldap configuration (#34891) 2024-05-15 12:36:47 +08:00
cadbbdd2c0 [fix](config) for compatibility issue of log dir config (#34734)
2024-05-12 09:44:50 +08:00
ec34bc0386 [bug](config) Fix modifying label_num_threshold does not take effect (#34575) 2024-05-10 22:12:17 +08:00
9a94681b29 [refactor](type) AggStateType should not extends ScalarType (#34463)
1. let AggStateType extend Type
2. remove useless interfaces isFixedLengthType and supportsTablePartitioning
3. let MapType implement the isSupported interface
4. let VariantType extend ScalarType
2024-05-10 22:10:42 +08:00
853dbdcb00 [Feature](PreparedStatement) implement general server side prepared (#33807) 2024-05-10 22:10:11 +08:00
6c11dd2231 [Fix](planner) fix ScalarType.getAssignmentCompatibleType() when dealing with boolean and decimal (#34435)
The legacy planner encounters issues when handling filters such as: c1(boolean type)=0.0(decimalv3).
The literal 0.0 is interpreted as decimalv3(1,1), and the boolean type c1 is coerced to decimalv3(1,1).
decimalv3(1,1) can only retain values in the range [0,1), while the boolean true is represented as 1, exceeding the upper bound, thus causing an overflow problem.
This pull request addresses the issue by treating the boolean type as decimalv3(1,0), so that both c1 and 0.0 are cast to decimal(2,1).


Co-authored-by: feiniaofeiafei <moailing@selectdb.com>
2024-05-10 22:07:16 +08:00
07207b7b51 [feature](shuffle) enable strict consistency dml by default (#32958) (#34641) 2024-05-10 14:31:50 +08:00
3ae3f9d6e1 [opt](catalog) support using loading cache for db/table list in external catalog (#33610) (#34596)
bp #33610
2024-05-09 17:50:39 +08:00