Support Aliyun DLF
Support data on s3-compatible object storage, such as aliyun oss.
Refactor some interface of catalog, to make it more tidy.
Fix bug that the default text format field delimiter of hive should be \x01
Add a new class PooledHiveMetaStoreClient to wrap the IMetaStoreClient.
## Design
### Trigger
Every time when a rowset writer produces more than N (e.g. 10) segments, we trigger segment compaction. Note that only one segment compaction job for a single rowset at a time to ensure no recursing/queuing nightmare.
### Target Selection
We collect segments during every trigger. We skip big segments whose row num > M (e.g. 10000) coz we get little benefits from compacting them comparing our effort. Hence, we only pick the 'Longest Consecutive Small" segment group to do actual compaction.
### Compaction Process
A new thread pool is introduced to help do the job. We submit the above-mentioned 'Longest Consecutive Small" segment group to the pool. Then the worker thread does the followings:
- build a MergeIterator from the target segments
- create a new segment writer
- for each block readed from MergeIterator, the Writer append it
### SegID handling
SegID must remain consecutive after segment compaction.
If a rowset has small segments named seg_0, seg_1, seg_2, seg_3 and a big segment seg_4:
- we create a segment named "seg_0-3" to save compacted data for seg_0, seg_1, seg_2 and seg_3
- delete seg_0, seg_1, seg_2 and seg_3
- rename seg_0-3 to seg_0
- rename seg_4 to seg_1
It is worth noticing that we should wait inflight segment compaction tasks to finish before building rowset meta and committing this txn.
There are several configs related to tcmalloc, users do know how to config them. Actually users just want two modes, performance or compact, in performance mode, users want doris run query and load quickly while in compact mode, users want doris run with less memory usage.
If we want to config tcmalloc individually, we can use env variables which are supported by tcmalloc.
1.remove quick_compaction's rowset pick policy, call cu compaction when trigger
quick compaction
2. skip tablet's compaction task when compaction score is too small
Co-authored-by: yixiutt <yixiu@selectdb.com>
Support delete from partitioned table without partition specified in [DELETE] stmt.
## Usage
If it is a partitioned table, you can specify a partition.
If not specified, Doris will infer partition from the given conditions.
In two cases, Doris cannot infer the partition from conditions:
1) the conditions do not contain partition columns;
2) The operator of the partition column is `not in`.
When a partition table does not specify the partition,
or the partition cannot be inferred from the conditions,
the session variable `delete_without_partition` needs to be `true`
to make delete statement be applied to all partitions.
## Test case
Test case is added in `regression-test/suites/delete_p0/test_delete_from_partition.groovy`,
user can delete from partitioned table without partition specified now.
The regression test may failed because of heartbeat failure occasionally.
So I add 2 new FE config to relax this limit
1. `disable_backend_black_list`
Set to true to not put Backend to black list even if we failed to send task to it. Default is false.
2. `max_backend_heartbeat_failure_tolerance_count`
Only if the failure time of heartbeat exceed this config, we can set Backend as dead. Default is 1.
1. Modify default behavior of `build.sh`
The `BUILD_JAVA_UDF` is default ON, so that jvm is needed for compilation and runtime.
2. Add docker-compose for MySQL 5.7, PostgreSQL 14 and Hive 2
See `docker/thirdparties/docker-compose`.
3. Add some regression test cases for jdbc query on MySQL, PG and Hive Catalog
The default is `false`, if set to true, you need first start docker for MySQL/PG/Hive.
4. Support `if not exists` and `if exists` for create/drop resource and create/drop encryptkey
1. remove FE config `enable_array_type`
2. limit the nested depth of array in FE side.
3. Fix bug that when loading array from parquet, the decimal type is treated as bigint
4. Fix loading array from csv(vec-engine), handle null and "null"
5. Change the csv array loading behavior, if the array string format is invalid in csv, it will be converted to null.
6. Remove `check_array_format()`, because it's logic is wrong and meaningless
7. Add stream load csv test cases and more parquet broker load tests