The regression tests may occasionally fail because of heartbeat failures.
So I added 2 new FE configs to relax this limit:
1. `disable_backend_black_list`
Set to true to avoid adding a Backend to the black list even if we fail to send a task to it. Default is false.
2. `max_backend_heartbeat_failure_tolerance_count`
A Backend is marked as dead only if the number of heartbeat failures exceeds this config. Default is 1.
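A minimal sketch of adjusting these at runtime, assuming both configs are mutable via `ADMIN SET FRONTEND CONFIG` (they can otherwise be set in fe.conf); the values are only examples:
```
ADMIN SET FRONTEND CONFIG ("disable_backend_black_list" = "true");
ADMIN SET FRONTEND CONFIG ("max_backend_heartbeat_failure_tolerance_count" = "3");
```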
1. Modify default behavior of `build.sh`
`BUILD_JAVA_UDF` now defaults to ON, so a JVM is needed for both compilation and runtime.
2. Add docker-compose for MySQL 5.7, PostgreSQL 14 and Hive 2
See `docker/thirdparties/docker-compose`.
3. Add some regression test cases for JDBC queries on MySQL, PG and Hive Catalogs
These tests are disabled by default (`false`); if set to `true`, you need to first start the Docker containers for MySQL/PG/Hive.
4. Support `if not exists` and `if exists` for create/drop resource and create/drop encryptkey
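For illustration, a sketch of the new syntax; the resource/key names and properties are hypothetical:
```
CREATE EXTERNAL RESOURCE IF NOT EXISTS "mysql_odbc" PROPERTIES (
    "type" = "odbc_catalog",
    "host" = "127.0.0.1",
    "port" = "3306",
    "user" = "root",
    "password" = ""
);
DROP RESOURCE IF EXISTS "mysql_odbc";

CREATE ENCRYPTKEY IF NOT EXISTS my_key AS "ABCD123456789";
DROP ENCRYPTKEY IF EXISTS my_key;
```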
1. Remove FE config `enable_array_type`
2. Limit the nesting depth of arrays on the FE side.
3. Fix a bug where, when loading arrays from Parquet, the decimal type was treated as bigint
4. Fix loading arrays from CSV (vec engine), handling both null and the literal string "null"
5. Change the CSV array loading behavior: if the array string format in CSV is invalid, it will be converted to null (see the sketch after this list).
6. Remove `check_array_format()`, because its logic is wrong and meaningless
7. Add stream load CSV test cases and more Parquet broker load tests
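A minimal sketch of the CSV array behavior described in item 5; the table, data, and separators are hypothetical and only illustrate the intended handling:
```
CREATE TABLE IF NOT EXISTS array_demo (
    id INT,
    arr ARRAY<INT>
) DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- For a CSV field mapped to `arr` (tab-separated columns):
--   [1,2,3]   -> loads as the array [1, 2, 3]
--   null      -> loads as NULL
--   [1,2      -> invalid array format, converted to NULL instead of failing the load
```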
1. Refactor the file reader creation in FileFactory, for simplicity.
Previously, FileFactory had too many `create_file_reader` interfaces.
They are now unified into two categories: the interface used by the old BrokerScanNode,
and the interface used by the new FileScanNode.
The creation methods for readers that read from `StreamLoadPipe` are also separated from those for readers that read files.
2. Modify the StreamLoadPlanner on the FE side to support using ExternalFileScanNode
3. For the generic reader, the file reader is now created inside the reader rather than passed in from the outside.
4. Add some test cases for CSV stream load; the behavior is the same as with the old broker scanner.
1. This PR updates the JSON load docs for importing data into array columns.
When we use JSON to import data into an array column, RapidJSON can cause precision problems,
so we update the json-load docs to explain how to avoid these problems.
Issue Number: #7570
Co-authored-by: hucheng01 <hucheng01@baidu.com>
### Motivation
TABLESAMPLE allows you to limit the number of rows from a table in the FROM clause.
It is used for data exploration, quick verification of SQL correctness, and table statistics collection.
### Grammar
```
[TABLET(tids)] TABLESAMPLE(n [ROWS | PERCENT]) [REPEATABLE seek]
```
This limits the number of rows read from the table in the FROM clause:
a number of Tablets is pseudo-randomly selected from the table according to the specified number of rows or percentage,
and the seed given in REPEATABLE makes the same sample be returned again.
In addition, Tablet IDs can also be specified manually.
Note that this can only be used for OLAP tables.
### Example
Q1:
```
SELECT * FROM t1 TABLET(10001,10002) limit 1000;
```
explain:
```
partitions=1/1, tablets=2/12, tabletList=10001,10002
```
Q1 reads only the specified Tablet IDs of t1.
Q2:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 1 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10001,10002,10003
```
Q3:
```
SELECT * FROM t1 TABLESAMPLE(1000000 ROWS) REPEATABLE 2 limit 1000;
```
explain:
```
partitions=1/1, tablets=3/12, tabletList=10002,10003,10004
```
Q2 and Q3 pseudo-randomly sample rows from t1 with different REPEATABLE seeds.
Note that the Tablets are actually selected according to the statistics of the table,
and the total number of rows in the selected Tablets may be greater than the requested sample size,
so if you want to return exactly 1000 rows, you still need to add LIMIT.
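The grammar also allows sampling by percentage; as a sketch (not one of the explain examples above):
```
SELECT * FROM t1 TABLESAMPLE(10 PERCENT) limit 1000;
```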
### Design
First, determine how many rows to sample from each partition according to the number of partitions.
Then determine the number of Tablets to select in each partition according to the average number of rows per Tablet.
If seek is not specified, the required number of Tablets is pseudo-randomly selected from each partition.
If seek is specified, Tablets are selected sequentially starting from the seek-th Tablet of the partition.
Finally, any manually specified Tablet IDs are added to the selected Tablets.
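For illustration, suppose `TABLESAMPLE(1000000 ROWS)` hits a table with 2 partitions and the sample is split evenly: each partition is asked for roughly 500000 rows, and if its Tablets hold about 200000 rows each on average, about 3 Tablets are selected per partition. These numbers are only an assumed example.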
#12716 removed the memory limit for a single load task. In this PR I propose to remove the session variable `load_mem_limit` to avoid confusion.
For compatibility, `load_mem_limit` in Thrift is not removed; its value is set equal to `exec_mem_limit` on the FE side.
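A sketch of the intended usage after this change (the value is only an example): load memory is now controlled through `exec_mem_limit`:
```
SET exec_mem_limit = 8589934592;
```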
Add a new restore property `reserve_dynamic_partition_enable`, which lets you
restore a table whose `dynamic_partition_enable` property has the same value
as before the backup. Before this commit, a restored table always came back
with the property `dynamic_partition_enable=false`.
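As a sketch of how the property would be used (repository, snapshot, table names and timestamp are hypothetical):
```
RESTORE SNAPSHOT example_db.snapshot_1
FROM example_repo
ON (example_tbl)
PROPERTIES
(
    "backup_timestamp" = "2022-01-01-12-00-00",
    "reserve_dynamic_partition_enable" = "true"
);
```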