Reproduce:
1. Start a routine load job; a routine load task is sent to a BE.
2. The BE executes the task successfully and commits the transaction to the FE.
3. The commit request fails on the FE because the database has been renamed (a "db not found" exception is thrown).
4. After the commit fails, the BE sends a rollback request to the FE.
5. The FE receives this rollback request and mistakenly updates the routine load progress,
because the number of loaded rows in the rollback request's attachment is larger than 0 (see the sketch after this list).
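A minimal sketch of the intended guard on the FE side; all names below are hypothetical, not the actual Doris classes:

```
// Minimal self-contained sketch; all names are hypothetical.
enum TxnStatus { COMMITTED, ABORTED }

class RoutineLoadProgressGuard {
    long committedOffset = 0;

    // Called when the FE finishes handling a commit or rollback request
    // from a BE; loadedRows and newOffset come from the request's attachment.
    void onTxnFinished(TxnStatus status, long loadedRows, long newOffset) {
        // Bug: the old code advanced the progress whenever loadedRows > 0,
        // even for a rollback. Fix: only a committed txn may advance it.
        if (status == TxnStatus.COMMITTED) {
            committedOffset = newOffset;
        }
        // On ABORTED, ignore the attachment: the rows were never committed.
    }
}
```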
Random distribution is no longer supported since version 0.9,
so we need a way to convert random distribution to hash distribution:
ALTER TABLE db.tbl SET ("distribution_type" = "hash");
1. Support specifying a label in an INSERT INTO stmt.
INSERT INTO tbl1 WITH LABEL label1 ...;
2. Return the state of the job corresponding to the existing label in the stream load result.
...
"Status": "Label Already Exists",
"ExistingJobStatus": "FINISHED"
...
3. Return the most recent 2000 transactions in SHOW PROC '/transactions'.
ISSUES-1725: The result of a union stmt whose child is an outer join stmt is incorrect.
Example:
sql: (select k1 from empty) union all (select b.k1 k1 from left_table a left join empty b on a.k2 = b.k2);
context: the empty table has no data.
actual (wrong) result: 0
expected result: null
Reason:
The judgment of whether column k1, which belongs to the union tuple, is nullable is incorrect.
It cannot be determined from the slot attributes of the children when the slot is produced by an outer join:
slot A may be declared not nullable, while the outer join result that carries slot A is nullable.
So the judgment also needs to consider whether the slot comes from an outer join, as the sketch below illustrates.
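A minimal sketch of the corrected judgment; the names are hypothetical, not the actual planner code:

```
// Minimal self-contained sketch; all names are hypothetical.
class UnionNullability {
    static class SlotInfo {
        boolean declaredNullable;    // nullability recorded on the slot descriptor
        boolean producedByOuterJoin; // slot sits on the nullable side of an outer join
    }

    static boolean unionOutputNullable(SlotInfo childSlot) {
        // Old (buggy) judgment: return childSlot.declaredNullable;
        // In (select b.k1 ... left join empty b ...), b.k1 is declared not
        // nullable, yet the left join can emit NULL for it, so the union
        // output column must be nullable as well.
        return childSlot.declaredNullable || childSlot.producedByOuterJoin;
    }
}
```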
This commit checks all of the parsed columns, including those defined by hadoop functions and by other functions.
Without this, the load throws a "Column has no default value" exception even when the column has been defined by a non-hadoop function (see the sketch below).
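A hedged sketch of the intended validation, again with hypothetical names: a column passes if it is read from the source file, computed by any mapping function (hadoop function or not), or has a default value.

```
import java.util.Map;
import java.util.Set;

// Minimal self-contained sketch; all names are hypothetical.
class LoadColumnCheck {
    static void check(String column, Set<String> sourceColumns,
                      Map<String, String> mappingExprs, // column -> defining expression
                      boolean hasDefaultValue) {
        if (sourceColumns.contains(column) || mappingExprs.containsKey(column)) {
            return; // provided by the file, or computed by a mapping function
        }
        if (!hasDefaultValue) {
            // Old bug: columns defined by non-hadoop functions still ended up here.
            throw new IllegalStateException("Column has no default value: " + column);
        }
    }
}
```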
The path hash of a replica in metadata should be set immediately after the replica is created.
And we should not depend on the path hash to find replicas, because setting the path hash may be delayed.
This config is used to control the max concurrent task num per BE.
The cluster-wide max concurrent task num = max_concurrent_task_num_per_be * number of BEs.
For example, with a per-BE limit of 5 and 10 BEs, at most 50 tasks can run concurrently.
Also, if the user has already specified target partitions for the load, we should ship only those partitions' info in the BrokerScanNode params, instead of adding all partitions' info;
otherwise the RPC packet becomes too large (see the sketch below).
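A hedged sketch of the idea, with hypothetical names rather than the actual BrokerScanNode code:

```
import java.util.ArrayList;
import java.util.List;

// Minimal self-contained sketch; all names are hypothetical.
class PartitionSelector {
    static List<Long> partitionsToShip(List<Long> allPartitionIds,
                                       List<Long> targetPartitionIds) {
        if (targetPartitionIds != null && !targetPartitionIds.isEmpty()) {
            // Only the partitions the user asked to load go into the RPC params.
            return targetPartitionIds;
        }
        // No target partitions specified: fall back to shipping all of them.
        return new ArrayList<>(allPartitionIds);
    }
}
```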
Currently, we do not support parsing columns encoded in the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv.
This patch makes it possible to parse columns from the file path, as Spark does with Partition Discovery.
The patch parses the partition columns in BrokerScanNode.java and saves the parsing result for each file path as a property of TBrokerRangeDesc, so the broker reader on the BE can read the value of the specified partition column (a sketch follows).
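A minimal sketch of the k1=1 style path parsing, in the spirit of Spark's Partition Discovery; the class and method names are hypothetical:

```
import java.util.HashMap;
import java.util.Map;

// Minimal self-contained sketch; all names are hypothetical. For simplicity,
// every "name=value" path segment is treated as a partition column (a real
// implementation would skip the file name segment).
class PathPartitionParser {
    // parsePartitionColumns("/path/to/dir/k1=1/xxx.csv") -> {k1=1}
    static Map<String, String> parsePartitionColumns(String filePath) {
        Map<String, String> columns = new HashMap<>();
        for (String segment : filePath.split("/")) {
            int eq = segment.indexOf('=');
            if (eq > 0) {
                columns.put(segment.substring(0, eq), segment.substring(eq + 1));
            }
        }
        return columns;
    }
}
```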
If strict mode is true and at least one row is filtered, the insert operation fails and a URL is returned for retrieving the error rows:
```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```
If all rows are good, the insert returns OK with the number of affected rows:
```
Query OK, 1 row affected (0.26 sec)
```
If strict mode is false and at least one row is good, the insert operation returns OK with the affected rows and warnings. If there are error rows, a label is also returned:
```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```