ISSUES-1725: The result of a union statement whose child is an outer join statement is incorrect.
Example:
sql: (select k1 from empty) union all (select b.k1 k1 from left_table a left join empty b on a.k2 = b.k2);
context: the empty table has no data.
error result: 0
expect result: null
Reason:
The judgment of whether column k1, which belongs to the union tuple, is nullable is incorrect.
It cannot be determined from the slot attributes of the children when the slot is produced by an outer join.
Slot A itself is not nullable, but the outer join output that maps to slot A is nullable.
So the judgment needs to consider whether the slot comes from an outer join.
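The rule above can be sketched roughly as follows. This is an illustrative Python model, not the planner code: the `Slot` class and the `from_outer_join_nullable_side` flag are hypothetical names introduced only for this example.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass(frozen=True)
class Slot:
    name: str
    # Nullability taken from the column definition.
    declared_nullable: bool
    # Hypothetical flag: True when the slot is produced by the
    # null-supplying side of an outer join.
    from_outer_join_nullable_side: bool = False


def effective_nullable(slot: Slot) -> bool:
    # A slot can hold NULL if it is declared nullable OR if an outer
    # join may pad it with NULL.
    return slot.declared_nullable or slot.from_outer_join_nullable_side


def union_slot_nullable(child_slots: Iterable[Slot]) -> bool:
    # The union output slot is nullable if any child slot can be NULL.
    return any(effective_nullable(s) for s in child_slots)


# k1 from "empty" read directly, and k1 from "empty" behind a left join.
empty_k1 = Slot("k1", declared_nullable=False)
joined_k1 = Slot("k1", declared_nullable=False,
                 from_outer_join_nullable_side=True)
```

Looking only at `declared_nullable` would mark the union column as not nullable, which matches the wrong result in the example above.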
This commit checks all parsed columns, including those defined by hadoop functions and by other functions.
Otherwise, the load will throw the "Column has no default value" exception even though the column has been defined by a non-hadoop function.
The path hash of a replica in metadata should be set immediately after the replica is created.
And we should not depend on the path hash to find replicas, because the path hash may be set
with a delay.
This config is used to control the max concurrent task num per BE.
The cluster max concurrent task num = max_concurrent_task_num_per_be * number of BEs.
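As a quick worked example of the formula above (the values 5 and 3 are made up for illustration, and the function name is hypothetical):

```python
def cluster_max_concurrent_tasks(max_concurrent_task_num_per_be: int,
                                 num_be: int) -> int:
    # Cluster-wide limit = per-BE limit * number of BEs.
    if max_concurrent_task_num_per_be < 0 or num_be < 0:
        raise ValueError("both values must be non-negative")
    return max_concurrent_task_num_per_be * num_be


# Assumed values: 5 tasks per BE, 3 BEs in the cluster.
limit = cluster_max_concurrent_tasks(5, 3)
```

With these assumed values the cluster-wide limit is 15, so adding or removing a BE changes the limit by 5.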
We should also reduce the amount of partition info in the BrokerScanNode param if the user has already
set target partitions to load, instead of adding all partitions' info.
Otherwise, the RPC packet will be too large.
Currently, we do not support parsing encoded/compressed columns in the file path, e.g. extracting column k1 from the file path /path/to/dir/k1=1/xxx.csv
This patch makes it possible to parse columns from the file path, as Spark does (Partition Discovery).
This patch parses partition columns in BrokerScanNode.java and saves the parsing result of each file path as a property of TBrokerRangeDesc, so that the broker reader of BE can read the value of the specified partition column.
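A minimal sketch of this kind of path parsing, assuming `key=value` directory segments as in the example path above. The function name and the error behavior are assumptions for illustration, not the logic in BrokerScanNode.java.

```python
from typing import Dict, List


def parse_columns_from_path(file_path: str,
                            columns: List[str]) -> Dict[str, str]:
    found: Dict[str, str] = {}
    # Drop the last segment, which is the file name, and inspect only
    # the directory segments.
    for segment in file_path.split("/")[:-1]:
        key, sep, value = segment.partition("=")
        # Only "key=value" segments whose key was requested count.
        if sep and key in columns:
            found[key] = value
    missing = [c for c in columns if c not in found]
    if missing:
        raise ValueError(f"columns not found in path: {missing}")
    return found


values = parse_columns_from_path("/path/to/dir/k1=1/xxx.csv", ["k1"])
```

Here `values` is `{"k1": "1"}`. The value stays a string in this sketch; converting it to the column type would happen later, when the row is built.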
If strict mode is true and at least one row is filtered, the insert operation fails and a URL is given to get the error rows.
```
ERROR 1064 (HY000): all partitions have no load data. url: http://host:ip/api/_load_error_log?file=__shard_2/error_log_insert_stmt_e0a620e93dc54461-b89ec64768367d25_e0a620e93dc54461_b89ec64768367d25
```
If all rows are good, insert will return OK with affected rows:
```
Query OK, 1 row affected (0.26 sec)
```
If strict mode is false and at least one row is good, the insert operation returns OK with affected rows and warnings. If there are error rows, a label is also returned:
```
Query OK, 1 row affected, 1 warning (0.32 sec)
{'label':'7d66c457-658b-4a3e-bdcf-8beee872ef2c'}
```
When a replica is recovered from the bad state on BE, the report process
should change the bad status of the replica on FE to false; otherwise the replica
cannot be recovered.
The pathtrie cannot distinguish different param keys that share the same prefix path.
So the prefix of the table info APIs, which are used by spark-doris-connector, has been changed to /api/external.
2 cases:
Sometimes a replica with a missing version cannot be repaired, which may cause queries to fail
with error: failed to initialize storage reader. tablet=xxx, res=-214
Cancelling the rollup job while there are load jobs on that table may cause the load jobs to fail.
We should ignore the "table not found" exception when committing the txn.
* Broker load supports functions
This commit supports column functions in broker load.
The grammar of LoadStmt has not been changed.
Example:
columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)
Also, old functions such as default_value and strftime remain compatible.
After this commit, there is no difference in column functions between stream load and broker load, except for the old functions.
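To illustrate what the example mapping `set (c1=tmp_c1+tmp_c2)` means for one input line, here is a small Python sketch. It only models the semantics; `load_row` and its parameters are hypothetical names, and the integer conversion is an assumption about the column types.

```python
from typing import Callable, Dict, List


def load_row(line: str,
             separator: str,
             source_columns: List[str],
             mappings: Dict[str, Callable[[Dict[str, str]], object]]
             ) -> Dict[str, object]:
    # Split the line and bind each field to its temporary column name.
    fields = line.split(separator)
    if len(fields) != len(source_columns):
        raise ValueError("field count does not match column list")
    tmp = dict(zip(source_columns, fields))
    # Evaluate every "target = expression" mapping over the tmp columns.
    return {target: expr(tmp) for target, expr in mappings.items()}


# columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)
row = load_row(
    "1,2",
    ",",
    ["tmp_c1", "tmp_c2"],
    {"c1": lambda r: int(r["tmp_c1"]) + int(r["tmp_c2"])},
)
```

For the input line `1,2` the resulting row is `{"c1": 3}`; the temporary columns are used only to compute the target column and are not stored.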
The bug is described in issue #1580. This patch fixes 2 cases of cluster balance:
After the new replica is added, its version may not have caught up with
the visible version, so the new replica may be treated as a stale and redundant replica, which
will be deleted at the next tablet checking round.
I add a mark named needFurtherRepair to the newly added replica, and set it only when that replica's version has not caught up with the visible version. Such a replica will receive a further repair at the next tablet checking round, instead of being deleted.
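The first case can be sketched like this. It is a simplified Python model under stated assumptions: versions are plain integers, and the action names "healthy", "repair" and "delete" are invented for this example.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int
    need_further_repair: bool = False


def on_replica_added(replica: Replica, visible_version: int) -> None:
    # Mark the new replica only if it has not caught up yet.
    replica.need_further_repair = replica.version < visible_version


def next_check_action(replica: Replica, visible_version: int) -> str:
    if replica.version >= visible_version:
        return "healthy"
    if replica.need_further_repair:
        # Repair once more instead of treating it as redundant.
        return "repair"
    # Behavior before the fix: a lagging replica looks stale.
    return "delete"


new_replica = Replica(version=8)
on_replica_added(new_replica, visible_version=10)
action = next_check_action(new_replica, visible_version=10)
```

With the mark set, the lagging replica gets "repair" at the next check; without it, the same replica would fall through to "delete".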
When deleting redundant replicas, there may be some load jobs running on them. Deleting these replicas may cause the load jobs to fail.
Before deleting a redundant replica, I first record the next txn id on that replica, and set the replica's
state to CLONE. The CLONE state ensures that no more load jobs will be placed on that replica, and we
wait for all load jobs before the recorded txn id to finish. After that, the replica can be deleted safely.
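The second case, sketched in Python under the assumption that txn ids increase monotonically. The field name `watermark_txn_id` and both functions are hypothetical and only model the idea described above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional, Set


class ReplicaState(Enum):
    NORMAL = "NORMAL"
    CLONE = "CLONE"


@dataclass
class Replica:
    state: ReplicaState = ReplicaState.NORMAL
    # Hypothetical field: the next txn id recorded at marking time.
    watermark_txn_id: Optional[int] = None


def mark_for_deletion(replica: Replica, next_txn_id: int) -> None:
    # CLONE keeps new load jobs away from this replica.
    replica.state = ReplicaState.CLONE
    replica.watermark_txn_id = next_txn_id


def can_delete(replica: Replica, running_txn_ids: Set[int]) -> bool:
    if replica.watermark_txn_id is None:
        return False
    # Safe only when no txn older than the watermark is still running.
    return all(t >= replica.watermark_txn_id for t in running_txn_ids)


r = Replica()
mark_for_deletion(r, next_txn_id=100)
```

While txn 98 is still running, `can_delete(r, {98, 101})` is False; once only txn 101 remains it becomes True, because txn 101 started after the replica entered CLONE state and never touched it.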