When a bad replica is recovered on the BE, the report process
should reset the replica's bad status on the FE to false; otherwise the
replica can never be recovered.
1. Calculate the cumulative point when a tablet is loaded for the first time.
2. Simplify the rowset-picking logic upon a delete predicate.
3. Save the meta and modify the rowsets only once after cumulative compaction.
The path trie cannot distinguish different param keys that share the same prefix path.
So the prefix of the table info APIs, which are used by spark-doris-connector, has been changed to /api/external.
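The conflict can be illustrated with a toy path trie. This is only a sketch, not the FE implementation: the class, the route templates, and the param names (`db`, `label`, `catalog`, `table`) are hypothetical. It assumes each trie node keeps a single wildcard child, so the first param name registered at a position wins.

```python
def _new_node():
    return {"children": {}, "wildcard": None, "handler": None}


class PathTrie:
    """Toy path trie: every node keeps at most ONE wildcard child."""

    def __init__(self):
        self.root = _new_node()

    def insert(self, template, handler):
        node = self.root
        for seg in template.strip("/").split("/"):
            if seg.startswith("{") and seg.endswith("}"):
                if node["wildcard"] is None:
                    node["wildcard"] = {"name": seg[1:-1], "node": _new_node()}
                # A later route with a different param name at the same
                # position silently reuses the first name.
                node = node["wildcard"]["node"]
            else:
                node = node["children"].setdefault(seg, _new_node())
        node["handler"] = handler

    def retrieve(self, path):
        node, params = self.root, {}
        for seg in path.strip("/").split("/"):
            if seg in node["children"]:  # literal segments take priority
                node = node["children"][seg]
            elif node["wildcard"] is not None:
                params[node["wildcard"]["name"]] = seg
                node = node["wildcard"]["node"]
            else:
                return None, {}
        return node["handler"], params


trie = PathTrie()
trie.insert("/api/{db}/{label}/_state", "state_handler")
trie.insert("/api/{catalog}/{table}/_schema", "schema_handler")
# The schema handler is found, but its params carry the wrong keys.
clash = trie.retrieve("/api/c1/t1/_schema")

# A distinct literal prefix gives the route its own wildcard nodes.
trie.insert("/api/external/{catalog}/{table}/_schema", "external_schema_handler")
fixed = trie.retrieve("/api/external/c1/t1/_schema")
```

With the shared prefix, the handler that expects `catalog` and `table` receives `db` and `label` instead; under the separate literal prefix the expected keys come back.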
BetaRowsetWriter is used to write rowset in V2 segment format.
This PR contains several interface changes:
1. Rowset.make_snapshot() is renamed to `link_files_to` because hard links are also useful in copy task, linked schema change, etc
2. Rowset.copy_files_to_path() is renamed to `copy_files_to` to be consistent with other names
3. RowsetWriter.mem_pool() is removed because not all rowset writers use MemPool
4. RowsetWriter.garbage_collection() is removed because it's not used by clients
5. SegmentGroup's make_snapshot() is removed because link_segments_to_path() provides similar functionality
2 cases:
Sometimes a replica with a missing version cannot be repaired, which may cause queries to fail
with error: failed to initialize storage reader. tablet=xxx, res=-214
Cancelling a rollup job while there are load jobs on that table may cause the load jobs to fail.
We should ignore the "table not found" exception when committing the txn.
* Broker load supports functions
This commit adds support for column functions in broker load.
The grammar of LoadStmt has not been changed.
Example:
columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)
The old functions, such as default_value and strftime, remain compatible.
After this commit, there is no difference in column functions between stream load and broker load, except for the old functions.
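The semantics of the example mapping can be sketched outside of Doris. The helper below is hypothetical and only mimics the idea: source fields are bound to temporary names, and each target column is computed from an expression over them. Parsing every field as an integer is an assumption made to keep the sketch short.

```python
def load_row(line, source_columns, set_exprs, separator=","):
    """Bind the fields of one input line to temporary column names,
    then evaluate each SET expression to produce the target columns."""
    fields = line.split(separator)
    # Assumption: every source field is an integer.
    row = dict(zip(source_columns, (int(f) for f in fields)))
    return {target: expr(row) for target, expr in set_exprs.items()}


# Mirrors: columns terminated by ',' (tmp_c1, tmp_c2) set (c1=tmp_c1+tmp_c2)
result = load_row(
    "1,2",
    ["tmp_c1", "tmp_c2"],
    {"c1": lambda r: r["tmp_c1"] + r["tmp_c2"]},
)
```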
The bug is described in issue #1580. This patch fixes 2 cases of cluster balance.
After a new replica has been added, its version may not have caught up with
the visible version, so it may be treated as a stale and redundant replica, which
will be deleted in the next tablet checking round.
I add a mark named needFurtherRepair to the newly added replica, set only when that replica's version has not caught up with the visible version. Such a replica will receive a further repair in the next tablet checking round instead of being deleted.
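A minimal sketch of how such a mark could steer the next checking round. The class, function names, and action strings are hypothetical, and the real scheduler considers many more conditions than version lag.

```python
from dataclasses import dataclass


@dataclass
class Replica:
    version: int
    need_further_repair: bool = False


def on_replica_added(replica, visible_version):
    # Mark the replica only when it lags behind the visible version.
    replica.need_further_repair = replica.version < visible_version


def next_round_action(replica, visible_version):
    """Decide what the next tablet checking round does with this replica."""
    if replica.need_further_repair:
        replica.need_further_repair = False  # the mark is consumed once
        return "REPAIR"
    if replica.version < visible_version:
        return "DELETE_AS_STALE"  # behavior without the mark
    return "KEEP"


lagging = Replica(version=5)
on_replica_added(lagging, visible_version=8)
action = next_round_action(lagging, visible_version=8)
```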
When redundant replicas are deleted, there may be load jobs running on them. Deleting these replicas may cause the load jobs to fail.
Before deleting a redundant replica, I first mark the next txn id on that replica and set the replica's
state to CLONE. The CLONE state ensures that no more load jobs will be sent to that replica, and we
wait for all load jobs before the marked txn id to finish. After that, the replica can be deleted safely.
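The two-phase deletion described above can be sketched as follows. All names are hypothetical; the sketch assumes the checker is called once per tablet checking round with the current next txn id and the set of txn ids still running.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Replica:
    replica_id: int
    state: str = "NORMAL"
    watermark_txn_id: Optional[int] = None


def can_delete_redundant(replica, next_txn_id, running_txn_ids):
    """Return True only when the replica can be deleted safely."""
    if replica.watermark_txn_id is None:
        # Phase 1: record the watermark and stop new loads from using it.
        replica.watermark_txn_id = next_txn_id
        replica.state = "CLONE"
        return False
    # Phase 2: wait until every txn started before the watermark is done.
    if any(txn < replica.watermark_txn_id for txn in running_txn_ids):
        return False
    return True


r = Replica(replica_id=1)
round1 = can_delete_redundant(r, next_txn_id=100, running_txn_ids={98, 99})
round2 = can_delete_redundant(r, next_txn_id=105, running_txn_ids={99, 101})
round3 = can_delete_redundant(r, next_txn_id=110, running_txn_ids={101, 103})
```

In this sketch the first round never deletes, even when no txn is running; that keeps the logic simple at the cost of one extra checking round.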
We create a new segment format for BetaRowset. The new format merges
the data file and the index file into one file. We also create a new format
for the short key index. In the original code, the index is stored in a format like
RowCursor, which is not efficient to compare. Now we encode multiple
columns into one binary string, and we ensure that this binary sorts in the
same order as the key columns.
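The order-preserving idea can be shown with a small sketch. It is not the encoding used by the new segment format: it assumes every key column is a signed 32-bit integer, and it ignores nulls and variable-length types. Each value has its sign bit flipped and is written big-endian, so a plain byte-wise comparison of the encoded keys agrees with a column-by-column comparison.

```python
import struct


def encode_int32(value):
    """Encode a signed 32-bit int so that unsigned byte order equals
    signed numeric order: shift by 2**31 (flips the sign bit), big-endian."""
    return struct.pack(">I", (value + (1 << 31)) & 0xFFFFFFFF)


def encode_key(columns):
    """Concatenate the fixed-width encodings of all key columns."""
    return b"".join(encode_int32(c) for c in columns)


keys = [(1, -5), (1, 2), (-3, 100), (0, 0), (-3, -100)]
by_columns = sorted(keys)                 # tuple (column-wise) order
by_bytes = sorted(keys, key=encode_key)   # byte-wise order of encodings
```

Because the two orderings agree, an index lookup can compare raw bytes without decoding each column.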
When creating a table with the OLAP engine, users can specify multiple partition columns.
e.g.:
PARTITION BY RANGE(`date`, `id`)
(
PARTITION `p201701_1000` VALUES LESS THAN ("2017-02-01", "1000"),
PARTITION `p201702_2000` VALUES LESS THAN ("2017-03-01", "2000"),
PARTITION `p201703_all` VALUES LESS THAN ("2017-04-01")
)
Note that load by Hadoop cluster does not support tables with multiple partition columns.
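How a row is routed under the example above can be sketched with tuple comparison. This is an illustration only: it assumes partition bounds are compared column by column, and that an omitted trailing value (as in `p201703_all`) is treated as the minimum value. Dates are kept as ISO strings, which compare correctly as text.

```python
import bisect

MIN_VALUE = float("-inf")  # assumption: an omitted trailing column means MIN

# Exclusive upper bounds, in the order of the PARTITION clauses above.
PARTITIONS = [
    ("p201701_1000", ("2017-02-01", 1000)),
    ("p201702_2000", ("2017-03-01", 2000)),
    ("p201703_all", ("2017-04-01", MIN_VALUE)),
]
UPPER_BOUNDS = [bound for _, bound in PARTITIONS]


def find_partition(date, row_id):
    """Return the first partition whose upper bound is strictly greater
    than (date, row_id), or None when the row fits no partition."""
    idx = bisect.bisect_right(UPPER_BOUNDS, (date, row_id))
    return PARTITIONS[idx][0] if idx < len(PARTITIONS) else None
```

Under these assumptions, `("2017-01-15", 5000)` lands in `p201701_1000` because the first column alone decides the comparison, while `("2017-02-01", 1000)` lands in `p201702_2000` because the first columns tie and the second is not less than 1000.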
A TabletQuorumFailedException will be thrown in commitTxn when the number of successful replicas of a tablet is less than the quorum replica number.
Hadoop load does not handle this exception because the push task will retry later.
The streaming broker, insert, stream, and mini loads catch this exception and then abort the txn.
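A rough sketch of this control flow, written in Python rather than the FE's Java. The names and return strings are hypothetical, and the quorum is assumed to be a simple majority (`replication_num // 2 + 1`), which the text above does not state. Hadoop load is modeled as leaving the txn pending for the push task to retry.

```python
class TabletQuorumFailedError(Exception):
    """Stand-in for TabletQuorumFailedException."""


def quorum_of(replication_num):
    # Assumption: quorum means a simple majority of the replicas.
    return replication_num // 2 + 1


def commit_txn(tablets):
    """tablets: iterable of (tablet_id, success_replica_num, replication_num)."""
    for tablet_id, success_num, replication_num in tablets:
        if success_num < quorum_of(replication_num):
            raise TabletQuorumFailedError(f"tablet={tablet_id}")
    return "COMMITTED"


def run_load(load_type, tablets):
    try:
        return commit_txn(tablets)
    except TabletQuorumFailedError:
        if load_type == "hadoop":
            return "PENDING_RETRY"  # the push task will retry later
        return "ABORTED"            # other load types abort the txn
```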
Doris supports deploying multiple BEs on one host. So when allocating BEs for the
replicas of a tablet, we should select different hosts. But there is a bug in the tablet
scheduler: the same host may be selected more than once for one tablet. This patch fixes this problem.
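The intended constraint can be sketched as a selection that admits at most one BE per host. The function and its input shape are hypothetical; the real scheduler also weighs load, disk usage, and other factors that are ignored here.

```python
def choose_backends(backends, replica_num):
    """Pick replica_num backends located on pairwise different hosts.

    backends: list of (backend_id, host) in preference order.
    """
    chosen, used_hosts = [], set()
    for backend_id, host in backends:
        if host in used_hosts:
            continue  # another replica of this tablet already lives here
        chosen.append(backend_id)
        used_hosts.add(host)
        if len(chosen) == replica_num:
            return chosen
    raise ValueError("not enough distinct hosts for the requested replicas")


backends = [(10001, "host-a"), (10002, "host-a"), (10003, "host-b"), (10004, "host-c")]
picked = choose_backends(backends, replica_num=3)
```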
There are some places related to this problem:
1. Create Table
There is no bug in Create Table process.
2. Tablet Scheduler
Fixed when selecting BE for REPLICA_MISSING and REPLICA_RELOCATING.
Fixed when balancing the tablet.
3. Colocate Table Balancer
Fixed when selecting BE for repairing colocate backend sequence.
Not fixed in colocate group balance; this is left to colocate repairing.
4. Tablet report
Tablet report may add a replica to the catalog. I did not check the host here;
the Tablet Scheduler will fix it.