Base compaction may currently choose a tablet that has missing versions.
After compaction, the tablet fails the version integrity check and the backend core dumps.
Ignore such tablets when selecting tablets for base compaction.
1. Add a needSchedulerTasksQueue in LoadManager: the RoutineLoadTaskScheduler will poll tasks from this queue and schedule them.
2. Add a frontend interface named rlTaskCommit: commit the txn, update the offset, and renew a task for the same partitions.
3. Add an extra property in the transaction state: in rlTaskCommit, the extra property looks like {"job_id": xxx, "progress": xxx}.
When FE initializes routine load job meta from logs, all txn states related to a routine load job will be used to initialize the job's progress.
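A minimal sketch of what this queue-driven scheduling might look like (class and method names here are assumptions for illustration, not the actual FE code):

```java
import java.util.concurrent.LinkedBlockingQueue;

// Illustrative sketch only: RoutineLoadTaskInfo and scheduleTask() are placeholders.
public class RoutineLoadTaskSchedulerSketch extends Thread {
    // Queue owned by the load manager; tasks that need scheduling are pushed here.
    private final LinkedBlockingQueue<RoutineLoadTaskInfo> needSchedulerTasksQueue;

    public RoutineLoadTaskSchedulerSketch(LinkedBlockingQueue<RoutineLoadTaskInfo> queue) {
        this.needSchedulerTasksQueue = queue;
    }

    @Override
    public void run() {
        while (!isInterrupted()) {
            try {
                // Block until a task needs scheduling, then hand it to a backend.
                RoutineLoadTaskInfo task = needSchedulerTasksQueue.take();
                scheduleTask(task);
            } catch (InterruptedException e) {
                return;
            }
        }
    }

    private void scheduleTask(RoutineLoadTaskInfo task) {
        // Pick an available BE and submit the task (omitted in this sketch).
    }

    // Placeholder type for the sketch.
    public static class RoutineLoadTaskInfo {
    }
}
```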
Add a TxnStateChangeListener interface for transactions
1. onCommitted, onAborted, and beforeAborted will be called for different types of txn
2. RoutineLoadJob will update the job progress and renew a task in the onCommitted callback
3. Add TxnStateChangeListener into TransactionState
4. Setting a transactionState to committed will call the onCommitted callback if the callback is not null
5. Setting a transactionState to aborted will call beforeAborted and onAborted
6. beforeAborted in RoutineLoadJob checks whether there is a related task when the TxnStatusChangeReason is TIMEOUT. If a related task exists, it prevents the abort by throwing a TransactionException.
7. Other abort reasons do not prevent the abort. onAborted will be called and the job state will be changed to paused.
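A minimal sketch of the listener contract described above; the signatures are assumptions, and the stub types stand in for the existing FE classes named in this change:

```java
// Sketch only: signatures are assumptions based on the description above.
// Stub types standing in for the existing FE classes:
class TransactionState {
}

enum TxnStatusChangeReason {
    TIMEOUT // only the value mentioned above
}

class TransactionException extends Exception {
    TransactionException(String msg) {
        super(msg);
    }
}

public interface TxnStateChangeListener {
    // Called after the transaction state is set to COMMITTED,
    // e.g. RoutineLoadJob updates its progress and renews a task here.
    void onCommitted(TransactionState txnState);

    // Called before the state changes to ABORTED; throwing prevents the abort.
    // RoutineLoadJob uses this to refuse a TIMEOUT abort while a related task exists.
    void beforeAborted(TransactionState txnState, TxnStatusChangeReason reason)
            throws TransactionException;

    // Called after the transaction state is set to ABORTED,
    // e.g. the routine load job changes its state to paused.
    void onAborted(TransactionState txnState, TxnStatusChangeReason reason);
}
```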
Change extra to TxnCommitAttachment in TLoadTxnCommitRequest
1. The KAFKA value of TTxnSourceType means that this is a routine load task commit, and the TRLTaskTxnCommitAttachment is the commit info of this task.
2. TRLTaskTxnCommitAttachment will be converted to RLTaskTxnCommitAttachment, which includes the progress of this task, the task id, numOfErrorData, etc.
Add param TxnCommitAttachment into commitTransaction
1. The TxnCommitAttachment will be updated in commitTransaction
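A rough sketch of the attachment that commitTransaction now carries; the field names and the signature shown in the trailing comment are assumptions for illustration:

```java
import java.util.Map;

// Sketch only: field names and the commitTransaction signature are assumptions.
abstract class TxnCommitAttachment {
}

class RLTaskTxnCommitAttachment extends TxnCommitAttachment {
    final long taskId;
    final long numOfErrorData;
    final Map<Integer, Long> progress; // kafka partition -> consumed offset

    RLTaskTxnCommitAttachment(long taskId, long numOfErrorData, Map<Integer, Long> progress) {
        this.taskId = taskId;
        this.numOfErrorData = numOfErrorData;
        this.progress = progress;
    }
}

// commitTransaction receives the attachment and records it on the TransactionState,
// so the routine load job can recover its progress from txn logs after an FE restart:
//   public void commitTransaction(long dbId, long transactionId,
//                                 TxnCommitAttachment txnCommitAttachment) { ... }
```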
When a heartbeat fails, we should clear the connections cached
in the client pool, otherwise we will get broken connections from the pool.
Since we don't have the REOPEN logic (which may cause ugly code style),
a broken connection may cause an rpc to block and fail.
So clearing them all and recreating them when needed is a simple way to
resolve this problem.
We only clear connections in the backend and broker pools.
There is no need to clear the heartbeat pool, because heartbeats are so frequent
that its broken connections are invalidated automatically.
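A hedged sketch of the idea: on a failed heartbeat, drop every cached connection to that node so later callers rebuild fresh ones. The pool interface below is an assumption, not the real ClientPool API:

```java
// Illustrative sketch only: the pool API is an assumption.
public class HeartbeatFailureHandler {
    interface ClientPool {
        void clearPool(String host, int port); // drop all cached connections to host:port
    }

    private final ClientPool backendPool;
    private final ClientPool brokerPool;

    public HeartbeatFailureHandler(ClientPool backendPool, ClientPool brokerPool) {
        this.backendPool = backendPool;
        this.brokerPool = brokerPool;
    }

    public void onHeartbeat(String host, int port, boolean ok) {
        if (!ok) {
            // A failed heartbeat means cached connections to this node may be broken.
            // Throw them all away; they will be recreated on demand.
            backendPool.clearPool(host, port);
            brokerPool.clearPool(host, port);
            // The heartbeat pool is left alone: heartbeats are frequent enough
            // that its broken connections get invalidated on their own.
        }
    }
}
```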
* Refactor heartbeat logic
Currently we only have Backend heartbeat. Without Frontend
or Broker heartbeat, we don't know the status of these nodes,
and thus can't do failover in some cases.
1. Add Frontend and Broker heartbeat.
Frontend heartbeat uses the BootstrapFinish http rest api.
Broker heartbeat uses the ping() rpc.
2. All heartbeats are managed in HeartbeatMgr.
3. Rename BrokerAddress to FsBroker.
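A small sketch of how a unified manager might fan out heartbeats; the node abstraction and thread pool size are assumptions, not the real HeartbeatMgr:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Sketch only: node abstraction and pool size are assumptions.
public class HeartbeatMgrSketch {
    public interface HeartbeatTarget {
        // FE: call the BootstrapFinish http rest api; Broker: call the ping() rpc;
        // BE: the existing backend heartbeat rpc.
        boolean sendHeartbeat();
    }

    private final ExecutorService executor = Executors.newFixedThreadPool(8);

    // One cycle: send a heartbeat to every frontend, backend and broker in
    // parallel, then mark each node alive or dead from the result so that
    // failover logic can react.
    public void runOneCycle(List<HeartbeatTarget> allNodes) {
        for (HeartbeatTarget node : allNodes) {
            executor.submit(() -> {
                boolean alive = node.sendHeartbeat();
                // update the node's status based on 'alive' (omitted)
            });
        }
    }
}
```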
* Support colocate join
Colocate join means two tables are distributed by the columns being joined,
so we can join them locally on each backend.
Colocate join requires no data movement and provides more concurrency.
* Support TRUNCATE TABLE stmt
Users can use the TRUNCATE TABLE stmt to empty a table
or partitions completely.
Unlike DELETE, it drops the tablets directly,
without any performance impact.
* Fix bug that a new partition should use a new ID
* Use equals() to compare Integer
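Background: `==` on boxed Integer values compares object identity, and the JVM only caches small values, so two equal IDs can still be different objects. A tiny self-contained illustration:

```java
public class IntegerCompare {
    public static void main(String[] args) {
        Integer a = 1000;
        Integer b = 1000;
        // == compares references; values outside the Integer cache (-128..127)
        // are usually distinct objects, so this typically prints false.
        System.out.println(a == b);
        // equals() compares the numeric value, which is what we want.
        System.out.println(a.equals(b)); // true
    }
}
```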
* Fix compile bug
* Fix bug on single range partition
* Check table's state again after creating partition
* Avoid 'No more data to read' error when handling stream load rpc
1. Catch Throwable in all stream load rpc handlers.
2. Avoid setting a null string as the error msg of the rpc result status.
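A minimal sketch of the pattern; the request/result types are simplified stand-ins, not the real thrift-generated classes:

```java
// Sketch only: simplified stand-in types, not the real thrift classes.
public class StreamLoadRpcSketch {
    static class Request {
    }

    static class Result {
        String status;
        String errorMsg;
    }

    Result handle(Request request) {
        Result result = new Result();
        try {
            result.status = "OK"; // the real stream load handling goes here
        } catch (Throwable t) {
            // Catch Throwable so the rpc always returns a response instead of the
            // client seeing a broken connection ("No more data to read").
            String msg = t.getMessage();
            // Never set a null string as the error msg of the result status.
            result.errorMsg = (msg == null) ? t.getClass().getSimpleName() : msg;
            result.status = "FAILED";
        }
        return result;
    }
}
```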
* Change setError_msgs to addToError_msgs
1. Only collect all error replicas if the publish task has timed out.
2. Add 2 metrics to monitor the success or failure of txns.
3. Change publish timeout to Config.load_straggler_wait_second
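For example, the two counters could simply be bumped where the txn finishes or aborts; the names below are made up for illustration:

```java
import java.util.concurrent.atomic.AtomicLong;

// Sketch only: counter names are illustrative, not the actual metric names.
public class TxnMetricsSketch {
    static final AtomicLong TXN_SUCCESS = new AtomicLong();
    static final AtomicLong TXN_FAILED = new AtomicLong();

    // Called when a txn is published and becomes visible successfully.
    static void onTxnSuccess() {
        TXN_SUCCESS.incrementAndGet();
    }

    // Called when a txn is aborted or fails to publish.
    static void onTxnFailed() {
        TXN_FAILED.incrementAndGet();
    }
}
```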
1. Use the data size saved in the tablet's header to calculate the disk used capacity.
2. Decrease the default interval of disk and tablet report, from 10 min to 1 min.
Step 1: updateBeIdTaskMaps removes unavailable BEs and adds newly alive BEs.
Step 2: Process timeout tasks. If a task has already been allocated to a BE but is not finished before DEFAULT_TASK_TIMEOUT, it will be discarded.
At the same time, the partitions belonging to the old task will be allocated to a new task. The new task, with its own signature, will be added to the needSchedulerRoutineLoadTask queue.
Step 3: Process all needSchedulerRoutineLoadTasks and allocate each task to a BE. The task will be executed by the BE.
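Putting the three steps together, one scheduling round could look roughly like this; the method names mirror the steps above, but the exact signatures and the timeout value are assumptions:

```java
// Sketch of one scheduling round; bodies are intentionally left as comments.
public class RoutineLoadSchedulerRoundSketch {
    static final long DEFAULT_TASK_TIMEOUT_MS = 60_000; // assumed value

    public void runOneRound() {
        updateBeIdTaskMaps();   // Step 1: remove unavailable BEs, add newly alive BEs
        processTimeoutTasks();  // Step 2: discard timed-out tasks, requeue their partitions as new tasks
        scheduleNeededTasks();  // Step 3: allocate each queued task to a BE for execution
    }

    private void updateBeIdTaskMaps() {
        // compare the current alive BE list with the task maps (omitted)
    }

    private void processTimeoutTasks() {
        // discard tasks older than DEFAULT_TASK_TIMEOUT_MS and create replacement tasks (omitted)
    }

    private void scheduleNeededTasks() {
        // pop tasks from the need-scheduler queue and send them to BEs (omitted)
    }
}
```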
* Fix bug of #307
There is a bug when using a symbolic link directory as the storage root path.
The problem is whether a path is canonical:
in DownloadAction, the check fails when comparing a canonical path with a non-canonical path.
So fix the bug by converting all paths to canonical form before comparison.
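A tiny illustration of the fix. DownloadAction's real check differs in detail; this only shows why both sides must be canonicalized when the storage root is a symlink:

```java
import java.io.File;
import java.io.IOException;

public class CanonicalPathCheck {
    // Returns true if 'requested' lies under 'storageRoot'. Both sides are
    // converted to canonical paths first, so a symbolic link root resolves to
    // the same prefix as the real path it points to.
    static boolean isUnderRoot(String storageRoot, String requested) throws IOException {
        String canonicalRoot = new File(storageRoot).getCanonicalPath();
        String canonicalRequested = new File(requested).getCanonicalPath();
        return canonicalRequested.startsWith(canonicalRoot);
    }
}
```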