When performing storage_medium_migration, the header may already have been dropped.
In this scenario, the returned header will be a null pointer, and saving data
through it will cause a core dump.
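A minimal sketch of the guard, with hypothetical names (loadHeader, migrate); the real BE code is C++, but the shape of the fix is the same: check the returned header for null before saving anything through it.

```java
public final class StorageMediumMigration {
    static final class OlapHeader { /* header fields omitted */ }

    // Hypothetical lookup; returns null here to simulate the case where
    // the header has already been dropped during migration.
    static OlapHeader loadHeader(long tabletId) {
        return null;
    }

    static boolean migrate(long tabletId) {
        OlapHeader header = loadHeader(tabletId);
        if (header == null) {
            // Header already dropped: abort instead of saving data
            // through a null pointer, which would crash the process.
            System.err.println("header of tablet " + tabletId + " is null, skip migration");
            return false;
        }
        // ... save data using the header ...
        return true;
    }

    public static void main(String[] args) {
        System.out.println(migrate(12345L)); // false: migration skipped safely
    }
}
```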
The previous timeout setting for a publish version task was messy.
It is now a configurable value, defaulting to 30 seconds.
And when the table is under rollup or schema change, this timeout is doubled (see the sketch below).
This is a best-effort optimization: with a short timeout, a replica's publish
version task is more likely to fail, and if a quorum of a tablet's replicas
fails to publish, the alter job will fail.
If the table is not under rollup or schema change, the failure of a replica's
publish version task has only a minor effect, because the tablet repair process
can repair the replica very soon. But the tablet repair process will not
repair rollup replicas.
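A minimal sketch of the timeout selection; the config name publish_version_timeout_second and the alter-job flag are assumptions, only the 30-second default and the doubling rule come from the text above.

```java
public final class PublishVersionTimeout {
    // Assumed config name; default 30 seconds as described above.
    static int publishVersionTimeoutSecond = 30;

    static long timeoutMs(boolean underRollupOrSchemaChange) {
        long timeout = publishVersionTimeoutSecond * 1000L;
        // Double the timeout while an alter job runs, so a quorum of
        // replicas is less likely to miss the publish window.
        return underRollupOrSchemaChange ? timeout * 2 : timeout;
    }

    public static void main(String[] args) {
        System.out.println(timeoutMs(false)); // 30000
        System.out.println(timeoutMs(true));  // 60000
    }
}
```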
* This commit introduces streaming mini load
The operation of streaming mini load is the same as before, and the user can still check the load via the frontend.
The difference is that streaming mini load finishes the task before replying to the REST API, while non-streaming mini load only registers a load.
* When upgrading Doris
Upgrading either the FE or the BE first is supported. After both FE and BE are upgraded, streaming mini load takes effect.
* For multi mini load
Multi mini load still uses the non-streaming mini load; its behavior is unchanged.
* Add an interface named isSupportedFunction
This function protects the correctness of a new feature that spans both BE and FE during upgrading.
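A minimal sketch of how such a gate could look; the source only says the interface isSupportedFunction exists, so the per-backend version field and the required-version check are assumptions.

```java
import java.util.List;

public final class FeatureGate {
    static final class Backend {
        final int version; // assumed per-BE version number
        Backend(int version) { this.version = version; }
    }

    // Enable a cross-FE/BE feature only when every alive backend
    // reports a version that supports it.
    static boolean isSupportedFunction(List<Backend> aliveBackends, int requiredVersion) {
        return aliveBackends.stream().allMatch(be -> be.version >= requiredVersion);
    }

    public static void main(String[] args) {
        List<Backend> backends = List.of(new Backend(2), new Backend(3));
        // Streaming mini load stays disabled until every BE is new enough.
        System.out.println(isSupportedFunction(backends, 3)); // false
    }
}
```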
* Add streaming job in LoadProc
* Add a config named desired_max_waiting_jobs
1. If the number of pending load jobs is greater than desired_max_waiting_jobs, the create load stmt will be rejected.
2. If the number of need_scheduler load jobs is greater than desired_max_waiting_jobs, the new routine load job will be rejected.
3. The desired max size is only an expected number, so the size of the queue may sometimes exceed it (see the sketch after this list).
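A minimal sketch of the two admission checks; the method names and the default value are assumptions, the checks themselves mirror points 1 and 2 above.

```java
public final class LoadAdmission {
    static int desiredMaxWaitingJobs = 100; // assumed default

    // Point 1: reject a create load stmt when too many jobs are pending.
    static void checkPendingLoadJobs(int pendingJobs) {
        if (pendingJobs > desiredMaxWaitingJobs) {
            throw new IllegalStateException("too many pending load jobs: " + pendingJobs);
        }
    }

    // Point 2: reject a new routine load job when too many jobs need scheduling.
    static void checkNeedSchedulerJobs(int needSchedulerJobs) {
        if (needSchedulerJobs > desiredMaxWaitingJobs) {
            throw new IllegalStateException("too many need_scheduler jobs: " + needSchedulerJobs);
        }
    }
}
```

Because the checks run only at submission time (point 3), the queue can still exceed desiredMaxWaitingJobs transiently; the limit is a soft target, not a hard cap.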
* Merge load manager and load jobs in jobs proc dir
Currently, historical alter jobs are kept for a while before being removed.
This time is configured by 'label_keep_max_second', which is also used for
load jobs.
But to avoid keeping too many historical load jobs in memory,
'label_keep_max_second' is usually set to a short time, causing alter jobs to
be removed very soon.
Add a new FE config 'history_job_keep_max_second' to configure the keep time of
alter jobs separately. The default is 7 days.
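A minimal sketch of the two expiration windows; the field names and the value of label_keep_max_second are assumptions, only the 7-day default for history_job_keep_max_second comes from the text.

```java
public final class JobExpiration {
    static long labelKeepMaxSecond = 3 * 24 * 3600L;      // assumed short window for load jobs
    static long historyJobKeepMaxSecond = 7 * 24 * 3600L; // 7 days, for historical alter jobs

    static boolean isLoadJobExpired(long finishTimeMs, long nowMs) {
        return (nowMs - finishTimeMs) / 1000 > labelKeepMaxSecond;
    }

    static boolean isAlterJobExpired(long finishTimeMs, long nowMs) {
        return (nowMs - finishTimeMs) / 1000 > historyJobKeepMaxSecond;
    }
}
```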
If there are only 3 backends and the replication num is 3, and one replica of a
tablet goes bad, there is no 4th backend for tablet repair. So we need to delete
the bad replica first to make room for a new one.
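A minimal sketch of that decision, with assumed names:

```java
import java.util.ArrayList;
import java.util.List;

public final class TabletRepair {
    static final class Replica {
        final long backendId;
        final boolean bad;
        Replica(long backendId, boolean bad) { this.backendId = backendId; this.bad = bad; }
    }

    static void repair(List<Replica> replicas, int backendNum, int replicationNum) {
        if (backendNum <= replicationNum) {
            // No spare backend: drop the bad replica first so its backend
            // can host the replacement clone.
            replicas.removeIf(r -> r.bad);
        }
        // ... schedule clone tasks for the missing replicas ...
    }

    public static void main(String[] args) {
        List<Replica> replicas = new ArrayList<>(List.of(
            new Replica(1, false), new Replica(2, false), new Replica(3, true)));
        repair(replicas, 3, 3);
        System.out.println(replicas.size()); // 2: room made for a new replica
    }
}
```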
In streaming load, one version may generate multiple SegmentGroups.
When creating a rollup, the previous code only checked whether the version
exists. Instead, every SegmentGroup should be checked independently.
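A minimal sketch of the corrected check, with assumed names (the real check lives in the BE, which is C++):

```java
import java.util.List;

public final class RollupCheck {
    static final class SegmentGroup {
        final boolean ready; // assumed per-group validity flag
        SegmentGroup(boolean ready) { this.ready = ready; }
    }

    // Old behavior: treat the version as usable if it merely exists.
    static boolean versionExists(List<SegmentGroup> groups) {
        return groups != null && !groups.isEmpty();
    }

    // New behavior: every SegmentGroup of the version must pass the check.
    static boolean allSegmentGroupsReady(List<SegmentGroup> groups) {
        return versionExists(groups) && groups.stream().allMatch(g -> g.ready);
    }
}
```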
This change adds a load property named strict_mode, which is used to reject incorrect data.
When it is set to false, incorrect data is loaded as NULL, just as before.
When it is set to true, incorrect data belonging to a column without a transform expr is filtered out.
strict_mode is supported in broker load v2 now; it will be supported in stream load later.
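A minimal sketch of the decision table, with assumed names:

```java
public final class StrictModeFilter {
    enum RowAction { SET_NULL, FILTER }

    // Decide what to do with a value that failed conversion for a column.
    static RowAction onConversionError(boolean strictMode, boolean columnHasExpr) {
        if (strictMode && !columnHasExpr) {
            return RowAction.FILTER;  // strict_mode=true: drop the incorrect row
        }
        return RowAction.SET_NULL;    // strict_mode=false: load as NULL, as before
    }

    public static void main(String[] args) {
        System.out.println(onConversionError(false, false)); // SET_NULL
        System.out.println(onConversionError(true, false));  // FILTER
    }
}
```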
For example, we start the process for the first time and its pid is 12345. Due to an accident, the process is killed but fe.pid still exists. Then we start the process a second time and its pid is 6789, yet fe.pid shows 67895, because file.write only overwrites the first four digits. This can easily happen when we use supervise. So I call file.setLength(0) first to delete the old data.
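A minimal sketch of the fix using java.io.RandomAccessFile; the path and method names are illustrative:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.charset.StandardCharsets;

public final class PidFile {
    static void writePid(String path, long pid) throws IOException {
        try (RandomAccessFile file = new RandomAccessFile(path, "rw")) {
            file.setLength(0); // truncate first, so stale digits cannot survive
            file.write(Long.toString(pid).getBytes(StandardCharsets.UTF_8));
        }
    }

    public static void main(String[] args) throws IOException {
        writePid("fe.pid", 12345); // first start
        writePid("fe.pid", 6789);  // second start: file now holds "6789", not "67895"
    }
}
```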