[Load]
When running a long load job, the following error may occur and cause the load to fail:
load channel manager add batch with unknown load id: xxx
One cause of this error is that Doris opens an unrelated channel during the load
process. This channel receives no data during the entire load, so it is released
after a fixed timeout.
After the load job completes, Doris tries to close all open channels. When it tries to
close this channel, it finds that the channel no longer exists and reports an error.
This CL passes the load job's timeout to the load channel, so that load channels
have the same timeout as the load job.
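The failure and the fix can be illustrated with a toy model. This is a sketch, not Doris code: `LoadChannelManager`, its methods, `close_idle_channel`, and all timeout values are hypothetical names and numbers chosen for illustration.

```python
class LoadChannelManager:
    """Toy model of a manager that releases channels after an idle timeout."""

    def __init__(self):
        # load_id -> (time of last activity, idle timeout in seconds)
        self._channels = {}

    def open_channel(self, load_id, now, timeout_s):
        self._channels[load_id] = (now, timeout_s)

    def release_expired(self, now):
        """Drop every channel that has been idle longer than its timeout."""
        expired = [
            load_id
            for load_id, (last_active, timeout_s) in self._channels.items()
            if now - last_active > timeout_s
        ]
        for load_id in expired:
            del self._channels[load_id]

    def close_channel(self, load_id):
        """Close a channel at the end of the job; fails if it was already released."""
        if load_id not in self._channels:
            raise KeyError(f"unknown load id: {load_id}")
        del self._channels[load_id]


def close_idle_channel(channel_timeout_s, job_duration_s):
    """Open a channel that never receives data, then close it when the job ends."""
    manager = LoadChannelManager()
    manager.open_channel("load-1", now=0, timeout_s=channel_timeout_s)
    manager.release_expired(now=job_duration_s)
    manager.close_channel("load-1")
```

In this model, `close_idle_channel(600, 3000)` raises because the idle channel was released long before the job ended, while `close_idle_channel(3600, 3000)` succeeds because the channel inherits a timeout as long as the job's.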
Use the same UUID as both the query ID and the load ID of a load execution plan.
Each load execution plan has a load ID and, being a plan, also has a query ID.
Using the same UUID for both makes the load process easier to trace.
Change the load ID when retrying a load execution plan.
When a load execution plan is retried, the load ID should be changed; otherwise BE cannot
distinguish the old load request from the new one.
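The two ID rules above can be sketched together. `LoadPlan` and its `retry` method are hypothetical stand-ins, not the actual FE classes; the only behavior taken from the text is that the query ID equals the load ID and that a retry gets a new ID.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class LoadPlan:
    """Hypothetical stand-in for a load execution plan."""

    load_id: uuid.UUID = field(default_factory=uuid.uuid4)

    @property
    def query_id(self):
        # The query ID is the same UUID as the load ID, so one ID traces the whole load.
        return self.load_id

    def retry(self):
        # A retry is a new plan with a fresh ID, so the receiver can tell attempts apart.
        return LoadPlan()


first = LoadPlan()
second = first.retry()
```

Here `first.query_id` and `first.load_id` are the same value, and `second.load_id` differs from `first.load_id`.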
Cancel the running loading task when cancelling the broker load.
When a user cancels a broker load, the running loading task should also be cancelled;
otherwise it may occupy the worker thread for a long time.
Remove the unnecessary query reports when executing a load execution plan.
Only the last query report is needed.
Add a new BE config tablet_writer_rpc_timeout_sec.
It is used for the RPCs of the tablet sink. The default is 600 seconds, which is long enough to flush
about 6GB of data. The long timeout reduces the chance of hitting the "fail to send batch" error when loading.
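As a rough check of those two figures, the sustained rate they imply can be computed directly. The rate is derived here from the 600-second and 6GB numbers and is not stated in the source; the helper name is made up for this example.

```python
def implied_throughput_mb_per_s(data_gb, timeout_s):
    """Average rate needed to move data_gb within timeout_s (1 GB = 1024 MB)."""
    return data_gb * 1024 / timeout_s


rate = implied_throughput_mb_per_s(6, 600)  # about 10 MB/s
```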
Use streaming_load_max_mb instead of mini_load_max_mb in BE config.
Add more logs to make a broker load easier to trace.
The operator wants to know when the job is scheduled into the PENDING
and LOADING states, and how long each of these sub-states takes to finish.
Also add 2 metrics on BE to monitor the memtable's flush time.
`memtable_flush_total` and `memtable_flush_duration_us`
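One way these two metrics might be combined is to derive an average flush time between two samples. This is a sketch under assumptions: it treats `memtable_flush_total` as a cumulative flush count and `memtable_flush_duration_us` as cumulative microseconds, and the sample values are invented.

```python
def avg_flush_ms(prev, curr):
    """Average memtable flush time in milliseconds between two metric samples.

    Each sample is a dict holding the two counter values. Returns None when
    no flush completed between the samples.
    """
    flushes = curr["memtable_flush_total"] - prev["memtable_flush_total"]
    if flushes <= 0:
        return None
    duration_us = curr["memtable_flush_duration_us"] - prev["memtable_flush_duration_us"]
    return duration_us / flushes / 1000.0


# Invented sample values for illustration only.
prev = {"memtable_flush_total": 100, "memtable_flush_duration_us": 50_000_000}
curr = {"memtable_flush_total": 110, "memtable_flush_duration_us": 58_000_000}
```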
Mini load now uses the stream load framework, but we should keep the
mini load return behavior and result JSON format the same as before.
So a PUBLISH_TIMEOUT error should be treated as OK in mini load.
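The compatibility rule amounts to a small status mapping. The function below is a hypothetical illustration that assumes statuses are plain strings; only the PUBLISH_TIMEOUT to OK mapping comes from the text above.

```python
def mini_load_status(stream_load_status):
    """Map a stream load result status to the status reported by mini load."""
    if stream_load_status == "PUBLISH_TIMEOUT":
        # Keep the old mini load behavior: a publish timeout is reported as success.
        return "OK"
    # Every other status is passed through unchanged.
    return stream_load_status
```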
Also add 2 counters for OlapTableSink profile:
SerializeBatchTime: time spent serializing all row batches.
WaitInFlightPacketTime: time spent waiting for the last sent packet.
NOTE: This patch modifies the data on all Backends,
which makes restarting BE take a very long time.
To avoid disrupting your production environment,
upgrade the Backends one by one.
1. Refactor BE to clarify the structure of the code.
2. Use a unique ID to identify a rowset.
Naming rowsets by tablet_id and version leads to
many conflicts among compaction, clone, and restore.
3. Extract a rowset interface to encapsulate rowsets
with different formats.
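Points 2 and 3 can be sketched as an abstract interface whose instances carry their own unique ID. All names here (`Rowset`, `FormatARowset`, `FormatBRowset`, `format_name`) are hypothetical, and using a UUID as the rowset ID is an assumption made for the example; the real BE code is C++ and may generate IDs differently.

```python
import uuid
from abc import ABC, abstractmethod


class Rowset(ABC):
    """Hypothetical rowset interface; subclasses hide the storage format."""

    def __init__(self, tablet_id, version):
        # Unique per rowset, independent of tablet_id and version.
        self.rowset_id = uuid.uuid4()
        self.tablet_id = tablet_id
        self.version = version

    @abstractmethod
    def format_name(self):
        """Return a label for the underlying storage format."""


class FormatARowset(Rowset):
    def format_name(self):
        return "format-a"


class FormatBRowset(Rowset):
    def format_name(self):
        return "format-b"


# Two rowsets for the same tablet and version, for example one produced by
# compaction and one by clone, no longer collide because their IDs differ.
from_compaction = FormatARowset(tablet_id=10001, version=(2, 5))
from_clone = FormatBRowset(tablet_id=10001, version=(2, 5))
```

Callers work against `Rowset` only, so a new format can be added as another subclass without touching code that identifies rowsets by `rowset_id`.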