MaxScannerThreadNum in the file scan operator is incorrect when pipelineX is turned on, which costs a lot of memory and causes performance degradation. This PR fixes it.
Cached blocks may be empty when VFileScanner returns NOT_FOUND. This feature was introduced by https://github.com/apache/doris/pull/15226. Move this function inside `VFileScanner`.
There are 2 potential reasons why cancelling a pipelineX query can time out:
1. If the fragment context is cancelled first, setting it ready to execute afterwards resets the cancel flag to false.
2. Deadlock.
`ScannerContext` keeps scheduling scanners even after it has been stopped, and it confuses `_is_finished` with `_should_stop`.
Only fix the concurrency bugs that occur when a scanner is stopped or finished, as reported in https://github.com/apache/doris/pull/28384
Use a weak ptr as a lock between the fragment execute thread and the scanner thread, to fix the core in the scanner's destructor when it accesses the ScanNode's profile.
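The weak-pointer idea can be sketched as follows. This is only an illustration of the general pattern, not the code from the patch: `NodeProfile`, `Scanner`, and `report` are hypothetical names, and the real scanner and ScanNode classes are more involved.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the state owned by the ScanNode (its profile).
struct NodeProfile {
    long rows_read = 0;
};

// Hypothetical scanner that holds only a weak reference to the profile.
class Scanner {
public:
    explicit Scanner(std::weak_ptr<NodeProfile> profile) : _profile(std::move(profile)) {}

    // Returns true if the profile was still alive and was updated.
    bool report(long rows) {
        // lock() returns a shared_ptr that keeps the profile alive for this
        // scope, or an empty pointer if the owner has already released it.
        if (auto p = _profile.lock()) {
            p->rows_read += rows;
            return true;
        }
        return false;  // owner is gone: skip the access instead of crashing
    }

private:
    std::weak_ptr<NodeProfile> _profile;
};
```

The same check can be made from a destructor, so a scanner that outlives its node skips the profile update rather than touching freed memory.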
VScanNode::get_next checks whether the ScanNode has reached its limit condition and, if so, sends eos to the TaskScheduler, which then tries to close the ScanNode.
However, the ScanNode must wait for all running scanners to finish, so even if it has reached the limit condition, it can't be closed immediately.
This PR tries to interrupt the running readers so that the ScanNode ends as soon as possible.
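One common way to make a reader interruptible is a shared stop flag that the read loop checks between batches. The sketch below assumes that approach; `StopFlag` and `read_batches` are hypothetical names, and the patch itself may interrupt readers differently.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>

// Hypothetical stop flag shared between the consumer and the reader.
struct StopFlag {
    std::atomic<bool> should_stop{false};
};

// Reads up to total_batches and returns how many batches were actually read.
// limit_batches simulates the consumer reaching its LIMIT after that many
// batches and raising the flag.
std::size_t read_batches(StopFlag& flag, std::size_t total_batches, std::size_t limit_batches) {
    std::size_t done = 0;
    while (done < total_batches) {
        // Check the flag before every batch so the reader stops promptly.
        if (flag.should_stop.load(std::memory_order_acquire)) {
            break;
        }
        ++done;
        if (done == limit_batches) {
            flag.should_stop.store(true, std::memory_order_release);
        }
    }
    return done;
}
```

With the check inside the loop, the reader stops after the current batch instead of draining the whole input before the node can close.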
* [fix] scanner hangs due to negative num_running_scanners
Before the patch, num_running_scanners was increased after submitting,
so it could be decreased before it was increased. Negative values could then
be seen by get_block_from_queue, and an expected submit did not happen.
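The ordering problem can be reproduced deterministically with a small model. Everything here is hypothetical (`Ctx`, `submit`, `scanner_task`): `submit` runs the task inline to stand in for a worker that finishes before the submitting thread continues.

```cpp
#include <atomic>
#include <cassert>
#include <functional>

struct Ctx {
    std::atomic<int> num_running_scanners{0};
    int min_seen = 0;  // lowest counter value observed by a task
};

// Hypothetical submit: runs the task inline, modelling the worst case where
// the worker finishes before the submitter executes its next statement.
inline bool submit(const std::function<void()>& task) {
    task();
    return true;
}

// A finishing scanner decrements the counter and records what it saw.
inline void scanner_task(Ctx& ctx) {
    int after = ctx.num_running_scanners.fetch_sub(1) - 1;
    if (after < ctx.min_seen) ctx.min_seen = after;
}

// Order before the patch: submit first, increment afterwards.
int schedule_increment_after(Ctx& ctx) {
    submit([&ctx] { scanner_task(ctx); });
    ctx.num_running_scanners.fetch_add(1);
    return ctx.min_seen;
}

// Fixed order: increment first, roll back only if the submit fails.
int schedule_increment_before(Ctx& ctx) {
    ctx.num_running_scanners.fetch_add(1);
    if (!submit([&ctx] { scanner_task(ctx); })) {
        ctx.num_running_scanners.fetch_sub(1);
    }
    return ctx.min_seen;
}
```

In this model the first variant lets the task observe -1, while the second never drops below 0.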
Co-authored-by: Mingyu Chen <morningman.cmy@gmail.com>
1. Rework the decoding logic for reading parquet. The parquet reader first reads the data according to the parquet physical type, and then performs a type conversion.
2. Support Hive alter table.
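The two-step decode in item 1 can be illustrated with a minimal sketch. It assumes a plain-encoded INT32 column (little-endian, 4 bytes per value) that is converted to a wider destination type; `decode_physical_int32` and `convert_column` are hypothetical helpers, not functions from the Doris parquet reader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Step 1: decode raw bytes according to the physical type (INT32 here).
// Trailing bytes that do not form a full value are ignored.
std::vector<int32_t> decode_physical_int32(const std::vector<uint8_t>& raw) {
    std::vector<int32_t> out;
    out.reserve(raw.size() / 4);
    for (std::size_t i = 0; i + 4 <= raw.size(); i += 4) {
        // Assemble the little-endian bytes independently of host byte order.
        uint32_t u = static_cast<uint32_t>(raw[i]) |
                     (static_cast<uint32_t>(raw[i + 1]) << 8) |
                     (static_cast<uint32_t>(raw[i + 2]) << 16) |
                     (static_cast<uint32_t>(raw[i + 3]) << 24);
        int32_t v;
        std::memcpy(&v, &u, sizeof(v));  // reinterpret the bits as signed
        out.push_back(v);
    }
    return out;
}

// Step 2: convert the physical values to the destination column type.
template <typename To>
std::vector<To> convert_column(const std::vector<int32_t>& in) {
    return std::vector<To>(in.begin(), in.end());
}
```

Keeping the conversion as a separate step means one physical decoder can serve several destination types, for example a column whose table type was widened after the file was written.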
When shared scan is enabled, all scanners are created by one instance. When the main instance reaches eos and quits, all of its state is released. But other instances may still get blocks from those scanners. So we must ensure that scanners do not depend on any state of the main instance after it quits.
* [improvement](scanner_schedule) reduce memory consumption of scanner
1. limit scanners by memory consumption rather than by number of blocks.
2. scheduler runs correctly instead of at least 1.
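A memory-based limit can be sketched as a byte budget divided by an estimated per-scanner cost. The function and all of its parameters are hypothetical, and returning 0 when the budget is exhausted is a choice of this sketch rather than something the commit message specifies.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: how many pending scanners fit into the remaining
// memory budget, given an estimated cost per scanner in bytes.
std::size_t scanners_to_schedule(std::size_t max_bytes, std::size_t used_bytes,
                                 std::size_t est_bytes_per_scanner, std::size_t pending) {
    if (used_bytes >= max_bytes) {
        return 0;  // budget exhausted: schedule nothing for now
    }
    if (est_bytes_per_scanner == 0) {
        return pending;  // no cost estimate: memory does not constrain the count
    }
    std::size_t by_memory = (max_bytes - used_bytes) / est_bytes_per_scanner;
    return by_memory < pending ? by_memory : pending;
}
```

For example, with a 100-byte budget, 40 bytes in use, and 20 bytes per scanner, at most 3 of 10 pending scanners would be scheduled.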