[improvement](scan) avoid too many scanners for file scan node (#25727)

In previous, when using file scan node(eq, querying hive table), the max number of scanner for each scan node will be the `doris_scanner_thread_pool_thread_num`(default is 48). And if the query parallelism is N, the total number of scanner would be 48 * N, which is too many. In this PR, I change the logic, the max number of scanner for each scan node will be the `doris_scanner_thread_pool_thread_num / query parallelism`. So that the total number of scanners will be up to `doris_scanner_thread_pool_thread_num`. Reduce the number of scanner can significantly reduce the memory usage of query.
2023-10-29 17:41:31 +08:00
parent 99b45e1938
commit e20cab64f4
35 changed files with 83 additions and 50 deletions
--- a/be/src/vec/exec/vmysql_scan_node.cpp
+++ b/be/src/vec/exec/vmysql_scan_node.cpp
@ -227,7 +227,8 @@ void VMysqlScanNode::debug_string(int indentation_level, std::stringstream* out)
    }
 }

-Status VMysqlScanNode::set_scan_ranges(const std::vector<TScanRangeParams>& scan_ranges) {
+Status VMysqlScanNode::set_scan_ranges(RuntimeState* state,
+                                       const std::vector<TScanRangeParams>& scan_ranges) {
    return Status::OK();
 }
-} // namespace doris::vectorized
+} // namespace doris::vectorized