# Proposed changes
Implement predicate pushdown in `OrcReader` by converting doris `ColumnValueRange` to orc `SearchArgument`.
## Remaining problems
1. Orc support `not in`, which may have effect on bloom filter. However, doris `ScanNode` has not push down `not in` to file scanner.
2. Orc support `is null`, and row range has `hasNull` identifier. However, `_contain_null` in `ColumnValueRange` is ambiguous. `_contain_null = true` only means that the value can be nullable, not equal to null.
3. `DateTimeV2` has lost microsecond precision in `ColumnValueRange`, which may cause filtering error when a min-max value equals to the predicate value.
4. `DateTimeV1` is not accurate enough, and only saved to seconds.
5. Orc support the predicate pushdown of `float&double` type, but doris has not push down `float&double` type for precision reason.
# Proposed changes
This PR fixed lots of issues when building from source on macOS with Apple M1 chip.
## ATTENTION
The job for supporting macOS with Apple M1 chip is too big and there are lots of unresolved issues during runtime:
1. Some errors with memory tracker occur when BE (RELEASE) starts.
2. Some UT cases fail.
...
Temporarily, the following changes are made on macOS to start BE successfully.
1. Disable memory tracker.
2. Use tcmalloc instead of jemalloc.
This PR kicks off the job. Guys who are interested in this job can continue to fix these runtime issues.
## Use case
```shell
./build.sh -j 8 --be --clean
cd output/be/bin
ulimit -n 60000
./start_be.sh --daemon
```
## Something else
It takes around _**10+**_ minutes to build BE (with prebuilt third-parties) on macOS with M1 chip. We will improve the development experience on macOS greatly when we finish the adaptation job.
1. Refactor the file reader creation in FileFactory, for simplicity.
Previously, FileFactory had too many `create_file_reader` interfaces.
Now unified into two categories: the interface used by the previous BrokerScanNode,
and the interface used by the new FileScanNode.
And separate the creation methods of readers that read `StreamLoadPipe` and other readers that read files.
2. Modify the StreamLoadPlanner on FE side to support using ExternalFileScanNode
3. Now for generic reader, the file reader will be created inside the reader, not passed from the outside.
4. Add some test cases for csv stream load, the behavior is same as the old broker scanner.
Allocator of rapidjson does not release memory, this fix use allocator with local buffer and call Clear to release memory allocated beyond local buffer.
1. Fix issue #13115
2. Modify the method of `get_next_block` or `GenericReader`, to return "read_rows" explicitly.
Some columns in block may not be filled in reader, if the first column is not filled, use `block->rows()` can not return real row numbers.
3. Add more checks for broker load test cases.
This change serves the following purposes:
1. use ScanPredicate instead of TCondition for external table, it can reuse old code branch.
2. simplify and delete some useless old code
3. use ColumnValueRange to save predicate
* [fix](parquet) fix write error data as parquet format.
Fix incorrect data conversion when writing tiny int and small int data
to parquet files in non-vectorized engine.
1. HttpServer threads allocate bytebuffer and put them into streamload pipe, but scanner thread release them with query tracker.
2. We can assume brpc allocate memory in doris thread.
Above problems leads to wrong result of memtracker.
Currently, Doris has a variety of readers for different file formats,
such as parquet reader, orc reader, csv reader, json reader and so on.
The interfaces of these readers are not unified, which makes it impossible to call them through a unified method.
In this PR, I added a `GenericReader` interface class, and other Readers will implement this interface class
to use the `get_next_block()` method.
This PR currently only modifies `arrow_reader` and `parquet reader`.
Other readers will be modified one by one in subsequent PRs.
Related pr:
https://github.com/apache/doris/pull/11582https://github.com/apache/doris/pull/12048
Using new file scan node and new scheduling framework to do the load job, replace the old broker scan node.
The load part (Be part) is work in progress. Query part (Fe) has been tested using tpch benchmark.
Please review only the FE code in this pr, BE code has been disabled by enable_new_load_scan_node configuration. Will send another pr soon to fix be side code.
Each NodeChannel has its own queue, with size up to 1/20 exec_mem_limit.
User will crash into OOM if set exec_mem_limit high. This commit uses
fixed number to control the total max memory used by NodeChannels.
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
In some cases, when the user executes the query for the first time, an error of the exceeded mem limit will be reported, and the query will be successful only after the second execution.
This is because when the query is executed for the first time, the memory consumed by adding the page cache and other caches is recorded in the query mem tracker, hoping to unify the behavior of multiple queries.
A temporary solution, remove the hook of scanner thread, test clickbench q13
Before removing the scanner thread hook
Enable page cache: 3G for the first query, 3G for the tracker; 900M for the second query, 900M for the tracker.
Turn off page cache: 1.9G for the first query, 1.9G for the tracker; 900M for the second query, 900M for the tracker
After removing the scanner thread hook and fix MemTrackerLimiter::cache_consume_local bug
Enable page cache: 2916M for the first query, 1147M for the tracker; 979M for the second query, 1144M for the tracker
Turn off page cache: 1809M for the first query, 1147M for the tracker; 975M for the second query, 1145M for the tracker
TODO, a better solution is to track storage-related memory separately, in the scanner thread. Otherwise, it is impossible to know where the process memory grows when querying.
We have added logical project before, but to actually finish the prune to reduce the data IO, we need to add related supports in translator and BE.
This PR:
- add projections on each ExecNode in BE
- translate PhysicalProject into projections on PlanNode in FE
- do column prune on ScanNode in FE
Co-authored-by: HappenLee <happenlee@hotmail.com>
Copy most of profiles from VOlapScanNode and VOlapScanner to NewOlapScanNode and NewOlapScanner.
Fix some blocking bug of new scan framework.
TODO:
Memtracker
Opentelemetry spen
The new framework is still disabled by default, so it will not effect other feature.
* table function node enhancement
* also avoid copy for non-vec table function node
* fix table function node output slots calculation while lateral view involves subquery
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
There are currently many types of ScanNodes in Doris. And most of the logic of these ScanNodes is the same, including:
Runtime filter
Predicate pushdown
Scanner generation and scheduling
So I intend to unify the common logic of all ScanNodes.
Different data sources only need to implement different Scanners for data access.
So that the future optimization for scan can be applied to the scan of all data sources,
while also reducing the code duplication.
This PR mainly adds 4 new class:
VScanner
All Scanners' parent class. The subclasses can inherit this class to implement specific data access methods.
VScanNode
The unified ScanNode, and is responsible for common logic including RuntimeFilter, predicate pushdown, Scanner generation and scheduling.
ScannerContext
ScannerContext is responsible for recording the execution status
of a group of Scanners corresponding to a ScanNode.
Including how many scanners are being scheduled, and maintaining
a producer-consumer blocks queue between scanners and scan nodes.
ScannerContext is also the scheduling unit of ScannerScheduler.
ScannerScheduler schedules a ScannerContext at a time,
and submits the Scanners to the scanner thread pool for data scanning.
ScannerScheduler
Unified responsible for all Scanner scheduling tasks
Test:
This work is still in progress and default is disabled.
I tested it with jmeter with 50 concurrency, but currently the scanner is just return without data.
The QPS can reach about 9000.
I can't compare it to origin implement because no data is read for now. I will test it when new olap scanner is ready.
Co-authored-by: morningman <morningman@apache.org>