1. Fix issue #13115
2. Modify the method of `get_next_block` or `GenericReader`, to return "read_rows" explicitly.
Some columns in block may not be filled in reader, if the first column is not filled, use `block->rows()` can not return real row numbers.
3. Add more checks for broker load test cases.
For Mac M1, the default is python3 instead of python.
When FE compiles, there will be an error that python cannot be found.
This PR complements this part of the description.
Add more detail profile for ParquetReader:
ParquetColumnReadTime: the total time of reading parquet columns
ParquetDecodeDictTime: time to parse dictionary page
ParquetDecodeHeaderTime: time to parse page header
ParquetDecodeLevelTime: time to parse page's definition/repetition level
ParquetDecodeValueTime: time to decode page data into doris column
ParquetDecompressCount: counter of decompressing page data
ParquetDecompressTime: time to decompress page data
ParquetParseMetaTime: time to parse parquet meta data
Add partition info into LogicalAggregate and set it as original group expression list of aggregate when we do aggregate disassemble with distinct aggregate function.
optimize planner by:
1. reduce duplicated calculation on equals, getOutput, computeOutput eq.
2. getOnClauseUsedSlots: the two side of equalTo is centainly slot, so not need to use List.
This change serves the following purposes:
1. use ScanPredicate instead of TCondition for external table, it can reuse old code branch.
2. simplify and delete some useless old code
3. use ColumnValueRange to save predicate
the basic idea of star-schema support is:
1. fact_table JOIN dimension_table, if dimension table are filtered, the result can be regarded as applying a filter on fact table.
2. fact_table JOIN dimension_table, if the dimension table is not filtered, the number of join result tuple equals to the number of fact tuples.
3. dimension table JOIN fact table, the number of join result tuple is that of fact table or 2 times of dimension table.
If star-schema support is enabled:
1. nereids regard duplicate key(unique key/aggregation key) as primary key
2. nereids try to regard one join key as primary key and another join key as foreign key.
3. if nereids found that no join key is primary key, nereids fall back to normal estimation.
If a scan node has predicate, we can not limit the concurrency of scanner.
Because we don't know how much data need to be scan.
If we limit the concurrency, this will cause query to be very slow.
For exmple:
select * from tbl limit 1, the concurrency will be 1;
select * from tbl where k1=1 limit 1, the concurrency will not limit.