Fix some bugs when adding REWRITE rules to the Cascades optimizer
- All rules should be marked as non-rewrite rules when they are used in the Cascades optimizer.
- The promise of IMPLEMENT rules should be larger than that of the other rules, since we should do exploration first.
In the old planner, Predicate sets its type in analyzeImpl(). However, analyzeImpl() is only called on the old planner path, not on the Nereids path, so the type is invalid there.
Because every predicate has type bool, we now set the type in the constructor.
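The difference can be sketched as follows. This is a minimal Python illustration, not the actual Java code: the class names `Expr`, `LegacyPredicate` and `Predicate` and the `BOOLEAN` constant are stand-ins chosen for this example.

```python
BOOLEAN = "BOOLEAN"  # stand-in for the planner's boolean type


class Expr:
    def __init__(self):
        # The type stays invalid until some later step sets it.
        self.type = None


class LegacyPredicate(Expr):
    """Old behavior: the type is only set when analyze_impl() runs."""

    def analyze_impl(self):
        self.type = BOOLEAN


class Predicate(Expr):
    """New behavior: every predicate is boolean, so set it at construction."""

    def __init__(self):
        super().__init__()
        self.type = BOOLEAN
```

A planner path that never calls `analyze_impl()` sees `None` on `LegacyPredicate`, while `Predicate` is valid as soon as it is built.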
Currently, Nereids doesn't support aggregate functions with no slot reference in the query, since all the columns would be pruned, e.g.
SELECT COUNT(1) FROM t;
This PR keeps the column with the smallest amount of data when doing column pruning in this situation.
Note that this PR ONLY handles aggregate functions. Projections with no slot reference still need to be handled in the future.
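A rough sketch of the pruning rule, in Python rather than the planner's own code. The function name `prune_columns` and the use of a per-column byte width as the measure of "smallest amount of data" are assumptions made for this example.

```python
def prune_columns(table_columns, referenced):
    """Return the columns to keep after pruning.

    table_columns: dict mapping column name -> assumed width in bytes.
    referenced: set of column names the query actually references.
    """
    kept = [name for name in table_columns if name in referenced]
    if kept:
        return kept
    # No slot is referenced (e.g. SELECT COUNT(1) FROM t), so pruning would
    # leave nothing to scan. Keep the narrowest column instead.
    return [min(table_columns, key=table_columns.get)]
```

For a table with columns `id` (8 bytes), `name` (64 bytes) and `flag` (1 byte), a query that references no column keeps only `flag`.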
#11392 made the _input_block in each BetaRowsetReader shareable. However, for some types (e.g. arrays nested more than one level deep), the _column_vector_batches in RowBlockV2 can be nested, which means that one ColumnVectorBatch sits inside another ColumnVectorBatch. In this case, the data of the inner ColumnVectorBatch
may be corrupted, because the data of _input_block is copied shallowly to _output_block.
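The aliasing problem can be reproduced in a few lines. This is a Python analogy only: the `Batch` class is a hypothetical stand-in for a nested ColumnVectorBatch, and the deep copy is shown for contrast, not as a description of how the fix is implemented.

```python
import copy


class Batch:
    """Hypothetical stand-in for a batch that may hold an inner batch."""

    def __init__(self, values, inner=None):
        self.values = values
        self.inner = inner


input_block = Batch([0, 2], inner=Batch([1, 2]))

shallow = copy.copy(input_block)    # the inner batch is still shared
deep = copy.deepcopy(input_block)   # the inner batch is duplicated

# The shared input block is reused and its inner data is overwritten.
input_block.inner.values[0] = 99

# shallow.inner.values now starts with 99; deep.inner.values is unchanged.
```

The shallow copy silently picks up the overwrite because both blocks point at the same inner batch.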
Currently, the explain string prints every expression as a slot id, e.g. `<slot 1>`.
This PR prints the expression's name together with its slot id instead, e.g. `column_a[#1]`. In detail:
- print the qualified table name for OlapScanNode
- print the NamedExpression name with its SlotId instead of just the SlotId
- use "OlapScanNode" as OlapScanNode's node name instead of the table name
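The two output formats, side by side, as a small Python sketch. The helper names `old_slot_string` and `new_slot_string` are made up for this example; only the output formats come from the description above.

```python
def old_slot_string(slot_id):
    # Previous explain output: only the slot id is visible.
    return f"<slot {slot_id}>"


def new_slot_string(name, slot_id):
    # New explain output: expression name followed by its slot id.
    return f"{name}[#{slot_id}]"
```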
Currently, there are still lots of bugs related to ARRAY<NOT_NULL(T)>.
We have decided not to support ARRAY<NOT_NULL(T)> types in the first version, so all elements in an ARRAY are nullable.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Add a compile check for the document format
This catches document formatting issues that would otherwise fail the daily release build of the official website,
so that we can find and fix problems in time and avoid repeated modifications.
Since the compiler for the website now lives in the doris-website repo, we pull the code from that repo, delete the documentation inside it, and copy in the documentation from Doris master to run the compile check.
Strip debug info from the static libs of most third-party dependencies.
It can significantly reduce the size of the thirdparty libs: 3.4G -> 1.6G
The doris_be binary size is also reduced: 1.5G -> 868M (clang build)
After compression, the BE binary is only 195M, even with debug info!
1. `ExprContext` is deleted in `ParquetReader::close()` without having been closed,
so the `DCHECK` in `~ExprContext()` fails. The lifetime of `ExprContext` is managed by the scan node,
so we should not delete its pointer in `ParquetReader::close()`.
2. `RowGroupReader::next_batch` updates `_read_rows` in every iteration of the column loop,
and does not ensure that every column returns the same number of rows.
3. The skipped row ranges are stack variables, which are released when calling `ArrayColumnReader::read_column_data`, so we should copy them out.
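Item 2 can be sketched as below. This is a Python illustration under stated assumptions, not the C++ reader: `RowGroupReaderSketch` is a hypothetical class, and each column reader is modeled as a callable that takes a batch size and returns the number of rows it read.

```python
class RowGroupReaderSketch:
    """Hypothetical model of a row group reader with several column readers."""

    def __init__(self, column_readers):
        self.column_readers = column_readers
        self.read_rows = 0

    def next_batch(self, batch_size):
        # Read every column first and remember how many rows each returned.
        counts = [read(batch_size) for read in self.column_readers]
        if len(set(counts)) > 1:
            raise ValueError(f"columns returned different row counts: {counts}")
        rows = counts[0] if counts else 0
        # Update the counter once per batch, not once per column.
        self.read_rows += rows
        return rows
```

With two columns that each return 10 rows, `read_rows` advances by 10, not by 20, and a mismatch between columns is reported instead of being ignored.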
The `groupPlan()` pattern looks for a `GroupPlan` in the memo. Since the memo contains no `GroupPlan`, it always matches nothing.
When we want to write a pattern that matches any GROUP, we should use `group()`. However, the name `groupPlan` is very confusing and easy to misuse.
So this PR bans the `groupPlan()` pattern to avoid misuse.
When an ExecNode's projections are not empty, it uses the output row descriptor to initialize the block before doing the projection, but it should use the original row descriptor. This PR fixes it.
In VCollectIterator and VGenericIterator, use insert_range_from to copy rows
that are continuous in a block, to save CPU cost.
If the rows in rowsets and segments are non-overlapping, this improves
compaction throughput by 30%. If the rows overlap completely, such as when loading
the same file twice, the throughput is nearly the same as before.
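The idea of copying continuous rows in one call can be sketched like this. It is a Python illustration only: `contiguous_runs` and `copy_rows` are hypothetical helpers, and a list slice stands in for a single `insert_range_from` call.

```python
def contiguous_runs(row_indices):
    """Group ascending row indices into (start, length) runs."""
    runs = []
    for idx in row_indices:
        if runs and runs[-1][0] + runs[-1][1] == idx:
            # idx directly follows the last run, so extend that run.
            start, length = runs[-1]
            runs[-1] = (start, length + 1)
        else:
            runs.append((idx, 1))
    return runs


def copy_rows(src, row_indices):
    """Copy the selected rows using one range copy per continuous run."""
    dst = []
    for start, length in contiguous_runs(row_indices):
        dst.extend(src[start:start + length])
    return dst
```

When the selected rows are fully continuous there is a single run and a single range copy. When rows from different sources interleave one by one, every run has length 1 and nothing is saved, which matches the two cases described above.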
Co-authored-by: yixiutt <yixiu@selectdb.com>