In the earlier PR #11976, we changed DistributionSpecHash#equalsSatisfy but forgot to check that both sides have the same number of shuffle slots. When the required spec has more shuffle slots than the current one, an exception is thrown.
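The missing check can be sketched roughly as follows. `HashSpecSketch` and its string-based slots are hypothetical stand-ins, not the actual Doris classes, and returning `false` on a size mismatch is an assumption about the intended behavior.

```java
import java.util.List;

/** Hypothetical stand-in for a hash distribution spec; not the real DistributionSpecHash. */
final class HashSpecSketch {
    private final List<String> shuffleSlots;

    HashSpecSketch(List<String> shuffleSlots) {
        this.shuffleSlots = List.copyOf(shuffleSlots);
    }

    /** True only if both sides have the same number of slots and every pair matches. */
    boolean equalsSatisfy(HashSpecSketch required) {
        // Guard first: without it, the loop below would index past the end of
        // this.shuffleSlots whenever `required` has more slots.
        if (shuffleSlots.size() != required.shuffleSlots.size()) {
            return false; // assumption: a size mismatch means "not satisfied"
        }
        for (int i = 0; i < required.shuffleSlots.size(); i++) {
            if (!shuffleSlots.get(i).equals(required.shuffleSlots.get(i))) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        HashSpecSketch current = new HashSpecSketch(List.of("a"));
        HashSpecSketch required = new HashSpecSketch(List.of("a", "b"));
        System.out.println(current.equalsSatisfy(required));
    }
}
```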
## Fix five bugs:
1. Parquet dictionary data may be compressed, but `ColumnChunkReader` tries to parse dictionary data before creating the compression codec, causing unexpected data errors.
2. `FE` doesn't resolve array type
3. `ParquetFileHdfsScanner` doesn't fill partition values when the table is partitioned
4. `ParquetFileHdfsScanner` sets `_scanner_eof = true` when a scan range is empty, which ends the scanner early and results in data loss
5. Typographical error in `PageReader`
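Bug 4 can be illustrated with a small sketch. It is written in Java for consistency with the other examples here, although the scanner itself lives in BE. `ScanRangeSketch` and `readAll` are hypothetical names, and treating an empty range as "skip and continue" is an assumption about the fix.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical model of a scanner walking several scan ranges. */
final class ScanRangeSketch {
    /** Reads every row from every range; empty ranges are skipped, not treated as EOF. */
    static List<String> readAll(List<List<String>> scanRanges) {
        List<String> rows = new ArrayList<>();
        for (List<String> range : scanRanges) {
            if (range.isEmpty()) {
                // Signalling end-of-scanner here would drop all later ranges,
                // so move on to the next range instead.
                continue;
            }
            rows.addAll(range);
        }
        return rows;
    }

    public static void main(String[] args) {
        List<List<String>> ranges = List.of(List.of("a"), List.<String>of(), List.of("b"));
        System.out.println(readAll(ranges));
    }
}
```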
When the load channel is canceled, the memtracker does not subtract the memory released by the load channel. This will cause the memory usage counted by the memtracker of the load channel mgr to be larger than the actual memory usage.
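A minimal sketch of the accounting involved, written in Java for illustration even though the memtracker belongs to BE. `LoadChannelMemSketch`, `Tracker`, and `Channel` are hypothetical names; the point is only that cancel must release what the channel previously reported.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical model of a load channel reporting memory to a manager-level tracker. */
final class LoadChannelMemSketch {
    static final class Tracker {
        private final AtomicLong consumed = new AtomicLong();

        void consume(long bytes) { consumed.addAndGet(bytes); }
        void release(long bytes) { consumed.addAndGet(-bytes); }
        long consumption() { return consumed.get(); }
    }

    static final class Channel {
        private final Tracker mgrTracker;
        private long heldBytes;

        Channel(Tracker mgrTracker) { this.mgrTracker = mgrTracker; }

        void write(long bytes) {
            heldBytes += bytes;
            mgrTracker.consume(bytes);
        }

        void cancel() {
            // Subtract everything this channel still holds; skipping this step
            // leaves the manager's tracker above the real memory usage.
            mgrTracker.release(heldBytes);
            heldBytes = 0;
        }
    }

    public static void main(String[] args) {
        Tracker tracker = new Tracker();
        Channel channel = new Channel(tracker);
        channel.write(1024);
        channel.cancel();
        System.out.println(tracker.consumption());
    }
}
```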
Each NodeChannel has its own queue, whose size can reach up to 1/20 of exec_mem_limit.
Users will run into OOM if exec_mem_limit is set high. This commit uses
a fixed number to cap the total memory used by NodeChannels.
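One way to picture a fixed total cap is a budget shared by all channels. The class below is a hypothetical sketch, not the committed implementation; the cap value and the reserve/release protocol are assumptions made for illustration.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical shared budget that bounds memory across all NodeChannels. */
final class NodeChannelBudgetSketch {
    private final long maxTotalBytes;
    private final AtomicLong usedBytes = new AtomicLong();

    NodeChannelBudgetSketch(long maxTotalBytes) {
        this.maxTotalBytes = maxTotalBytes;
    }

    /** Reserves bytes if the shared cap allows it; the caller should wait or retry on false. */
    boolean tryReserve(long bytes) {
        while (true) {
            long used = usedBytes.get();
            if (used + bytes > maxTotalBytes) {
                return false;
            }
            // compareAndSet retries if another channel reserved concurrently.
            if (usedBytes.compareAndSet(used, used + bytes)) {
                return true;
            }
        }
    }

    void release(long bytes) {
        usedBytes.addAndGet(-bytes);
    }

    long used() {
        return usedBytes.get();
    }

    public static void main(String[] args) {
        NodeChannelBudgetSketch budget = new NodeChannelBudgetSketch(100);
        System.out.println(budget.tryReserve(60));
        System.out.println(budget.tryReserve(60));
    }
}
```

Because the cap is independent of exec_mem_limit, raising that limit no longer raises the memory the queues can hold in total.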
Signed-off-by: freemandealer <freeman.zhang1992@gmail.com>
Simplify the code of getting input/output slots from `Expression` or `Plan`.
**New interfaces added**
`Expression`:
- `getInputSlots`: Get all the input slots of the expression.
`Plan`:
- `getOutputSet`: Get the output slot set of the plan.
- `getInputSlots`: Get the input slot set of the plan.
**Changed interface**
`TreeNode`:
- `collect`: returns a `Set` as the result instead of a `List`.
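The idea behind collecting input slots into a set can be sketched as below. `InputSlotSketch.Expr` is a hypothetical, heavily simplified node type, not the Nereids `Expression` class; it only shows that a slot referenced twice appears once in the result.

```java
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

final class InputSlotSketch {
    /** Hypothetical expression node; nodes flagged as slots stand for column references. */
    static final class Expr {
        final String name;
        final boolean slot;
        final List<Expr> children;

        private Expr(String name, boolean slot, List<Expr> children) {
            this.name = name;
            this.slot = slot;
            this.children = children;
        }

        static Expr slot(String name) {
            return new Expr(name, true, List.of());
        }

        static Expr op(String name, Expr... children) {
            return new Expr(name, false, List.of(children));
        }

        /** Walks the tree and returns the names of all referenced slots, without duplicates. */
        Set<String> getInputSlots() {
            Set<String> slots = new LinkedHashSet<>();
            collectSlots(slots);
            return slots;
        }

        private void collectSlots(Set<String> out) {
            if (slot) {
                out.add(name);
            }
            for (Expr child : children) {
                child.collectSlots(out);
            }
        }
    }

    public static void main(String[] args) {
        Expr a = Expr.slot("a");
        // a + (b * a): slot "a" is referenced twice.
        Expr expr = Expr.op("+", a, Expr.op("*", Expr.slot("b"), a));
        System.out.println(expr.getInputSlots());
    }
}
```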
In the earlier PR #11812, we split the join condition into two parts: hash join conjuncts and other conditions. But we forgot to translate the other conditions into other conjuncts in the HashJoinNode of the legacy planner. As a result, a query that has other conditions on a join node returns a wrong result. For example:
SELECT * FROM lineorder INNER JOIN part ON lo_partkey = p_partkey WHERE lo_orderkey > p_size;
In the current spark load implementation, the types of all source data that BE reads from the Broker are set to varchar.
However, varchar and bitmap are no longer compatible after version 1.1.0, which causes spark load to fail.
An example of spark load error message:
detailMessage = type not match, originType=VARCHAR(*), targeType=BITMAP
Set the src type of the bitmap columns from varchar to bitmap when FE pushes tasks.
Implement the having clause for Nereids Planner.
NOTE:
This PR only aims to make the Nereids Planner generate the correct logical plan and physical plan. Runtime correctness is not a goal of this PR because GROUP BY is not yet ready in the Nereids Planner.
This PR
1. add Nereids support for the join algorithms below, which the legacy planner already supports
- colocate join
- bucket shuffle join
- shuffle join
- broadcast join
2. update all cost enforce derive utils
- ChildOutputPropertyDeriver
- EnforceMissingPropertiesHelper
- RequestPropertyDeriver
3. add a local quick sort plan used in enforce
4. set PhysicalProperties to PhysicalPlan when choosing the best plan from memo
5. rename Job#pushTask to Job#pushJob
After applying NormalizeAggregate rule, owner groups of all aggregate children are removed.
The root cause is that the new aggregate node is treated as equal to the old aggregate node, because LogicalAggregate.equals() does not take some attributes ("normalized", "disassembled") into account.
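The effect of leaving state flags out of `equals()` can be sketched with a simplified class. `AggregateSketch` is hypothetical and much smaller than LogicalAggregate; including both flags in `equals()` and `hashCode()` is an assumption about the shape of the fix.

```java
import java.util.List;
import java.util.Objects;

/** Hypothetical, simplified aggregate node; not the real LogicalAggregate. */
final class AggregateSketch {
    private final List<String> groupByExprs;
    private final boolean normalized;
    private final boolean disassembled;

    AggregateSketch(List<String> groupByExprs, boolean normalized, boolean disassembled) {
        this.groupByExprs = List.copyOf(groupByExprs);
        this.normalized = normalized;
        this.disassembled = disassembled;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof AggregateSketch)) {
            return false;
        }
        AggregateSketch that = (AggregateSketch) o;
        // Comparing the flags keeps a rewritten node distinct from the node
        // it was derived from, even when the expressions are identical.
        return normalized == that.normalized
                && disassembled == that.disassembled
                && groupByExprs.equals(that.groupByExprs);
    }

    @Override
    public int hashCode() {
        return Objects.hash(groupByExprs, normalized, disassembled);
    }

    public static void main(String[] args) {
        AggregateSketch before = new AggregateSketch(List.of("k1"), false, false);
        AggregateSketch after = new AggregateSketch(List.of("k1"), true, false);
        System.out.println(before.equals(after));
    }
}
```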
In the earlier PR #11842, we added the ability to do projection on each ExecNode.
But we cannot see the projection expr list in explain, which is inconvenient for debugging.
This PR adds them to the explain string if they exist.
Added regression tests for sub-queries. Currently only correlated sub-queries are covered. Non-correlated sub-queries will be added after the project revision.
In old Doris versions, string offsets are 32-bit, which is not enough for the Array type.
If we change string offsets from 32-bit to 64-bit, there will be a problem when BEs are upgraded one by one, because strings with 32-bit offsets and strings with 64-bit offsets will exist at the same time.
As a result, we separate the code for Array offsets.
Co-authored-by: cambyzju <zhuxiaoli01@baidu.com>
Add a new property called 'reserve_replica', which lets you
get a table with the same partitions and the same replication num
as before the backup.
Co-authored-by: Stalary <stalary@163.com>
Co-authored-by: camby <104178625@qq.com>