[opt](Nereids) Optimize Join Penalty Calculation Based on Build Side Data Volume (#36107)
pick from master #35773

This PR adjusts the penalty applied to a join in the cost model based on the volume of data on the build side. When the number of rows and the width on both sides of the join are the same, the cost of materializing the output is now taken into account: if the tuple size on the build side is larger than on the probe side, a small extra penalty (1e-3) is added to leftRowCount. A join with more data on the build side therefore costs slightly more, which lets the optimizer break the tie in favor of the plan with the smaller build side. To support this check, Statistics.computeTupleSize() is changed from private to public.
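The tie-breaking rule described above can be sketched in isolation. This is a minimal illustration, not the code in CostModelV1: the class JoinPenaltySketch, the SideStats holder, and the method adjustLeftRowCount are hypothetical names introduced here, and the field types are assumptions. Only the condition and the 1e-3 increment are taken from the diff.

```java
public class JoinPenaltySketch {

    /** Hypothetical stand-in for the per-side statistics consulted by the cost model. */
    static final class SideStats {
        final int widthInJoinCluster; // assumed to be an integer count
        final double tupleSize;       // assumed to be a size estimate per row

        SideStats(int widthInJoinCluster, double tupleSize) {
            this.widthInJoinCluster = widthInJoinCluster;
            this.tupleSize = tupleSize;
        }
    }

    /**
     * Mirrors the condition added in the diff: when both sides have the same
     * width in the join cluster and the build side has the larger tuple size,
     * add a small penalty of 1e-3 to the left row count.
     */
    static double adjustLeftRowCount(double leftRowCount, SideStats probe, SideStats build) {
        if (probe.widthInJoinCluster == build.widthInJoinCluster
                && probe.tupleSize < build.tupleSize) {
            return leftRowCount + 1e-3;
        }
        return leftRowCount;
    }

    public static void main(String[] args) {
        SideStats narrow = new SideStats(2, 16.0);
        SideStats wide = new SideStats(2, 64.0);

        // Wider tuples on the build side: the penalty is applied.
        System.out.println(adjustLeftRowCount(1000.0, narrow, wide));
        // Narrower tuples on the build side: the row count is unchanged.
        System.out.println(adjustLeftRowCount(1000.0, wide, narrow));
    }
}
```

The penalty is deliberately tiny, so it only affects the ranking of plans whose costs would otherwise be equal.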
@@ -298,6 +298,13 @@ class CostModelV1 extends PlanVisitor<Cost, PlanContext> {
         if (rightConnectivity < leftConnectivity) {
             leftRowCount += 1;
         }
+        if (probeStats.getWidthInJoinCluster() == buildStats.getWidthInJoinCluster()
+                && probeStats.computeTupleSize() < buildStats.computeTupleSize()) {
+            // When the number of rows and the width on both sides of the join are the same,
+            // we need to consider the cost of materializing the output.
+            // When there is more data on the build side, a greater penalty will be given.
+            leftRowCount += 1e-3;
+        }
     }
 
     /*
@@ -122,7 +122,7 @@ public class Statistics {
                 && expressionToColumnStats.get(s).isUnKnown);
     }
 
-    private double computeTupleSize() {
+    public double computeTupleSize() {
         if (tupleSize <= 0) {
             double tempSize = 0.0;
             for (ColumnStatistic s : expressionToColumnStats.values()) {