[opt](Nereids) Optimize Join Penalty Calculation Based on Build Side Data Volume (#36107)

pick from master #35773

This PR introduces an optimization that adjusts the penalty applied
during join operations based on the volume of data on the build side.
Specifically, when the two join inputs have the same row count and the
same width, the cost of materializing the join output is now taken into
account: a plan with more data on the build side incurs a slightly
higher penalty, steering the optimizer toward the cheaper build side and
improving overall query performance and resource allocation.
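The tie-breaking rule described above can be sketched as follows. This is a minimal illustration of the idea, not the actual Doris cost model; the dict keys `width` and `tuple_size` are hypothetical stand-ins for the `Statistics` fields used in `CostModelV1`:

```python
def hash_join_penalty(probe_stats: dict, build_stats: dict) -> float:
    """Return an extra row-count penalty charged to this join plan.

    When both join inputs have the same width in the join cluster,
    the plan whose build side materializes more data (a larger tuple
    size) gets a tiny penalty (1e-3 rows), nudging the optimizer
    toward the alternative that builds the hash table on the smaller
    side.  probe_stats/build_stats are plain dicts standing in for
    the real Statistics objects.
    """
    penalty = 0.0
    if (probe_stats["width"] == build_stats["width"]
            and probe_stats["tuple_size"] < build_stats["tuple_size"]):
        # More data on the build side: make this plan slightly more
        # expensive so the mirrored plan (sides swapped) wins the tie.
        penalty += 1e-3
    return penalty

# The plan that builds on the wider-tuple side is penalized;
# the mirrored plan is not.
assert hash_join_penalty({"width": 2, "tuple_size": 16},
                         {"width": 2, "tuple_size": 64}) == 1e-3
assert hash_join_penalty({"width": 2, "tuple_size": 64},
                         {"width": 2, "tuple_size": 16}) == 0.0
```

Because the penalty is tiny (1e-3 rows), it only acts as a tie-breaker between otherwise equal-cost plans rather than overriding genuine cost differences.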
This commit is contained in:
谢健
2024-06-19 14:49:09 +08:00
committed by GitHub
parent 1e54a5a66e
commit 349b943e12
13 changed files with 83 additions and 34 deletions

@@ -298,6 +298,13 @@ class CostModelV1 extends PlanVisitor<Cost, PlanContext> {
         if (rightConnectivity < leftConnectivity) {
             leftRowCount += 1;
         }
+        if (probeStats.getWidthInJoinCluster() == buildStats.getWidthInJoinCluster()
+                && probeStats.computeTupleSize() < buildStats.computeTupleSize()) {
+            // When the number of rows and the width on both sides of the join are the same,
+            // we need to consider the cost of materializing the output.
+            // When there is more data on the build side, a greater penalty will be given.
+            leftRowCount += 1e-3;
+        }
     }
     /*

@@ -122,7 +122,7 @@ public class Statistics {
                 && expressionToColumnStats.get(s).isUnKnown);
     }
-    private double computeTupleSize() {
+    public double computeTupleSize() {
         if (tupleSize <= 0) {
             double tempSize = 0.0;
             for (ColumnStatistic s : expressionToColumnStats.values()) {