Recalculate where-needed data accurately after a join removal.

Up to now, remove_rel_from_query() has done a pretty shoddy job of updating our where-needed bitmaps (per-Var attr_needed and per-PlaceHolderVar ph_needed relid sets). It removed direct mentions of the to-be-removed baserel and outer join, which is the minimum amount of effort needed to keep the data structures self-consistent. But it didn't account for the fact that the removed join ON clause probably mentioned Vars of other relations, and those Vars might now not be needed as high up in the join tree as before. It's easy to show cases where this results in failing to remove a lower outer join that could also have been removed. To fix, recalculate the where-needed bitmaps from scratch after each successful join removal. This sounds expensive, but it seems to add only negligible planner runtime. (We cheat a little bit by preserving "relation 0" entries in the bitmaps, allowing us to skip re-scanning the targetlist and HAVING qual.) The submitted test case drew attention because we had successfully optimized away the lower join prior to v16. I suspect that that's somewhat accidental and there are related cases that were never optimized before and now can be. I've not tried to come up with one, though. Perhaps we should back-patch this into v16 and v17 to repair the performance regression. However, since it took a year for anyone to notice the problem, it can't be affecting too many people. Let's let the patch bake awhile in HEAD, and see if we get more complaints. Per bug #18627 from Mikaël Gourlaouen. No back-patch for now. Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
2026-02-23 14:57:03 +08:00 · 2024-09-27 16:04:04 -04:00
parent 7f7474a8e4
commit a3179ab692
9 changed files with 320 additions and 37 deletions
--- a/src/test/regress/sql/join.sql
+++ b/src/test/regress/sql/join.sql
@ -1952,17 +1952,22 @@ CREATE TEMP TABLE a (id int PRIMARY KEY, b_id int);
 CREATE TEMP TABLE b (id int PRIMARY KEY, c_id int);
 CREATE TEMP TABLE c (id int PRIMARY KEY);
 CREATE TEMP TABLE d (a int, b int);
+CREATE TEMP TABLE e (id1 int, id2 int, PRIMARY KEY(id1, id2));
 INSERT INTO a VALUES (0, 0), (1, NULL);
 INSERT INTO b VALUES (0, 0), (1, NULL);
 INSERT INTO c VALUES (0), (1);
 INSERT INTO d VALUES (1,3), (2,2), (3,1);
+INSERT INTO e VALUES (0,0), (2,2), (3,1);

-- all three cases should be optimizable into a simple seqscan
+-- all these cases should be optimizable into a simple seqscan
 explain (costs off) SELECT a.* FROM a LEFT JOIN b ON a.b_id = b.id;
 explain (costs off) SELECT b.* FROM b LEFT JOIN c ON b.c_id = c.id;
 explain (costs off)
  SELECT a.* FROM a LEFT JOIN (b left join c on b.c_id = c.id)
  ON (a.b_id = b.id);
+explain (costs off)
+  SELECT a.* FROM a LEFT JOIN b ON a.id = b.id
+  LEFT JOIN e ON e.id1 = a.b_id AND b.c_id = e.id2;

 -- check optimization of outer join within another special join
 explain (costs off)