postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-23 14:57:03 +08:00

Author	SHA1	Message	Date
Alexander Korotkov	1aa7cf9eb8	Disallow removing placeholders during Self-Join Elimination. fc069a3a6319 implements Self-Join Elimination (SJE), which can remove base relations when appropriate. However, regressions tests for SJE only cover the case when placeholder variables (PHVs) are evaluated and needed only in a single base rel. If this baserel is removed due to SJE, its clauses, including PHVs, will be transferred to the keeping relation. Removing these PHVs may trigger an error on plan creation -- thanks to the b3ff6c742f6c for detecting that. This commit skips removal of PHVs during SJE. This might also happen that we skip the removal of some PHVs that could be removed. However, the overhead of extra PHVs is small compared to the complexity of analysis needed to remove them. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Alena Rybakina <a.rybakina@postgrespro.ru> Author: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Richard Guo <guofenglinux@gmail.com>	2025-04-28 01:40:42 +03:00
Alexander Korotkov	fc069a3a63	Implement Self-Join Elimination The Self-Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if it is proven that the join can be replaced with a scan without impacting the query result. Self-join and inner relation get replaced with the outer in query, equivalence classes, and planner info structures. Also, the inner restrictlist moves to the outer one with the removal of duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses and, in turn, selectivity estimations, and potentially improves total planner prediction for the query. This feature is dedicated to avoiding redundancy, which can appear after pull-up transformations or the creation of an EquivalenceClass-derived clause like the below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3); SELECT * FROM t1 WHERE EXISTS (SELECT t3.x FROM t1 t3 WHERE t3.x = t1.x); SELECT * FROM t1,t2, t1 t3 WHERE t1.x = t2.x AND t2.x = t3.x; In the future, we could also reduce redundancy caused by subquery pull-up after unnecessary outer join removal in cases like the one below. SELECT * FROM t1 WHERE x IN (SELECT t3.x FROM t1 t3 LEFT JOIN t2 ON t2.x = t1.x); Also, it can drastically help to join partitioned tables, removing entries even before their expansion. The SJE proof is based on innerrel_is_unique() machinery. We can remove a self-join when for each outer row: 1. At most, one inner row matches the join clause; 2. Each matched inner row must be (physically) the same as the outer one; 3. Inner and outer rows have the same row mark. In this patch, we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x; 2. Add to the list above the baseretrictinfo of the inner table; 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables; 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination, the inner and outer clauses must match exactly. The relation replacement procedure is not trivial and is partly combined with the one used to remove useless left joins. Tests covering this feature were added to join.sql. Some of the existing regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Author: Alexander Kuzmenkov <a.kuzmenkov@postgrespro.ru> Co-authored-by: Alexander Korotkov <aekorotkov@gmail.com> Co-authored-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Simon Riggs <simon@2ndquadrant.com> Reviewed-by: Jonathan S. Katz <jkatz@postgresql.org> Reviewed-by: David Rowley <david.rowley@2ndquadrant.com> Reviewed-by: Thomas Munro <thomas.munro@enterprisedb.com> Reviewed-by: Konstantin Knizhnik <k.knizhnik@postgrespro.ru> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Hywel Carver <hywel@skillerwhale.com> Reviewed-by: Laurenz Albe <laurenz.albe@cybertec.at> Reviewed-by: Ronan Dunklau <ronan.dunklau@aiven.io> Reviewed-by: vignesh C <vignesh21@gmail.com> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Greg Stark <stark@mit.edu> Reviewed-by: Jaime Casanova <jcasanov@systemguards.com.ec> Reviewed-by: Michał Kłeczek <michal@kleczek.org> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-02-17 12:44:12 +02:00
Alexander Korotkov	627d63419e	Allow usage of match_orclause_to_indexcol() for joins This commit allows transformation of OR-clauses into SAOP's for index scans within nested loop joins. That required the following changes. 1. Make match_orclause_to_indexcol() and group_similar_or_args() understand const-ness in the same way as match_opclause_to_indexcol(). This generally makes our approach more uniform. 2. Make match_join_clauses_to_index() pass OR-clauses to match_clause_to_index(). 3. Also switch match_join_clauses_to_index() to use list_append_unique_ptr() for adding clauses to *joinorclauses. That avoids possible duplicates when processing the same clauses with different indexes. Previously such duplicates were elimited in match_clause_to_index(), but now group_similar_or_args() each time generates distinct copies of grouped OR clauses. Discussion: https://postgr.es/m/CAPpHfdv%2BjtNwofg-p5z86jLYZUTt6tR17Wy00ta0dL%3DwHQN3ZA%40mail.gmail.com Reviewed-by: Andrei Lepikhov <lepihov@gmail.com> Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Reviewed-by: Pavel Borisov <pashkin.elfe@gmail.com>	2025-02-04 23:21:49 +02:00
Richard Guo	e28033fe1a	Ignore nullingrels when looking up statistics When looking up statistical data about an expression, we do not need to concern ourselves with the outer joins that could null the Vars/PHVs contained in the expression. Accounting for nullingrels in the expression could cause estimate_num_groups to count the same Var multiple times if it's marked with different nullingrels. This is incorrect, and could lead to "ERROR: corrupt MVNDistinct entry" when searching for multivariate n-distinct. Furthermore, the nullingrels could prevent us from matching an expression to expressional index columns or to the expressions in extended statistics, leading to inaccurate estimates. To fix, strip out all the nullingrels from the expression before we look up statistical data about it. There is one ensuing plan change in the regression tests, but it looks reasonable and does not compromise its original purpose. This patch could result in plan changes, but it fixes an actual bug, so back-patch to v16 where the outer-join-aware-Var infrastructure was introduced. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4-2Z4k+nFTiZe0Qbu5n8juUWenDAtMzi98bAZQtwHx0-w@mail.gmail.com	2025-01-02 18:06:00 +09:00
Melanie Plageman	754c610e13	Fix bitmap table scan crash on iterator release 1a0da347a7ac98db replaced Bitmap Table Scan's individual private and shared iterators with a unified iterator. It neglected, however, to check if the iterator had already been cleaned up before doing so on rescan. Add this check both on rescan and end scan to be safe. Reported-by: Richard Guo Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs48nrhcLY1kcd-u9oD%2B6yiS631F_8Fx8ZGsO-BYDwH%2Bbyw%40mail.gmail.com	2024-12-19 11:55:03 -05:00
Richard Guo	d8f335156c	Improve the test case from 5668a857d In commit 5668a857d, we fixed an issue with incorrect results in right semi joins and introduced a test case to verify the fix. The test case involves SubPlans and InitPlans, which may not be immediately apparent in relation to the issue we addressed. This patch simplifies the test case with a more straightforward query. Per discussion with Melanie Plageman. Author: Richard Guo Discussion: https://postgr.es/m/CAAKRu_a-Cip2XCXp13fmxq+T9BhLAVApHTyjr94awL2mbXHC-Q@mail.gmail.com	2024-12-12 11:21:51 +09:00
Richard Guo	5668a857de	Fix right-semi-joins in HashJoin rescans When resetting a HashJoin node for rescans, if it is a single-batch join and there are no parameter changes for the inner subnode, we can just reuse the existing hash table without rebuilding it. However, for join types that depend on the inner-tuple match flags in the hash table, we need to reset these match flags to avoid incorrect results. This applies to right, right-anti, right-semi, and full joins. When I introduced "Right Semi Join" plan shapes in aa86129e1, I failed to reset the match flags in the hash table for right-semi joins in rescans. This oversight has been shown to produce incorrect results. This patch fixes it. Author: Richard Guo Discussion: https://postgr.es/m/CAMbWs4-nQF9io2WL2SkD0eXvfPdyBc9Q=hRwfQHCGV2usa0jyA@mail.gmail.com	2024-12-09 20:36:23 +09:00
Alexander Korotkov	d4378c0005	Transform OR-clauses to SAOP's during index matching This commit makes match_clause_to_indexcol() match "(indexkey op C1) OR (indexkey op C2) ... (indexkey op CN)" expression to the index while transforming it into "indexkey op ANY(ARRAY[C1, C2, ...])" (ScalarArrayOpExpr node). This transformation allows handling long OR-clauses with single IndexScan avoiding diving them into a slower BitmapOr. We currently restrict Ci to be either Const or Param to apply this transformation only when it's clearly beneficial. However, in the future, we might switch to a liberal understanding of constants, as it is in other cases. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina, Andrey Lepikhov, Alexander Korotkov Reviewed-by: Peter Geoghegan, Ranier Vilela, Alexander Korotkov, Robert Haas Reviewed-by: Jian He, Tom Lane, Nikolay Shaplov	2024-11-24 01:40:20 +02:00
Richard Guo	ffe12d1d22	Remove the RTE_GROUP RTE if we drop the groupClause For an EXISTS subquery, the only thing that matters is whether it returns zero or more than zero rows. Therefore, we remove certain SQL features that won't affect that, among them the GROUP BY clauses. After we drop the groupClause, we'd better remove the RTE_GROUP RTE and clear the hasGroupRTE flag, as they depend on the groupClause. Failing to do so could result in a bogus RTE_GROUP entry in the parent query, leading to an assertion failure on the hasGroupRTE flag. Reported-by: David Rowley Author: Richard Guo Discussion: https://postgr.es/m/CAApHDvp2_yht8uPLyWO-kVGWZhYvx5zjGfSrg4fBQ9fsC13V0g@mail.gmail.com	2024-10-25 09:52:34 +09:00
David Rowley	9ca67658d1	Don't store intermediate hash values in ExprState->resvalue adf97c156 made it so ExprStates could support hashing and changed Hash Join to use that instead of manually extracting Datums from tuples and hashing them one column at a time. When hashing multiple columns or expressions, the code added in that commit stored the intermediate hash value in the ExprState's resvalue field. That was a mistake as steps may be injected into the ExprState between each hashing step that look at or overwrite the stored intermediate hash value. EEOP_PARAM_SET is an example of such a step. Here we fix this by adding a new dedicated field for storing intermediate hash values and adjust the code so that all apart from the final hashing step store their result in the intermediate field. In passing, rename a variable so that it's more aligned to the surrounding code and also so a few lines stay within the 80 char margin. Reported-by: Andres Freund Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru> Discussion: https://postgr.es/m/CAApHDvqo9eenEFXND5zZ9JxO_k4eTA4jKMGxSyjdTrsmYvnmZw@mail.gmail.com	2024-10-17 14:25:08 +13:00
Tom Lane	a3179ab692	Recalculate where-needed data accurately after a join removal. Up to now, remove_rel_from_query() has done a pretty shoddy job of updating our where-needed bitmaps (per-Var attr_needed and per-PlaceHolderVar ph_needed relid sets). It removed direct mentions of the to-be-removed baserel and outer join, which is the minimum amount of effort needed to keep the data structures self-consistent. But it didn't account for the fact that the removed join ON clause probably mentioned Vars of other relations, and those Vars might now not be needed as high up in the join tree as before. It's easy to show cases where this results in failing to remove a lower outer join that could also have been removed. To fix, recalculate the where-needed bitmaps from scratch after each successful join removal. This sounds expensive, but it seems to add only negligible planner runtime. (We cheat a little bit by preserving "relation 0" entries in the bitmaps, allowing us to skip re-scanning the targetlist and HAVING qual.) The submitted test case drew attention because we had successfully optimized away the lower join prior to v16. I suspect that that's somewhat accidental and there are related cases that were never optimized before and now can be. I've not tried to come up with one, though. Perhaps we should back-patch this into v16 and v17 to repair the performance regression. However, since it took a year for anyone to notice the problem, it can't be affecting too many people. Let's let the patch bake awhile in HEAD, and see if we get more complaints. Per bug #18627 from Mikaël Gourlaouen. No back-patch for now. Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org	2024-09-27 16:04:04 -04:00
Richard Guo	0ffc0acaf3	Fix right-anti-joins when the inner relation is proven unique For an inner_unique join, we always assume that the executor will stop scanning for matches after the first match. Therefore, for a mergejoin that is inner_unique and whose mergeclauses are sufficient to identify a match, we set the skip_mark_restore flag to true, indicating that the executor need not do mark/restore calls. However, merge-right-anti-join did not get this memo and continues scanning the inner side for matches after the first match. If there are duplicates in the outer scan, we may incorrectly skip matching some inner tuples, which can lead to wrong results. Here we fix this issue by ensuring that merge-right-anti-join also advances to next outer tuple after the first match in inner_unique cases. This also saves cycles by avoiding unnecessary scanning of inner tuples after the first match. Although hash-right-anti-join does not suffer from this wrong results issue, we apply the same change to it as well, to help save cycles for the same reason. Per bug #18522 from Antti Lampinen, and bug #18526 from Feliphe Pozzer. Back-patch to v16 where right-anti-join was introduced. Author: Richard Guo Discussion: https://postgr.es/m/18522-c7a8956126afdfd0@postgresql.org	2024-07-08 10:11:46 +09:00
Richard Guo	aa86129e19	Support "Right Semi Join" plan shapes Hash joins can support semijoin with the LHS input on the right, using the existing logic for inner join, combined with the assurance that only the first match for each inner tuple is considered, which can be achieved by leveraging the HEAP_TUPLE_HAS_MATCH flag. This can be very useful in some cases since we may now have the option to hash the smaller table instead of the larger. Merge join could likely support "Right Semi Join" too. However, the benefit of swapping inputs tends to be small here, so we do not address that in this patch. Note that this patch also modifies a test query in join.sql to ensure it continues testing as intended. With this patch the original query would result in a right-semi-join rather than semi-join, compromising its original purpose of testing the fix for neqjoinsel's behavior for semi-joins. Author: Richard Guo Reviewed-by: wenhui qiu, Alena Rybakina, Japin Li Discussion: https://postgr.es/m/CAMbWs4_X1mN=ic+SxcyymUqFx9bB8pqSLTGJ-F=MHy4PW3eRXw@mail.gmail.com	2024-07-05 09:26:48 +09:00
Melanie Plageman	a3e6c6f929	BitmapHeapScan: Remove incorrect assert and reset field 04e72ed617be pushed the skip fetch optimization (allowing bitmap heap scans to operate like index-only scans if none of the underlying data is needed) into heap AM implementations of bitmap table scan callbacks. 04e72ed617be added an assert that all tuples in blocks eligible for the optimization had been NULL-filled and emitted by the end of the scan. This assert is incorrect when not all tuples need be scanned to execute the query; for example: a join in which not all inner tuples need to be scanned before skipping to the next outer tuple. Remove the assert and reset the field on which it previously asserted to avoid incorrectly emitting NULL-filled tuples from a previous scan on rescan. Author: Melanie Plageman Reviewed-by: Tomas Vondra, Michael Paquier, Alvaro Herrera Reported-by: Melanie Plageman Reproduced-by: Tomas Vondra, Richard Guo Discussion: https://postgr.es/m/CAMbWs48orzZVXa7-vP9Nt7vQWLTE04Qy4PePaLQYsVNQgo6qRg%40mail.gmail.com	2024-05-16 11:00:43 -04:00
Alexander Korotkov	d1d286d83c	Revert: Remove useless self-joins This commit reverts d3d55ce5713 and subsequent fixes 2b26a694554, 93c85db3b5b, b44a1708abe, b7f315c9d7d, 8a8ed916f73, b5fb6736ed3, 0a93f803f45, e0477837ce4, a7928a57b9f, 5ef34a8fc38, 30b4955a466, 8c441c08279, 028b15405b4, fe093994db4, 489072ab7a9, and 466979ef031. We are quite late in the release cycle and new bugs continue to appear. Even though we have fixes for all known bugs, there is a risk of throwing many bugs to end users. The plan for self-join elimination would be to do more review and testing, then re-commit in the early v18 cycle. Reported-by: Tom Lane Discussion: https://postgr.es/m/2422119.1714691974%40sss.pgh.pa.us	2024-05-06 14:36:36 +03:00
Alexander Korotkov	ff9f72c68f	revert: Transform OR clauses to ANY expression This commit reverts 72bd38cc99 due to implementation and design issues. Reported-by: Tom Lane Discussion: https://postgr.es/m/3604469.1712628736%40sss.pgh.pa.us	2024-04-10 02:28:09 +03:00
Alexander Korotkov	72bd38cc99	Transform OR clauses to ANY expression Replace (expr op C1) OR (expr op C2) ... with expr op ANY(ARRAY[C1, C2, ...]) on the preliminary stage of optimization when we are still working with the expression tree. Here Cn is a n-th constant expression, 'expr' is non-constant expression, 'op' is an operator which returns boolean result and has a commuter (for the case of reverse order of constant and non-constant parts of the expression, like 'Cn op expr'). Sometimes it can lead to not optimal plan. This is why there is a or_to_any_transform_limit GUC. It specifies a threshold value of length of arguments in an OR expression that triggers the OR-to-ANY transformation. Generally, more groupable OR arguments mean that transformation will be more likely to win than to lose. Discussion: https://postgr.es/m/567ED6CA.2040504%40sigaev.ru Author: Alena Rybakina <lena.ribackina@yandex.ru> Author: Andrey Lepikhov <a.lepikhov@postgrespro.ru> Reviewed-by: Peter Geoghegan <pg@bowt.ie> Reviewed-by: Ranier Vilela <ranier.vf@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com>	2024-04-08 01:27:52 +03:00
Alexander Korotkov	466979ef03	Replace lateral references to removed rels in subqueries This commit introduces a new field 'sublevels_up' in ReplaceVarnoContext, and enhances replace_varno_walker() to: 1) recurse into subselects with sublevels_up increased, and 2) perform the replacement only when varlevelsup is equal to sublevels_up. This commit also fixes some outdated comments. And besides adding relevant test cases, it makes some unification over existing SJE test cases. Discussion: https://postgr.es/m/CAMbWs4-%3DPO6Mm9gNnySbx0VHyXjgnnYYwbN9dth%3DTLQweZ-M%2Bg%40mail.gmail.com Author: Richard Guo Reviewed-by: Andrei Lepikhov, Alexander Korotkov	2024-02-24 00:35:17 +02:00
Alexander Korotkov	489072ab7a	Replace relids in lateral subquery parse tree during SJE Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/56ee4520-e9d1-d519-54fe-c8bff880ce9b%40gmail.com Author: Alexander Korotkov, Andrei Lepikhov	2024-02-20 14:10:10 +02:00
Alexander Korotkov	9f13376396	Pull up ANY-SUBLINK with the necessary lateral support. For ANY-SUBLINK, we adopted a two-stage pull-up approach to handle different types of scenarios. In the first stage, the sublink is pulled up as a subquery. Because of this, when writing this code, we did not have the ability to perform lateral joins, and therefore, we were unable to pull up Var with varlevelsup=1. Now that we have the ability to use lateral joins, we can eliminate this limitation. Author: Andy Fan <zhihui.fan1213@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Richard Guo <guofenglinux@gmail.com> Reviewed-by: Alena Rybakina <lena.ribackina@yandex.ru> Reviewed-by: Andrey Lepikhov <a.lepikhov@postgrespro.ru>	2024-02-15 12:06:12 +02:00
Alexander Korotkov	fe093994db	Fix 'negative bitmapset member' error When removing a useless join, we'd remove PHVs that are not used at join partner rels or above the join. A PHV that references the join's relid in ph_eval_at is logically "above" the join and thus should not be removed. We have the following check for that: !bms_is_member(ojrelid, phinfo->ph_eval_at) However, in the case of SJE removing a useless inner join, 'ojrelid' is set to -1, which would trigger the "negative bitmapset member not allowed" error in bms_is_member(). Fix it by skipping examining ojrelid for inner joins in this check. Reported-by: Zuming Jiang Bug: #18260 Discussion: https://postgr.es/m/18260-1b6a0c4ae311b837%40postgresql.org Author: Richard Guo Reviewed-by: Andrei Lepikhov	2024-01-15 17:45:16 +02:00
Tom Lane	29f114b6ff	Allow subquery pullup to wrap a PlaceHolderVar in another one. The code for wrapping subquery output expressions in PlaceHolderVars believed that if the expression already was a PlaceHolderVar, it was never necessary to wrap that in another one. That's wrong if the expression is underneath an outer join and involves a lateral reference to outside that scope: failing to add an additional PHV risks evaluating the expression at the wrong place and hence not forcing it to null when the outer join should do so. This is an oversight in commit 9e7e29c75, which added logic to forcibly wrap lateral-reference Vars in PlaceHolderVars, but didn't see that the adjacent case for PlaceHolderVars needed the same treatment. The test case we have for this doesn't fail before 4be058fe9, but now that I see the problem I wonder if it is possible to demonstrate related errors before that. That's moot though, since all such branches are out of support. Per bug #18284 from Holger Reise. Back-patch to all supported branches. Discussion: https://postgr.es/m/18284-47505a20c23647f8@postgresql.org	2024-01-11 15:28:22 -05:00
Alexander Korotkov	8c441c0827	Forbid SJE with result relation The target relation for INSERT/UPDATE/DELETE/MERGE has a different behavior than other relations in EvalPlanQual() and RETURNING clause. This is why we forbid target relation to be either source or target relation in SJE. It's not clear if we could ever support this. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/b9e8f460-f9a6-0e9b-e8ba-60d59f0bc22c%40gmail.com	2024-01-09 10:01:22 +02:00
Alexander Korotkov	30b4955a46	Fix misuse of RelOptInfo.unique_for_rels cache by SJE When SJE uses RelOptInfo.unique_for_rels cache, it passes filtered quals to innerrel_is_unique_ext(). That might lead to an invalid match to cache entries made by previous non self-join checking calls. Add UniqueRelInfo.self_join flag to prevent such cases. Also, fix that SJE should require a strict match of outerrelids to make sure UniqueRelInfo.extra_clauses are valid. Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/4788f781-31bd-9796-d7d6-588a751c8787%40gmail.com	2024-01-09 00:09:06 +02:00
Alexander Korotkov	bea18b1c94	Strengthen tests for 5ef34a8fc3 The test query in 5ef34a8fc3 is running over the empty emp1 table giving the same (empty) return both with and without the fix. Add one row to that table to make not just the test query plan, but also the test query result different. Reported-by: Richard Guo Bug: #18261 Discussion: https://postgr.es/m/CAMbWs49igjcszLgicb4D1N21_5iNDoxheJ7KFmAcs_z%3DLx6jhg%40mail.gmail.com	2024-01-08 15:00:42 +02:00
Alexander Korotkov	5ef34a8fc3	Fix the issue that SJE mistakenly omits qual clauses When the SJE code handles the transfer of qual clauses from the removed relation to the remaining one, it replaces the Vars of the removed relation with the Vars of the remaining relation for each clause, and then reintegrates these clauses into the appropriate restriction or join clause lists, while attempting to avoid duplicates. However, the code compares RestrictInfo->clause to determine if two clauses are duplicates. This is just flat wrong. Two RestrictInfos with the same clause can have different required_relids, incompatible_relids, is_pushed_down, and so on. This can cause qual clauses to be mistakenly omitted, leading to wrong results. This patch fixes it by comparing the entire RestrictInfos not just their clauses ignoring 'rinfo_serial' field (otherwise almost all RestrictInfos will be unique). Making 'rinfo_serial' equal_ignore would break other code. This is why this commit implements our own comparison function for checking the equality of RestrictInfos. Reported-by: Zuming Jiang Bug: #18261 Discussion: https://postgr.es/m/18261-2a75d748c928609b%40postgresql.org Author: Richard Guo	2024-01-06 14:10:00 +02:00
Robert Haas	0d9937d118	Fix typos in comments and in one isolation test. Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Some subtractions by me. Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org	2024-01-02 12:05:41 -05:00
Alexander Korotkov	a7928a57b9	Replace the relid in some missing fields during SJE Reported-by: Alexander Lakhin Discussion: https://postgr.es/m/a89f480f-8143-0965-f22d-0a892777f501%40gmail.com Author: Andrei Lepikhov	2024-01-02 09:48:50 +02:00
Alexander Korotkov	e0477837ce	Make replace_relid() leave argument unmodified There are a lot of situations when we share the same pointer to a Bitmapset structure across different places. In order to evade undesirable side effects replace_relid() function should always return a copy. Reported-by: Richard Guo Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov	2023-12-27 03:57:57 +02:00
Alexander Korotkov	b5fb6736ed	Don't constrain self-join removal due to PHVs Self-join removal appears to be safe to apply with placeholder variables as long as we handle PlaceHolderVar in replace_varno_walker() and replace relid in phinfo->ph_lateral. Discussion: https://postgr.es/m/18187-831da249cbd2ff8e%40postgresql.org Author: Richard Guo Reviewed-by: Andrei Lepikhov	2023-12-25 01:33:26 +02:00
Alexander Korotkov	b7f315c9d7	Fix how SJE checks against PHVs It seems that a PHV evaluated/needed at or below the self join should not have a problem if we remove the self join. But this requires further investigation. For now, we just do not remove self joins if the rel to be removed is laterally referenced by PHVs. Discussion: https://postgr.es/m/CAMbWs4-ns73VF9gi37q61G3dS6Xuos+HtryMaBh37WQn=BsaJw@mail.gmail.com Author: Richard Guo	2023-11-10 22:46:46 +02:00
Tom Lane	36f5594c0f	Fix computation of varnullingrels when const-folding field selection. We can simplify FieldSelect on a whole-row Var into a plain Var for the selected field. However, we should copy the whole-row Var's varnullingrels when we do so, because the new Var is clearly nullable by exactly the same rels as the original. Failure to do this led to errors like "wrong varnullingrels (b) (expected (b 3)) for Var 2/2". Richard Guo, per bug #18184 from Marian Krucina. Back-patch to v16 where varnullingrels was introduced. Discussion: https://postgr.es/m/18184-5868dd258782058e@postgresql.org	2023-11-09 15:46:16 -05:00
Alexander Korotkov	b44a1708ab	Fix the way SJE removes references from PHVs Add missing replacement of relids in phv->phexpr. Also, remove extra replace_relid() over phv->phrels. Reported-by: Zuming Jiang Bug: #18187 Discussion: https://postgr.es/m/flat/18187-831da249cbd2ff8e%40postgresql.org Author: Richard Guo Reviewed-by: Andrei Lepikhov	2023-11-09 14:25:13 +02:00
Alexander Korotkov	ec63622c03	Fix usage of the parse tree for estimate_num_groups() in set operations recurse_set_operations() uses the parse tree for the group number estimation, because of the "varno 0" hack. At the same time 2489d76c49 made root->parse and corresponding parent_root->simple_rte_array[]->subquery distinct copies of the parse tree, while d3d55ce571 introduced self-join removal replacing relid of removed relation only in one of the copies. The present commit fixes this bug by making recurse_set_operations() call estimate_num_groups() with the copy of the parse tree processed by self-join removal. In future, we may think about maintaining just one copy of the parse tree and/or keeping removed relids as aliases. Reported-by: Zuming Jiang Bug: #18170 Discussion: https://postgr.es/m/flat/18170-f1d17bf9a0d58b24%40postgresql.org Author: Richard Guo, Alexander Korotkov Reviewed-by: Andrei Lepikhov	2023-11-04 03:30:18 +02:00
Alexander Korotkov	9ba9c7074f	Fix some regression tests for d3d55ce57136 Add missing (cost off) to explain.	2023-10-25 14:53:14 +03:00
Alexander Korotkov	d3d55ce571	Remove useless self-joins The Self Join Elimination (SJE) feature removes an inner join of a plain table to itself in the query tree if is proved that the join can be replaced with a scan without impacting the query result. Self join and inner relation are replaced with the outer in query, equivalence classes, and planner info structures. Also, inner restrictlist moves to the outer one with removing duplicated clauses. Thus, this optimization reduces the length of the range table list (this especially makes sense for partitioned relations), reduces the number of restriction clauses === selectivity estimations, and potentially can improve total planner prediction for the query. The SJE proof is based on innerrel_is_unique machinery. We can remove a self-join when for each outer row: 1. At most one inner row matches the join clause. 2. Each matched inner row must be (physically) the same row as the outer one. In this patch we use the next approach to identify a self-join: 1. Collect all merge-joinable join quals which look like a.x = b.x 2. Add to the list above the baseretrictinfo of the inner table. 3. Check innerrel_is_unique() for the qual list. If it returns false, skip this pair of joining tables. 4. Check uniqueness, proved by the baserestrictinfo clauses. To prove the possibility of self-join elimination inner and outer clauses must have an exact match. The relation replacement procedure is not trivial and it is partly combined with the one, used to remove useless left joins. Tests, covering this feature, were added to join.sql. Some regression tests changed due to self-join removal logic. Discussion: https://postgr.es/m/flat/64486b0b-0404-e39e-322d-0801154901f3%40postgrespro.ru Author: Andrey Lepikhov, Alexander Kuzmenkov Reviewed-by: Tom Lane, Robert Haas, Andres Freund, Simon Riggs, Jonathan S. Katz Reviewed-by: David Rowley, Thomas Munro, Konstantin Knizhnik, Heikki Linnakangas Reviewed-by: Hywel Carver, Laurenz Albe, Ronan Dunklau, vignesh C, Zhihong Yu Reviewed-by: Greg Stark, Jaime Casanova, Michał Kłeczek, Alena Rybakina Reviewed-by: Alexander Korotkov	2023-10-25 12:59:16 +03:00
Tom Lane	a798660ebe	Defend against bogus parameterization of join input paths. An outer join cannot be formed using an input path that is parameterized by a value that is supposed to be nulled by the outer join. This is obviously nonsensical, and it could lead to a bad plan being selected; although currently it seems that we'll hit various sanity-check assertions first. I think that such cases were formerly prevented by the delay_upper_joins mechanism, but now that that's gone we need an explicit check. (Perhaps we should avoid generating baserel paths that could lead to this situation in the first place; but it seems like having a defense at the join level would be a good idea anyway.) Richard Guo and Tom Lane, per report from Jaime Casanova Discussion: https://postgr.es/m/CAJKUy5g2uZRrUDZJ8p-=giwcSHVUn0c9nmdxPSY0jF0Ov8VoEA@mail.gmail.com	2023-06-29 12:12:52 -04:00
Tom Lane	3af87736bf	Fix another cause of "wrong varnullingrels" planner failures. I removed the delay_upper_joins mechanism in commit b448f1c8d, reasoning that it was only needed when we have a single-table (SELECT ... WHERE) as the immediate RHS child of a left join, and we could get rid of that by hoisting the WHERE condition into the parent join's quals. However that new code missed a case: we could have "foo LEFT JOIN ((SELECT ... WHERE) LEFT JOIN bar)", and if the two left joins can be commuted then we now have the problematic query shape. We can fix this too easily enough, by allowing the syntactically-lower left join to pass through its parent qual location pointer recursively. That lets prepjointree.c discard the SELECT by temporarily hoisting the WHERE condition into the ancestor join's qual. Per bug #17978 from Zuming Jiang. Discussion: https://postgr.es/m/17978-12f3d93a55297266@postgresql.org	2023-06-20 11:09:56 -04:00
Tom Lane	efeb12ef0b	Don't include outer join relids in lateral_relids bitmapsets. This avoids an assertion failure when outer joins are rearranged per identity 3. Listing only the baserels from a PlaceHolderVar's ph_lateral set should be enough to ensure that the required values are available when we need to compute the PHV --- it's what we did before inventing nullingrel sets, after all. It's a bit unsatisfying; but with beta2 hard upon us, there's not time to look for an aesthetically cleaner fix. Richard Guo and Tom Lane Discussion: https://postgr.es/m/CAMbWs48Jcw-NvnxT23WiHP324wG44DvzcH1j4hc0Zn+3sR9cfg@mail.gmail.com	2023-06-20 10:29:57 -04:00
Tom Lane	0655c03ef9	Centralize fixups for mismatched nullingrels in nestloop params. It turns out that the fixes we applied in commits bfd332b3f and 63e4f13d2 were not nearly enough to solve the problem. We'd focused narrowly on subquery RTEs with lateral references, but lateral references can occur in several other RTE kinds such as function RTEs. Putting the same hack into half a dozen code paths seems quite unattractive. Hence, revert the code changes (but not the test cases) from those commits and instead solve it centrally in identify_current_nestloop_params(), as Richard proposed originally. This is a bit annoying because it could mask erroneous nullingrels in nestloop params that are generated from non-LATERAL parameterized paths; but on balance I don't see a better way. Maybe at some future time we'll be motivated to find a more rigorous approach to nestloop params, but that's not happening for beta2. Richard Guo and Tom Lane Discussion: https://postgr.es/m/CAMbWs48Jcw-NvnxT23WiHP324wG44DvzcH1j4hc0Zn+3sR9cfg@mail.gmail.com	2023-06-20 10:22:52 -04:00
David Rowley	7fcd7ef2a9	Don't use partial unique indexes for unique proofs in the planner Here we adjust relation_has_unique_index_for() so that it no longer makes use of partial unique indexes as uniqueness proofs. It is incorrect to use these as the predicates used by check_index_predicates() to set predOK makes use of not only baserestrictinfo quals as proofs, but also qual from join conditions. For relation_has_unique_index_for()'s case, we need to know the relation is unique for a given set of columns before any joins are evaluated, so if predOK was only set to true due to some join qual, then it's unsafe to use such indexes in relation_has_unique_index_for(). The final plan may not even make use of that index, which could result in reading tuples that are not as unique as the planner previously expected them to be. Bug: #17975 Reported-by: Tor Erik Linnerud Backpatch-through: 11, all supported versions Discussion: https://postgr.es/m/17975-98a90c156f25c952%40postgresql.org	2023-06-19 13:00:42 +12:00
Tom Lane	f4c00d138f	When removing a left join, clean out references in EquivalenceClasses. Since commit b448f1c8d, we've been able to remove left joins (that are otherwise removable) even when they are underneath other left joins, a case that was previously prevented by a delay_upper_joins check. This is a clear improvement, but it has a surprising side-effect: it's now possible that there are EquivalenceClasses whose relid sets mention the removed baserel and/or outer join. If we fail to clean those up, we may drop essential join quals due to not having any join level that appears to satisfy their relid sets. (It's not quite 100% clear that this was impossible before. But the lack of complaints since we added join removal a dozen years ago strongly suggests that it was impossible.) Richard Guo and Tom Lane, per bug #17976 from Zuming Jiang Discussion: https://postgr.es/m/17976-4b638b525e9a983b@postgresql.org	2023-06-15 15:24:50 -04:00
Tom Lane	63e4f13d2a	Fix "wrong varnullingrels" for Memoize's lateral references, too. The issue fixed in commit bfd332b3f can also bite Memoize plans, because of the separate copies of lateral reference Vars made by paraminfo_get_equal_hashops. Apply the same hacky fix there. (In passing, clean up shaky grammar in the existing comments for this function.) Richard Guo Discussion: https://postgr.es/m/CAMbWs4-krwk0Wbd6WdufMAupuou_Ua73ijQ4XQCr1Mb5BaVtKQ@mail.gmail.com	2023-06-13 18:01:33 -04:00
Tom Lane	bfd332b3fd	Fix "wrong varnullingrels" for subquery nestloop parameters. If we apply outer join identity 3 when relation C is a subquery having lateral references to relation B, then the lateral references within C continue to bear the original syntactically-correct varnullingrels marks, but that won't match what is available from the outer side of the nestloop. Compensate for that in process_subquery_nestloop_params(). This is a slightly hacky fix, but we certainly don't want to re-plan C in toto for each possible outer join order, so there's not a lot of better alternatives. Richard Guo and Tom Lane, per report from Markus Winand Discussion: https://postgr.es/m/DFBB2D25-DE97-49CA-A60E-07C881EA59A7@winand.at	2023-06-12 10:01:26 -04:00
Tom Lane	9a2dbc614e	Fix oversight in outer join removal. A placeholder that references the outer join's relid in ph_eval_at is logically "above" the join, and therefore we can't remove its PlaceHolderInfo: it might still be used somewhere in the query. This was not an issue pre-v16 because we failed to remove the join at all in such cases. The new outer-join-aware-Var infrastructure permits deducing that it's okay to remove the join, but then we have to clean up correctly afterwards. Report and fix by Richard Guo Discussion: https://postgr.es/m/CAMbWs4_tuVn9EwwMcggGiZJWWstdXX_ci8FeEU17vs+4nLgw3w@mail.gmail.com	2023-06-08 17:10:04 -04:00
Tom Lane	7a844c77ec	Fix joinclause removal logic to cope with cloned clauses. When we're deleting a no-op LEFT JOIN from the query, we must remove the join's joinclauses from surviving relations' joininfo lists. The invention of "cloned" clauses in 2489d76c4 broke the logic for that; it'd fail to remove clones that include OJ relids outside the doomed join's min relid sets, which could happen if that join was previously discovered to commute with some other join. This accidentally failed to cause problems in the majority of cases, because we'd never decide that such a cloned clause was evaluatable at any surviving join. However, Richard Guo discovered a case where that did happen, leading to "no relation entry for relid" errors later. Also, adding assertions that a non-removed clause contains no Vars from the doomed join exposes that there are quite a few existing regression test cases where the problem happens but is accidentally not exposed. The fix for this is just to include the target join's commute_above_r and commute_below_l sets in the relid set we test against when deciding whether a join clause is "pushed down" and thus not removable. While at it, do a little refactoring: the join's relid set can be computed inside remove_rel_from_query rather than in the caller. Patch by me; thanks to Richard Guo for review. Discussion: https://postgr.es/m/CAMbWs4_PHrRqTKDNnTRsxxQy6BtYCVKsgXm1_gdN2yQ=kmcO5g@mail.gmail.com	2023-05-26 12:13:19 -04:00
Tom Lane	991a3df227	Fix filtering of "cloned" outer-join quals some more. We've had multiple issues with the clause_is_computable_at logic that I introduced in 2489d76c4: it's been known to accept more than one clone of the same qual at the same plan node, and also to accept no clones at all. It's looking impractical to get it 100% right on the basis of the currently-stored information, so fix it by introducing a new RestrictInfo field "incompatible_relids" that explicitly shows which outer joins a given clone mustn't be pushed above. In principle we could populate this field in every RestrictInfo, but that would cost space and there doesn't presently seem to be a need for it in general. Also, while deconstruct_distribute_oj_quals can easily fill the field with the remaining members of the commutative join set that it's considering, computing it in the general case seems again pretty complicated. So for now, just fill it for clone quals. Along the way, fix a bug that may or may not be only latent: equivclass.c was generating replacement clauses with is_pushed_down and has_clone/is_clone markings that didn't match their required_relids. This led me to conclude that leaving the clone flags out of make_restrictinfo's purview wasn't such a great idea after all, so add them. Per report from Richard Guo. Discussion: https://postgr.es/m/CAMbWs48EYi_9-pSd0ORes1kTmTeAjT4Q3gu49hJtYCbSn2JyeA@mail.gmail.com	2023-05-25 10:28:33 -04:00
Tom Lane	b9c755a2f6	In clause_is_computable_at(), test required_relids for clone clauses. Use the clause's required_relids not clause_relids for testing whether it is computable at the current join level, if it is a clone clause generated by deconstruct_distribute_oj_quals(). Arguably, this is more correct and we should do it for all clauses; that would at least remove the handwavy claim that we are doing it to save cycles compared to inspecting Vars individually. However, attempting to do that exposes that we are not being careful to compute an accurate value for required_relids in all cases. I'm unsure whether it's a good idea to attempt to do that for v16, or leave it as future clean-up. In the meantime, this quick hack demonstrably fixes some cases, so let's squeeze it in for beta1. Patch by me, but great thanks to Richard Guo for investigation and testing. The new test cases are all modeled on his examples. Discussion: https://postgr.es/m/CAMbWs4-_vwkBij4XOQ5ukxUvLgwTm0kS5_DO9CicUeKbEfKjUw@mail.gmail.com	2023-05-21 15:25:52 -04:00
Tom Lane	d0f952691f	Fix thinko in join removal. In commit 9df8f903e I (tgl) switched join_is_removable() from using the min relid sets of the join under consideration to using its full syntactic relid sets. This was a mistake, as it allowed join removal in cases where a reference to the join output would survive in some syntactically-lower join condition. Revert to the former coding. Richard Guo Discussion: https://postgr.es/m/CAMbWs4-EU9uBGSP7G-iTwLBhRQ=rnZKvFDhD+n+xhajokyPCKg@mail.gmail.com	2023-05-19 15:24:07 -04:00
Tom Lane	9df8f903eb	Fix some issues with improper placement of outer join clauses. After applying outer-join identity 3 in the forward direction, it was possible for the planner to mistakenly apply a qual clause from above the two outer joins at the now-lower join level. This can give the wrong answer, since a value that would get nulled by the now-upper join might not yet be null. To fix, when we perform such a transformation, consider that the now-lower join hasn't really completed the outer join it's nominally responsible for and thus its relid set should not include that OJ's relid (nor should its output Vars have that nullingrel bit set). Instead we add those bits when the now-upper join is performed. The existing rules for qual placement then suffice to prevent higher qual clauses from dropping below the now-upper join. There are a few complications from needing to consider transitive closures in case multiple pushdowns have happened, but all in all it's not a very complex patch. This is all new logic (from 2489d76c4) so no need to back-patch. The added test cases all have the same results as in v15. Tom Lane and Richard Guo Discussion: https://postgr.es/m/0b819232-4b50-f245-1c7d-c8c61bf41827@postgrespro.ru	2023-05-17 11:14:04 -04:00

1 2 3 4 5

216 Commits