postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-13 09:57:02 +08:00

Author	SHA1	Message	Date
Tom Lane	7ca25b7de6	Fix neqjoinsel's behavior for semi/anti join cases. Previously, this function estimated the selectivity as 1 minus eqjoinsel() for the negator equality operator, regardless of join type (I think there was an expectation that eqjoinsel would handle the join type). But actually this is completely wrong for semijoin cases: the fraction of the LHS that has a non-matching row is not one minus the fraction of the LHS that has a matching row. In reality a semijoin with <> will nearly always succeed: it can only fail when the RHS is empty, or it contains a single distinct value that is equal to the particular LHS value, or the LHS value is null. The only one of those things we should have much confidence in estimating is the fraction of LHS values that are null, so let's just take the selectivity as 1 minus outer nullfrac. Per coding convention, antijoin should be estimated the same as semijoin. Arguably this is a bug fix, but in view of the lack of field complaints and the risk of destabilizing plans, no back-patch. Thomas Munro, reviewed by Ashutosh Bapat Discussion: https://postgr.es/m/CAEepm=270ze2hVxWkJw-5eKzc3AB4C9KpH3L2kih75R5pdSogg@mail.gmail.com	2017-11-29 22:00:37 -05:00
Andres Freund	fa330f9adf	Add some regression tests that exercise hash join code. Although hash joins are already tested by many queries, these tests systematically cover the four different states we can reach as part of the strategy for respecting work_mem. Author: Thomas Munro Reviewed-By: Andres Freund	2017-11-29 16:06:50 -08:00
Robert Haas	57eebca03a	Fix create_lateral_join_info to handle dead relations properly. Commit 0a480502b092195a9b25a2f0f199a21d592a9c57 broke it. Report by Andreas Seltenreich. Fix by Ashutosh Bapat. Discussion: http://postgr.es/m/874ls2vrnx.fsf@ansel.ydns.eu	2017-09-20 10:20:10 -04:00
Robert Haas	0a480502b0	Expand partitioned table RTEs level by level, without flattening. Flattening the partitioning hierarchy at this stage makes various desirable optimizations difficult. The original use case for this patch was partition-wise join, which wants to match up the partitions in one partitioning hierarchy with those in another such hierarchy. However, it now seems that it will also be useful in making partition pruning work using the PartitionDesc rather than constraint exclusion, because with a flattened expansion, we have no easy way to figure out which PartitionDescs apply to which leaf tables in a multi-level partition hierarchy. As it turns out, we end up creating both rte->inh and !rte->inh RTEs for each intermediate partitioned table, just as we previously did for the root table. This seems unnecessary since the partitioned tables have no storage and are not scanned. We might want to go back and rejigger things so that no partitioned tables (including the parent) need !rte->inh RTEs, but that seems to require some adjustments not related to the core purpose of this patch. Ashutosh Bapat, reviewed by me and by Amit Langote. Some final adjustments by me. Discussion: http://postgr.es/m/CAFjFpRd=1venqLL7oGU=C1dEkuvk2DJgvF+7uKbnPHaum1mvHQ@mail.gmail.com	2017-09-14 15:41:08 -04:00
Tom Lane	d8e6b84bd2	Avoid regressions in foreign-key-based selectivity estimates. David Rowley found that the "use the smallest per-column selectivity" heuristic applied in some cases by get_foreign_key_join_selectivity() was badly off if the FK columns are independent, producing estimates much worse than we got before that code was added in 9.6. One case where that heuristic was used was for LEFT and FULL outer joins with the referenced rel on the outside of the join. But we should not really need to special-case those here. eqjoinsel() never has had such a special case; the correction is applied by calc_joinrel_size_estimate() instead. Let's just estimate such cases like inner joins and rely on that later adjustment. (I think there was something of a thinko here, in that the comments seem to be thinking about the selectivity as defined for semi/anti joins; but that shouldn't apply to left/full joins.) Add a regression test exercising such a case to show that this is sane in at least some cases. The other case where we used that heuristic was for SEMI/ANTI outer joins, either if the referenced rel was on the outside, or if it was on the inside but was part of a join within the RHS. In either case, the FK doesn't give us a lot of traction towards estimating the selectivity. To ensure that we don't have regressions from what happened before 9.6, let's punt by ignoring the FK in such cases and applying the traditional selectivity calculation. (We might be able to improve on that later, but for now I just want to be sure it's not worse than 9.5.) Report and patch by David Rowley, simplified a bit by me. Back-patch to 9.6 where this code was added. Discussion: https://postgr.es/m/CAKJS1f8NO8oCDcxrteohG6O72uU1saEVT9qX=R8pENr5QWerXw@mail.gmail.com	2017-06-19 15:33:41 -04:00
Tom Lane	92a43e4857	Reduce semijoins with unique inner relations to plain inner joins. If the inner relation can be proven unique, that is it can have no more than one matching row for any row of the outer query, then we might as well implement the semijoin as a plain inner join, allowing substantially more freedom to the planner. This is a form of outer join strength reduction, but it can't be implemented in reduce_outer_joins() because we don't have enough info about the individual relations at that stage. Instead do it much like remove_useless_joins(): once we've built base relations, we can make another pass over the SpecialJoinInfo list and get rid of any entries representing reducible semijoins. This is essentially a followon to the inner-unique patch (commit 9c7f5229a) and makes use of the proof machinery that that patch created. We need only minor refactoring of innerrel_is_unique's API to support this usage. Per performance complaint from Teodor Sigaev. Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru	2017-05-01 14:53:42 -04:00
Tom Lane	2057a58d16	Fix mis-optimization of semijoins with more than one LHS relation. The inner-unique patch (commit 9c7f5229a) supposed that if we're considering a JOIN_UNIQUE_INNER join path, we can always set inner_unique for the join, because the inner path produced by create_unique_path should be unique relative to the outer relation. However, that's true only if we're considering joining to the whole outer relation --- otherwise we may be applying only some of the join quals, and so the inner path might be non-unique from the perspective of this join. Adjust the test to only believe that we can set inner_unique if we have the whole semijoin LHS on the outer side. There is more that can be done in this area, but this commit is only intended to provide the minimal fix needed to get correct plans. Per report from Teodor Sigaev. Thanks to David Rowley for preliminary investigation. Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru	2017-05-01 14:39:11 -04:00
Tom Lane	9c7f5229ad	Optimize joins when the inner relation can be proven unique. If there can certainly be no more than one matching inner row for a given outer row, then the executor can move on to the next outer row as soon as it's found one match; there's no need to continue scanning the inner relation for this outer row. This saves useless scanning in nestloop and hash joins. In merge joins, it offers the opportunity to skip mark/restore processing, because we know we have not advanced past the first possible match for the next outer row. Of course, the devil is in the details: the proof of uniqueness must depend only on joinquals (not otherquals), and if we want to skip mergejoin mark/restore then it must depend only on merge clauses. To avoid adding more planning overhead than absolutely necessary, the present patch errs in the conservative direction: there are cases where inner_unique or skip_mark_restore processing could be used, but it will not do so because it's not sure that the uniqueness proof depended only on "safe" clauses. This could be improved later. David Rowley, reviewed and rather heavily editorialized on by me Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com	2017-04-07 22:20:13 -04:00
Andres Freund	7c5d8c16e1	Add explicit ORDER BY to a few tests that exercise hash-join code. A proposed patch, also by Thomas and in the same thread, would change the output order of these. Independent of the follow-up patches getting committed, nailing down the order in these specific tests at worst seems harmless. Author: Thomas Munro Discussion: https://postgr.es/m/CAEepm=1D4-tP7j7UAgT_j4ZX2j4Ehe1qgZQWFKBMb8F76UW5Rg@mail.gmail.com	2017-02-08 16:58:21 -08:00
Heikki Linnakangas	181bdb90ba	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:33:58 +02:00
Tom Lane	207d5a656e	Fix mishandling of equivalence-class tests in parameterized plans. Given a three-or-more-way equivalence class, such as X.Y = Y.Y = Z.Z, it was possible for the planner to omit one of the quals needed to enforce that all members of the equivalence class are actually equal. This only happened in the case of a parameterized join node for two of the relations, that is a plan tree like Nested Loop -> Scan X -> Nested Loop -> Scan Y -> Scan Z Filter: Z.Z = X.X The eclass machinery normally expects to apply X.X = Y.Y when those two relations are joined, but in this shape of plan tree they aren't joined until the top node --- and, if the lower nested loop is marked as parameterized by X, the top node will assume that the relevant eclass condition(s) got pushed down into the lower node. On the other hand, the scan of Z assumes that it's only responsible for constraining Z.Z to match any one of the other eclass members. So one or another of the required quals sometimes fell between the cracks, depending on whether consideration of the eclass in get_joinrel_parampathinfo() for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z as the appropriate constraint there. If it generated the latter, it'd erroneously suppose that the Z scan would take care of matters. To fix, force X.X = Y.Y to be generated and applied at that join node when this case occurs. This is extremely hard to hit in practice, because various planner behaviors conspire to mask the problem; starting with the fact that the planner doesn't really like to generate a parameterized plan of the above shape. (It might have been impossible to hit it before we tweaked things to allow this plan shape for star-schema cases.) Many thanks to Alexander Kirkouski for submitting a reproducible test case. The bug can be demonstrated in all branches back to 9.2 where parameterized paths were introduced, so back-patch that far.	2016-04-29 20:19:38 -04:00
Tom Lane	80f66a9ad0	Fix planner failure with full join in RHS of left join. Given a left join containing a full join in its righthand side, with the left join's joinclause referencing only one side of the full join (in a non-strict fashion, so that the full join doesn't get simplified), the planner could fail with "failed to build any N-way joins" or related errors. This happened because the full join was seen as overlapping the left join's RHS, and then recent changes within join_is_legal() caused that function to conclude that the full join couldn't validly be formed. Rather than try to rejigger join_is_legal() yet more to allow this, I think it's better to fix initsplan.c so that the required join order is explicit in the SpecialJoinInfo data structure. The previous coding there essentially ignored full joins, relying on the fact that we don't flatten them in the joinlist data structure to preserve their ordering. That's sufficient to prevent a wrong plan from being formed, but as this example shows, it's not sufficient to ensure that the right plan will be formed. We need to work a bit harder to ensure that the right plan looks sane according to the SpecialJoinInfos. Per bug #14105 from Vojtech Rylko. This was apparently induced by commit 8703059c6 (though now that I've seen it, I wonder whether there are related cases that could have failed before that); so back-patch to all active branches. Unfortunately, that patch also went into 9.0, so this bug is a regression that won't be fixed in that branch.	2016-04-21 20:05:58 -04:00
Tom Lane	d4c3a156cb	Remove GROUP BY columns that are functionally dependent on other columns. If a GROUP BY clause includes all columns of a non-deferred primary key, as well as other columns of the same relation, those other columns are redundant and can be dropped from the grouping; the pkey is enough to ensure that each row of the table corresponds to a separate group. Getting rid of the excess columns will reduce the cost of the sorting or hashing needed to implement GROUP BY, and can indeed remove the need for a sort step altogether. This seems worth testing for since many query authors are not aware of the GROUP-BY-primary-key exception to the rule about queries not being allowed to reference non-grouped-by columns in their targetlists or HAVING clauses. Thus, redundant GROUP BY items are not uncommon. Also, we can make the test pretty cheap in most queries where it won't help by not looking up a rel's primary key until we've found that at least two of its columns are in GROUP BY. David Rowley, reviewed by Julien Rouhaud	2016-02-11 17:34:59 -05:00
Tom Lane	f867ce5518	ExecHashRemoveNextSkewBucket must physically copy tuples to main hashtable. Commit 45f6240a8fa9d355 added an assumption in ExecHashIncreaseNumBatches and ExecHashIncreaseNumBuckets that they could find all tuples in the main hash table by iterating over the "dense storage" introduced by that patch. However, ExecHashRemoveNextSkewBucket continued its old practice of simply re-linking deleted skew tuples into the main table's hashchains. Hence, such tuples got lost during any subsequent increase in nbatch or nbuckets, and would never get joined, as reported in bug #13908 from Seth P. I (tgl) think that the aforesaid commit has got multiple design issues and should be reworked rather completely; but there is no time for that right now, so band-aid the problem by making ExecHashRemoveNextSkewBucket physically copy deleted skew tuples into the "dense storage" arena. The added test case is able to exhibit the problem by means of fooling the planner with a WHERE condition that it will underestimate the selectivity of, causing the initial nbatch estimate to be too small. Tomas Vondra and Tom Lane. Thanks to David Johnston for initial investigation into the bug report.	2016-02-07 12:29:32 -05:00
Tom Lane	acfcd45cac	Still more fixes for planner's handling of LATERAL references. More fuzz testing by Andreas Seltenreich exposed that the planner did not cope well with chains of lateral references. If relation X references Y laterally, and Y references Z laterally, then we will have to scan X on the inside of a nestloop with Z, so for all intents and purposes X is laterally dependent on Z too. The planner did not understand this and would generate intermediate joins that could not be used. While that was usually harmless except for wasting some planning cycles, under the right circumstances it would lead to "failed to build any N-way joins" or "could not devise a query plan" planner failures. To fix that, convert the existing per-relation lateral_relids and lateral_referencers relid sets into their transitive closures; that is, they now show all relations on which a rel is directly or indirectly laterally dependent. This not only fixes the chained-reference problem but allows some of the relevant tests to be made substantially simpler and faster, since they can be reduced to simple bitmap manipulations instead of searches of the LateralJoinInfo list. Also, when a PlaceHolderVar that is due to be evaluated at a join contains lateral references, we should treat those references as indirect lateral dependencies of each of the join's base relations. This prevents us from trying to join any individual base relations to the lateral reference source before the join is formed, which again cannot work. Andreas' testing also exposed another oversight in the "dangerous PlaceHolderVar" test added in commit 85e5e222b1dd02f1. Simply rejecting unsafe join paths in joinpath.c is insufficient, because in some cases we will end up rejecting all possible paths for a particular join, again leading to "could not devise a query plan" failures. The restriction has to be known also to join_is_legal and its cohort functions, so that they will not select a join for which that will happen. I chose to move the supporting logic into joinrels.c where the latter functions are. Back-patch to 9.3 where LATERAL support was introduced.	2015-12-11 14:22:20 -05:00
Tom Lane	7e19db0c09	Fix another oversight in checking if a join with LATERAL refs is legal. It was possible for the planner to decide to join a LATERAL subquery to the outer side of an outer join before the outer join itself is completed. Normally that's fine because of the associativity rules, but it doesn't work if the subquery contains a lateral reference to the inner side of the outer join. In such a situation the outer join must be done first. join_is_legal() missed this consideration and would allow the join to be attempted, but the actual path-building code correctly decided that no valid join path could be made, sometimes leading to planner errors such as "failed to build any N-way joins". Per report from Andreas Seltenreich. Back-patch to 9.3 where LATERAL support was added.	2015-12-07 17:42:11 -05:00
Tom Lane	6a0779a397	Improve regression test case to avoid depending on system catalog stats. In commit 95f4e59c32866716 I added a regression test case that examined the plan of a query on system catalogs. That isn't a terribly great idea because the catalogs tend to change from version to version, or even within a version if someone makes an unrelated regression-test change that populates the catalogs a bit differently. Usually I try to make planner test cases rely on test tables that have not changed since Berkeley days, but I got sloppy in this case because the submitted crasher example queried the catalogs and I didn't spend enough time on rewriting it. But it was a problem waiting to happen, as I was rudely reminded when I tried to port that patch into Salesforce's Postgres variant :-(. So spend a little more effort and rewrite the query to not use any system catalogs. I verified that this version still provokes the Assert if 95f4e59c32866716's code fix is reverted. I also removed the EXPLAIN output from the test, as it turns out that the assertion occurs while considering a plan that isn't the one ultimately selected anyway; so there's no value in risking any cross-platform variation in that printout. Back-patch to 9.2, like the previous patch.	2015-08-13 13:25:22 -04:00
Tom Lane	cfe30a72fa	Undo mistaken tightening in join_is_legal(). One of the changes I made in commit 8703059c6b55c427 turns out not to have been such a good idea: we still need the exception in join_is_legal() that allows a join if both inputs already overlap the RHS of the special join we're checking. Otherwise we can miss valid plans, and might indeed fail to find a plan at all, as in recent report from Andreas Seltenreich. That code was added way back in commit c17117649b9ae23d, but I failed to include a regression test case then; my bad. Put it back with a better explanation, and a test this time. The logic does end up a bit different than before though: I now believe it's appropriate to make this check first, thereby allowing such a case whether or not we'd consider the previous SJ(s) to commute with this one. (Presumably, we already decided they did; but it was confusing to have this consideration in the middle of the code that was handling the other case.) Back-patch to all active branches, like the previous patch.	2015-08-12 21:19:03 -04:00
Tom Lane	68fa28f771	Postpone extParam/allParam calculations until the very end of planning. Until now we computed these Param ID sets at the end of subquery_planner, but that approach depends on subquery_planner returning a concrete Plan tree. We would like to switch over to returning one or more Paths for a subquery, and in that representation the necessary details aren't fully fleshed out (not to mention that we don't really want to do this work for Paths that end up getting discarded). Hence, refactor so that we can compute the param ID sets at the end of planning, just before set_plan_references is run. The main change necessary to make this work is that we need to capture the set of outer-level Param IDs available to the current query level before exiting subquery_planner, since the outer levels' plan_params lists are transient. (That's not going to pose a problem for returning Paths, since all the work involved in producing that data is part of expression preprocessing, which will continue to happen before Paths are produced.) On the plus side, this change gets rid of several existing kluges. Eventually I'd like to get rid of SS_finalize_plan altogether in favor of doing this work during set_plan_references, but that will require some complex rejiggering because SS_finalize_plan needs to visit subplans and initplans before the main plan. So leave that idea for another day.	2015-08-11 23:48:37 -04:00
Tom Lane	4200a92862	Further mucking with PlaceHolderVar-related restrictions on join order. Commit 85e5e222b1dd02f135a8c3bf387d0d6d88e669bd turns out not to have taken care of all cases of the partially-evaluatable-PlaceHolderVar problem found by Andreas Seltenreich's fuzz testing. I had set it up to check for risky PHVs only in the event that we were making a star-schema-based exception to the param_source_rels join ordering heuristic. However, it turns out that the problem can occur even in joins that satisfy the param_source_rels heuristic, in which case allow_star_schema_join() isn't consulted. Refactor so that we check for risky PHVs whenever the proposed join has any remaining parameterization. Back-patch to 9.2, like the previous patch (except for the regression test case, which only works back to 9.3 because it uses LATERAL). Note that this discovery implies that problems of this sort could've occurred in 9.2 and up even before the star-schema patch; though I've not tried to prove that experimentally.	2015-08-10 17:18:17 -04:00
Tom Lane	89db83922a	Further adjustments to PlaceHolderVar removal. A new test case from Andreas Seltenreich showed that we were still a bit confused about removing PlaceHolderVars during join removal. Specifically, remove_rel_from_query would remove a PHV that was used only underneath the removable join, even if the place where it's used was the join partner relation and not the join clause being deleted. This would lead to a "too late to create a new PlaceHolderInfo" error later on. We can defend against that by checking ph_eval_at to see if the PHV could possibly be getting used at some partner rel. Also improve some nearby LATERAL-related logic. I decided that the check on ph_lateral needed to take precedence over the check on ph_needed, in case there's a lateral reference underneath the join being considered. (That may be impossible, but I'm not convinced of it, and it's easy enough to defend against the case.) Also, I realized that remove_rel_from_query's logic for updating LateralJoinInfos is dead code, because we don't build those at all until after join removal. Back-patch to 9.3. Previous versions didn't have the LATERAL issues, of course, and they also didn't attempt to remove PlaceHolderInfos during join removal. (I'm starting to wonder if changing that was really such a great idea.)	2015-08-07 14:13:50 -04:00
Tom Lane	bab163e121	Fix old oversight in join removal logic. Commit 9e7e29c75ad441450f9b8287bd51c13521641e3b introduced an Assert that join removal didn't reduce the eval_at set of any PlaceHolderVar to empty. At first glance it looks like join_is_removable ensures that's true --- but actually, the loop in join_is_removable skips PlaceHolderVars that are not referenced above the join due to be removed. So, if we don't want any empty eval_at sets, the right thing to do is to delete any now-unreferenced PlaceHolderVars from the data structure entirely. Per fuzz testing by Andreas Seltenreich. Back-patch to 9.3 where the aforesaid Assert was added.	2015-08-06 22:14:27 -04:00
Tom Lane	8703059c6b	Further fixes for degenerate outer join clauses. Further testing revealed that commit f69b4b9495269cc4 was still a few bricks shy of a load: minor tweaking of the previous test cases resulted in the same wrong-outer-join-order problem coming back. After study I concluded that my previous changes in make_outerjoininfo() were just accidentally masking the problem, and should be reverted in favor of forcing syntactic join order whenever an upper outer join's predicate doesn't mention a lower outer join's LHS. This still allows the chained-outer-joins style that is the normally optimizable case. I also tightened things up some more in join_is_legal(). It seems to me on review that what's really happening in the exception case where we ignore a mismatched special join is that we're allowing the proposed join to associate into the RHS of the outer join we're comparing it to. As such, we should always insist that the proposed join be a left join, which eliminates a bunch of rather dubious argumentation. The case where we weren't enforcing that was the one that was already known buggy anyway (it had a violatable Assert before the aforesaid commit) so it hardly deserves a lot of deference. Back-patch to all active branches, like the previous patch. The added regression test case failed in all branches back to 9.1, and I think it's only an unrelated change in costing calculations that kept 9.0 from choosing a broken plan.	2015-08-06 15:35:46 -04:00
Tom Lane	85e5e222b1	Fix a PlaceHolderVar-related oversight in star-schema planning patch. In commit b514a7460d9127ddda6598307272c701cbb133b7, I changed the planner so that it would allow nestloop paths to remain partially parameterized, ie the inner relation might need parameters from both the current outer relation and some upper-level outer relation. That's fine so long as we're talking about distinct parameters; but the patch also allowed creation of nestloop paths for cases where the inner relation's parameter was a PlaceHolderVar whose eval_at set included the current outer relation and some upper-level one. That does not work. In principle we could allow such a PlaceHolderVar to be evaluated at the lower join node using values passed down from the upper relation along with values from the join's own outer relation. However, nodeNestloop.c only supports simple Vars not arbitrary expressions as nestloop parameters. createplan.c is also a few bricks shy of being able to handle such cases; it misplaces the PlaceHolderVar parameters in the plan tree, which is why the visible symptoms of this bug are "plan should not reference subplan's variable" and "failed to assign all NestLoopParams to plan nodes" planner errors. Adding the necessary complexity to make this work doesn't seem like it would be repaid in significantly better plans, because in cases where such a PHV exists, there is probably a corresponding join order constraint that would allow a good plan to be found without using the star-schema exception. Furthermore, adding complexity to nodeNestloop.c would create a run-time penalty even for plans where this whole consideration is irrelevant. So let's just reject such paths instead. Per fuzz testing by Andreas Seltenreich; the added regression test is based on his example query. Back-patch to 9.2, like the previous patch.	2015-08-04 14:55:50 -04:00
Tom Lane	f69b4b9495	Fix some planner issues with degenerate outer join clauses. An outer join clause that didn't actually reference the RHS (perhaps only after constant-folding) could confuse the join order enforcement logic, leading to wrong query results. Also, nested occurrences of such things could trigger an Assertion that on reflection seems incorrect. Per fuzz testing by Andreas Seltenreich. The practical use of such cases seems thin enough that it's not too surprising we've not heard field reports about it. This has been broken for a long time, so back-patch to all active branches.	2015-08-01 20:57:41 -04:00
Tom Lane	a6492ff897	Fix an oversight in checking whether a join with LATERAL refs is legal. In many cases, we can implement a semijoin as a plain innerjoin by first passing the righthand-side relation through a unique-ification step. However, one of the cases where this does NOT work is where the RHS has a LATERAL reference to the LHS; that makes the RHS dependent on the LHS so that unique-ification is meaningless. joinpath.c understood this, and so would not generate any join paths of this kind ... but join_is_legal neglected to check for the case, so it would think that we could do it. The upshot would be a "could not devise a query plan for the given query" failure once we had failed to generate any join paths at all for the bogus join pair. Back-patch to 9.3 where LATERAL was added.	2015-07-31 19:26:33 -04:00
Tom Lane	95f4e59c32	Remove an unsafe Assert, and explain join_clause_is_movable_into() better. join_clause_is_movable_into() is approximate, in the sense that it might sometimes return "false" when actually it would be valid to push the given join clause down to the specified level. This is okay ... but there was an Assert in get_joinrel_parampathinfo() that's only safe if the answers are always exact. Comment out the Assert, and add a bunch of commentary to clarify what's going on. Per fuzz testing by Andreas Seltenreich. The added regression test is a pretty silly query, but it's based on his crasher example. Back-patch to 9.2 where the faulty logic was introduced.	2015-07-28 13:20:39 -04:00
Tom Lane	fca8e59c1c	Fix oversight in flattening of subqueries with empty FROM. I missed a restriction that commit f4abd0241de20d5d6a79b84992b9e88603d44134 should have enforced: we can't pull up an empty-FROM subquery if it's under an outer join, because then we'd need to wrap its output columns in PlaceHolderVars. As the code currently stands, the PHVs end up with empty relid sets, which doesn't work (and is correctly caught by an Assert). It's possible that this could be fixed by assigning the PHVs the relid sets of the parent FromExpr/JoinExpr, but getting that to work is more complication than I care to add right now; indeed it's likely that we'll never bother, since pulling up empty-FROM subqueries is a rather marginal optimization anyway. Per report from Andreas Seltenreich. Back-patch to 9.5 where the faulty code was added.	2015-07-26 17:44:27 -04:00
Tom Lane	358eaa01bf	Make entirely-dummy appendrels get marked as such in set_append_rel_size. The planner generally expects that the estimated rowcount of any relation is at least one row, unless it has been proven empty by constraint exclusion or similar mechanisms, which is marked by installing a dummy path as the rel's cheapest path (cf. IS_DUMMY_REL). When I split up allpaths.c's processing of base rels into separate set_base_rel_sizes and set_base_rel_pathlists steps, the intention was that dummy rels would get marked as such during the "set size" step; this is what justifies an Assert in indxpath.c's get_loop_count that other relations should either be dummy or have positive rowcount. Unfortunately I didn't get that quite right for append relations: if all the child rels have been proven empty then set_append_rel_size would come up with a rowcount of zero, which is correct, but it didn't then do set_dummy_rel_pathlist. (We would have ended up with the right state after set_append_rel_pathlist, but that's too late, if we generate indexpaths for some other rel first.) In addition to fixing the actual bug, I installed an Assert enforcing this convention in set_rel_size; that then allows simplification of a couple of now-redundant tests for zero rowcount in set_append_rel_size. Also, to cover the possibility that third-party FDWs have been careless about not returning a zero rowcount estimate, apply clamp_row_est to whatever an FDW comes up with as the rows estimate. Per report from Andreas Seltenreich. Back-patch to 9.2. Earlier branches did not have the separation between set_base_rel_sizes and set_base_rel_pathlists steps, so there was no intermediate state where an appendrel would have had inconsistent rowcount and pathlist. It's possible that adding the Assert to set_rel_size would be a good idea in older branches too; but since they're not under development any more, it's likely not worth the trouble.	2015-07-26 16:19:08 -04:00
Tom Lane	3cf8686014	Prevent improper reordering of antijoins vs. outer joins. An outer join appearing within the RHS of an antijoin can't commute with the antijoin, but somehow I missed teaching make_outerjoininfo() about that. In Teodor Sigaev's recent trouble report, this manifests as a "could not find RelOptInfo for given relids" error within eqjoinsel(); but I think silently wrong query results are possible too, if the planner misorders the joins and doesn't happen to trigger any internal consistency checks. It's broken as far back as we had antijoins, so back-patch to all supported branches.	2015-04-25 16:44:27 -04:00
Tom Lane	ca6805338f	Fix incorrect matching of subexpressions in outer-join plan nodes. Previously we would re-use input subexpressions in all expression trees attached to a Join plan node. However, if it's an outer join and the subexpression appears in the nullable-side input, this is potentially incorrect for apparently-matching subexpressions that came from above the outer join (ie, targetlist and qpqual expressions), because the executor will treat the subexpression value as NULL when maybe it should not be. The case is fairly hard to hit because (a) you need a non-strict subexpression (else NULL is correct), and (b) we don't usually compute expressions in the outputs of non-toplevel plan nodes. But we might do so if the expressions are sort keys for a mergejoin, for example. Probably in the long run we should make a more explicit distinction between Vars appearing above and below an outer join, but that will be a major planner redesign and not at all back-patchable. For the moment, just hack set_join_references so that it will not match any non-Var expressions coming from nullable inputs to expressions that came from above the join. (This is somewhat overkill, in that a strict expression could still be matched, but it doesn't seem worth the effort to check that.) Per report from Qingqing Zhou. The added regression test case is based on his example. This has been broken for a very long time, so back-patch to all active branches.	2015-04-04 19:55:15 -04:00
Tom Lane	f4abd0241d	Support flattening of empty-FROM subqueries and one-row VALUES tables. We can't handle this in the general case due to limitations of the planner's data representations; but we can allow it in many useful cases, by being careful to flatten only when we are pulling a single-row subquery up into a FROM (or, equivalently, inner JOIN) node that will still have at least one remaining relation child. Per discussion of an example from Kyotaro Horiguchi.	2015-03-11 23:18:03 -04:00
Tom Lane	b746d0c32d	Fix old bug in get_loop_count(). While poking at David Kubečka's issue I noticed an ancient logic error in get_loop_count(): it used 1.0 as a "no data yet" indicator, but since that is actually a valid rowcount estimate, this doesn't work. If we have one input relation with 1.0 as rowcount and then another one with a larger rowcount, we should use 1.0 as the result, but we picked the larger rowcount instead. (I think when I coded this, I recognized the conflict, but mistakenly thought that the logic would pick the desired count anyway.) Fixing this changed the plan for one existing regression test case. Since the point of that test is to exercise creation of a particular shape of nestloop plan, I tweaked the query a little bit so it still results in the same plan choice. This is definitely a bug, but I'm hesitant to back-patch since it might change plan choices unexpectedly, and anyway failure to implement a heuristic precisely as intended is a pretty low-grade bug.	2015-03-11 22:53:32 -04:00
Robert Haas	e529cd4ffa	Suggest to the user the column they may have meant to reference. Error messages informing the user that no such column exists can sometimes provoke a perplexed response. This often happens due to a subtle typo in the column name or, perhaps less likely, in the alias name. To speed discovery of what the real issue is in such cases, we'll now search the range table for approximate matches. If there are one or two such matches that are good enough to think that they might be what the user intended to type, and better than all other approximate matches, we'll issue a hint suggesting that the user might have intended to reference those columns. Peter Geoghegan and Robert Haas	2015-03-11 10:44:04 -04:00
Tom Lane	b514a7460d	Fix planning of star-schema-style queries. Part of the intent of the parameterized-path mechanism was to handle star-schema queries efficiently, but some overly-restrictive search limiting logic added in commit e2fa76d80ba571d4de8992de6386536867250474 prevented such cases from working as desired. Fix that and add a regression test about it. Per gripe from Marc Cousin. This is arguably a bug rather than a new feature, so back-patch to 9.2 where parameterized paths were introduced.	2015-02-28 12:43:04 -05:00
Tom Lane	1b4cc493d2	Preserve AND/OR flatness while extracting restriction OR clauses. The code I added in commit f343a880d5555faf1dad0286c5632047c8f599ad was careless about preserving AND/OR flatness: it could create a structure with an OR node directly underneath another one. That breaks an assumption that's fairly important for planning efficiency, not to mention triggering various Asserts (as reported by Benjamin Smith). Add a trifle more logic to handle the case properly.	2014-09-09 18:35:31 -04:00
Tom Lane	f15821eefd	Allow join removal in some cases involving a left join to a subquery. We can remove a left join to a relation if the relation's output is provably distinct for the columns involved in the join clause (considering only equijoin clauses) and the relation supplies no variables needed above the join. Previously, the join removal logic could only prove distinctness by reference to unique indexes of a table. This patch extends the logic to consider subquery relations, wherein distinctness might be proven by reference to GROUP BY, DISTINCT, etc. We actually already had some code to check that a subquery's output was provably distinct, but it was hidden inside pathnode.c; which was a pretty bad place for it really, since that file is mostly boilerplate Path construction and comparison. Move that code to analyzejoins.c, which is arguably a more appropriate location, and is certainly the site of the new usage for it. David Rowley, reviewed by Simon Riggs	2014-07-15 21:12:43 -04:00
Tom Lane	ab76208e3d	Forward-port regression test for bug #10587 into 9.3 and HEAD. Although this bug is already fixed in post-9.2 branches, the case triggering it is quite different from what was under consideration at the time. It seems worth memorializing this example in HEAD just to make sure it doesn't get broken again in future. Extracted from commit 187ae17300776f48b2bd9d0737923b1bf70f606e.	2014-06-09 21:37:18 -04:00
Tom Lane	a16d421ca4	Revert "Auto-tune effective_cache size to be 4x shared buffers" This reverts commit ee1e5662d8d8330726eaef7d3110cb7add24d058, as well as a remarkably large number of followup commits, which were mostly concerned with the fact that the implementation didn't work terribly well. It still doesn't: we probably need some rather basic work in the GUC infrastructure if we want to fully support GUCs whose default varies depending on the value of another GUC. Meanwhile, it also emerged that there wasn't really consensus in favor of the definition the patch tried to implement (ie, effective_cache_size should default to 4 times shared_buffers). So whack it all back to where it was. In a followup commit, I'll do what was recently agreed to, which is to simply change the default to a higher value.	2014-05-08 20:49:38 -04:00
Tom Lane	043f6ff05d	Fix bogus handling of "postponed" lateral quals. When pulling a "postponed" qual from a LATERAL subquery up into the quals of an outer join, we must make sure that the postponed qual is included in those seen by make_outerjoininfo(). Otherwise we might compute a too-small min_lefthand or min_righthand for the outer join, leading to "JOIN qualification cannot refer to other relations" failures from distribute_qual_to_rels. Subtler errors in the created plan seem possible, too, if the extra qual would only affect join ordering constraints. Per bug #9041 from David Leverton. Back-patch to 9.3.	2014-01-30 14:51:16 -05:00
Tom Lane	158b7fa6a3	Disallow LATERAL references to the target table of an UPDATE/DELETE. On second thought, commit 0c051c90082da0b7e5bcaf9aabcbd4f361137cdc was over-hasty: rather than allowing this case, we ought to reject it for now. That leaves the field clear for a future feature that allows the target table to be re-specified in the FROM (or USING) clause, which will enable left-joining the target table to something else. We can then also allow LATERAL references to such an explicitly re-specified target table. But allowing them right now will create ambiguities or worse for such a feature, and it isn't something we documented 9.3 as supporting. While at it, add a convenience subroutine to avoid having several copies of the ereport for disalllowed-LATERAL-reference cases.	2014-01-11 19:03:12 -05:00
Tom Lane	0c051c9008	Fix LATERAL references to target table of UPDATE/DELETE. I failed to think much about UPDATE/DELETE when implementing LATERAL :-(. The implemented behavior ended up being that subqueries in the FROM or USING clause (respectively) could access the update/delete target table as though it were a lateral reference; which seems fine if they said LATERAL, but certainly ought to draw an error if they didn't. Fix it so you get a suitable error when you omit LATERAL. Per report from Emre Hasegeli.	2014-01-07 15:25:27 -05:00
Tom Lane	f343a880d5	Extract restriction OR clauses whether or not they are indexable. It's possible to extract a restriction OR clause from a join clause that has the form of an OR-of-ANDs, if each sub-AND includes a clause that mentions only one specific relation. While PG has been aware of that idea for many years, the code previously only did it if it could extract an indexable OR clause. On reflection, though, that seems a silly limitation: adding a restriction clause can be a win by reducing the number of rows that have to be filtered at the join step, even if we have to test the clause as a plain filter clause during the scan. This should be especially useful for foreign tables, where the change can cut the number of rows that have to be retrieved from the foreign server; but testing shows it can win even on local tables. Per a suggestion from Robert Haas. As a heuristic, I made the code accept an extracted restriction clause if its estimated selectivity is less than 0.9, which will probably result in accepting extracted clauses just about always. We might need to tweak that later based on experience. Since the code no longer has even a weak connection to Path creation, remove orindxpath.c and create a new file optimizer/util/orclauses.c. There's some additional janitorial cleanup of now-dead code that needs to happen, but it seems like that's a fit subject for a separate commit.	2013-12-30 12:24:37 -05:00
Tom Lane	b5e0a2a384	Tweak placement of explicit ANALYZE commands in the regression tests. Make the COPY test, which loads most of the large static tables used in the tests, also explicitly ANALYZE those tables. This allows us to get rid of various ad-hoc, and rather redundant, ANALYZE commands that had gotten stuck into various test scripts over time to ensure we got consistent plan choices. (We could have done a database-wide ANALYZE, but that would cause stats to get attached to the small static tables too, which results in plan changes compared to the historical behavior. I'm not sure that's a good idea, so not going that far for now.) Back-patch to 9.0, since 9.0 and 9.1 are currently sometimes failing regression tests for lack of an "ANALYZE tenk1" in the subselect test. There's no need for this in 8.4 since we didn't print any plans back then.	2013-12-11 15:09:15 -05:00
Tom Lane	f19e92ed04	Flatten join alias Vars before pulling up targetlist items from a subquery. pullup_replace_vars()'s decisions about whether a pulled-up replacement expression needs to be wrapped in a PlaceHolderVar depend on the assumption that what looks like a Var behaves like a Var. However, if the Var is a join alias reference, later flattening of join aliases might replace the Var with something that's not a Var at all, and should have been wrapped. To fix, do a forcible pass of flatten_join_alias_vars() on the subquery targetlist before we start to copy items out of it. We'll re-run that processing on the pulled-up expressions later, but that's harmless. Per report from Ken Tanzer; the added regression test case is based on his example. This bug has been there since the PlaceHolderVar mechanism was invented, but has escaped detection because the circumstances that trigger it are fairly narrow. You need a flattenable query underneath an outer join, which contains another flattenable query inside a join of its own, with a dangerous expression (a constant or something else non-strict) in that one's targetlist. Having seen this, I'm wondering if it wouldn't be prudent to do all alias-variable flattening earlier, perhaps even in the rewriter. But that would probably not be a back-patchable change.	2013-11-22 14:37:21 -05:00
Tom Lane	f3b3b8d5be	Compute correct em_nullable_relids in get_eclass_for_sort_expr(). Bug #8591 from Claudio Freire demonstrates that get_eclass_for_sort_expr must be able to compute valid em_nullable_relids for any new equivalence class members it creates. I'd worried about this in the commit message for db9f0e1d9a4a0842c814a464cdc9758c3f20b96c, but claimed that it wasn't a problem because multi-member ECs should already exist when it runs. That is transparently wrong, though, because this function is also called by initialize_mergeclause_eclasses, which runs during deconstruct_jointree. The example given in the bug report (which the new regression test item is based upon) fails because the COALESCE() expression is first seen by initialize_mergeclause_eclasses rather than process_equivalence. Fixing this requires passing the appropriate nullable_relids set to get_eclass_for_sort_expr, and it requires new code to compute that set for top-level expressions such as ORDER BY, GROUP BY, etc. We store the top-level nullable_relids in a new field in PlannerInfo to avoid computing it many times. In the back branches, I've added the new field at the end of the struct to minimize ABI breakage for planner plugins. There doesn't seem to be a good alternative to changing get_eclass_for_sort_expr's API signature, though. There probably aren't any third-party extensions calling that function directly; moreover, if there are, they probably need to think about what to pass for nullable_relids anyway. Back-patch to 9.2, like the previous patch in this area.	2013-11-15 16:46:18 -05:00
Tom Lane	648bd05b13	Re-allow duplicate aliases within aliased JOINs. Although the SQL spec forbids duplicate table aliases, historically we've allowed queries like SELECT ... FROM tab1 x CROSS JOIN (tab2 x CROSS JOIN tab3 y) z on the grounds that the aliased join (z) hides the aliases within it, therefore there is no conflict between the two RTEs named "x". The LATERAL patch broke this, on the misguided basis that "x" could be ambiguous if tab3 were a LATERAL subquery. To avoid breaking existing queries, it's better to allow this situation and complain only if tab3 actually does contain an ambiguous reference. We need only remove the check that was throwing an error, because the column lookup code is already prepared to handle ambiguous references. Per bug #8444.	2013-11-11 10:42:57 -05:00
Bruce Momjian	ee1e5662d8	Auto-tune effective_cache size to be 4x shared buffers	2013-10-08 12:12:24 -04:00
Tom Lane	c64de21e96	Fix qual-clause-misplacement issues with pulled-up LATERAL subqueries. In an example such as SELECT * FROM i LEFT JOIN LATERAL (SELECT * FROM j WHERE i.n = j.n) j ON true; it is safe to pull up the LATERAL subquery into its parent, but we must then treat the "i.n = j.n" clause as a qual clause of the LEFT JOIN. The previous coding in deconstruct_recurse mistakenly labeled the clause as "is_pushed_down", resulting in wrong semantics if the clause were applied at the join node, as per an example submitted awhile ago by Jeremy Evans. To fix, postpone processing of such clauses until we return back up to the appropriate recursion depth in deconstruct_recurse. In addition, tighten the is-safe-to-pull-up checks in is_simple_subquery; we previously missed the possibility that the LATERAL subquery might itself contain an outer join that makes lateral references in lower quals unsafe. A regression test case equivalent to Jeremy's example was already in my commit of yesterday, but was giving the wrong results because of this bug. This patch fixes the expected output for that, and also adds a test case for the second problem.	2013-08-19 13:19:41 -04:00
Tom Lane	9e7e29c75a	Fix planner problems with LATERAL references in PlaceHolderVars. The planner largely failed to consider the possibility that a PlaceHolderVar's expression might contain a lateral reference to a Var coming from somewhere outside the PHV's syntactic scope. We had a previous report of a problem in this area, which I tried to fix in a quick-hack way in commit 4da6439bd8553059766011e2a42c6e39df08717f, but Antonin Houska pointed out that there were still some problems, and investigation turned up other issues. This patch largely reverts that commit in favor of a more thoroughly thought-through solution. The new theory is that a PHV's ph_eval_at level cannot be higher than its original syntactic level. If it contains lateral references, those don't change the ph_eval_at level, but rather they create a lateral-reference requirement for the ph_eval_at join relation. The code in joinpath.c needs to handle that. Another issue is that createplan.c wasn't handling nested PlaceHolderVars properly. In passing, push knowledge of lateral-reference checks for join clauses into join_clause_is_movable_to. This is mainly so that FDWs don't need to deal with it. This patch doesn't fix the original join-qual-placement problem reported by Jeremy Evans (and indeed, one of the new regression test cases shows the wrong answer because of that). But the PlaceHolderVar problems need to be fixed before that issue can be addressed, so committing this separately seems reasonable.	2013-08-17 20:22:37 -04:00

1 2 3

106 Commits