nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan
has the form of one or more OpExprs whose LHS is an expression of the
outer query's, while the RHS is an expression over Params representing
output columns of the subquery. However, the planner only went as far
as verifying that the clauses were all binary OpExprs. This works
99.99% of the time, because the clauses have the right shape when
emitted by the parser --- but it's possible for function inlining to
break that, as reported by PegoraroF10. To fix, teach the planner
to check that the LHS and RHS contain the right things, or more
accurately don't contain the wrong things. Given that this has been
broken for years without anyone noticing, it seems sufficient to just
give up hashing when it happens, rather than go to the trouble of
commuting the clauses back again (which wouldn't necessarily work
anyway).
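For illustration only (names invented, not the reported case), the
parser-emitted shape that nodeSubplan.c relies on comes from a query like
    SELECT * FROM t1 WHERE t1.a IN (SELECT t2.x FROM t2);
    -- testexpr is "t1.a = $1", with $1 standing for the subquery column t2.x
which is exactly the LHS/RHS arrangement the planner now verifies before
choosing to hash.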
While poking at that, I also noticed that nodeSubplan.c had a baked-in
assumption that the number of hash clauses is identical to the number
of subquery output columns. Again, that's fine as far as parser output
goes, but it's not hard to break it via function inlining. There seems
little reason for that assumption though --- AFAICS, the only thing
it's buying us is not having to store the number of hash clauses
explicitly. Adding code to the planner to reject such cases would take
more code than getting nodeSubplan.c to cope, so I fixed it that way.
This has been broken for as long as we've had hashable SubPlans,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
Up to now, createplan.c attempted to share PARAM_EXEC slots for
NestLoopParams across different plan levels, if the same underlying Var
was being fed down to different righthand-side subplan trees by different
NestLoops. This was, I think, more of an artifact of using subselect.c's
PlannerParamItem infrastructure than an explicit design goal, but anyway
that was the end result.
This works well enough as long as the plan tree is executing synchronously,
but the feature whereby Gather can execute the parallelized subplan locally
breaks it. An upper NestLoop node might execute for a row retrieved from
a parallel worker, and assign a value for a PARAM_EXEC slot from that row,
while the leader's copy of the parallelized subplan is suspended with a
different active value of the row the Var comes from. When control
eventually returns to the leader's subplan, it gets the wrong answers if
the same PARAM_EXEC slot is being used within the subplan, as reported
in bug #15577 from Bartosz Polnik.
This is pretty reminiscent of the problem fixed in commit 46c508fbc, and
the proper fix seems to be the same: don't try to share PARAM_EXEC slots
across different levels of controlling NestLoop nodes.
This requires decoupling NestLoopParam handling from PlannerParamItem
handling, although the logic remains somewhat similar. To avoid bizarre
division of labor between subselect.c and createplan.c, I decided to move
all the param-slot-assignment logic for both cases out of those files
and put it into a new file paramassign.c. Hopefully it's a bit better
documented now, too.
A regression test case for this might be nice, but we don't know a
test case that triggers the problem with a suitably small amount
of data.
Back-patch to 9.6 where we added Gather nodes. It's conceivable that
related problems exist in older branches; but without some evidence
for that, I'll leave the older branches alone.
Discussion: https://postgr.es/m/15577-ca61ab18904af852@postgresql.org
postgres_fdw's postgresGetForeignPlan() assumes without checking that the
outer_plan it's given for a join relation must have a NestLoop, MergeJoin,
or HashJoin node at the top. That's been wrong at least since commit
4bbf6edfb (which could cause insertion of a Sort node on top) and it seems
like a pretty unsafe thing to Just Assume even without that.
Through blind good fortune, this doesn't seem to have any worse
consequences today than strange EXPLAIN output, but it's clearly trouble
waiting to happen.
To fix, test the node type explicitly before touching Join-specific
fields, and avoid jamming the new tlist into a node type that can't
do projection. Export a new support function from createplan.c
to avoid building low-level knowledge about the latter into FDWs.
Back-patch to 9.6 where the faulty coding was added. Note that the
associated regression test cases don't show any changes before v11,
apparently because the tests back-patched with 4bbf6edfb don't actually
exercise the problem case before then (there's no top-level Sort
in those plans).
Discussion: https://postgr.es/m/8946.1544644803@sss.pgh.pa.us
Allowing sub-select containing LIMIT/OFFSET in workers can lead to
inconsistent results at the top-level as there is no guarantee that the
row order will be fully deterministic. The fix is to prohibit pushing
LIMIT/OFFSET within sub-selects to workers.
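For illustration only (names invented, not the reported query), roughly this
shape is affected:
    -- each worker re-executes the sub-select and, lacking a deterministic
    -- order, may get a different 100 rows, so the join results disagree
    SELECT count(*)
    FROM big_table b
    JOIN (SELECT val FROM big_table LIMIT 100) ss ON b.val = ss.val;
With the fix, the LIMIT keeps the sub-select out of the parallel portion of
the plan.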
Reported-by: Andrew Fletcher
Bug: 15324
Author: Amit Kapila
Reviewed-by: Dilip Kumar
Backpatch-through: 9.6
Discussion: https://postgr.es/m/153417684333.10284.11356259990921828616@wrigleys.postgresql.org
In some cases a clause attached to an outer join can be pushed down into
the outer join's RHS even though the clause is not degenerate --- this
can happen if we choose to make a parameterized path for the RHS. If
the clause ends up attached to a lower outer join, we'd misclassify it
as being a "join filter" not a plain "filter" condition at that node,
leading to wrong query results.
To fix, teach extract_actual_join_clauses to examine each join clause's
required_relids, not just its is_pushed_down flag. (The latter now
seems vestigial, or at least in need of rethinking, but we won't do
anything so invasive as redefining it in a bug-fix patch.)
This has been wrong since we introduced parameterized paths in 9.2,
though it's evidently hard to hit given the lack of previous reports.
The test case used here involves a lateral function call, and I think
that a lateral reference may be required to get the planner to select
a broken plan; though I wouldn't swear to that. In any case, even if
LATERAL is needed to trigger the bug, it still affects all supported
branches, so back-patch to all.
Per report from Andreas Karlsson. Thanks to Andrew Gierth for
preliminary investigation.
Discussion: https://postgr.es/m/f8128b11-c5bf-3539-48cd-234178b2314d@proxel.se
One of the things canonicalize_qual() does is to remove constant-NULL
subexpressions of top-level AND/OR clauses. It does that on the assumption
that what it's given is a top-level WHERE clause, so that NULL can be
treated like FALSE. Although this is documented down inside a subroutine
of canonicalize_qual(), it wasn't mentioned in the documentation of that
function itself, and some callers hadn't gotten that memo.
Notably, commit d007a9505 caused get_relation_constraints() to apply
canonicalize_qual() to CHECK constraints. That allowed constraint
exclusion to misoptimize situations in which a CHECK constraint had a
provably-NULL subclause, as seen in the regression test case added here,
in which a child table that should be scanned is not. (Although this
thinko is ancient, the test case doesn't fail before 9.2, for reasons
I've not bothered to track down in detail. There may be related cases
that do fail before that.)
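For illustration only (names invented, loosely following the added regression
case):
    CREATE TABLE cparent (f1 int);
    CREATE TABLE cchild (CHECK (f1 = 1 OR f1 = NULL)) INHERITS (cparent);
    INSERT INTO cchild VALUES (2);   -- passes: the CHECK evaluates to NULL
    SELECT * FROM cparent WHERE f1 = 2;
Dropping the always-NULL arm as if it were FALSE lets constraint exclusion
"prove" that the child can only hold f1 = 1 and skip scanning it, losing the
row.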
More recently, commit f0e44751d added an independent bug by applying
canonicalize_qual() to index expressions, which is even sillier since
those might not even be boolean. If they are, though, I think this
could lead to making incorrect index entries for affected index
expressions in v10. I haven't attempted to prove that though.
To fix, add an "is_check" parameter to canonicalize_qual() to specify
whether it should assume WHERE or CHECK semantics, and make it perform
NULL-elimination accordingly. Adjust the callers to apply the right
semantics, or remove the call entirely in cases where it's not known
that the expression has one or the other semantics. I also removed
the call in some cases involving partition expressions, where it should
be a no-op because such expressions should be canonical already ...
and was a no-op, independently of whether it could in principle have
done something, because it was being handed the qual in implicit-AND
format which isn't what it expects. In HEAD, add an Assert to catch
that type of mistake in future.
This represents an API break for external callers of canonicalize_qual().
While that's intentional in HEAD to make such callers think about which
case applies to them, it seems like something we probably wouldn't be
thanked for in released branches. Hence, in released branches, the
extra parameter is added to a new function canonicalize_qual_ext(),
and canonicalize_qual() is a wrapper that retains its old behavior.
Patch by me with suggestions from Dean Rasheed. Back-patch to all
supported branches.
Discussion: https://postgr.es/m/24475.1520635069@sss.pgh.pa.us
Given overlapping or partially redundant join clauses, for example
t1 JOIN t2 ON t1.a = t2.x AND t1.b = t2.x
the planner's EquivalenceClass machinery will ordinarily refactor the
clauses as "t1.a = t1.b AND t1.a = t2.x", so that join processing doesn't
see multiple references to the same EquivalenceClass in a list of join
equality clauses. However, if the join is outer, it's incorrect to derive
a restriction clause on the outer side from the join conditions, so the
clause refactoring does not happen and we end up with overlapping join
conditions. The code that attempted to deal with such cases had several
subtle bugs, which could result in "left and right pathkeys do not match in
mergejoin" or "outer pathkeys do not match mergeclauses" planner errors,
if the selected join plan type was a mergejoin. (It does not appear that
any actually incorrect plan could have been emitted.)
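For illustration only, extending the example above to the outer-join form in
which the refactoring is disallowed:
    SELECT * FROM t1 LEFT JOIN t2 ON t1.a = t2.x AND t1.b = t2.x;
Here both clauses survive, and both of their inner sides reference the same
EquivalenceClass {t2.x}.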
The core of the problem really was failure to recognize that the outer and
inner relations' pathkeys have different relationships to the mergeclause
list. A join's mergeclause list is constructed by reference to the outer
pathkeys, so it will always be ordered the same as the outer pathkeys, but
this cannot be presumed true for the inner pathkeys. If the inner sides of
the mergeclauses contain multiple references to the same EquivalenceClass
({t2.x} in the above example) then a simplistic rendering of the required
inner sort order is like "ORDER BY t2.x, t2.x", but the pathkey machinery
recognizes that the second sort column is redundant and throws it away.
The mergejoin planning code failed to account for that behavior properly.
One error was to try to generate cut-down versions of the mergeclause list
from cut-down versions of the inner pathkeys in the same way as the initial
construction of the mergeclause list from the outer pathkeys was done; this
could lead to choosing a mergeclause list that fails to match the outer
pathkeys. The other problem was that the pathkey cross-checking code in
create_mergejoin_plan treated the inner and outer pathkey lists
identically, whereas actually the expectations for them must be different.
That led to false "pathkeys do not match" failures in some cases, and in
principle could have led to failure to detect bogus plans in other cases,
though there is no indication that such bogus plans could be generated.
Reported by Alexander Kuzmenkov, who also reviewed this patch. This has
been broken for years (back to around 8.3 according to my testing), so
back-patch to all supported branches.
Discussion: https://postgr.es/m/5dad9160-4632-0e47-e120-8e2082000c01@postgrespro.ru
This backpatches 935dee9ad5a8d12f4d3b772a6e6c99d245e5ad44 to the
branches requested by extension authors.
Original-Author: Metin Doslu
Original-Committer: Robert Haas
Author: Brian Cloutier
rewriteTargetListUD's processing is dependent on the relkind of the query's
target table. That was fine at the time it was made to act that way, even
for queries on inheritance trees, because all tables in an inheritance tree
would necessarily be plain tables. However, the 9.5 feature addition
allowing some members of an inheritance tree to be foreign tables broke the
assumption that rewriteTargetListUD's output tlist could be applied to all
child tables with nothing more than column-number mapping. This led to
visible failures if foreign child tables had row-level triggers, and would
also break in cases where child tables belonged to FDWs that used methods
other than CTID for row identification.
To fix, delay running rewriteTargetListUD until after the planner has
expanded inheritance, so that it is applied separately to the (already
mapped) tlist for each child table. We can conveniently call it from
preprocess_targetlist. Refactor associated code slightly to avoid the
need to heap_open the target relation multiple times during
preprocess_targetlist. (The APIs remain a bit ugly, particularly around
the point of which steps scribble on parse->targetList and which don't.
But avoiding such scribbling would require a change in FDW callback APIs,
which is more pain than it's worth.)
Also fix ExecModifyTable to ensure that "tupleid" is reset to NULL when
we transition from rows providing a CTID to rows that don't. (That's
really an independent bug, but it manifests in much the same cases.)
Add a regression test checking one manifestation of this problem, which
was that row-level triggers on a foreign child table did not work right.
Back-patch to 9.5 where the problem was introduced.
Etsuro Fujita, reviewed by Ildus Kurbangaliev and Ashutosh Bapat
Discussion: https://postgr.es/m/20170514150525.0346ba72@postgrespro.ru
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
In a CHECK clause, a null result means true, whereas in a WHERE clause
it means false. predtest.c provided different functions depending on
which set of semantics applied to the predicate being proved, but had
no option to control what a null meant in the clauses provided as
axioms. Add one.
Use that in the partitioning code when figuring out whether the
validation scan on a new partition can be skipped. Rip out the
old logic that attempted (not very successfully) to compensate
for the absence of the necessary support in predtest.c.
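For illustration only (names invented), the sort of case the new support is
meant to handle correctly:
    CREATE TABLE parted (x int) PARTITION BY RANGE (x);
    CREATE TABLE part1 (x int,
                        CHECK (x IS NOT NULL AND x >= 1 AND x < 10));
    ALTER TABLE parted ATTACH PARTITION part1 FOR VALUES FROM (1) TO (10);
The existing CHECK constraint implies the partition constraint, so the
validation scan can be skipped; proving that correctly requires knowing that
a NULL result of the CHECK clause means true.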
Ashutosh Bapat and Robert Haas, reviewed by Amit Langote and
incorporating feedback from Tom Lane.
Discussion: http://postgr.es/m/CAFjFpReT_kq_uwU_B8aWDxR7jNGE=P0iELycdq5oupi=xSQTOw@mail.gmail.com
I'd always assumed that backend/optimizer/geqo/'s remarkably poor
showing on code coverage metrics was because we weren't exercising
it much in the regression tests. But it turns out that a good chunk
of the problem is that there's a bunch of code that is physically
unreachable (because the calls to it are #ifdef'd out in geqo_main.c)
but is being built anyway. Making the called code have #if guards
similar to the calling code saves a couple of kilobytes of executable
size and should make the coverage numbers more reflective of reality.
It's arguable that we should just delete all the unused recombination
mechanisms altogether, but I didn't feel a need to go that far today.
If the inner relation can be proven unique, that is it can have no more
than one matching row for any row of the outer query, then we might as
well implement the semijoin as a plain inner join, allowing substantially
more freedom to the planner. This is a form of outer join strength
reduction, but it can't be implemented in reduce_outer_joins() because
we don't have enough info about the individual relations at that stage.
Instead do it much like remove_useless_joins(): once we've built base
relations, we can make another pass over the SpecialJoinInfo list and
get rid of any entries representing reducible semijoins.
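For illustration only (names invented):
    CREATE TABLE t1 (a int);
    CREATE TABLE t2 (x int PRIMARY KEY);
    SELECT * FROM t1 WHERE t1.a IN (SELECT x FROM t2);
Since t2.x is provably unique, the semijoin implied by IN can be planned as a
plain inner join.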
This is essentially a followon to the inner-unique patch (commit 9c7f5229a)
and makes use of the proof machinery that that patch created. We need only
minor refactoring of innerrel_is_unique's API to support this usage.
Per performance complaint from Teodor Sigaev.
Discussion: https://postgr.es/m/f994fc98-389f-4a46-d1bc-c42e05cb43ed@sigaev.ru
If there can certainly be no more than one matching inner row for a given
outer row, then the executor can move on to the next outer row as soon as
it's found one match; there's no need to continue scanning the inner
relation for this outer row. This saves useless scanning in nestloop
and hash joins. In merge joins, it offers the opportunity to skip
mark/restore processing, because we know we have not advanced past the
first possible match for the next outer row.
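For illustration only (names invented):
    -- t2.x has a unique index, so each t1 row can match at most one t2 row
    SELECT * FROM t1 JOIN t2 ON t1.a = t2.x;
A nestloop or hash join can stop scanning the inner side for a given outer
row as soon as the first match is found.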
Of course, the devil is in the details: the proof of uniqueness must
depend only on joinquals (not otherquals), and if we want to skip
mergejoin mark/restore then it must depend only on merge clauses.
To avoid adding more planning overhead than absolutely necessary,
the present patch errs in the conservative direction: there are cases
where inner_unique or skip_mark_restore processing could be used, but
it will not do so because it's not sure that the uniqueness proof
depended only on "safe" clauses. This could be improved later.
David Rowley, reviewed and rather heavily editorialized on by me
Discussion: https://postgr.es/m/CAApHDvqF6Sw-TK98bW48TdtFJ+3a7D2mFyZ7++=D-RyPsL76gw@mail.gmail.com
Follow on patch in the multi-variate statistics patch series.
CREATE STATISTICS s1 WITH (dependencies) ON (a, b) FROM t;
ANALYZE;
will collect dependency stats on (a, b) and then use the measured
dependency in subsequent query planning.
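Continuing the example above, a sketch of the kind of estimate this improves:
    -- if b is functionally dependent on a, multiplying the two per-column
    -- selectivities would underestimate the result size
    SELECT * FROM t WHERE a = 1 AND b = 1;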
Commit 7b504eb282ca2f5104b5c00b4f05a3ef6bb1385b added
CREATE STATISTICS with n-distinct coefficients. These are now
specified using the mutually exclusive option WITH (ndistinct).
Author: Tomas Vondra, David Rowley
Reviewed-by: Kyotaro HORIGUCHI, Álvaro Herrera, Dean Rasheed, Robert Haas
and many other comments and contributions
Discussion: https://postgr.es/m/56f40b20-c464-fad2-ff39-06b668fac47c@2ndquadrant.com
Currently, the only type of child relation is an "other member rel",
which is the child of a baserel, but in the future joins and even
upper relations may have child rels. To facilitate that, introduce
macros to test for particular RelOptKind values, and use
them in various places where they help to clarify the sense of a test.
(For example, a test may allow RELOPT_OTHER_MEMBER_REL either because
it intends to allow child rels, or because it intends to allow simple
rels.)
Also, remove find_childrel_top_parent, which will not work for a
child rel that is not a baserel. Instead, add a new RelOptInfo
member top_parent_relids to track the same kind of information in a
more generic manner.
Ashutosh Bapat, slightly tweaked by me. Review and testing of the
patch set from which this was taken by Rajkumar Raghuwanshi and Rafia
Sabih.
Discussion: http://postgr.es/m/CA+TgmoagTnF2yqR3PT2rv=om=wJiZ4-A+ATwdnriTGku1CLYxA@mail.gmail.com
A QueryEnvironment concept is added, which allows new types of
objects to be passed into queries from parsing on through
execution. At this point, the only thing implemented is a
collection of EphemeralNamedRelation objects -- relations which
can be referenced by name in queries, but do not exist in the
catalogs. The only type of ENR implemented is NamedTuplestore, but
provision is made to add more types fairly easily.
An ENR can carry its own TupleDesc or reference a relation in the
catalogs by relid.
Although these features can be used without SPI, convenience
functions are added to SPI so that ENRs can easily be used by code
run through SPI.
The initial use of all this is going to be transition tables in
AFTER triggers, but that will be added to each PL as a separate
commit.
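For illustration only (names invented; the PL-level support lands in separate
commits), the intended consumer looks like:
    CREATE TRIGGER audit_trg
      AFTER UPDATE ON accounts
      REFERENCING OLD TABLE AS old_rows NEW TABLE AS new_rows
      FOR EACH STATEMENT EXECUTE PROCEDURE audit_fn();
The transition tables are exposed to the trigger function as ephemeral named
relations.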
An incidental effect of this patch is to produce a more informative
error message if an attempt is made to modify the contents of a CTE
from a referencing DML statement. No tests previously covered that
possibility, so one is added.
Kevin Grittner and Thomas Munro
Reviewed by Heikki Linnakangas, David Fetter, and Thomas Munro
with valuable comments and suggestions from many others
copyObject() is declared to return void *, which allows easily assigning
the result independent of the input, but it loses all type checking.
If the compiler supports typeof or something similar, cast the result to
the input type. This creates a greater amount of type safety. In some
cases, where the result is assigned to a generic type such as Node * or
Expr *, new casts are now necessary, but in general casts are now
unnecessary in the normal case and indicate that something unusual is
happening.
Reviewed-by: Mark Dilger <hornschnorter@gmail.com>
This extends the Aggregate node with two new features: HashAggregate
can now run multiple hashtables concurrently, and a new strategy
MixedAggregate populates hashtables while doing sorted grouping.
The planner will now attempt to save as many sorts as possible when
planning grouping sets queries, while not exceeding work_mem for the
estimated combined sizes of all hashtables used. No SQL-level changes
are required. There should be no user-visible impact other than the
new EXPLAIN output and possible changes to result ordering when ORDER
BY was not used (which affected a few regression tests). The
enable_hashagg option is respected.
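For illustration only (names invented), a query that can now mix strategies:
    SELECT a, b, count(*) FROM t GROUP BY GROUPING SETS ((a), (b), ());
Some of the grouping sets can be populated by hashing while another is
computed from sorted input, rather than sorting once per set.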
Author: Andrew Gierth
Reviewers: Mark Dilger, Andres Freund
Discussion: https://postgr.es/m/87vatszyhj.fsf@news-spur.riddles.org.uk
Partitioned tables do not contain any data; only their unpartitioned
descendants need to be scanned. However, the partitioned tables still
need to be locked, even though they're not scanned. To make that
work, Append and MergeAppend relations now need to carry a list of
(unscanned) partitioned relations that must be locked, and InitPlan
must lock all partitioned result relations.
Aside from the obvious advantage of avoiding some work at execution
time, this has two other advantages. First, it may improve the
planner's decision-making in some cases since the empty relation
might throw things off. Second, it paves the way to getting rid of
the storage for partitioned tables altogether.
Amit Langote, reviewed by me.
Discussion: http://postgr.es/m/6837c359-45c4-8044-34d1-736756335a15@lab.ntt.co.jp
Commit 51ee6f3160d2e1515ed6197594bda67eb99dc2cc accidentally changed
the behavior around inheritance hierarchies; before, we always
considered parallel paths even for very small inheritance children,
because otherwise an inheritance hierarchy with even one small child
wouldn't be eligible for parallelism. That exception was inadvertently
removed; put it back.
In passing, also adjust the degree-of-parallelism computation for
index-only scans not to consider the number of heap pages fetched.
Otherwise, we'll avoid parallel index-only scans on tables that are
mostly all-visible, which isn't especially logical.
Robert Haas and Amit Kapila, per a report from Ashutosh Sharma.
Discussion: http://postgr.es/m/CAE9k0PmgSoOHRd60SHu09aRVTHRSs8s6pmyhJKWHxWw9C_x+XA@mail.gmail.com
Like Gather, we spawn multiple workers and run the same plan in each
one; however, Gather Merge is used when each worker produces the same
output ordering and we want to preserve that output ordering while
merging together the streams of tuples from various workers. (In a
way, Gather Merge is like a hybrid of Gather and MergeAppend.)
This works out to a win if it saves us from having to perform an
expensive Sort. In cases where only a small amount of data would need
to be sorted, it may actually be faster to use a regular Gather node
and then sort the results afterward, because Gather Merge sometimes
needs to wait synchronously for tuples whereas a pure Gather generally
doesn't. But if this avoids an expensive sort then it's a win.
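For illustration only (names invented), the typical shape of a win:
    -- each worker produces its share of rows already sorted (for example
    -- from a per-worker Sort or an index scan), and Gather Merge
    -- interleaves the sorted streams in the leader
    SELECT * FROM big_table ORDER BY x;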
Rushabh Lathia, reviewed and tested by Amit Kapila, Thomas Munro,
and Neha Sharma, and reviewed and revised by me.
Discussion: http://postgr.es/m/CAGPqQf09oPX-cQRpBKS0Gq49Z+m6KBxgxd_p9gX8CKk_d75HoQ@mail.gmail.com
The index is scanned by a single process, but then all cooperating
processes can iterate jointly over the resulting set of heap blocks.
In the future, we might also want to support using a parallel bitmap
index scan to set up for a parallel bitmap heap scan, but that's a
job for another day.
Dilip Kumar, with some corrections and cosmetic changes by me. The
larger patch set of which this is a part has been reviewed and tested
by (at least) Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, Thomas Munro, and me.
Discussion: http://postgr.es/m/CAFiTN-uc4=0WxRGfCzs-xfkMYcSEWUC-Fon6thkJGjkh9i=13A@mail.gmail.com
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.
This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE. (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).
The implemented syntax deviates slightly from what the standard
requires. First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only. Second, we don't
currently support a default namespace to be specified.
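For illustration only (document, paths, and names invented), the accepted
form looks like:
    SELECT xt.*
    FROM xmldata,
         XMLTABLE('/rows/row' PASSING xmldata.doc
                  COLUMNS id int PATH '@id',
                          name text PATH 'name') AS xt;
Note the single mandatory PASSING document, per the deviations above.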
This implementation relies on a new executor node based on a hardcoded
method table. (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)
Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
Extract the logic used by hash_inner_and_outer into a separate
function, get_cheapest_parallel_safe_total_inner, so that it can
also be used to plan parallel merge joins.
Also, add a require_parallel_safe argument to the existing function
get_cheapest_path_for_pathkeys, because parallel merge join needs
to find the cheapest path for a given set of pathkeys that is
parallel-safe, not just the cheapest one overall.
Patch by me, reviewed by Dilip Kumar.
Discussion: http://postgr.es/m/CA+TgmoYOv+dFK0MWW6366dFj_xTnohQfoBDrHyB7d1oZhrgPjA@mail.gmail.com
In combination with 569174f1be92be93f5366212cc46960d28a5c5cd, which
taught the btree AM how to perform parallel index scans, this allows
parallel index scan plans on btree indexes. This infrastructure
should be general enough to support parallel index scans for other
index AMs as well, if someone updates them to support parallel
scans.
Amit Kapila, reviewed and tested by Anastasia Lubennikova, Tushar
Ahuja, and Haribabu Kommi, and me.
When min_parallel_relation_size was added, the only supported type
of parallel scan was a parallel sequential scan, but there are
pending patches for parallel index scan, parallel index-only scan,
and parallel bitmap heap scan. Those patches introduce two new
types of complications: first, what's relevant is not really the
total size of the relation but the portion of it that we will scan;
and second, index pages and heap pages shouldn't necessarily be
treated in exactly the same way. Typically, the number of index
pages will be quite small, but that doesn't necessarily mean that
a parallel index scan can't pay off.
Therefore, we introduce min_parallel_table_scan_size, which works
out a degree of parallelism for scans based on the number of table
pages that will be scanned (and which is therefore equivalent to
min_parallel_relation_size for parallel sequential scans) and also
min_parallel_index_scan_size which can be used to work out a degree
of parallelism based on the number of index pages that will be
scanned.
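For illustration, the two thresholds can now be set independently (the values
shown are the defaults):
    SET min_parallel_table_scan_size = '8MB';
    SET min_parallel_index_scan_size = '512kB';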
Amit Kapila and Robert Haas
Discussion: http://postgr.es/m/CAA4eK1KowGSYYVpd2qPpaPPA5R90r++QwDFbrRECTE9H_HvpOg@mail.gmail.com
Discussion: http://postgr.es/m/CAA4eK1+TnM4pXQbvn7OXqam+k_HZqb0ROZUMxOiL6DWJYCyYow@mail.gmail.com
Currently, we only need this logic in order to cost a Bitmap Heap
Scan. But a pending patch for Parallel Bitmap Heap Scan also uses
it to help figure out how many workers to use for the scan, which
has to be determined prior to costing. So, move the logic to
a separate function to make that easier.
Dilip Kumar. The patch series of which this is a part has been
reviewed by Andres Freund, Amit Khandekar, Tushar Ahuja, Rafia
Sabih, Haribabu Kommi, and me; it is not clear from the email
discussion which of those people have looked specifically at this
part.
Discussion: http://postgr.es/m/CAFiTN-v3QYNJEZnnmKCeATuLbN-h9tMVfeEF0+BrouYDqjXgwg@mail.gmail.com
Evaluation of set returning functions (SRFs) in the targetlist (like SELECT
generate_series(1,5)) so far was done in the expression evaluation (i.e.
ExecEvalExpr()) and projection (i.e. ExecProject/ExecTargetList) code.
This meant that most executor nodes performing projection, and most
expression evaluation functions, had to deal with the possibility that an
evaluated expression could return a set of return values.
That's bad because it leads to repeated code in a lot of places. It also,
and that's my (Andres's) motivation, made it a lot harder to implement a
more efficient way of doing expression evaluation.
To fix this, introduce a new executor node (ProjectSet) that can evaluate
targetlists containing one or more SRFs. To avoid the complexity of the old
way of handling nested expressions returning sets (e.g. having to pass up
ExprDoneCond, and dealing with arguments to functions returning sets etc.),
those SRFs can only be at the top level of the node's targetlist. The
planner makes sure (via split_pathtarget_at_srfs()) that SRF evaluation is
only necessary in ProjectSet nodes and that SRFs are only present at the
top level of the node's targetlist. If there are nested SRFs the planner
creates multiple stacked ProjectSet nodes. The ProjectSet nodes always get
input from an underlying node.
We also discussed and prototyped evaluating targetlist SRFs using ROWS
FROM(), but that turned out to be more complicated than we'd hoped.
While moving SRF evaluation to ProjectSet would have allowed retaining the old
"least common multiple" behavior when multiple SRFs are present in one
targetlist (i.e. continue returning rows until all SRFs are at the end of
their input at the same time), we decided to instead only return rows till
all SRFs are exhausted, returning NULL for already exhausted ones. We
deemed the previous behavior to be too confusing, unexpected and actually
not particularly useful.
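For illustration, the new semantics for multiple SRFs in one targetlist:
    SELECT generate_series(1, 3), generate_series(1, 2);
    -- now returns (1,1), (2,2), (3,NULL); the old least-common-multiple
    -- behavior returned 6 rows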
As a side effect, the previously prohibited case of multiple set returning
arguments to a function is now allowed. Not because it's particularly
desirable, but because it ends up working and there seems to be no argument
for adding code to prohibit it.
Currently the behavior for COALESCE and CASE containing SRFs has changed,
returning multiple rows from the expression, even when the SRF containing
"arm" of the expression is not evaluated. That's because the SRFs are
evaluated in a separate ProjectSet node. As that's quite confusing, we're
likely to instead prohibit SRFs in those places. But that's still being
discussed, and the code would reside in places not touched here, so that's
a task for later.
There's a lot of, now superfluous, code dealing with set return expressions
around. But as the changes to get rid of those are verbose and largely boring,
it seems better for readability to keep the cleanup as a separate commit.
Author: Tom Lane and Andres Freund
Discussion: https://postgr.es/m/20160822214023.aaxz5l4igypowyri@alap3.anarazel.de
In an RLS query, we must ensure that security filter quals are evaluated
before ordinary query quals, in case the latter contain "leaky" functions
that could expose the contents of sensitive rows. The original
implementation of RLS planning ensured this by pushing the scan of a
secured table into a sub-query that it marked as a security-barrier view.
Unfortunately this results in very inefficient plans in many cases, because
the sub-query cannot be flattened and gets planned independently of the
rest of the query.
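For illustration only (names invented), the requirement being enforced:
    CREATE POLICY own_rows ON accounts USING (owner = current_user);
    SELECT * FROM accounts WHERE leaky_fn(balance);
leaky_fn() must not run on rows the policy qual would filter out, though
leakproof quals may still be ordered ahead of the policy qual when useful.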
To fix, drop the use of sub-queries to enforce RLS qual order, and instead
mark each qual (RestrictInfo) with a security_level field establishing its
priority for evaluation. Quals must be evaluated in security_level order,
except that "leakproof" quals can be allowed to go ahead of quals of lower
security_level, if it's helpful to do so. This has to be enforced within
the ordering of any one list of quals to be evaluated at a table scan node,
and we also have to ensure that quals are not chosen for early evaluation
(i.e., use as an index qual or TID scan qual) if they're not allowed to go
ahead of other quals at the scan node.
This is sufficient to fix the problem for RLS quals, since we only support
RLS policies on simple tables and thus RLS quals will always exist at the
table scan level only. Eventually these qual ordering rules should be
enforced for join quals as well, which would permit improving planning for
explicit security-barrier views; but that's a task for another patch.
Note that FDWs would need to be aware of these rules --- and not, for
example, send an insecure qual for remote execution --- but since we do
not yet allow RLS policies on foreign tables, the case doesn't arise.
This will need to be addressed before we can allow such policies.
Patch by me, reviewed by Stephen Frost and Dean Rasheed.
Discussion: https://postgr.es/m/8185.1477432701@sss.pgh.pa.us
Normally, if we have a WHERE clause like "indexcol = constant",
the planner will figure out that that index column can be ignored
when determining whether the index has a desired sort ordering.
But this failed to work for boolean index columns, because a
condition like "boolcol = true" is canonicalized to just "boolcol"
which does not give rise to an EquivalenceClass. Add a check to
allow the same type of deduction to be made in this case too.
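For illustration only (names invented):
    CREATE INDEX ON t (flag, x);
    SELECT * FROM t WHERE flag ORDER BY x;
With the fix, the planner recognizes that the WHERE clause pins the first
index column, so the index can satisfy the ORDER BY without a separate sort.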
Per a complaint from Dima Pavlov. Arguably this is a bug, but given the
limited impact and the small number of complaints so far, I won't risk
destabilizing plans in stable branches by back-patching.
Patch by me, reviewed by Michael Paquier
Discussion: https://postgr.es/m/1788.1481605684@sss.pgh.pa.us
We need to scan the whole parse tree for parallel-unsafe functions.
If there are none, we'll later need to determine whether particular
subtrees contain any parallel-restricted functions. The previous coding
retained no knowledge from the first scan, even though this is very
wasteful in the common case where the query contains only parallel-safe
functions. We can bypass all of the later scans by remembering that fact.
This provides a small but measurable speed improvement when the case
applies, and shouldn't cost anything when it doesn't.
Patch by me, reviewed by Robert Haas
Discussion: <3740.1471538387@sss.pgh.pa.us>
It's rather silly to make a separate pass over the tlist + HAVING qual,
and a separate set of visits to the syscache, when get_agg_clause_costs
already has all the required information in hand. This nets out as less
code as well as fewer cycles.
The original coding had three separate booleans representing partial
aggregation behavior, which was confusing, unreadable, and error-prone,
not least because the booleans weren't always listed in the same order.
It was also inadequate for the allegedly-desirable future extension to
support intermediate partial aggregation, because we'd need separate
markers for serialization and deserialization in such a case.
Merge these bools into an enum "AggSplit" to provide symbolic names for
the supported operating modes (and document what those are). By assigning
the values of the enum constants carefully, we can treat AggSplit values
as options bitmasks so that tests of what to do aren't noticeably more
expensive than before.
While at it, get rid of Aggref.aggoutputtype. That's not needed since
commit 59a3795c2 got rid of setrefs.c's special-purpose Aggref comparison
code, and it likewise seemed more confusing than helpful.
Assorted comment cleanup as well (there's still more that I want to do
in that line).
catversion bump for change in Aggref node contents. Should be the last
one for partial-aggregation changes.
Discussion: <29309.1466699160@sss.pgh.pa.us>
Commit e06a38965's original coding for constructing the execution-time
expression tree for a combining aggregate was rather messy, involving
duplicating quite a lot of code in setrefs.c so that it could inject
a nonstandard matching rule for Aggrefs. Get rid of that in favor of
explicitly constructing a combining Aggref with a partial Aggref as input,
then allowing setref's normal matching logic to match the partial Aggref
to the output of the lower plan node and hence replace it with a Var.
In passing, rename and redocument make_partialgroup_input_target to have
some connection to what it actually does.
The original upper-planner-pathification design (commit 3fc6e2d7f5b652b4)
assumed that we could always determine during Path formation whether or not
we would need a Result plan node to perform projection of a targetlist.
That turns out not to work very well, though, because createplan.c still
has some responsibilities for choosing the specific target list associated
with sorting/grouping nodes (in particular it might choose to add resjunk
columns for sorting). We might not ever refactor that --- doing so would
push more work into Path formation, which isn't attractive --- and we
certainly won't do so for 9.6. So, while create_projection_path and
apply_projection_to_path can tell for sure what will happen if the subpath
is projection-capable, they can't tell for sure when it isn't. This is at
least a latent bug in apply_projection_to_path, which might think it can
apply a target to a non-projecting node when the node will end up computing
something different.
Also, I'd tied the creation of a ProjectionPath node to whether or not a
Result is needed, but it turns out that we sometimes need a ProjectionPath
node anyway to avoid modifying a possibly-shared subpath node. Callers had
to use create_projection_path for such cases, and we added code to them
that knew about the potential omission of a Result node and attempted to
adjust the cost estimates for that. That was uncertainly correct and
definitely ugly/unmaintainable.
To fix, have create_projection_path explicitly check whether a Result
is needed and adjust its cost estimate accordingly, though it creates
a ProjectionPath in either case. apply_projection_to_path is now mostly
just an optimized version that can avoid creating an extra Path node when
the input is known to not be shared with any other live path. (There
is one case that create_projection_path doesn't handle, which is pushing
parallel-safe expressions below a Gather node. We could make it do that
by duplicating the GatherPath, but there seems no need as yet.)
create_projection_plan still has to recheck the tlist-match condition,
which means that if the matching situation does get changed by createplan.c
then we'll have made a slightly incorrect cost estimate. But there seems
no help for that in the near term, and I doubt it occurs often enough,
let alone would change planning decisions often enough, to be worth
stressing about.
I added a "dummypp" field to ProjectionPath to track whether
create_projection_path thinks a Result is needed. This is not really
necessary as-committed because create_projection_plan doesn't look at the
flag; but it seems like a good idea to remember what we thought when
forming the cost estimate, if only for debugging purposes.
In passing, get rid of the target_parallel parameter added to
apply_projection_to_path by commit 54f5c5150. I don't think that's a good
idea because it involves callers in what should be an internal decision,
and opens us up to missing optimization opportunities if callers think they
don't need to provide a valid flag, as most don't. For the moment, this
just costs us an extra has_parallel_hazard call when planning a Gather.
If that starts to look expensive, I think a better solution would be to
teach PathTarget to carry/cache knowledge of parallel-safety of its
contents.
This patch provides a new implementation of the logic added by commit
137805f89 and later removed by 77ba61080. It differs from the original
primarily in expending much less effort per joinrel in large queries,
which it accomplishes by doing most of the matching work once per query not
once per joinrel. Hopefully, it's also less buggy and better commented.
The never-documented enable_fkey_estimates GUC remains gone.
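For illustration only (names invented), the kind of estimate this improves:
    -- the foreign key from orders.cust_id to customers.id means each orders
    -- row joins to exactly one customers row (when cust_id is not null), so
    -- the join size is about the size of orders, not the product of
    -- per-column selectivities
    SELECT * FROM orders o JOIN customers c ON o.cust_id = c.id;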
There remains work to be done to make the selectivity estimates account
for nulls in FK referencing columns; but that was true of the original
patch as well. We may be able to address this point later in beta.
In the meantime, any error should be in the direction of overestimating
rather than underestimating joinrel sizes, which seems like the direction
we want to err in.
Tomas Vondra and Tom Lane
Discussion: <31041.1465069446@sss.pgh.pa.us>
Commit 04ae11f62e643e07c411c4935ea6af46cb112aa9 removed some broken
code to apply the scan/join target to partial paths, but its theory
that this processing step is totally unnecessary turns out to be wrong.
Put similar code back again, but this time, check for parallel-safety
and avoid in-place modifications to paths that may already have been
used as part of some other path.
(This is not an entirely elegant solution to this problem; it might
be better, for example, to postpone generate_gather_paths for the
topmost scan/join rel until after the scan/join target has been
applied. But this is not the time for such redesign work.)
Amit Kapila and Robert Haas
The main point of doing this is to allow the cutoff to be set very small,
even zero, to allow parallel-query behavior to be tested on relatively
small tables such as we typically use in the regression tests. But it
might be of use to users too. The number-of-workers scaling behavior in
create_plain_partial_paths() is pretty ad-hoc and subject to change, so
we won't expose anything about that, but the notion of not considering
parallel query at all for tables below size X seems reasonably stable.
Amit Kapila, per a suggestion from me
Discussion: <17170.1465830165@sss.pgh.pa.us>
As noted by Andres Freund, we'd accumulated quite a few similar functions
in clauses.c that examine all functions in an expression tree to see if
they satisfy some boolean test. Reduce the duplication by inventing a
function check_functions_in_node() that applies a simple callback function
to each SQL function OID appearing in a given expression node. This also
fixes some arguable oversights; for example, contain_mutable_functions()
did not check aggregate or window functions for mutability. I doubt that
that represents a live bug at the moment, because we don't really consider
mutability for aggregates; but it might someday be one.
I chose to put check_functions_in_node() in nodeFuncs.c because it seemed
like other modules might wish to use it in future. That in turn forced
moving set_opfuncid() et al into nodeFuncs.c, as the alternative was for
nodeFuncs.c to depend on optimizer/setrefs.c which didn't seem very clean.
In passing, teach contain_leaked_vars_walker() about a few more expression
node types it can safely look through, and improve the rather messy and
undercommented code in has_parallel_hazard_walker().
Discussion: <20160527185853.ziol2os2zskahl7v@alap3.anarazel.de>
This terminology provoked widespread complaints. So, instead, rename
the GUC max_parallel_degree to max_parallel_workers_per_gather
(leaving room for a possible future GUC max_parallel_workers that acts
as a system-wide limit), and rename the parallel_degree reloption to
parallel_workers. Rename structure members to match.
These changes create a dump/restore hazard for users of PostgreSQL
9.6beta1 who have set the reloption (or applied the GUC using ALTER
USER or ALTER DATABASE).
This commit reverts 137805f89 as well as the associated commits 015e88942,
5306df283, and 68d704edb. We found multiple bugs in this feature, and
there was concern about possible planner slowdown (though to be fair,
exhibiting a very large slowdown proved difficult). The way forward
requires a considerable rewrite, which may or may not be possible to
accomplish in time for beta2. In my judgment reviewing the rewrite will
be easier to accomplish starting from a clean slate, so let's temporarily
revert what's there now. This also leaves us in a safe state if it turns
out to be necessary to postpone the rewrite to the next development cycle.
Discussion: <20160429102531.GA13701@huehner.biz>
Given a three-or-more-way equivalence class, such as X.X = Y.Y = Z.Z,
it was possible for the planner to omit one of the quals needed to
enforce that all members of the equivalence class are actually equal.
This only happened in the case of a parameterized join node for two
of the relations, that is a plan tree like
    Nested Loop
      -> Scan X
      -> Nested Loop
           -> Scan Y
           -> Scan Z
                Filter: Z.Z = X.X
The eclass machinery normally expects to apply X.X = Y.Y when those
two relations are joined, but in this shape of plan tree they aren't
joined until the top node --- and, if the lower nested loop is marked
as parameterized by X, the top node will assume that the relevant eclass
condition(s) got pushed down into the lower node. On the other hand,
the scan of Z assumes that it's only responsible for constraining Z.Z
to match any one of the other eclass members. So one or another of
the required quals sometimes fell between the cracks, depending on
whether consideration of the eclass in get_joinrel_parampathinfo()
for the lower nested loop chanced to generate X.X = Y.Y or X.X = Z.Z
as the appropriate constraint there. If it generated the latter,
it'd erroneously suppose that the Z scan would take care of matters.
To fix, force X.X = Y.Y to be generated and applied at that join node
when this case occurs.
This is *extremely* hard to hit in practice, because various planner
behaviors conspire to mask the problem; starting with the fact that the
planner doesn't really like to generate a parameterized plan of the
above shape. (It might have been impossible to hit it before we
tweaked things to allow this plan shape for star-schema cases.) Many
thanks to Alexander Kirkouski for submitting a reproducible test case.
The bug can be demonstrated in all branches back to 9.2 where parameterized
paths were introduced, so back-patch that far.