The date/time values 'current', 'invalid', and 'undefined' were
removed a long time ago, but the code still contains explicit error
handling for the transition. To simplify the code and avoid having to
handle these values everywhere, just remove the recognition of these
tokens altogether now.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
In commit 18555b132 we tentatively established a rule that regression
tests should use names containing "regression" for databases, and names
starting with "regress_" for all other globally-visible object names, so
as to circumscribe the side-effects that "make installcheck" could have
on an existing installation.
This commit adds a simple enforcement mechanism for that rule: if the code
is compiled with ENFORCE_REGRESSION_TEST_NAME_RESTRICTIONS defined, it
will emit a warning (not an error) whenever a database, role, tablespace,
subscription, or replication origin name is created that doesn't obey the
rule. Running one or more buildfarm members with that symbol defined
should be enough to catch new violations, at least in the regular
regression tests. Most TAP tests wouldn't notice such warnings, but
that's actually fine because TAP tests don't execute against an existing
server anyway.
Since it's already the case that running src/test/modules/ tests in
installcheck mode is deprecated, we can use that as a home for tests
that seem unsafe to run against an existing server, such as tests that
might have side-effects on existing roles. Document that (though this
commit doesn't in itself make it any less safe than before).
Update regress.sgml to define these restrictions more clearly, and
to clean up assorted lack-of-up-to-date-ness in its descriptions of
the available regression tests.
Discussion: https://postgr.es/m/16638.1468620817@sss.pgh.pa.us
In commit 18555b132 we tentatively established a rule that regression
tests should use names containing "regression" for databases, and names
starting with "regress_" for all other globally-visible object names, so
as to circumscribe the side-effects that "make installcheck" could have on
an existing installation. However, no enforcement mechanism was created,
so it's unsurprising that some new violations have crept in since then.
In fact, a whole new *category* of violations has crept in, to wit we now
also have globally-visible subscription and replication origin names, and
"make installcheck" could very easily clobber user-created objects of
those types. So it's past time to do something about this.
This commit sanitizes the tests enough that they will pass (i.e. not
generate any visible warnings) with the enforcement mechanism I'll add
in the next commit. There are some TAP tests that still trigger the
warnings, but the warnings do not cause test failure. Since these tests
do not actually run against a pre-existing installation, there's no need
to worry whether they could conflict with user-created objects.
The problem with rolenames.sql testing special role names like "user"
is still there, and is dealt with only very cosmetically in this patch
(by hiding the warnings :-(). What we actually need to do to be safe is
to take that test script out of "make installcheck" altogether, but that
seems like material for a separate patch.
Discussion: https://postgr.es/m/16638.1468620817@sss.pgh.pa.us
We forgot to map column numbers to/from the default partition for
various operations, leading to valid cases failing with spurious
errors, such as
ERROR: attribute N of type some_partition has been dropped
It was also possible that the search for conflicting rows in the default
partition when attaching another partition would fail to detect some.
Secondarily, it was also possible that such a search should be skipped
(because the constraint was implied) but wasn't.
Fix all this by mapping column numbers when necessary.
Reported by: Daniel Wilches
Author: Amit Langote
Discussion: https://postgr.es/m/15873-8c61945d6b3ef87c@postgresql.org
When a partitioned tables contains foreign tables as partitions, it is
not possible to implement unique or primary key indexes -- but when
regular indexes are created, there is no reason to do anything other
than ignoring such partitions. We were raising errors upon encountering
the foreign partitions, which is unfriendly and doesn't protect against
any actual problems.
Relax this restriction so that index creation is allowed on partitioned
tables containing foreign partitions, becoming a no-op on them. (We may
later want to redefine this so that the FDW is told to create the
indexes on the foreign side.) This applies to CREATE INDEX, as well as
ALTER TABLE / ATTACH PARTITION and CREATE TABLE / PARTITION OF.
Backpatch to 11, where indexes on partitioned tables were introduced.
Discussion: https://postgr.es/m/15724-d5a58fa9472eef4f@postgresql.org
Author: Álvaro Herrera
Reviewed-by: Amit Langote
a96c41f has introduced the option for heap, but it still lacked the
variant to control the behavior for toast relations.
While on it, refactor the tests so as they stress more scenarios with
the various values that vacuum_index_cleanup can use. It would be
useful to couple those tests with pageinspect to check that pages are
actually cleaned up, but this is left for later.
Author: Masahiko Sawada, Michael Paquier
Reviewed-by: Peter Geoghegan
Discussion: https://postgr.es/m/CAD21AoCqs8iN04RX=i1KtLSaX5RrTEM04b7NHYps4+rqtpWNEg@mail.gmail.com
This patch reverts all the code changes of commit e76de8861, which turns
out to have been seriously misguided. We can't wait till later to compute
the definition string for an index; we must capture that before applying
the data type change for any column it depends on, else ruleutils.c will
deliverr wrong/misleading results. (This fine point was documented
nowhere, of course.)
I'd also managed to forget that ATExecAlterColumnType executes once per
ALTER COLUMN TYPE clause, not once per statement; which resulted in the
code being basically completely broken for any case in which multiple ALTER
COLUMN TYPE clauses are applied to a table having non-constraint indexes
that must be rebuilt. Through very bad luck, none of the existing test
cases nor the ones added by e76de8861 caught that, but of course it was
soon found in the field.
The previous patch also had an implicit assumption that if a constraint's
index had a dependency on a table column, so would the constraint --- but
that isn't actually true, so it didn't fix such cases.
Instead of trying to delete unneeded index dependencies later, do the
is-there-a-constraint lookup immediately on seeing an index dependency,
and switch to remembering the constraint if so. In the unusual case of
multiple column dependencies for a constraint index, this will result in
duplicate constraint lookups, but that's not that horrible compared to all
the other work that happens here. Besides, such cases did not work at all
before, so it's hard to argue that they're performance-critical for anyone.
Per bug #15865 from Keith Fiske. As before, back-patch to all supported
branches.
Discussion: https://postgr.es/m/15865-17940eacc8f8b081@postgresql.org
Commit d7f8d26d9 added a test case that created a user, but forgot
to drop it again. This is no good; for one thing, it causes repeated
"make installcheck" runs to fail.
The multivariate MCV estimation code may run user-defined operators on
the values in the MCV list, which means that those operators may
potentially leak the values from the MCV list. Guard against leaking
data to unprivileged users by checking that the user has SELECT
privileges on the table or all of the columns referred to by the
statistics.
Additionally, if there are any securityQuals on the RTE (either due to
RLS policies on the table, or accessing the table via a security
barrier view), not all rows may be visible to the current user, even
if they have table or column privileges. Thus we further insist that
the operator be leakproof in this case.
Dean Rasheed, reviewed by Tomas Vondra.
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui=Vdx4N==VV5XOK5dsXfnGgVOz_JhAicB=ZA@mail.gmail.com
This fixes some TAP suites when using msys Perl and a builddir located
in an msys mount point other than "/". For example, builddir=/c/pg
exhibited the problem, since /c/pg falls in mount point "/c".
Back-patch to 9.6, where tests first started to perform such
translations. In back branches, offer both new and old APIs.
Reviewed by Andrew Dunstan.
Discussion: https://postgr.es/m/20190610045838.GA238501@rfd.leadboat.com
This makes the whole user experience more consistent when bumping into
failures, and more in line with the rewording done via 508300e.
Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190514153252.GA22168@alvherre.pgsql
This fixes an embarrassing oversight I (Andres) made in 737a292b,
namely missing two place where liverows/deadrows were used when
converting those variables to pointers, leading to incrementing the
pointer, rather than the value.
It's not that actually that easy to trigger a crash: One needs tuples
deleted by the current transaction, followed by a tuple deleted in
another session, all in one page. Which is presumably why this hasn't
been noticed before.
Reported-By: Steve Singer
Author: Steve Singer
Discussion: https://postgr.es/m/c7988239-d42c-ddc4-41db-171b23b35e4f@ssinger.info
This puts back reverted commit de87a084c0a5, with some bug fixes.
When two (or more) transactions are waiting for transaction T1 to release a
tuple-level lock, and transaction T1 upgrades its lock to a higher level, a
spurious deadlock can be reported among the waiting transactions when T1
finishes. The simplest example case seems to be:
T1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for T1
Z: select id from job where name = 'a' for key share;
T1: update job set name = 'b' where id = 1;
Z: update job set name = 'c' where id = 1; -- starts waiting for T1
T1: rollback;
At this point, transaction Y is rolled back on account of a deadlock: Y
holds the heavyweight tuple lock and is waiting for the Xmax to be released,
while Z holds part of the multixact and tries to acquire the heavyweight
lock (per protocol) and goes to sleep; once T1 releases its part of the
multixact, Z is awakened only to be put back to sleep on the heavyweight
lock that Y is holding while sleeping. Kaboom.
This can be avoided by having Z skip the heavyweight lock acquisition. As
far as I can see, the biggest downside is that if there are multiple Z
transactions, the order in which they resume after T1 finishes is not
guaranteed.
Backpatch to 9.6. The patch applies cleanly on 9.5, but the new tests don't
work there (because isolationtester is not smart enough), so I'm not going
to risk it.
Author: Oleksii Kliukin
Discussion: https://postgr.es/m/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com
Discussion: https://postgr.es/m/2815.1560521451@sss.pgh.pa.us
Any authenticated user can overflow a stack-based buffer by changing the
user's own password to a purpose-crafted value. This often suffices to
execute arbitrary code as the PostgreSQL operating system account.
This fix is contributed by multiple folks, based on an initial analysis
from Tom Lane. This issue has been introduced by 68e61ee, so it was
possible to make use of it at authentication time. It became more
easily to trigger after ccae190 which has made the SCRAM parsing more
strict when changing a password, in the case where the client passes
down a verifier already hashed using SCRAM. Back-patch to v10 where
SCRAM has been introduced.
Reported-by: Alexander Lakhin
Author: Jonathan Katz, Heikki Linnakangas, Michael Paquier
Security: CVE-2019-10164
Backpatch-through: 10
This reverts commits 3da73d6839dc and de87a084c0a5.
This code has some tricky corner cases that I'm not sure are correct and
not properly tested anyway, so I'm reverting the whole thing for next
week's releases (reintroducing the deadlock bug that we set to fix).
I'll try again afterwards.
Discussion: https://postgr.es/m/E1hbXKQ-0003g1-0C@gemulon.postgresql.org
Regular per-column statistics are stored in pg_statistics catalog, which
is however rather difficult to read, so we also have pg_stats view with
a human-reablable version of the data.
For extended statistic the catalog was fairly easy to read, so we did
not have such human-readable view so far. Commit 9b6babfa2d however did
split the catalog into two, which makes querying harder. Furthermore,
we want to show the multi-column MCV list in a way similar to per-column
stats (and not as a bytea value).
This commit introduces pg_stats_ext view, joining the two catalogs and
massaging the data to produce human-readable output similar to pg_stats.
It also considers RLS and access privileges - the data is shown only when
the user has access to all columns the extended statistic is defined on.
Bumped CATVERSION due to adding new system view.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
Since extended statistic got introduced in PostgreSQL 10, there was a
single catalog pg_statistic_ext storing both the definitions and built
statistic. That's however problematic when a user is supposed to have
access only to the definitions, but not to user data.
Consider for example pg_dump on a database with RLS enabled - if the
pg_statistic_ext catalog respects RLS (which it should, if it contains
user data), pg_dump would not see any records and the result would not
define any extended statistics. That would be a surprising behavior.
Until now this was not a pressing issue, because the existing types of
extended statistic (functional dependencies and ndistinct coefficients)
do not include any user data directly. This changed with introduction
of MCV lists, which do include most common combinations of values.
The easiest way to fix this is to split the pg_statistic_ext catalog
into two - one for definitions, one for the built statistic values.
The new catalog is called pg_statistic_ext_data, and we're maintaining
a 1:1 relationship with the old catalog - either there are matching
records in both catalogs, or neither of them.
Bumped CATVERSION due to changing system catalog definitions.
Author: Dean Rasheed, with improvements by me
Reviewed-by: Dean Rasheed, John Naylor
Discussion: https://postgr.es/m/CAEZATCUhT9rt7Ui%3DVdx4N%3D%3DVV5XOK5dsXfnGgVOz_JhAicB%3DZA%40mail.gmail.com
When two (or more) transactions are waiting for transaction T1 to release a
tuple-level lock, and transaction T1 upgrades its lock to a higher level, a
spurious deadlock can be reported among the waiting transactions when T1
finishes. The simplest example case seems to be:
T1: select id from job where name = 'a' for key share;
Y: select id from job where name = 'a' for update; -- starts waiting for X
Z: select id from job where name = 'a' for key share;
T1: update job set name = 'b' where id = 1;
Z: update job set name = 'c' where id = 1; -- starts waiting for X
T1: rollback;
At this point, transaction Y is rolled back on account of a deadlock: Y
holds the heavyweight tuple lock and is waiting for the Xmax to be released,
while Z holds part of the multixact and tries to acquire the heavyweight
lock (per protocol) and goes to sleep; once X releases its part of the
multixact, Z is awakened only to be put back to sleep on the heavyweight
lock that Y is holding while sleeping. Kaboom.
This can be avoided by having Z skip the heavyweight lock acquisition. As
far as I can see, the biggest downside is that if there are multiple Z
transactions, the order in which they resume after X finishes is not
guaranteed.
Backpatch to 9.6. The patch applies cleanly on 9.5, but the new tests don't
work there (because isolationtester is not smart enough), so I'm not going
to risk it.
Author: Oleksii Kliukin
Discussion: https://postgr.es/m/B9C9D7CD-EB94-4635-91B6-E558ACEC0EC3@hintbits.com
Given a query in which multiple JOIN nodes used the same alias
(which'd necessarily be in different sub-SELECTs), ruleutils.c
would assign the JOIN nodes distinct aliases for clarity ...
but then it forgot to print the modified aliases when dumping
the JOIN nodes themselves. This results in a dump/reload hazard
for views, because the emitted query is flat-out incorrect:
Vars will be printed with table names that have no referent.
This has been wrong for a long time, so back-patch to all supported
branches.
Philip Dubé
Discussion: https://postgr.es/m/CY4PR2101MB080246F2955FF58A6ED1FEAC98140@CY4PR2101MB0802.namprd21.prod.outlook.com
ATExecAlterColumnType failed to consider the possibility that an index
that needs to be rebuilt might be a child of a constraint that needs to be
rebuilt. We missed this so far because usually a constraint index doesn't
have a direct dependency on its table, just on the constraint object.
But if there's a WHERE clause, then dependency analysis of the WHERE
clause results in direct dependencies on the column(s) mentioned in WHERE.
This led to trying to drop and rebuild both the constraint and its
underlying index.
In v11/HEAD, we successfully drop both the index and the constraint,
and then try to rebuild both, and of course the second rebuild hits a
duplicate-index-name problem. Before v11, it fails with obscure messages
about a missing relation OID, due to trying to drop the index twice.
This is essentially the same kind of problem noted in commit
20bef2c31: the possible dependency linkages are broader than what
ATExecAlterColumnType was designed for. It was probably OK when
written, but it's certainly been broken since the introduction of
partial exclusion constraints. Fix by adding an explicit check
for whether any of the indexes-to-be-rebuilt belong to any of the
constraints-to-be-rebuilt, and ignoring any that do.
In passing, fix a latent bug introduced by commit 8b08f7d48: in
get_constraint_index() we must "continue" not "break" when rejecting
a relation of a wrong relkind. This is harmless today because we don't
expect that code path to be taken anyway; but if there ever were any
relations to be ignored, the existing coding would have an extremely
undesirable dependency on the order of pg_depend entries.
Also adjust a couple of obsolete comments.
Per bug #15835 from Yaroslav Schekin. Back-patch to all supported
branches.
Discussion: https://postgr.es/m/15835-32d9b7a76c06a7a9@postgresql.org
For a non-superuser, changing a comment on a domain constraint was
leading to a cache lookup failure as the code tried to perform the
ownership lookup on the constraint OID itself, thinking that it was a
type, but this check needs to happen on the type the domain constraint
relies on. As the type a domain constraint relies on can be guessed
directly based on the constraint OID, first fetch its type OID and
perform the ownership on it.
This is broken since 7eca575, which has split the handling of comments
for table constraints and domain constraints, so back-patch down to
9.5.
Reported-by: Clemens Ladisch
Author: Daniel Gustafsson, Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/15833-808e11904835d26f@postgresql.org
Backpatch-through: 9.5
json_to_record(), when an output column is declared as type json or jsonb,
should emit the corresponding field of the input JSON object. But it got
this slightly wrong when the field is just a string literal: it failed to
escape the contents of the string. That typically resulted in syntax
errors if the string contained any double quotes or backslashes.
jsonb_to_record() handles such cases correctly, but I added corresponding
test cases for it too, to prevent future backsliding.
Improve the documentation, as it provided only a very hand-wavy
description of the conversion rules used by these functions.
Per bug report from Robert Vollmert. Back-patch to v10 where the
error was introduced (by commit cf35346e8).
Note that PG 9.4 - 9.6 also get this case wrong, but differently so:
they feed the de-escaped contents of the string literal to json[b]_in.
That behavior is less obviously wrong, so possibly it's being depended on
in the field, so I won't risk trying to make the older branches behave
like the newer ones.
Discussion: https://postgr.es/m/D6921B37-BD8E-4664-8D5F-DB3525765DCD@vllmrt.net
There were a number of issues in the recent commits which include typos,
code and comments mismatch, leftover function declarations. Fix them.
Reported-by: Alexander Lakhin
Author: Alexander Lakhin, Amit Kapila and Amit Langote
Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/ef0c0232-0c1d-3a35-63d4-0ebd06e31387@gmail.com
In commit 87259588d0ab I (Álvaro) tried to rationalize the determination
of tablespace to use for partitioned tables, but failed to handle the
default_tablespace case. Repair and add proper tests.
Author: Amit Langote, Rushabh Lathia
Reported-by: Rushabh Lathia
Reviewed-by: Amit Langote, Álvaro Herrera
Discussion: https://postgr.es/m/CAGPqQf0cYjm1=rjxk_6gU0SjUS70=yFUAdCJLwWzh9bhNJnyVg@mail.gmail.com
When performing REINDEX TABLE CONCURRENTLY, if all of the table's indexes
could not be reindexed, a NOTICE message claimed that the table had no
indexes. This was confusing, so let's change the NOTICE text to something
less confusing.
In passing, also mention in the comment before ReindexRelationConcurrently
that materialized views are supported too and also explain what the return
value of the function means.
Author: Ashwin Agrawal
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CALfoeithHvi13p_VyR8kt9o6Pa7Z=Smi6Nfc2anHnQx5Lj8bTQ@mail.gmail.com
86b85044e rewrote how COPY FROM works to allow multiple tuple buffers to
exist to once thus allowing multi-inserts to be used in more cases with
partitioned tables. That commit neglected to update the estate's
es_result_relation_info when flushing the insert buffer to the partition
making it possible for the index tuples to be added into an index on the
wrong partition.
Fix this and also add an Assert in ExecInsertIndexTuples to help ensure
that we never make this mistake again.
Reported-by: Haruka Takatsuka
Author: Ashutosh Sharma
Discussion: https://postgr.es/m/15832-b1bf336a4ee246b5@postgresql.org
This test module was not getting invoked, other than at compile time,
limiting its usefulness -- and keeping its coverage at 0%. Add a
minimal regression test to ensure it runs on make check-world; this
makes it 92% covered (line-wise), which seems sufficient.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/20190529193256.GA17603@alvherre.pgsql
When this suite runs installcheck, redirect file creations from
src/test/regress to src/bin/pg_upgrade/tmp_check/regress. This closes a
race condition in "make -j check-world". If the pg_upgrade suite wrote
to a given src/test/regress/results file in parallel with the regular
src/test/regress invocation writing it, a test failed spuriously. Even
without parallelism, in "make -k check-world", the suite finishing
second overwrote the other's regression.diffs. This revealed test
"largeobject" assuming @abs_builddir@ is getcwd(), so fix that, too.
Buildfarm client REL_10, released fifty-four days ago, supports saving
regression.diffs from its new location. When an older client reports a
pg_upgradeCheck failure, it will no longer include regression.diffs.
Back-patch to 9.5, where pg_upgrade moved to src/bin.
Reviewed (in earlier versions) by Andrew Dunstan.
Discussion: https://postgr.es/m/20181224034411.GA3224776@rfd.leadboat.com
When there were duplicate columns in the hash key list, the array
sizes could be miscomputed, resulting in access off the end of the
array. Adjust the computation to ensure the array is always large
enough.
(I considered whether the duplicates could be removed in planning, but
I can't rule out the possibility that duplicate columns might have
different hash functions assigned. Simpler to just make sure it works
at execution time regardless.)
Bug apparently introduced in fc4b3dea2 as part of narrowing down the
tuples stored in the hashtable. Reported by Colm McHugh of Salesforce,
though I didn't use their patch. Backpatch back to version 10 where
the bug was introduced.
Discussion: https://postgr.es/m/CAFeeJoKKu0u+A_A9R9316djW-YW3-+Gtgvy3ju655qRHR3jtdA@mail.gmail.com
This is still using the 2.0 version of pg_bsd_indent.
I thought it would be good to commit this separately,
so as to document the differences between 2.0 and 2.1 behavior.
Discussion: https://postgr.es/m/16296.1558103386@sss.pgh.pa.us
This code was still using the old style of forming a heap tuple rather
than using tuple slots. This would be less efficient if a non-heap
access method was used. And using tuple slots is actually quite a bit
faster when using heap as well.
Also add some test cases for generated columns with null values and
with varlena values. This lack of coverage was discovered while
working on this patch.
Discussion: https://www.postgresql.org/message-id/flat/20190331025744.ugbsyks7czfcoksd%40alap3.anarazel.de
The original coding of this view relied on a correlated IN sub-query.
Our planner is not very bright about correlated sub-queries, and even
if it were, there's no way for it to know that the output of
pg_get_publication_tables() is duplicate-free, making the de-duplicating
semantics of IN unnecessary. Hence, rewrite as a LATERAL sub-query.
This provides circa 100X speedup for me with a few hundred published
tables (the whole regression database), and things would degrade as
roughly O(published_relations * all_relations) beyond that.
Because the rules.out expected output changes, force a catversion bump.
Ordinarily we might not want to do that post-beta1; but we already know
we'll be doing a catversion bump before beta2 to fix pg_statistic_ext
issues, so it's pretty much free to fix it now instead of waiting for v13.
Per report and fix suggestion from PegoraroF10.
Discussion: https://postgr.es/m/1551385426763-0.post@n3.nabble.com
We're seeing occasional instability in the plans generated for
parallel queries on the "a_star" table hierarchy. This suggests
that something is changing the planner's stats for those tables,
but that should not be happening within a regression test run.
To try to gather some information about what's happening, insert
additional queries to check the basic page/tuple counts for these
tables, as well as whether any vacuums or analyzes have happened
on them. (We expect that only the database-wide VACUUM in
sanity_check.sql will have touched them.)
I added the probes not only in select_parallel.sql itself, but
also in stats.sql, bearing in mind that the stats collector's
lag may prevent the initial query from reporting current truth.
If any extra vacuum/analyze has happened, the recheck in stats.sql
definitely ought to see it.
This commit can be reverted once we figure out what's going on.
Per suggestion from David Rowley, though I changed the queries around.
Discussion: https://postgr.es/m/CA+hUKG+0CxrKRWRMf5ymN3gm+BECHna2B-q1w8onKBep4HasUw@mail.gmail.com
This shouldn't have been committed without even running the tests (nor
were the tests added that were suggested). I'm fixing up the results
to get the buildfarm back to green, it's quite possible we'll want to
revert this later.
For partial aggregation combine steps,
AggStatePerTrans->numTransInputs was set to the transition function's
number of inputs, rather than the combine function's number of
inputs (always 1).
That lead to partial aggregates with strict combine functions to
wrongly check for NOT NULL input as required by strictness. When the
aggregate wasn't exactly passed one argument, the strictness check was
either omitted (in the 0 args case) or too many arguments were
checked. In the latter case we'd read beyond the end of
FunctionCallInfoData->args (only in master).
AggStatePerTrans->numTransInputs actually has been wrong since since
9.6, where partial aggregates were added. But it turns out to not be
an active problem in 9.6 and 10, because numTransInputs wasn't used at
all for combine functions: Before c253b722f6 there simply was no NULL
check for the input to strict trans functions, and after that the
check was simply hardcoded for the right offset in fcinfo, as it's
done by code specific to combine functions.
In bf6c614a2f2 (11) the strictness check was generalized, with common
code doing the strictness checks for both plain and combine transition
functions, based on numTransInputs. For combine functions this lead to
not emitting an expression step to check for strict input in the 0
arguments case, and in the > 1 arguments case, we'd check too many
arguments.Due to the fact that the relevant fcinfo->isnull[2..] was
always zero-initialized (more or less by accident, by being part of
the AggStatePerTrans struct, which is palloc0'ed), there was no
observable damage in the latter case before a9c35cf85ca1f, we just
checked too many array elements.
Due to the changes in a9c35cf85ca1f, > 1 argument bug became visible,
because these days fcinfo is a) dynamically allocated without being
zeroed b) exactly the length required for the number of specified
arguments (hardcoded to 2 in this case).
This commit only contains a fairly minimal fix, setting numTransInputs
to a hardcoded 1 when building a pertrans for a combine function. It
seems likely that we'll want to clean this up further (e.g. the
arguments build_pertrans_for_aggref() aren't particularly meaningful
for combine functions). But the wrap date for 12 beta1 is coming up
fast, so it seems good to have a minimal fix in place.
Backpatch to 11. While AggStatePerTrans->numTransInputs was set
wrongly before that, the value was not used for combine functions.
Reported-By: Rajkumar Raghuwanshi
Diagnosed-By: Kyotaro Horiguchi, Jeevan Chalke, Andres Freund, David Rowley
Author: David Rowley, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/CAKcux6=uZEyWyLw0N7HtR9OBc-sWEFeByEZC7t-KDf15FKxVew@mail.gmail.com
Before this commit, when ANALYZE was run on a table and serializable
was used (either by virtue of an explicit BEGIN TRANSACTION ISOLATION
LEVEL SERIALIZABLE, or default_transaction_isolation being set to
serializable) a null pointer dereference lead to a crash.
The analyze scan doesn't need a snapshot (nor predicate locking), but
before this commit a scan only contained information about being a
bitmap or sample scan.
Refactor the option passing to the scan_begin callback to use a
bitmask instead. Alternatively we could have added a new boolean
parameter, but that seems harder to read. Even before this issue
various people (Heikki, Tom, Robert) suggested doing so.
These changes don't change the scan APIs outside of tableam. The flags
argument could be exposed, it's not necessary to fix this
problem. Also the wrapper table_beginscan* functions encapsulate most
of that complexity.
After these changes fixing the bug is trivial, just don't acquire
predicate lock for analyze style scans. That was already done for
bitmap heap scans. Add an assert that a snapshot is passed when
acquiring the predicate lock, so this kind of bug doesn't require
running with serializable.
Also add a comment about sample scans currently requiring predicate
locking the entire relation, that previously wasn't remarked upon.
Reported-By: Joe Wildish
Author: Andres Freund
Discussion:
https://postgr.es/m/4EA80A20-E9BF-49F1-9F01-5B66CAB21453@elusive.cxhttps://postgr.es/m/20190411164947.nkii4gaeilt4bui7@alap3.anarazel.dehttps://postgr.es/m/20190518203102.g7peu2fianukjuxm@alap3.anarazel.de
When this suite runs installcheck, redirect file creations from
src/test/regress to src/bin/pg_upgrade/tmp_check/regress. This closes a
race condition in "make -j check-world". If the pg_upgrade suite wrote
to a given src/test/regress/results file in parallel with the regular
src/test/regress invocation writing it, a test failed spuriously. Even
without parallelism, in "make -k check-world", the suite finishing
second overwrote the other's regression.diffs. This revealed test
"largeobject" assuming @abs_builddir@ is getcwd(), so fix that, too.
Buildfarm client REL_10, released forty-five days ago, supports saving
regression.diffs from its new location. When an older client reports a
pg_upgradeCheck failure, it will no longer include regression.diffs.
Back-patch to 9.5, where pg_upgrade moved to src/bin.
Reviewed by Andrew Dunstan.
Discussion: https://postgr.es/m/20181224034411.GA3224776@rfd.leadboat.com