During decoding of speculative inserts, we relied on the confirmation
record or the next change record to clean up the toast hash. But that
could lead to multiple problems: (a) a memory leak if there is neither a
confirmation record nor any other record after the toast insertion for a
speculative insert in the transaction, (b) errors and assertion failures
if the next operation is not an insert/update on the same table.
The fix is to start queuing the spec abort change and to clean up the
toast hash and the change record while processing it. Currently, we queue
spec aborts for both the toast and the main table even though we perform
the cleanup while processing the main table's spec abort record. Later,
if we have a way to distinguish the spec abort record of the toast table
from that of the main table, we can avoid queuing the change for spec
aborts of toast tables.
Reported-by: Ashutosh Bapat
Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 9.6, where it was introduced
Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com
Mis-merge introduced by e2f21ff, where pgwin32_setenv() was listed but
not defined in win32env.c. This had no consequences as this routine
does not exist in this branch.
Only REL_12_STABLE and REL_13_STABLE got that wrong.
Backpatch-through: 12
This fixes the code compilation on Windows with MSVC and Kerberos, as
a missing implementation of setenv() causes a compilation failure of the
GSSAPI code. This was only reproducible when building the code with
Kerberos, something that buildfarm animal hamerkop has fixed recently.
This issue only happens on 12 and 13, as this code was introduced in
b0b39f7. HEAD is already able to compile properly thanks to
7ca37fb0, and this commit is a minimal cherry-pick of it.
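As a rough illustration of what such a substitute looks like, here is a
minimal setenv()-style wrapper over the MSVC runtime; the name and the
reliance on _putenv_s() are illustrative only and not necessarily what
7ca37fb0 does:

    #include <stdlib.h>

    /* Illustrative only: POSIX-style setenv() semantics on the MSVC CRT. */
    static int
    demo_setenv(const char *name, const char *value, int overwrite)
    {
        /* Respect an existing value when overwrite is 0, as POSIX requires. */
        if (!overwrite && getenv(name) != NULL)
            return 0;

        /* _putenv_s() copies its arguments into the CRT environment. */
        return (_putenv_s(name, value) == 0) ? 0 : -1;
    }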
Thanks to Tom Lane for the discussion.
Discussion: https://postgr.es/m/YLDtm5WGjPxm6ua4@paquier.xyz
Backpatch-through: 12
COMMIT/ROLLBACK necessarily destroys all snapshots within the session.
The original implementation of intra-procedure transactions just
cavalierly did that, ignoring the fact that this left us executing in
a rather different environment than normal. In particular, it turns
out that handling of toasted datums depends rather critically on there
being an outer ActiveSnapshot: otherwise, when SPI or the core
executor pop whatever snapshot they used and return, it's unsafe to
dereference any toasted datums that may appear in the query result.
It's possible to demonstrate "no known snapshots" and "missing chunk
number N for toast value" errors as a result of this oversight.
Historically this outer snapshot has been held by the Portal code,
and that seems like a good plan to preserve. So add infrastructure
to pquery.c to allow re-establishing the Portal-owned snapshot if it's
not there anymore, and add enough bookkeeping support that we can tell
whether it is or not.
We can't, however, just re-establish the Portal snapshot as part of
COMMIT/ROLLBACK. As in normal transaction start, acquiring the first
snapshot should wait until after SET and LOCK commands. Hence, teach
spi.c about doing this at the right time. (Note that this patch
doesn't fix the problem for any PLs that try to run intra-procedure
transactions without using SPI to execute SQL commands.)
This makes SPI's no_snapshots parameter rather a misnomer, so in HEAD,
rename that to allow_nonatomic.
replication/logical/worker.c also needs some fixes, because it wasn't
careful to hold a snapshot open around AFTER trigger execution.
That code doesn't use a Portal, which I suspect someday we're gonna
have to fix. But for now, just rearrange the order of operations.
This includes back-patching the recent addition of finish_estate()
to centralize the cleanup logic there.
This also back-patches commit 2ecfeda3e into v13, to improve the
test coverage for worker.c (it was that test that exposed that
worker.c's snapshot management is wrong).
Per bug #15990 from Andreas Wicht. Back-patch to v11 where
intra-procedure COMMIT was added.
Discussion: https://postgr.es/m/15990-eee2ac466b11293d@postgresql.org
Split up CHECK_FOR_INTERRUPTS() to provide an additional macro
INTERRUPTS_PENDING_CONDITION(), which just tests whether an
interrupt is pending without attempting to service it. This is
useful in situations where the caller knows that interrupts are
blocked, and would like to find out if it's worth the trouble
to unblock them.
Also add INTERRUPTS_CAN_BE_PROCESSED(), which indicates whether
CHECK_FOR_INTERRUPTS() can be relied on to clear the pending interrupt.
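To sketch the shape of the CHECK_FOR_INTERRUPTS() split, using stand-in
names and a stand-in flag rather than the actual miscadmin.h definitions:

    #include <stdbool.h>

    /* Stand-ins for the real interrupt flag and service routine. */
    static volatile bool DemoInterruptPending = false;
    static void DemoProcessInterrupts(void) { /* service the interrupt */ }

    /* Cheap test: is an interrupt waiting to be serviced? */
    #define DEMO_INTERRUPTS_PENDING_CONDITION() (DemoInterruptPending)

    /* Full check: service the interrupt if one is pending. */
    #define DEMO_CHECK_FOR_INTERRUPTS() \
        do { \
            if (DEMO_INTERRUPTS_PENDING_CONDITION()) \
                DemoProcessInterrupts(); \
        } while (0)

A caller that knows interrupts are blocked can poll the pending-condition
macro cheaply, and only go to the expense of unblocking and running the
full check when it reports that an interrupt is waiting.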
This commit doesn't actually add any uses of the new macros,
but a follow-on bug fix will do so. Back-patch to all supported
branches to provide infrastructure for that fix.
Alvaro Herrera and Tom Lane
Discussion: https://postgr.es/m/20210513155351.GA7848@alvherre.pgsql
The worker.c global wrconn is only meant to be used by logical apply/
tablesync workers, but there are other variables with the same name. To
reduce future confusion rename the global from "wrconn" to
"LogRepWorkerWalRcvConn".
While this is just cosmetic, it seems better to backpatch it all the way
back to 10 where this code appeared, to avoid future backpatching
issues.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://postgr.es/m/CAHut+Pu7Jv9L2BOEx_Z0UtJxfDevQSAUW2mJqWU+CtmDrEZVAg@mail.gmail.com
It's unusual to have any resjunk columns in an ON CONFLICT ... UPDATE
list, but it can happen when MULTIEXPR_SUBLINK SubPlans are present.
If it happens, the ON CONFLICT UPDATE code path would end up storing
tuples that include the values of the extra resjunk columns. That's
fairly harmless in the short run, but if new columns are added to
the table then the values would become accessible, possibly leading
to malfunctions if they don't match the datatypes of the new columns.
This had escaped notice through a confluence of missing sanity checks,
including
* There's no cross-check that a tuple presented to heap_insert or
heap_update matches the table rowtype. While it's difficult to
check that fully at reasonable cost, we can easily add assertions
that there aren't too many columns.
* The output-column-assignment cases in execExprInterp.c lacked
any sanity checks on the output column numbers, which seems like
an oversight considering there are plenty of assertion checks on
input column numbers. Add assertions there too.
* We failed to apply nodeModifyTable's ExecCheckPlanOutput() to
the ON CONFLICT UPDATE tlist. That wouldn't have caught this
specific error, since that function is chartered to ignore resjunk
columns; but it sure seems like a bad omission now that we've seen
this bug.
In HEAD, the right way to fix this is to make the processing of
ON CONFLICT UPDATE tlists work the same as regular UPDATE tlists
now do, that is don't add "SET x = x" entries, and use
ExecBuildUpdateProjection to evaluate the tlist and combine it with
old values of the not-set columns. This adds a little complication
to ExecBuildUpdateProjection, but allows removal of a comparable
amount of now-dead code from the planner.
In the back branches, the most expedient solution seems to be to
(a) use an output slot for the ON CONFLICT UPDATE projection that
actually matches the target table, and then (b) invent a variant of
ExecBuildProjectionInfo that can be told to not store values resulting
from resjunk columns, so it doesn't try to store into nonexistent
columns of the output slot. (We can't simply ignore the resjunk columns
altogether; they have to be evaluated for MULTIEXPR_SUBLINK to work.)
This works back to v10. In 9.6, projections work much differently and
we can't cheaply give them such an option. The 9.6 version of this
patch works by inserting a JunkFilter when it's necessary to get rid
of resjunk columns.
In addition, v11 and up have the reverse problem when trying to
perform ON CONFLICT UPDATE on a partitioned table. Through a
further oversight, adjust_partition_tlist() discarded resjunk columns
when re-ordering the ON CONFLICT UPDATE tlist to match a partition.
This accidentally prevented the storing-bogus-tuples problem, but
at the cost that MULTIEXPR_SUBLINK cases didn't work, typically
crashing if more than one row has to be updated. Fix by preserving
resjunk columns in that routine. (I failed to resist the temptation
to add more assertions there too, and to do some minor code
beautification.)
Per report from Andres Freund. Back-patch to all supported branches.
Security: CVE-2021-32028
While we were (mostly) careful about ensuring that the dimensions of
arrays aren't large enough to cause integer overflow, the lower bound
values were generally not checked. This allows situations where
lower_bound + dimension overflows an integer. It seems that that's
harmless so far as array reading is concerned, except that array
elements with subscripts notionally exceeding INT_MAX are inaccessible.
However, it confuses various array-assignment logic, resulting in a
potential for memory stomps.
Fix by adding checks that array lower bounds aren't large enough to
cause lower_bound + dimension to overflow. (Note: this results in
disallowing cases where the last subscript position would be exactly
INT_MAX. In principle we could probably allow that, but there's a lot
of code that computes lower_bound + dimension and would need adjustment.
It seems doubtful that it's worth the trouble/risk to allow it.)
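The shape of the added check, as a standalone sketch (the name is
illustrative; in the tree the check is applied per dimension alongside
the existing size checks):

    #include <limits.h>
    #include <stdbool.h>

    /*
     * Return true if an array dimension with lower bound 'lb' and 'dim'
     * elements satisfies lb + dim <= INT_MAX, i.e. the last subscript
     * stays strictly below INT_MAX, written so the test itself cannot
     * overflow. 'dim' is assumed non-negative and already validated
     * against the dimension-size limits.
     */
    static bool
    demo_array_bound_ok(int lb, int dim)
    {
        return dim == 0 || lb <= INT_MAX - dim;
    }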
Somewhat independently of that, array_set_element() was careless
about possible overflow when checking the subscript of a fixed-length
array, creating a different route to memory stomps. Fix that too.
Security: CVE-2021-32027
Previously, get_cached_rowtype() cached a pointer to a reference-counted
tuple descriptor from the typcache, relying on the ExprContextCallback
mechanism to release the tupdesc refcount when the expression tree
using the tupdesc was destroyed. This worked fine when it was designed,
but the introduction of within-DO-block COMMITs broke it. The refcount
is logged in a transaction-lifespan resource owner, but plpgsql won't
destroy simple expressions made within the DO block (before its first
commit) until the DO block is exited. That results in a warning about
a leaked tupdesc refcount when the COMMIT destroys the original resource
owner, and then an error about the active resource owner not holding a
matching refcount when the expression is destroyed.
To fix, get rid of the need to have a shutdown callback at all, by
instead caching a pointer to the relevant typcache entry. Those
survive for the life of the backend, so we needn't worry about the
pointer becoming stale. (For registered RECORD types, we can still
cache a pointer to the tupdesc, knowing that it won't change for the
life of the backend.) This mechanism has been in use in plpgsql
and expandedrecord.c since commit 4b93f5799, and seems to work well.
This change requires modifying the ExprEvalStep structs used by the
relevant expression step types, which is slightly worrisome for
back-patching. However, there seems no good reason for extensions
to be familiar with the details of these particular sub-structs.
Per report from Rohit Bhogate. Back-patch to v11 where within-DO-block
COMMITs became a thing.
Discussion: https://postgr.es/m/CAAV6ZkQRCVBh8qAY+SZiHnz+U+FqAGBBDaDTjF2yiKa2nJSLKg@mail.gmail.com
A couple of error messages and comments used 'statistic kind' instead of
the correct 'statistics kind'. Fix and backpatch all the way back to 10,
where extended statistics were introduced.
Backpatch-through: 10
The text search cache mechanisms assume that we can clean up
an invalidated dictionary cache entry simply by resetting the
associated long-lived memory context. However, that does not work
for ispell affixes that make use of regular expressions, because
the regex library deals in plain old malloc. Hence, we leaked
compiled regex(es) any time we dropped such a cache entry. That
could quickly add up, since even a fairly trivial regex can use up
tens of kB, and a large one can eat megabytes. Add a memory context
callback to ensure that a regex gets freed when its owning cache
entry is cleared.
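The pattern looks roughly like this sketch (assuming the usual backend
headers; the struct and function names around the regex are illustrative,
not the exact spell.c code):

    #include "postgres.h"
    #include "regex/regex.h"
    #include "utils/memutils.h"

    /* Illustrative holder for one compiled affix regex. */
    typedef struct DemoRegexNode
    {
        regex_t     regex;          /* compiled regex; its memory is malloc'd */
        MemoryContextCallback cb;   /* fired when the owning context is reset */
    } DemoRegexNode;

    static void
    demo_regex_cleanup(void *arg)
    {
        /* Return the malloc'ed regex memory when the cache entry goes away. */
        pg_regfree((regex_t *) arg);
    }

    static void
    demo_register_regex_cleanup(MemoryContext dictCtx, DemoRegexNode *node)
    {
        node->cb.func = demo_regex_cleanup;
        node->cb.arg = &node->regex;
        MemoryContextRegisterResetCallback(dictCtx, &node->cb);
    }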
Found via valgrind testing.
This problem is ancient, so back-patch to all supported branches.
Discussion: https://postgr.es/m/3816764.1616104288@sss.pgh.pa.us
With very unlucky timing and parallel_leader_participation off, PHJ
could attempt to access per-batch state just as it was being freed.
There was code intended to prevent that by checking for a cleared
pointer, but it was buggy.
Fix, by introducing an extra barrier phase. The new phase
PHJ_BUILD_RUNNING means that it's safe to access the per-batch state to
find a batch to help with, and PHJ_BUILD_DONE means that it is too late.
The last to detach will free the array of per-batch state as before, but
now it will also atomically advance the phase at the same time, so that
late attachers can avoid the hazard, without the data race. This
mirrors the way per-batch hash tables are freed (see phases
PHJ_BATCH_PROBING and PHJ_BATCH_DONE).
Revealed by a one-off build farm failure, where BarrierAttach() failed a
sanity check assertion, because the memory had been clobbered by
dsa_free().
Back-patch to 11, where the code arrived.
Reported-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/20200929061142.GA29096%40paquier.xyz
If ExecGetInsertedCols(), ExecGetUpdatedCols() or ExecGetExtraUpdatedCols()
were called with a ResultRelInfo that's not in the range table and isn't a
partition routing target, the functions would dereference a NULL pointer,
relinfo->ri_RootResultRelInfo. Such ResultRelInfos are created when firing
RI triggers in tables that are not modified directly. None of the current
callers of these functions pass such relations, so this isn't a live bug,
but let's make them more robust.
Also update comment in ResultRelInfo; after commit 6214e2b228,
ri_RangeTableIndex is zero for ResultRelInfos created for partition tuple
routing.
Noted by Coverity. Backpatch down to v11, like commit 6214e2b228.
Reviewed-by: Tom Lane, Amit Langote
FreeBSD 13 gained O_DSYNC, which would normally cause wal_sync_method to
choose open_datasync as its default value. That may not be a good
choice for all systems, and performs worse than fdatasync in some
scenarios. Let's preserve the existing default behavior for now.
Like commit 576477e73c4, which did the same for Linux, back-patch to all
supported releases.
Discussion: https://postgr.es/m/CA%2BhUKGLsAMXBQrCxCXoW-JsUYmdOL8ALYvaX%3DCrHqWxm-nWbGA%40mail.gmail.com
Modern gcc and clang compilers offer alignment sanitizers, which help to detect
pointer misalignment. However, our codebase already contains x86-specific
crc32 computation code that uses unaligned access. Thankfully, those
compilers also support an attribute that disables alignment sanitizers at
the function level. This commit adds pg_attribute_no_sanitize_alignment(),
which wraps this attribute, and applies it to the pg_comp_crc32c_sse42() function.
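The idea, sketched with illustrative names (the committed macro in c.h may
spell the conditions differently):

    /*
     * Wrap the compiler attribute that turns the alignment sanitizer off
     * for one function, when the compiler knows it (recent gcc/clang).
     */
    #if defined(__has_attribute)
    #if __has_attribute(no_sanitize)
    #define demo_no_sanitize_alignment() __attribute__((no_sanitize("alignment")))
    #endif
    #endif
    #ifndef demo_no_sanitize_alignment
    #define demo_no_sanitize_alignment()
    #endif

    /* Example use on a function that deliberately does unaligned loads. */
    demo_no_sanitize_alignment()
    unsigned int
    demo_load_unaligned(const unsigned char *p)
    {
        return *(const unsigned int *) p;   /* possibly misaligned access */
    }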
Back-patch of commits 993bdb9f9 and ad2ad698a, to enable doing
alignment testing in all supported branches.
Discussion: https://postgr.es/m/CAPpHfdsne3%3DT%3DfMNU45PtxdhSL_J2PjLTeS8rwKnJzUR4YNd4w%40mail.gmail.com
Discussion: https://postgr.es/m/475514.1612745257%40sss.pgh.pa.us
Author: Alexander Korotkov, revised by Tom Lane
Reviewed-by: Tom Lane
For an index, attstattarget can be updated using ALTER INDEX SET
STATISTICS. This data was lost on the new index after REINDEX
CONCURRENTLY.
The update of this field is done when the old and new indexes are
swapped, to make the fix back-patchable. Another approach we could look
at in the long term is to change index_create() to pass the wanted
values of attstattarget when creating the new relation, but, as this
would cause an ABI breakage, it can be done only on HEAD.
Reported-by: Ronan Dunklau
Author: Michael Paquier
Reviewed-by: Ronan Dunklau, Tomas Vondra
Discussion: https://postgr.es/m/16628084.uLZWGnKmhe@laptop-ronand
Backpatch-through: 12
If a cross-partition UPDATE violates a constraint on the target partition,
and the columns in the new partition are in different physical order than
in the parent, the error message can reveal columns that the user does not
have SELECT permission on. A similar bug was fixed earlier in commit
804b6b6db4.
The cause of the bug is that the callers of the
ExecBuildSlotValueDescription() function got confused when constructing
the list of modified columns. If the tuple was routed from a parent, we
converted the tuple to the parent's format, but the list of modified
columns was grabbed directly from the child's RTE entry.
ExecUpdateLockMode() had a similar issue. That led to confusion about which
columns are key columns, leading to the wrong tuple lock being taken on
tables referenced by foreign keys, when a row is updated with INSERT ON
CONFLICT UPDATE. A new isolation test is added for that corner case.
With this patch, the ri_RangeTableIndex field is no longer set for
partitions that don't have an entry in the range table. Previously, it was
set to the RTE entry of the parent relation, but that was confusing.
NOTE: This modifies the ResultRelInfo struct, replacing the
ri_PartitionRoot field with ri_RootResultRelInfo. That's a bit risky to
backpatch, because it breaks any extensions accessing the field. The
change that ri_RangeTableIndex is not set for partitions could potentially
break extensions, too. The ResultRelInfos are visible to FDWs at least,
and this patch required small changes to postgres_fdw. Nevertheless, this
seems like the least bad option. I don't think these fields are widely used
in extensions; I don't think there are FDWs out there that use the FDW
"direct update" API, other than postgres_fdw. If there are, you will get a
compilation error, so hopefully it is caught quickly.
Backpatch to 11, where support for both cross-partition UPDATEs, and unique
indexes on partitioned tables, were added.
Reviewed-by: Amit Langote
Security: CVE-2021-3393
In a cluster having used CREATE INDEX CONCURRENTLY while having enabled
prepared transactions, queries that use the resulting index can silently
fail to find rows. Fix this for future CREATE INDEX CONCURRENTLY by
making it wait for prepared transactions like it waits for ordinary
transactions. This expands the VirtualTransactionId structure domain to
admit prepared transactions. It may be necessary to reindex to recover
from past occurrences. Back-patch to 9.5 (all supported versions).
Andrey Borodin, reviewed (in earlier versions) by Tom Lane and Michael
Paquier.
Discussion: https://postgr.es/m/2E712143-97F7-4890-B470-4A35142ABC82@yandex-team.ru
perform_pruning_combine_step() was not taught about the number of
partition indexes used in hash partitioning; more embarrassingly,
get_matching_hash_bounds() also had it wrong. These errors are masked
in the common case where all the partitions have the same modulus
and no partition is missing. However, with missing or unequal-size
partitions, we could erroneously prune some partitions that need
to be scanned, leading to silently wrong query answers.
While a minimal-footprint fix for this could be to export
get_partition_bound_num_indexes and make the incorrect functions use it,
I'm of the opinion that that function should never have existed in the
first place. It's not reasonable data structure design that
PartitionBoundInfoData lacks any explicit record of the length of
its indexes[] array. Perhaps that was all right when it could always
be assumed equal to ndatums, but something should have been done about
it as soon as that stopped being true. Putting in an explicit
"nindexes" field makes both partition_bounds_equal() and
partition_bounds_copy() simpler, safer, and faster than before,
and removes explicit knowledge of the number-of-partition-indexes
rules from some other places too.
This change also makes get_hash_partition_greatest_modulus obsolete.
I left that in place in case any external code uses it, but no core
code does anymore.
Per bug #16840 from Michał Albrycht. Back-patch to v11 where the
hash partitioning code came in. (In the back branches, add the new
field at the end of PartitionBoundInfoData to minimize ABI risks.)
Discussion: https://postgr.es/m/16840-571a22976f829ad4@postgresql.org
Previously, pull_varnos() took the relids of a PlaceHolderVar as being
equal to the relids in its contents, but that fails to account for the
possibility that we have to postpone evaluation of the PHV due to outer
joins. This could result in a malformed plan. The known cases end up
triggering the "failed to assign all NestLoopParams to plan nodes"
sanity check in createplan.c, but other symptoms may be possible.
The right value to use is the join level we actually intend to evaluate
the PHV at. We can get that from the ph_eval_at field of the associated
PlaceHolderInfo. However, there are some places that call pull_varnos()
before the PlaceHolderInfos have been created; in that case, fall back
to the conservative assumption that the PHV will be evaluated at its
syntactic level. (In principle this might result in missing some legal
optimization, but I'm not aware of any cases where it's an issue in
practice.) Things are also a bit ticklish for calls occurring during
deconstruct_jointree(), but AFAICS the ph_eval_at fields should have
reached their final values by the time we need them.
The main problem in making this work is that pull_varnos() has no
way to get at the PlaceHolderInfos. We can fix that easily, if a
bit tediously, in HEAD by passing it the planner "root" pointer.
In the back branches that'd cause an unacceptable API/ABI break for
extensions, so leave the existing entry points alone and add new ones
with the additional parameter. (If an old entry point is called and
encounters a PHV, it'll fall back to using the syntactic level,
again possibly missing some valid optimization.)
Back-patch to v12. The computation is surely also wrong before that,
but it appears that we cannot reach a bad plan thanks to join order
restrictions imposed on the subquery that the PlaceHolderVar came from.
The error only became reachable when commit 4be058fe9 allowed trivial
subqueries to be collapsed out completely, eliminating their join order
restrictions.
Per report from Stephan Springl.
Discussion: https://postgr.es/m/171041.1610849523@sss.pgh.pa.us
Every core SLRU wraps around. With the exception of pg_notify, the wrap
point can fall in the middle of a page. Account for this in the
PagePrecedes callback specification and in SimpleLruTruncate()'s use of
said callback. Update each callback implementation to fit the new
specification. This changes SerialPagePrecedesLogically() from the
style of asyncQueuePagePrecedes() to the style of CLOGPagePrecedes().
(Whereas pg_clog and pg_serial share a key space, pg_serial is nothing
like pg_notify.) The bug fixed here has the same symptoms and user
followup steps as 592a589a04bd456410b853d86bd05faa9432cbbb. Back-patch
to 9.5 (all supported versions).
Reviewed by Andrey Borodin and (in earlier versions) by Tom Lane.
Discussion: https://postgr.es/m/20190202083822.GC32531@gust.leadboat.com
When a tablespace is used in a partitioned relation (per commits
ca4103025dfe in pg12 for tables and 33e6c34c3267 in pg11 for indexes),
it is possible to drop the tablespace, potentially causing various
problems. One such was reported in bug #16577, where a rewriting ALTER
TABLE causes a server crash.
Protect against this by using pg_shdepend to keep track of tablespaces
used by relations that don't keep physical files; we now abort a
tablespace drop if we see that the tablespace is referenced from any
partitioned relations.
Backpatch this to 11, where this problem has been latent all along. We
don't try to create pg_shdepend entries for existing partitioned
indexes/tables, but any ones that are modified going forward will be
protected.
Note slight behavior change: when trying to drop a tablespace that
contains both regular tables as well as partitioned ones, you'd
previously get ERRCODE_OBJECT_NOT_IN_PREREQUISITE_STATE and now you'll
get ERRCODE_DEPENDENT_OBJECTS_STILL_EXIST. Arguably, the latter is more
correct.
It is possible to add protecting pg_shdepend entries for existing
tables/indexes, by doing
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE pg_default;
ALTER TABLE ONLY some_partitioned_table SET TABLESPACE original_tablespace;
for each partitioned table/index that is not in the database default
tablespace. Because these partitioned objects do not have storage, no
file needs to be actually moved, so it shouldn't take more time than
what's required to acquire locks.
This query can be used to search for such relations:
SELECT ... FROM pg_class WHERE relkind IN ('p', 'I') AND reltablespace <> 0
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/16577-881633a9f9894fd5@postgresql.org
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Deadlocks involving the recovery conflict on lock can happen between
hot-standby backends and the startup process.
If a backend takes an access exclusive lock on a table and that lock
finally triggers the deadlock, the deadlock can be detected
as expected. On the other hand, previously, if the startup process
took an access exclusive lock and that lock finally triggered the
deadlock, that deadlock could not be detected and could remain even after
deadlock_timeout passed. This is a bug.
The cause of this bug was that the code for handling the recovery
conflict on lock didn't take care of the deadlock case at all. It assumed
that deadlocks involving the startup process and backends could be
detected by the deadlock detector invoked within the backends.
But this assumption was incorrect. The startup process also should
have invoked the deadlock detector if necessary.
To fix this bug, this commit makes the startup process invoke
the deadlock detector if deadlock_timeout is reached while handling
the recovery conflict on lock. Specifically, in that case, the startup
process requests all the backends holding the conflicting locks to
check themselves for deadlocks.
Back-patch to v9.6. v9.5 also has this bug, but per discussion we decided
not to back-patch the fix to v9.5, because v9.5 lacks some of the
infrastructure code (e.g., 37c54863cf) that this bug fix depends on.
We could back-patch that code as well, but since the next minor
version release is the final one for v9.5, it's risky to do that. If we
unexpectedly introduced a new bug into v9.5 by the back-patch, there would
be no chance to fix it. We determined that the back-patch to v9.5 would
carry more risk than gain.
Author: Fujii Masao
Reviewed-by: Bertrand Drouvot, Masahiko Sawada, Kyotaro Horiguchi
Discussion: https://postgr.es/m/4041d6b6-cf24-a120-36fa-1294220f8243@oss.nttdata.com
Unrecoverable errors detected by GSSAPI encryption can't just be
reported with elog(ERROR) or elog(FATAL), because attempting to
send the error report to the client is likely to lead to infinite
recursion or loss of protocol sync. Instead make this code do what
the SSL encryption code has long done, which is to just report any
such failure to the server log (with elevel COMMERROR), then pretend
we've lost the connection by returning errno = ECONNRESET.
Along the way, fix confusion about whether message translation is done
by pg_GSS_error() or its callers (the latter should do it), and make
the backend version of that function work more like the frontend
version.
Avoid allocating the port->gss struct until it's needed; we surely
don't need to allocate it in the postmaster.
Improve logging of "connection authorized" messages with GSS enabled.
(As part of this, I back-patched the code changes from dc11f31a1.)
Make BackendStatusShmemSize() account for the GSS-related space that
will be allocated by CreateSharedBackendStatus(). This omission
could possibly cause out-of-shared-memory problems with very high
max_connections settings.
Remove arbitrary, pointless restriction that only GSS authentication
can be used on a GSS-encrypted connection.
Improve documentation; notably, document the fact that libpq now
prefers GSS encryption over SSL encryption if both are possible.
Per report from Mikael Gustavsson. Back-patch to v12 where
this code was introduced.
Discussion: https://postgr.es/m/e5b0b6ed05764324a2f3fe7acfc766d5@smhi.se
If a database shutdown (smart or fast) is commanded between the time
some process decides to request a new background worker and the time
that the postmaster can launch that worker, then nothing happens
because the postmaster won't launch any bgworkers once it's exited
PM_RUN state. This is fine ... unless the requesting process is
waiting for that worker to finish (or even for it to start); in that
case the requestor is stuck, and only manual intervention will get us
to the point of being able to shut down.
To fix, cancel pending requests for workers when the postmaster sends
shutdown (SIGTERM) signals, and similarly cancel any new requests that
arrive after that point. (We can optimize things slightly by only
doing the cancellation for workers that have waiters.) To fit within
the existing bgworker APIs, the "cancel" is made to look like the
worker was started and immediately stopped, causing deregistration of
the bgworker entry. Waiting processes would have to deal with
premature worker exit anyway, so this should introduce no bugs that
weren't there before. We do have a side effect that registration
records for restartable bgworkers might disappear when theoretically
they should have remained in place; but since we're shutting down,
that shouldn't matter.
Back-patch to v10. There might be value in putting this into 9.6
as well, but the management of bgworkers is a bit different there
(notably see 8ff518699) and I'm not convinced it's worth the effort
to validate the patch for that branch.
Discussion: https://postgr.es/m/661570.1608673226@sss.pgh.pa.us
As it stood, expandTableLikeClause() re-did the same relation_openrv
call that transformTableLikeClause() had done. However there are
scenarios where this would not find the same table as expected.
We hold lock on the LIKE source table, so it can't be renamed or
dropped, but another table could appear before it in the search path.
This explains the odd behavior reported in bug #16758 when cloning a
table as a temp table of the same name. This case worked as expected
before commit 502898192 introduced the need to open the source table
twice, so we should fix it.
To make really sure we get the same table, let's re-open it by OID not
name. That requires adding an OID field to struct TableLikeClause,
which is a little nervous-making from an ABI standpoint, but as long
as it's at the end I don't think there's any serious risk.
Per bug #16758 from Marc Boeren. Like the previous patch,
back-patch to all supported branches.
Discussion: https://postgr.es/m/16758-840e84a6cfab276d@postgresql.org
Previously this code assumed that all IndexScan nodes supported
mark/restore, which is not true since it depends on optional index AM
support functions. This could lead to errors about missing support
functions in rare edge cases of mergejoins with no sort keys, where an
unordered non-btree index scan was placed on the inner path without a
protecting Materialize node. (Normally, the fact that merge join
requires ordered input would avoid this error.)
Backpatch all the way since this bug is ancient.
Per report from Eugen Konkov on irc.
Discussion: https://postgr.es/m/87o8jn50be.fsf@news-spur.riddles.org.uk
Introduce TimestampDifferenceMilliseconds() to simplify callers
that would rather have the difference in milliseconds, instead of
the select()-oriented seconds-and-microseconds format. This gets
rid of at least one integer division per call, and it eliminates
some apparently-easy-to-mess-up arithmetic.
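For reference, the arithmetic being centralized, as a stand-in sketch
(the real TimestampDifferenceMilliseconds() takes the two TimestampTz
values directly and returns milliseconds):

    /*
     * Convert a (seconds, microseconds) difference, as filled in by
     * TimestampDifference(), into whole milliseconds. Stand-in code;
     * callers should not have to repeat this by hand.
     */
    static long
    demo_diff_to_msec(long secs, int microsecs)
    {
        return secs * 1000 + microsecs / 1000;
    }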
Two of these call sites were in fact wrong:
* pg_prewarm's autoprewarm_main() forgot to multiply the seconds
by 1000, thus ending up with a delay 1000X shorter than intended.
That doesn't quite make it a busy-wait, but close.
* postgres_fdw's pgfdw_get_cleanup_result() thought it needed to compute
microseconds not milliseconds, thus ending up with a delay 1000X longer
than intended. Somebody along the way had noticed this problem but
misdiagnosed the cause, and imposed an ad-hoc 60-second limit rather
than fixing the units. This was relatively harmless in context, because
we don't care that much about exactly how long this delay is; still,
it's wrong.
There are a few more callers of TimestampDifference() that don't
have a direct need for seconds-and-microseconds, but can't use
TimestampDifferenceMilliseconds() either because they do need
microsecond precision or because they might possibly deal with
intervals long enough to overflow 32-bit milliseconds. It might be
worth inventing another API to improve that, but that seems outside
the scope of this patch; so those callers are untouched here.
Given the fact that we are fixing some bugs, and the likelihood
that future patches might want to back-patch code that uses this
new API, back-patch to all supported branches.
Alexey Kondratov and Tom Lane
Discussion: https://postgr.es/m/3b1c053a21c07c1ed5e00be3b2b855ef@postgrespro.ru
This back-patches commit 20d3fe900 into the v12 and v13 branches.
At the time I thought that commit was not fixing any observable
bug, but Bertrand Drouvot showed otherwise: adding a dropped column
to the previously-considered scenario crashes v12 and v13, unless the
dropped column happens to be an integer. That is, of course, because
the tupdesc we derive from the plan output tlist fails to describe
the dropped column accurately, so that we'll do the wrong thing with
a tuple in which that column isn't NULL.
There is no bug in pre-v12 branches because they already did use
the table's real tuple descriptor for any trigger-returned tuple.
It seems that this set of bugs can be blamed on the changes that
removed es_trig_tuple_slot, though I've not attempted to pin that
down precisely.
Although there's no code change needed in HEAD, update the test case
to include a dropped column there too.
Discussion: https://postgr.es/m/db5d97c8-f48a-51e2-7b08-b73d5434d425@amazon.com
Discussion: https://postgr.es/m/16644-5da7ef98a7ac4545@postgresql.org
fill_hba_line() thought it could get away with passing sizeof(struct
sockaddr_storage) rather than the actual addrlen previously returned
by getaddrinfo(). While that appears to work on many platforms,
it does not work on FreeBSD 11: you get back a failure, which leads
to the view showing NULL for the address and netmask columns in all
rows. The POSIX spec for getnameinfo() is pretty clearly on
FreeBSD's side here: you should pass the actual address length.
So it seems plausible that there are other platforms where this
coding also fails, and we just hadn't noticed.
Also, IMO the fact that getnameinfo() failure leads to a NULL output
is pretty bogus in itself. Our pg_getnameinfo_all() wrapper is
careful to emit "???" on failure, and we should use that in such
cases. NULL should only be emitted in rows that don't have IP
addresses.
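For illustration, a self-contained sketch of the correct calling pattern,
passing the ai_addrlen returned by getaddrinfo() rather than
sizeof(struct sockaddr_storage); the function name is illustrative and the
fallback string mirrors the description above:

    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <sys/socket.h>
    #include <netdb.h>

    /* Print a host's numeric address, the portable way. */
    static int
    demo_print_numeric_addr(const char *host)
    {
        struct addrinfo hints;
        struct addrinfo *res;
        char        addrbuf[128];

        memset(&hints, 0, sizeof(hints));
        hints.ai_family = AF_UNSPEC;
        hints.ai_socktype = SOCK_STREAM;

        if (getaddrinfo(host, NULL, &hints, &res) != 0)
            return -1;

        /* Pass the real address length, not sizeof(struct sockaddr_storage). */
        if (getnameinfo(res->ai_addr, res->ai_addrlen,
                        addrbuf, sizeof(addrbuf), NULL, 0,
                        NI_NUMERICHOST) != 0)
            strcpy(addrbuf, "???");     /* same fallback pg_getnameinfo_all uses */

        printf("%s\n", addrbuf);
        freeaddrinfo(res);
        return 0;
    }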
Per bug #16695 from Peter Vandivier. Back-patch to v10 where this
code was added.
Discussion: https://postgr.es/m/16695-a665558e2f630be7@postgresql.org
This is useful for checking relation pages without having to load them
into shared buffers, and two cases can make use of that: page
verification in base backups and its online, lock-safe flavor.
Compatibility is kept with past versions using a routine that calls the
new extended routine with the set of options compatible with the
original version. Contrary to d401c576, a macro cannot be used as there
may be external code relying on the presence of the original routine.
This is applied down to 11, where this will be used by a follow-up
commit addressing a set of issues with page verification in base
backups.
Extracted from a larger patch by the same author.
Author: Anastasia Lubennikova
Reviewed-by: Michael Paquier, Julien Rouhaud
Discussion: https://postgr.es/m/608f3476-0598-2514-2c03-e05c7d2b0cbd@postgrespro.ru
Backpatch-through: 11
Statistics associated with an index got lost after running REINDEX
CONCURRENTLY, while the non-concurrent case preserves these correctly.
The concurrent and non-concurrent operations need to be consistent for
the end-user, and missing statistics would force the user to wait for a new
analyze to happen, which could take some time depending on the activity
of the existing autovacuum workers. This issue is fixed by copying any
existing entries in pg_statistic associated with the old index to the new
one. Note that this copy is already done with the data of the index in
the stats collector.
Reported-by: Fabrízio de Royes Mello
Author: Michael Paquier, Fabrízio de Royes Mello
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/CAFcNs+qpFPmiHd1oTXvcPdvAHicJDA9qBUSujgAhUMJyUMb+SA@mail.gmail.com
Backpatch-through: 12
It's unsafe to do this at parse time because addition of generated
columns to a table would not invalidate stored rules containing
UPDATEs on the table ... but there might now be dependent generated
columns that were not there when the rule was made. This also fixes
an oversight that rewriteTargetView failed to update extraUpdatedCols
when transforming an UPDATE on an updatable view. (Since the new
calculation is downstream of that, rewriteTargetView doesn't actually
need to do anything; but before, there was a demonstrable bug there.)
In v13 and HEAD, this leads to easily-visible bugs because (since
commit c6679e4fc) we won't recalculate generated columns that aren't
listed in extraUpdatedCols. In v12 this bitmap is mostly just used
for trigger-firing decisions, so you'd only notice a problem if a
trigger cared whether a generated column had been updated.
I'd complained about this back in May, but then forgot about it
until bug #16671 from Michael Paul Killian revived the issue.
Back-patch to v12 where this field was introduced. If existing
stored rules contain any extraUpdatedCols values, they'll be
ignored because the rewriter will overwrite them, so the bug will
be fixed even for existing rules. (But note that if someone were
to update to 13.1 or 12.5, store some rules with UPDATEs on tables
having generated columns, and then downgrade to a prior minor version,
they might observe issues similar to what this patch fixes. That
seems unlikely enough to not be worth going to a lot of effort to fix.)
Discussion: https://postgr.es/m/10206.1588964727@sss.pgh.pa.us
Discussion: https://postgr.es/m/16671-2fa55851859fb166@postgresql.org
Unfortunately in LLVM 3.9 LLVMGetAttributeCountAtIndex(func, index)
crashes when called with an index that has 0 attributes. Since there's
no way to work around this in the C API, add a small C++ wrapper doing
so.
The only reason this didn't fail before 72559438f92 is that there
always are function attributes...
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20201016001254.w2nfj7gd74jmb5in@alap3.anarazel.de
Backpatch: 11-, like 72559438f92
POSIX sigaction(2) can be told to block a set of signals while a
signal handler executes. Make use of that instead of manually
blocking and unblocking signals in the postmaster's signal handlers.
This should save a few cycles, but more importantly it prevents
recursive invocation of signal handlers when many signals arrive in
close succession. (Assuming that the platform's signal infrastructure
is designed to avoid consuming stack space in that case, but this is
demonstrably true at least on Linux.) The existing code has been seen
to recurse to the point of stack overflow, either in the postmaster
or in a forked-off child.
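A minimal standalone sketch of the technique; the handler body and the
choice of SIGUSR1 are illustrative:

    #include <signal.h>
    #include <string.h>

    static volatile sig_atomic_t demo_got_signal = 0;

    static void
    demo_handler(int signo)
    {
        (void) signo;
        demo_got_signal = 1;        /* real handlers do more work than this */
    }

    static void
    demo_install_handler(void)
    {
        struct sigaction act;

        memset(&act, 0, sizeof(act));
        act.sa_handler = demo_handler;
        /* Block the other catchable signals while the handler executes. */
        sigfillset(&act.sa_mask);
        act.sa_flags = SA_RESTART;
        sigaction(SIGUSR1, &act, NULL);
    }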
Back-patch of commit 9abb2bfc0. At the time, we'd only seen excess
postmaster stack consumption in the buildfarm; but we now have a
user report of it, and that commit has aged enough to have a fair
amount of confidence that it doesn't break anything.
This still doesn't change anything about the way that it works on
Windows. Perhaps someone else would like to fix that?
Per bug #16673 from David Geier. Back-patch to 9.6. Although
the problem exists in principle before that, we've only seen it
actually materialize in connection with heavy use of parallel
workers, so it doesn't seem necessary to do anything in 9.5;
and the relevant code is different there, too.
Discussion: https://postgr.es/m/16673-d278c604f8e34ec0@postgresql.org
Discussion: https://postgr.es/m/14878.1570820201@sss.pgh.pa.us
The code recorded cache invalidation events by zeroing the "localreloid"
field of affected cache entries. However, it's possible for an inval
event to occur even while we have the entry open and locked. So an
ill-timed inval could result in "cache lookup failed for relation 0"
errors, if the worker's code tried to use the cleared field. We can
fix that by creating a separate bool field to record whether the entry
needs to be revalidated. (In the back branches, cram the bool into
what had been padding space, to avoid an ABI break in the somewhat
unlikely event that any extension is looking at this struct.)
Also, rearrange the logic in logicalrep_rel_open so that it
does the right thing in cases where table_open would fail.
We should retry the lookup by name in that case, but we didn't.
The real-world impact of this is probably small. In the first place,
the error conditions are very low probability, and in the second place,
the worker would just exit and get restarted. We only noticed because
in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly,
preventing the worker from making progress. Nonetheless, it's clearly
a bug, and it impedes a useful type of testing; so back-patch to v10
where this code was introduced.
Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us
If a CREATE TABLE command uses both LIKE and traditional inheritance,
Vars in CHECK constraints and expression indexes that are absorbed
from a LIKE parent table tended to get mis-numbered, resulting in
wrong answers and/or bizarre error messages (though probably not any
actual crashes, thanks to validation occurring in the executor).
In v12 and up, the same could happen to Vars in GENERATED expressions,
even in cases with no LIKE clause but multiple traditional-inheritance
parents.
The cause of the problem for LIKE is that parse_utilcmd.c supposed
it could renumber such Vars correctly during transformCreateStmt(),
which it cannot since we have not yet accounted for columns added via
inheritance. Fix that by postponing processing of LIKE INCLUDING
CONSTRAINTS, DEFAULTS, GENERATED, INDEXES till after we've performed
DefineRelation().
The error with GENERATED and multiple inheritance is a simple oversight
in MergeAttributes(); it knows it has to renumber Vars in inherited
CHECK constraints, but forgot to apply the same processing to inherited
GENERATED expressions (a/k/a defaults).
Per bug #16272 from Tom Gottfried. The non-GENERATED variants of the
issue are ancient, presumably dating right back to the addition of
CREATE TABLE LIKE; hence back-patch to all supported branches.
Discussion: https://postgr.es/m/16272-6e32da020e9a9381@postgresql.org
Commit a477bfc1d fixed eval_const_expressions() to ensure that it
didn't generate unnecessary RelabelType nodes, but I failed to notice
that some other places in the planner had the same issue. Really
noplace in the planner should be using plain makeRelabelType(), for
fear of generating expressions that should be equal() to semantically
equivalent trees, but aren't.
An example is that because canonicalize_ec_expression() failed
to be careful about this, we could end up with an equivalence class
containing both a plain Const, and a Const-with-RelabelType
representing exactly the same value. So far as I can tell this led to
no visible misbehavior, but we did waste a bunch of cycles generating
and evaluating "Const = Const-with-RelabelType" to prove such entries
are redundant.
Hence, move the support function added by a477bfc1d to where it can
be more generally useful, and use it in the places where planner code
previously used makeRelabelType.
Back-patch to v12, like the previous patch. While I have no concrete
evidence of any real misbehavior here, it's certainly possible that
I overlooked a case where equivalent expressions that aren't equal()
could cause a user-visible problem. In any case carrying extra
RelabelType nodes through planning to execution isn't very desirable.
Discussion: https://postgr.es/m/1311836.1597781384@sss.pgh.pa.us
Several PGXN modules reference LockTagType values; renumbering would
force a recompile of those modules. Oversight in back-patch of today's
commit 566372b3d6435639e4cc4476d79b8505a0297c87. Back-patch to released
branches, v12 through 9.5.
Reported by Tom Lane.
Discussion: https://postgr.es/m/921383.1597523945@sss.pgh.pa.us
The SimpleLruTruncate() header comment states the new coding rule. To
achieve this, add locktype "frozenid" and two LWLocks. This closes a
rare opportunity for data loss, which manifested as "apparent
wraparound" or "could not access status of transaction" errors. Data
loss is more likely in pg_multixact, due to released branches' thin
margin between multiStopLimit and multiWrapLimit. If a user's physical
replication primary logged ": apparent wraparound" messages, the user
should rebuild standbys of that primary regardless of symptoms. At less
risk is a cluster having emitted "not accepting commands" errors or
"must be vacuumed" warnings at some point. One can test a cluster for
this data loss by running VACUUM FREEZE in every database. Back-patch
to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com
nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan
has the form of one or more OpExprs whose LHS is an expression of the
outer query's, while the RHS is an expression over Params representing
output columns of the subquery. However, the planner only went as far
as verifying that the clauses were all binary OpExprs. This works
99.99% of the time, because the clauses have the right shape when
emitted by the parser --- but it's possible for function inlining to
break that, as reported by PegoraroF10. To fix, teach the planner
to check that the LHS and RHS contain the right things, or more
accurately don't contain the wrong things. Given that this has been
broken for years without anyone noticing, it seems sufficient to just
give up hashing when it happens, rather than go to the trouble of
commuting the clauses back again (which wouldn't necessarily work
anyway).
While poking at that, I also noticed that nodeSubplan.c had a baked-in
assumption that the number of hash clauses is identical to the number
of subquery output columns. Again, that's fine as far as parser output
goes, but it's not hard to break it via function inlining. There seems
little reason for that assumption though --- AFAICS, the only thing
it's buying us is not having to store the number of hash clauses
explicitly. Adding code to the planner to reject such cases would take
more code than getting nodeSubplan.c to cope, so I fixed it that way.
This has been broken for as long as we've had hashable SubPlans,
so back-patch to all supported branches.
Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com
Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the
postmaster has immediately killed all "optional" background processes,
and subsequently refused to launch new ones while it's waiting for
foreground client processes to exit. No doubt this seemed like an OK
policy at some point; but it's a pretty bad one now, because it makes
for a seriously degraded environment for the remaining clients:
* Parallel queries are killed, and new ones fail to launch. (And our
parallel-query infrastructure utterly fails to deal with the case
in a reasonable way --- it just hangs waiting for workers that are
not going to arrive. There is more work needed in that area IMO.)
* Autovacuum ceases to function. We can tolerate that for awhile,
but if bulk-update queries continue to run in the surviving client
sessions, there's eventually going to be a mess. In the worst case
the system could reach a forced shutdown to prevent XID wraparound.
* The bgwriter and walwriter are also stopped immediately, likely
resulting in performance degradation.
Hence, let's rearrange things so that the only immediate change in
behavior is refusing to let in new normal connections. Once the last
normal connection is gone, shut everything down as though we'd received
a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and
PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY
while normal connections remain. A subsidiary state variable tracks
whether or not we're letting in new connections in those states.
This also allows having just one copy of the logic for killing child
processes in smart and fast shutdown modes. I moved that logic into
PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS.
Back-patch to 9.6 where parallel query was added. In principle
this'd be a good idea in 9.5 as well, but the risk/reward ratio
is not as good there, since lack of autovacuum is not a problem
during typical uses of smart shutdown.
Per report from Bharath Rupireddy.
Patch by me, reviewed by Thomas Munro
Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com