While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)
We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.
Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension. (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)
This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name. This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
An upcoming patch requires this cache so that it can track updates
in the pg_extension catalog. So far though, the EXTENSIONOID cache
only exists in v18 and up (see 490f869d9). We can add it in older
branches without an ABI break, if we are careful not to disturb the
numbering of existing syscache IDs.
In v16 and before, that just requires adding the new ID at the end
of the hand-assigned enum list, ignoring our convention about
alphabetizing the IDs. But in v17, genbki.pl enforces alphabetical
order of the IDs listed in MAKE_SYSCACHE macros. We can fake it
out by calling the new cache ZEXTENSIONOID.
Note that adding a syscache does change the required contents of the
relcache init file (pg_internal.init). But that isn't problematic
since we blow those away at postmaster start for other reasons.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14-17
These data types are represented like full-fledged arrays, but
functions that deal specifically with these types assume that the
array is 1-dimensional and contains no nulls. However, there are
cast pathways that allow general oid[] or int2[] arrays to be cast
to these types, allowing these expectations to be violated. This
can be exploited to cause server memory disclosure or SIGSEGV.
Fix by installing explicit checks in functions that accept these
types.
Reported-by: Altan Birler <altan.birler@tum.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2003
Backpatch-through: 14
A security patch changed them today, so close the coverage gap now.
Test that buffer overrun is avoided when pg_mblen*() requires more
than the number of bytes remaining.
This does not cover the calls in dict_thesaurus.c or in dict_synonym.c.
That code is straightforward. To change that code's input, one must
have access to modify installed OS files, so low-privilege users are not
a threat. Testing this would likewise require changing installed
share/postgresql/tsearch_data, which was enough of an obstacle to not
bother.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
A corrupted string could cause code that iterates with pg_mblen() to
overrun its buffer. Fix, by converting all callers to one of the
following:
1. Callers with a null-terminated string now use pg_mblen_cstr(), which
raises an "illegal byte sequence" error if it finds a terminator in the
middle of the sequence.
2. Callers with a length or end pointer now use either
pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending
on which of the two seems more convenient at each site.
3. A small number of cases pre-validate a string, and can use
pg_mblen_unbounded().
The traditional pg_mblen() function and COPYCHAR macro still exist for
backward compatibility, but are no longer used by core code and are
hereby deprecated. The same applies to the t_isXXX() functions.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
Commit 0bada39c83a150079567a6e97b1a25a198f30ea3 fixed a bug of this kind,
which existed in all branches for six days before detection. While the
probability of reaching the trouble was low, the disruption was extreme. No
new backends could start, and service restoration needed an immediate
shutdown. Hence, add this to catch the next bug like it.
The new check in RelationIdGetRelation() suffices to make autovacuum detect
the bug in commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 that led to commit
0bada39. This also checks in a number of similar places. It replaces each
Assert(IsTransactionState()) that pertained to a conditional catalog read.
Back-patch to v14 - v17. This a back-patch of commit
f4ece891fc2f3f96f0571720a1ae30db8030681b (from before v18 branched) to
all supported branches, to accompany the back-patch of commits 243e9b4
and 0bada39. For catalog indexes, the bttextcmp() behavior that
motivated IsCatalogTextUniqueIndexOid() was v18-specific. Hence, this
back-patch doesn't need that or its correction from commit
4a4ee0c2c1e53401924101945ac3d517c0a8a559.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/20250410191830.0e.nmisch@google.com
Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
Backpatch-through: 14-17
The inplace update survives ROLLBACK. The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f. That is a
source of index corruption.
Back-patch to v14 - v17. This is a back-patch of commits:
- 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704
(main change, on master, before v18 branched)
- 0bada39c83a150079567a6e97b1a25a198f30ea3
(defect fix, on master, before v18 branched)
- bae8ca82fd00603ebafa0658640d6e4dfe20af92
(cosmetics from post-commit review, on REL_18_STABLE)
It reverses commit c1099dd745b0135960895caa8892a1873ac6cbe5, my revert
of the original back-patch of 243e9b4.
This back-patch omits the non-comment heap_decode() changes. I find
those changes removed harmless code that was last necessary in v13. See
discussion thread for details. The back branches aren't the place to
remove such code.
Like the original back-patch, this doesn't change WAL, because these
branches use end-of-recovery SIResetAll(). All branches change the ABI
of extern function PrepareToInvalidateCacheTuple(). No PGXN extension
calls that, and there's no apparent use case in extensions. Expect
".abi-compliance-history" edits to follow.
Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com>
Reviewed-by: Surya Poondla <s_poondla@apple.com>
Reviewed-by: Ilyasov Ian <ianilyasov@outlook.com>
Reviewed-by: Nitin Motiani <nitinmotiani@google.com> (in earlier versions)
Reviewed-by: Andres Freund <andres@anarazel.de> (in earlier versions)
Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
Backpatch-through: 14-17
The async notification queue contains the XID of the sender, and when
processing notifications we call TransactionIdDidCommit() on the
XID. But we had no safeguards to prevent the CLOG segments containing
those XIDs from being truncated away. As a result, if a backend didn't
for some reason process its notifications for a long time, or when a
new backend issued LISTEN, you could get an error like:
test=# listen c21;
ERROR: 58P01: could not access status of transaction 14279685
DETAIL: Could not open file "pg_xact/000D": No such file or directory.
LOCATION: SlruReportIOError, slru.c:1087
To fix, make VACUUM "freeze" the XIDs in the async notification queue
before truncating the CLOG. Old XIDs are replaced with
FrozenTransactionId or InvalidTransactionId.
Note: This commit is not a full fix. A race condition remains, where a
backend is executing asyncQueueReadAllNotifications() and has just
made a local copy of an async SLRU page which contains old XIDs, while
vacuum concurrently truncates the CLOG covering those XIDs. When the
backend then calls TransactionIdDidCommit() on those XIDs from the
local copy, you still get the error. The next commit will fix that
remaining race condition.
This was first reported by Sergey Zhuravlev in 2021, with many other
people hitting the same issue later. Thanks to:
- Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink
for investigating and providing reproducable test cases,
- Matheus Alcantara and Arseniy Mukhin for review and earlier proposed
patches to fix this,
- Álvaro Herrera and Masahiko Sawada for reviews,
- Yura Sokolov aka funny-falcon for the idea of marking transactions
as committed in the notification queue, and
- Joel Jacobson for the final patch version. I hope I didn't forget
anyone.
Backpatch to all supported versions. I believe the bug goes back all
the way to commit d1e027221d, which introduced the SLRU-based async
notification queue.
Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org
Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org
Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com
Backpatch-through: 14
This omission allowed table owners to create statistics in any
schema, potentially leading to unexpected naming conflicts. For
ALTER TABLE commands that require re-creating statistics objects,
skip this check in case the user has since lost CREATE on the
schema. The addition of a second parameter to CreateStatistics()
breaks ABI compatibility, but we are unaware of any impacted
third-party code.
Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Security: CVE-2025-12817
Backpatch-through: 13
generic-gcc.h maps our read and write barriers to C11 acquire and
release fences using compiler builtins, for platforms where we don't
have our own hand-rolled assembler. This is apparently enough for GCC,
but the C11 memory model is only defined in terms of atomic accesses,
and our barriers for non-atomic, non-volatile accesses were not always
respected under Clang's stricter interpretation of the standard.
This explains the occasional breakage observed on new RISC-V + Clang
animal greenfly in lock-free PgAioHandle manipulation code containing a
repeating pattern of loads and read barriers. The problem can also be
observed in code generated for MIPS and LoongAarch, though we aren't
currently testing those with Clang, and on x86, though we use our own
assembler there. The scariest aspect is that we use the generic version
on very common ARM systems, but it doesn't seem to reorder the relevant
code there (or we'd have debugged this long ago).
Fix by inserting an explicit compiler barrier. It expands to an empty
assembler block declared to have memory side-effects, so registers are
flushed and reordering is prevented. In those respects this is like the
architecture-specific assembler versions, but the compiler is still in
charge of generating the appropriate fence instruction. Done for write
barriers on principle, though concrete problems have only been observed
with read barriers.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Tested-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/d79691be-22bd-457d-9d90-18033b78c40a%40gmail.com
Backpatch-through: 13
XLogRecPtrIsInvalid() is inconsistent with the affirmative form of
macros used for other datatypes, and leads to awkward double negatives
in a few places. This commit introduces XLogRecPtrIsValid(), which
allows code to be written more naturally.
This patch only adds the new macro. XLogRecPtrIsInvalid() is left in
place, and all existing callers remain untouched. This means all
supported branches can accept hypothetical bug fixes that use the new
macro, and at the same time any code that compiled with the original
formulation will continue to silently compile just fine.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Backpatch-through: 13
Discussion: https://postgr.es/m/aQB7EvGqrbZXrMlg@ip-10-97-1-34.eu-west-3.compute.internal
The C standard says that the second and third arguments of a
conditional operator shall be both void type or both not-void
type. The Windows version of INTERRUPTS_PENDING_CONDITION()
got this wrong. It's pretty harmless because the result of
the operator is ignored anyway, but apparently recent versions
of MSVC have started issuing a warning about it. Silence the
warning by casting the dummy zero to void.
Reported-by: Christian Ullrich <chris@chrullrich.net>
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/cc4ef8db-f8dc-4347-8a22-e7ebf44c0308@chrullrich.net
Backpatch-through: 13
Previously, if primary_slot_name was set to an invalid slot name and
the configuration file was reloaded, both the postmaster and all other
backend processes reported a WARNING. With many processes running,
this could produce a flood of duplicate messages. The problem was that
the GUC check hook for primary_slot_name reported errors at WARNING
level via ereport().
This commit changes the check hook to use GUC_check_errdetail() and
GUC_check_errhint() for error reporting. As with other GUC parameters,
this causes non-postmaster processes to log the message at DEBUG3,
so by default, only the postmaster's message appears in the log file.
Backpatch to all supported versions.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Chao Li <lic@highgo.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/CAHGQGwFud-cvthCTfusBfKHBS6Jj6kdAPTdLWKvP2qjUX6L_wA@mail.gmail.com
Backpatch-through: 13
log_error() would probably fail completely if used, and would
certainly print garbage for anything that needed to be interpolated
into the message, because it was failing to use the correct printing
subroutine for a va_list argument.
This bug likely went undetected because the error cases this code
is used for are rarely exercised - they only occur when Windows
security API calls fail catastrophically (out of memory, security
subsystem corruption, etc).
The FRONTEND variant can be fixed just by calling vfprintf()
instead of fprintf(). However, there was no va_list variant
of write_stderr(), so create one by refactoring that function.
Following the usual naming convention for such things, call
it vwrite_stderr().
Author: Bryan Green <dbryan.green@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAF+pBj8goe4fRmZ0V3Cs6eyWzYLvK+HvFLYEYWG=TzaM+tWPnw@mail.gmail.com
Backpatch-through: 13
hash_xlog.h included descriptions for the blocks used in WAL records
that were was not completely consistent with how the records are
generated, with one block missing for SQUEEZE_PAGE, and inconsistent
descriptions used for block 0 in VACUUM_ONE_PAGE and MOVE_PAGE_CONTENTS.
This information was incorrect since c11453ce0aea, cross-checking the
logic for the record generation.
Author: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/CALdSSPj1j=a1d1hVA3oabRFz0hSU3KKrYtZPijw4UPUM7LY9zw@mail.gmail.com
Backpatch-through: 13
If an INSERT has an ON CONFLICT DO UPDATE clause, the executor must
check that the target relation supports UPDATE as well as INSERT. In
particular, it must check that the target relation has a REPLICA
IDENTITY if it publishes updates. Formerly, it was not doing this
check, which could lead to silently breaking replication.
Fix by adding such a check to CheckValidResultRel(), which requires
adding a new onConflictAction argument. In back-branches, preserve ABI
compatibility by introducing a wrapper function with the original
signature.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Tested-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/OS3PR01MB57180C87E43A679A730482DF94B62@OS3PR01MB5718.jpnprd01.prod.outlook.com
Backpatch-through: 13
Commit e2d4ef8de86 (the fix for CVE-2017-7484) added security checks
to the selectivity estimation functions to prevent them from running
user-supplied operators on data obtained from pg_statistic if the user
lacks privileges to select from the underlying table. In cases
involving inheritance/partitioning, those checks were originally
performed against the child RTE (which for plain inheritance might
actually refer to the parent table). Commit 553d2ec2710 then extended
that to also check the parent RTE, allowing access if the user had
permissions on either the parent or the child. It turns out, however,
that doing any checks using the child RTE is incorrect, since
securityQuals is set to NULL when creating an RTE for an inheritance
child (whether it refers to the parent table or the child table), and
therefore such checks do not correctly account for any RLS policies or
security barrier views. Therefore, do the security checks using only
the parent RTE. This is consistent with how RLS policies are applied,
and the executor's ACL checks, both of which use only the parent
table's permissions/policies. Similar checks are performed in the
extended stats code, so update that in the same way, centralizing all
the checks in a new function.
In addition, note that these checks by themselves are insufficient to
ensure that the user has access to the table's data because, in a
query that goes via a view, they only check that the view owner has
permissions on the underlying table, not that the current user has
permissions on the view itself. In the selectivity estimation
functions, there is no easy way to navigate from underlying tables to
views, so add permissions checks for all views mentioned in the query
to the planner startup code. If the user lacks permissions on a view,
a permissions error will now be reported at planner-startup, and the
selectivity estimation functions will not be run.
Checking view permissions at planner-startup in this way is a little
ugly, since the same checks will be repeated at executor-startup.
Longer-term, it might be better to move all the permissions checks
from the executor to the planner so that permissions errors can be
reported sooner, instead of creating a plan that won't ever be run.
However, such a change seems too far-reaching to be back-patched.
Back-patch to all supported versions. In v13, there is the added
complication that UPDATEs and DELETEs on inherited target tables are
planned using inheritance_planner(), which plans each inheritance
child table separately, so that the selectivity estimation functions
do not know that they are dealing with a child table accessed via its
parent. Handle that by checking access permissions on the top parent
table at planner-startup, in the same way as we do for views. Any
securityQuals on the top parent table are moved down to the child
tables by inheritance_planner(), so they continue to be checked by the
selectivity estimation functions.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-8713
Commit 9e6104c66 disallowed transition tables on foreign tables, but
failed to account for cases where a foreign table is a child table of a
partitioned/inherited table on which transition tables exist, leading to
incorrect transition tuples collected from such foreign tables for
queries on the parent table triggering transition capture. This
occurred not only for inherited UPDATE/DELETE but for partitioned INSERT
later supported by commit 3d956d956, which should have handled it at
least for the INSERT case, but didn't.
To fix, modify ExecAR*Triggers to throw an error if the given relation
is a foreign table requesting transition capture. Also, this commit
fixes make_modifytable so that in case of an inherited UPDATE/DELETE
triggering transition capture, FDWs choose normal operations to modify
child foreign tables, not DirectModify; which is needed because they
would otherwise skip the calls to ExecAR*Triggers at execution, causing
unexpected behavior.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAPmGK14QJYikKzBDCe3jMbpGENnQ7popFmbEgm-XTNuk55oyHg%40mail.gmail.com
Backpatch-through: 13
Recent ICU versions have added U_SHOW_CPLUSPLUS_HEADER_API, and we need
to set this to zero as well to hide the ICU C++ APIs from pg_locale.h
Per discussion, we want cpluspluscheck to work cleanly in backbranches,
so backpatch both this and its predecessor commit ed26c4e25a4 to all
supported versions.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1115793.1754414782%40sss.pgh.pa.us
Backpatch-through: 13
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.
The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop in those branches, and its
locking sequence does not conflict with DDL operations.
This patch ensures consistent lock acquisition to prevent such deadlocks.
As per buildfarm.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
Solaris has never bothered to add "const" to the second argument
of PAM conversation procs, as all other Unixen did decades ago.
This resulted in an "incompatible pointer" compiler warning when
building --with-pam, but had no more serious effect than that,
so we never did anything about it. However, as of GCC 14 the
case is an error not warning by default.
To complicate matters, recent OpenIndiana (and maybe illumos
in general?) *does* supply the "const" by default, so we can't
just assume that platforms using our solaris template need help.
What we can do, short of building a configure-time probe,
is to make solaris.h #define _PAM_LEGACY_NONCONST, which
causes OpenIndiana's pam_appl.h to revert to the traditional
definition, and hopefully will have no effect anywhere else.
Then we can use that same symbol to control whether we include
"const" in the declaration of pam_passwd_conv_proc().
Bug: #18995
Reported-by: Andrew Watkins <awatkins1966@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18995-82058da9ab4337a7@postgresql.org
Backpatch-through: 13
Commit 4909b38af0 introduced logic to distribute invalidation messages
from catalog-modifying transactions to all concurrent in-progress
transactions. However, since each transaction distributes not only its
original invalidation messages but also previously distributed
messages to other transactions, this leads to an exponential increase
in allocation request size for invalidation messages, ultimately
causing memory allocation failure.
This commit fixes this issue by tracking distributed invalidation
messages separately per decoded transaction and not redistributing
these messages to other in-progress transactions. The maximum size of
distributed invalidation messages that one transaction can store is
limited to MAX_DISTR_INVAL_MSG_PER_TXN (8MB). Once the size of the
distributed invalidation messages exceeds this threshold, we
invalidate all caches in locations where distributed invalidation
messages need to be executed.
Back-patch to all supported versions where we introduced the fix by
commit 4909b38af0.
Note that this commit adds two new fields to ReorderBufferTXN to store
the distributed transactions. This change breaks ABI compatibility in
back branches, affecting third-party extensions that depend on the
size of the ReorderBufferTXN struct, though this scenario seems
unlikely.
Additionally, it adds a new flag to the txn_flags field of
ReorderBufferTXN to indicate distributed invalidation message
overflow. This should not affect existing implementations, as it is
unlikely that third-party extensions use unused bits in the txn_flags
field.
Bug: #18938#18942
Author: vignesh C <vignesh21@gmail.com>
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Reported-by: John Hutchins <john.hutchins@wicourts.gov>
Reported-by: Laurence Parry <greenreaper@hotmail.com>
Reported-by: Max Madden <maxmmadden@gmail.com>
Reported-by: Braulio Fdo Gonzalez <brauliofg@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://postgr.es/m/680bdaf6-f7d1-4536-b580-05c2760c67c6@deepbluecap.com
Discussion: https://postgr.es/m/18942-0ab1e5ae156613ad@postgresql.org
Discussion: https://postgr.es/m/18938-57c9a1c463b68ce0@postgresql.org
Discussion: https://postgr.es/m/CAD1FGCT2sYrP_70RTuo56QTizyc+J3wJdtn2gtO3VttQFpdMZg@mail.gmail.com
Discussion: https://postgr.es/m/CANO2=B=2BT1hSYCE=nuuTnVTnjidMg0+-FfnRnqM6kd23qoygg@mail.gmail.com
Backpatch-through: 13
Per the letter of the C11 standard, one must #define
__STDC_WANT_LIB_EXT1__ as 1 before including <string.h> in order to
have access to memset_s(). It appears that many platforms are lenient
about this, because we weren't doing it and yet the code appeared to
work anyway. But we now find that with -std=c11, macOS is strict and
doesn't declare memset_s, leading to compile failures since we try to
use it anyway. (Given the lack of prior reports, perhaps this is new
behavior in the latest SDK? No matter, we're clearly in the wrong.)
In addition to the immediate problem, which could be fixed merely by
adding the needed #define to explicit_bzero.c, it seems possible that
our configure-time probe for memset_s() could fail in case a platform
implements the function in some odd way due to this spec requirement.
This concern can be fixed in largely the same way that we dealt with
strchrnul() in 6da2ba1d8: switch to using a declaration-based
configure probe instead of a does-it-link probe.
Back-patch to v13 where we started using memset_s().
Reported-by: Lakshmi Narayana Velayudam <dev.narayana.v@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAA4pTnLcKGG78xeOjiBr5yS7ZeE-Rh=FaFQQGOO=nPzA1L8yEA@mail.gmail.com
Backpatch-through: 13
With GB18030 as source encoding, applications could crash the server via
SQL functions convert() or convert_from(). Applications themselves
could crash after passing unterminated GB18030 input to libpq functions
PQescapeLiteral(), PQescapeIdentifier(), PQescapeStringConn(), or
PQescapeString(). Extension code could crash by passing unterminated
GB18030 input to jsonapi.h functions. All those functions have been
intended to handle untrusted, unterminated input safely.
A crash required allocating the input such that the last byte of the
allocation was the last byte of a virtual memory page. Some malloc()
implementations take measures against that, making the SIGSEGV hard to
reach. Back-patch to v13 (all supported versions).
Author: Noah Misch <noah@leadboat.com>
Author: Andres Freund <andres@anarazel.de>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Backpatch-through: 13
Security: CVE-2025-4207
synchronous_standby_names cannot be reloaded safely by backends, and the
checkpointer is in charge of updating a state in shared memory if the
GUC is enabled in WalSndCtl, to let the backends know if they should
wait or not for a given LSN. This provides a strict control on the
timing of the waiting queues if the GUC is enabled or disabled, then
reloaded. The checkpointer is also in charge of waking up the backends
that could be waiting for a LSN when the GUC is disabled.
This logic had a race condition at startup, where it would be possible
for backends to not wait for a LSN even if synchronous_standby_names is
enabled. This would cause visibility issues with transactions that we
should be waiting for but they were not. The problem lasts until the
checkpointer does its initial update of the shared memory state when it
loads synchronous_standby_names.
In order to take care of this problem, the shared memory state in
WalSndCtl is extended to detect if it has been initialized by the
checkpointer, and not only check if synchronous_standby_names is
defined. In WalSndCtlData, sync_standbys_defined is renamed to
sync_standbys_status, a bits8 able to know about two states:
- If the shared memory state has been initialized. This flag is set by
the checkpointer at startup once, and never removed.
- If synchronous_standby_names is known as defined in the shared memory
state. This is the same as the previous sync_standbys_defined in
WalSndCtl.
This method gives a way for backends to decide what they should do until
the shared memory area is initialized, and they now ultimately fall back
to a check on the GUC value in this case, which is the best thing that
can be done.
Fortunately, SyncRepUpdateSyncStandbysDefined() is called immediately by
the checkpointer when this process starts, so the window is very narrow.
It is possible to enlarge the problematic window by making the
checkpointer wait at the beginning of SyncRepUpdateSyncStandbysDefined()
with a hardcoded sleep for example, and doing so has showed that a 2PC
visibility test is indeed failing. On machines slow enough, this bug
would cause spurious failures.
In 17~, we have looked at the possibility of adding an injection point
to have a reproducible test, but as the problematic window happens at
early startup, we would need to invent a way to make an injection point
optionally persistent across restarts when attached, something that
would be fine for this case as it would involve the checkpointer. This
issue is quite old, and can be reproduced on all the stable branches.
Author: Melnikov Maksim <m.melnikov@postgrespro.ru>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/163fcbec-900b-4b07-beaa-d2ead8634bec@postgrespro.ru
Backpatch-through: 13
Data loss can happen when the DDLs like ALTER PUBLICATION ... ADD TABLE ...
or ALTER TYPE ... that don't take a strong lock on table happens
concurrently to DMLs on the tables involved in the DDL. This happens
because logical decoding doesn't distribute invalidations to concurrent
transactions and those transactions use stale cache data to decode the
changes. The problem becomes bigger because we keep using the stale cache
even after those in-progress transactions are finished and skip the
changes required to be sent to the client.
This commit fixes the issue by distributing invalidation messages from
catalog-modifying transactions to all concurrent in-progress transactions.
This allows the necessary rebuild of the catalog cache when decoding new
changes after concurrent DDL.
We observed performance regression primarily during frequent execution of
*publication DDL* statements that modify the published tables. The
regression is minor or nearly nonexistent for DDLs that do not affect the
published tables or occur infrequently, making this a worthwhile cost to
resolve a longstanding data loss issue.
An alternative approach considered was to take a strong lock on each
affected table during publication modification. However, this would only
address issues related to publication DDLs (but not the ALTER TYPE ...)
and require locking every relation in the database for publications
created as FOR ALL TABLES, which is impractical.
The bug exists in all supported branches, but we are backpatching till 14.
The fix for 13 requires somewhat bigger changes than this fix, so the fix
for that branch is still under discussion.
Reported-by: hubert depesz lubaczewski <depesz@depesz.com>
Reported-by: Tomas Vondra <tomas.vondra@enterprisedb.com>
Author: Shlok Kyal <shlok.kyal.oss@gmail.com>
Author: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Tested-by: Benoit Lobréau <benoit.lobreau@dalibo.com>
Backpatch-through: 14
Discussion: https://postgr.es/m/de52b282-1166-1180-45a2-8d8917ca74c6@enterprisedb.com
Discussion: https://postgr.es/m/CAD21AoAenVqiMjpN-PvGHL1N9DWnHSq673bfgr6phmBUzx=kLQ@mail.gmail.com
Before a restartpoint finishes PreallocXlogFiles(), a startup process
KeepFileRestoredFromArchive() call can unlink the preallocated segment.
If a CHECKPOINT sql command had elicited the restartpoint experiencing
the race condition, that sql command failed. Moreover, the restartpoint
omitted its log_checkpoints message and some inessential resource
reclamation. Prevent the ERROR by skipping open() of the segment.
Since these consequences are so minor, no back-patch.
This commit has been applied as of 2b3e4672f760 in v15 and newer
versions. This is required on stable branches of v13 and v14 to fix a
regression reported by Noah Misch, introduced by 1f95181b44c8, causing
spurious failures in archive recovery (neither streaming nor archive
recovery) with concurrent restartpoints. The backpatched versions of
the patches have been aligned on these branches by me, Noah Misch is the
author. Tests have been conducted by the both of us.
Reported-by: Arun Thirupathi
Author: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
Backpatch-through: 13
Only initdb used it. initdb refuses to operate on a non-empty directory
and generally does not cope with pre-existing files of other kinds.
Hence, use the opportunity to simplify.
This commit has been applied as of 421484f79c0b in v15 and newer
versions. This is required on stable branches of v13 and v14 to fix a
regression reported by Noah Misch, introduced by 1f95181b44c8, causing
spurious failures in archive recovery (neither streaming nor archive
recovery) with concurrent restartpoints. The backpatched versions of
the patches have been aligned on these branches by me, Noah Misch is the
author. Tests have been conducted by the both of us.
Reported-by: Arun Thirupathi
Author: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
Backpatch-through: 13
Cold paths, initdb and end-of-recovery, used it. Don't optimize them.
This commit has been applied as of c53c6b98d38a in v15 and newer
versions. This is required on stable branches of v13 and v14 to fix a
regression reported by Noah Misch, introduced by 1f95181b44c8, causing
spurious failures in archive recovery (neither streaming nor archive
recovery) with concurrent restartpoints. The backpatched versions of
the patches have been aligned on these branches by me, Noah Misch is the
author. Tests have been conducted by the both of us.
Reported-by: Arun Thirupathi
Author: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20210202151416.GB3304930@rfd.leadboat.com
Discussion: https://postgr.es/m/20250306193013.36.nmisch@google.com
Backpatch-through: 13
This replaces dblink's blocking libpq calls, allowing cancellation and
allowing DROP DATABASE (of a database not involved in the query). Apart
from explicit dblink_cancel_query() calls, dblink still doesn't cancel
the remote side. The replacement for the blocking calls consists of
new, general-purpose query execution wrappers in the libpqsrv facility.
Out-of-tree extensions should adopt these.
The original commit d3c5f37dd543498cc7c678815d3921823beec9e9 did not
back-patch. Back-patch now to v16-v13, bringing coverage to all supported
versions. This back-patch omits the orignal's refactoring in postgres_fdw.
Discussion: https://postgr.es/m/20231122012945.74@rfd.leadboat.com
Currently dblink and postgres_fdw don't process interrupts during connection
establishment. Besides preventing query cancellations etc, this can lead to
undetected deadlocks, as global barriers are not processed.
Libpqwalreceiver in contrast, processes interrupts during connection
establishment. The required code is not trivial, so duplicating it into
additional places does not seem like a good option.
These aforementioned undetected deadlocks are the reason for the spate of CI
test failures in the FreeBSD 'test_running' step.
For now the helper library is just a header, as it needs to be linked into
each extension using libpq, and it seems too small to be worth adding a
dedicated static library for.
The conversion to the helper are done in subsequent commits.
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/20220925232237.p6uskba2dw6fnwj2@awork3.anarazel.de
As of 15.4, macOS has strchrnul(), but access to it is blocked behind
a check for MACOSX_DEPLOYMENT_TARGET >= 15.4. But our does-it-link
configure check finds it, so we try to use it, and fail with the
present default deployment target (namely 15.0). This accounts for
today's buildfarm failures on indri and sifaka.
This is the identical problem that we faced some years ago when Apple
introduced preadv and pwritev in the same way. We solved that in
commit f014b1b9b by using AC_CHECK_DECLS instead of AC_CHECK_FUNCS
to check the functions' availability. So do the same now for
strchrnul(). Interestingly, we already had a workaround for
"the link check doesn't agree with <string.h>" cases with glibc,
which we no longer need since only the header declaration is being
checked.
Testing this revealed that the meson version of this check has never
worked, because it failed to use "-Werror=unguarded-availability-new".
(Apparently nobody's tried to build with meson on macOS versions that
lack preadv/pwritev as standard.) Adjust that while at it. Also,
we had never put support for "-Werror=unguarded-availability-new"
into v13, but we need that now.
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Co-authored-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/385134.1743523038@sss.pgh.pa.us
Backpatch-through: 13
Letting the rewriter keep RangeTblEntry.relid when expanding a view
RTE, without making the outfuncs/readfuncs changes that went along
with that originally, is more problematic than I realized. It causes
WRITE_READ_PARSE_PLAN_TREES testing to fail because outfuncs/readfuncs
don't think relid need be saved in an RTE_SUBQUERY RTE.
There doesn't seem to be any other good route to fixing the whole-row
Var problem solved at f4e7756ef, so we just have to deal with the
consequences. We can make the eventually-produced plan tree safe
for WRITE_READ_PARSE_PLAN_TREES by clearing the relid field at the
end of planning, as was already being done for the functions field.
(The functions field is not problematic here because our abuse of it
is strictly local to the planner.) However, there is no nice fix for
the post-rewrite WRITE_READ_PARSE_PLAN_TREES test.
The solution adopted here is to remove the post-rewrite test in the
affected branches. That's surely less than ideal, but a couple of
arguments can be made why it's not unacceptable. First, the behavior
of outfuncs/readfuncs for parsetrees in these branches is frozen no
matter what, because of catalog stability requirements. So we're not
testing anything that is going to change. Second, testing
WRITE_READ_PARSE_PLAN_TREES at this particular time doesn't correspond
to any direct system functionality requirement, neither rule storage
nor plan transmission.
Reported-by: Andres Freund <andres@anarazel.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
Backpatch-through: 13-15
When the rewriter converts an RTE_RELATION RTE for a view into
an RTE_SUBQUERY RTE containing the view's defining query, leave
the relid field alone instead of zeroing it. This allows the
planner to tell that the subquery came from a view rather than
having been written in-line, which is needed to support an
upcoming planner bug fix. We cannot change the behavior of the
outfuncs/readfuncs code in released branches, so the relid field
will not survive in plans passed to parallel workers; therefore
this info should not be relied on in the executor.
This back-patches a portion of the data structure definitional
changes made in v16 and up by commit 47bb9db75. It's being
committed separately for visibility in the commit log, but
with luck it will not actually matter to anyone. While it's
not inconceivable that this change will break code expecting
relid to be zero in a subquery RTE, we can hope that such code
has already been adjusted to cope with v16 and up.
Reported-by: Duncan Sands <duncan.sands@deepbluecap.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/3518c50a-ab18-482f-b916-a37263622501@deepbluecap.com
Backpatch-through: 13-15
If a domain type has a default, adding a column of that type (without
any explicit DEFAULT clause) failed to install the domain's default
value in existing rows, instead leaving the new column null. This
is unexpected, and it used to work correctly before v11. The cause
is confusion in the atthasmissing mechanism about which default value
to install: we'd only consider installing an explicitly-specified
default, and then we'd decide that no table rewrite is needed.
To fix, take the responsibility for filling attmissingval out of
StoreAttrDefault, and instead put it into ATExecAddColumn's existing
logic that derives the correct value to fill the new column with.
Also, centralize the logic that determines the need for
default-related table rewriting there, instead of spreading it over
four or five places.
In the back branches, we'll leave the attmissingval-filling code
in StoreAttrDefault even though it's now dead, for fear that some
extension may be depending on that functionality to exist there.
A separate HEAD-only patch will clean up the now-useless code.
Reported-by: jian he <jian.universality@gmail.com>
Author: jian he <jian.universality@gmail.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CACJufxHFssPvkP1we7WMhPD_1kwgbG52o=kQgL+TnVoX5LOyCQ@mail.gmail.com
Backpatch-through: 13
This commit adds fmtIdEnc() and fmtQualifiedIdEnc(), which allow to specify
the encoding as an explicit argument. Additionally setFmtEncoding() is
provided, which defines the encoding when no explicit encoding is provided, to
avoid breaking all code using fmtId().
All users of fmtId()/fmtQualifiedId() are either converted to the explicit
version or a call to setFmtEncoding() has been added.
This commit does not yet utilize the now well-defined encoding, that will
happen in a subsequent commit.
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Backpatch-through: 13
Security: CVE-2025-1094
There are cases where we cannot / do not want to error out for invalidly
encoded input. In such cases it can be useful to replace e.g. an incomplete
multi-byte characters with bytes that will trigger an error when getting
validated as part of a larger string.
Unfortunately, until now, for some encoding no such sequence existed. For
those encodings this commit removes one previously accepted input combination
- we consider that to be ok, as the chosen bytes are outside of the valid
ranges for the encodings, we just previously failed to detect that.
As we cannot add a new field to pg_wchar_table without breaking ABI, this is
implemented "in-line" in the newly added function.
Author: Noah Misch <noah@leadboat.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Backpatch-through: 13
Security: CVE-2025-1094
If a column is omitted in an INSERT, and there's no column default,
the code in preptlist.c generates a NULL Const to be inserted.
Furthermore, if the column is of a domain type, we wrap the Const
in CoerceToDomain, so as to throw a run-time error if the domain
has a NOT NULL constraint. That's fine as far as it goes, but
there are two problems:
1. We're being sloppy about the type/typmod that the Const is
labeled with. It really should have the domain's base type/typmod,
since it's the input to CoerceToDomain not the output. This can
result in coerce_to_domain inserting a useless length-coercion
function (useless because it's being applied to a null). The
coercion would typically get const-folded away later, but it'd
be better not to create it in the first place.
2. We're not applying expression preprocessing (specifically,
eval_const_expressions) to the resulting expression tree.
The planner's primary expression-preprocessing pass already happened,
so that means the length coercion step and CoerceToDomain node miss
preprocessing altogether.
This is at the least inefficient, since it means the length coercion
and CoerceToDomain will actually be executed for each inserted row,
though they could be const-folded away in most cases. Worse, it
seems possible that missing preprocessing for the length coercion
could result in an invalid plan (for example, due to failing to
perform default-function-argument insertion). I'm not aware of
any live bug of that sort with core datatypes, and it might be
unreachable for extension types as well because of restrictions of
CREATE CAST, but I'm not entirely convinced that it's unreachable.
Hence, it seems worth back-patching the fix (although I only went
back to v14, as the patch doesn't apply cleanly at all in v13).
There are several places in the rewriter that are building null
domain constants the same way as preptlist.c. While those are
before the planner and hence don't have any reachable bug, they're
still applying a length coercion that will be const-folded away
later, uselessly wasting cycles. Hence, make a utility routine
that all of these places can call to do it right.
Making this code more careful about the typmod assigned to the
generated NULL constant has visible but cosmetic effects on some
of the plans shown in contrib/postgres_fdw's regression tests.
Discussion: https://postgr.es/m/1865579.1738113656@sss.pgh.pa.us
Backpatch-through: 14
When running on Windows, canonicalize_path() converts '\' to '/'
to prevent confusing the Windows command processor. It was
doing that in a non-encoding-aware fashion; but in SJIS there
are valid two-byte characters whose second byte matches '\'.
So encoding corruption ensues if such a character is used in
the path.
We can fairly easily fix this if we know which encoding is
in use, but a lot of our utilities don't have much of a clue
about that. After some discussion we decided we'd settle for
fixing this only in psql, and assuming that its value of
client_encoding matches what the user is typing.
It seems hopeless to get the server to deal with the problematic
characters in database path names, so we'll just declare that
case to be unsupported. That means nothing need be done in
the server, nor in utility programs whose only contact with
file path names is for database paths. But psql frequently
deals with client-side file paths, so it'd be good if it
didn't mess those up.
Bug: #18735
Reported-by: Koichi Suzuki <koichi.suzuki@enterprisedb.com>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Koichi Suzuki <koichi.suzuki@enterprisedb.com>
Discussion: https://postgr.es/m/18735-4acdb3998bb9f2b1@postgresql.org
Backpatch-through: 13
The right mix of DDL and VACUUM could corrupt a catalog page header such
that PageIsVerified() durably fails, requiring a restore from backup.
This affects only catalogs that both have a syscache and have DDL code
that uses syscache tuples to construct updates. One of the test
permutations shows a variant not yet fixed.
This makes !TransactionIdIsValid(TM_FailureData.xmax) possible with
TM_Deleted. I think core and PGXN are indifferent to that.
Per bug #17821 from Alexander Lakhin. Back-patch to v13 (all supported
versions). The test case is v17+, since it uses INJECTION_POINT.
Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
The freshly-released 2025a version of tzdata has a refined estimate
for the longitude of Manila, changing their value for LMT in
pre-standardized-timezone days. This changes the output of one of
our test cases. Since we need to be able to run with system tzdata
files that may or may not contain this update, we'd better stop
making that specific test.
I switched it to use Asia/Singapore, which has a roughly similar UTC
offset. That LMT value hasn't changed in tzdb since 2003, so we can
hope that it's well established.
I also noticed that this set of make_timestamptz tests only exercises
zones east of Greenwich, which seems rather sad, and was not the
original intent AFAICS. (We've already changed these tests once
to stabilize their results across tzdata updates, cf 66b737cd9;
it looks like I failed to consider the UTC-offset-sign aspect then.)
To improve that, add a test with Pacific/Honolulu. That LMT offset
is also quite old in tzdb, so we'll cross our fingers that it doesn't
get improved.
Reported-by: Christoph Berg <cb@df7cb.de>
Discussion: https://postgr.es/m/Z46inkznCxesvDEb@msg.df7cb.de
Backpatch-through: 13
In the name of ABI stability (that is, to avoid a library major
version bump for libpq), libpq still exports a version of pqsignal()
that we no longer want to use ourselves. However, since that has
the same link name as the function exported by src/port/pqsignal.c,
there is a link ordering dependency determining which version will
actually get used by code that uses libpq as well as libpgport.a.
It now emerges that the wrong version has been used by pgbench and
psql since commit 06843df4a rearranged their link commands. This
can result in odd failures in pgbench with the -T switch, since its
SIGALRM handler will now not be marked SA_RESTART. psql may have
some edge-case problems in \watch, too.
Since we don't want to depend on link ordering effects anymore,
let's fix this in the same spirit as b6c7cfac8: use macros to change
the actual link names of the competing functions. We cannot change
legacy-pqsignal.c's exported name of course, so the victim has to be
src/port/pqsignal.c.
In master, rename its exported name to be pqsignal_fe in frontend or
pqsignal_be in backend. (We could perhaps have gotten away with using
the same symbol in both cases, but since the FE and BE versions now
work a little differently, it seems advisable to use different names.)
In back branches, rename to pqsignal_fe in frontend but keep it as
pqsignal in backend. The frontend change could affect third-party
code that is calling pqsignal from libpgport.a or libpgport_shlib.a,
but only if the code is compiled against port.h from a different minor
release than libpgport. Since we don't support using libpgport as a
shared library, it seems unlikely that there will be such a problem.
I left the backend symbol unchanged to avoid an ABI break for
extensions. This means that the link ordering hazard still exists
for any extension that links against libpq. However, none of our own
extensions use both pqsignal() and libpq, and we're not making things
any worse for third-party extensions that do.
Report from Andy Fan, diagnosis by Fujii Masao, patch by me.
Back-patch to all supported branches, as 06843df4a was.
Discussion: https://postgr.es/m/87msfz5qv2.fsf@163.com
If a new catalog tuple is inserted that belongs to a catcache list
entry, and cache invalidation happens while the list entry is being
built, the list entry might miss the newly inserted tuple.
To fix, change the way we detect concurrent invalidations while a
catcache entry is being built. Keep a stack of entries that are being
built, and apply cache invalidation to those entries in addition to
the real catcache entries. This is similar to the in-progress list in
relcache.c.
Back-patch to all supported versions.
Reviewed-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi
It's possible that external code is calling smgrtruncate(). Any
external callers might like to consider the recent changes to
RelationTruncate(), but commit 38c579b0 should not have changed the
function prototype in the back-branches, per ABI stability policy.
Restore smgrtruncate()'s traditional argument list in the back-branches,
but make it a wrapper for a new function smgrtruncate2(). The three
callers in core can use smgrtruncate2() directly. In master (18-to-be),
smgrtruncate2() is effectively renamed to smgrtruncate(), so this wart
is cleaned up.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA%2BhUKG%2BThae6x6%2BjmQiuALQBT2Ae1ChjMh1%3DkMvJ8y_SBJZrvA%40mail.gmail.com
Cause parallel workers to not check datallowconn, rolcanlogin, and
ACL_CONNECT privileges. The leader already checked these things
(except for rolcanlogin which might have been checked for a different
role). Re-checking can accomplish little except to induce unexpected
failures in applications that might not even be aware that their query
has been parallelized. We already had the principle that parallel
workers rely on their leader to pass a valid set of authorization
information, so this change just extends that a bit further.
Also, modify the ReservedConnections, datconnlimit and rolconnlimit
logic so that these limits are only enforced against regular backends,
and only regular backends are counted while checking if the limits
were already reached. Previously, background processes that had an
assigned database or role were subject to these limits (with rather
random exclusions for autovac workers and walsenders), and the set of
existing processes that counted against each limit was quite haphazard
as well. The point of these limits, AFAICS, is to ensure the
availability of PGPROC slots for regular backends. Since all other
types of processes have their own separate pools of PGPROC slots, it
makes no sense either to enforce these limits against them or to count
them while enforcing the limit.
While edge-case failures of these sorts have been possible for a
long time, the problem got a good deal worse with commit 5a2fed911
(CVE-2024-10978), which caused parallel workers to make some of these
checks using the leader's current role where before we had used its
AuthenticatedUserId, thus allowing parallel queries to fail after
SET ROLE. The previous behavior was fairly accidental and I have
no desire to return to it.
This patch includes reverting 73c9f91a1, which was an emergency hack
to suppress these same checks in some cases. It wasn't complete,
as shown by a recent bug report from Laurenz Albe. We can also revert
fd4d93d26 and 492217301, which hacked around the same problems in one
regression test.
In passing, remove the special case for autovac workers in
CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass
the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's
needed.
Like 5a2fed911, back-patch to supported branches (which sadly no
longer includes v12).
Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
Commit aac2c9b4fde889d13f859c233c2523345e72d32b mandated such locking
and attempted to fulfill that mandate, but it missed REASSIGN OWNED.
Hence, it remained possible to lose VACUUM's inplace update of
datfrozenxid if a REASSIGN OWNED processed that database at the same
time. This didn't affect the other inplace-updated catalog, pg_class.
For pg_class, REASSIGN OWNED calls ATExecChangeOwner() instead of the
generic AlterObjectOwner_internal(), and ATExecChangeOwner() fulfills
the locking mandate.
Like in GRANT, implement this by following the locking protocol for any
catalog subject to the generic AlterObjectOwner_internal(). It would
suffice to do this for IsInplaceUpdateOid() catalogs only. Back-patch
to v13 (all supported versions).
Kirill Reshke. Reported by Alexander Kukushkin.
Discussion: https://postgr.es/m/CAFh8B=mpKjAy4Cuun-HP-f_vRzh2HSvYFG3rhVfYbfEBUhBAGg@mail.gmail.com
RelationTruncate() does three things, while holding an
AccessExclusiveLock and preventing checkpoints:
1. Logs the truncation.
2. Drops buffers, even if they're dirty.
3. Truncates some number of files.
Step 2 could previously be canceled if it had to wait for I/O, and step
3 could and still can fail in file APIs. All orderings of these
operations have data corruption hazards if interrupted, so we can't give
up until the whole operation is done. When dirty pages were discarded
but the corresponding blocks were left on disk due to ERROR, old page
versions could come back from disk, reviving deleted data (see
pgsql-bugs #18146 and several like it). When primary and standby were
allowed to disagree on relation size, standbys could panic (see
pgsql-bugs #18426) or revive data unknown to visibility management on
the primary (theorized).
Changes:
* WAL is now unconditionally flushed first
* smgrtruncate() is now called in a critical section, preventing
interrupts and causing PANIC on file API failure
* smgrtruncate() has a new parameter for existing fork sizes,
because it can't call smgrnblocks() itself inside a critical section
The changes apply to RelationTruncate(), smgr_redo() and
pg_truncate_visibility_map(). That last is also brought up to date with
other evolutions of the truncation protocol.
The VACUUM FileTruncate() failure mode had been discussed in older
reports than the ones referenced below, with independent analysis from
many people, but earlier theories on how to fix it were too complicated
to back-patch. The more recently invented cancellation bug was
diagnosed by Alexander Lakhin. Other corruption scenarios were spotted
by me while iterating on this patch and earlier commit 75818b3a.
Back-patch to all supported releases.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Reported-by: rootcause000@gmail.com
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/18146-04e908c662113ad5%40postgresql.org
Discussion: https://postgr.es/m/18426-2d18da6586f152d6%40postgresql.org