Commit 94f3ad3961a2cb32d30c79f01a70db4caff13318 failed to do this
because I couldn't think of a use for the information, but this has
proven to be short-sighted. Best to fix it before this code is
officially released.
Now, the only argument to standard_planenr that isn't passed to
planner_setup_hook is boundParams, but that is accessible via
glob->boundParams, and so doesn't need to be passed separately.
Discussion: https://www.postgresql.org/message-id/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Commit 4020b370f214315b8c10430301898ac21658143f had the idea that it
would be a good idea to handle testing PGS_CONSIDER_NONPARTIAL within
cost_material to save callers the trouble, but that turns out not to be
a very good idea. One concern is that it makes cost_material() dependent
on the caller having initialized certain fields in the MaterialPath,
which is a bit awkward for materialize_finished_plan, which wants to use
a dummy path.
Another problem is that it can result in generated materialized nested
loops where the Materialize node is disabled, contrary to the intention
of joinpath.c's logic in match_unsorted_outer() and
consider_parallel_nestloop(), which aims to consider such paths only
when they would not need to be disabled. In the previous coding, it was
possible for the pgs_mask on the joinrel to have PGS_CONSIDER_NONPARTIAL
set, while the inner rel had the same bit clear. In that case, we'd
generate and then disable a Materialize path.
That seems wrong, so instead, pull up the logic to test the
PGS_CONSIDER_NONPARTIAL bit into joinpath.c, restoring the historical
behavior that either we don't generate a given materialized nested loop
in the first place, or we don't disable it.
Discussion: http://postgr.es/m/CA+TgmoawzvCoZAwFS85tE5+c8vBkqgcS8ZstQ_ohjXQ9wGT9sw@mail.gmail.com
Discussion: http://postgr.es/m/CA+TgmoYS4ZCVAF2jTce=bMP0Oq_db_srocR4cZyO0OBp9oUoGg@mail.gmail.com
Two changes here:
1. Introduce a separate RECOVERY_CONFLICT_BUFFERPIN_DEADLOCK flag to
indicate a suspected deadlock that involves a buffer pin. Previously
the startup process used the same flag for a deadlock involving just
regular locks, and to check for deadlocks involving the buffer
pin. The cases are handled separately in the startup process, but the
receiving backend had to deduce which one it was based on
HoldingBufferPinThatDelaysRecovery(). With a separate flag, the
receiver doesn't need to guess.
2. Rewrite the ProcessRecoveryConflictInterrupt() function to not rely
on fallthrough through the switch-statement. That was difficult to
read.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://www.postgresql.org/message-id/4cc13ba1-4248-4884-b6ba-4805349e7f39@iki.fi
The log messages used in this file applied too much quoting logic:
- No need for quote_identifier(), which is fine to not use in the
context of a log entry.
- The usual project style is to group the namespace and object together
in a quoted string, when mentioned in an log message. This code quoted
the namespace name and the extended statistics object name separately,
which was confusing.
Reported-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20260210.143752.1113524465620875233.horikyota.ntt@gmail.com
This commit adds three attributes to the system view pg_stats_ext_exprs,
whose data can exist when involving a range type in an expression:
range_length_histogram
range_empty_frac
range_bounds_histogram
These statistics fields exist since 918eee0c497c, and have become
viewable in pg_stats later in bc3c8db8ae2f. This puts the definition of
pg_stats_ext_exprs on par with pg_stats.
This issue has showed up during the discussion about the restore of
extended statistics for expressions, so as it becomes possible to query
the stats data to restore from the catalogs. Having access to this data
is useful on its own, without the restore part.
Some documentation and some tests are added, written by me. Corey has
authored the part in system_views.sql.
Bump catalog version.
Author: Corey Huinker <corey.huinker@gmail.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/aYmCUx9VvrKiZQLL@paquier.xyz
In the spirit of 8d19d0e13, this patch teaches the planner about the
principle that NullTest with !argisrow is fully equivalent to SQL's IS
[NOT] DISTINCT FROM NULL.
The parser already performs this transformation for literal NULLs.
However, a DistinctExpr expression with one input evaluating to NULL
during planning (e.g., via const-folding of "1 + NULL" or parameter
substitution in custom plans) currently remains as a DistinctExpr
node.
This patch closes the gap for const-folded NULLs. It specifically
targets the case where one input is a constant NULL and the other is a
nullable non-constant expression. (If the other input were otherwise,
the DistinctExpr node would have already been simplified to a constant
TRUE or FALSE.)
This transformation can be beneficial because NullTest is much more
amenable to optimization than DistinctExpr, since the planner knows a
good deal about the former and next to nothing about the latter.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
The IS DISTINCT FROM construct compares values acting as though NULL
were a normal data value, rather than "unknown". Semantically, "x IS
DISTINCT FROM y" yields true if the values differ or if exactly one is
NULL, and false if they are equal or both NULL. Unlike ordinary
comparison operators, it never returns NULL.
Previously, the planner only simplified this construct if all inputs
were constants, folding it to a constant boolean result. This patch
extends the optimization to cases where inputs are non-constant but
proven to be non-nullable. Specifically, "x IS DISTINCT FROM NULL"
folds to constant TRUE if "x" is known to be non-nullable. For cases
where both inputs are guaranteed not to be NULL, the expression
becomes semantically equivalent to "x <> y", and the DistinctExpr is
converted into an inequality OpExpr.
This transformation provides several benefits. It converts the
comparison into a standard operator, allowing the use of partial
indexes and constraint exclusion. Furthermore, if the clause is
negated (i.e., "IS NOT DISTINCT FROM"), it simplifies to an equality
operator. This enables the planner to generate better plans using
index scans, merge joins, hash joins, and EC-based qual deduction.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://postgr.es/m/CAMbWs49BMAOWvkdSHxpUDnniqJcEcGq3_8dd_5wTR4xrQY8urA@mail.gmail.com
For binary upgrades from v16 or newer, pg_upgrade transfers the
files for pg_largeobject_metadata from the old cluster, as opposed
to using COPY or ordinary SQL commands to reconstruct its contents.
While this approach adds complexity, it can greatly reduce
pg_upgrade's runtime when there are many large objects.
Large objects with comments or security labels are one source of
complexity for this approach. During pg_upgrade, schema
restoration happens before files are transferred. Comments and
security labels are transferred in the former step, but the COMMENT
and SECURITY LABEL commands will fail if their corresponding large
objects do not exist. To deal with this, pg_upgrade first copies
only the rows of pg_largeobject_metadata that are needed to avoid
failures. Later, pg_upgrade overwrites those rows by replacing
pg_largeobject_metadata's files with its files in the old cluster.
Unfortunately, there's a subtle problem here. Simply put, there's
no guarantee that pg_upgrade will overwrite all of
pg_largeobject_metadata's files on the new cluster. For example,
the new cluster's version might more aggressively extend relations
or create visibility maps, and pg_upgrade's file transfer code is
not sophisticated enough to remove files that lack counterparts in
the old cluster. These extra files could cause problems
post-upgrade.
More fortunately, we can simultaneously fix the aforementioned
problem and further optimize binary upgrades for clusters with many
large objects. If we teach the COMMENT and SECURITY LABEL commands
to allow nonexistent large objects during binary upgrades,
pg_upgrade no longer needs to transfer pg_largeobject_metadata's
contents beforehand. This approach also allows us to remove the
associated dependency tracking from pg_dump, even for upgrades from
v12-v15 that use COPY to transfer pg_largeobject_metadata's
contents.
In addition to what is described in the previous paragraph, this
commit modifies the query in getLOs() to only retrieve LOs with
comments or security labels for upgrades from v12 or newer. We
have long assumed that such usage is rare, so this should reduce
pg_upgrade's memory usage and runtime in many cases. We might also
be able to remove the "upgrades from v12 or newer" restriction on
the recent batch of optimizations by adding special handling for
pg_largeobject_metadata's hidden OID column on older versions
(since this catalog previously used the now-removed WITH OIDS
feature), but that is left as a future exercise.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/3yd2ss6n7xywo6pmhd7jjh3bqwgvx35bflzgv3ag4cnzfkik7m%40hiyadppqxx6w
Clean up a few leftovers from when the deadlock checker was called
from signal handler. We stopped doing that in commit 6753333f55, in
year 2015.
- CheckDeadLock can return a return value directly to the caller,
there's no need to use a global variable for that.
- Remove outdated comments that claimed that CheckDeadLock "signals
ProcSleep".
- It should be OK to ereport() from DeadLockCheck now. I considered
getting rid of InitDeadLockChecking() and moving the workspace
allocations into DeadLockCheck, but it's still good to avoid doing
the allocations while we're holding all the partition locks. So just
update the comment to give that as the reason we do the allocations
up front.
They are not and never have been used by any known code -- apparently we
just cargo-culted them in commit 37484ad2aace (or their ancestor macros
anyway, which begat these functions in commit 34694ec888d6). Allegedly
they're also potentially dangerous; users are better off going through
HeapTupleSetHintBits instead.
Author: Andy Fan <zhihuifan1213@163.com>
Discussion: https://postgr.es/m/87sejogt4g.fsf@163.com
While the preceding commit prevented such attachments from occurring
in future, this one aims to prevent further abuse of any already-
created operator that exposes _int_matchsel to the wrong data types.
(No other contrib module has a vulnerable selectivity estimator.)
We need only check that the Const we've found in the query is indeed
of the type we expect (query_int), but there's a difficulty: as an
extension type, query_int doesn't have a fixed OID that we could
hard-code into the estimator.
Therefore, the bulk of this patch consists of infrastructure to let
an extension function securely look up the OID of a datatype
belonging to the same extension. (Extension authors have requested
such functionality before, so we anticipate that this code will
have additional non-security uses, and may soon be extended to allow
looking up other kinds of SQL objects.)
This is done by first finding the extension that owns the calling
function (there can be only one), and then thumbing through the
objects owned by that extension to find a type that has the desired
name. This is relatively expensive, especially for large extensions,
so a simple cache is put in front of these lookups.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
Selectivity estimators come in two flavors: those that make specific
assumptions about the data types they are working with, and those
that don't. Most of the built-in estimators are of the latter kind
and are meant to be safely attachable to any operator. If the
operator does not behave as the estimator expects, you might get a
poor estimate, but it won't crash.
However, estimators that do make datatype assumptions can malfunction
if they are attached to the wrong operator, since then the data they
get from pg_statistic may not be of the type they expect. This can
rise to the level of a security problem, even permitting arbitrary
code execution by a user who has the ability to create SQL objects.
To close this hole, establish a rule that built-in estimators are
required to protect themselves against being called on the wrong type
of data. It does not seem practical however to expect estimators in
extensions to reach a similar level of security, at least not in the
near term. Therefore, also establish a rule that superuser privilege
is required to attach a non-built-in estimator to an operator.
We expect that this restriction will have little negative impact on
extensions, since estimators generally have to be written in C and
thus superuser privilege is required to create them in the first
place.
This commit changes the privilege checks in CREATE/ALTER OPERATOR
to enforce the rule about superuser privilege, and fixes a couple
of built-in estimators that were making datatype assumptions without
sufficiently checking that they're valid.
Reported-by: Daniel Firer as part of zeroday.cloud
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2004
Backpatch-through: 14
These data types are represented like full-fledged arrays, but
functions that deal specifically with these types assume that the
array is 1-dimensional and contains no nulls. However, there are
cast pathways that allow general oid[] or int2[] arrays to be cast
to these types, allowing these expectations to be violated. This
can be exploited to cause server memory disclosure or SIGSEGV.
Fix by installing explicit checks in functions that accept these
types.
Reported-by: Altan Birler <altan.birler@tum.de>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2003
Backpatch-through: 14
pgp_sym_decrypt() and pgp_pub_decrypt() will raise such errors, while
bytea variants will not. The existing "dat3" test decrypted to non-UTF8
text, so switch that query to bytea.
The long-term intent is for type "text" to always be valid in the
database encoding. pgcrypto has long been known as a source of
exceptions to that intent, but a report about exploiting invalid values
of type "text" brought this module to the forefront. This particular
exception is straightforward to fix, with reasonable effect on user
queries. Back-patch to v14 (all supported versions).
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
Author: shihao zhong <zhong950419@gmail.com>
Reviewed-by: cary huang <hcary328@gmail.com>
Discussion: https://postgr.es/m/CAGRkXqRZyo0gLxPJqUsDqtWYBbgM14betsHiLRPj9mo2=z9VvA@mail.gmail.com
Backpatch-through: 14
Security: CVE-2026-2006
Change log_min_messages from being a single element to a comma-separated
list of type:level elements, with 'type' representing a process type,
and 'level' being a log level to use for that type of process. The list
must also have a freestanding level specification which is used for
process types not listed, which convenientely makes the whole thing
backwards-compatible.
Some choices made here could be contested; for instance, we use the
process type `backend` to affect regular backends as well as dead-end
backends and the standalone backend, and `autovacuum` means both the
launcher and the workers. I think it's largely sensible though, and it
can easily be tweaked if desired.
Author: Euler Taveira <euler@eulerto.com>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Tan Yang <332696245@qq.com>
Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
A security patch changed them today, so close the coverage gap now.
Test that buffer overrun is avoided when pg_mblen*() requires more
than the number of bytes remaining.
This does not cover the calls in dict_thesaurus.c or in dict_synonym.c.
That code is straightforward. To change that code's input, one must
have access to modify installed OS files, so low-privilege users are not
a threat. Testing this would likewise require changing installed
share/postgresql/tsearch_data, which was enough of an obstacle to not
bother.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
A corrupted string could cause code that iterates with pg_mblen() to
overrun its buffer. Fix, by converting all callers to one of the
following:
1. Callers with a null-terminated string now use pg_mblen_cstr(), which
raises an "illegal byte sequence" error if it finds a terminator in the
middle of the sequence.
2. Callers with a length or end pointer now use either
pg_mblen_with_len() or pg_mblen_range(), for the same effect, depending
on which of the two seems more convenient at each site.
3. A small number of cases pre-validate a string, and can use
pg_mblen_unbounded().
The traditional pg_mblen() function and COPYCHAR macro still exist for
backward compatibility, but are no longer used by core code and are
hereby deprecated. The same applies to the t_isXXX() functions.
Security: CVE-2026-2006
Backpatch-through: 14
Co-authored-by: Thomas Munro <thomas.munro@gmail.com>
Co-authored-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reported-by: Paul Gerste (as part of zeroday.cloud)
Reported-by: Moritz Sanft (as part of zeroday.cloud)
When converting multibyte to pg_wchar, the UTF-8 implementation would
silently ignore an incomplete final character, while the other
implementations would cast a single byte to pg_wchar, and then repeat
for the remaining byte sequence. While it didn't overrun the buffer, it
was surely garbage output.
Make all encodings behave like the UTF-8 implementation. A later change
for master only will convert this to an error, but we choose not to
back-patch that behavior change on the off-chance that someone is
relying on the existing UTF-8 behavior.
Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
While EUC_CN supports only 1- and 2-byte sequences (CS0, CS1), the
mb<->wchar conversion functions allow 3-byte sequences beginning SS2,
SS3.
Change pg_encoding_max_length() to return 3, not 2, to close a
hypothesized buffer overrun if a corrupted string is converted to wchar
and back again in a newly allocated buffer. We might reconsider that in
master (ie harmonizing in a different direction), but this change seems
better for the back-branches.
Also change pg_euccn_mblen() to report SS2 and SS3 characters as having
length 3 (following the example of EUC_KR). Even though such characters
would not pass verification, it's remotely possible that invalid bytes
could be used to compute a buffer size for use in wchar conversion.
Security: CVE-2026-2006
Backpatch-through: 14
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
The code made a subtle assumption that the lower-cased version of a
string never has more characters than the original. That is not always
true. For example, in a database with the latin9 encoding:
latin9db=# select lower(U&'\00CC' COLLATE "lt-x-icu");
lower
-----------
i\x1A\x1A
(1 row)
In this example, lower-casing expands the single input character into
three characters.
The generate_trgm_only() function relied on that assumption in two
ways:
- It used "slen * pg_database_encoding_max_length() + 4" to allocate
the buffer to hold the lowercased and blank-padded string. That
formula accounts for expansion if the lower-case characters are
longer (in bytes) than the originals, but it's still not enough if
the lower-cased string contains more *characters* than the original.
- Its callers sized the output array to hold the trigrams extracted
from the input string with the formula "(slen / 2 + 1) * 3", where
'slen' is the input string length in bytes. (The formula was
generous to account for the possibility that RPADDING was set to 2.)
That's also not enough if one input byte can turn into multiple
characters.
To fix, introduce a growable trigram array and give up on trying to
choose the correct max buffer sizes ahead of time.
Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.
Security: CVE-2026-2007
Reviewed-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Jeff Davis <pgsql@j-davis.com>
The function assumed that if charlen == bytelen, there are no
multibyte characters in the string. That's sensible, but the callers
were a little careless in how they calculated the lengths. The callers
converted the string to lowercase before calling make_trigram(), and
the 'charlen' value was calculated *before* the conversion to
lowercase while 'bytelen' was calculated after the conversion. If the
lowercased string had a different number of characters than the
original, make_trigram() might incorrectly apply the fastpath and
treat all the bytes as single-byte characters, or fail to apply the
fastpath (which is harmless), or it might hit the "Assert(bytelen ==
charlen)" assertion. I'm not aware of any locale / character
combinations where you could hit that assertion in practice,
i.e. where a string converted to lowercase would have fewer characters
than the original, but it seems best to avoid making that assumption.
To fix, remove the 'charlen' argument. To keep the performance when
there are no multibyte characters, always try the fast path first, but
check the input for multibyte characters as we go. The check on each
byte adds some overhead, but it's close enough. And to compensate, the
find_word() function no longer needs to count the characters.
This fixes one small bug in make_trigrams(): in the multibyte
codepath, it peeked at the byte just after the end of the input
string. When compiled with IGNORECASE, that was harmless because there
is always a NUL byte or blank after the input string. But with
!IGNORECASE, the call from generate_wildcard_trgm() doesn't guarantee
that.
Backpatch to v18, but no further. In previous versions lower-casing was
done character by character, and thus the assumption that lower-casing
doesn't change the character length was valid. That was changed in v18,
commit fb1a18810f.
Security: CVE-2026-2007
Reviewed-by: Noah Misch <noah@leadboat.com>
pgp_pub_decrypt_bytea() was missing a safeguard for the session key
length read from the message data, that can be given in input of
pgp_pub_decrypt_bytea(). This can result in the possibility of a buffer
overflow for the session key data, when the length specified is longer
than PGP_MAX_KEY, which is the maximum size of the buffer where the
session data is copied to.
A script able to rebuild the message and key data that can trigger the
overflow is included in this commit, based on some contents provided by
the reporter, heavily editted by me. A SQL test is added, based on the
data generated by the script.
Reported-by: Team Xint Code as part of zeroday.cloud
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Noah Misch <noah@leadboat.com>
Security: CVE-2026-2005
Backpatch-through: 14
Looking again at commit 7cdb633c8, I wondered why we have hard-wired
"1034" for the OID of type aclitem[]. Some other entries in the same
array have numeric type OIDs as well. This seems to be a hangover
from years ago when not every built-in pg_type entry had an OID macro.
But since we made genbki.pl responsible for generating these macros,
there are macros available for all these array types, so there's no
reason not to follow the project policy of never writing numeric OID
constants in C code.
This thinko caused us to not substitute our own getopt() code,
which results in failing to parse long options for the postmaster
since Solaris' getopt() doesn't do what we expect. This can be seen
in the results of buildfarm member icarus, which is the only one
trying to build via meson on Solaris.
Per consultation with pgsql-release, it seems okay to fix this
now even though we're in release freeze. The fix visibly won't
affect any other platforms, and it can't break Solaris/meson
builds any worse than they're already broken.
Discussion: https://postgr.es/m/2471229.1770499291@sss.pgh.pa.us
Backpatch-through: 16
Commit 176dffdf7 added a NULL array pointer check before performing
a qsort in order to prevent undefined behavior when passing NULL
pointer and zero length. To head off future degenerate cases, check
that there are at least two elements to sort before proceeding with
insertion sort. This has the added advantage of allowing us to remove
four equivalent checks that guarded against recursion/iteration.
There might be a tiny performance penalty from unproductive
recursions, but we can buy that back by increasing the insertion sort
threshold. That is left for future work.
Discussion: https://postgr.es/m/CANWCAZZWvds_35nXc4vXD-eBQa_=mxVtqZf-PM_ps=SD7ghhJg@mail.gmail.com
Commit 7b378237a widened AclMode to 64 bits, which implies that
the alignment of AclItem is now determined by an int64 field.
That commit correctly set the typalign for SQL type aclitem to
'd', but it missed the hard-wired knowledge about _aclitem in
bootstrap.c. This doesn't seem to have caused any ill effects,
probably because we never try to fill a non-null value into
an aclitem[] column during bootstrap. Nonetheless, it's clearly
a gotcha waiting to happen, so fix it up.
In passing, also fix a couple of typanalyze functions that were
using hard-coded typalign constants when they could just as
easily use greppable TYPALIGN_xxx macros.
Noticed these while working on a patch to expand the set of
typalign values. I doubt we are going to pursue that path,
but these fixes still seem worth a quick commit.
Discussion: https://postgr.es/m/1127261.1769649624@sss.pgh.pa.us
This commit adjusts a few debugging macros to match the style of
those in pg_config_manual.h. Like commits 123661427b and
b4cbc106a6, these were discovered while reviewing Aleksander
Alekseev's proposed changes to pgindent.
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aP-H6kSsGOxaB21k%40nathan
The main reason that libpq doesn't request protocol version 3.2 by
default is because other proxy/server implementations don't implement
the negotiation. This is a bit of a chicken-and-egg problem: We don't
bump the default version that libpq requests, but other implementations
may not be incentivized to implement version negotiation if their users
never run into issues.
One established practice to combat this is to flip Postel's Law on its
head, by sending parameters that the server cannot possibly support. If
the server fails the handshake instead of correctly negotiating, then
the problem is surfaced naturally. If the server instead claims to
support the bogus parameters, then we fail the connection to make the
lie obvious. This is called "grease" (or "greasing"), after the GREASE
mechanism in TLS that popularized the concept:
https://www.rfc-editor.org/rfc/rfc8701.html
This patch reserves 3.9999 as an explicitly unsupported protocol version
number and `_pq_.test_protocol_negotiation` as an explicitly unsupported
protocol extension. A later commit will send these by default in order
to stress-test the ecosystem during the beta period; that commit will
then be reverted before 19 RC1, so that we can decide what to do with
whatever data has been gathered.
The _pq_.test_protocol_negotiation change here is intentionally docs-
only: after its implementation is reverted, the parameter should remain
reserved.
Extracted/adapted from a patch by Jelte Fennema-Nio.
Author: Jelte Fennema-Nio <postgres@jeltef.nl>
Co-authored-by: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
First, split the Protocol Versions table in two, and lead with the list
of versions that are supported today. Reserved and unsupported version
numbers go into the second table.
Second, in anticipation of a new (reserved) protocol extension, document
the extension negotiation process alongside version negotiation, and add
the corresponding tables for future extension parameter registrations.
Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl>
Reviewed-by: David G. Johnston <david.g.johnston@gmail.com>
Discussion: https://postgr.es/m/DDPR5BPWH1RJ.1LWAK6QAURVAY%40jeltef.nl
This routine's internals directly used MyProcNumber to choose which
object ID to assign for the hash key of a backend's stats entry, while
the value to use is given as input argument of the function.
The original intention was to pass MyProcNumber as an argument of
pgstat_create_backend() when called in pgstat_bestart_final(),
pgstat_beinit() ensuring that MyProcNumber has been set, not use it
directly in the function. This commit addresses this inconsistency by
using the procnum given by the caller of pgstat_create_backend(), not
MyProcNumber.
This issue is not a cause of bugs currently. However, let's keep the
code in sync across all the branches where this code exists, as it could
matter in a future backpatch.
Oversight in 4feba03d8b92.
Reported-by: Ryo Matsumura <matsumura.ryo@fujitsu.com>
Discussion: https://postgr.es/m/TYCPR01MB11316AD8150C8F470319ACCAEE866A@TYCPR01MB11316.jpnprd01.prod.outlook.com
Backpatch-through: 18
These errors are very unlikely going to show up, but in the event that
they happen, some incorrect information would have been provided:
- In pg_rewind, a stat() failure was reported as an open() failure.
- In pg_combinebackup, a check for the new directory of a tablespace
mapping was referred as the old directory.
- In pg_combinebackup, a failure in reading a source file when copying
blocks referred to the destination file.
The changes for pg_combinebackup affect v17 and newer versions. For
pg_rewind, all the stable branches are affected.
Author: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/tencent_1EE1430B1E6C18A663B8990F@qq.com
Backpatch-through: 14
Provide a way to disable the use of posix_fallocate() for relation
files. It was introduced by commit 4d330a61bb1. The new setting
file_extend_method=write_zeros can be used as a workaround for problems
reported from the field:
* BTRFS compression is disabled by the use of posix_fallocate()
* XFS could produce spurious ENOSPC errors in some Linux kernel
versions, though that problem is reported to have been fixed
The default is file_extend_method=posix_fallocate if available, as
before. The write_zeros option is similar to PostgreSQL < 16, except
that now it's multi-block.
Backpatch-through: 16
Reviewed-by: Jakub Wartak <jakub.wartak@enterprisedb.com>
Reported-by: Dimitrios Apostolou <jimis@gmx.net>
Discussion: https://postgr.es/m/b1843124-fd22-e279-a31f-252dffb6fbf2%40gmx.net
synchronized_standby_slots is defined in guc_parameter.dat as part of
the REPLICATION_PRIMARY group and is listed under the "Primary Server"
section in postgresql.conf.sample. However, in the documentation
its description was previously placed under the "Sending Servers" section.
Since synchronized_standby_slots only takes effect on the primary server,
this commit moves its documentation to the "Primary Server" section to
match its behavior and other references.
Backpatch to v17 where synchronized_standby_slots was added.
Author: Fujii Masao <masao.fujii@gmail.com>
Reviewed-by: Shinya Kato <shinya11.kato@gmail.com>
Discussion: https://postgr.es/m/CAHGQGwE_LwgXgCrqd08OFteJqdERiF3noqOKu2vt7Kjk4vMiGg@mail.gmail.com
Backpatch-through: 17
Commit 29d0a77fa6 improved pg_upgrade to allow migrating logical slots
provided that all logical slots have caught up (i.e., they have no
pending decodable WAL records). Previously, this verification was done
by checking each slot individually, which could be time-consuming if
there were many logical slots to migrate.
This commit optimizes the check to avoid reading the same WAL stream
multiple times. It performs the check only for the slot with the
minimum confirmed_flush_lsn and applies the result to all other slots
in the same database. This limits the check to at most one logical
slot per database.
During the check, we identify the last decodable WAL record's LSN to
report any slots with unconsumed records, consistent with the existing
error reporting behavior. Additionally, the maximum
confirmed_flush_lsn among all logical slots on the database is used as
an early scan cutoff; finding a decodable WAL record beyond this point
implies that no slot has caught up.
Performance testing demonstrated that the execution time remains
stable regardless of the number of slots in the database.
Note that we do not distinguish slots based on their output plugins. A
hypothetical plugin might use a replication origin filter that filters
out changes from a specific origin. In such cases, we might get a
false positive (erroneously considering a slot caught up). However,
this is safe from a data integrity standpoint, such scenarios are
rare, and the impact of a false positive is minimal.
This optimization is applied only when the old cluster is version 19
or later.
Bump catalog version.
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/CAD21AoBZ0LAcw1OHGEKdW7S5TRJaURdhEk3CLAW69_siqfqyAg@mail.gmail.com
Instead of assigning the backend type in the Main function of each
postmaster child, do it right after fork(), by which time it is already
known by postmaster_child_launch(). This reduces the time frame during
which MyBackendType is incorrect.
Before this commit, ProcessStartupPacket would overwrite MyBackendType
to B_BACKEND for dead-end backends, which is quite dubious. Stop that.
We may now see MyBackendType == B_BG_WORKER before setting up
MyBgworkerEntry. As far as I can see this is only a problem if we try
to log a message and %b is in log_line_prefix, so we now have a constant
string to cover that case. Previously, it would print "unrecognized",
which seems strictly worse.
Author: Euler Taveira <euler@eulerto.com>
Discussion: https://postgr.es/m/e85c6671-1600-4112-8887-f97a8a5d07b2@app.fastmail.com
Commit 5f13999aa11 added a TAP test for GUC settings passed via the
CONNECTION string in logical replication, but the buildfarm member
sungazer reported test failures.
The test incorrectly used the subscriber's log file position as the
starting offset when reading the publisher's log. As a result, the test
failed to find the expected log message in the publisher's log and
erroneously reported a failure.
This commit fixes the test to use the publisher's own log file position
when reading the publisher's log.
Also, to avoid similar confusion in the future, this commit splits the single
$log_location variable into $log_location_pub and $log_location_sub,
clearly distinguishing publisher and subscriber log positions.
Backpatched to v15, where commit 5f13999aa11 introduced the test.
Per buildfarm member sungazer.
This issue was reported and diagnosed by Alexander Lakhin.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/966ec3d8-1b6f-4f57-ae59-fc7d55bc9a5a@gmail.com
Backpatch-through: 15
Mostly this involves checking for NULL pointer before doing operations
that add a non-zero offset.
The exception is an overflow warning in heap_fetch_toast_slice(). This
was caused by unneeded parentheses forcing an expression to be
evaluated to a negative integer, which then got cast to size_t.
Per clang 21 undefined behavior sanitizer.
Backpatch to all supported versions.
Co-authored-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://postgr.es/m/777bd201-6e3a-4da0-a922-4ea9de46a3ee@gmail.com
Backpatch-through: 14
Currently, when the argument of copyObject() is const-qualified, the
return type is also, because the use of typeof carries over all the
qualifiers. This is incorrect, since the point of copyObject() is to
make a copy to mutate. But apparently no code ran into it.
The new implementation uses typeof_unqual, which drops the qualifiers,
making this work correctly.
typeof_unqual is standardized in C23, but all recent versions of all
the usual compilers support it even in non-C23 mode, at least as
__typeof_unqual__. We add a configure/meson test for typeof_unqual
and __typeof_unqual__ and use it if it's available, else we use the
existing fallback of just returning void *.
Reviewed-by: David Geier <geidav.pg@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/92f9750f-c7f6-42d8-9a4a-85a3cbe808f3%40eisentraut.org
The pg_dump documentation had repetitive notes for the --schema,
--table, and --extension switches, noting that dependent database
objects are not automatically included in the dump. This commit removes
these notes and replaces them with a consolidated paragraph in the
"Notes" section.
pg_restore had a similar note for -t but lacked one for -n; do likewise.
Also, add a note to --extension in pg_dump to note that ancillary files
(such as shared libraries and control files) are not included in the
dump and must be present on the destination system.
Author: Florents Tselai <florents.tselai@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/284C4D55-4F90-4AA0-84C8-1E6A28DDF271@gmail.com