Commit 492a69e14070 anticipated this change:
[C] 'function bool AdjustNotNullInheritance(Oid, AttrNumber, bool, bool, bool)' has some sub-type changes:
  parameter 6 of type 'bool' was added
  parameter 3 of type 'bool' changed:
    entity changed from 'bool' to 'const char*'
    type size changed from 1 to 8 (in bytes)
Discussion: https://postgr.es/m/19351-8f1c523ead498545%40postgresql.org
Backpatch-through: 18 only
When using ALTER TABLE ... ADD CONSTRAINT to add a not-null constraint
with an explicit name, we have to ensure that if the column is already
marked NOT NULL, the provided name matches the existing constraint name.
Failing to do so could lead to confusion regarding which constraint
object actually enforces the rule.
This patch adds a check to throw an error if the user tries to add a
named not-null constraint to a column that already has one with a
different name.
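For example, with a hypothetical table, the second command now throws
an error, since the column's existing not-null constraint has a
different name:
CREATE TABLE t (a int CONSTRAINT t_a_not_null NOT NULL);
ALTER TABLE t ADD CONSTRAINT some_other_name NOT NULL a;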
Reported-by: yanliang lei <msdnchina@163.com>
Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de>
Co-authored-by: Srinath Reddy Sadipiralla <srinath2133@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/19351-8f1c523ead498545%40postgresql.org
This routine has an option to bypass an error if a WAL summary file is
opened for read but is missing (missing_ok=true). However, the code
incorrectly checked for EEXIST, which matters when using O_CREAT and
O_EXCL, rather than ENOENT, in this case.
There are currently only two callers of OpenWalSummaryFile() in the
tree, and both use missing_ok=false, meaning that the check based on the
errno is currently dead code. This issue could matter for out-of-core
code or future backpatches that would like to use missing_ok set to
true.
Issue spotted while monitoring this area of the code, after
a9afa021e95f.
Author: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aYAf8qDHbpBZ3Rml@paquier.xyz
Backpatch-through: 17
The build generates four files based on the wait event contents stored
in wait_event_names.txt:
- wait_event_types.h
- pgstat_wait_event.c
- wait_event_funcs_data.c
- wait_event_types.sgml
The SGML file is generated as part of a documentation build, with its
data stored in doc/src/sgml/ for meson and configure. The three others
are handled differently for meson and configure:
- In configure, all the files are created in src/backend/utils/activity/.
A link to wait_event_types.h is created in src/include/utils/.
- In meson, all the files are created in src/include/utils/.
The two C files, pgstat_wait_event.c and wait_event_funcs_data.c, are
then included in respectively wait_event.c and wait_event_funcs.c,
without the "utils/" path.
For configure, this does not present a problem. For meson, this has to
be combined with a trick in src/backend/utils/activity/meson.build,
where include_directories needs to point to include/utils/ to make the
inclusion of the C files work properly. This caused builds to pull in
PostgreSQL headers rather than system headers in some build paths, as
src/include/utils/ would take priority.
In order to fix this issue, this commit reworks the way the C/H files
are generated, becoming consistent with guc_tables.inc.c:
- For meson, basically nothing changes. The files are still generated
in src/include/utils/. The trick with include_directories is removed.
- For configure, the files are now generated in src/backend/utils/, with
links in src/include/utils/ pointing to the ones in src/backend/. This
requires extra rules in src/backend/utils/activity/Makefile so that a
make command run in this subdirectory works.
- The three files now fall under header-stamp, which is actually simpler
as guc_tables.inc.c does the same.
- wait_event_funcs_data.c and pgstat_wait_event.c are now included with
"utils/" in their path.
This problem has not been an issue in the buildfarm; it has been noted
with AIX and a conflict with float.h. This issue could, however, create
conflicts in the buildfarm depending on the environment with unexpected
headers pulled in, so this fix is backpatched down to where the
generation of the wait-event files has been introduced.
While on it, this commit simplifies wait_event_names.txt to mention
only the names of the files generated, as the paths listed there had
become incorrect (the SGML path was wrong).
This change has been tested in the CI, down to v17. Locally, I have run
tests with configure (with and without VPATH), as well as meson, on the
three branches.
Combo oversight in fa88928470b5 and 1e68e43d3f0f.
Reported-by: Aditya Kamath <aditya.kamath1@ibm.com>
Discussion: https://postgr.es/m/LV8PR15MB64888765A43D229EA5D1CFE6D691A@LV8PR15MB6488.namprd15.prod.outlook.com
Backpatch-through: 17
BackgroundPsql needs to wait for all the output from an interactive
psql command to come back. To make sure that's happened, it issues
the command, then issues \echo and \warn psql commands that echo
a "banner" string (which we assume won't appear in the command's
output), then waits for the banner strings to appear. The hazard
in this approach is that the banner will also appear in the echoed
psql commands themselves, so we need to distinguish those echoes from
the desired output. Commit 8b886a4e3 tried to do that by positing
that the desired output would be directly preceded and followed by
newlines, but it turns out that that assumption is timing-sensitive.
In particular, it tends to fail in builds made --without-readline,
wherein the command echoes will be made by the pty driver and may
be interspersed with prompts issued by psql proper.
It does seem safe to assume that the banner output we want will be
followed by a newline, since that should be the last output before
things quiesce. Therefore, we can improve matters by putting quotes
around the banner strings in the \echo and \warn psql commands, so
that their echoes cannot include banner directly followed by newline,
and then checking for just banner-and-newline in the match pattern.
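As an illustrative sketch (the banner string here is made up), the
test now effectively sends:
\echo 'BANNER-1234'
The echoed input still contains the quoted string 'BANNER-1234', but
the actual \echo output is the bare banner followed by a newline, so a
match pattern of banner-and-newline cannot fire on the command echo.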
While at it, spruce up the pump() call in sub query() to look like
the neater version in wait_connect(), and don't die on timeout
until after printing whatever we got.
Reported-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Diagnosed-by: Oleg Tselebrovskiy <o.tselebrovskiy@postgrespro.ru>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Soumya S Murali <soumyamurali.work@gmail.com>
Discussion: https://postgr.es/m/db6fdb35a8665ad3c18be01181d44b31@postgrespro.ru
Backpatch-through: 14
As noted in the commit message for b4307ae2e54, the change to the
TransitionCaptureState structure is nominally an ABI break, but it is
not expected to affect any third-party code. Therefore, add it to the
.abi-compliance-history file.
Discussion: https://postgr.es/m/19380-4e293be2b4007248%40postgresql.org
Backpatch-through: 15
The leaks were hard to reach in practice and the impact was low.
The callers provide, as a starting size, a buffer with the same number
of bytes as the source string (plus one for the NUL terminator), and
libc never increases the number of characters. But, if the byte length of
one of the converted characters is larger, then it might need a larger
destination buffer. Previously, in that case, the working buffers
would be leaked.
Even in that case, the call typically happens within a context that
will soon be reset. Regardless, it's worth fixing to avoid such
assumptions, and the fix is simple so it's worth backporting.
Discussion: https://postgr.es/m/e2b7a0a88aaadded7e2d19f42d5ab03c9e182ad8.camel@j-davis.com
Backpatch-through: 18
In the psql prompt, %P shows the current pipeline status. Unlike most
of the other options, its status showed up in the generated output even
if psql was not connected to a database. This was confusing, because
without a connection a pipeline status makes no sense.
%P is updated so that, like the other options, its data is now hidden
without an active connection.
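For instance, with a prompt such as (hypothetical setting):
\set PROMPT1 '%P%/%R%# '
the pipeline-status characters no longer appear before a connection is
established.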
Author: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/86EF76B5-6E62-404D-B9EC-66F4714D7D5F@gmail.com
Backpatch-through: 18
The test added in commit 851f6649cc uses a backup taken from a node
created by the previous test to perform standby-related checks. On
Windows, however, the standby failed to start with the following error:
FATAL: could not rename file "backup_label" to "backup_label.old": Permission denied
This occurred because some background sessions from the earlier test were
still active. These leftover processes continued accessing the parent
directory of the backup_label file, likely preventing the rename and
causing the failure. Ensuring that these sessions are cleanly terminated
resolves the issue in local testing.
Additionally, the has_restoring => 1 option has been removed, as it was
not required by the new test.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Backpatch-through: 17
Discussion: https://postgr.es/m/CA+TgmobdVhO0ckZfsBZ0wqDO4qHVCwZZx8sf=EinafvUam-dsQ@mail.gmail.com
Commit 7f007e4a04 in master depends on 1476028225, but the latter was
not backported. Therefore 806555e300 (the backport of commit
7f007e4a04) incorrectly used pg_strfold() in locales where ctype_is_c
is set.
The fix is simply to have the callers check ctype_is_c.
Because 7f007e4a04 was only backported to version 18, and because the
commit in master is fine, this fix only exists in version 18.
Reported-by: Александр Кожемякин <a.kozhemyakin@postgrespro.ru>
Discussion: https://postgr.es/m/456f7143-51ea-4342-b4a1-85f0d9b6c79f@postgrespro.ru
A race condition could cause a newly synced replication slot to become
invalidated between its initial sync and the checkpoint.
When syncing a replication slot to a standby, the slot's initial
restart_lsn is taken from the publisher's remote_restart_lsn. Because slot
sync happens asynchronously, this value can lag behind the standby's
current redo pointer. Without any interlocking between WAL reservation and
checkpoints, a checkpoint may remove WAL required by the newly synced
slot, causing the slot to be invalidated.
To fix this, we acquire ReplicationSlotAllocationLock before reserving WAL
for a newly synced slot, similar to commit 006dd4b2e5. This ensures that
if WAL reservation happens first, the checkpoint process must wait for
slotsync to update the slot's restart_lsn before it computes the minimum
required LSN.
However, unlike in ReplicationSlotReserveWal(), this lock alone cannot
protect a newly synced slot if a checkpoint has already run
CheckPointReplicationSlots() before slotsync updates the slot. In such
cases, the remote restart_lsn may be stale and earlier than the current
redo pointer. To prevent relying on an outdated LSN, we use the oldest
WAL location available if it is greater than the remote restart_lsn.
This ensures that newly synced slots always start with a safe, non-stale
restart_lsn and are not invalidated by concurrent checkpoints.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Reviewed-by: Vitaly Davydov <v.davydov@postgrespro.ru>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Backpatch-through: 17
Discussion: https://postgr.es/m/TY4PR01MB16907E744589B1AB2EE89A31F94D7A%40TY4PR01MB16907.jpnprd01.prod.outlook.com
We've assumed that touching the memory is sufficient for a page to be
located on one of the NUMA nodes. But a page may be moved to swap
after we touch it, due to memory pressure.
We touch the memory before querying the status, but there is no
guarantee it won't be moved to swap in the meantime. The touching
happens only on the first call, so later calls are more likely to be
affected. And the batching increases the window too.
It's up to the kernel if/when pages get moved to swap. We have to accept
ENOENT (-2) as a valid result, and handle it without failing. This patch
simply treats it as an unknown node, and returns NULL in the two
affected views (pg_shmem_allocations_numa and pg_buffercache_numa).
Hugepages cannot be swapped out, so this affects only regular pages.
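With the fix, swapped-out pages can be spotted as rows with a NULL
node, for example (assuming the view's documented columns):
SELECT name, numa_node, size FROM pg_shmem_allocations_numa
  WHERE numa_node IS NULL;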
Reported by Christoph Berg, investigation and fix by me. Backpatch to
18, where the two views were introduced.
Reported-by: Christoph Berg <myon@debian.org>
Discussion: https://postgr.es/m/aTq5Gt_n-oS_QSpL@msg.df7cb.de
Backpatch-through: 18
When building a tuplesort during parallel GIN builds, the function
incorrectly looked up the default B-Tree operator, not the function
associated with the GIN opclass (through GIN_COMPARE_PROC).
Fixed by using the same logic as initGinState(), and the other place
in parallel GIN builds.
This could cause two types of issues. First, a data type might not have
a B-Tree opclass, in which case PrepareSortSupportFromOrderingOp()
fails with an ERROR. Second, a data type might have both B-Tree and GIN
opclasses, defining order/equality in different ways. This could lead to
logical corruption in the index.
Backpatch to 18, where parallel GIN builds were introduced.
Discussion: https://postgr.es/m/73a28b94-43d5-4f77-b26e-0d642f6de777@iki.fi
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Backpatch-through: 18
Buildfarm member fairywren hit the Windows limitation on the length of a
file path. While there may be other things we should also do to prevent
this from happening, it's certainly the case that the length of this
test file name is much longer than others in the same directory, so make
it shorter.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: http://postgr.es/m/274e0a1a-d7d2-4bc8-8b56-dd09f285715e@gmail.com
Backpatch-through: 17
ed1a88dda made it so WindowClauses can be merged when all window
functions belonging to the WindowClause can equally well use some
other WindowClause without any behavioral changes. When that
optimization applies, the WindowFunc's "winref" gets adjusted to
reference the new WindowClause.
That commit does not work well with the deduplication logic in
find_window_functions(), which only added the WindowFunc to the list
when there wasn't already an identical WindowFunc in the list. That
deduplication logic meant that the duplicate WindowFunc wouldn't get the
"winref" changed when optimize_window_clauses() was able to swap the
WindowFunc to another WindowClause. This could lead to the following
error in the unlikely event that the deduplication code did something and
the duplicate WindowFunc happened to be moved into another WindowClause.
ERROR: WindowFunc with winref 2 assigned to WindowAgg with winref 1
As it turns out, the deduplication logic in find_window_functions() is
pretty bogus. It might have done something when added, as that code
predates b8d7f053c, which changed how projections work. As it turns
out, at least now we *will* evaluate the duplicate WindowFuncs. All
that the deduplication code seems to do today is assist in
underestimating the WindowAggPath costs due to not counting the
evaluation costs of duplicate WindowFuncs.
Ideally the fix would be to remove the deduplication code, but that
could result in changes to the plan costs, as duplicate WindowFuncs
would then be costed. Instead, let's play it safe and shift the
deduplication code so it runs after the other processing in
optimize_window_clauses().
Backpatch only as far as v16 as there doesn't seem to be any other harm
done by the WindowFunc deduplication code before then. This issue was
fixed in master by 7027dd499.
Reported-by: Meng Zhang <mza117jc@gmail.com>
Author: Meng Zhang <mza117jc@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Discussion: https://postgr.es/m/CAErYLFAuxmW0UVdgrz7iiuNrxGQnFK_OP9hBD5CUzRgjrVrz=Q@mail.gmail.com
Backpatch-through: 16
When executing a data-modifying CTE query containing MERGE and some
other DML operation on a table with statement-level AFTER triggers,
the transition tables passed to the triggers would fail to include the
rows affected by the MERGE.
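A hypothetical shape of the affected case, where an AFTER INSERT
statement trigger on tgt with a NEW TABLE transition table would miss
the rows inserted by the MERGE:
WITH ins AS (INSERT INTO tgt VALUES (1, 'x') RETURNING *)
MERGE INTO tgt t USING src s ON t.id = s.id
  WHEN NOT MATCHED THEN INSERT VALUES (s.id, s.val);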
The reason is that, when initializing a ModifyTable node for MERGE,
MakeTransitionCaptureState() would create a TransitionCaptureState
structure with a single "tcs_private" field pointing to an
AfterTriggersTableData structure with cmdType == CMD_MERGE. Tuples
captured there would then not be included in the sets of tuples
captured when executing INSERT/UPDATE/DELETE ModifyTable nodes in the
same query.
Since there are no MERGE triggers, we should only create
AfterTriggersTableData structures for INSERT/UPDATE/DELETE. Individual
MERGE actions should then use those, thereby sharing the same capture
tuplestores as any other DML commands executed in the same query.
This requires changing the TransitionCaptureState structure, replacing
"tcs_private" with 3 separate pointers to AfterTriggersTableData
structures, one for each of INSERT, UPDATE, and DELETE. Nominally,
this is an ABI break to a public structure in commands/trigger.h.
However, since this is a private field pointing to an opaque data
structure, the only way to create a valid TransitionCaptureState is by
calling MakeTransitionCaptureState(), and no extensions appear to be
doing that anyway, so it seems safe for back-patching.
Backpatch to v15, where MERGE was introduced.
Bug: #19380
Reported-by: Daniel Woelfel <dwwoelfel@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19380-4e293be2b4007248%40postgresql.org
Backpatch-through: 15
ExecInitModifyTable() unconditionally required a ctid junk column even
when the target was a partitioned table. This led to spurious "could
not find junk ctid column" errors when all children were excluded and
only the dummy root result relation remained.
A partitioned table only appears in the result relations list when all
leaf partitions have been pruned, leaving the dummy root as the sole
entry. Assert this invariant (nrels == 1) and skip the ctid requirement.
Also adjust ExecModifyTable() to tolerate invalid ri_RowIdAttNo for
partitioned tables, which is safe since no rows will be processed in
this case.
Bug: #19099
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Kirill Reshke <reshkekirill@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19099-e05dcfa022fe553d%40postgresql.org
Backpatch-through: 14
Commit f16241bef mistakenly supposed that INSERT...ON CONFLICT DO
UPDATE rejects partitioned target tables. (This may have been
accurate when the patch was written, but it was already obsolete
when committed.) Hence, there's an assertion that we can't see
ItemPointerIndicatesMovedPartitions() in that path, but the assertion
is triggerable.
Some other places throw error if they see a moved-across-partitions
tuple, but there seems no need for that here, because if we just retry
then we get the same behavior as in the update-within-partition case,
as demonstrated by the new isolation test. So fix by deleting the
faulty Assert. (The fact that this is the fix doubtless explains
why we've heard no field complaints: the behavior of a non-assert
build is fine.)
The TM_Deleted case contains a cargo-culted copy of the same Assert,
which I also deleted to avoid confusion, although I believe that one
is actually not triggerable.
Per our code coverage report, neither the TM_Updated nor the
TM_Deleted case was reached at all by existing tests, so this
patch adds tests for both.
Reported-by: Dmitry Koval <d.koval@postgrespro.ru>
Author: Joseph Koshakow <koshy44@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/f5fffe4b-11b2-4557-a864-3587ff9b4c36@postgrespro.ru
Backpatch-through: 14
With LLVM >= 17, transform passes are provided as a string to
LLVMRunPasses. Only two strings were used: "default<O3>" and
"default<O0>,mem2reg".
With previous LLVM versions, an additional inline pass was added when
JIT inlining was enabled without optimization. With LLVM >= 17, the code
would go through llvm_inline, prepare the functions for inlining, but
the generated bitcode would be the same due to the missing inline pass.
This patch restores the previous behavior by adding an inline pass when
inlining is enabled but no optimization is done.
This fixes an oversight introduced by 76200e5e when support for LLVM 17
was added.
Backpatch-through: 14
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Andreas Karlsson <andreas@proxel.se>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Pierre Ducroquet <p.psql@pinaraf.info>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/CAO6_XqrNjJnbn15ctPv7o4yEAT9fWa-dK15RSyun6QNw9YDtKg%40mail.gmail.com
We were using SnapshotAny to do some index checks, but that's wrong and
causes spurious errors when used on indexes created by CREATE INDEX
CONCURRENTLY. Fix it to use an MVCC snapshot, and add a test for it.
Backpatch of 6bd469d26aca to branches 14-16. I previously misidentified
the bug's origin: it came in with commit 7f563c09f890 (pg11-era, not
5ae2087202af as claimed previously), so all live branches are affected.
Also take the opportunity to fix some comments that we failed to update
in the original commits and apply pgperltidy. In branch 14, remove the
unnecessary test plan specification (which would have needed to be
changed anyway; cf. commit 549ec201d613.)
Diagnosed-by: Donghang Lin <donghanglin@gmail.com>
Author: Mihail Nikalayeu <mihailnikalayeu@gmail.com>
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Backpatch-through: 14
Discussion: https://postgr.es/m/CANtu0ojmVd27fEhfpST7RG2KZvwkX=dMyKUqg0KM87FkOSdz8Q@mail.gmail.com
This reverts d8aa21b74ff, which was added for the PG 18 release notes,
and adjusts the PG 18 release notes for this change. This is necessary
since the "xreflabel" affected other references to these chapters.
Reported-by: Robert Treat
Author: Robert Treat
Discussion: https://postgr.es/m/CABV9wwNEZDdp5QtrW5ut0H+MOf6U1PvrqBqmgSTgcixqk+Q73A@mail.gmail.com
Backpatch-through: 18
When IN/ANY clauses contain both constants and variable expressions, the
optimizer transforms them into separate structures: constants become
an array expression while variables become individual OR conditions.
This transformation was creating an overlap with the token locations,
causing pg_stat_statements query normalization to crash because it
could not calculate the number of bytes remaining to write for the
normalized query.
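For example, in a hypothetical query such as:
SELECT * FROM t WHERE a IN (1, 2, b, 3);
the constants are collapsed into an array expression while b becomes a
separate OR condition, and the recorded token locations overlapped.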
This commit disables squashing for mixed IN list expressions when
constructing a scalar array op, by setting list_start and list_end
to -1 when both variables and non-variables are present. Some
regression tests are added to PGSS to verify these patterns.
Author: Sami Imseih <samimseih@gmail.com>
Reviewed-by: Dmitry Dolgov <9erthalion6@gmail.com>
Discussion: https://postgr.es/m/CAA5RZ0ts9qiONnHjjHxPxtePs22GBo4d3jZ_s2BQC59AN7XbAA@mail.gmail.com
Backpatch-through: 18
When faced with a relation containing more than 1 physical segment
(i.e. >1GB, with normal settings), the previous code could compute a
truncation block length greater than RELSEG_SIZE, which could lead to
restore failures of this form:
file "%s" has truncation block length %u in excess of segment size %u
The fix is simply to clamp the maximum computed truncation_block_length
to RELSEG_SIZE. I have also added some comments to clarify the logic.
The test case was written by Oleg Tkachenko, but I have rewritten its
comments.
Reported-by: Oleg Tkachenko <oatkachenko@gmail.com>
Diagnosed-by: Oleg Tkachenko <oatkachenko@gmail.com>
Co-authored-by: Robert Haas <rhaas@postgresql.org>
Co-authored-by: Oleg Tkachenko <oatkachenko@gmail.com>
Reviewed-by: Amul Sul <sulamul@gmail.com>
Backpatch-through: 17
Discussion: http://postgr.es/m/00FEFC88-EA1D-4271-B38F-EB741733A84A@gmail.com
When checking a subquery's output expressions to see if it's safe to
push down an upper-level qual, check_output_expressions() previously
treated grouping Vars as opaque Vars. This implicitly assumed they
were stable and scalar.
However, a grouping Var's underlying expression corresponds to the
grouping clause, which may be volatile or set-returning. If an
upper-level qual references such an output column, pushing it down
into the subquery is unsafe. This can cause strange results due to
multiple evaluation of a volatile function, or introduce SRFs into
the subquery's WHERE/HAVING quals.
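A query of roughly this shape illustrates the hazard (illustrative
only):
SELECT r FROM (SELECT random() AS r FROM t GROUP BY random()) ss
  WHERE r > 0.5;
Pushing the qual into the subquery would evaluate random() again,
filtering on a different value than the one returned.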
This patch teaches check_output_expressions() to look through grouping
Vars to their underlying expressions. This ensures that any
volatility or set-returning properties in the grouping clause are
detected, preventing the unsafe pushdown.
We do not need to recursively examine the Vars contained in these
underlying expressions. Even if they reference outputs from
lower-level subqueries (at any depth), those references are guaranteed
not to expand to volatile or set-returning functions, because
subqueries containing such functions in their targetlists are never
pulled up.
Backpatch to v18, where this issue was introduced.
Reported-by: Eric Ridge <eebbrr@gmail.com>
Diagnosed-by: Tom Lane <tgl@sss.pgh.pa.us>
Author: Richard Guo <guofenglinux@gmail.com>
Discussion: https://postgr.es/m/7900964C-F99E-481E-BEE5-4338774CEB9F@gmail.com
Backpatch-through: 18
This is pretty pro-forma for our purposes, as the only change
is a historical correction for pre-1976 DST laws in
Baja California. (Upstream made this release mostly to update
their leap-second data, which we don't use.) But with minor
releases coming up, we should be up-to-date.
Backpatch-through: 14
The code adding the WAL information to a backup manifest cross-checks
it with the contents of the timeline history file of the end timeline.
When a check based on the end timeline failed, the error message
reported the value of the start timeline instead. This is fixed so that
the report shows the correct timeline number.
This error report would be confusing for users if seen, because it
provided incorrect information, so backpatch all the way down.
Oversight in 0d8c9c1210c4.
Author: Man Zeng <zengman@halodbtech.com>
Discussion: https://postgr.es/m/tencent_0F2949C4594556F672CF4658@qq.com
Backpatch-through: 14
The function is part of the injection_points test module and only used
in tests. None of the current tests call it with a NULL argument, but
it is supposed to work.
Backpatch-through: 17
Commit cbc127917e introduced tracking of unpruned relids to skip
processing of pruned partitions. PlannedStmt.unprunableRelids is
computed as the difference between PlannerGlobal.allRelids and
prunableRelids, but allRelids only contains RTE_RELATION entries.
This means non-relation RTEs (VALUES, subqueries, CTEs, etc.) are
never included in unprunableRelids, and consequently not in
es_unpruned_relids at runtime.
As a result, rowmarks attached to non-relation RTEs were incorrectly
skipped during executor initialization. This affects any DML statement
that has rowmarks on such RTEs, including MERGE with a VALUES or
subquery source, and UPDATE/DELETE with joins against subqueries or
CTEs. When a concurrent update triggers an EPQ recheck, the missing
rowmark leads to incorrect results.
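For instance, a statement of this shape (hypothetical schema) carries
a rowmark on the VALUES RTE:
MERGE INTO t USING (VALUES (1, 'x')) AS v(id, val) ON t.id = v.id
  WHEN MATCHED THEN UPDATE SET val = v.val;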
Fix by restricting the es_unpruned_relids membership check to
RTE_RELATION entries only, since partition pruning only applies to
actual relations. Rowmarks for other RTE kinds are now always
processed.
Bug: #19355
Reported-by: Bihua Wang <wangbihua.cn@gmail.com>
Diagnosed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/19355-57d7d52ea4980dc6@postgresql.org
Backpatch-through: 18
If a FATAL error occurs while holding a lock in a DSM segment (such
as a dshash lock) and the process is not in a transaction, a
segmentation fault can occur during process exit.
The problem sequence is:
1. Process acquires a lock in a DSM segment (e.g., via dshash)
2. FATAL error occurs outside transaction context
3. proc_exit() begins, calling before_shmem_exit callbacks
4. dsm_backend_shutdown() detaches all DSM segments
5. Later, on_shmem_exit callbacks run
6. ProcKill() calls LWLockReleaseAll()
7. Segfault: the lock being released is in unmapped memory
This only manifests outside transaction contexts because
AbortTransaction() calls LWLockReleaseAll() during transaction
abort, releasing locks before DSM cleanup. Background workers and
other non-transactional code paths are vulnerable.
Fix by calling LWLockReleaseAll() unconditionally at the start of
shmem_exit(), before any callbacks run. Releasing locks before
callbacks prevents the segfault - locks must be released before
dsm_backend_shutdown() detaches their memory. This is safe because
after an error, held locks are protecting potentially inconsistent
data anyway, and callbacks can acquire fresh locks if needed.
Also add a comment noting that LWLockReleaseAll() must be safe to
call before LWLock initialization (which it is, since
num_held_lwlocks will be 0), plus an Assert for the post-condition.
This fix aligns with the original design intent from commit
001a573a2, which noted that backends must clean up shared memory
state (including releasing lwlocks) before unmapping dynamic shared
memory segments.
Reported-by: Rahila Syed <rahilasyed90@gmail.com>
Author: Rahila Syed <rahilasyed90@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de>
Discussion: https://postgr.es/m/CAH2L28uSvyiosL+kaic9249jRVoQiQF6JOnaCitKFq=xiFzX3g@mail.gmail.com
Backpatch-through: 14
On restart, a replica can fail with an error like 'unexpected data
beyond EOF in block 200 of relation T/D/R'. These are the steps to
reproduce it:
- A relation has a size of 400 blocks.
- Blocks 201 to 400 are empty.
- Block 200 has two rows.
- Blocks 100 to 199 are empty.
- A restartpoint is done
- Vacuum truncates the relation to 200 blocks
- A FPW deletes a row in block 200
- A checkpoint is done
- A FPW deletes the last row in block 200
- Vacuum truncates the relation to 100 blocks
- The replica restarts
When the replica restarts:
- The relation on disk starts at 100 blocks, because all the
truncations were applied before restart.
- The first truncate to 200 blocks is replayed. It silently fails, but
it will still (incorrectly!) update the cached size to 200 blocks
- The first FPW on block 200 is applied. XLogReadBufferForRedo relies
on the cached size and incorrectly assumes that the page already
exists in the file, and thus won't extend the relation.
- The online checkpoint record is replayed, calling smgrdestroyall
which causes the cached size to be discarded
- The second FPW on block 200 is applied. This time, the detected size
is 100 blocks, an extend is attempted. However, the block 200 is
already present in the buffer cache due to the first FPW. This
triggers the 'unexpected data beyond EOF'.
To fix, update the cached size in SMgrRelation with the current size
rather than the requested new size, when the requested new size is
greater.
Author: Anthonin Bonnefoy <anthonin.bonnefoy@datadoghq.com>
Discussion: https://www.postgresql.org/message-id/CAO6_Xqrv-snNJNhbj1KjQmWiWHX3nYGDgAc=vxaZP3qc4g1Siw@mail.gmail.com
Backpatch-through: 14
We called io_uring_cqe_seen(..., cqe) before reading cqe->res. That allows the
completion to be reused, which in turn could lead to cqe->res being
overwritten. The window for that is very narrow and the likelihood of it
happening is very low, as we should never actually utilize all CQEs, but the
consequences would be bad.
This bug was reported to me privately.
Backpatch-through: 18
Discussion: https://postgr.es/m/bwo3e5lj2dgi2wzq4yvbyzu7nmwueczvvzioqsqo6azu6lm5oy@pbx75g2ach3p
If a multixid with zero offset is left behind after a crash, and that
multixid later becomes the oldest multixid, truncation might try to
look up its offset and read the zero value. In the worst case, we
might incorrectly use the zero offset to truncate valid SLRU segments
that are still needed. I'm not sure if that can happen in practice, or
if there are some other lower-level safeguards or incidental reasons
that prevent the caller from passing an unwritten multixid as the
oldest multi. But better safe than sorry, so let's add an explicit
check for it.
In stable branches, we should perhaps do the same check for
'oldestOffset', i.e. the offset of the old oldest multixid (in master,
'oldestOffset' is gone). But if the old oldest multixid has an invalid
offset, the damage has been done already, and we would never advance
past that point. It's not clear what we should do in that case. The
check that this commit adds will prevent such a multixid with an invalid
offset from becoming the oldest multixid in the first place, which
seems enough for now.
Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://www.postgresql.org/message-id/000301b2-5b81-4938-bdac-90f6eb660843@iki.fi
Backpatch-through: 14
The test 002_save_fullpage.pl, which checks --save-fullpage, fails with
wal_consistency_checking enabled because the block saved in the file has
the same LSN as the LSN used in the file name. The test required the
block LSN to be strictly lower than the file LSN. This commit relaxes
the check a bit, by allowing the LSNs to match.
While on it, the test name is reworded to include some information about
the file and block LSNs, which is useful for debugging.
Author: Andrey Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/4226AED7-E38F-419B-AAED-9BC853FB55DE@yandex-team.ru
Backpatch-through: 16
RangeTblEntry.groupexprs was marked with the node attribute
query_jumble_ignore, causing a list of GROUP BY expressions to be
ignored during the query jumbling. For example, these two queries could
be grouped together within the same query ID:
SELECT count(*) FROM t GROUP BY a;
SELECT count(*) FROM t GROUP BY b;
However, as such queries use different GROUP BY clauses, they should be
split across multiple entries.
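With the fix, the two statements above show up as separate entries,
which can be checked with, e.g.:
SELECT queryid, query FROM pg_stat_statements
  WHERE query LIKE '%GROUP BY%';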
This fixes an oversight in 247dea89f761, which introduced an RTE for
GROUP BY clauses. Query IDs are documented as being stable across minor
releases, but as this regression is new to v18 and we are still early
in its support cycle, a backpatch is exceptionally done: the oversight
broke a behavior that has existed since query jumbling was first
supported in core, with its introduction in pg_stat_statements.
The tests of pg_stat_statements are expanded to cover this area, with
patterns involving GROUP BY and GROUPING clauses.
Author: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/CACJufxEy2W+tCqC7XuJ94r3ivWsM=onKJp94kRFx3hoARjBeFQ@mail.gmail.com
Backpatch-through: 18
Since commit bd15b7db48, pg_dump uses pg_get_sequence_data() (née
pg_sequence_read_tuple()) to gather all sequence data in a single
query as opposed to a query per sequence. Two related bugs have
been identified:
* If the user lacks appropriate privileges on the sequence, pg_dump
generates a setval() command with garbage values instead of
failing as expected.
* pg_dump can fail due to a concurrently dropped sequence, even if
the dropped sequence's data isn't part of the dump.
This commit fixes the above issues by 1) teaching
pg_get_sequence_data() to return nulls instead of erroring for a
missing sequence and 2) teaching pg_dump to fail if it tries to
dump the data of a sequence for which pg_get_sequence_data()
returned nulls. Note that pg_dump may still fail due to a
concurrently dropped sequence, but it should now only do so when
the sequence data is part of the dump. This matches the behavior
before commit bd15b7db48.
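As an illustration of the function pg_dump relies on (hypothetical
sequence name; signature as of the commits above):
SELECT * FROM pg_get_sequence_data('myseq'::regclass);
pg_dump calls it with sequence OIDs collected earlier; when such an OID
no longer exists, the function now returns nulls instead of raising an
error.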
Bug: #19365
Reported-by: Paveł Tyślacki <pavel.tyslacki@gmail.com>
Suggested-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/19365-6245240d8b926327%40postgresql.org
Discussion: https://postgr.es/m/2885944.1767029161%40sss.pgh.pa.us
Backpatch-through: 18
When creating a partition for a RANGE partitioned table, errors about
converting the specified range values into constant values of the
partition key's type could report the name of a previous partition key
column when an earlier bound value was specified as MINVALUE or
MAXVALUE.
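For instance, with a hypothetical two-column key (exact error wording
may differ):
CREATE TABLE pt (a int, b int) PARTITION BY RANGE (a, b);
CREATE TABLE pt1 PARTITION OF pt
  FOR VALUES FROM (MINVALUE, 'oops') TO (10, 10);
The failure to convert 'oops' concerns the second key column b, but the
error message could previously name column a.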
This was caused by the code not correctly incrementing the index that
tracks which partition key the foreach loop was working on after
processing MINVALUE/MAXVALUE ranges.
Fix by using foreach_current_index() to ensure the index variable is
always set to the List element being worked on.
Author: myzhen <zhenmingyang@yeah.net>
Reviewed-by: zhibin wang <killerwzb@gmail.com>
Discussion: https://postgr.es/m/273cab52.978.19b96fc75e7.Coremail.zhenmingyang@yeah.net
Backpatch-through: 14
Commit c7eab0e97 changed the default password_encryption setting to
'scram-sha-256', so update the example for creating a user with an
assigned password.
In addition, commit 08951a7c9 added new options that in turn pass
default tokens NOBYPASSRLS and NOREPLICATION to the CREATE ROLE
command, so fix this omission as well for v16 and later.
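A sketch of the kind of example affected (role name illustrative):
CREATE ROLE joe NOSUPERUSER NOCREATEDB NOCREATEROLE INHERIT LOGIN
  NOREPLICATION NOBYPASSRLS;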
Reported-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://postgr.es/m/cff1ea60-c67d-4320-9e33-094637c2c4fb%40iki.fi
Backpatch-through: 14