To do this, we must include the wal_level in the first WAL record
covered by each summary file; so add wal_level to struct Checkpoint
and the payload of XLOG_CHECKPOINT_REDO and XLOG_END_OF_RECOVERY.
This, in turn, requires bumping XLOG_PAGE_MAGIC and, since the
Checkpoint is also stored in the control file, also
PG_CONTROL_VERSION. It's not great to do that so late in the release
cycle, but the alternative seems to ship v17 without robust
protections against this scenario, which could result in corrupted
incremental backups.
A side effect of this patch is that, when a server with
wal_level=replica is started with summarize_wal=on for the first time,
summarization will no longer begin with the oldest WAL that still
exists in pg_wal, but rather from the first checkpoint after that.
This change should be harmless, because a WAL summary for a partial
checkpoint cycle can never make an incremental backup possible when
it would otherwise not have been.
Report by Fujii Masao. Patch by me. Review and/or testing by Jakub
Wartak and Fujii Masao.
Discussion: http://postgr.es/m/6e30082e-041b-4e31-9633-95a66de76f5d@oss.nttdata.com
The current code can have pg_isready unexpectedly succeed if there is a
server running on the default port. To avoid this we delay running the
test until after a node has been created but before it starts, and then
use that node's port, so we are fairly sure there is nothing running on
the port.
Backpatch to all live branches.
The slot synchronization failed because the local slot's (created during
slot synchronization) catalog_xmin on standby is ahead of remote slot.
This happens because the INSERT before slot synchronization results in the
generation of a new xid that could be replicated to the standby. Now
before the xmin of the physical slot on the primary catches up via
hot_standby_feedback, the test has created a logical slot that got some
prior value of catalog_xmin.
To fix this we could try to ensure that the physical slot's catalog_xmin
is caught up to latest value before creating a logical slot but we took a
simpler path to move the INSERT after synchronizing the logical slot.
Reported-by: Alexander Lakhin as per buildfarm
Diagnosed-by: Amit Kapila, Hou Zhijie, Alexander Lakhin
Author: Hou Zhijie
Backpatch-through: 17
Discussion: https://postgr.es/m/bde6ac67-69cc-c104-5ab6-dd4f5deadf24@gmail.com
This reverts commit e9f15bc9. Instead of a hacky solution that didn't
work on Windows, we avoid trying to move the directory possibly across
drives, and instead remove it and recreate it in the new location.
Discussion: https://postgr.es/m/20240707070243.sb77kp4ubowauctz@awork3.anarazel.de
Backpatch to release 14 like the previous patch.
This acts as a revert of b83747a8a65b and 9886744a361b. As pointed out
by Noah, HEAD and REL_17_STABLE are in a weird state where the code
paths adding /D would limit the spawn of child processes, but we still
have code paths where the spawn of more than one child process would be
possible.
Let's remove these /D switches for now, to bring back the code into a
state consistent with how autorun is configured on a Windows host.
Reported-by: Noah Misch
Discussion: https://postgr.es/m/20240630021211.f3.nmisch@google.com
Backpatch-through: 17
The failed test was syncing failover replication slot to standby to test
that we remove such slots after the standby is converted to subscriber by
pg_createsubscriber.
In one of the buildfarm members, the sync of the slot failed because the
LSN on the standby was before the syncslot's LSN. We need to wait for
standby to catch up before trying to sync the slot with
pg_sync_replication_slots().
The other buildfarm failed because autovacuum generated a xid which is
replicated to the standby at some random point making slots at primary
lag behind standby during slot sync.
Both these failures wouldn't have occurred if we had used built-in
slotsync worker as it would have waited for the standby to sync with
primary but for this test, it is sufficient to use
pg_sync_replication_slots().
Reported-by: Alexander Lakhin as per buildfarm
Author: Kuroda Hayato
Reviewed-by: Amit Kapila
Backpatch-through: 17
Discussion: https://postgr.es/m/0dffca12-bf17-4a7a-334d-225569de5e6e@gmail.com
Discussion: https://postgr.es/m/OSBPR01MB25528300C71FDD83EA1DCA12F5DD2@OSBPR01MB2552.jpnprd01.prod.outlook.com
We don't need the pre-existing subscriptions on the newly formed
subscriber by using pg_createsubscriber. The apply workers corresponding
to these subscriptions can connect to other publisher nodes and either get
some unwarranted data or can lead to ERRORs in connecting to such nodes.
Author: Kuroda Hayato
Reviewed-by: Amit Kapila, Shlok Kyal, Vignesh C
Backpatch-through: 17
Discussion: https://postgr.es/m/OSBPR01MB25526A30A1FBF863ACCDDA3AF5C92@OSBPR01MB2552.jpnprd01.prod.outlook.com
Also omit backslashes (\) in the generated database names on Windows.
As before, perhaps we can revert this after updating affected
buildfarm animals.
Discussion: https://postgr.es/m/2509767.1719773880@sss.pgh.pa.us
This is required before the creation of a new branch. pgindent is
clean, as well as is reformat-dat-files.
perltidy version is v20230309, as documented in pgindent's README.
Don't include double-quotes (") in the generated database names
on Windows. Doing so tickles a bug in older versions of IPC::Run,
which fail to quote command line arguments correctly for that
platform. Possibly we can revert this after updating affected
buildfarm animals.
Discussion: https://postgr.es/m/2509767.1719773880@sss.pgh.pa.us
Introduces an environment variable PG_TEST_PG_COMBINEBACKUP_MODE, that
determines copy mode used by pg_combinebackup in TAP tests. Defaults to
"--copy" but may be set to "--clone" or "--copy-file-range" to use the
alternative stategies.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/48da4a1f-ccd9-4988-9622-24f37b1de2b4%40eisentraut.org
Introduces --copy as an alternative to --clone and --copy-file-range.
This option simply picks the default mode to copy files, as if none of
the options was specified. This makes pg_combinebackup options more
consistent with pg_upgrade, and it makes testing simpler.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/48da4a1f-ccd9-4988-9622-24f37b1de2b4%40eisentraut.org
The code for file cloning existed, but was not reachable as it relied on
constants from missing headers. Due to that, on Linux --clone always
failed with
error: file cloning not supported on this platform
Fixed by including the missing headers to relevant places. Adding the
headers revealed a couple compile errors in copy_file_clone(), so fix
those too.
Reported-by: Peter Eisentraut
Discussion: https://postgr.es/m/48da4a1f-ccd9-4988-9622-24f37b1de2b4%40eisentraut.org
pg_createsubscriber currently always sets up logical replication
with two-phase commit disabled. Improving that is not going to
happen for v17. In the meantime, document the deficiency, and
adjust pg_createsubscriber so that it will emit a warning if
the source installation has max_prepared_transactions > 0.
Hayato Kuroda (some mods by Amit Kapila and me), per complaint from
Noah Misch
Discussion: https://postgr.es/m/20240623062157.97.nmisch@google.com
The original coding here could fail with database names, user names,
etc that contain spaces or other special characters.
As partial test coverage, extend the 040_pg_createsubscriber.pl
test script so that it uses a generated database name containing
funny characters.
Hayato Kuroda (some mods by me), per complaint from Noah Misch
Discussion: https://postgr.es/m/20240623062157.97.nmisch@google.com
This covers both regular and inplace changes, since bugs arise at their
intersection. Where marked, these witness extant bugs. Back-patch to
v12 (all supported versions).
Reviewed (in an earlier version) by Robert Haas.
Discussion: https://postgr.es/m/20240512232923.aa.nmisch@google.com
Contrary to what the comment for the "check" struct member claims,
'pg_upgrade --check' performs only the checks and does not ask the
user for permission to make changes.
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/ZnHk7ci5IuTWVc_c%40nathan
It used to check if the replication slot exists and is active on
primary. This check might fail on slow hosts because the replication
slot might not be active at the time of this check.
The current code obtains the replication slot name from the
primary_slot_name on standby and assumes the replication slot exists
and is active on primary. If it doesn't exist, this tool will log an
error and continue.
Author: Euler Taveira <euler.taveira@enterprisedb.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://www.postgresql.org/message-id/776c5cac-5ef5-4001-b1bc-5b698bc0c62a%40app.fastmail.com
It used to check if the target server is connected to the primary
server (send required WAL) to rapidly react when the process won't
succeed. This code is not enough to guarantee that the recovery
process will complete. There is a window between the walreceiver
shutdown and the pg_is_in_recovery() returns false that can reach
NUM_CONN_ATTEMPTS attempts and fails.
Instead, rely only on the --recovery-timeout option to give up the
process after the specified number of seconds.
This should help with buildfarm failures on slow machines.
Author: Euler Taveira <euler.taveira@enterprisedb.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Discussion: https://www.postgresql.org/message-id/776c5cac-5ef5-4001-b1bc-5b698bc0c62a%40app.fastmail.com
Commit f5e4dedfa exposed libpq's internal function PQsocketPoll
without a lot of thought about whether that was an API we really
wanted to chisel in stone. The main problem with it is the use of
time_t to specify the timeout. While we do want an absolute time
so that a loop around PQsocketPoll doesn't have problems with
timeout slippage, time_t has only 1-second resolution. That's
already problematic for libpq's own internal usage --- for example,
pqConnectDBComplete has long had a kluge to treat "connect_timeout=1"
as 2 seconds so that it doesn't accidentally round to nearly zero.
And it's even less likely to be satisfactory for external callers.
Hence, let's change this while we still can.
The best idea seems to be to use an int64 count of microseconds since
the epoch --- basically the same thing as the backend's TimestampTz,
but let's use the standard Unix epoch (1970-01-01) since that's more
likely for clients to be easy to calculate. Millisecond resolution
would be plenty for foreseeable uses, but maybe the day will come that
we're glad we used microseconds.
Also, since time(2) isn't especially helpful for computing timeouts
defined this way, introduce a new function PQgetCurrentTimeUSec
to get the current time in this form.
Remove the hack in pqConnectDBComplete, so that "connect_timeout=1"
now means what you'd expect.
We can also remove the "#include <time.h>" that f5e4dedfa added to
libpq-fe.h, since there's no longer a need for time_t in that header.
It seems better for v17 not to enlarge libpq-fe.h's include footprint
from what it's historically been, anyway.
I also failed to resist the temptation to do some wordsmithing
on PQsocketPoll's documentation.
Patch by me, per complaint from Dominique Devienne.
Discussion: https://postgr.es/m/913559.1718055575@sss.pgh.pa.us
It wasn't in the documentation at all (even though we document all the
other debugging-like options). Also, change the --help output to show
that it exits after showing, similar to other options.
Files in common/ and fe_utils/ that contain translatable strings need
to be listed in the nls.mk files of the programs that use them. (Not
great, but that's the way it works for now.) This usually requires
some manual analysis which is done about once during each major
release beta period. This time, I wrote a hackish script that figures
some of this out more automatically, so this update is a bit larger as
it also includes some files that were missed in the past.
Make sure that function declarations use names that exactly match the
corresponding names from function definitions in a few places. These
inconsistencies were all introduced during Postgres 17 development.
pg_bsd_indent still has a couple of similar inconsistencies, which I
(pgeoghegan) have left untouched for now.
This commit was written with help from clang-tidy, by mechanically
applying the same rules as similar clean-up commits (the earliest such
commit was commit 035ce1fe).
After further review, we want to move in the direction of always
quoting GUC names in error messages, rather than the previous (PG16)
wildly mixed practice or the intermittent (mid-PG17) idea of doing
this depending on how possibly confusing the GUC name is.
This commit applies appropriate quotes to (almost?) all mentions of
GUC names in error messages. It partially supersedes a243569bf65 and
8d9978a7176, which had moved things a bit in the opposite direction
but which then were abandoned in a partial state.
Author: Peter Smith <smithpb2250@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CAHut%2BPv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w%40mail.gmail.com
This feature set did not handle empty ranges correctly, and it's now
too late for PostgreSQL 17 to fix it.
The following commits are reverted:
6db4598fcb8 Add stratnum GiST support function
46a0cd4cefb Add temporal PRIMARY KEY and UNIQUE constraints
86232a49a43 Fix comment on gist_stratnum_btree
030e10ff1a3 Rename pg_constraint.conwithoutoverlaps to conperiod
a88c800deb6 Use daterange and YMD in without_overlaps tests instead of tsrange.
5577a71fb0c Use half-open interval notation in without_overlaps tests
34768ee3616 Add temporal FOREIGN KEY contraints
482e108cd38 Add test for REPLICA IDENTITY with a temporal key
c3db1f30cba doc: clarify PERIOD and WITHOUT OVERLAPS in CREATE TABLE
144c2ce0cc7 Fix ON CONFLICT DO NOTHING/UPDATE for temporal indexes
Discussion: https://www.postgresql.org/message-id/d0b64a7a-dfe4-4b84-a906-c7dedfa40a3e@eisentraut.org
9a974cbcba00 moved the query in binary_upgrade_set_pg_class_oids to the
outer level, but left the PQclear and query buffer destruction in the
is_index conditional. 353708e1fb2d fixed the leak of the query buffer
but left the PGresult leak. This moves clearing the result to the outer
level ensuring that it will be called.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/374550C1-F4ED-4D9D-9498-0FD029CCF674@yesql.se
Backpatch-through: v15
Run pgindent, pgperltidy, and reformat-dat-files.
The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list. Perhaps we should formalize that,
or better find a way to get those typedefs into the automatic list.
pgperltidy is as opinionated as always, and reformat-dat-files too.
Commit d6607016c7 moved all the jsonapi.c error messages into
token_error(). This needs to be added to the various nls.mk files
that use this. Since that makes token_error() effectively a globally
known symbol, the name seems a bit too general, so rename to
json_token_error() for more clarity.
When --auth was added to initdb in commit e7029b212755 it had support
for auth options separated by space from the auth type, like:
--auth pam <servicename>
--auth ident sameuser
Passing an option to the ident auth type was removed in 01c1a12a5bb4
which left the pam auth-options support in place. 8a02339e9ba3 broke
this by inverting a calculation in the strncmp arguments, which went
unnoticed for a long time. The ability to pass options to the auth
type was never documented.
Rather than fixing the support for an undocumented feature which has
been broken for all supported versions, and which only supports one
out of many auth types which can take options, it is removed.
Reported-by: Jingxian Li <aqktjcm@qq.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://postgr.es/m/tencent_29731C7C7E6A2F9FB807C3A1DC3D81293C06@qq.com
There are some problems with the new way to handle these constraints
that were detected at the last minute, and require fixes that appear too
invasive to be doing this late in the cycle. Revert this (again) for
now, we'll try again with these problems fixed.
The following commits are reverted:
b0e96f311985 Catalog not-null constraints
9b581c534186 Disallow changing NO INHERIT status of a not-null constraint
d0ec2ddbe088 Fix not-null constraint test
ac22a9545ca9 Move privilege check to the right place
b0f7dd915bca Check stack depth in new recursive functions
3af721794272 Update information_schema definition for not-null constraints
c3709100be73 Fix propagating attnotnull in multiple inheritance
d9f686a72ee9 Fix restore of not-null constraints with inheritance
d72d32f52d26 Don't try to assign smart names to constraints
0cd711271d42 Better handle indirect constraint drops
13daa33fa5a6 Disallow NO INHERIT not-null constraints on partitioned tables
d45597f72fe5 Disallow direct change of NO INHERIT of not-null constraints
21ac38f498b3 Fix inconsistencies in error messages
Discussion: https://postgr.es/m/202405110940.joxlqcx4dogd@alvherre.pgsql
If -l was specified together with selective-restore options such as -n
or -N, dependent TOC entries such as comments would be omitted from
the listing, even when an actual restore would have selected them.
This happened because PrintTOCSummary neglected to update the te->reqs
marking of the entry they depended on.
Per report from Justin Pryzby. This has been wrong since 0d4e6ed30
taught _tocEntryRequired to sometimes look at the "reqs" marking of
other TOC entries, so back-patch to all supported branches.
Discussion: https://postgr.es/m/ZjoeirG7yxODdC4P@pryzbyj2023