This commit adds a new system view, pg_stat_subscription_workers, that
shows information about any errors that occur during the application of
logical replication changes as well as during initial table
synchronization. The subscription statistics entries are removed when the
corresponding subscription is removed.
It also adds an SQL function, pg_stat_reset_subscription_worker(), to
reset the errors of a single subscription worker.
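As a rough sketch (exact column and argument names may differ), the view
and the reset function could be used like this:
SELECT * FROM pg_stat_subscription_workers;
-- reset the entry for a single subscription (subscription name invented):
SELECT pg_stat_reset_subscription_worker(oid)
  FROM pg_subscription WHERE subname = 'mysub';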
The contents of this view can be used by an upcoming patch that skips the
particular transaction that conflicts with the existing data on the
subscriber.
This view can be extended in the future to track other xact-related
statistics, like the number of xacts committed/aborted for subscription
workers.
Author: Masahiko Sawada
Reviewed-by: Greg Nancarrow, Hou Zhijie, Tang Haiying, Vignesh C, Dilip Kumar, Takamichi Osumi, Amit Kapila
Discussion: https://postgr.es/m/CAD21AoDeScrsHhLyEPYqN3sydg6PxAPVBboK=30xJfUVihNZDA@mail.gmail.com
This reverts commits c2d1eea9e and 11b500072, as well as similar hacks
elsewhere, in favor of setting up the PGDLLIMPORT macro so that it can
just be used unconditionally. That can work because in frontend code,
we need no marking in either the defining or consuming files for a
variable exported from these libraries; and frontend code has no need
to access variables exported from the core backend, either.
While at it, write some actual documentation about the PGDLLIMPORT
and PGDLLEXPORT macros.
Patch by me, based on a suggestion from Robert Haas.
Discussion: https://postgr.es/m/1160385.1638165449@sss.pgh.pa.us
PGDLLIMPORT is only appropriate for variables declared in the backend,
not when the variable is coming from a library included in frontend code.
(This isn't a particularly nice fix, but for now, use the same method
employed elsewhere.)
Discussion: https://postgr.es/m/E1mrWUD-000235-Hq@gemulon.postgresql.org
Standardize on xoroshiro128** as our basic PRNG algorithm, eliminating
a bunch of platform dependencies as well as fundamentally-obsolete PRNG
code. In addition, this API replacement will ease replacing the
algorithm again in future, should that become necessary.
xoroshiro128** is a few percent slower than the drand48 family,
but it can produce full-width 64-bit random values, not just 48-bit ones,
and it should be much more trustworthy. It's likely to be noticeably
faster than the platform's random(), depending on which platform you
are thinking about; and we can have non-global state vectors easily,
unlike with random(). It is not cryptographically strong, but neither
are the functions it replaces.
Fabien Coelho, reviewed by Dean Rasheed, Aleksander Alekseev, and myself
Discussion: https://postgr.es/m/alpine.DEB.2.22.394.2105241211230.165418@pseudo
Surround the contents with a test that the feature is enabled by
configure, to silence header checking tools on systems without GSSAPI
installed.
Backpatch to 12, where the file appeared.
Discussion: https://postgr.es/m/202111161709.u3pbx5lxdimt@alvherre.pgsql
It's possible that a subplan below a Memoize node contains a parameter
from above the Memoize node. If this parameter changes then cache entries
may become out-dated due to the new parameter value.
Previously Memoize was mistakenly not aware of this. We fix this here by
flushing the cache whenever a parameter that's not part of the cache
key changes.
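For illustration, a hypothetical query shape (table and column names
invented) in which the correlated subquery below a memoized join refers
to an outer column that need not be part of the cache key:
SELECT * FROM t1 JOIN t2 ON t1.a = t2.a
WHERE t2.b = (SELECT max(t3.x) FROM t3 WHERE t3.y = t1.c);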
Bug: #17213
Reported by: Elvis Pranskevichus
Author: David Rowley
Discussion: https://postgr.es/m/17213-988ed34b225a2862@postgresql.org
Backpatch-through: 14, where Memoize was added
Memoize would always use the hash equality operator for the cache key
types to determine if the current set of parameters were the same as some
previously cached set. Certain types, such as floating point, where
-0.0 and +0.0 differ in their binary representation but are classed as
equal by the hash equality operator, may cause problems: unless the
join uses the same operator, the join operator actually in use may be
able to distinguish the two values, in which case we could accidentally
return incorrect rows from the cache.
To fix this, we add a binary mode to Memoize that compares the current
set of parameters to previously cached values bit-by-bit rather than
logically with the hash equality operator. This
binary mode is always used for LATERAL joins and it's used for normal
joins when any of the join operators are not hashable.
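The hazard is easy to see at the SQL level: the two zeroes are
logically equal but have distinct binary representations:
SELECT '-0.0'::float8 = '0.0'::float8;  -- true under the equality operator
SELECT '-0.0'::float8::text;            -- '-0'
SELECT '0.0'::float8::text;             -- '0'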
Reported-by: Tom Lane
Author: David Rowley
Discussion: https://postgr.es/m/3004308.1632952496@sss.pgh.pa.us
Backpatch-through: 14, where Memoize was added
This commit adds a set of functions able to look at the contents of
various paths related to replication slots:
- pg_ls_logicalsnapdir, for pg_logical/snapshots/
- pg_ls_logicalmapdir, for pg_logical/mappings/
- pg_ls_replslotdir, for pg_replslot/<slot_name>/
These are intended to be used by monitoring tools. Unlike pg_ls_dir(),
execution permission can be granted to non-superusers. Members of the
pg_monitor role have access to these functions.
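As a minimal sketch (the result columns are assumed to match the other
pg_ls_* functions, and the role name is invented):
GRANT EXECUTE ON FUNCTION pg_ls_logicalsnapdir() TO monitoring_user;
SELECT name, size, modification FROM pg_ls_logicalsnapdir();
SELECT name, size, modification FROM pg_ls_replslotdir('my_slot');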
Bump catalog version.
Author: Bharath Rupireddy
Reviewed-by: Nathan Bossart, Justin Pryzby
Discussion: https://postgr.es/m/CALj2ACWsfizZjMN6bzzdxOk1ADQQeSw8HhEjhmVXn_Pu+7VzLw@mail.gmail.com
While determining xid horizons, we skip over backends that are running
Vacuum. We also ignore Create Index Concurrently, or Reindex Concurrently
for the purposes of computing Xmin for Vacuum. But we were not setting the
flags corresponding to these operations when they are performed in
parallel, which was preventing the xid horizon from advancing.
The optimization of skipping Create Index Concurrently and Reindex
Concurrently operations was implemented in PG-14, but the fix for
Parallel Vacuum is the same, so back-patch it through PG-13.
Author: Masahiko Sawada
Reviewed-by: Amit Kapila
Backpatch-through: 13
Discussion: https://postgr.es/m/CAD21AoCLQqgM1sXh9BrDFq0uzd3RBFKi=Vfo6cjjKODm0Onr5w@mail.gmail.com
Up to now, you couldn't escape out of psql's \password command
by typing control-C (or other local spelling of SIGINT). This
is pretty user-unfriendly, so improve it. To do so, we have to
modify the functions provided by pg_get_line.c; but we don't
want to mess with psql's SIGINT handler setup, so provide an
API that lets that handler cause the cancel to occur.
This relies on the assumption that we won't do any major harm by
longjmp'ing out of fgets(). While that's obviously a little shaky,
we've long had the same assumption in the main input loop, and few
issues have been reported.
psql has some other simple_prompt() calls that could usefully
be improved the same way; for now, just deal with \password.
Nathan Bossart, minor tweaks by me
Discussion: https://postgr.es/m/747443.1635536754@sss.pgh.pa.us
This fills in some gaps in planner support for starts_with() and
the equivalent ^@ operator:
* A condition such as "textcol ^@ constant" can now use a regular
btree index, not only an SP-GiST index, so long as the index's
collation is C. (This works just like "textcol LIKE 'foo%'".)
* "starts_with(textcol, constant)" can be optimized the same as
"textcol ^@ constant".
* Fixed-prefix LIKE and regex patterns are now more like starts_with()
in another way: if you apply one to an SP-GiST-indexed column, you'll
get an index condition using ^@ rather than two index conditions with
>= and <.
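For example (hypothetical table; the index must use the C collation):
CREATE INDEX t_textcol_idx ON t (textcol COLLATE "C");
SELECT * FROM t WHERE textcol ^@ 'foo';             -- can use the btree index
SELECT * FROM t WHERE starts_with(textcol, 'foo');  -- optimized the same way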
Per a complaint from Shay Rojansky. Patch by me; thanks to
Nathan Bossart for review.
Discussion: https://postgr.es/m/232599.1633800229@sss.pgh.pa.us
Add a comment explaining why the pgstats accounting used during
opportunistic heap pruning operations (to maintain the current number of
dead tuples in the relation) needs to compensate by subtracting away the
number of new LP_DEAD items. This is needed so it can avoid completely
forgetting about tuples that become LP_DEAD items during pruning -- they
should still count.
It seems more natural to discuss this issue at the only relevant call
site (opportunistic pruning), since the same issue does not apply to the
only other caller (the VACUUM call site). Move everything there too.
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAH2-Wzm7f+A6ej650gi_ifTgbhsadVW5cujAL3punpupHff5Yg@mail.gmail.com
Previously, the archive_status directory was scanned for each file to
archive. When there are many status files, say because archive_command
has been failing for a long time, these directory scans can get very
slow. With this change, the archiver remembers several files to archive
during each directory scan, speeding things up.
To ensure timeline history files are archived as quickly as possible,
XLogArchiveNotify() forces the archiver to do a new directory scan as
soon as the .ready file for one is created.
Nathan Bossart, per a long discussion involving many people. It is
not clear to me exactly who out of all those people reviewed this
particular patch.
Discussion: http://postgr.es/m/CA+TgmobhAbs2yabTuTRkJTq_kkC80-+jw=pfpypdOJ7+gAbQbw@mail.gmail.com
Discussion: http://postgr.es/m/620F3CE1-0255-4D66-9D87-0EADE866985A@amazon.com
It's a coin toss which of these is a better default assumption.
However, of the machines we have in the buildfarm, the only ones
relying on the fallback socklen_t definition are ancient HPUX,
and on that platform unsigned int is the right choice. Minor
tweak to ee3a1a5b6.
Discussion: https://postgr.es/m/1440792.1636558888@sss.pgh.pa.us
This check was used to accommodate a staggering variety, in particular
in the type of the third argument of accept(). This is no longer of
concern on currently supported systems. We can just use socklen_t in
the code and put in a simple check that substitutes int for socklen_t
if it's missing, to cover the few stragglers.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/3538f4c4-1886-64f2-dcff-aaad8267fb82@enterprisedb.com
Commit 5a2832465f introduced some enums to represent all tables in schema
publications and used REL in their names. Use TABLE instead of REL in
those enums to avoid confusion with other objects like SEQUENCES that can
be part of a publication in the future.
In passing, (a) change one of the newly introduced error messages to
make it consistent for the Create and Alter commands, and (b) add a
missing alias in one of the SQL statements used to print publications
associated with the table.
Reported-by: Tomas Vondra, Peter Smith
Author: Vignesh C
Reviewed-by: Hou Zhijie, Peter Smith
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ%40mail.gmail.com
The server collects up to a bufferload of data whenever it reads data
from the client socket. When SSL or GSS encryption is requested
during startup, any additional data received with the initial
request message remained in the buffer, and would be treated as
already-decrypted data once the encryption handshake completed.
Thus, a man-in-the-middle with the ability to inject data into the
TCP connection could stuff some cleartext data into the start of
a supposedly encryption-protected database session.
This could be abused to send faked SQL commands to the server,
although that would only work if the server did not demand any
authentication data. (However, a server relying on SSL certificate
authentication might well not do so.)
To fix, throw a protocol-violation error if the internal buffer
is not empty after the encryption handshake.
Our thanks to Jacob Champion for reporting this problem.
Security: CVE-2021-23214
In v14, because we don't have a field in RestrictInfo to cache both the
left and right type's hash equality operator, we just restrict the scope
of Memoize to only when the left and right types of a RestrictInfo are the
same.
In master we add another field to RestrictInfo and cache both hash
equality operators.
Reported-by: Jaime Casanova
Author: David Rowley
Discussion: https://postgr.es/m/20210929185544.GB24346%40ahch-to
Backpatch-through: 14
StartupXLOG() still has ThisTimeLineID as a local variable, but the
remaining code in xlog.c now needs to get the relevant TimeLineID by some
other means. Mostly, this means that we now pass it as a function
parameter to a bunch of functions where we didn't previously.
However, a few cases require special handling:
- In functions that might be called by outside callers who
wouldn't necessarily know what timeline to specify, we get
the timeline ID from shared memory. XLogCtl->ThisTimeLineID
can be used in most cases since recovery is known to have
completed by the time those functions are called. In
xlog_redo(), we can use XLogCtl->replayEndTLI.
- XLogFileClose() needs to know the TLI of the open logfile.
Do that with a new global variable openLogTLI. While
someone could argue that this is just trading one global
variable for another, the new one has a far narrower purpose
and is referenced in just a few places.
- read_backup_label() now returns the TLI that it obtains
by parsing the backup_label file. Previously, ReadRecord()
could be called to parse the checkpoint record without
ThisTimeLineID having been initialized. Now, the timeline
is passed down, and I didn't want to pass an uninitialized
variable; this change lets us avoid that. The old coding
didn't seem to have any practical consequences that we need
to worry about, but this is cleaner.
- In BootstrapXLOG(), it's just a constant.
Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.
Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
All such code deals with this global variable in one of three ways.
Sometimes the same functions use it in more than one of these ways
at the same time.
First, sometimes it's an implicit argument to one or more functions
being called in xlog.c or elsewhere, and must be set to the
appropriate value before calling those functions lest they
misbehave. In those cases, it is now passed as an explicit argument
instead.
Second, sometimes it's used to obtain the current timeline after
the end of recovery, i.e. the timeline to which WAL is being
written and flushed. Such code now calls GetWALInsertionTimeLine()
or relies on the new out parameter added to GetFlushRecPtr().
Third, sometimes it's used during recovery to store the current
replay timeline. That can change, so such code must generally
update the value before each use. It can still do that, but must
now use a local variable instead.
The net effect of these changes is to considerably reduce the
amount of code that directly accesses this global variable.
That's good, because history has shown that we don't always think
clearly about which timeline ID it's supposed to contain at any
given point in time, or indeed, whether it has been or needs to
be initialized at any given point in the code.
Patch by me, reviewed and tested by Michael Paquier, Amul Sul, and
Álvaro Herrera.
Discussion: https://postgr.es/m/CA+TgmobfAAqhfWa1kaFBBFvX+5CjM=7TE=n4r4Q1o2bjbGYBpA@mail.gmail.com
The base backup code has accumulated a healthy number of new
features over the years, but it's becoming increasingly difficult
to maintain and further enhance that code because there's no
real separation of concerns. For example, the code that
knows the details of how we send data to the client
using the libpq protocol is scattered throughout basebackup.c,
rather than being centralized in one place.
To try to improve this situation, introduce a new 'bbsink' object
which acts as a recipient for archives generated during the base
backup process and also for the backup manifest. This commit
introduces three types of bbsink: a 'copytblspc' bbsink forwards the
backup to the client using one COPY OUT operation per tablespace and
another for the manifest, a 'progress' bbsink performs command
progress reporting, and a 'throttle' bbsink performs rate-limiting.
The 'progress' and 'throttle' bbsink types also forward the data to a
successor bbsink; at present, the last bbsink in the chain will
always be of type 'copytblspc'. There are plans to add more types
of 'bbsink' in future commits.
This abstraction is a bit leaky in the case of progress reporting,
but this still seems cleaner than what we had before.
Patch by me, reviewed and tested by Andres Freund, Sumanta Mukherjee,
Dilip Kumar, Suraj Kharage, Dipesh Pandit, Tushar Ahuja, Mark Dilger,
and Jeevan Ladhe.
Discussion: https://postgr.es/m/CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Discussion: https://postgr.es/m/CA+TgmoZvqk7UuzxsX1xjJRmMGkqoUGYTZLDCH8SmU1xTPr1Xig@mail.gmail.com
Add hardening to the heapam index tuple deletion path to catch TIDs in
index pages that point to a heap item that index tuples should never
point to. The corruption we're trying to catch here is particularly
tricky to detect, since it typically involves "extra" (corrupt) index
tuples, as opposed to the absence of required index tuples in the index.
For example, a heap TID from an index page that turns out to point to an
LP_UNUSED item in the heap page has a good chance of being caught by one
of the new checks. There is a decent chance that the recently fixed
parallel VACUUM bug (see commit 9bacec15) would have been caught had
that particular check been in place for Postgres 14. No backpatch of
this extra hardening for now, though.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAH2-Wzk-4_raTzawWGaiqNvkpwDXxv3y1AQhQyUeHfkU=tFCeA@mail.gmail.com
Commit b4af70cb, which simplified state managed by VACUUM, performed
refactoring of parallel VACUUM in passing. Confusion about the exact
details of the tasks that the leader process is responsible for led to
code that made it possible for parallel VACUUM to miss a subset of the
table's indexes entirely. Specifically, indexes that fell under the
min_parallel_index_scan_size size cutoff were missed. These indexes are
supposed to be vacuumed by the leader (alongside any parallel unsafe
indexes), but weren't vacuumed at all. Affected indexes could easily
end up with duplicate heap TIDs, once heap TIDs were recycled for new
heap tuples. This had generic symptoms that might be seen with almost
any index corruption involving structural inconsistencies between an
index and its table.
To fix, make sure that the parallel VACUUM leader process performs any
required index vacuuming for indexes that happen to be below the size
cutoff. Also document the design of parallel VACUUM with these
below-size-cutoff indexes.
It's unclear how many users might be affected by this bug. There had to
be at least three indexes on the table to hit the bug: a smaller index,
plus at least two additional indexes that themselves exceed the size
cutoff. Cases with just one additional index would not run into
trouble, since the parallel VACUUM cost model requires two
larger-than-cutoff indexes on the table to apply any parallel
processing. Note also that autovacuum was not affected, since it never
uses parallel processing.
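A hedged sketch of an affected shape (names invented; whether an index
exceeds min_parallel_index_scan_size depends on its actual size):
CREATE TABLE t (a int, b int, c int);
CREATE INDEX t_small ON t (a);  -- below the size cutoff
CREATE INDEX t_big1 ON t (b);   -- above the cutoff
CREATE INDEX t_big2 ON t (c);   -- above the cutoff
VACUUM (PARALLEL 2) t;          -- before the fix, t_small could be skipped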
Test case based on tests from a larger patch to test parallel VACUUM by
Masahiko Sawada.
Many thanks to Kamigishi Rei for her invaluable help with tracking this
problem down.
Author: Peter Geoghegan <pg@bowt.ie>
Author: Masahiko Sawada <sawada.mshk@gmail.com>
Reported-By: Kamigishi Rei <iijima.yun@koumakan.jp>
Reported-By: Andrew Gierth <andrew@tao11.riddles.org.uk>
Diagnosed-By: Andres Freund <andres@anarazel.de>
Bug: #17245
Discussion: https://postgr.es/m/17245-ddf06aaf85735f36@postgresql.org
Discussion: https://postgr.es/m/20211030023740.qbnsl2xaoh2grq3d@alap3.anarazel.de
Backpatch: 14-, where the refactoring commit appears.
As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive
list_delete_first() operations, since that would be expensive when
there are many files waiting to be unlinked. This is a slightly
larger change than in those cases. We have to keep the list state
valid for calls to AbsorbSyncRequests(), so it's necessary to invent a
"canceled" field instead of immediately deleting PendingUnlinkEntry
entries. Also, because we might not be able to process all the
entries, we need a new list primitive list_delete_first_n().
list_delete_first_n() is almost list_copy_tail(), but it modifies the
input List instead of making a new copy. I found a couple of existing
uses of the latter that could profitably use the new function. (There
might be more, but the other callers look like they probably shouldn't
overwrite the input List.)
As before, back-patch to v13.
Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com
Commit 0bead9af484c introduced XLOG_INCLUDE_XID flag to indicate that the
WAL record contains subXID-to-topXID association. It uses that flag later
to mark in CurrentTransactionState that top-xid is logged so that we
should not try to log it again with the next WAL record in the current
subtransaction. However, we can use a localized variable to pass that
information.
In passing, change the related function and variable names to make them
consistent with what the code is actually doing.
Author: Dilip Kumar
Reviewed-by: Alvaro Herrera, Amit Kapila
Discussion: https://postgr.es/m/E1mSoYz-0007Fh-D9@gemulon.postgresql.org
A new option "FOR ALL TABLES IN SCHEMA" in Create/Alter Publication allows
one or more schemas to be specified, whose tables are selected by the
publisher for sending the data to the subscriber.
The new syntax allows specifying both the tables and schemas. For example:
CREATE PUBLICATION pub1 FOR TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
OR
ALTER PUBLICATION pub1 ADD TABLE t1,t2,t3, ALL TABLES IN SCHEMA s1,s2;
A new system table "pg_publication_namespace" has been added, to maintain
the schemas that the user wants to publish through the publication.
Modified the output plugin (pgoutput) to publish the changes if the
relation is part of a schema publication.
Updates pg_dump to identify and dump schema publications, and updates
the \d family of commands to display schema publications; the \dRp+
variant will now display the associated schemas, if any.
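As a hedged sketch, the new mapping can also be inspected directly
(column details may differ):
SELECT * FROM pg_publication_namespace;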
Author: Vignesh C, Hou Zhijie, Amit Kapila
Syntax-Suggested-by: Tom Lane, Alvaro Herrera
Reviewed-by: Greg Nancarrow, Masahiko Sawada, Hou Zhijie, Amit Kapila, Haiying Tang, Ajin Cherian, Rahila Syed, Bharath Rupireddy, Mark Dilger
Tested-by: Haiying Tang
Discussion: https://www.postgresql.org/message-id/CALDaNm0OANxuJ6RXqwZsM1MSY4s19nuH3734j4a72etDwvBETQ@mail.gmail.com
Remove superuser check, allowing any user granted permissions on
pg_log_backend_memory_contexts() to log the memory contexts of any
backend.
Note that this could allow a privileged non-superuser to log the
memory contexts of a superuser backend, but as discussed, that does
not seem to be a problem.
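As a minimal sketch (role name invented):
GRANT EXECUTE ON FUNCTION pg_log_backend_memory_contexts(integer)
  TO monitoring_user;
-- then, connected as monitoring_user:
SELECT pg_log_backend_memory_contexts(pg_backend_pid());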
Reviewed-by: Nathan Bossart, Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Andres Freund
Discussion: https://postgr.es/m/e5cf6684d17c8d1ef4904ae248605ccd6da03e72.camel@j-davis.com
Users sometimes get concerned when they start the server and it
emits a few messages and then doesn't emit any more messages for
a long time. Generally, what's happening is either that the
system is taking a long time to apply WAL, or it's taking a
long time to reset unlogged relations, or it's taking a long
time to fsync the data directory, but it's not easy to tell
which is the case.
To fix that, add a new 'log_startup_progress_interval' setting,
by default 10s. When an operation that is known to be potentially
long-running takes more than this amount of time, we'll log a
status update each time this interval elapses.
To avoid undesirable log chatter, don't log anything about WAL
replay when in standby mode.
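A minimal sketch, setting the interval explicitly ('10s' is the default
per the above):
ALTER SYSTEM SET log_startup_progress_interval = '10s';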
Nitin Jadhav and Robert Haas, reviewed by Amul Sul, Bharath
Rupireddy, Justin Pryzby, Michael Paquier, and Álvaro Herrera.
Discussion: https://postgr.es/m/CA+TgmoaHQrgDFOBwgY16XCoMtXxsrVGFB2jNCvb7-ubuEe1MGg@mail.gmail.com
Discussion: https://postgr.es/m/CAMm1aWaHF7VE69572_OLQ+MgpT5RUiUDgF1x5RrtkJBLdpRj3Q@mail.gmail.com
The command is supported for physical slots for now, and returns the
type of slot, its restart_lsn and its restart_tli.
This will be useful for an upcoming patch related to pg_receivewal, to
allow the tool to stream from the position of a slot, rather than the
last WAL position flushed by the backend (as reported by
IDENTIFY_SYSTEM) if the archive directory is found to be empty. That
would be an advantage when switching to a different archive location
with the same slot, to avoid holes in the WAL segment archives.
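A hedged sketch, issued over a replication connection (slot name
invented):
READ_REPLICATION_SLOT my_physical_slot;
-- returns the slot's type, restart_lsn and restart_tli, per the above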
Author: Ronan Dunklau
Reviewed-by: Kyotaro Horiguchi, Michael Paquier, Bharath Rupireddy
Discussion: https://postgr.es/m/18708360.4lzOvYHigE@aivenronan
The purpose of commit 8a54e12a38d1545d249f1402f66c8cde2837d97c was to
fix this, and it sufficed when the PREPARE TRANSACTION completed before
the CIC looked for lock conflicts. Otherwise, things still broke. As
before, in a cluster having used CIC while having enabled prepared
transactions, queries that use the resulting index can silently fail to
find rows. It may be necessary to reindex to recover from past
occurrences; REINDEX CONCURRENTLY suffices. Fix this for future index
builds by making CIC wait for arbitrarily-recent prepared transactions
and for ordinary transactions that may yet PREPARE TRANSACTION. As part
of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC
before it calls ProcArrayClearTransaction(). Back-patch to 9.6 (all
supported versions).
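For clusters that may have been affected in the past, the recovery
described above looks like this (index name invented):
REINDEX INDEX CONCURRENTLY some_affected_index;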
Andrey Borodin, reviewed (in earlier versions) by Andres Freund.
Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).
Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.
Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
The code does not expect sh_error() to return, but the patch
that made this header usable in frontend didn't get that memo.
While here, plaster unlikely() on the tests that decide whether
to invoke sh_error(), and add our standard copyright notice.
Noted by Andres Freund. Back-patch to v13 where this frontend
support came in.
Discussion: https://postgr.es/m/0D54435C-1199-4361-9D74-2FBDCF8EA164@anarazel.de
I just got burnt by trying to use pg_malloc instead of pg_malloc0
with this. Save the next hacker some time by not leaving this
API detail undocumented.
All the tape functions, like LogicalTapeRead and LogicalTapeWrite, now
take a LogicalTape as argument, instead of LogicalTapeSet+tape number.
You can create any number of LogicalTapes in a single LogicalTapeSet, and
you don't need to decide the number upfront, when you create the tape set.
This makes the tape management in hash agg spilling in nodeAgg.c simpler.
Discussion: https://www.postgresql.org/message-id/420a0ec7-602c-d406-1e75-1ef7ddc58d83%40iki.fi
Reviewed-by: Peter Geoghegan, Zhihong Yu, John Naylor
During a replication slot creation, an ERROR generated in the same
transaction as the one creating a to-be-exported snapshot would have
left the backend in an inconsistent state, as the associated static
export snapshot state was not being reset on transaction abort, but only
on the follow-up command received by the WAL sender that created this
snapshot on replication slot creation. This would trigger inconsistency
failures if this session tried to export a snapshot again, such as
during the creation of another replication slot.
Note that a snapshot export cannot happen in a transaction block, so
there is no need to worry about resetting this state for subtransaction
aborts. Also, this inconsistent state would be very unlikely to show up
for users. For example, one case where this could happen is an
out-of-memory error when building the initial snapshot to-be-exported.
Dilip found this problem while poking at a different patch that caused
an error in this code path for reasons unrelated to HEAD.
Author: Dilip Kumar
Reviewed-by: Michael Paquier, Zhihong Yu
Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com
Backpatch-through: 9.6
Do not update shm_mq's mq_bytes_written until we have written
an amount of data greater than 1/4th of the ring size, unless
the caller of shm_mq_send(v) requests a flush at the end of
the message. This reduces the number of calls to SetLatch(),
and also the number of CPU cache misses, considerably, and thus
makes shm_mq significantly faster.
Dilip Kumar, reviewed by Zhihong Yu and Tomas Vondra. Some
minor cosmetic changes by me.
Discussion: http://postgr.es/m/CAFiTN-tVXqn_OG7tHNeSkBbN+iiCZTiQ83uakax43y1sQb2OBA@mail.gmail.com
windows.h includes a lot of other headers, slowing down compilation
significantly. WIN32_LEAN_AND_MEAN reduces that a bit. It'd be better to
remove the include of windows.h (as well as indirect inclusions of it) from such
a central place, but until then...
Discussion: https://postgr.es/m/20210921193035.pqzay43vpyv7in43@alap3.anarazel.de
Prior to v14, we insisted that the query in RETURN QUERY be of a type
that returns tuples. (For instance, INSERT RETURNING was allowed,
but not plain INSERT.) That happened indirectly because we opened a
cursor for the query, so spi.c checked SPI_is_cursor_plan(). As a
consequence, the error message wasn't terribly on-point, but at least
it was there.
Commit 2f48ede08 lost this detail. Instead, plain RETURN QUERY
insisted that the query be a SELECT (by checking for SPI_OK_SELECT)
while RETURN QUERY EXECUTE failed to check the query type at all.
Neither of these changes was intended.
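A minimal sketch of the rule being enforced again (names invented):
CREATE FUNCTION add_row() RETURNS SETOF int LANGUAGE plpgsql AS $$
BEGIN
  RETURN QUERY INSERT INTO t VALUES (1) RETURNING a;  -- returns tuples: OK
  -- RETURN QUERY INSERT INTO t VALUES (2);           -- no RETURNING: error
END $$;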
The only convenient place to check this in the EXECUTE case is inside
_SPI_execute_plan, because we haven't done parse analysis until then.
So we need to pass down a flag saying whether to enforce that the
query returns tuples. Fortunately, we can squeeze another boolean
into struct SPIExecuteOptions without an ABI break, since there's
padding space there. (It's unlikely that any extensions would
already be using this new struct, but preserving ABI in v14 seems
like a smart idea anyway.)
Within spi.c, it seemed like _SPI_execute_plan's parameter list
was already ridiculously long, and I didn't want to make it longer.
So I thought of passing SPIExecuteOptions down as-is, allowing that
parameter list to become much shorter. This makes the patch a bit
more invasive than it might otherwise be, but it's all internal to
spi.c, so that seems fine.
Per report from Marc Bachmann. Back-patch to v14 where the
faulty code came in.
Discussion: https://postgr.es/m/1F2F75F0-27DF-406F-848D-8B50C7EEF06A@gmail.com
Commit 84f5c2908 forgot to consider the possibility that
EnsurePortalSnapshotExists could run inside a subtransaction with
lifespan shorter than the Portal's. In that case, the new active
snapshot would be popped at the end of the subtransaction, leaving
a dangling pointer in the Portal, with mayhem ensuing.
To fix, make sure the ActiveSnapshot stack entry is marked with
the same subtransaction nesting level as the associated Portal.
It's certainly safe to do so since we won't be here at all unless
the stack is empty; hence we can't create an out-of-order stack.
Let's also apply this logic in the case where PortalRunUtility
sets portalSnapshot, just to be sure that path can't cause similar
problems. It's slightly less clear that that path can't create
an out-of-order stack, so add an assertion guarding it.
Report and patch by Bertrand Drouvot (with kibitzing by me).
Back-patch to v11, like the previous commit.
Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com