The current setup assumes that commands for lz4, zstd and gzip always
exist by default if not enforced by a user's environment. However,
vcpkg, as one example, installs libraries but no binaries, so assuming
by default that a command is always present would cause failures. This
commit improves the detection of such external commands as follows:
* If an ENV value is available, trust the environment/user and use it.
* If an ENV value is not available, check for the command in the
current PATH by running a simple "$command --version" (which should be
portable enough).
** On execution failure, ignore ENV{command}.
** On execution success, set ENV{command} = "$command".
Note that this new rule applies to gzip, lz4 and zstd, but not to tar,
which we assume will always exist. Those commands are set up in the
environment only when using bincheck and taptest. The CI includes all
those commands and I have checked that their setup is correct there. I
have also tested this change in an MSVC environment where we have none
of those commands.
While at it, remove the references to lz4 from the documentation and
vcregress.pl in ~v13. --with-lz4 was added in v14, so there is no
point in keeping this information in these older branches.
Reported-by: Andrew Dunstan
Reviewed-by: Andrew Dunstan
Discussion: https://postgr.es/m/14402151-376b-a57a-6d0c-10ad12608e12@dunslane.net
Backpatch-through: 10
Crash recovery on standby may encounter missing directories when
replaying create database WAL records. Prior to this patch, the standby
would fail to recover in such a case. However, the directories could be
legitimately missing. Consider a sequence of WAL records as follows:
CREATE DATABASE
DROP DATABASE
DROP TABLESPACE
If, after replaying the last WAL record and removing the tablespace
directory, the standby crashes and has to replay the create database
record again, the crash recovery must be able to move on.
This patch adds a mechanism similar to invalid-page tracking, to keep a
tally of missing directories during crash recovery. If all the missing
directory references are matched with corresponding drop records at the
end of crash recovery, the standby can safely continue following the
primary.
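For illustration, here is a minimal sketch of the idea, modeled on the
invalid-page table in xlogutils.c; the structure and function names are
hypothetical and do not come from the actual patch:

#include "postgres.h"

#include "nodes/pg_list.h"

/* directories found missing while replaying CREATE DATABASE records */
static List *missing_dirs = NIL;

/* remember a directory that redo expected to exist but did not find */
static void
remember_missing_directory(const char *path)
{
    missing_dirs = lappend(missing_dirs, pstrdup(path));
    elog(DEBUG1, "directory \"%s\" missing during crash recovery", path);
}

/* forget it when a matching DROP DATABASE/TABLESPACE record is replayed */
static void
forget_missing_directory(const char *path)
{
    ListCell   *lc;

    foreach(lc, missing_dirs)
    {
        if (strcmp((const char *) lfirst(lc), path) == 0)
        {
            missing_dirs = foreach_delete_current(missing_dirs, lc);
            break;
        }
    }
}

/* at the end of crash recovery, anything still unmatched is fatal */
static void
check_missing_directories(void)
{
    if (missing_dirs != NIL)
        elog(PANIC, "directories were missing and never dropped during recovery");
}
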
Backpatch to 13, at least for now. The bug is older, but fixing it in
older branches requires more careful study of the interactions with
commit e6d8069522c8, which appeared in 13.
A new TAP test file is added to verify the condition. However, because
it depends on commit d6d317dbf615, it can only be added to branch
master. I (Álvaro) manually verified that the code behaves as expected
in branch 14. It's a bit nervous-making to leave the code uncovered by
tests in older branches, but leaving the bug unfixed is even worse.
Also, the main reason this fix took so long is precisely that we
couldn't agree on a good strategy to approach testing for the bug, so
perhaps this is the best we can do.
Diagnosed-by: Paul Guo <paulguo@gmail.com>
Author: Paul Guo <paulguo@gmail.com>
Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Author: Asim R Praveen <apraveen@pivotal.io>
Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com
GCC 12 complains that set_stack_base is storing the address of
a local variable in a long-lived pointer. This is an entirely
reasonable warning (indeed, it just helped us find a bug);
but that behavior is intentional here. We can work around it
by using __builtin_frame_address(0) instead of a specific local
variable; that produces an address a dozen or so bytes different,
in my testing, but we don't care about such a small difference.
Maybe someday a compiler lacking that function will start to issue
a similar warning, but we'll worry about that when it happens.
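As a minimal, self-contained sketch of the workaround (not the actual
set_stack_base() code; names here are illustrative):

#include <stdio.h>

static char *stack_base_ptr = NULL;

static void
set_stack_base_sketch(void)
{
#if defined(__GNUC__) || defined(__clang__)
    /* address of the current frame: close enough to the real stack base */
    stack_base_ptr = (char *) __builtin_frame_address(0);
#else
    /* the old way: the address of a local, which GCC 12 complains about */
    char        local;

    stack_base_ptr = &local;
#endif
}

int
main(void)
{
    set_stack_base_sketch();
    printf("approximate stack base: %p\n", (void *) stack_base_ptr);
    return 0;
}
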
Patch by me, per a suggestion from Andres Freund. Back-patch to
v12, which is as far back as the patch will go without some pain.
(Recently-established project policy would permit a back-patch as
far as 9.2, but I'm disinclined to expend the work until GCC 12
is much more widespread.)
Discussion: https://postgr.es/m/3773792.1645141467@sss.pgh.pa.us
Commit 6a2a70a02 supposed that any platform having <sys/epoll.h>
would also have <sys/signalfd.h>. It turns out there are still a
few people using platforms where that's not so, so we'd better make
a separate configure probe for it. But since it took this long to
notice, I'm content with the decision to not have a separate code
path for epoll-only machines; we'll just fall back to using poll()
for these stragglers.
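For reference, a simplified sketch of the resulting selection logic;
the WAIT_USE_* names follow latch.c, the rest is illustrative:

#include <stdio.h>

/* epoll is chosen only when configure found both headers */
#if defined(HAVE_SYS_EPOLL_H) && defined(HAVE_SYS_SIGNALFD_H)
#define WAIT_USE_EPOLL
#else
#define WAIT_USE_POLL
#endif

int
main(void)
{
#ifdef WAIT_USE_EPOLL
    puts("using epoll with signalfd");
#else
    puts("falling back to poll()");
#endif
    return 0;
}
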
Per gripe from Gabriela Serventi. Back-patch to v14 where this
code came in.
Discussion: https://postgr.es/m/CAHOHWE-JjJDfcYuLAAEO7Jk07atFAU47z8TzHzg71gbC0aMy=g@mail.gmail.com
This reverts commits ab27df2, af8d530 and 3a0cced, which introduced
pg_cryptohash_error(). In order to make the core code able to pass down
the new error types that this introduced, some of the MD5-related
routines had to be reworked, causing an ABI breakage, but we found that
some external extensions rely on them. Maintaining compatibility
outweighs the error report benefits, so just revert the change in v14.
Reported-by: Laurenz Albe
Discussion: https://postgr.es/m/9f0c0a96d28cf14fc87296bbe67061c14eb53ae8.camel@cybertec.at
The existing cryptohash facility was causing problems in some code paths
related to MD5 (frontend and backend) that relied on the fact that the
only type of error that could happen would be an OOM, as the MD5
implementation used in PostgreSQL ~13 (the in-core implementation was
used whether or not the build included OpenSSL in those older versions)
could fail only under that circumstance.
The new cryptohash facilities can fail for reasons other than OOMs, like
attempting MD5 when FIPS is enabled (upstream OpenSSL allows that up to
1.0.2; Fedora and Photon patch OpenSSL 1.1.1 to allow that), so
incorrect error reports could show up.
This commit extends the cryptohash APIs so that callers of those
routines can fetch more context when an error happens, by using a new
routine called pg_cryptohash_error(). The error states are stored
within each implementation's internal context data, so that the logic
can be extended depending on what suits each implementation. The
default implementation requires few error states, but OpenSSL can
report various issues depending on its internal state, so more is
needed in cryptohash_openssl.c, and the code is shaped so that we are
always able to grab the necessary information.
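As an illustration of the caller-side pattern this enables (a
simplified sketch, not code lifted from the tree):

#include "postgres.h"

#include "common/cryptohash.h"
#include "common/md5.h"

static void
compute_md5(const uint8 *data, size_t len, uint8 *digest)
{
    pg_cryptohash_ctx *ctx = pg_cryptohash_create(PG_MD5);

    if (ctx == NULL)
        elog(ERROR, "out of memory");

    if (pg_cryptohash_init(ctx) < 0 ||
        pg_cryptohash_update(ctx, data, len) < 0 ||
        pg_cryptohash_final(ctx, digest, MD5_DIGEST_LENGTH) < 0)
        elog(ERROR, "could not compute MD5 hash: %s",
             pg_cryptohash_error(ctx));

    pg_cryptohash_free(ctx);
}
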
The core code is changed to adapt to the new error routine, painting
more "const" across the call stack where the static errors are stored,
particularly in authentication code paths on variables that provide
log details. This way, any future changes would warn if attempting to
free these strings. The MD5 authentication code was also a bit blurry
about the handling of "logdetail" (LOG sent to the postmaster), so
improve the related comments while at it.
The origin of the problem is 87ae969, which introduced the centralized
cryptohash facility. Extra changes are done for pgcrypto in v14 for the
non-OpenSSL code path to cope with the improvements done by this
commit.
Reported-by: Michael Mühlbeyer
Author: Michael Paquier
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/89B7F072-5BBE-4C92-903E-D83E865D9367@trivadis.com
Backpatch-through: 14
Instead of using a hardcoded or default path to the perl file that the
.bat file is a wrapper for, use a path relative to the .bat file so
that the perl file is found in the same directory as the .bat file.
Patch by Anton Voloshin, slightly tweaked by me.
Backpatch to all live branches.
Discussion: https://postgr.es/m/2b7a674b-5fb0-d264-75ef-ecc7a31e54f8@postgrespro.ru
edc2332 has introduced in vcregress.pl some control over the environment
variables LZ4, TAR and GZIP_PROGRAM to allow the TAP tests to use those
commands. This makes the settings more consistent with
src/Makefile.global.in, as the same defaults get used for Make and MSVC
builds.
Each parameter can be changed in buildenv.pl, but as a default gets
assigned after loading buildenv.pl, it is not possible to unset any of
these, and using an empty value would not work with "||=" either. As
some environments may not have a compatible command in their PATH (tar
coming from MinGW is an issue, for one), this could break tests without
an exit path to bypass any failing test. This commit changes things so
that the default values for LZ4, TAR and GZIP_PROGRAM are assigned
before loading buildenv.pl, not after. This way, we keep the same
compatibility as a GNU build with the same defaults, and it becomes
possible to unset any of those values.
While at it, add some documentation about those three variables in the
section dedicated to the TAP tests for MSVC.
Per discussion with Andrew Dunstan.
Discussion: https://postgr.es/m/YbGYe483803il3X7@paquier.xyz
Backpatch-through: 10
Certain settings from configuration or the Makefile infrastructure are
used by the TAP tests, but were not being set up by vcregress.pl. This
remedies those omissions. This should increase test coverage, especially
on the buildfarm.
Reviewed by Noah Misch
Discussion: https://postgr.es/m/17093da5-e40d-8335-d53a-2bd803fc38b0@dunslane.net
Backpatch to all live branches.
Documentation and any code paths related to VS are updated to keep the
whole thing consistent. Similarly to 2017 and 2019, the version of VS
and the version of nmake that we use to determine which code paths to
use for the build are still inconsistent in their own way.
Backpatch down to 10, so that buildfarm members are able to use this
new version of Visual Studio on all the supported stable branches.
Author: Hans Buschmann
Discussion: https://postgr.es/m/1633101364685.39218@nidsa.net
Backpatch-through: 10
CIC and REINDEX CONCURRENTLY assume backends see their catalog changes
no later than each backend's next transaction start. That failed to
hold when a backend absorbed a relevant invalidation in the middle of
running RelationBuildDesc() on the CIC index. Queries that use the
resulting index can silently fail to find rows. Fix this for future
index builds by making RelationBuildDesc() loop until it finishes
without accepting a relevant invalidation. It may be necessary to
reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices.
Back-patch to 9.6 (all supported versions).
Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres
Freund.
Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com
For non-MSVC builds this is make's $(CURDIR), while for MSVC builds it
is $topdir/$Config/$module. The directory is added as the second element
in the PATH, so that the install location takes precedence, but the
added PATH element takes precedence over the rest of the PATH.
The reason for this is to allow tests to find built products that are
not installed, such as the libpq_pipeline test driver.
The libpq_pipeline test is adjusted to take advantage of this.
Based on a suggestion from Andres Freund.
Backpatch to release 14.
Discussion: https://postgr.es/m/4941f5a5-2d50-1a0e-6701-14c5fefe92d6@dunslane.net
The build scripts of Visual Studio would fail to properly detect a
3.0.0 build, as the check on the second digit was failing. This is
adjusted where needed, allowing the builds to complete. Note that the
MSIs of OpenSSL mentioned in the documentation have not changed any
library names for Win32 and Win64, making this change straightforward.
Reported-by: htalaco, via github
Reviewed-by: Daniel Gustafsson
Discussion: https://postgr.es/m/YW5XKYkq6k7OtrFq@paquier.xyz
Backpatch-through: 9.6
Physical replication always ships WAL segment files to replicas once
they are complete. This is a problem if one WAL record is split across
a segment boundary and the primary server crashes before writing down
the segment with the next portion of the WAL record: WAL writing after
crash recovery would happily resume at the point where the broken record
started, overwriting that record ... but any standby or backup may have
already received a copy of that segment, and they are not rewinding.
This causes standbys to stop following the primary after the latter
crashes:
LOG: invalid contrecord length 7262 at A8/D9FFFBC8
because the standby is still trying to read the continuation record
(contrecord) for the original long WAL record, but it is not there and
it will never be. A workaround is to stop the replica, delete the WAL
file, and restart it -- at which point a fresh copy is brought over from
the primary. But that's pretty labor intensive, and I bet many users
would just give up and re-clone the standby instead.
A fix for this problem was already attempted in commit 515e3d84a0b5,
but it only addressed the WAL archiving scenario, so streaming
replication would still be a problem (as well as other things such as
taking a filesystem-level backup while the server is down after having
crashed), and it had performance scalability problems too; so it had to
be reverted.
This commit fixes the problem using an approach suggested by Andres
Freund, whereby the initial portion(s) of the split-up WAL record are
kept, and a special type of WAL record is written where the contrecord
was lost, so that WAL replay in the replica knows to skip the broken
parts. With this approach, we can continue to stream/archive segment
files as soon as they are complete, and replay of the broken records
will proceed across the crash point without a hitch.
Because a new type of WAL record is added, users should be careful to
upgrade standbys first, primaries later. Otherwise they risk the standby
being unable to start if the primary happens to write such a record.
A new TAP test that exercises this is added, but its portability is
yet to be seen.
This has been wrong since the introduction of physical replication, so
backpatch all the way back. In stable branches, keep the new
XLogReaderState members at the end of the struct, to avoid an ABI
break.
Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: Nathan Bossart <bossartn@amazon.com>
Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql
Session statistics, as introduced by 960869da08, had several shortcomings:
- an additional GetCurrentTimestamp() call that also impaired the accuracy of
the data collected
This can be avoided by passing the current timestamp we already have in
pgstat_report_stat().
- an additional statistics UDP packet sent every 500ms
This is solved by adding the new statistics to PgStat_MsgTabstat.
This is conceptually ugly, because session statistics are not
table statistics. But the struct already contains data unrelated
to tables, so there is not much damage done.
Connections and disconnections are reported in separate messages, which
limits the overhead to two additional messages per session plus a
slight increase in PgStat_MsgTabstat size (but the same number of table
stats still fit).
- Session time computation could overflow on systems where long is 32 bit.
Reported-By: Andres Freund <andres@anarazel.de>
Author: Andres Freund <andres@anarazel.de>
Author: Laurenz Albe <laurenz.albe@cybertec.at>
Discussion: https://postgr.es/m/20210801205501.nyxzxoelqoo4x2qc%40alap3.anarazel.de
Backpatch: 14-, where the feature was introduced.
This is a combined revert of the following commits:
- c3826f8, a refactoring piece that moved the hex decoding code to
src/common/. This code was cleaned up by aef8948, as it originally
included no overflow checks, unlike the base64 routines in src/common/
used by SCRAM, making it unsafe for its purpose.
- aef8948, a more advanced refactoring of the hex encoding/decoding code
to src/common/ that added sanity checks on the result buffer for hex
decoding and encoding. As reported by Hans Buschmann, those overflow
checks are expensive, and it is possible to see a performance drop in
the decoding/encoding of bytea or LOs the longer they are. Simple SQL
queries working on large bytea values show a clear difference in the
performance profile.
- ccf4e27, a cleanup made possible by aef8948.
Reverting all those commits brings the performance of hex decoding and
encoding back to what it was in ~13. For now and post-beta3, this is
the simplest option.
Reported-by: Hans Buschmann
Discussion: https://postgr.es/m/1629039545467.80333@nidsa.net
Backpatch-through: 14
Autoconf's AC_CHECK_DECLS() always defines HAVE_DECL_whatever
as 1 or 0, but some of the entries in msvc/Solution.pm showed
such symbols as "undef" instead of 0. Fix that for consistency.
There's no live bug in current usages AFAICS, but it's not hard
to imagine one creeping in if more-complex #if tests get added.
Back-patch to v13, which is as far back as Solution.pm contains
this data. The inconsistency still exists in the manually-filled
pg_config_ext.h.win32 files of older branches; but as long as the
problem is only latent, it doesn't seem worth the trouble to
clean things up there.
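To illustrate why the value matters (a standalone example, not
PostgreSQL code): the convention is to test the value with #if, which
treats an undefined symbol as 0, but an #ifdef test distinguishes
"defined as 0" from "undefined":

#include <stdio.h>

#define HAVE_DECL_FOO 0     /* what autoconf emits when the declaration is missing */

int
main(void)
{
#if HAVE_DECL_FOO
    puts("declaration present");
#else
    puts("declaration missing");
#endif

#ifdef HAVE_DECL_FOO
    puts("#ifdef still sees the symbol even though its value is 0");
#endif
    return 0;
}
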
Discussion: https://postgr.es/m/3185430.1626133592@sss.pgh.pa.us
"Result Cache" was never a great name for this node, but nobody managed
to come up with another name that anyone liked enough. That was until
David Johnston mentioned "Node Memoization", which Tom Lane revised to
just "Memoize". People seem to like "Memoize", so let's do the rename.
Reviewed-by: Justin Pryzby
Discussion: https://postgr.es/m/20210708165145.GG1176@momjian.us
Backpatch-through: 14, where Result Cache was introduced
Apple's mechanism for dealing with functions that are available
in only some OS versions confuses AC_CHECK_FUNCS, and therefore
AC_REPLACE_FUNCS. We can use AC_CHECK_DECLS instead, so long as
we enable -Werror=unguarded-availability-new. This allows people
compiling for macOS to control whether or not preadv/pwritev are
used by setting MACOSX_DEPLOYMENT_TARGET, rather than supplying
a back-rev SDK. (Of course, the latter still works, too.)
James Hilliard
Discussion: https://postgr.es/m/20210122193230.25295-1-james.hilliard1@gmail.com
The separate libldap_r is gone and libldap itself is now always
thread-safe. Unfortunately there seems no easy way to tell by
inspection whether libldap is thread-safe, so we have to take
it on faith that libldap is thread-safe if there's no libldap_r.
That should be okay, as it appears that libldap_r was a standard
part of the installation going back at least 20 years.
Report and patch by Adrian Ho. Back-patch to all supported
branches, since people might try to build any of them with
a newer OpenLDAP.
Discussion: https://postgr.es/m/17083-a19190d9591946a7@postgresql.org
The version string is grabbed from PACKAGE_VERSION in pg_config.h in the
MSVC build since 8f4fb4c6, but an error message still referenced a
variable that existed before that change. This had no consequences
unless one messes up the version number of the build badly enough.
Author: Anton Voloshin
Discussion: https://postgr.es/m/af79ee1b-9962-b299-98e1-f90a289e19e6@postgrespro.ru
Backpatch-through: 13
Add a .git-blame-ignore-revs file with a list of pgindent, pgperltidy,
and reformat-dat-files commit hashes. Postgres hackers who configure
git to use the ignore file will get git-blame output that avoids
attributing line changes to the ignored indent commits. This makes
git-blame output much easier to work with in practice.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/CAH2-Wz=cVh3GHTP6SdLU-Gnmt2zRdF8vZkcrFdSzXQ=WhbWm9Q@mail.gmail.com
The deliverables of upstream Kerberos on Windows are installed with
paths that do not match our MSVC scripts. First, the include folder was
named "inc/" in our scripts, but the upstream MSIs use "include/".
Second, the build would fail in 64-bit environments, as the libraries
are named differently.
This commit adjusts the MSVC scripts to be compatible with the latest
installations of upstream, and I have checked that the compilation was
able to work with the 32-bit and 64-bit installations.
Special thanks to Kondo Yuta for the help in investigating the situation
in hamerkop, which had an incorrect configuration for the GSS
compilation.
Reported-by: Brian Ye
Discussion: https://postgr.es/m/162128202219.27274.12616756784952017465@wrigleys.postgresql.org
Backpatch-through: 9.6
Recently we refactored things so that pg_regress makes the
"testtablespace" subdirectory used by the core regression tests,
instead of doing that in the makefiles. That had the undesirable
side effect of making such a subdirectory in every directory that
has "input" or "output" test files. Since these subdirectories
remain empty, git doesn't complain about them, but nonetheless
they're clutter.
To fix, invent an explicit --make-testtablespace-dir switch,
so that pg_regress only makes the subdirectory when explicitly
told to.
Discussion: https://postgr.es/m/2854388.1621284789@sss.pgh.pa.us
Also "make reformat-dat-files".
The only change worthy of note is that pgindent messed up the formatting
of launcher.c's struct LogicalRepWorkerId, which led me to notice that
that struct wasn't used at all anymore, so I just took it out.
I copied the existing spelling of "--max_connections", but
that's just wrong :-(. Evidently setting $ENV{MAX_CONNECTIONS}
has never actually worked in this script. Given the lack of
complaints, it's probably not worth back-patching a fix.
Per buildfarm.
Discussion: https://postgr.es/m/899209.1620759506@sss.pgh.pa.us
Having to maintain two lists of regression test scripts has proven
annoyingly error-prone. We can achieve the effect of the
serial_schedule by running the parallel_schedule with
"--max_connections=1"; so do that and remove serial_schedule.
This causes cosmetic differences in the progress output, but it
doesn't seem worth restructuring pg_regress to avoid that.
Discussion: https://postgr.es/m/899209.1620759506@sss.pgh.pa.us
Since its introduction in bbe0a81, compression of table data supports
LZ4, but nothing had been done within the MSVC scripts to allow users to
build the code with this library.
This commit closes the gap by extending the MSVC scripts to be able to
build optionally with LZ4. Getting libraries that can be used for
compilation and execution is possible, as LZ4 can be compiled down to
MSVC 2010 using its source tarball. MinGW may require extra effort to
work, and I have been able to test this only with MSVC; still, this is
better than nothing, giving users a way to test the feature on Windows.
Author: Dilip Kumar
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/YJPdNeF68XpwDDki@paquier.xyz
This set of commits has some bugs with known fixes, but at this late
stage in the release cycle it seems best to revert and resubmit next
time, along with some new automated test coverage for this whole area.
Commits reverted:
dc88460c: Doc: Review for "Optionally prefetch referenced data in recovery."
1d257577: Optionally prefetch referenced data in recovery.
f003d9f8: Add circular WAL decoding buffer.
323cbe7c: Remove read_page callback from XLogReader.
Remove the new GUC group WAL_RECOVERY recently added by a55a9847, as the
corresponding section of config.sgml is now reverted.
Discussion: https://postgr.es/m/CAOuzzgrn7iKnFRsB4MHp3UisEQAGgZMbk_ViTN4HV4-Ksq8zCg%40mail.gmail.com
Design problems were discovered in the handling of composite types and
record types that would cause some relevant versions not to be recorded.
Misgivings were also expressed about the use of the pg_depend catalog
for this purpose. We're out of time for this release so we'll revert
and try again.
Commits reverted:
1bf946bd: Doc: Document known problem with Windows collation versions.
cf002008: Remove no-longer-relevant test case.
ef387bed: Fix bogus collation-version-recording logic.
0fb0a050: Hide internal error for pg_collation_actual_version(<bad OID>).
ff942057: Suppress "warning: variable 'collcollate' set but not used".
d50e3b1f: Fix assertion in collation version lookup.
f24b1569: Rethink extraction of collation dependencies.
257836a7: Track collation versions for indexes.
cd6f479e: Add pg_depend.refobjversion.
7d1297df: Remove pg_collation.collversion.
Discussion: https://postgr.es/m/CA%2BhUKGLhj5t1fcjqAu8iD9B3ixJtsTNqyCCD4V0aTO9kAKAjjA%40mail.gmail.com
Update checklist to reflect current practice:
* The platform-specific FAQ files are long gone.
* We've never routinely updated the libbind code we borrowed, either,
and there seems no reason to start now.
* Explain current practice of running pgindent twice per cycle.
Discussion: https://postgr.es/m/4038398.1620238684@sss.pgh.pa.us
Previously, we used an array of size max_replication_slots to store
stats for replication slots. But that had two problems in the cases
where a message for dropping a slot gets lost: 1) the stats for a new
slot are not recorded if the array is full, and 2) we could write
beyond the end of the array if the user reduces max_replication_slots.
This commit uses HTAB for replication slot statistics, resolving both
problems. Now, pgstat_vacuum_stat() searches for all the dead
replication slots in the stats hashtable and tells the collector to
remove them. To avoid showing the stats for already-dropped slots, the
pg_stat_replication_slots view looks up slot stats by the slot name
taken from pg_replication_slots.
Also, we now send a message at slot creation to initialize the stats.
This reduces the possibility that the stats are accumulated into the
old slot stats when a message for dropping a slot gets lost.
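As a rough sketch of the approach (illustrative names, not the actual
pgstat code), the stats live in a dynahash table keyed by slot name:

#include "postgres.h"

#include "utils/hsearch.h"

typedef struct SlotStatsEntry
{
    char        slotname[NAMEDATALEN];  /* hash key */
    int64       spill_txns;             /* ... and the other counters ... */
} SlotStatsEntry;

static HTAB *slot_stats = NULL;

static SlotStatsEntry *
get_slot_stats_entry(const char *slotname, bool create)
{
    bool        found;

    if (slot_stats == NULL)
    {
        HASHCTL     ctl;

        memset(&ctl, 0, sizeof(ctl));
        ctl.keysize = NAMEDATALEN;
        ctl.entrysize = sizeof(SlotStatsEntry);
        slot_stats = hash_create("Replication slot statistics", 32, &ctl,
                                 HASH_ELEM | HASH_STRINGS);
    }

    /* dropped slots are removed with HASH_REMOVE when the drop message arrives */
    return (SlotStatsEntry *) hash_search(slot_stats, slotname,
                                          create ? HASH_ENTER : HASH_FIND,
                                          &found);
}
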
Reported-by: Andres Freund
Author: Sawada Masahiko, test case by Vignesh C
Reviewed-by: Amit Kapila, Vignesh C, Dilip Kumar
Discussion: https://postgr.es/m/20210319185247.ldebgpdaxsowiflw@alap3.anarazel.de
This commit introduces a new foreign data wrapper API for TRUNCATE.
It extends the TRUNCATE command so that it accepts foreign tables as
targets to truncate and invokes that API. It also extends postgres_fdw
so that it can issue TRUNCATE commands to foreign servers, by adding a
new routine for that TRUNCATE API.
The information about the options specified in the TRUNCATE command,
e.g., ONLY, CASCADE, etc., is passed to the FDW via the API. The list
of foreign tables to truncate is also passed to the FDW. The FDW
truncates the foreign data sources that the passed foreign tables
specify, based on that information. For example, postgres_fdw
constructs a TRUNCATE command using them and issues it to the foreign
server.
For performance, the TRUNCATE command invokes the FDW routine for
TRUNCATE once per foreign server that the foreign tables to truncate
belong to.
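For illustration, here is a hedged sketch of an FDW filling in the new
callback; the callback name and arguments follow ExecForeignTruncate as
released in fdwapi.h, while everything else (function names, the body)
is a placeholder:

#include "postgres.h"

#include "fmgr.h"
#include "foreign/fdwapi.h"
#include "nodes/parsenodes.h"
#include "utils/rel.h"

PG_MODULE_MAGIC;

PG_FUNCTION_INFO_V1(myfdw_handler);

static void
myfdw_ExecForeignTruncate(List *rels, DropBehavior behavior,
                          bool restart_seqs)
{
    ListCell   *lc;

    /* build one remote TRUNCATE statement covering all listed relations */
    foreach(lc, rels)
    {
        Relation    rel = (Relation) lfirst(lc);

        elog(DEBUG1, "would truncate remote table for \"%s\"",
             RelationGetRelationName(rel));
    }
    /* behavior (RESTRICT/CASCADE) and restart_seqs map to remote options */
}

Datum
myfdw_handler(PG_FUNCTION_ARGS)
{
    FdwRoutine *routine = makeNode(FdwRoutine);

    routine->ExecForeignTruncate = myfdw_ExecForeignTruncate;
    PG_RETURN_POINTER(routine);
}
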
Author: Kazutaka Onishi, Kohei KaiGai, slightly modified by Fujii Masao
Reviewed-by: Bharath Rupireddy, Michael Paquier, Zhihong Yu, Alvaro Herrera, Stephen Frost, Ashutosh Bapat, Amit Langote, Daniel Gustafsson, Ibrar Ahmed, Fujii Masao
Discussion: https://postgr.es/m/CAOP8fzb_gkReLput7OvOK+8NHgw-RKqNv59vem7=524krQTcWA@mail.gmail.com
Discussion: https://postgr.es/m/CAJuF6cMWDDqU-vn_knZgma+2GMaout68YUgn1uyDnexRhqqM5Q@mail.gmail.com
Introduce a new GUC recovery_prefetch, disabled by default. When
enabled, look ahead in the WAL and try to initiate asynchronous reading
of referenced data blocks that are not yet cached in our buffer pool.
For now, this is done with posix_fadvise(), which has several caveats.
Better mechanisms will follow in later work on the I/O subsystem.
The GUC maintenance_io_concurrency is used to limit the number of
concurrent I/Os we allow ourselves to initiate, based on pessimistic
heuristics used to infer that I/Os have begun and completed.
The GUC wal_decode_buffer_size is used to limit the maximum distance we
are prepared to read ahead in the WAL to find uncached blocks.
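For reference, the kind of hint posix_fadvise() provides (a standalone
illustration with a made-up file path):

#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
    /* hypothetical data file; recovery would target a referenced block's file */
    int         fd = open("base/1/1259", O_RDONLY);
    int         rc;

    if (fd < 0)
    {
        perror("open");
        return 1;
    }

    /*
     * Ask the kernel to start reading the first 8kB block in the
     * background.  There is no completion notification, which is why the
     * prefetcher relies on pessimistic heuristics to guess when the I/O
     * has completed.
     */
    rc = posix_fadvise(fd, 0, 8192, POSIX_FADV_WILLNEED);
    if (rc != 0)
        fprintf(stderr, "posix_fadvise: %s\n", strerror(rc));

    close(fd);
    return 0;
}
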
Reviewed-by: Alvaro Herrera <alvherre@2ndquadrant.com> (parts)
Reviewed-by: Andres Freund <andres@anarazel.de> (parts)
Reviewed-by: Tomas Vondra <tomas.vondra@2ndquadrant.com> (parts)
Tested-by: Tomas Vondra <tomas.vondra@2ndquadrant.com>
Tested-by: Jakub Wartak <Jakub.Wartak@tomtom.com>
Tested-by: Dmitry Dolgov <9erthalion6@gmail.com>
Tested-by: Sait Talha Nisanci <Sait.Nisanci@microsoft.com>
Discussion: https://postgr.es/m/CA%2BhUKGJ4VJN8ttxScUFM8dOKX0BrBiboo5uz1cq%3DAovOddfHpA%40mail.gmail.com
First, don't perform database access while holding a buffer lock.
When checking a heap, we can validate that TOAST pointers are sane by
performing a scan on the TOAST index and looking up the chunks that
correspond to each value ID that appears in a TOAST pointer in the main
table. But, to do that while holding a buffer lock at least risks
causing other backends to wait uninterruptibly, and probably can cause
undetected and uninterruptible deadlocks. So, instead, make a list of
checks to perform while holding the lock, and then perform the checks
after releasing it.
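A hedged sketch of that deferral pattern (hypothetical names, not the
actual amcheck code):

#include "postgres.h"

#include "nodes/pg_list.h"

typedef struct DeferredToastCheck
{
    Oid         valueid;        /* TOAST value ID found in the main tuple */
    int32       expected_size;  /* what the TOAST pointer claims */
} DeferredToastCheck;

static List *deferred_checks = NIL;

/* called with the heap buffer lock held: cheap, no index or catalog access */
static void
remember_toast_check(Oid valueid, int32 expected_size)
{
    DeferredToastCheck *check = palloc(sizeof(DeferredToastCheck));

    check->valueid = valueid;
    check->expected_size = expected_size;
    deferred_checks = lappend(deferred_checks, check);
}

/* called after the buffer lock is released: safe to scan the TOAST index */
static void
run_deferred_toast_checks(void)
{
    ListCell   *lc;

    foreach(lc, deferred_checks)
    {
        DeferredToastCheck *check = (DeferredToastCheck *) lfirst(lc);

        /* ... look up check->valueid in the TOAST index and report ... */
        (void) check;
    }
    list_free_deep(deferred_checks);
    deferred_checks = NIL;
}
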
Second, adjust things so that we don't try to follow TOAST pointers
for tuples that are already eligible to be pruned. The TOAST tuples
become eligible for pruning at the same time that the main tuple does,
so trying to check them may lead to spurious reports of corruption,
as observed in the buildfarm. The necessary infrastructure to decide
whether or not the tuple being checked is prunable was added by
commit 3b6c1259f9ca8e21860aaf24ec6735a8e5598ea0, but it wasn't
actually used for its intended purpose prior to this patch.
Mark Dilger, adjusted by me to avoid a memory leak.
Discussion: http://postgr.es/m/AC5479E4-6321-473D-AC92-5EC36299FBC2@enterprisedb.com
Retry the call to heap_prune_page() in rare cases where there is
disagreement between the heap_prune_page() call and the call to
HeapTupleSatisfiesVacuum() that immediately follows. Disagreement is
possible when a concurrently-aborted transaction makes a tuple DEAD
during the tiny window between each step. This was the only case where
a tuple considered DEAD by VACUUM still had storage following pruning.
VACUUM's definition of dead tuples is now uniformly simple and
unambiguous: dead tuples from each page are always LP_DEAD line pointers
that were encountered just after we performed pruning (and just before
we considered freezing remaining items with tuple storage).
Eliminating the tupgone=true special case enables INDEX_CLEANUP=off
style skipping of index vacuuming that takes place based on flexible,
dynamic criteria. The INDEX_CLEANUP=off case had to know about skipping
indexes up-front before now, due to a subtle interaction with the
special case (see commit dd695979) -- this was a special case unto
itself. Now there are no special cases. And so now it won't matter
when or how we decide to skip index vacuuming: it won't affect how
pruning behaves, and it won't be affected by any of the implementation
details of pruning or freezing.
Also remove XLOG_HEAP2_CLEANUP_INFO records. These are no longer
necessary because we now rely entirely on heap pruning taking care of
recovery conflicts. There is no longer any need to generate recovery
conflicts for DEAD tuples that pruning just missed. This also means
that heap vacuuming now uses exactly the same strategy for recovery
conflicts as index vacuuming always has: REDO routines never need to
process a latestRemovedXid from the WAL record, since earlier REDO of
the WAL record from pruning is sufficient in all cases. The generic
XLOG_HEAP2_CLEAN record type is now split into two new record types to
reflect this new division (these are called XLOG_HEAP2_PRUNE and
XLOG_HEAP2_VACUUM).
Also stop acquiring a super-exclusive lock for heap pages when they're
vacuumed during VACUUM's second heap pass. A regular exclusive lock is
enough. This is correct because heap page vacuuming is now strictly a
matter of setting the LP_DEAD line pointers to LP_UNUSED. No other
backend can have a pointer to a tuple located in a pinned buffer that
can be invalidated by a concurrent heap page vacuum operation.
Heap vacuuming can now be thought of as conceptually similar to index
vacuuming and conceptually dissimilar to heap pruning. Heap pruning now
has sole responsibility for anything involving the logical contents of
the database (e.g., managing transaction status information, recovery
conflicts, considering what to do with HOT chains). Index vacuuming and
heap vacuuming are now only concerned with recycling garbage items from
physical data structures that back the logical database.
Bump XLOG_PAGE_MAGIC due to pruning and heap page vacuum WAL record
changes.
Credit for the idea of retrying pruning a page to avoid the tupgone case
goes to Andres Freund.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Andres Freund <andres@anarazel.de>
Reviewed-By: Masahiko Sawada <sawada.mshk@gmail.com>
Discussion: https://postgr.es/m/CAH2-WznneCXTzuFmcwx_EyRQgfsfJAAsu+CsqRFmFXCAar=nJw@mail.gmail.com
Similarly to the cryptohash implementations, this refactors the existing
HMAC code into a single set of APIs that can be plugged with any crypto
library PostgreSQL is built with (only OpenSSL currently). If there is
no such library, a fallback implementation is available. Those new APIs
are designed similarly to the existing cryptohash layer, so there is no
real new design here, with the same logic around buffer bound checks
and memory handling.
HMAC has a dependency on cryptohashes, so all the cryptohash types
supported by cryptohash{_openssl}.c can be used with HMAC. This
refactoring is an advantage mainly for SCRAM, which included its own
implementation of HMAC with SHA256 without relying on the existing
crypto libraries, even if PostgreSQL was built with their support.
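A hedged usage sketch of the new API, in the way a SCRAM-style caller
would use it (simplified; the signatures follow common/hmac.h):

#include "postgres.h"

#include "common/cryptohash.h"
#include "common/hmac.h"
#include "common/sha2.h"

static void
hmac_sha256_sketch(const uint8 *key, size_t keylen,
                   const uint8 *data, size_t datalen,
                   uint8 *result)   /* PG_SHA256_DIGEST_LENGTH bytes */
{
    pg_hmac_ctx *ctx = pg_hmac_create(PG_SHA256);

    if (ctx == NULL)
        elog(ERROR, "out of memory");

    if (pg_hmac_init(ctx, key, keylen) < 0 ||
        pg_hmac_update(ctx, data, datalen) < 0 ||
        pg_hmac_final(ctx, result, PG_SHA256_DIGEST_LENGTH) < 0)
        elog(ERROR, "could not compute HMAC");

    pg_hmac_free(ctx);
}
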
This code has been tested on Windows and Linux, with and without
OpenSSL, across all the versions supported on HEAD from 1.1.1 down to
1.0.1. I have also checked that the implementations are working fine
using some sample results, a custom extension of my own, and doing
cross-checks across different major versions with SCRAM with the client
and the backend.
Author: Michael Paquier
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/X9m0nkEJEzIPXjeZ@paquier.xyz
To allow inserts in parallel mode, this feature has to ensure that all
the constraints, triggers, etc. are parallel-safe for the partition
hierarchy, which is costly, and we need to find a better way to do
that. Additionally, we could have used existing cached information in
some cases, like indexes, domains, etc., to determine parallel-safety.
List of commits reverted, in reverse chronological order:
ed62d3737c Doc: Update description for parallel insert reloption.
c8f78b6161 Add a new GUC and a reloption to enable inserts in parallel-mode.
c5be48f092 Improve FK trigger parallel-safety check added by 05c8482f7f.
e2cda3c20a Fix use of relcache TriggerDesc field introduced by commit 05c8482f7f.
e4e87a32cc Fix valgrind issue in commit 05c8482f7f.
05c8482f7f Enable parallel SELECT for "INSERT INTO ... SELECT ...".
Discussion: https://postgr.es/m/E1lMiB9-0001c3-SY@gemulon.postgresql.org
Until now the bsearch_arg function was used only in extended statistics
code, so it was defined in that code. But we already have qsort_arg in
src/port, so let's move bsearch_arg next to it.
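For reference, a small standalone usage sketch, assuming the
qsort_arg-style signature from port.h and linking with src/port:

#include <stddef.h>
#include <stdio.h>

extern void *bsearch_arg(const void *key, const void *base,
                         size_t nmemb, size_t size,
                         int (*compar) (const void *, const void *, void *),
                         void *arg);

static int
cmp_int(const void *key, const void *member, void *arg)
{
    int         direction = *(int *) arg;  /* 1 = ascending, -1 = descending */

    return direction * (*(const int *) key - *(const int *) member);
}

int
main(void)
{
    int         sorted[] = {1, 3, 5, 7, 9};
    int         key = 7;
    int         ascending = 1;
    int        *found;

    found = bsearch_arg(&key, sorted, 5, sizeof(int), cmp_int, &ascending);
    printf("%s\n", found ? "found" : "not found");
    return 0;
}
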
This seems to have been just copied-and-pasted from some other
header checks. But our C code is entirely unprepared to support
such a header name, so it's only wasting cycles to look for it.
If we did need to support it, some #ifdefs would be required.
(A quick trawl at codesearch.debian.net finds some packages that
reference lz4/lz4.h; but they use *only* that spelling, and
appear to be intending to reference their own copy rather than
a system-level installation of liblz4. There's no evidence of
freestanding installations that require this spelling.)
Discussion: https://postgr.es/m/457962.1616362509@sss.pgh.pa.us
It's not okay to just shove the pkg_config results right into our
build flags, for a couple different reasons:
* This fails to maintain the separation between CPPFLAGS and CFLAGS,
as well as that between LDFLAGS and LIBS. (The CPPFLAGS angle is,
I believe, the reason for warning messages reported when building
with MacPorts' liblz4.)
* If pkg_config emits anything other than -I/-D/-L/-l switches,
it's highly unlikely that we want to absorb those. That'd be more
likely to break the build than do anything helpful. (Even the -D
case is questionable; but we're doing that for libxml2, so I kept it.)
Also, it's not okay to skip doing an AC_CHECK_LIB probe, as
evidenced by a recent build failure on topminnow; that should
have been caught at configure time.
Model fixes for this on configure's libxml2 support.
It appears that somebody overlooked an autoheader run, too.
Discussion: https://postgr.es/m/20210119190720.GL8560@telsasoft.com