Commit Graph

11336 Commits

Author SHA1 Message Date
a04ddd077e Improve support for ExplainOneQuery() hook
There is a hook called ExplainOneQuery_hook that gives modules the
possibility to plug into this code path but, unlike utility statement
execution in utility.c, there is no corresponding "standard" routine
for the case of EXPLAIN executed on one Query.

This commit adds a new standard_ExplainOneQuery() in explain.c, which is
able to run explain on a non-utility Query without calling its hook.
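
By way of illustration, a minimal generic sketch of the hook-plus-standard
idiom described here (illustrative names, not the actual explain.c
declarations):

    /* Generic sketch of the "hook + standard_ routine" idiom. */
    typedef void (*explain_hook_type) (const char *query);

    explain_hook_type explain_hook = NULL;  /* set by a module at load time */

    void
    standard_explain(const char *query)
    {
        (void) query;               /* the built-in behavior lives here */
    }

    void
    explain_entry_point(const char *query)
    {
        if (explain_hook)
            explain_hook(query);        /* a module takes over ... */
        else
            standard_explain(query);    /* ... or the standard path runs */
    }

    /* A module's hook can now wrap the standard routine instead of
     * maintaining a full copy of it. */
    void
    my_module_explain(const char *query)
    {
        /* emit custom header */
        standard_explain(query);
        /* emit custom footer */
    }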

Per feedback from a couple of hackers, this change makes it possible
to cut a few hundred lines of code in some of the popular out-of-core
modules, which maintained their own copies of ExplainOneQuery() just
to add custom information at the beginning or the end of the EXPLAIN
output.

Author: Mats Kindahl
Reviewed-by: Aleksander Alekseev, Jelte Fennema-Nio, Andrei Lepikhov
Discussion: https://postgr.es/m/CA+14427V_B4EAoC_o-iYYucRdMSOTfpuH9k-QbexffY1HYJBiA@mail.gmail.com
2024-03-11 08:40:40 +09:00
f696c0cd5f Catalog changes preparing for builtin collation provider.
Rename pg_collation.colliculocale to colllocale, and
pg_database.daticulocale to datlocale. These names reflect that the
fields will be useful for the upcoming builtin provider as well, not
just for ICU.

This is purely a rename; no changes to the meaning of the fields.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Peter Eisentraut
2024-03-09 14:48:18 -08:00
33ee2550d3 Fix type signedness error in commit 5c40364dd6.
Use ssize_t instead of size_t.

Discussion: https://postgr.es/m/b20d6d97-7338-48ea-ba33-837a1c8ef98e@iki.fi
Reported-by: Heikki Linnakangas
2024-03-08 16:00:46 -08:00
270af6f0df Admit deferrable PKs into rd_pkindex, but flag them as such
... and in particular don't return them as replica identity.

The motivation for this change is letting the primary keys be seen by
code that derives NOT NULL constraints from them, when creating
inheritance children; before this change, if you had a deferrable PK,
pg_dump would not recreate the attnotnull marking properly, because the
column would not be considered as having anything to back said marking
after dropping the throwaway NOT NULL constraint.

The reason we don't want these PKs as replica identities is that
replication can corrupt data if the uniqueness constraint is
transiently broken.

Reported-by: Amul Sul <sulamul@gmail.com>
Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b94QonkgsbDXofakHDnORQNgafd1y3Oa5QXfpQNJyXyQ7A@mail.gmail.com
2024-03-08 16:32:29 +01:00
4c1973fcae Avoid recursion in MemoryContext functions
You might run out of stack space with recursion, which is not nice in
functions that might be used e.g. at cleanup after transaction
abort. MemoryContext contains pointers to its parent and siblings, so we
can traverse a tree of contexts iteratively, without using the
stack. Refactor the functions to do that.
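
A minimal generic sketch of walking such a tree iteratively, using only
parent/child/sibling pointers and no recursion (illustrative types, not
the MemoryContext structs themselves):

    #include <stdio.h>

    typedef struct Node
    {
        const char  *name;
        struct Node *parent;
        struct Node *firstchild;
        struct Node *nextsibling;
    } Node;

    /* Depth-first walk without recursion: descend to the first child when
     * possible, otherwise move to the next sibling, otherwise climb back
     * up until a sibling is found or the root is reached. */
    void
    walk(Node *root)
    {
        Node *node = root;

        while (node != NULL)
        {
            printf("%s\n", node->name);

            if (node->firstchild)
                node = node->firstchild;
            else
            {
                while (node != root && node->nextsibling == NULL)
                    node = node->parent;
                node = (node == root) ? NULL : node->nextsibling;
            }
        }
    }

    int
    main(void)
    {
        Node root = {"root", NULL, NULL, NULL};
        Node a = {"a", &root, NULL, NULL};
        Node b = {"b", &root, NULL, NULL};
        Node a1 = {"a1", &a, NULL, NULL};

        root.firstchild = &a;
        a.nextsibling = &b;
        a.firstchild = &a1;

        walk(&root);            /* prints root, a, a1, b */
        return 0;
    }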

MemoryContextStats() still recurses, but it now has a limit to how
deep it recurses. Once the limit is reached, it prints just a summary
of the rest of the hierarchy, similar to how it summarizes contexts
with lots of children. That seems good anyway, because a context dump
with hundreds of nested contexts isn't very readable.

Report by Egor Chindyaskin and Alexander Lakhin.

Discussion: https://postgr.es/m/1672760457.940462079%40f306.i.mail.ru
Author: Heikki Linnakangas
Reviewed-by: Robert Haas, Andres Freund, Alexander Korotkov, Tom Lane
2024-03-08 13:18:30 +02:00
ab6ae62603 Fix link error for test_radixtree module on Windows
Add PGDLLIMPORT to pg_popcount32/64. In passing, fix a typo.

Diagnosis by Masahiko Sawada, patch by David Rowley

Per buildfarm members drongo and fairywren

Discussion: https://postgr.es/m/CAD21AoAMm1mQd%3Dw4PrfrKK%3DOMP8j8%3D7ntJRPF8%2B%3D10iUuvwiCA%40mail.gmail.com
Discussion: https://postgr.es/m/CAApHDvov7724UrD1Ug0D1eV%2B9Pd_x5VEQmw-6HVG9w1WdCxXPA%40mail.gmail.com
2024-03-08 10:57:40 +07:00
bf279ddd1c Introduce a new GUC 'standby_slot_names'.
This patch provides a way to ensure that physical standbys that are
potential failover candidates have received and flushed changes before
the primary server makes them visible to subscribers. Doing so guarantees
that the promoted standby server is not lagging behind the subscribers
when a failover is necessary.

The logical walsender now guarantees that all local changes are sent and
flushed to the standby servers corresponding to the replication slots
specified in 'standby_slot_names' before sending those changes to the
subscriber.

Additionally, the SQL functions pg_logical_slot_get_changes,
pg_logical_slot_peek_changes and pg_replication_slot_advance are modified
to ensure that they process changes for failover slots only after the
physical slots specified in 'standby_slot_names' have confirmed
receiving the corresponding WAL.

Author: Hou Zhijie and Shveta Malik
Reviewed-by: Masahiko Sawada, Peter Smith, Bertrand Drouvot, Ajin Cherian, Nisha Moond, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-03-08 08:10:45 +05:30
4f8c1e7aaf Update comment of AlterTableCmd->name in parsenodes.h
Since b0483263dda0, this field can be used to store an access method
name for ALTER TABLE, but access methods were not mentioned in the
field's description.

Issue noticed while working on the area.

Discussion: https://postgr.es/m/ZeWKgCtk6xiAsDsc@paquier.xyz
2024-03-08 08:44:13 +09:00
5c40364dd6 Unicode case mapping tables and functions.
Implements Unicode simple case mapping, in which all code points map
to exactly one other code point unconditionally.

These tables are generated from UnicodeData.txt, which is already
being used by other infrastructure in src/common/unicode. The tables
are checked into the source tree, so they only need to be regenerated
when we update the Unicode version.

In preparation for the builtin collation provider, and possibly useful
for other callers.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Peter Eisentraut, Daniel Verite, Jeremy Schneider
2024-03-07 11:15:06 -08:00
6d470211e5 Fix description and grouping of RangeTblEntry.inh
The inh field of RangeTblEntry was documented in two conflicting ways.
Some parts of the code insisted that it was only valid for
RTE_RELATION entries, while other parts said the field was valid for all
entries.  Neither was quite correct.  More correctly, the field is
valid for RTE_RELATION entries but is also used in the planner for
RTE_SUBQUERY entries.  So it makes more sense to group it with other
fields that are primarily for RTE_RELATION but borrowed by
RTE_SUBQUERY.  (The exact position was chosen so that it is next to
relkind for better struct packing, and next to relid, since relid and
inh are sort of the input fields and the others are filled in later.)
Also add documentation for the planner's use at the struct definition.

Discussion: https://www.postgresql.org/message-id/6c1fbccc-85c8-40d3-b08b-4f47f2093711@eisentraut.org
2024-03-07 12:13:09 +01:00
e444ebcb85 Fix incorrect format specifier for int64
Follow-up to ee1b30f12, per buildfarm member mamba.

Discussion: https://postgr.es/m/CANWCAZYwyRMU%2BOTVOjK%3Dno1hm-W3ZQ5vrSFM1MFAaLtLydvwzA%40mail.gmail.com
2024-03-07 14:26:52 +07:00
ac234e6377 Fix redefinition of typedefs
Per buildfarm members sifaka and longfin, clang with
-Wtypedef-redefinition warns of duplicate typedefs unless building with
C11. Follow-up to ee1b30f12.

Masahiko Sawada

Discussion: https://postgr.es/m/CANWCAZauSg%3DLUbBbXhpeQtBuPifmzQNTYS6O8NsoAPz1zL-Txg%40mail.gmail.com
2024-03-07 14:11:49 +07:00
ee1b30f128 Add template for adaptive radix tree
This implements a radix tree data structure based on the design in
"The Adaptive Radix Tree: ARTful Indexing for Main-Memory Databases"
by Viktor Leis, Alfons Kemper, and Thomas Neumann, 2013. The main
technique that makes it adaptive is using several different node types,
each with a different capacity of elements, and a different algorithm
for accessing them. The nodes start small and grow/shrink as needed.
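
A greatly simplified sketch of the "multiple node kinds" idea
(illustrative only, far from the real template): a small node keeps a
few key bytes and searches them linearly, while a full-size node indexes
its children directly by byte value.

    #include <stddef.h>
    #include <stdint.h>

    typedef enum { NODE4, NODE256 } NodeKind;

    typedef struct Node Node;

    typedef struct
    {
        uint8_t count;
        uint8_t keys[4];        /* key bytes present, kept sorted */
        Node   *children[4];
    } Node4;

    typedef struct
    {
        Node   *children[256];  /* indexed directly by the key byte */
    } Node256;

    struct Node
    {
        NodeKind kind;          /* a real ART grows NODE4 into bigger kinds */
        union
        {
            Node4   n4;
            Node256 n256;
        } u;
    };

    /* Find the child for one key byte, using the kind-specific layout. */
    Node *
    find_child(Node *node, uint8_t keybyte)
    {
        switch (node->kind)
        {
            case NODE4:
                for (int i = 0; i < node->u.n4.count; i++)
                    if (node->u.n4.keys[i] == keybyte)
                        return node->u.n4.children[i];
                return NULL;
            case NODE256:
                return node->u.n256.children[keybyte];
        }
        return NULL;
    }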

The main advantage over hash tables is efficient sorted iteration and
better memory locality when successive keys are lexicographically
close together. The implementation currently assumes 64-bit integer
keys, and traversing the tree is in general slower than a linear
probing hash table, so this is not a general-purpose associative array.

The paper describes two other techniques not implemented here,
namely "path compression" and "lazy expansion". These can further
reduce memory usage and speed up traversal, but the former would add
significant complexity and the latter requires storing the full key
with the value. We do trivially compress the path when leading bytes
of the key are zeros, however.

For value storage, we use "combined pointer/value slots", as
recommended in the paper. Values no larger than the platform's pointer
type are stored in the array of child pointers in
the last level node, while larger values are each stored in a separate
allocation. This is for now fixed at compile time, but it would be
fairly trivial to allow determining at runtime how variable-length
values are stored.

One innovation in our implementation compared to the ART paper is
decoupling the notion of node "size class" from "kind". The size
classes within a given node kind have the same underlying type, but
a variable capacity for children, so we can introduce additional node
sizes with little additional code.

To enable different use cases to specialize for different value types
and for shared/local memory, we use macro-templatized code generation
in the same manner as simplehash.h and sort_template.h.

Future commits will use this infrastructure for storing TIDs.

Patch by Masahiko Sawada and John Naylor, but a substantial amount of
credit is due to Andres Freund, whose proof-of-concept was a valuable
source of coding idioms and awareness of performance pitfalls, and
who reviewed earlier versions.

Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com
2024-03-07 12:40:11 +07:00
099ca50bd4 Revert "Fix parallel-safety check of expressions and predicate for index builds"
This reverts commit eae7be600be7, following a discussion with Tom Lane,
due to concerns that it changes the decisions made by the planner about
the number of workers to spawn, depending on the inlining and
const-folding of index expressions and predicates, for cases that would
have worked until this commit.

Discussion: https://postgr.es/m/162802.1709746091@sss.pgh.pa.us
Backpatch-through: 12
2024-03-07 08:30:35 +09:00
ad49994538 Add Unicode property tables.
Provide functions to test for Unicode properties, such as Alphabetic
or Cased. These functions use tables derived from Unicode data files,
similar to the tables for Unicode normalization or general category,
and those tables can be updated with the 'update-unicode' build
target.
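
One common way such derived property tables are laid out, shown as a
generic sketch (the ranges below are a tiny illustrative excerpt, not
the generated tables): sorted code-point ranges probed with a binary
search.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdint.h>

    typedef struct
    {
        uint32_t first;
        uint32_t last;
    } Range;

    /* Tiny illustrative excerpt of an "Alphabetic" range table; a
     * generated table would cover all of Unicode. */
    static const Range alpha_ranges[] = {
        {0x0041, 0x005A},       /* A-Z */
        {0x0061, 0x007A},       /* a-z */
        {0x00C0, 0x00D6},       /* a slice of Latin-1 letters */
        {0x0391, 0x03A1},       /* a slice of Greek capitals */
    };

    /* Binary search over sorted, non-overlapping ranges. */
    bool
    has_property(const Range *tbl, size_t n, uint32_t cp)
    {
        size_t lo = 0, hi = n;

        while (lo < hi)
        {
            size_t mid = (lo + hi) / 2;

            if (cp < tbl[mid].first)
                hi = mid;
            else if (cp > tbl[mid].last)
                lo = mid + 1;
            else
                return true;
        }
        return false;
    }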

Use Unicode properties to provide functions to test for regex
character classes, like 'punct' or 'alnum'.

Infrastructure in preparation for a builtin collation provider, and
may also be useful for other callers.

Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel%40j-davis.com
Reviewed-by: Daniel Verite, Peter Eisentraut, Jeremy Schneider
2024-03-06 12:50:01 -08:00
de7c6fe834 Fix signedness error in 9f225e992 for gcc
The first argument of vshrq_n_s8 needs to be a signed vector type,
but it was passed unsigned. Clang is more lax with conversion, but
gcc needs a cast.
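
The shape of the fix, sketched for an unsigned input vector (ARM NEON
only; vreinterpretq_s8_u8() changes the element signedness without
changing any bits):

    #include <arm_neon.h>

    /* gcc rejects passing a uint8x16_t where an int8x16_t is expected, so
     * reinterpret the unsigned vector as signed before the signed shift,
     * then back again afterwards. */
    static inline uint8x16_t
    shift_right_signed(uint8x16_t v)
    {
        int8x16_t signed_v = vreinterpretq_s8_u8(v);

        return vreinterpretq_u8_s8(vshrq_n_s8(signed_v, 7));
    }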

Fix by me, tested by Masahiko Sawada

Per buildfarm members splitfin, batta, widowbird, snakefly, parula,
massasauga

Discussion: https://postgr.es/m/20240306074106.mg6w4koohdlworbs%40alap3.anarazel.de
2024-03-06 15:55:55 +07:00
eae7be600b Fix parallel-safety check of expressions and predicate for index builds
As coded, the planner logic that calculates the number of parallel
workers to use for a parallel index build uses expressions and
predicates from the relcache, which are flattened for the planner by
eval_const_expressions().

As reported in the bug, an immutable parallel-unsafe function flattened
in the relcache would become a Const, which would be considered as
parallel-safe, even if the predicate or the expressions including the
function are not safe in parallel workers.  Depending on the expressions
or predicate used, this could cause the parallel build to fail.

Tests are included that check parallel index builds with parallel-unsafe
predicate and expressions.  Two routines are added to lsyscache.h to be
able to retrieve expressions and predicate of an index from its pg_index
data.

Reported-by: Alexander Lakhin
Author: Tender Wang
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAHewXN=UaAaNn9ruHDH3Os8kxLVmtWqbssnf=dZN_s9=evHUFA@mail.gmail.com
Backpatch-through: 12
2024-03-06 17:23:56 +09:00
3e76a806cb Move some bitmap logic out of bitmapset.c
Move the logic for selecting appropriate pg_bitutils.h
functions based on word size to bitmapset.h for wider
visibility.

Reviewed (in a previous version) by Tom Lane
Discussion: https://postgr.es/m/CAFBsxsFW2JjTo58jtDB%2B3sZhxMx3t-3evew8%3DAcr%2BGGhC%2BkFaA%40mail.gmail.com
2024-03-06 14:30:16 +07:00
9f225e992b Introduce helper SIMD functions for small byte arrays
vector8_min - helper for emulating ">=" semantics

vector8_highbit_mask - used to turn the result of a vector
comparison into a bitmask
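
On x86, helpers of this kind typically reduce to single SSE2 intrinsics;
a sketch under that assumption (not necessarily how simd.h implements
them):

    #include <emmintrin.h>      /* SSE2 */
    #include <stdint.h>

    typedef __m128i Vec8;       /* 16 unsigned bytes (illustrative typedef) */

    /* Per-byte unsigned minimum: handy for emulating comparisons that the
     * instruction set lacks, since min(a, b) == b exactly when b <= a. */
    static inline Vec8
    vec8_min(const Vec8 a, const Vec8 b)
    {
        return _mm_min_epu8(a, b);
    }

    /* Collect the high bit of each byte into a 16-bit integer mask,
     * turning the result of a byte-wise comparison into a plain bitmask. */
    static inline uint32_t
    vec8_highbit_mask(const Vec8 v)
    {
        return (uint32_t) _mm_movemask_epi8(v);
    }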

Masahiko Sawada

Reviewed by Nathan Bossart, with additional adjustments by me
Discussion: https://postgr.es/m/CAFBsxsHbBm_M22gLBO%2BAZT4mfMq3L_oX3wdKZxjeNnT7fHsYMQ%40mail.gmail.com
2024-03-06 14:25:20 +07:00
d93627bcbe Add --copy-file-range option to pg_upgrade.
The copy_file_range() system call is available on at least Linux and
FreeBSD, and asks the kernel to use efficient ways to copy ranges of a
file.  Options available to the kernel include sharing block ranges
(similar to --clone mode), and pushing down block copies to the storage
layer.
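
For reference, a minimal standalone sketch of calling copy_file_range()
on Linux (error handling kept to a bare minimum; the kernel may copy
fewer bytes than requested, hence the loop):

    #define _GNU_SOURCE
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/stat.h>
    #include <unistd.h>

    /* Copy one file onto another, letting the kernel share blocks or push
     * the copy down to the storage layer when it can. */
    int
    main(int argc, char **argv)
    {
        if (argc != 3)
        {
            fprintf(stderr, "usage: %s SRC DST\n", argv[0]);
            return 1;
        }

        int         srcfd = open(argv[1], O_RDONLY);
        int         dstfd = open(argv[2], O_WRONLY | O_CREAT | O_TRUNC, 0644);
        struct stat st;

        if (srcfd < 0 || dstfd < 0 || fstat(srcfd, &st) < 0)
        {
            perror("open/fstat");
            return 1;
        }

        off_t       remaining = st.st_size;

        while (remaining > 0)
        {
            /* NULL offsets: use and advance both file positions. */
            ssize_t     n = copy_file_range(srcfd, NULL, dstfd, NULL,
                                            remaining, 0);

            if (n < 0)
            {
                perror("copy_file_range");
                return 1;
            }
            if (n == 0)
                break;          /* unexpected EOF */
            remaining -= n;
        }

        close(srcfd);
        close(dstfd);
        return 0;
    }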

For automated testing, see PG_TEST_PG_UPGRADE_MODE.  (Perhaps in a later
commit we could consider setting this mode for one of the CI targets.)

Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/CA%2BhUKGKe7Hb0-UNih8VD5UNZy5-ojxFb3Pr3xSBBL8qj2M2%3DdQ%40mail.gmail.com
2024-03-06 12:01:01 +13:00
e03349144b Improve field order in RangeTblEntry
When perminfoindex was added, it was just added at the end of the
block.  It would make sense to keep it closer to more related fields.
In passing, also add an inline comment, like the other fields have.
(Other field reorderings and documentation improvements in
RangeTblEntry are being discussed, but it's better not to mix them
together.)

Discussion: https://www.postgresql.org/message-id/flat/6c1fbccc-85c8-40d3-b08b-4f47f2093711%40eisentraut.org
2024-03-05 13:34:43 +01:00
030e10ff1a Rename pg_constraint.conwithoutoverlaps to conperiod
pg_constraint.conwithoutoverlaps was recently added to support primary
keys and unique constraints with the WITHOUT OVERLAPS clause.  An
upcoming patch provides the foreign-key side of this functionality,
but the syntax there is different and uses the keyword PERIOD.  It
would make sense to use the same pg_constraint field for both of
these, but then we should pick a more general name that conveys "this
constraint has a temporal/period-related feature".  conperiod works
for that and is nicely compact.  Changing this now avoids possibly
having to introduce versioning into clients.  Note there are still
some "without overlaps" variables left, which deal specifically with
the parsing of the primary key/unique constraint feature.

Author: Paul A. Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com
2024-03-05 11:24:17 +01:00
59825d1639 Fix buildfarm failures from 2af07e2f74.
Use GUC_ACTION_SAVE rather than GUC_ACTION_SET, necessary for working
with parallel query.

Now that the call requires more arguments, wrap the call in a new
function to avoid code duplication and offer a place for a comment.

Discussion: https://postgr.es/m/E1rhJpO-0027Wf-9L@gemulon.postgresql.org
2024-03-04 19:42:16 -08:00
2af07e2f74 Fix search_path to a safe value during maintenance operations.
While executing maintenance operations (ANALYZE, CLUSTER, REFRESH
MATERIALIZED VIEW, REINDEX, or VACUUM), set search_path to
'pg_catalog, pg_temp' to prevent inconsistent behavior.

Functions that are used for functional indexes, in index expressions,
or in materialized views and depend on a different search path must be
declared with CREATE FUNCTION ... SET search_path='...'.

This change was previously committed as 05e1737351, then reverted in
commit 2fcc7ee7af because it was too late in the cycle.

Preparation for the MAINTAIN privilege, which was previously reverted
due to search_path manipulation hazards.

Discussion: https://postgr.es/m/d4ccaf3658cb3c281ec88c851a09733cd9482f22.camel@j-davis.com
Discussion: https://postgr.es/m/E1q7j7Y-000z1H-Hr%40gemulon.postgresql.org
Discussion: https://postgr.es/m/e44327179e5c9015c8dda67351c04da552066017.camel%40j-davis.com
Reviewed-by: Greg Stark, Nathan Bossart, Noah Misch
2024-03-04 17:31:38 -08:00
2c29e7fc95 Add macro for customizing an archiving WARNING message.
Presently, if an archive module's check_configured_cb callback
returns false, a generic WARNING message is emitted, which
unfortunately provides no actionable details about the reason why
the module is not configured.  This commit introduces a macro that
archive module authors can use to add a DETAIL line to this WARNING
message.

Co-authored-by: Tung Nguyen
Reviewed-by: Daniel Gustafsson, Álvaro Herrera
Discussion: https://postgr.es/m/4109578306242a7cd5661171647e11b2%40oss.nttdata.com
2024-03-04 15:41:42 -06:00
e5bc9454e5 Explicitly list dependent types as extension members in pg_depend.
Auto-generated array types, multirange types, and relation rowtypes
are treated as dependent objects: they can't be dropped separately
from the base object, nor can they have their own ownership or
permissions.  We previously felt that, for objects that are in an
extension, only the base object needs to be listed as an extension
member in pg_depend.  While that's sufficient to prevent inappropriate
drops, it results in undesirable answers if someone asks whether a
dependent type belongs to the extension.  It looks like the dependent
type is just some random separately-created object that happens to
depend on the base object.  Notably, this results in postgres_fdw
concluding that expressions involving an array type are not shippable
to the remote server, even when the defining extension has been
whitelisted.

To fix, cause GenerateTypeDependencies to make extension dependencies
for dependent types as well as their base objects, and adjust
ExecAlterExtensionContentsStmt so that object addition and removal
operations recurse to dependent types.  The latter change means that
pg_upgrade of a type-defining extension will end with the dependent
type(s) now also listed as extension members, even if they were
not that way in the source database.  Normally we want pg_upgrade
to precisely reproduce the source extension's state, but it seems
desirable to make an exception here.

This is arguably a bug fix, but we can't back-patch it since it
causes changes in the expected contents of pg_depend.  (Because
it does, I've bumped catversion, even though there's no change
in the immediate post-initdb catalog contents.)

Tom Lane and David Geier

Discussion: https://postgr.es/m/4a847c55-489f-4e8d-a664-fc6b1cbe306f@gmail.com
2024-03-04 14:49:36 -05:00
cc09e6549f Remove the adminpack contrib extension
The adminpack extension was only used to support pgAdmin III, which
in turn was declared EOL many years ago. Removing the extension also
allows us to remove functions from core that were only used to support
old versions of adminpack.

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>
Discussion: https://postgr.es/m/CALj2ACUmL5TraYBUBqDZBi1C+Re8_=SekqGYqYprj_W8wygQ8w@mail.gmail.com
2024-03-04 12:39:22 +01:00
24eebc65c2 Remove unused 'countincludesself' argument to pq_sendcountedtext()
It has been unused since we removed support for protocol version 2.
2024-03-04 12:56:05 +02:00
0dd094c4a0 Remove unused ParallelWorkerInfo.pid field
The pid was originally used in the error context of messages propagated
from parallel workers, but commit 292794f82b removed that. If the need
arises in the future, you can also get the pid with
"shm_mq_get_sender(pcxt->worker[i].error_mqh)->pid".
2024-03-04 12:56:02 +02:00
393b5599e5 Use MyBackendType in more places to check what process this is
Remove IsBackgroundWorker, IsAutoVacuumLauncherProcess(),
IsAutoVacuumWorkerProcess(), and IsLogicalSlotSyncWorker() in favor of
new Am*Process() macros that use MyBackendType. For consistency with
the existing Am*Process() macros.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi
2024-03-04 10:25:12 +02:00
067701f577 Remove MyAuxProcType, use MyBackendType instead
MyAuxProcType was redundant with MyBackendType.

Reviewed-by: Reid Thompson, Andres Freund
Discussion: https://www.postgresql.org/message-id/f3ecd4cb-85ee-4e54-8278-5fabfb3a4ed0@iki.fi
2024-03-04 10:25:09 +02:00
024c521117 Replace BackendIds with 0-based ProcNumbers
Now that BackendId is just another index into the proc array, it is
redundant with the 0-based proc numbers used in other places. Replace
all usage of backend IDs with proc numbers.

The only place where the term "backend id" remains is in a few pgstat
functions that expose backend IDs at the SQL level. Those IDs are now
in fact 0-based ProcNumbers too, but the documentation still calls
them "backend ids". That term still seems appropriate to describe what
the numbers are, so I let it be.

One user-visible effect is that pg_temp_0 is now a valid temp schema
name, for the backend with ProcNumber 0.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-03-03 19:38:22 +02:00
ab355e3a88 Redefine backend ID to be an index into the proc array
Previously, backend ID was an index into the ProcState array, in the
shared cache invalidation manager (sinvaladt.c). The entry in the
ProcState array was reserved at backend startup by scanning the array
for a free entry, and that was also when the backend got its backend
ID. Things become slightly simpler if we redefine backend ID to be the
index into the PGPROC array, and directly use it also as an index to
the ProcState array. This uses a little more memory, as we reserve a
few extra slots in the ProcState array for aux processes that don't
need them, but the simplicity is worth it.

Aux processes now also have a backend ID. This simplifies the
reservation of BackendStatusArray and ProcSignal slots.

You can now convert a backend ID into an index into the PGPROC array
simply by subtracting 1. We still use 0-based "pgprocnos" in various
places, for indexes into the PGPROC array, but the only difference now
is that backend IDs start at 1 while pgprocnos start at 0. (The next
commit will get rid of the term "backend ID" altogether and make
everything 0-based.)

There is still a 'backendId' field in PGPROC, now part of 'vxid' which
encapsulates the backend ID and local transaction ID together. It's
needed for prepared xacts. For regular backends, the backendId is
always equal to pgprocno + 1, but for prepared xact PGPROC entries,
it's the ID of the original backend that processed the transaction.

Reviewed-by: Andres Freund, Reid Thompson
Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi
2024-03-03 19:37:28 +02:00
653b55b570 Return ssize_t in fd.c I/O functions.
In the past, FileRead() and FileWrite() used types based on the Unix
read() and write() functions from before C and POSIX standardization,
though not exactly (we had int for amount instead of unsigned).  In
commit 2d4f1ba6 we changed to the appropriate standard C types, just
like the modern POSIX functions they wrap, but again not exactly: the
return type stayed as int.  In theory, the underlying call could return
an ssize_t value that is too large for an int.
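
A tiny sketch of the principle (illustrative signature only; FileRead()
itself takes different arguments): a thin wrapper should return the same
type as the call it wraps.

    #include <unistd.h>

    /* Returning ssize_t, like read() itself, means a large result cannot
     * be silently truncated on its way back to the caller. */
    ssize_t
    file_read_sketch(int fd, void *buffer, size_t amount)
    {
        return read(fd, buffer, amount);
    }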

That wasn't really a live bug, because we don't expect PostgreSQL code
to perform reads or writes of gigabytes, and OSes probably apply
internal caps smaller than that anyway.  This change is done on the
principle that the return type might as well follow the standard
interfaces consistently.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1672202.1703441340%40sss.pgh.pa.us
2024-03-02 12:09:28 +13:00
655dc31046 Simplify pg_enc2gettext_tbl[] with C99-designated initializer syntax
This commit switches pg_enc2gettext_tbl[] in encnames.c to use a
C99-designated initializer syntax.

pg_bind_textdomain_codeset() is simplified so that it does a direct
lookup in the gettext() array using the enum pg_enc value rather than
looping through all its elements, as long as the encoding value
provided by GetDatabaseEncoding() is in the correct range
of supported encoding values.  Note that PG_MULE_INTERNAL gains a value
in the array, pointing to NULL.

Author: Jelte Fennema-Nio
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-03-01 18:03:48 +09:00
bd5132db55 Introduce atomic read/write functions with full barrier semantics.
Writing correct code using atomic variables is often difficult due
to the memory barrier semantics (or lack thereof) of the underlying
operations.  This commit introduces atomic read/write functions
with full barrier semantics to ease this cognitive load.  For
example, some spinlocks protect a single value, and these new
functions make it easy to convert the value to an atomic variable
(thus eliminating the need for the spinlock) without modifying the
barrier semantics previously provided by the spinlock.  Since these
functions may be less performant than the other atomic reads and
writes, they are not suitable for every use-case.  However, using a
single atomic operation with full barrier semantics may be more
performant in cases where a separate explicit barrier would
otherwise be required.

The base implementations for these new functions are atomic
exchanges (for writes) and atomic fetch/adds with 0 (for reads).
These implementations can be overwritten with better architecture-
specific versions as they are discovered.
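
A sketch of those base implementations in terms of C11 atomics
(PostgreSQL has its own atomics layer; this only illustrates the idea
that an atomic exchange and a fetch-add of zero are sequentially
consistent read-modify-write operations):

    #include <stdatomic.h>
    #include <stdint.h>

    /* Write with full barrier semantics: an atomic exchange is a
     * read-modify-write, so it acts as a full memory barrier. */
    static inline void
    atomic_write_u32_full(atomic_uint *ptr, uint32_t val)
    {
        (void) atomic_exchange(ptr, val);
    }

    /* Read with full barrier semantics: a fetch-and-add of 0 returns the
     * current value and is likewise a full-barrier read-modify-write. */
    static inline uint32_t
    atomic_read_u32_full(atomic_uint *ptr)
    {
        return atomic_fetch_add(ptr, 0);
    }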

This commit leaves converting existing code to use these new
functions as a future exercise.

Reviewed-by: Andres Freund, Yong Li, Jeff Davis
Discussion: https://postgr.es/m/20231110205128.GB1315705%40nathanxps13
2024-02-29 10:00:44 -06:00
5f2e179bd3 Support MERGE into updatable views.
This allows the target relation of MERGE to be an auto-updatable or
trigger-updatable view, and includes support for WITH CHECK OPTION,
security barrier views, and security invoker views.

A trigger-updatable view must have INSTEAD OF triggers for every type
of action (INSERT, UPDATE, and DELETE) mentioned in the MERGE command.
An auto-updatable view must not have any INSTEAD OF triggers. Mixing
auto-update and trigger-update actions (i.e., having a partial set of
INSTEAD OF triggers) is not supported.

Rule-updatable views are also not supported, since there is no
rewriter support for non-SELECT rules with MERGE operations.

Dean Rasheed, reviewed by Jian He and Alvaro Herrera.

Discussion: https://postgr.es/m/CAEZATCVcB1g0nmxuEc-A+gGB0HnfcGQNGYH7gS=7rq0u0zOBXA@mail.gmail.com
2024-02-29 15:56:59 +00:00
ada87a4d95 Use C99-designated initializer syntax for arrays related to encodings
This updates the following lookup arrays to use C99-designated
initializer syntax, indexed based on the enum pg_enc:
pg_enc2icu_tbl[]
pg_enc2name_tbl[]
pg_wchar_table[]

This is more readable, and it removes the risk of ordering mistakes,
since the arrays no longer depend on the order of the values in the
enum pg_enc.  So, adding new encodings becomes easier, even if this
does not happen often.
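
The general shape of the change, as a self-contained sketch
(illustrative enum and array, not the real pg_enc tables): with
designated initializers, the entries no longer have to be listed in
enum order.

    #include <stdio.h>

    typedef enum
    {
        ENC_SQL_ASCII = 0,
        ENC_UTF8,
        ENC_LATIN1,
        ENC_COUNT               /* keep last, used for the array size */
    } SampleEncoding;

    /* Each entry is tied to its enum value explicitly, so the entries can
     * be written in any order and a missing one stays NULL. */
    static const char *const enc_names[ENC_COUNT] = {
        [ENC_UTF8]      = "UTF8",
        [ENC_SQL_ASCII] = "SQL_ASCII",
        [ENC_LATIN1]    = "LATIN1",
    };

    int
    main(void)
    {
        printf("%s\n", enc_names[ENC_UTF8]);    /* direct lookup by enum */
        return 0;
    }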

Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Japin Li
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-02-29 09:54:25 +09:00
53c2a97a92 Improve performance of subsystems on top of SLRU
More precisely, what we do here is make the SLRU cache sizes
configurable with new GUCs, so that sites with high concurrency and big
ranges of transactions in flight (resp. multixacts/subtransactions) can
benefit from bigger caches.  In order for this to work with good
performance, two additional changes are made:

1. the cache is divided in "banks" (to borrow terminology from CPU
   caches), and algorithms such as eviction buffer search only affect
   one specific bank.  This forestalls the problem that linear searching
   for a specific buffer across the whole cache takes too long: we only
   have to search the specific bank, whose size is small.  This work is
   authored by Andrey Borodin.

2. Change the locking regime for the SLRU banks, so that each bank uses
   a separate LWLock.  This allows for increased scalability.  This work
   is authored by Dilip Kumar.  (A part of this was previously committed as
   d172b717c6f4.)

Special care is taken so that the algorithms that can potentially
traverse more than one bank release one bank's lock before acquiring the
next.  This should happen rarely, but particularly clog.c's group commit
feature needed code adjustment to cope with this.  I (Álvaro) also added
lots of comments to make sure the design is sound.
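
A rough, generic sketch of the banked-cache idea (greatly simplified;
the real slru.c structures and locking differ): each page maps to
exactly one bank, and a lookup scans, and locks, only that bank.

    #include <pthread.h>
    #include <stdbool.h>
    #include <stdint.h>

    #define BANK_SIZE 16                /* buffers per bank */

    typedef struct
    {
        pthread_mutex_t lock;           /* one lock per bank */
        int64_t         pageno[BANK_SIZE];
        bool            valid[BANK_SIZE];
    } Bank;

    typedef struct
    {
        int     nbanks;
        Bank   *banks;
    } BankedCache;

    /* A page lives in exactly one bank, so the search scans only
     * BANK_SIZE entries and takes only that bank's lock. */
    int
    lookup_page(BankedCache *cache, int64_t pageno)
    {
        Bank   *bank = &cache->banks[pageno % cache->nbanks];
        int     found = -1;

        pthread_mutex_lock(&bank->lock);
        for (int i = 0; i < BANK_SIZE; i++)
        {
            if (bank->valid[i] && bank->pageno[i] == pageno)
            {
                found = i;
                break;
            }
        }
        pthread_mutex_unlock(&bank->lock);
        return found;
    }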

The new GUCs match the names introduced by bcdfa5f2e2f2 in the
pg_stat_slru view.

The default values for these parameters are similar to the previous
sizes of each SLRU.  commit_ts, clog and subtrans accept value 0, which
means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB
of shared_buffers), with a cap of 8MB.  (A new slru.c function
SimpleLruAutotuneBuffers() was added to support this.)  The cap was
previously 1MB for clog, so for sites with more than 512MB of shared
memory the total memory used increases, which is likely a good tradeoff.
However, other SLRUs (notably multixact ones) retain smaller sizes and
don't support a configured value of 0.  These values based on
shared_buffers may need to be revisited, but that's an easy change.

There was some resistance to adding these new GUCs: it would be better
to adjust to memory pressure automatically somehow, for example by
stealing memory from shared_buffers (where the caches can grow and
shrink naturally).  However, doing that seems to be a much larger
project and one which has made virtually no progress in several years,
and because this is such a pain point for so many users, here we take
the pragmatic approach.

Author: Andrey Borodin <x4mmm@yandex-team.ru>
Author: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova,
	Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra,
	Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev).
Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru
Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com
2024-02-28 17:05:31 +01:00
0b16bb8776 Remove AIX support
There isn't a lot of user demand for AIX support, we have a bunch of
hacks to work around AIX-specific compiler bugs and idiosyncrasies,
and no one has stepped up to the plate to properly maintain it.
Remove support for AIX to get rid of that maintenance overhead. It is
still supported in the stable branches.

The acute issue that triggered this decision was that after commit
8af2565248, the AIX buildfarm members have been hitting this
assertion:

    TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 472, PID: 2949728

Apparently the "pg_attribute_aligned(a)" attribute doesn't work on AIX
for values larger than PG_IO_ALIGN_SIZE, for a static const variable.
That could be worked around, but we decided to just drop the AIX support
instead.

Discussion: https://www.postgresql.org/message-id/20240224172345.32@rfd.leadboat.com
Reviewed-by: Andres Freund, Noah Misch, Thomas Munro
2024-02-28 15:17:23 +04:00
bcdfa5f2e2 Rename SLRU elements in view pg_stat_slru
The new names are intended to match those in an upcoming patch that adds
a few GUCs to configure the SLRU buffer sizes.

Backwards compatibility concern: this changes the accepted names for
function pg_stat_slru_rest().  Since this function recognizes "any other
string" as a request to reset the entry for "other", this means that
calling it with the old names would silently reset "other" instead of
doing nothing or throwing an error.

Reviewed-by: Andrey M. Borodin <x4mmm@yandex-team.ru>
Discussion: https://postgr.es/m/202402261616.dlriae7b6emv@alvherre.pgsql
2024-02-28 09:39:52 +01:00
e1724af42c Fix comments for the dshash_parameters struct.
A recent commit added a copy_function member to the
dshash_parameters struct, but it missed updating a couple of
comments that refer to the function pointer members of this struct.
One of those comments also refers to a tranche_name member and
non-arg variants of the function pointer members, all of which were
either removed during development or removed shortly after dshash
table support was committed.

Oversights in commits 8c0d7bafad, d7694fc148, and 42a1de3013.

Discussion: https://postgr.es/m/20240227045213.GA2329190%40nathanxps13
2024-02-27 09:44:59 -06:00
ef5e2e9085 Remove unnecessary array object_classes[] in dependency.c
object_classes[] provided unnecessary indirection between catalog OIDs
and the enum ObjectClass when calling add_object_address().  This array
was originally introduced in 30ec31604d5, back when not all relation
OIDs were compile-time constants; the OIDs behind all the elements of
ObjectClass have been compile-time constants for a long time now.

This commit removes object_classes[], switching to the catalog OIDs
when calling add_object_address().  This shaves some code and saves
some maintenance effort, since the enum ObjectClass and the array had
to be kept in sync whenever a new object type was added.

Reported-by: Jeff Davis
Author: Jelte Fennema-Nio
Reviewed-by: Jian He, Michael Paquier
Discussion: https://postgr.es/m/CAGECzQT3caUbcCcszNewCCmMbCuyP7XNAm60J3ybd6PN5kH2Dw@mail.gmail.com
2024-02-27 15:18:17 +09:00
743112a2e9 Adjust memory allocation functions to allow sibling calls
Many modern compilers are able to optimize function calls to functions
where the parameters of the called function match a leading subset of
the calling function's parameters.  If there are no instructions in the
calling function after the function is called, then the compiler is free
to avoid any stack frame setup and implement the function call as a
"jmp" rather than a "call".  This is called sibling call optimization.

Here we adjust the memory allocation functions in mcxt.c to allow this
optimization.  This requires moving some responsibility into the memory
context implementations themselves.  It's now the responsibility of the
MemoryContext to check for malloc failures.  This is good not only
because it allows the sibling call optimization, but also because most
small and medium allocations won't call malloc and will just allocate
memory from an existing block.  That can't fail, so checking for NULLs
in that case isn't required.
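
A generic illustration of the pattern being enabled (illustrative
functions, not mcxt.c itself): the wrapper's last action is the call
into the implementation, which now also owns the failure check, so a
compiler can turn the wrapper into a plain jump.

    #include <stdlib.h>

    #define ALLOC_NO_OOM 0x01

    /* The implementation is the one responsible for failure handling. */
    void *
    impl_alloc(size_t size, int flags)
    {
        void *p = malloc(size);

        if (p == NULL && !(flags & ALLOC_NO_OOM))
            abort();
        return p;
    }

    /* Sibling-call friendly: the arguments match and nothing runs after
     * the call, so a compiler may emit a plain "jmp impl_alloc" without
     * setting up a stack frame. */
    void *
    wrapper_alloc(size_t size, int flags)
    {
        return impl_alloc(size, flags);
    }

    /* Not sibling-call friendly: work happens after the inner call
     * returns (the NULL check), forcing a full call with a stack frame. */
    void *
    wrapper_alloc_checked(size_t size, int flags)
    {
        void *p = impl_alloc(size, flags);

        if (p == NULL)
            abort();
        return p;
    }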

Also, traditionally it's been the responsibility of palloc and the other
allocation functions in mcxt.c to check for invalid allocation size
requests.  Here we also move the responsibility of checking that into the
MemoryContext.  This isn't to allow the sibling call optimization, but
more because most of our allocators handle large allocations separately
and we can just add the size check when doing large allocations.  We no
longer check this for non-large allocations at all.

To make checking the allocation request sizes and ERROR handling easier,
add some helper functions to mcxt.c for the allocators to use.

Author: Andres Freund
Reviewed-by: David Rowley
Discussion: https://postgr.es/m/20210719195950.gavgs6ujzmjfaiig@alap3.anarazel.de
2024-02-27 16:39:42 +13:00
42a1de3013 Add helper functions for dshash tables with string keys.
Presently, string keys are not well-supported for dshash tables.
The dshash code always copies key_size bytes into new entries'
keys, and dshash.h only provides compare and hash functions that
forward to memcmp() and tag_hash(), neither of which stops at
the first NUL.  This means that callers must pad string keys so
that the data beyond the first NUL does not adversely affect the
results of copying, comparing, and hashing the keys.
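
The crux of the problem, sketched generically (not the dshash API
itself): fixed-size key handling looks at all key_size bytes, while
string keys must be handled up to the first NUL only.

    #include <stdio.h>
    #include <string.h>

    #define KEY_SIZE 8

    int
    main(void)
    {
        /* Same string, different garbage after the terminating NUL. */
        char a[KEY_SIZE] = {'h', 'i', '\0', 'X', 'X', 'X', 'X', 'X'};
        char b[KEY_SIZE] = {'h', 'i', '\0', 'Y', 'Y', 'Y', 'Y', 'Y'};

        /* memcmp() over the full key size sees the padding bytes ... */
        printf("memcmp equal: %d\n", memcmp(a, b, KEY_SIZE) == 0);  /* 0 */

        /* ... while strcmp() stops at the first NUL, as string keys need. */
        printf("strcmp equal: %d\n", strcmp(a, b) == 0);            /* 1 */

        /* Copying is analogous: memcpy() drags the trailing garbage
         * along, whereas strcpy() copies only up to and including the
         * NUL. */
        return 0;
    }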

To better support string keys in dshash tables, this commit does
a couple things:

* A new copy_function field is added to the dshash_parameters
  struct.  This function pointer specifies how the key should be
  copied into new table entries.  For example, we only want to copy
  up to the first NUL byte for string keys.  A dshash_memcpy()
  helper function is provided and used for all existing in-tree
  dshash tables without string keys.

* A set of helper functions for string keys are provided.  These
  helper functions forward to strcmp(), strcpy(), and
  string_hash(), all of which ignore data beyond the first NUL.

This commit also adjusts the DSM registry's dshash table to use the
new helper functions for string keys.

Reviewed-by: Andy Fan
Discussion: https://postgr.es/m/20240119215941.GA1322079%40nathanxps13
2024-02-26 15:47:13 -06:00
449e798c77 Introduce sequence_*() access functions
Similarly to tables and indexes, these functions are able to open
relations with a sequence relkind, which is useful to distinguish them
from the other relation kinds.  Previously, commands/sequence.c used a
mix of table_{close,open}() and relation_{close,open}() routines when
manipulating sequence relations, so this clarifies the code.

A direct effect of this change is to align the error messages produced
when attempting sequence DDL on a relation of an unexpected relkind,
such as ALTER SEQUENCE on a table or an index, which now provide an
extra error detail about the relkind of the relation used in the DDL
query.

Author: Michael Paquier
Reviewed-by: Tomas Vondra
Discussion: https://postgr.es/m/ZWlohtKAs0uVVpZ3@paquier.xyz
2024-02-26 16:04:59 +09:00
d360e3cc60 Fix compiler warning on typedef redeclaration
bulk_write.c:78:3: error: redefinition of typedef 'BulkWriteState' is a C11 feature [-Werror,-Wtypedef-redefinition]
    } BulkWriteState;
      ^
    ../../../../src/include/storage/bulk_write.h:20:31: note: previous definition is here
    typedef struct BulkWriteState BulkWriteState;
                                  ^
    1 error generated.

Per buildfarm animals 'sifaka' and 'longfin'.

Discussion: https://www.postgresql.org/message-id/9e1f63c3-ef16-404c-b3cb-859a96eaba39@iki.fi
2024-02-23 17:39:27 +02:00
8af2565248 Introduce a new smgr bulk loading facility.
The new facility makes it easier to optimize bulk loading, as the
logic for buffering, WAL-logging, and syncing the relation only needs
to be implemented once. It's also less error-prone: We have had a
number of bugs in how a relation is fsync'd - or not - at the end of a
bulk loading operation. By centralizing that logic to one place, we
only need to write it correctly once.

The new facility is faster for small relations: Instead of calling
smgrimmedsync(), we register the fsync to happen at the next checkpoint,
which avoids the fsync latency. That can make a big difference if you
are e.g. restoring a schema-only dump with lots of relations.

It is also slightly more efficient with large relations, as the WAL
logging is performed multiple pages at a time. That avoids some WAL
header overhead. The sorted GiST index build did that already; this
commit moves its buffering to the new facility.

The changes to the pageinspect GiST test need an explanation: Before this
patch, the sorted GiST index build set the LSN on every page to the
special GistBuildLSN value, not the LSN of the WAL record, even though
they were WAL-logged. There was no particular need for it, it just
happened naturally when we wrote out the pages before WAL-logging
them. Now we WAL-log the pages first, like in B-tree build, so the
pages are stamped with the record's real LSN. When the build is not
WAL-logged, we still use GistBuildLSN. To make the test output
predictable, use an unlogged index.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/30e8f366-58b3-b239-c521-422122dd5150%40iki.fi
2024-02-23 16:10:51 +02:00
93db6cbda0 Add a new slot sync worker to synchronize logical slots.
By enabling slot synchronization, all the failover logical replication
slots on the primary (assuming configurations are appropriate) are
automatically created on the physical standbys and are synced
periodically. The slot sync worker on the standby server pings the primary
server at regular intervals to get the necessary failover logical slots
information and create/update the slots locally. The slots that no longer
require synchronization are automatically dropped by the worker.

The nap time of the worker is tuned according to the activity on the
primary. The slot sync worker waits for some time before the next
synchronization, with the duration varying based on whether any slots were
updated during the last cycle.

A new parameter sync_replication_slots enables or disables this new
process.

On promotion, the slot sync worker is shut down by the startup process to
drop any temporary slots acquired by the slot sync worker and to prevent
the worker from trying to fetch the failover slots.

Functionality to allow logical walsenders to wait for the physical
standbys will be added in a subsequent commit.

Author: Shveta Malik, Hou Zhijie based on design inputs by Masahiko Sawada and Amit Kapila
Reviewed-by: Masahiko Sawada, Bertrand Drouvot, Peter Smith, Dilip Kumar, Ajin Cherian, Nisha Moond, Kuroda Hayato, Amit Kapila
Discussion: https://postgr.es/m/514f6f2f-6833-4539-39f1-96cd1e011f23@enterprisedb.com
2024-02-22 15:25:15 +05:30
fbc93b8b5f Remove custom Constraint node read/write implementations
This is part of an effort to reduce the number of special cases in the
automatically generated node support functions.

Allegedly, only certain fields of the Constraint node are valid based
on contype.  But this has historically not been kept up to date in the
read/write functions.  The Constraint node is only used for debugging
DDL statements, so there are no strong requirements for its output,
and there is no enforcement for its correctness.  (There was no read
support before a6bc3301925.)  Commits e7a552f303c and abf46ad9c7b are
examples of where omissions were fixed.

This patch just removes the custom read/write implementations for the
Constraint node type.  Now we just output all the fields, which is a
bit more than before, but at least we don't have to maintain these
functions anymore.  Also, we lose the string representation of the
contype field, but for this marginal use case that seems tolerable.
This patch also changes the documentation of the Constraint struct to
put less emphasis on grouping fields by constraint type but rather
document for each field how it's used.

Reviewed-by: Paul Jungwirth <pj@illuminatedcomputing.com>
Discussion: https://www.postgresql.org/message-id/flat/4b27fc50-8cd6-46f5-ab20-88dbaadca645@eisentraut.org
2024-02-22 07:07:12 +01:00