Commit Graph

11755 Commits

Author SHA1 Message Date
d2b4b4c225 Fix C23 compiler warning
The approach of declaring a function pointer with an empty argument
list and hoping that the compiler will not complain about casting it
to another type no longer works with C23, because foo() is now
equivalent to foo(void).

We don't need to do this here.  With a few struct forward declarations
we can supply a correct argument list without having to pull in
another header file.

(This is the only new warning with C23.  Together with the previous
fix a67a49648d9, this makes the whole code compile cleanly under C23.)

Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/95c6a9bf-d306-43d8-b880-664ef08f2944%40eisentraut.org
2024-10-22 08:17:42 +02:00
11c87216d1 SQL/JSON: Fix some oversights in commit b6e1157e7
The decision in b6e1157e7 to ignore raw_expr when evaluating a
JsonValueExpr was incorrect.  While its value is not ultimately
used (since formatted_expr's value is), failing to initialize it
can lead to problems, for instance,  when the expression tree in
raw_expr contains Aggref nodes, which must be initialized to
ensure the parent Agg node works correctly.

Also, optimize eval_const_expressions_mutator()'s handling of
JsonValueExpr a bit.  Currently, when formatted_expr cannot be folded
into a constant, we end up processing it twice -- once directly in
eval_const_expressions_mutator() and again recursively via
ece_generic_processing().  This recursive processing is required to
handle raw_expr. To avoid the redundant processing of formatted_expr,
we now  process raw_expr directly in eval_const_expressions_mutator().

Finally, update the comment of JsonValueExpr to describe the roles of
raw_expr and formatted_expr more clearly.

Bug: #18657
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org
Backpatch-through: 16
2024-10-20 12:20:55 +09:00
52475b4d30 Fix comment about pg_authid.
pg_shadow is not "publicly readable".  (pg_group is, but there seems
no need to make that distinction here.)  Seems to be a thinko dating
clear back to 7762619e9.

Antonin Houska

Discussion: https://postgr.es/m/31926.1729252247@antos
2024-10-19 11:44:14 -04:00
1bd4bc85ca Optimize nbtree backwards scans.
Make nbtree backwards scans optimistically access the next page to be
read to the left by following a prevPage block number that's now stashed
in currPos when the leaf page is first read.  This approach matches the
one taken during forward scans, which follow a symmetric nextPage block
number from currPos.  We stash both a prevPage and a nextPage, since the
scan direction might change (when fetching from a scrollable cursor).

Backwards scans will no longer need to lock the same page twice, except
in rare cases where the scan detects a concurrent page split (or page
deletion).  Testing has shown this optimization to be particularly
effective during parallel index-only backwards scans: ~12% reductions in
query execution time are quite possible.

We're much better off being optimistic; concurrent left sibling page
splits are rare in general.  It's possible that we'll need to lock more
pages than the pessimistic approach would have, but only when there are
_multiple_ concurrent splits of the left sibling page we now start at.
If there's just a single concurrent left sibling page split, the new
approach to scanning backwards will at least break even relative to the
old one (we'll acquire the same number of leaf page locks as before).

The optimization from this commit has long been contemplated by comments
added by commit 2ed5b87f96, which changed the rules for locking/pinning
during nbtree index scans.  The approach that that commit introduced to
leaf level link traversal when scanning forwards is now more or less
applied all the time, regardless of the direction we're scanning in.

Following uniform conventions around sibling link traversal is simpler.
The only real remaining difference between our forward and backwards
handling is that our backwards handling must still detect and recover
from any concurrent left sibling splits (and concurrent page deletions),
as documented in the nbtree README.  That is structured as a single,
isolated extra step that takes place in _bt_readnextpage.

Also use this opportunity to further simplify the functions that deal
with reading pages and traversing sibling links on the leaf level, and
to document their preconditions and postconditions (with respect to
things like buffer locks, buffer pins, and seizing the parallel scan).

This enhancement completely supersedes the one recently added by commit
3f44959f.

Author: Matthias van de Meent <boekewurm+postgres@gmail.com>
Author: Peter Geoghegan <pg@bowt.ie>
Discussion: https://postgr.es/m/CAEze2WgpBGRgTTxTWVPXc9+PB6fc1a7t+VyGXHzfnrFXcQVxnA@mail.gmail.com
Discussion: https://postgr.es/m/CAH2-WzkBTuFv7W2+84jJT8mWZLXVL0GHq2hMUTn6c9Vw=eYrCw@mail.gmail.com
2024-10-18 11:25:32 -04:00
eecd9138a0 Improve ThrowErrorData() comments for use with soft errors.
Reviewed-by: Corey Huinker
Discussion: https://postgr.es/m/901ab7cf01957f92ea8b30b6feeb0eacfb7505fc.camel@j-davis.com
2024-10-17 14:56:44 -07:00
eafda78fc4 Improve node type forward reference
Instead of using Node *, we can use an incomplete struct.  That way,
everything has the correct type and fewer casts are required.  This
technique is already used elsewhere in node type definitions.

Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/637eeea8-5663-460b-a114-39572c0f6c6e%40eisentraut.org
2024-10-17 08:36:48 +02:00
9ca67658d1 Don't store intermediate hash values in ExprState->resvalue
adf97c156 made it so ExprStates could support hashing and changed Hash
Join to use that instead of manually extracting Datums from tuples and
hashing them one column at a time.

When hashing multiple columns or expressions, the code added in that
commit stored the intermediate hash value in the ExprState's resvalue
field.  That was a mistake as steps may be injected into the ExprState
between each hashing step that look at or overwrite the stored
intermediate hash value.  EEOP_PARAM_SET is an example of such a step.

Here we fix this by adding a new dedicated field for storing
intermediate hash values and adjust the code so that all apart from the
final hashing step store their result in the intermediate field.

In passing, rename a variable so that it's more aligned to the
surrounding code and also so a few lines stay within the 80 char margin.

Reported-by: Andres Freund
Reviewed-by: Alena Rybakina <a.rybakina@postgrespro.ru>
Discussion: https://postgr.es/m/CAApHDvqo9eenEFXND5zZ9JxO_k4eTA4jKMGxSyjdTrsmYvnmZw@mail.gmail.com
2024-10-17 14:25:08 +13:00
79fa7b3b1a Normalize nbtree truncated high key array behavior.
Commit 5bf748b8 taught nbtree ScalarArrayOp index scans to decide when
and how to start the next primitive index scan based on physical index
characteristics.  This included rules for deciding whether to start a
new primitive index scan (or whether to move onto the right sibling leaf
page instead) that specifically consider truncated lower-order columns
(-inf columns) from leaf page high keys.

These omitted columns were treated as satisfying the scan's required
scan keys, though only for scan keys marked required in the current scan
direction (forward).  Scan keys that didn't get this behavior (those
marked required in the backwards direction only) usually didn't give the
scan reasonable cause to reposition itself to a later leaf page (via
another descent of the index in _bt_first), but _bt_advance_array_keys
would nevertheless always give up by forcing another call to _bt_first.

_bt_advance_array_keys was unwilling to allow the scan to continue onto
the next leaf page, to reconsider whether we really should start another
primitive scan based on the details of the sibling page's tuples.  This
didn't match its behavior with similar cases involving keys required in
the current scan direction (forward), which seems unprincipled.  It led
to an excessive number of primitive scans/index descents for queries
with a higher-order = array scan key (with dense, contiguous values)
mixed with a lower-order required > or >= scan key.

Bring > and >= strategy scan keys in line with other required scan key
types: treat truncated -inf scan keys as having satisfied scan keys
required in either scan direction (forwards and backwards alike) during
array advancement.  That way affected scans can continue to the right
sibling leaf page.  Advancement must now schedule an explicit recheck of
the right sibling page's high key in cases involving > or >= scan keys.
The recheck gives the scan a way to back out and start another primitive
index scan (we can't just rely on _bt_checkkeys with > or >= scan keys).

This work can be considered a stand alone optimization on top of the
work from commit 5bf748b8.  But it was written in preparation for an
upcoming patch that will add skip scan to nbtree.  In practice scans
that use "skip arrays" will tend to be much more sensitive to any
implementation deficiencies in this area.

Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Tomas Vondra <tomas@vondra.me>
Discussion: https://postgr.es/m/CAH2-Wz=9A_UtM7HzUThSkQ+BcrQsQZuNhWOvQWK06PRkEp=SKQ@mail.gmail.com
2024-10-16 12:17:49 -04:00
d5ca15ee54 Add type cast to foreach_internal's loop variable.
C++ requires explicitly casting void pointers to the appropriate
pointer type, which means the foreach_ptr macro cannot be used in
C++ code without this change.

Author: Jelte Fennema-Nio
Reviewed-by: Bruce Momjian
Discussion: https://postgr.es/m/CAGECzQSYG3QfHrc-rOk2KbnB9iJOd7Qu-Xii1s-GTA%3D3JFt49Q%40mail.gmail.com
Backpatch-through: 17
2024-10-15 16:20:49 -05:00
2453196107 Move clause_sides_match_join() into restrictinfo.h
Two near-identical copies of clause_sides_match_join() existed in
joinpath.c and analyzejoins.c.  Deduplicate this by moving the function
into restrictinfo.h.

It isn't quite clear that keeping the inline property of this function
is worthwhile, but this commit is just an exercise in code
deduplication.  More effort would be required to determine if the inline
property is worth keeping.

Author: James Hunter <james.hunter.pg@gmail.com>
Discussion: https://postgr.es/m/CAJVSvF7Nm_9kgMLOch4c-5fbh3MYg%3D9BdnDx3Dv7Fcb64zr64Q%40mail.gmail.com
2024-10-15 21:14:21 +13:00
7cdfeee320 Add contrib/pg_logicalinspect.
This module provides SQL functions that allow to inspect logical
decoding components.

It currently allows to inspect the contents of serialized logical
snapshots of a running database cluster, which is useful for debugging
or educational purposes.

Author: Bertrand Drouvot
Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith, Peter Eisentraut
Reviewed-by: David G. Johnston
Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal
2024-10-14 17:22:02 -07:00
e2fd615ecc Move SnapBuild and SnapBuildOnDisk structs to snapshot_internal.h.
This commit moves the definitions of the SnapBuild and SnapBuildOnDisk
structs, related to logical snapshots, to the snapshot_internal.h
file. This change allows external tools, such as
pg_logicalinspect (with an upcoming patch), to access and utilize the
contents of logical snapshots.

Author: Bertrand Drouvot
Reviewed-by: Amit Kapila, Shveta Malik, Peter Smith
Discussion: https://postgr.es/m/ZscuZ92uGh3wm4tW%40ip-10-97-1-34.eu-west-3.compute.internal
2024-10-14 17:19:33 -07:00
7be4ba4a9d Remove obsolete comment in reorderbuffer.h.
Commit 9fab40ad32e changed ReorderBuffer to use Slab Context for
allocating ReorderBufferTXN entries instead of using a caching
mechanism. The txn->node is no longer used as an element of the list
of preallocated ReorderBufferTXNs.

Reviewed-by: Amit Kapila
Discussion: https://postgr.es/m/CAD21AoB1CTnX66Ji3zTCnjoPVC9OzYe0B6LygUHcxEB2RV-hFw%40mail.gmail.com
2024-10-14 09:53:05 -07:00
c594f1ad2b Track scan reversals in MergeJoin
The MergeJoin struct was tracking "mergeStrategies", which were an
array of btree strategy numbers, purely for the purpose of comparing
it later against btree strategies to determine if the scan direction
was forward or reverse.  Change that.  Instead, track
"mergeReversals", an array of bool, to indicate the same without an
unfortunate assumption that a strategy number refers specifically to a
btree strategy.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2024-10-14 15:36:18 +02:00
0d2aa4d493 Track sort direction in SortGroupClause
Functions make_pathkey_from_sortop() and transformWindowDefinitions(),
which receive a SortGroupClause, were determining the sort order
(ascending vs. descending) by comparing that structure's operator
strategy to BTLessStrategyNumber, but could just as easily have gotten
it from the SortGroupClause object, if it had such a field, so add
one.  This reduces the number of places that hardcode the assumption
that the strategy refers specifically to a btree strategy, rather than
some other index AM's operators.

Author: Mark Dilger <mark.dilger@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com
2024-10-14 15:36:02 +02:00
a2d9a9b95a Remove traces of BeOS.
Commit 15abc7788e6 tolerated namespace pollution from BeOS system
headers.  Commit 44f902122 de-supported BeOS.  Since that stuff didn't
make it into the Meson build system, synchronize by removing from
configure.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Japin Li <japinli@hotmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (the idea, not the patch)
Discussion: https://postgr.es/m/ME3P282MB3166F9D1F71F787929C0C7E7B6312%40ME3P282MB3166.AUSP282.PROD.OUTLOOK.COM
2024-10-14 08:33:36 +02:00
e839c8ecc9 Create functions pg_set_relation_stats, pg_clear_relation_stats.
These functions are used to tweak statistics on any relation, provided
that the user has MAINTAIN privilege on the relation, or is the database
owner.

Bump catalog version.

Author: Corey Huinker
Discussion: https://postgr.es/m/CADkLM=eErgzn7ECDpwFcptJKOk9SxZEk5Pot4d94eVTZsvj3gw@mail.gmail.com
2024-10-11 16:55:11 -07:00
6f782a2a17 Avoid mixing custom and OpenSSL BIO functions
PostgreSQL has for a long time mixed two BIO implementations, which can
lead to subtle bugs and inconsistencies. This cleans up our BIO by just
just setting up the methods we need. This patch does not introduce any
functionality changes.

The following methods are no longer defined due to not being needed:

  - gets: Not used by libssl
  - puts: Not used by libssl
  - create: Sets up state not used by libpq
  - destroy: Not used since libpq use BIO_NOCLOSE, if it was used it close
             the socket from underneath libpq
  - callback_ctrl: Not implemented by sockets

The following methods are defined for our BIO:

  - read: Used for reading arbitrary length data from the BIO. No change
          in functionality from the previous implementation.
  - write: Used for writing arbitrary length data to the BIO. No change
           in functionality from the previous implementation.
  - ctrl: Used for processing ctrl messages in the BIO (similar to ioctl).
          The only ctrl message which matters is BIO_CTRL_FLUSH used for
          writing out buffered data (or signal EOF and that no more data
          will be written). BIO_CTRL_FLUSH is mandatory to implement and
          is implemented as a no-op since there is no intermediate buffer
          to flush.
          BIO_CTRL_EOF is the out-of-band method for signalling EOF to
          read_ex based BIO's. Our BIO is not read_ex based but someone
          could accidentally call BIO_CTRL_EOF on us so implement mainly
          for completeness sake.

As the implementation is no longer related to BIO_s_socket or calling
SSL_set_fd, methods have been renamed to reference the PGconn and Port
types instead.

This also reverts back to using BIO_set_data, with our fallback, as a small
optimization as BIO_set_app_data require the ex_data mechanism in OpenSSL.

Author: David Benjamin <davidben@google.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAF8qwaCZ97AZWXtg_y359SpOHe+HdJ+p0poLCpJYSUxL-8Eo8A@mail.gmail.com
2024-10-11 21:58:58 +02:00
4e1fad3787 Add pg_ls_summariesdir().
This function returns the name, size, and last modification time of
each regular file in pg_wal/summaries.  This allows administrators
to grant privileges to view the contents of this directory without
granting privileges on pg_ls_dir(), which allows listing the
contents of many other directories.  This commit also gives the
pg_monitor predefined role EXECUTE privileges on the new
pg_ls_summariesdir() function.

Bumps catversion.

Author: Yushi Ogiwara
Reviewed-by: Michael Paquier, Fujii Masao
Discussion: https://postgr.es/m/a0a3af15a9b9daa107739eb45aa9a9bc%40oss.nttdata.com
2024-10-11 11:02:09 -05:00
fd64ed60b6 Unbreak overflow test for attinhcount/coninhcount
Commit 90189eefc1e1 narrowed pg_attribute.attinhcount and
pg_constraint.coninhcount from 32 to 16 bits, but kept other related
structs with 32-bit wide fields: ColumnDef and CookedConstraint contain
an int 'inhcount' field which is itself checked for overflow on
increments, but there's no check that the values aren't above INT16_MAX
before assigning to the catalog columns.  This means that a creative
user can get a inconsistent table definition and override some
protections.

Fix it by changing those other structs to also use int16.

Also, modernize style by using pg_add_s16_overflow for overflow testing
instead of checking for negative values.

We also have Constraint.inhcount, which is here removed completely.
This was added by commit b0e96f311985 and not removed by its revert at
6f8bb7c1e961.  It is not needed by the upcoming not-null constraints
patch.

This is mostly academic, so we agreed not to backpatch to avoid ABI
problems.

Bump catversion because of the changes to parse nodes.

Co-authored-by: Álvaro Herrera <alvherre@alvh.no-ip.org>
Co-authored-by: 何建 (jian he) <jian.universality@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/202410081611.up4iyofb5ie7@alvherre.pgsql
2024-10-10 17:41:01 +02:00
1909835c28 Improve descriptions of some pg_stat_checkpoints functions in pg_proc.dat.
Previously, the descriptions of pg_stat_get_checkpointer_num_requested(),
pg_stat_get_checkpointer_restartpoints_requested(),
and pg_stat_get_checkpointer_restartpoints_performed() in pg_proc.dat
referred to "backend". This was misleading because these functions report
the number of checkpoints or restartpoints requested or performed
by other than backends as well.

This commit removes "backend" from these descriptions to avoid confusion.

Bump catalog version.

Idea from Anton A. Melnikov
Author: Fujii Masao
Reviewed-by: Anton A. Melnikov
Discussion: https://postgr.es/m/8e5f353f-8b31-4a8e-9cfa-c037f22b4aee@postgrespro.ru
2024-10-11 00:12:29 +09:00
de3a2ea3b2 Introduce two fields in EState to track parallel worker activity
These fields can be set by executor nodes to record how many parallel
workers were planned to be launched and how many of them have been
actually launched within the number initially planned.  This data is
able to give an approximation of the parallel worker draught a system
is facing, making easier the tuning of related configuration parameters.

These fields will be used by some follow-up patches to populate other
parts of the system with their data.

Author: Guillaume Lelarge, Benoit Lobréau
Discussion: https://postgr.es/m/783bc7f7-659a-42fa-99dd-ee0565644e25@dalibo.com
Discussion: https://postgr.es/m/CAECtzeWtTGOK0UgKXdDGpfTVSa5bd_VbUt6K6xn8P7X+_dZqKw@mail.gmail.com
2024-10-09 08:07:48 +09:00
2d24fd942c Add min and max aggregates for bytea type.
Similar to a0f1fce80, although we chose to duplicate logic
rather than invoke byteacmp, primarily to avoid repeat detoasting.

Marat Buharov, Aleksander Alekseev

Discussion: https://postgr.es/m/CAPCEVGXiASjodos4P8pgyV7ixfVn-ZgG9YyiRZRbVqbGmfuDyg@mail.gmail.com
2024-10-08 13:52:14 -04:00
57f3702471 Use aux process resource owner in walsender
AIO will need a resource owner to do IO. Right now we create a resowner
on-demand during basebackup, and we could do the same for AIO. But it seems
easier to just always create an aux process resowner.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
2024-10-08 11:37:45 -04:00
755a4c10d1 bufmgr/smgr: Don't cross segment boundaries in StartReadBuffers()
With real AIO it doesn't make sense to cross segment boundaries with one
IO. Add smgrmaxcombine() to allow upper layers to query which buffers can be
merged.

We could continue to cross segment boundaries when not using AIO, but it
doesn't really make sense, because md.c will never be able to perform the read
across the segment boundary in one system call. Which means we'll mark more
buffers as undergoing IO than really makes sense - if another backend desires
to read the same blocks, it'll be blocked longer than necessary. So it seems
better to just never cross the boundary.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi
2024-10-08 11:37:45 -04:00
2bbc261ddb Use an shmem_exit callback to remove backend from PMChildFlags on exit
This seems nicer than having to duplicate the logic between
InitProcess() and ProcKill() for which child processes have a
PMChildFlags slot.

Move the MarkPostmasterChildActive() call earlier in InitProcess(),
out of the section protected by the spinlock.

Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi
2024-10-08 15:06:34 +03:00
4ac2a9bece Add REJECT_LIMIT option to the COPY command.
Previously, when ON_ERROR was set to 'ignore', the COPY command
would skip all rows with data type conversion errors, with no way to
limit the number of skipped rows before failing.

This commit introduces the REJECT_LIMIT option, allowing users to
specify the maximum number of erroneous rows that can be skipped.
If more rows encounter data type conversion errors than allowed by
REJECT_LIMIT, the COPY command will fail with an error, even when
ON_ERROR = 'ignore'.

Author: Atsushi Torikoshi
Reviewed-by: Junwang Zhao, Kirill Reshke, jian he, Fujii Masao
Discussion: https://postgr.es/m/63f99327aa6b404cc951217fa3e61fe4@oss.nttdata.com
2024-10-08 18:19:58 +09:00
8275325a06 Restrict password hash length.
Commit 6aa44060a3 removed pg_authid's TOAST table because the only
varlena column is rolpassword, which cannot be de-TOASTed during
authentication because we haven't selected a database yet and
cannot read pg_class.  Since that change, attempts to set password
hashes that require out-of-line storage will fail with a "row is
too big" error.  This error message might be confusing to users.

This commit places a limit on the length of password hashes so that
attempts to set long password hashes will fail with a more
user-friendly error.  The chosen limit of 512 bytes should be
sufficient to avoid "row is too big" errors independent of BLCKSZ,
but it should also be lenient enough for all reasonable use-cases
(or at least all the use-cases we could imagine).

Reviewed-by: Tom Lane, Jonathan Katz, Michael Paquier, Jacob Champion
Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com
2024-10-07 10:56:16 -05:00
094ae07160 Remove unneeded #include
Unneeded since commit d72731a704.

Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi
2024-10-05 15:09:32 +03:00
6c0c49f7d3 Remove unused latch
It was left unused by commit bc971f4025, which replaced the latch
usage with a condition variable

Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi
2024-10-05 15:09:27 +03:00
e7834a1a25 Add log_verbosity = 'silent' support to COPY command.
Previously, when the on_error option was set to ignore, the COPY command
would always log NOTICE messages for input rows discarded due to
data type incompatibility. Users had no way to suppress these messages.

This commit introduces a new log_verbosity setting, 'silent',
which prevents the COPY command from emitting NOTICE messages
when on_error = 'ignore' is used, even if rows are discarded.
This feature is particularly useful when processing malformed files
frequently, where a flood of NOTICE messages can be undesirable.

For example, when frequently loading malformed files via the COPY command
or querying foreign tables using file_fdw (with an upcoming patch to
add on_error support for file_fdw), users may prefer to suppress
these messages to reduce log noise and improve clarity.

Author: Atsushi Torikoshi
Reviewed-by: Masahiko Sawada, Fujii Masao
Discussion: https://postgr.es/m/ab59dad10490ea3734cf022b16c24cfd@oss.nttdata.com
2024-10-03 15:55:37 +09:00
d94cf5ca7f File size in a backup manifest should use uint64, not size_t.
size_t is the size of an object in memory, not the size of a file on disk.

Thanks to Tom Lane for noting the error.

Discussion: http://postgr.es/m/1865585.1727803933@sss.pgh.pa.us
2024-10-02 09:59:04 -04:00
17cc5f666f Fix inconsistent reporting of checkpointer stats.
Previously, the pg_stat_checkpointer view and the checkpoint completion
log message could show different numbers for buffers written
during checkpoints. The view only counted shared buffers,
while the log message included both shared and SLRU buffers,
causing inconsistencies.

This commit resolves the issue by updating both the view and the log message
to separately report shared and SLRU buffers written during checkpoints.
A new slru_written column is added to the pg_stat_checkpointer view
to track SLRU buffers, while the existing buffers_written column now
tracks only shared buffers. This change would help users distinguish
between the two types of buffers, in the pg_stat_checkpointer view and
the checkpoint complete log message, respectively.

Bump catalog version.

Author: Nitin Jadhav
Reviewed-by: Bharath Rupireddy, Michael Paquier, Kyotaro Horiguchi, Robert Haas
Reviewed-by: Andres Freund, vignesh C, Fujii Masao
Discussion: https://postgr.es/m/CAMm1aWb18EpT0whJrjG+-nyhNouXET6ZUw0pNYYAe+NezpvsAA@mail.gmail.com
2024-10-02 11:17:47 +09:00
10b721821d Use macro to define the number of enum values
Refactoring in the interest of code consistency, a follow-up to 2e068db56e31.

The argument against inserting a special enum value at the end of the enum
definition is that a switch statement might generate a compiler warning unless
it has a default clause.

Aleksander Alekseev, reviewed by Michael Paquier, Dean Rasheed, Peter Eisentraut

Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com
2024-10-01 09:30:24 -04:00
9c2a6c5a5f Simplify checking for xlocale.h
Instead of XXX_IN_XLOCALE_H for several features XXX, let's just
include <xlocale.h> if HAVE_XLOCALE_H.  The reason for the extra
complication was apparently that some old glibc systems also had an
<xlocale.h>, and you weren't supposed to include it directly, but it's
gone now (as far as I can tell it was harmless to do so anyway).

Author: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CWZBBRR6YA8D.8EHMDRGLCKCD%40neon.tech
2024-10-01 07:23:45 -04:00
ee4859123e jit: Use opaque pointers in all supported LLVM versions.
LLVM's opaque pointer change began in LLVM 14, but remained optional
until LLVM 16.  When commit 37d5babb added opaque pointer support, we
didn't turn it on for LLVM 14 and 15 yet because we didn't want to risk
weird bitcode incompatibility problems in released branches of
PostgreSQL.  (That might have been overly cautious, I don't know.)

Now that PostgreSQL 18 has dropped support for LLVM versions < 14, and
since it hasn't been released yet and no extensions or bitcode have been
built against it in the wild yet, we can be more aggressive.  We can rip
out the support code and build system clutter that made opaque pointer
use optional.

Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussions: https://postgr.es/m/CA%2BhUKGLhNs5geZaVNj2EJ79Dx9W8fyWUU3HxcpZy55sMGcY%3DiA%40mail.gmail.com
2024-10-01 06:10:15 -04:00
4c7cd07aa6 Bump catalog version for change in VariableSetStmt
Oversight in dc68515968e8, as this breaks SQL functions with a SET
command.

Reported-by: Tom Lane
Discussion: https://postgr.es/m/1364409.1727673407@sss.pgh.pa.us
2024-09-30 14:52:03 +09:00
dc68515968 Show values of SET statements as constants in pg_stat_statements
This is a continuation of work like 11c34b342bd7, done to reduce the
bloat of pg_stat_statements by applying more normalization to query
entries.  This commit is able to detect and normalize values in
VariableSetStmt, resulting in:
SET conf_param = $1

Compared to other parse nodes, VariableSetStmt is embedded in much more
places in the parser, impacting many query patterns in
pg_stat_statements.  A custom jumble function is used, with an extra
field in the node to decide if arguments should be included in the
jumbling or not, a location field being not enough for this purpose.
This approach allows for a finer tuning.

Clauses relying on one or more keywords are not normalized, for example:
* DEFAULT
* FROM CURRENT
* List of keywords.  SET SESSION CHARACTERISTICS AS TRANSACTION,
where it is critical to differentiate different sets of options, is a
good example of why normalization should not happen.

Some queries use VariableSetStmt for some subclauses with SET, that also
have their values normalized:
- ALTER DATABASE
- ALTER ROLE
- ALTER SYSTEM
- CREATE/ALTER FUNCTION

ba90eac7a995 has added test coverage for most of the existing SET
patterns.  The expected output of these tests shows the difference this
commit creates.  Normalization could be perhaps applied to more portions
of the grammar but what is done here is conservative, and good enough as
a starting point.

Author: Greg Sabino Mullane, Michael Paquier
Discussion: https://postgr.es/m/36e5bffe-e989-194f-85c8-06e7bc88e6f7@amazon.com
Discussion: https://postgr.es/m/B44FA29D-EBD0-4DD9-ABC2-16F1CB087074@amazon.com
Discussion: https://postgr.es/m/CAKAnmmJtJY2jzQN91=2QAD2eAJAA-Per61eyO48-TyxEg-q0Rg@mail.gmail.com
2024-09-30 14:02:00 +09:00
559efce1d6 Add num_done counter to the pg_stat_checkpointer view.
Checkpoints can be skipped when the server is idle. The existing num_timed and
num_requested counters in pg_stat_checkpointer track both completed and
skipped checkpoints, but there was no way to count only the completed ones.

This commit introduces the num_done counter, which tracks only completed
checkpoints, making it easier to see how many were actually performed.

Bump catalog version.

Author: Anton A. Melnikov
Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/9ea77f40-818d-4841-9dee-158ac8f6e690@oss.nttdata.com
2024-09-30 11:56:05 +09:00
a3179ab692 Recalculate where-needed data accurately after a join removal.
Up to now, remove_rel_from_query() has done a pretty shoddy job
of updating our where-needed bitmaps (per-Var attr_needed and
per-PlaceHolderVar ph_needed relid sets).  It removed direct mentions
of the to-be-removed baserel and outer join, which is the minimum
amount of effort needed to keep the data structures self-consistent.
But it didn't account for the fact that the removed join ON clause
probably mentioned Vars of other relations, and those Vars might now
not be needed as high up in the join tree as before.  It's easy to
show cases where this results in failing to remove a lower outer join
that could also have been removed.

To fix, recalculate the where-needed bitmaps from scratch after
each successful join removal.  This sounds expensive, but it seems
to add only negligible planner runtime.  (We cheat a little bit
by preserving "relation 0" entries in the bitmaps, allowing us to
skip re-scanning the targetlist and HAVING qual.)

The submitted test case drew attention because we had successfully
optimized away the lower join prior to v16.  I suspect that that's
somewhat accidental and there are related cases that were never
optimized before and now can be.  I've not tried to come up with
one, though.

Perhaps we should back-patch this into v16 and v17 to repair the
performance regression.  However, since it took a year for anyone
to notice the problem, it can't be affecting too many people.  Let's
let the patch bake awhile in HEAD, and see if we get more complaints.

Per bug #18627 from Mikaël Gourlaouen.  No back-patch for now.

Discussion: https://postgr.es/m/18627-44f950eb6a8416c2@postgresql.org
2024-09-27 16:04:04 -04:00
8dfd312902 pg_verifybackup: Verify tar-format backups.
This also works for compressed tar-format backups. However, -n must be
used, because we use pg_waldump to verify WAL, and it doesn't yet know
how to verify WAL that is stored inside of a tarfile.

Amul Sul, reviewed by Sravan Kumar and by me, and revised by me.
2024-09-27 08:40:24 -04:00
f762d99c87 Fix catalog data of new LO privilege functions
This commit improves the catalog data in pg_proc for the three functions
for has_largeobject_privilege(), introduced in 4eada203a5a8:
- Fix their descriptions (typos and consistency).
- Reallocate OIDs to be within the 8000-9999 range as required by
a6417078c414.

Bump catalog version.

Reviewed-by: Fujii Masao
Discussion: https://postgr.es/m/ZvUYR0V0dzWaLnsV@paquier.xyz
2024-09-27 07:26:29 +09:00
e658038772 Update oid for pg_wal_replay_wait() procedure
Use an oid from 8000-9999 range, as required by 98eab30b93d5.

Reported-by: Michael Paquier
Discussion: https://postgr.es/m/ZvUY6bfTwB0GsyzP%40paquier.xyz
2024-09-26 11:49:41 +03:00
aac2c9b4fd For inplace update durability, make heap_update() callers wait.
The previous commit fixed some ways of losing an inplace update.  It
remained possible to lose one when a backend working toward a
heap_update() copied a tuple into memory just before inplace update of
that tuple.  In catalogs eligible for inplace update, use LOCKTAG_TUPLE
to govern admission to the steps of copying an old tuple, modifying it,
and issuing heap_update().  This includes MERGE commands.  To avoid
changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when
holding a relation lock sufficient to exclude inplace updaters.
Back-patch to v12 (all supported versions).  In v13 and v12, "UPDATE
pg_class" or "UPDATE pg_database" can still lose an inplace update.  The
v14+ UPDATE fix needs commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35,
and it wasn't worth reimplementing that fix without such infrastructure.

Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas.

Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com
2024-09-24 15:25:18 -07:00
a07e03fd8f Fix data loss at inplace update after heap_update().
As previously-added tests demonstrated, heap_inplace_update() could
instead update an unrelated tuple of the same catalog.  It could lose
the update.  Losing relhasindex=t was a source of index corruption.
Inplace-updating commands like VACUUM will now wait for heap_update()
commands like GRANT TABLE and GRANT DATABASE.  That isn't ideal, but a
long-running GRANT already hurts VACUUM progress more just by keeping an
XID running.  The VACUUM will behave like a DELETE or UPDATE waiting for
the uncommitted change.

For implementation details, start at the systable_inplace_update_begin()
header comment and README.tuplock.  Back-patch to v12 (all supported
versions).  In back branches, retain a deprecated heap_inplace_update(),
for extensions.

Reported by Smolkin Grigory.  Reviewed by Nitin Motiani, (in earlier
versions) Heikki Linnakangas, and (in earlier versions) Alexander
Lakhin.

Discussion: https://postgr.es/m/CAMp+ueZQz3yDk7qg42hk6-9gxniYbp-=bG2mgqecErqR5gGGOA@mail.gmail.com
2024-09-24 15:25:18 -07:00
ac30021356 Allow length=-1 for NUL-terminated input to pg_strncoll(), etc.
Like ICU, allow a length of -1 to be specified for NUL-terminated
arguments to pg_strncoll(), pg_strnxfrm(), and pg_strnxfrm_prefix().

Simplifies the code and comments.

Discussion: https://postgr.es/m/2d758e07dff26bcc7cbe2aec57431329bfe3679a.camel@j-davis.com
2024-09-24 15:15:18 -07:00
ceeaaed87a Tighten up make_libc_collator() and make_icu_collator().
Ensure that error paths within these functions do not leak a collator,
and return the result rather than using an out parameter. (Error paths
in the caller may still result in a leaked collator, which will be
addressed separately.)

In make_libc_collator(), if the first newlocale() succeeds and the
second one fails, close the first locale_t object.

The function make_icu_collator() doesn't have any external callers, so
change it to be static.

Discussion: https://postgr.es/m/54d20e812bd6c3e44c10eddcd757ec494ebf1803.camel@j-davis.com
2024-09-24 12:01:45 -07:00
6aa44060a3 Remove pg_authid's TOAST table.
pg_authid's only varlena column is rolpassword, which unfortunately
cannot be de-TOASTed during authentication because we haven't
selected a database yet and cannot read pg_class.  By removing
pg_authid's TOAST table, attempts to set password hashes that
require out-of-line storage will fail with a "row is too big"
error instead.  We may want to provide a more user-friendly error
in the future, but for now let's just remove the useless TOAST
table.

Bumps catversion.

Reported-by: Alexander Lakhin
Reviewed-by: Tom Lane, Michael Paquier
Discussion: https://postgr.es/m/89e8649c-eb74-db25-7945-6d6b23992394%40gmail.com
2024-09-21 15:17:46 -05:00
c4d5cb71d2 Increase the number of fast-path lock slots
Replace the fixed-size array of fast-path locks with arrays, sized on
startup based on max_locks_per_transaction. This allows using fast-path
locking for workloads that need more locks.

The fast-path locking introduced in 9.2 allowed each backend to acquire
a small number (16) of weak relation locks cheaply. If a backend needs
to hold more locks, it has to insert them into the shared lock table.
This is considerably more expensive, and may be subject to contention
(especially on many-core systems).

The limit of 16 fast-path locks was always rather low, because we have
to lock all relations - not just tables, but also indexes, views, etc.
For planning we need to lock all relations that might be used in the
plan, not just those that actually get used in the final plan. So even
with rather simple queries and schemas, we often need significantly more
than 16 locks.

As partitioning gets used more widely, and the number of partitions
increases, this limit is trivial to hit. Complex queries may easily use
hundreds or even thousands of locks. For workloads doing a lot of I/O
this is not noticeable, but for workloads accessing only data in RAM,
the access to the shared lock table may be a serious issue.

This commit removes the hard-coded limit of the number of fast-path
locks. Instead, the size of the fast-path arrays is calculated at
startup, and can be set much higher than the original 16-lock limit.
The overall fast-path locking protocol remains unchanged.

The variable-sized fast-path arrays can no longer be part of PGPROC, but
are allocated as a separate chunk of shared memory and then references
from the PGPROC entries.

The fast-path slots are organized as a 16-way set associative cache. You
can imagine it as a hash table of 16-slot "groups". Each relation is
mapped to exactly one group using hash(relid), and the group is then
processed using linear search, just like the original fast-path cache.
With only 16 entries this is cheap, with good locality.

Treating this as a simple hash table with open addressing would not be
efficient, especially once the hash table gets almost full. The usual
remedy is to grow the table, but we can't do that here easily. The
access would also be more random, with worse locality.

The fast-path arrays are sized using the max_locks_per_transaction GUC.
We try to have enough capacity for the number of locks specified in the
GUC, using the traditional 2^n formula, with an upper limit of 1024 lock
groups (i.e. 16k locks). The default value of max_locks_per_transaction
is 64, which means those instances will have 64 fast-path slots.

The main purpose of the max_locks_per_transaction GUC is to size the
shared lock table. It is often set to the "average" number of locks
needed by backends, with some backends using significantly more locks.
This should not be a major issue, however. Some backens may have to
insert locks into the shared lock table, but there can't be too many of
them, limiting the contention.

The only solution is to increase the GUC, even if the shared lock table
already has sufficient capacity. That is not free, especially in terms
of memory usage (the shared lock table entries are fairly large). It
should only happen on machines with plenty of memory, though.

In the future we may consider a separate GUC for the number of fast-path
slots, but let's try without one first.

Reviewed-by: Robert Haas, Jakub Wartak
Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com
2024-09-21 20:09:35 +02:00
54562c9cfa Improve Asserts checking relation matching in parallel scans.
table_beginscan_parallel and index_beginscan_parallel contain
Asserts checking that the relation a worker will use in
a parallel scan is the same one the leader intended.  However,
they were checking for relation OID match, which was not strong
enough to detect the mismatch problem fixed in 126ec0bc7.
What would be strong enough is to compare relfilenodes instead.
Arguably, that's a saner definition anyway, since a scan surely
operates on a physical relation not a logical one.  Hence,
store and compare RelFileLocators not relation OIDs.  Also
ensure that index_beginscan_parallel checks the index identity
not just the table identity.

Discussion: https://postgr.es/m/2127254.1726789524@sss.pgh.pa.us
2024-09-20 16:37:55 -04:00