SLRU bank locks are referred as "bank locks" or "SLRU bank locks" in the
code comments. The comments updated in this commit use the latter term.
Oversight in 53c2a97a9266, that has replaced the single ControlLock by
the bank control locks.
Author: Julien Rouhaud <julien.rouhaud@free.fr>
Discussion: https://postgr.es/m/aLUT2UO8RjJOzZNq@jrouhaud
Backpatch-through: 17
Note that this doesn't require bumping SLOT_VERSION because we
require sizeof(bool) == 1, thanks to commit 97525bc5c8.
Overight in commit ddd5f4f54a.
Discussion: Ranier Vilela <ranier.vf@gmail.com>
This change replaces seven functions definitions by macros, reducing a
bit some repetitive patterns in the code. An interesting side effect is
that this removes an inconsistency in the naming of SLRU increment
functions with the field names.
This change is similar to 850f4b4c8cab, 8018ffbf5895 or 83a1a1b56645.
Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aLHA//gr4dTpDHHC@ip-10-97-1-34.eu-west-3.compute.internal
This commit introduces a new subscription parameter,
max_retention_duration, aimed at mitigating excessive accumulation of dead
tuples when retain_dead_tuples is enabled and the apply worker lags behind
the publisher.
When the time spent advancing a non-removable transaction ID exceeds the
max_retention_duration threshold, the apply worker will stop retaining
conflict detection information. In such cases, the conflict slot's xmin
will be set to InvalidTransactionId, provided that all apply workers
associated with the subscription (with retain_dead_tuples enabled) confirm
the retention duration has been exceeded.
To ensure retention status persists across server restarts, a new column
subretentionactive has been added to the pg_subscription catalog. This
prevents unnecessary reactivation of retention logic after a restart.
The conflict detection slot will not be automatically re-initialized
unless a new subscription is created with retain_dead_tuples = true, or
the user manually re-enables retain_dead_tuples.
A future patch will introduce support for automatic slot re-initialization
once at least one apply worker confirms that the retention duration is
within the configured max_retention_duration.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
Using the LWLockCounter requires first calculating its address in
shared memory like this:
LWLockCounter = (int *) ((char *) MainLWLockArray - sizeof(int));
Commit 82e861fbe1 started this trend in order to fix EXEC_BACKEND
builds, but it could also be fixed by adding it to the
BackendParameters struct. The current approach is somewhat
difficult to follow, so this commit switches to the latter. While
at it, swap around the code in LWLockShmemSize() to match the order
of assignments in CreateLWLocks() for added readability.
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/aLDLnan9gNCS9fHx%40nathan
This has been done historically because of get_database_name (which
since commit cb98e6fb8fd4 belongs in lsyscache.c/h, so let's move it
there) and get_database_oid (which is in the right place, but whose
declaration should appear in pg_database.h rather than dbcommands.h).
Clean this up.
Also, xlogreader.h and stringinfo.h are no longer needed by dbcommands.h
since commit f1fd515b393a, so remove them.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/202508191031.5ipojyuaswzt@alvherre.pgsql
During an investigation into rather odd aio related errors on macos, observed
by Alexander and Konstantin, we started to wonder if bitfield access is
related to the error. At the moment it looks like it is related, we cannot
reproduce the failures when replacing the bitfields. In addition, the problem
can only be reproduced with some compiler [versions] and not everyone has been
able to reproduce the issue.
The observed problem is that, very rarely, PgAioHandle->{state,target} are in
an inconsistent state, after having been checked to be in a valid state not
long before, triggering an assertion failure. Unfortunately, this could be
caused by wrong compiler code generation or somehow of missing memory barriers
- we don't really know. In theory there should not be any concurrent write
access to the handle in the state the bug is triggered, as the handle was idle
and is just being initialized.
Separately from the bug, we observed that at least gcc and clang generate
rather terrible code for the bitfield access. Even if it's not clear if the
observed assertion failure is actually caused by the bitfield somehow, the bad
code generation alone is sufficient reason to stop using bitfields.
Therefore, replace the enum bitfields with uint8s and instead cast in each
switch statement.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Reported-by: Konstantin Knizhnik <knizhnik@garret.ru>
Discussion: https://postgr.es/m/1500090.1745443021@sss.pgh.pa.us
Backpatch-through: 18
Commit 1585ff7387 changed GetTransactionSnapshot() to throw an error
if it's called during logical decoding, instead of returning the
historic snapshot. I made that change for extra protection, because a
historic snapshot can only be used to access catalog tables while
GetTransactionSnapshot() is usually called when you're executing
arbitrary queries. You might get very subtle visibility problems if
you tried to use the historic snapshot for arbitrary queries.
There's no built-in code in PostgreSQL that calls
GetTransactionSnapshot() during logical decoding, but it turns out
that the pglogical extension does just that, to evaluate row filter
expressions. You would get weird results if the row filter runs
arbitrary queries, but it is sane as long as you don't access any
non-catalog tables. Even though there are no checks to enforce that in
pglogical, a typical row filter expression does not access any tables
and works fine. Accessing tables marked with the user_catalog_table =
true option is also OK.
To fix pglogical with row filters, and any other extensions that might
do similar things, revert GetTransactionSnapshot() to return a
historic snapshot during logical decoding.
To try to still catch the unsafe usage of historic snapshots, add
checks in heap_beginscan() and index_beginscan() to complain if you
try to use a historic snapshot to scan a non-catalog table. We're very
close to the version 18 release however, so add those new checks only
in master.
Backpatch-through: 18
Reported-by: Noah Misch <noah@leadboat.com>
Reviewed-by: Noah Misch <noah@leadboat.com>
Discussion: https://www.postgresql.org/message-id/20250809222338.cc.nmisch@google.com
This code was relying on "long", which is signed 8 bytes everywhere
except on Windows where it is 4 bytes, that could potentially expose it
to overflows, even if the current uses in the code are fine as far as I
know. This code is now able to rely on the same sizeof() variable
everywhere, with int64. long was used for sizes, partition counts and
entry counts.
Some callers of the dynahash.c routines used long declarations, that can
be cleaned up to use int64 instead. There was one shortcut based on
SIZEOF_LONG, that can be removed. long is entirely removed from
dynahash.c and hsearch.h.
Similar work was done in b1e5c9fa9ac4.
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/aKQYp-bKTRtRauZ6@paquier.xyz
This just includes a link to the bki documentation, to help people get
started.
Before commit 372728b0d49, there was a README at
src/backend/catalog/README, but then this was moved to the SGML
documentation. So this effectively puts back a link to what was
moved. But src/include/catalog/ is probably a better location,
because that's where all the interesting files are.
Co-authored-by: Florents Tselai <florents.tselai@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/CA+v5N400GJFJ9RyXAX7hFKbtF7vVQGvWdFWEfcSQmvVhi9xfrA@mail.gmail.com
This provides a single entry point to access some information about the
state of MultiXacts, able to return some data about multixacts offsets
and counts. Originally this function was only able to return some
information about the number of multixacts and multixact members,
extended here to provide some data about the oldest multixact ID in use
and the oldest offset, if known.
This change has been proposed in a patch that aims at providing more
monitoring capabilities for multixacts, and it is useful on its own.
GetMultiXactInfo() is added to multixact.h, becoming available for
out-of-core code.
Extracted from a larger patch by the same author.
Author: Naga Appani <nagnrik@gmail.com>
Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com
There are two implementation techniques for semijoins: one uses the
JOIN_SEMI jointype, where the executor emits at most one matching row
per left-hand side (LHS) row; the other unique-ifies the right-hand
side (RHS) and then performs a plain inner join.
The latter technique currently has some drawbacks related to the
unique-ification step.
* Only the cheapest-total path of the RHS is considered during
unique-ification. This may cause us to miss some optimization
opportunities; for example, a path with a better sort order might be
overlooked simply because it is not the cheapest in total cost. Such
a path could help avoid a sort at a higher level, potentially
resulting in a cheaper overall plan.
* We currently rely on heuristics to choose between hash-based and
sort-based unique-ification. A better approach would be to generate
paths for both methods and allow add_path() to decide which one is
preferable, consistent with how path selection is handled elsewhere in
the planner.
* In the sort-based implementation, we currently pay no attention to
the pathkeys of the input subpath or the resulting output. This can
result in redundant sort nodes being added to the final plan.
This patch improves semijoin planning by creating a new RelOptInfo for
the RHS rel to represent its unique-ified version. It then generates
multiple paths that represent elimination of distinct rows from the
RHS, considering both a hash-based implementation using the cheapest
total path of the original RHS rel, and sort-based implementations
that either exploit presorted input paths or explicitly sort the
cheapest total path. All resulting paths compete in add_path(), and
those deemed worthy of consideration are added to the new RelOptInfo.
Finally, the unique-ified rel is joined with the other side of the
semijoin using a plain inner join.
As a side effect, most of the code related to the JOIN_UNIQUE_OUTER
and JOIN_UNIQUE_INNER jointypes -- used to indicate that the LHS or
RHS path should be made unique -- has been removed. Besides, the
T_Unique path now has the same meaning for both semijoins and upper
DISTINCT clauses: it represents adjacent-duplicate removal on
presorted input. This patch unifies their handling by sharing the
same data structures and functions.
This patch also removes the UNIQUE_PATH_NOOP related code along the
way, as it is dead code -- if the RHS rel is provably unique, the
semijoin should have already been simplified to a plain inner join by
analyzejoins.c.
Author: Richard Guo <guofenglinux@gmail.com>
Reviewed-by: Alexandra Wang <alexandra.wang.oss@gmail.com>
Reviewed-by: wenhui qiu <qiuwenhuifx@gmail.com>
Discussion: https://postgr.es/m/CAMbWs4-EBnaRvEs7frTLbsXiweSTUXifsteF-d3rvv01FKO86w@mail.gmail.com
Previously this was being output to stderr. This commit adjusts things
to use elog(DEBUG4). Here we also adjust the format of the message to
add the hash table name and also put the message on a single line. This
should make grepping the logs for this information easier.
Also get rid of the global hash table statistics. This seems very dated
and didn't fit very well with trying to put all the statistics for a
specific hash table on a single log line.
The main aim here is to allow it so we can have at least one buildfarm
member build with HASH_STATISTICS to help prevent future changes from
breaking things in that area. ca3891251 recently fixed some issues
here.
In passing, switch to using uint64 data types rather than longs for the
usage counters. The long type is 32 bits on some platforms we support.
Author: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/CAApHDvoccvJ9CG5zx+i-EyCzJbcL5K=CzqrnL_YN59qaL5hiaw@mail.gmail.com
Doing that seems rather random and unnecessary. This commit removes
those and fixes fallout, which is pretty minimal. We do need to add a
forward declaration of struct TM_IndexDeleteOp (whose full definition
appears in tableam.h) so that _bt_delitems_delete_check()'s declaration
can use it.
Author: Álvaro Herrera <alvherre@kurilemu.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/202508051109.lzk3lcuzsaxo@alvherre.pgsql
Remove conditionally-compiled code for the other case.
Replace uses of FLOAT8PASSBYVAL with constant "true", mainly because
it was quite confusing in cases where the type we were dealing with
wasn't float8.
I left the associated pg_control and Pg_magic_struct fields in place.
Perhaps we should get rid of them, but it would save little, so it
doesn't seem worth thinking hard about the compatibility implications.
I just labeled them "vestigial" in places where that seemed helpful.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
Remove conditionally-compiled code for smaller Datum widths,
and simplify comments that describe cases no longer of interest.
I also fixed up a few more places that were not using
DatumGetIntXX where they should, and made some cosmetic
adjustments such as using sizeof(int64) not sizeof(Datum)
in places where that fit better with the surrounding code.
One thing I remembered while preparing this part is that SP-GiST
stores pass-by-value prefix keys as Datums, so that the on-disk
representation depends on sizeof(Datum). That's even more
unfortunate than the existing commentary makes it out to be,
because now there is a hazard that the change of sizeof(Datum)
will break SP-GiST indexes on 32-bit machines. It appears that
there are no existing SP-GiST opclasses that are actually
affected; and if there are some that I didn't find, the number
of installations that are using them on 32-bit machines is
doubtless tiny. So I'm proceeding on the assumption that we
can get away with this, but it's something to worry about.
(gininsert.c looks like it has a similar problem, but it's okay
because the "tuples" it's constructing are just transient data
within the tuplesort step. That's pretty poorly documented
though, so I added some comments.)
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
This patch makes sizeof(Datum) be 8 on all platforms including
32-bit ones. The objective is to allow USE_FLOAT8_BYVAL to be true
everywhere, and in consequence to remove a lot of code that is
specific to pass-by-reference handling of float8, int8, etc. The
code for abbreviated sort keys can be simplified similarly. In this
way we can reduce the maintenance effort involved in supporting 32-bit
platforms, without going so far as to actually desupport them. Since
Datum is strictly an in-memory concept, this has no impact on on-disk
storage, though an initdb or pg_upgrade will be needed to fix affected
catalog entries.
We have required platforms to support [u]int64 for ages, so this
breaks no supported platform. We can expect that this change will
make 32-bit builds a bit slower and more memory-hungry, although being
able to use pass-by-value handling of 8-byte types may buy back some
of that. But we stopped optimizing for 32-bit cases a long time ago,
and this seems like just another step on that path.
This initial patch simply forces the correct type definition and
USE_FLOAT8_BYVAL setting, and cleans up a couple of minor compiler
complaints that ensued. This is sufficient for testing purposes.
In the wake of a bunch of Datum-conversion cleanups by Peter
Eisentraut, this now compiles cleanly with gcc on a 32-bit platform.
(I'd only tested the previous version with clang, which it turns out
is less picky than gcc about width-changing coercions.)
There is a good deal of now-dead code that I'll remove in separate
follow-up patches.
A catversion bump is required because this affects initial catalog
contents (on 32-bit machines) in two ways: pg_type.typbyval changes
for some built-in types, and Const nodes in stored views/rules will
now have 8 bytes not 4 for pass-by-value types.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
Without using stamp files, meson lists the generated headers as the dependency
for every .c file, bloating build.ninja by more than 2x. Processing all the
dependencies also increases the time to generate build.ninja.
The immediate benefit is that this makes re-configuring and clean builds a bit
faster. The main motivation however is that I have other patches that
introduce additional build targets that further would increase the size of
build.ninja, making re-configuring more noticeably slower.
Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/cgkdgvzdpinkacf4v33mky7tbmk467oda5dd4dlmucjjockxzi@xkqfvjoq4uiy
Commit e2d4ef8de86 (the fix for CVE-2017-7484) added security checks
to the selectivity estimation functions to prevent them from running
user-supplied operators on data obtained from pg_statistic if the user
lacks privileges to select from the underlying table. In cases
involving inheritance/partitioning, those checks were originally
performed against the child RTE (which for plain inheritance might
actually refer to the parent table). Commit 553d2ec2710 then extended
that to also check the parent RTE, allowing access if the user had
permissions on either the parent or the child. It turns out, however,
that doing any checks using the child RTE is incorrect, since
securityQuals is set to NULL when creating an RTE for an inheritance
child (whether it refers to the parent table or the child table), and
therefore such checks do not correctly account for any RLS policies or
security barrier views. Therefore, do the security checks using only
the parent RTE. This is consistent with how RLS policies are applied,
and the executor's ACL checks, both of which use only the parent
table's permissions/policies. Similar checks are performed in the
extended stats code, so update that in the same way, centralizing all
the checks in a new function.
In addition, note that these checks by themselves are insufficient to
ensure that the user has access to the table's data because, in a
query that goes via a view, they only check that the view owner has
permissions on the underlying table, not that the current user has
permissions on the view itself. In the selectivity estimation
functions, there is no easy way to navigate from underlying tables to
views, so add permissions checks for all views mentioned in the query
to the planner startup code. If the user lacks permissions on a view,
a permissions error will now be reported at planner-startup, and the
selectivity estimation functions will not be run.
Checking view permissions at planner-startup in this way is a little
ugly, since the same checks will be repeated at executor-startup.
Longer-term, it might be better to move all the permissions checks
from the executor to the planner so that permissions errors can be
reported sooner, instead of creating a plan that won't ever be run.
However, such a change seems too far-reaching to be back-patched.
Back-patch to all supported versions. In v13, there is the added
complication that UPDATEs and DELETEs on inherited target tables are
planned using inheritance_planner(), which plans each inheritance
child table separately, so that the selectivity estimation functions
do not know that they are dealing with a child table accessed via its
parent. Handle that by checking access permissions on the top parent
table at planner-startup, in the same way as we do for views. Any
securityQuals on the top parent table are moved down to the child
tables by inheritance_planner(), so they continue to be checked by the
selectivity estimation functions.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-8713
Fix a couple more places where an explicit Datum conversion
is needed (not clear how we missed these in ff89e182d and
previous commits).
Replace the minority usage "(Datum) NULL" with "(Datum) 0".
The former depends on the assumption that Datum is the same
width as Pointer, the latter doesn't. Anyway consistency
is a good thing.
This is, I believe, the last of the notational mop-up needed
before we can consider changing Datum to uint64 everywhere.
It's also important cleanup for more aggressive ideas such
as making Datum a struct.
Discussion: https://postgr.es/m/1749799.1752797397@sss.pgh.pa.us
Discussion: https://postgr.es/m/8246d7ff-f4b7-4363-913e-827dadfeb145@eisentraut.org
Commit 9e6104c66 disallowed transition tables on foreign tables, but
failed to account for cases where a foreign table is a child table of a
partitioned/inherited table on which transition tables exist, leading to
incorrect transition tuples collected from such foreign tables for
queries on the parent table triggering transition capture. This
occurred not only for inherited UPDATE/DELETE but for partitioned INSERT
later supported by commit 3d956d956, which should have handled it at
least for the INSERT case, but didn't.
To fix, modify ExecAR*Triggers to throw an error if the given relation
is a foreign table requesting transition capture. Also, this commit
fixes make_modifytable so that in case of an inherited UPDATE/DELETE
triggering transition capture, FDWs choose normal operations to modify
child foreign tables, not DirectModify; which is needed because they
would otherwise skip the calls to ExecAR*Triggers at execution, causing
unexpected behavior.
Author: Etsuro Fujita <etsuro.fujita@gmail.com>
Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/CAPmGK14QJYikKzBDCe3jMbpGENnQ7popFmbEgm-XTNuk55oyHg%40mail.gmail.com
Backpatch-through: 13
This adds a few more functions to int128.h, allowing more of numeric.c
to use 128-bit integers on all platforms.
Specifically, int64_div_fast_to_numeric() and the following aggregate
functions can now use 128-bit integers for improved performance on all
platforms, rather than just platforms with native support for int128:
- SUM(int8)
- AVG(int8)
- STDDEV_POP(int2 or int4)
- STDDEV_SAMP(int2 or int4)
- VAR_POP(int2 or int4)
- VAR_SAMP(int2 or int4)
In addition to improved performance on platforms lacking native
128-bit integer support, this significantly simplifies this numeric
code by allowing a lot of conditionally compiled code to be deleted.
A couple of numeric functions (div_var_int64() and sqrt_var()) still
contain conditionally compiled 128-bit integer code that only works on
platforms with native 128-bit integer support. Making those work more
generally would require rolling our own higher precision 128-bit
division, which isn't supported for now.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
Recent ICU versions have added U_SHOW_CPLUSPLUS_HEADER_API, and we need
to set this to zero as well to hide the ICU C++ APIs from pg_locale.h
Per discussion, we want cpluspluscheck to work cleanly in backbranches,
so backpatch both this and its predecessor commit ed26c4e25a4 to all
supported versions.
Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/1115793.1754414782%40sss.pgh.pa.us
Backpatch-through: 13
In the non-native code in int128_add_int64_mul_int64(), use signed
64-bit integer multiplication instead of unsigned multiplication for
the first three product terms. This simplifies the code needed to add
each product term to the result, leading to more compact and efficient
code. The actual performance gain is quite modest, but it seems worth
it to improve the code's readability.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
On platforms without native 128-bit integer support, simplify the test
for carry in int128_add_uint64() by noting that the low-part addition
is unsigned integer arithmetic, which is just modular arithmetic.
Therefore the test for carry can simply be written as "new value < old
value" (i.e., a test for modular wrap-around). This can then be made
branchless so that on modern compilers it produces the same machine
instructions as native 128-bit addition, making it significantly
simpler and faster.
Similarly, the test for carry in int128_add_int64() can be written in
much the same way, but with an extra term to compensate for the sign
of the value being added. Again, on modern compilers this leads to
branchless code, often identical to the native 128-bit integer
addition machine code.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
This rearranges the code in src/include/common/int128.h, so that the
native and non-native implementations of each function are together
inside the function body (as they are in src/include/common/int.h),
rather than being in separate parts of the file.
This improves readability and maintainability, making it easier to
compare the native and non-native implementations, and avoiding the
need to duplicate every function comment and declaration.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
These were introduced (commit efdc7d74753) at the same time as we were
moving to using the standard inttypes.h format macros (commit
a0ed19e0a9e). It doesn't seem useful to keep a new already-deprecated
interface like this with only a few users, so remove the new symbols
again and have the callers use PRIx64.
(Also, INT64_HEX_FORMAT was kind of a misnomer, since hex formats all
use unsigned types.)
Reviewed-by: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/0ac47b5d-e5ab-4cac-98a7-bdee0e2831e4%40eisentraut.org
This creates a new test module src/test/modules/test_int128 and moves
src/tools/testint128.c into it so that it can be built using the
normal build system, allowing the 128-bit integer arithmetic functions
in src/include/common/int128.h to be tested automatically. For now,
the tests are skipped on platforms that don't have native int128
support.
While at it, fix the test128 union in the test code: the "hl" member
of test128 was incorrectly defined to be a union instead of a struct,
which meant that the tests were only ever setting and checking half of
each 128-bit integer value.
Author: Dean Rasheed <dean.a.rasheed@gmail.com>
Reviewed-by: John Naylor <johncnaylorls@gmail.com>
Discussion: https://postgr.es/m/CAEZATCWgBMc9ZwKMYqQpaQz2X6gaamYRB+RnMsUNcdMcL2Mj_w@mail.gmail.com
We've only bothered converting the external interfaces, not the
endian-dependent internal macros (which should not be used by any
callers other than the interface functions in this header, anyway).
The VARTAG_1B_E() changes are required for C++ compatibility.
Author: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/928ea48f-77c6-417b-897c-621ef16685a6@eisentraut.org
This enhancement builds upon the infrastructure introduced in commit
228c370868, which enables the preservation of deleted tuples and their
origin information on the subscriber. This capability is crucial for
handling concurrent transactions replicated from remote nodes.
The update introduces support for detecting update_deleted conflicts
during the application of update operations on the subscriber. When an
update operation fails to locate the target row-typically because it has
been concurrently deleted-we perform an additional table scan. This scan
uses the SnapshotAny mechanism and we do this additional scan only when
the retain_dead_tuples option is enabled for the relevant subscription.
The goal of this scan is to locate the most recently deleted tuple-matching
the old column values from the remote update-that has not yet been removed
by VACUUM and is still visible according to our slot (i.e., its deletion
is not older than conflict-detection-slot's xmin). If such a tuple is
found, the system reports an update_deleted conflict, including the origin
and transaction details responsible for the deletion.
This provides a groundwork for more robust and accurate conflict
resolution process, preventing unexpected behavior by correctly
identifying cases where a remote update clashes with a deletion from
another origin.
Author: Zhijie Hou <houzj.fnst@fujitsu.com>
Reviewed-by: shveta malik <shveta.malik@gmail.com>
Reviewed-by: Nisha Moond <nisha.moond412@gmail.com>
Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Discussion: https://postgr.es/m/OS0PR01MB5716BE80DAEB0EE2A6A5D1F5949D2@OS0PR01MB5716.jpnprd01.prod.outlook.com
These changes don't actually fix any leaks. They just make sure that
Valgrind will find pointers to data structures that remain allocated
at process exit, and thus not falsely complain about leaks. In
particular, we are trying to avoid situations where there is no
pointer to the beginning of an allocated block (except possibly
within the block itself, which Valgrind won't count).
* Because dynahash.c never frees hashtable storage except by deleting
the whole hashtable context, it doesn't bother to track the individual
blocks of elements allocated by element_alloc(). This results in
"possibly lost" complaints from Valgrind except when the first element
of each block is actively in use. (Otherwise it'll be on a freelist,
but very likely only reachable via "interior pointers" within element
blocks, which doesn't satisfy Valgrind.)
To fix, if we're building with USE_VALGRIND, expend an extra pointer's
worth of space in each element block so that we can chain them all
together from the HTAB header. Skip this in shared hashtables though:
Valgrind doesn't track those, and we'd need additional locking to make
it safe to manipulate a shared chain.
While here, update a comment obsoleted by 9c911ec06.
* Put the dlist_node fields of catctup and catclist structs first.
This ensures that the dlist pointers point to the starts of these
palloc blocks, and thus that Valgrind won't consider them
"possibly lost".
* The postmaster's PMChild structs and the autovac launcher's
avl_dbase structs also have the dlist_node-is-not-first problem,
but putting it first still wouldn't silence the warning because we
bulk-allocate those structs in an array, so that Valgrind sees a
single allocation. Commonly the first array element will be pointed
to only from some later element, so that the reference would be an
interior pointer even if it pointed to the array start. (This is the
same issue as for dynahash elements.) Since these are pretty simple
data structures, I don't feel too bad about faking out Valgrind by
just keeping a static pointer to the array start.
(This is all quite hacky, and it's not hard to imagine usages where
we'd need some other idea in order to have reasonable leak tracking of
structures that are only accessible via dlist_node lists. But these
changes seem to be enough to silence this class of leakage complaints
for the moment.)
* Free a couple of data structures manually near the end of an
autovacuum worker's run when USE_VALGRIND, and ensure that the final
vac_update_datfrozenxid() call is done in a non-permanent context.
This doesn't have any real effect on the process's total memory
consumption, since we're going to exit as soon as that last
transaction is done. But it does pacify Valgrind.
* Valgrind complains about the postmaster's socket-files and
lock-files lists being leaked, which we can silence by just
not nulling out the static pointers to them.
* Valgrind seems not to consider the global "environ" variable as
a valid root pointer; so when we allocate a new environment array,
it claims that data is leaked. To fix that, keep our own
statically-allocated copy of the pointer, similarly to the previous
item.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
When determining whether an allocated chunk is still reachable,
Valgrind will consider only pointers within what it believes to be
allocated chunks. Normally, all of a block obtained from malloc()
would be considered "allocated" --- but it turns out that if we use
VALGRIND_MEMPOOL_ALLOC to designate sub-section(s) of a malloc'ed
block as allocated, all the rest of that malloc'ed block is ignored.
This leads to lots of false positives of course. In particular,
in any multi-malloc-block context, all but the primary block were
reported as leaked. We also had a problem with context "ident"
strings, which were reported as leaked unless there was some other
pointer to them besides the one in the context header.
To fix, we need to use VALGRIND_MEMPOOL_ALLOC to designate
a context's management structs (the context struct itself and
any per-block headers) as allocated chunks. That forces moving
the VALGRIND_CREATE_MEMPOOL/VALGRIND_DESTROY_MEMPOOL calls into
the per-context-type code, so that the pool identifier can be
made as soon as we've allocated the initial block, but otherwise
it's fairly straightforward. Note that in Valgrind's eyes there
is no distinction between these allocations and the allocations
that the mmgr modules hand out to user code. That's fine for
now, but perhaps someday we'll want to do better yet.
When reading this patch, it's helpful to start with the comments
added at the head of mcxt.c.
Author: Andres Freund <andres@anarazel.de>
Co-authored-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/285483.1746756246@sss.pgh.pa.us
Discussion: https://postgr.es/m/20210317181531.7oggpqevzz6bka3g@alap3.anarazel.de
A deadlock can occur when the DDL command and the apply worker acquire
catalog locks in different orders while dropping replication origins.
The issue is rare in PG16 and higher branches because, in most cases, the
tablesync worker performs the origin drop in those branches, and its
locking sequence does not conflict with DDL operations.
This patch ensures consistent lock acquisition to prevent such deadlocks.
As per buildfarm.
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Author: Ajin Cherian <itsajin@gmail.com>
Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com>
Reviewed-by: vignesh C <vignesh21@gmail.com>
Reviewed-by: Amit Kapila <amit.kapila16@gmail.com>
Backpatch-through: 14, where it was introduced
Discussion: https://postgr.es/m/bab95e12-6cc5-4ebb-80a8-3e41956aa297@gmail.com
Commit 719dcf3c42 introduced a field called CachedPlanType in
PlannedStmt to allow extensions to determine whether a cached plan is
generic or custom.
After discussion, the concepts that we want to track are a bit wider
than initially anticipated, as it is closer to knowing from which
"source" or "origin" a PlannedStmt has been generated or retrieved.
Custom and generic cached plans are a subset of that.
Based on the state of HEAD, we have been able to define two more
origins:
- "standard", for the case where PlannedStmt is generated in
standard_planner(), the most common case.
- "internal", for the fake PlannedStmt generated internally by some
query patterns.
This could be tuned in the future depending on what is needed. This
looks like a good starting point, at least. The default value is called
"UNKNOWN", provided as fallback value. This value is not used in the
core code, the idea is to let extensions building their own PlannedStmts
know about this new field.
Author: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/aILaHupXbIGgF2wJ@paquier.xyz
There've been a few complaints that it can be overly difficult to figure
out why the planner picked a Memoize plan. To help address that, here we
adjust the EXPLAIN output to display the following additional details:
1) The estimated number of cache entries that can be stored at once
2) The estimated number of unique lookup keys that we expect to see
3) The number of lookups we expect
4) The estimated hit ratio
Technically #4 can be calculated using #1, #2 and #3, but it's not a
particularly obvious calculation, so we opt to display it explicitly.
The original patch by Lukas Fittl only displayed the hit ratio, but
there was a fear that might lead to more questions about how that was
calculated. The idea with displaying all 4 is to be transparent which
may allow queries to be tuned more easily. For example, if #2 isn't
correct then maybe extended statistics or a manual n_distinct estimate can
be used to help fix poor plan choices.
Author: Ilia Evdokimov <ilya.evdokimov@tantorlabs.com>
Author: Lukas Fittl <lukas@fittl.com>
Reviewed-by: David Rowley <dgrowleyml@gmail.com>
Reviewed-by: Andrei Lepikhov <lepihov@gmail.com>
Reviewed-by: Robert Haas <robertmhaas@gmail.com>
Discussion: https://postgr.es/m/CAP53Pky29GWAVVk3oBgKBDqhND0BRBN6yTPeguV_qSivFL5N_g%40mail.gmail.com
The callback added in fc415edf8ca8 used to check if there is any pending
data to flush for fixed-numbered statistics, done by looping across all
the builtin and custom stats kinds with a call to have_fixed_pending_cb,
is proving to able to show in workloads that do not report any stats
(read-only, no function calls, no WAL, no IO, etc). The code used in
v17 was cheaper than that what HEAD has introduced, relying on three
boolean checks for WAL, SLRU and IO stats.
This commit switches the code to use a more efficient approach than
fc415edf8ca8, with a single boolean flag that can be switched to "true"
by any fixed-numbered stats kinds to force pgstat_report_stat() to go
through one round of reports. The flag is reset by pgstat_report_stat()
once a full round of reports is done. The flag being false means that
fixed-numbered stats kinds saw no activity, and that there is no pending
data to flush.
ac000fca743e took one step in improving the performance by reducing the
number of stats kinds that the backend can hold. This commit takes a
more drastic step by bringing back the code efficiency to what it was
before v18 with a cheap check at the beginning of pgstat_report_stat()
for its fast-exit path.
The callback have_static_pending_cb is removed as an effect of all that.
Reported-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com>
Discussion: https://postgr.es/m/eb224uegsga2hgq7dfq3ps5cduhpqej7ir2hjxzzozjthrekx5@dysei6buqthe
Backpatch-through: 18
Remove a bunch of PG_TRY constructs, de-volatilize related
variables, remove some PQclear calls in error paths.
Aside from making the code simpler and shorter, this should
provide some marginal performance gains.
For ease of review, I did not re-indent code within the removed
PG_TRY constructs. That'll be done in a separate patch.
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com>
Discussion: https://postgr.es/m/2976982.1748049023@sss.pgh.pa.us