postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-23 14:57:03 +08:00

Author	SHA1	Message	Date
Michael Paquier	d46aa32ea5	Fix build inconsistency due to the generation of wait-event code The build generates four files based on the wait event contents stored in wait_event_names.txt: - wait_event_types.h - pgstat_wait_event.c - wait_event_funcs_data.c - wait_event_types.sgml The SGML file is generated as part of a documentation build, with its data stored in doc/src/sgml/ for meson and configure. The three others are handled differently for meson and configure: - In configure, all the files are created in src/backend/utils/activity/. A link to wait_event_types.h is created in src/include/utils/. - In meson, all the files are created in src/include/utils/. The two C files, pgstat_wait_event.c and wait_event_funcs_data.c, are then included in respectively wait_event.c and wait_event_funcs.c, without the "utils/" path. For configure, this does not present a problem. For meson, this has to be combined with a trick in src/backend/utils/activity/meson.build, where include_directories needs to point to include/utils/ to make the inclusion of the C files work properly, causing builds to pull in PostgreSQL headers rather than system headers in some build paths, as src/include/utils/ would take priority. In order to fix this issue, this commit reworks the way the C/H files are generated, becoming consistent with guc_tables.inc.c: - For meson, basically nothing changes. The files are still generated in src/include/utils/. The trick with include_directories is removed. - For configure, the files are now generated in src/backend/utils/, with links in src/include/utils/ pointing to the ones in src/backend/. This requires extra rules in src/backend/utils/activity/Makefile so as a make command in this sub-directory is able to work. - The three files now fall under header-stamp, which is actually simpler as guc_tables.inc.c does the same. - wait_event_funcs_data.c and pgstat_wait_event.c are now included with "utils/" in their path. This problem has not been an issue in the buildfarm; it has been noted with AIX and a conflict with float.h. This issue could, however, create conflicts in the buildfarm depending on the environment with unexpected headers pulled in, so this fix is backpatched down to where the generation of the wait-event files has been introduced. While on it, this commit simplifies wait_event_names.txt regarding the paths of the files generated, to mention just the names of the files generated. The paths where the files are generated became incorrect. The path of the SGML path was wrong. This change has been tested in the CI, down to v17. Locally, I have run tests with configure (with and without VPATH), as well as meson, on the three branches. Combo oversight in fa88928470b5 and 1e68e43d3f0f. Reported-by: Aditya Kamath <aditya.kamath1@ibm.com> Discussion: https://postgr.es/m/LV8PR15MB64888765A43D229EA5D1CFE6D691A@LV8PR15MB6488.namprd15.prod.outlook.com Backpatch-through: 17	2026-02-02 08:02:39 +09:00
Álvaro Herrera	e76221bd95	Minor cosmetic tweaks These changes should have been done by 2f9661311b83, but were overlooked. I noticed while reviewing the code for commit b8926a5b4bb8. Author: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/18984-0f4778a6599ac3ae@postgresql.org	2026-01-30 14:26:02 +01:00
Jeff Davis	de90bb7db1	Fix theoretical memory leaks in pg_locale_libc.c. The leaks were hard to reach in practice and the impact was low. The callers provide a buffer the same number of bytes as the source string (plus one for NUL terminator) as a starting size, and libc never increases the number of characters. But, if the byte length of one of the converted characters is larger, then it might need a larger destination buffer. Previously, in that case, the working buffers would be leaked. Even in that case, the call typically happens within a context that will soon be reset. Regardless, it's worth fixing to avoid such assumptions, and the fix is simple so it's worth backporting. Discussion: https://postgr.es/m/e2b7a0a88aaadded7e2d19f42d5ab03c9e182ad8.camel@j-davis.com Backpatch-through: 18	2026-01-29 10:14:55 -08:00
Masahiko Sawada	1fdbca159e	Standardize replication origin naming to use "ReplOrigin". The replication origin code was using inconsistent naming conventions. Functions were typically prefixed with 'replorigin', while typedefs and constants used "RepOrigin". This commit unifies the naming convention by renaming RepOriginId to ReplOriginId. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAD21AoBDgm3hDqUZ+nqu=ViHmkCnJBuJyaxG_yvv27BAi2zBmQ@mail.gmail.com	2026-01-28 11:03:29 -08:00
Tomas Vondra	db14dcdec6	Lookup the correct ordering for parallel GIN builds When building a tuplesort during parallel GIN builds, the function incorrectly looked up the default B-Tree operator, not the function associated with the GIN opclass (through GIN_COMPARE_PROC). Fixed by using the same logic as initGinState(), and the other place in parallel GIN builds. This could cause two types of issues. First, a data type might not have a B-Tree opclass, in which case the PrepareSortSupportFromOrderingOp() fails with an ERROR. Second, a data type might have both B-Tree and GIN opclasses, defining order/equality in different ways. This could lead to logical corruption in the index. Backpatch to 18, where parallel GIN builds were introduced. Discussion: https://postgr.es/m/73a28b94-43d5-4f77-b26e-0d642f6de777@iki.fi Reported-by: Heikki Linnakangas <hlinnaka@iki.fi> Backpatch-through: 18	2026-01-26 20:05:17 +01:00
Peter Eisentraut	5ca5f12c2c	Fix accidentally cast away qualifiers This fixes cases where a qualifier (const, in all cases here) was dropped by a cast, but the cast was otherwise necessary or desirable, so the straightforward fix is to add the qualifier into the cast. Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/b04f4d3a-5e70-4e73-9ef2-87f777ca4aac%40eisentraut.org	2026-01-26 16:02:31 +01:00
Peter Eisentraut	a5b40d156e	Mark commented out code as unused There were many PG_GETARG_* calls, mostly around gin, gist, spgist code, that were commented out, presumably to indicate that the argument was unused and to indicate that it wasn't forgotten or miscounted. But keeping commented-out code updated with refactorings and style changes is annoying. So this commit changes them to #ifdef NOT_USED blocks, which is a style already in use. That way, at least the indentation and syntax highlighting works correctly, making some of these blocks much easier to read. An alternative would be to just delete that code, but there is some value in making unused arguments explicit, and some of this arguably serves as example code for index AM APIs. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: David Geier <geidav.pg@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Dean Rasheed <dean.a.rasheed@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/328e4371-9a4c-4196-9df9-1f23afc900df%40eisentraut.org	2026-01-22 12:44:07 +01:00
Peter Eisentraut	846fb3c790	Remove incorrect commented out code These calls, if activated, are happening before null checks, so they are not correct. Also, the "in" variable is shadowed later. Remove them to avoid confusion and bad examples. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: David Geier <geidav.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/328e4371-9a4c-4196-9df9-1f23afc900df%40eisentraut.org	2026-01-22 12:44:07 +01:00
Tom Lane	4576208454	Force standard_conforming_strings to always be ON. Continuing to support this backwards-compatibility feature has nontrivial costs; in particular it is potentially a security hazard if an application somehow gets confused about which setting the server is using. We changed the default to ON fifteen years ago, which seems like enough time for applications to have adapted. Let's remove support for the legacy string syntax. We should not remove the GUC altogether, since client-side code will still test it, pg_dump scripts will attempt to set it to ON, etc. Instead, just prevent it from being set to OFF. There is precedent for this approach (see commit de66987ad). This patch does remove the related GUC escape_string_warning, however. That setting does nothing when standard_conforming_strings is on, so it's now useless. We could leave it in place as a do-nothing setting to avoid breaking clients that still set it, if there are any. But it seems likely that any such client is also trying to turn off standard_conforming_strings, so it'll need work anyway. The client-side changes in this patch are pretty minimal, because even though we are dropping the server's support, most of our clients still need to be able to talk to older server versions. We could remove dead client code only once we disclaim compatibility with pre-v19 servers, which is surely years away. One change of note is that pg_dump/pg_dumpall now set standard_conforming_strings = on in their source session, rather than accepting the source server's default. This ensures that literals in view definitions and such will be printed in a way that's acceptable to v19+. In particular, pg_upgrade will work transparently even if the source installation has standard_conforming_strings = off. (However, pg_restore will behave the same as before if given an archive file containing standard_conforming_strings = off. Such an archive will not be safely restorable into v19+, but we shouldn't break the ability to extract valid data from it for use with an older server.) Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/3279216.1767072538@sss.pgh.pa.us	2026-01-21 15:08:38 -05:00
Andres Freund	fcb9c977aa	bufmgr: Implement buffer content locks independently of lwlocks Until now buffer content locks were implemented using lwlocks. That has the obvious advantage of not needing a separate efficient implementation of locks. However, the time for a dedicated buffer content lock implementation has come: 1) Hint bits are currently set while holding only a share lock. This leads to having to copy pages while they are being written out if checksums are enabled, which is not cheap. We would like to add AIO writes, however once many buffers can be written out at the same time, it gets a lot more expensive to copy them, particularly because that copy needs to reside in shared buffers (for worker mode to have access to the buffer). In addition, modifying buffers while they are being written out can cause issues with unbuffered/direct-IO, as some filesystems (like btrfs) do not like that, due to filesystem internal checksums getting corrupted. The solution to this is to require a new share-exclusive lock-level to set hint bits and to write out buffers, making those operations mutually exclusive. We could introduce such a lock-level into the generic lwlock implementation, however it does not look like there would be other users, and it does add some overhead into important code paths. 2) For AIO writes we need to be able to race-freely check whether a buffer is undergoing IO and whether an exclusive lock on the page can be acquired. That is rather hard to do efficiently when the buffer state and the lock state are separate atomic variables. This is a major hindrance to allowing writes to be done asynchronously. 3) Buffer locks are by far the most frequently taken locks. Optimizing them specifically for their use case is worth the effort. E.g. by merging content locks into buffer locks we will be able to release a buffer lock and pin in one atomic operation. 4) There are more complicated optimizations, like long-lived "super pinned & locked" pages, that cannot realistically be implemented with the generic lwlock implementation. Therefore implement content locks inside bufmgr.c. The lockstate is stored as part of BufferDesc.state. The implementation of buffer content locks is fairly similar to lwlocks, with a few important differences: 1) An additional lock-level share-exclusive has been added. This lock-level conflicts with exclusive locks and itself, but not share locks. 2) Error recovery for content locks is implemented as part of the already existing private-refcount tracking mechanism in combination with resowners, instead of a bespoke mechanism as the case for lwlocks. This means we do not need to add dedicated error-recovery code paths to release all content locks (like done with LWLockReleaseAll() for lwlocks). 3) The lock state is embedded in BufferDesc.state instead of having its own struct. 4) The wakeup logic is a tad more complicated due to needing to support the additional lock-level This commit unfortunately introduces some code that is very similar to the code in lwlock.c, however the code is not equivalent enough to easily merge it. The future wins that this commit makes possible seem worth the cost. As of this commit nothing uses the new share-exclusive lock mode. It will be used in a future commit. It seemed too complicated to introduce the lock-level in a separate commit. It's worth calling out one wart in this commit: Despite content locks not being lwlocks anymore, they continue to use PGPROC->lw* - that seemed better than duplicating the relevant infrastructure. Another thing worth pointing out is that, after this change, content locks are not reported as LWLock wait events anymore, but as new wait events in the "Buffer" wait event class (see also 6c5c393b740). The old BufferContent lwlock tranche has been removed. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <heikki.linnakangas@iki.fi> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2026-01-15 14:26:53 -05:00
Tom Lane	282b1cde9d	Optimize LISTEN/NOTIFY via shared channel map and direct advancement. This patch reworks LISTEN/NOTIFY to avoid waking backends that have no need to process the notification messages we just sent. The primary change is to create a shared hash table that tracks which processes are listening to which channels (where a "channel" is defined by a database OID and channel name). This allows a notifying process to accurately determine which listeners are interested, replacing the previous weak approximation that listeners in other databases couldn't be interested. Secondly, if a listener is known not to be interested and is currently stopped at the old queue head, we avoid waking it at all and just directly advance its queue pointer past the notifications we inserted. These changes permit very significant improvements (integer multiples) in NOTIFY throughput, as well as a noticeable reduction in latency, when there are many listeners but only a few are interested in any specific message. There is no improvement for the simplest case where every listener reads every message, but any loss seems below the noise level. Author: Joel Jacobson <joel@compiler.org> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/6899c044-4a82-49be-8117-e6f669765f7e@app.fastmail.com	2026-01-15 14:12:15 -05:00
Álvaro Herrera	35e3fae738	Remove #include <math.h> where not needed Liujinyang reported the one in binaryheap.c, I then found and analyzed the rest. For future patches, we require git archaelogical analysis before we accept patches of this nature. Co-authored-by: liujinyang <21043272@qq.com> Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/tencent_6B302BFCAF6F010E00AB5C2C0ECB7AA3F205@qq.com	2026-01-15 19:09:47 +01:00
Álvaro Herrera	4196d6178a	Reword confusing comment to avoid "typo fixes" Author: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Discussion: https://postgr.es/m/CAApHDvqPmpa53jcTmfU8arFFm7=hB5cFoXX5dcUH=1qV0tRFHA@mail.gmail.com	2026-01-14 10:07:44 +01:00
Michael Paquier	6dcfac9696	Use more consistent *GetDatum() macros for some unsigned numbers This patch switches some code paths to use GetDatum() macros more in line with the data types of the variables they manipulate. This set of changes does not fix a problem, but it is always nice to be more consistent across the board. Author: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Roman Khapov <rkhapov@yandex-team.ru> Reviewed-by: Yuan Li <carol.li2025@outlook.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Man Zeng <zengman@halodbtech.com> Discussion: https://postgr.es/m/CALdSSPidtC7j3MwhkqRj0K2hyp36ztnnjSt6qzGxQtiePR1dzw@mail.gmail.com	2026-01-14 17:07:49 +09:00
Jeff Davis	a00a25b6ce	Fix error message typo. Reported-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2mMmm9fTZYgE-r_T-KPTFR1rKO029QV-S-6n=7US_9EMA@mail.gmail.com	2026-01-12 19:07:00 -08:00
Álvaro Herrera	2defd00062	Move instrumentation-related structs to instrument_node.h Some structs and enums related to parallel query instrumentation had organically grown scattered across various files, and were causing header pollution especially through execnodes.h. Create a single file where they can live together. This only moves the structs to the new file; cleaning up the pollution by removing no-longer-necessary cross-header inclusion will be done in future commits. Co-authored-by: Álvaro Herrera <alvherre@kurilemu.de> Co-authored-by: Mario González <gonzalemario@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/202510051642.wwmn4mj77wch@alvherre.pgsql Discussion: https://postgr.es/m/CAFsReFUr4KrQ60z+ck9cRM4WuUw1TCghN7EFwvV0KvuncTRc2w@mail.gmail.com	2026-01-12 16:59:28 +01:00
Peter Eisentraut	c3c240537f	Avoid casting void * function arguments In many cases, the cast would silently drop a const qualifier. To fix, drop the unnecessary cast and let the compiler check the types and qualifiers. Add const to read-only local variables, preserving the const qualifiers from the function signatures. Co-authored-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Co-authored-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/aUQHy/MmWq7c97wK%40ip-10-97-1-34.eu-west-3.compute.internal	2026-01-12 16:12:56 +01:00
Heikki Linnakangas	ad853bb877	Fix misc typos, mostly in comments The only user-visible change is the fix in the "malformed pg_dependencies" error detail. That one is new in commit e1405aa5e3ac, so no backpatching required.	2026-01-08 18:10:08 +02:00
Amit Kapila	31ddbb38ee	Fix typos in the code. Author: "Dewei Dai" <daidewei1970@163.com> Author: zengman <zengman@halodbtech.com> Author: Zhiyuan Su <suzhiyuan_pg@126.com> Discussion: https://postgr.es/m/2026010719201902382410@163.com Discussion: https://postgr.es/m/tencent_4DC563C83443A4B1082D2BFF@qq.com Discussion: https://postgr.es/m/44656d72.2a63.19b9b92b0a3.Coremail.suzhiyuan_pg@126.com	2026-01-08 09:43:50 +00:00
Michael Paquier	68119480a7	Fix unexpected reversal of lists during catcache rehash During catcache searches, the most-recently searched entries are kept at the head of the list to speed up subsequent searches, keeping the "freshest" entries at its beginning. A rehash of the catcache was doing the opposite: fresh entries were moved to the tail of the newly-created buckets, causing a rehash to slow down a bit. When a rehash is done, this commit switches the code to use dlist_push_tail() instead of dlist_push_head(), so as fresh entries are kept at the head of the lists, not their tail. Author: ChangAo Chen <cca5507@qq.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/tencent_9EA10D8512B5FE29E7323F780A0749768708@qq.com	2026-01-07 17:52:54 +09:00
Michael Paquier	b139bd3b6e	Add data type oid8, 64-bit unsigned identifier This new identifier type provides support for 64-bit unsigned values, to be used in catalogs, like OIDs. An advantage of a new data type is that it becomes easier to grep for it in the code when assigning this type to a catalog attribute, linking it to dedicated APIs and internal structures. The following operators are added in this commit, with dedicated tests: - Casts with integer types and OID. - btree and hash operators - min/max functions. - C type with related macros and defines, named around "Oid8". This has been mentioned as useful on its own on the thread to add support for 64-bit TOAST values, so as it becomes possible to attach this data type to the TOAST code and catalog definitions. However, as this concept can apply to many more areas, it is implemented as its own independent change. This is based on a discussion with Andres Freund and Tom Lane. Bump catalog version. Author: Michael Paquier <michael@paquier.xyz> Reviewed-by: Greg Burd <greg@burd.me> Reviewed-by: Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com> Discussion: https://postgr.es/m/1891064.1754681536@sss.pgh.pa.us	2026-01-07 11:37:00 +09:00
Jeff Davis	af2d4ca191	Clean up ICU includes. Remove ICU includes from pg_locale.h, and instead include them in the few C files that need ICU. Clean up a few other includes in passing. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/48911db71d953edec66df0d2ce303563d631fbe0.camel@j-davis.com	2026-01-06 17:19:51 -08:00
Jeff Davis	c4ff35f104	ICU: use UTF8-optimized case conversion API Initializes a UCaseMap object once for use across calls, and uses UTF8-optimized APIs. Author: Andreas Karlsson <andreas@proxel.se> Reviewed-by: zengman <zengman@halodbtech.com> Discussion: https://postgr.es/m/5a010b27-8ed9-4739-86fe-1562b07ba564@proxel.se	2026-01-06 14:09:07 -08:00
Alexander Korotkov	7a39f43d88	Extend xlogwait infrastructure with write and flush wait types Add support for waiting on WAL write and flush LSNs in addition to the existing replay LSN wait type. This provides the foundation for extending the WAIT FOR command with MODE parameter. Key changes are following. - Add WAIT_LSN_TYPE_STANDBY_WRITE and WAIT_LSN_TYPE_STANDBY_FLUSH to WaitLSNType. - Add GetCurrentLSNForWaitType() to retrieve the current LSN for each wait type. - Add new wait events WAIT_EVENT_WAIT_FOR_WAL_WRITE and WAIT_EVENT_WAIT_FOR_WAL_FLUSH for pg_stat_activity visibility. - Update WaitForLSN() to use GetCurrentLSNForWaitType() internally. Discussion: https://postgr.es/m/CABPTF7UiArgW-sXj9CNwRzUhYOQrevLzkYcgBydmX5oDes1sjg%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Alvaro Herrera <alvherre@kurilemu.de>	2026-01-05 19:56:19 +02:00
Michael Paquier	b8cfcb9e00	Fix typos and inconsistencies in code and comments This change is a cocktail of harmonization of function argument names, grammar typos, renames for better consistency and unused code (see ltree). All of these have been spotted by the author. Author: Alexander Lakhin <exclusion@gmail.com> Discussion: https://postgr.es/m/b2c0d0b7-3944-487d-a03d-d155851958ff@gmail.com	2026-01-05 09:19:15 +09:00
Bruce Momjian	451c43974f	Update copyright for 2026 Backpatch-through: 14	2026-01-01 13:24:10 -05:00
Tom Lane	bc6374cd76	Change IndexAmRoutines to be statically-allocated structs. Up to now, index amhandlers were expected to produce a new, palloc'd struct on each call. That requires palloc/pfree overhead, and creates a risk of memory leaks if the caller fails to pfree, and the time taken to fill such a large structure isn't nil. Moreover, we were storing these things in the relcache, eating several hundred bytes for each cached index. There is not anything in these structs that needs to vary at runtime, so let's change the definition so that an amhandler can return a pointer to a "static const" struct of which there's only one copy per index AM. Mark all the core code's IndexAmRoutine pointers const so that we catch anyplace that might still try to change or pfree one. (This is similar to the way we were already handling TableAmRoutine structs. This commit does fix one comment that was infelicitously copied-and-pasted into tableamapi.c.) This commit needs to be called out in the v19 release notes as an API change for extension index AMs. An un-updated AM will still work (as of now, anyway) but it risks memory leaks and will be slower than necessary. Author: Matthias van de Meent <boekewurm+postgres@gmail.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/CAEoWx2=vApYk2LRu8R0DdahsPNEhWUxGBZ=rbZo1EXE=uA+opQ@mail.gmail.com	2025-12-30 18:26:23 -05:00
Michael Paquier	ffdcc9c638	Fix comment in lsyscache.c Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2miv0KGcM9j29ANRN45-Vz-2qAqrM0cv9OtaLx8e_WCMQ@mail.gmail.com	2025-12-30 16:42:21 +09:00
Michael Paquier	97b101776c	Add pg_get_multixact_stats() This new function exposes at SQL level some information related to multixacts, not available until now. This data is useful for monitoring purposes, especially for workloads that make a heavy use of multixacts: - num_mxids, number of MultiXact IDs in use. - num_members, number of member entries in use. - members_size, bytes used by num_members in pg_multixact/members/. - oldest_multixact: oldest MultiXact still needed. This patch has been originally proposed when MultiXactOffset was still 32 bits, to monitor wraparound. This part is not relevant anymore since bd8d9c9bdfa0 that has widen MultiXactOffset to 64 bits. The monitoring of disk space usage for the members is still relevant. Some tests are added to check this function, in the shape of one isolation test with concurrent transactions that take a ROW SHARE lock, and some SQL tests for pg_read_all_stats. Some documentation is added to explain some patterns that can come from the information provided by the function. Bump catalog version. Author: Naga Appani <nagnrik@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Discussion: https://postgr.es/m/CA+QeY+AAsYK6WvBW4qYzHz4bahHycDAY_q5ECmHkEV_eB9ckzg@mail.gmail.com	2025-12-30 15:38:50 +09:00
Tom Lane	bd3e3e9e56	Ensure sanity of hash-join costing when there are no MCV statistics. estimate_hash_bucket_stats is defined to return zero to mcv_freq if it cannot obtain a value for the frequency of the most common value. Its sole caller final_cost_hashjoin ignored this provision and would blindly believe the zero value, resulting in computing zero for the largest bucket size. In consequence, the safety check that intended to prevent the largest bucket from exceeding get_hash_memory_limit() was ineffective, allowing very silly plans to be chosen if statistics were missing. After fixing final_cost_hashjoin to disregard zero results for mcv_freq, a second problem appeared: some cases that should use hash joins failed to. This is because estimate_hash_bucket_stats was unaware of the fact that ANALYZE won't store MCV statistics if it doesn't find any multiply-occurring values. Thus the lack of an MCV stats entry doesn't necessarily mean that we know nothing; we may well know that the column is unique. The former coding returned zero for mcv_freq in this case, which was pretty close to correct, but now final_cost_hashjoin doesn't believe it and disables the hash join. So check to see if there is a HISTOGRAM stats entry; if so, ANALYZE has in fact run for this column and must have found it to be unique. In that case report the MCV frequency as 1 / rows, instead of claiming ignorance. Reporting a more accurate *mcv_freq in this case can also affect the bucket-size skew adjustment further down in estimate_hash_bucket_stats, causing hash-join cost estimates to change slightly. This affects some plan choices in the core regression tests. The first diff in join.out corresponds to a case where we have no stats and should not risk a hash join, but the remaining changes are caused by producing a better bucket-size estimate for unique join columns. Those are all harmless changes so far as I can tell. The existing behavior was introduced in commit 4867d7f62 in v11. It appears from the commit log that disabling the bucket-size safety check in the absence of statistics was intentional; but we've now seen a case where the ensuing behavior is bad enough to make that seem like a poor decision. In any case the lack of other problems with that safety check after several years helps to justify enforcing it more strictly. However, we won't risk back-patching this, in case any applications are depending on the existing behavior. Bug: #19363 Reported-by: Jinhui Lai <jinhui.lai@qq.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/2380165.1766871097@sss.pgh.pa.us Discussion: https://postgr.es/m/19363-8dd32fc7600a1153@postgresql.org	2025-12-29 13:01:27 -05:00
Richard Guo	559f9e90db	Ignore PlaceHolderVars when looking up statistics When looking up statistical data about an expression, we failed to look through PlaceHolderVar nodes, treating them as opaque. This could prevent us from matching an expression to base columns, index expressions, or extended statistics, as examine_variable() relies on strict structural matching. As a result, queries involving PlaceHolderVar nodes often fell back to default selectivity estimates, potentially leading to poor plan choices. This patch updates examine_variable() to strip PlaceHolderVars before analysis. This is safe during estimation because PlaceHolderVars are transparent for the purpose of statistics lookup: they do not alter the value distribution of the underlying expression. To minimize performance overhead on this hot path, a lightweight walker first checks for the presence of PlaceHolderVars. The more expensive mutator is invoked only when necessary. There is one ensuing plan change in the regression tests, which is expected and demonstrates the fix: the rowcount estimate becomes much more accurate with this patch. Back-patch to v18. Although this issue exists before that, changes in this version made it common enough to notice. Given the lack of field reports for older versions, I am not back-patching further. Reported-by: Haowu Ge <gehaowu@bitmoe.com> Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://postgr.es/m/62af586c-c270-44f3-9c5e-02c81d537e3d.gehaowu@bitmoe.com Backpatch-through: 18	2025-12-29 11:40:45 +09:00
Peter Eisentraut	b7057e4346	Change some Datum to void * for opaque pass-through pointer Here, Datum was used to pass around an opaque pointer between a group of functions. But one might as well use void * for that; the use of Datum doesn't achieve anything here and is just distracting. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/1c5d23cb-288b-4154-b1cd-191fe2301707%40eisentraut.org	2025-12-28 14:34:12 +01:00
Michael Paquier	36b8f4974a	Fix pg_stat_get_backend_activity() to use multi-byte truncated result pg_stat_get_backend_activity() calls pgstat_clip_activity() to ensure that the reported query string is correctly truncated when it finishes with an incomplete multi-byte sequence. However, the result returned by the function was not what pgstat_clip_activity() generated, but the non-truncated, original, contents from PgBackendStatus.st_activity_raw. Oversight in 54b6cd589ac2, so backpatch all the way down. Author: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAEoWx2mDzwc48q2EK9tSXS6iJMJ35wvxNQnHX+rXjy5VgLvJQw@mail.gmail.com Backpatch-through: 14	2025-12-27 17:23:30 +09:00
Michael Paquier	b947dd5c75	Improve comment in pgstatfuncs.c Author: Zizhen Qiao <zizhen_qiao@163.com> Discussion: https://postgr.es/m/5ee635f9.49f7.19b4ed9e803.Coremail.zizhen_qiao@163.com	2025-12-24 17:09:13 +09:00
Masahiko Sawada	67c20979ce	Toggle logical decoding dynamically based on logical slot presence. Previously logical decoding required wal_level to be set to 'logical' at server start. This meant that users had to incur the overhead of logical-level WAL logging even when no logical replication slots were in use. This commit adds functionality to automatically control logical decoding availability based on logical replication slot presence. The newly introduced module logicalctl.c allows logical decoding to be dynamically activated when needed when wal_level is set to 'replica'. When the first logical replication slot is created, the system automatically increases the effective WAL level to maintain logical-level WAL records. Conversely, after the last logical slot is dropped or invalidated, it decreases back to 'replica' WAL level. While activation occurs synchronously right after creating the first logical slot, deactivation happens asynchronously through the checkpointer process. This design avoids a race condition at the end of recovery; a concurrent deactivation could happen while the startup process enables logical decoding at the end of recovery, but WAL writes are still not permitted until recovery fully completes. The checkpointer will handle it after recovery is done. Asynchronous deactivation also avoids excessive toggling of the logical decoding status in workloads that repeatedly create and drop a single logical slot. On the other hand, this lazy approach can delay changes to effective_wal_level and the disabling logical decoding, especially when the checkpointer is busy with other tasks. We chose this lazy approach in all deactivation paths to keep the implementation simple, even though laziness is strictly required only for end-of-recovery cases. Future work might address this limitation either by using a dedicated worker instead of the checkpointer, or by implementing synchronous waiting during slot drops if workloads are significantly affected by the lazy deactivation of logical decoding. The effective WAL level, determined internally by XLogLogicalInfo, is allowed to change within a transaction until an XID is assigned. Once an XID is assigned, the value becomes fixed for the remainder of the transaction. This behavior ensures that the logging mode remains consistent within a writing transaction, similar to the behavior of GUC parameters. A new read-only GUC parameter effective_wal_level is introduced to monitor the actual WAL level in effect. This parameter reflects the current operational WAL level, which may differ from the configured wal_level setting. Bump PG_CONTROL_VERSION as it adds a new field to CheckPoint struct. Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Discussion: https://postgr.es/m/CAD21AoCVLeLYq09pQPaWs+Jwdni5FuJ8v2jgq-u9_uFbcp6UbA@mail.gmail.com	2025-12-23 10:13:16 -08:00
Michael Paquier	e5f3839af6	Switch buffile.c/h to use pgoff_t instead of off_t off_t was previously used for offsets, which is 4 bytes on Windows, hence limiting the backend code to a hard limit for files longer than 2GB. This leads to some simplification in these files, removing some casts based on long, also 4 bytes on Windows. This commit removes one comment introduced in db3c4c3a2d98, not relevant anymore as pgoff_t is a safe 8-byte alternative on Windows. This change is surprisingly not invasive, as the callers of BufFileTell(), BufFileSeek() and BufFileTruncateFileSet() (worker.c, tuplestore.c, etc.) track offsets in local structures that just to switch from off_t to pgoff_t for the most part. The file is still relying on a maximum file size of MAX_PHYSICAL_FILESIZE (1GB). This change allows the code to make this maximum potentially larger in the future, or larger on a per-demand basis. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aUStrqoOCDRFAq1M@paquier.xyz	2025-12-23 07:41:34 +09:00
Heikki Linnakangas	47a9f61fca	Use proper type for RestoreTransactionSnapshot's PGPROC arg Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/08cbaeb5-aaaf-47b6-9ed8-4f7455b0bc4b@iki.fi	2025-12-19 13:40:02 +02:00
Fujii Masao	b3ccb0a2cb	Add guard to prevent recursive memory context logging. Previously, if memory context logging was triggered repeatedly and rapidly while a previous request was still being processed, it could result in recursive calls to ProcessLogMemoryContextInterrupt(). This could lead to infinite recursion and potentially crash the process. This commit adds a guard to prevent such recursion. If ProcessLogMemoryContextInterrupt() is already in progress and logging memory contexts, subsequent calls will exit immediately, avoiding unintended recursive calls. While this scenario is unlikely in practice, it's not impossible. This change adds a safety check to prevent such failures. Back-patch to v14, where memory context logging was introduced. Reported-by: Robert Haas <robertmhaas@gmail.com> Author: Fujii Masao <masao.fujii@gmail.com> Reviewed-by: Atsushi Torikoshi <torikoshia@oss.nttdata.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Artem Gavrilov <artem.gavrilov@percona.com> Discussion: https://postgr.es/m/CA+TgmoZMrv32tbNRrFTvF9iWLnTGqbhYSLVcrHGuwZvCtph0NA@mail.gmail.com Backpatch-through: 14	2025-12-19 12:05:37 +09:00
Michael Paquier	f4e797171e	Change pgstat_report_vacuum() to use Relation This change makes pgstat_report_vacuum() more consistent with pgstat_report_analyze(), that also uses a Relation. This enforces a policy that callers of this routine should open and lock the relation whose statistics are updated before calling this routine. We will unlikely have a lot of callers of this routine in the tree, but it seems like a good idea to imply this requirement in the long run. Author: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aUEA6UZZkDCQFgSA@ip-10-97-1-34.eu-west-3.compute.internal	2025-12-17 11:26:17 +09:00
Michael Paquier	1d325ad99c	Remove useless code in InjectionPointAttach() strlcpy() ensures that a target string is zero-terminated, so there is no need to enforce it a second time in this code. This simplification could have been done in 0eb23285a257. Author: Feilong Meng <feelingmeng@foxmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/tencent_771178777C5BC17FCB7F7A1771CD1FFD5708@qq.com	2025-12-17 08:58:58 +09:00
Jeff Davis	0a90df58cf	Avoid global LC_CTYPE dependency in pg_locale_icu.c. ICU still depends on libc for compatibility with certain historical behavior for single-byte encodings. Make the dependency explicit by holding a locale_t object when required. We should consider a better solution in the future, such as decoding the text to UTF-32 and using u_tolower(). That would be a behavior change and require additional infrastructure though; so for now, just avoid the global LC_CTYPE dependency. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-16 15:32:57 -08:00
Jeff Davis	87b2968df0	downcase_identifier(): use method table from locale provider. Previously, libc's tolower() was always used for lowercasing identifiers, regardless of the database locale (though only characters beyond 127 in single-byte encodings were affected). Refactor to allow each provider to supply its own implementation of identifier downcasing. For historical compatibility, when using a single-byte encoding, ICU still relies on tolower(). One minor behavior change is that, before the database default locale is initialized, it uses ASCII semantics to downcase the identifiers. Previously, it would use the postmaster's LC_CTYPE setting from the environment. While that could have some effect during GUC processing, for example, it would have been fragile to rely on the environment setting anyway. (Also, it only matters when the encoding is single-byte.) Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-16 15:32:41 -08:00
John Naylor	9303d62c6d	Separate out bytea sort support from varlena.c In the wake of commit b45242fd3, bytea_sortsupport() still called out to varstr_sortsupport(). Treating bytea as a kind of text/varchar required varstr_sortsupport() to allow for the possibility of NUL bytes, but only for C collation. This was confusing. For better separation of concerns, create an independent sortsupport implementation in bytea.c. The heuristics for bytea_abbrev_abort() remain the same as for varstr_abbrev_abort(). It's possible that the bytea case warrants different treatment, but that is left for future investigation. In passing, adjust some strange looking comparisons in varstr_abbrev_abort(). Author: Aleksander Alekseev <aleksander@tigerdata.com> Reviewed-by: John Naylor <johncnaylorls@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAJ7c6TP1bAbEhUJa6+rgceN6QJWMSsxhg1=mqfSN=Nb-n6DAKg@mail.gmail.com	2025-12-16 15:19:16 +07:00
Noah Misch	64bf53dd61	Revisit cosmetics of "For inplace update, send nontransactional invalidations." This removes a never-used CacheInvalidateHeapTupleInplace() parameter. It adds README content about inplace update visibility in logical decoding. It rewrites other comments. Back-patch to v18, where commit 243e9b40f1b2dd09d6e5bf91ebf6e822a2cd3704 first appeared. Since this removes a CacheInvalidateHeapTupleInplace() parameter, expect a v18 ".abi-compliance-history" edit to follow. PGXN contains no calls to that function. Reported-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reported-by: Ilyasov Ian <ianilyasov@outlook.com> Reviewed-by: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Surya Poondla <s_poondla@apple.com> Discussion: https://postgr.es/m/CA+renyU+LGLvCqS0=fHit-N1J-2=2_mPK97AQxvcfKm+F-DxJA@mail.gmail.com Backpatch-through: 18	2025-12-15 12:19:49 -08:00
Jeff Davis	54c41a6deb	Remove unused single-byte char_is_cased() API. https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-15 10:24:57 -08:00
Jeff Davis	9c8de15969	Use multibyte-aware extraction of pattern prefixes. Previously, like_fixed_prefix() used char-at-a-time logic, which forced it to be too conservative for case-insensitive matching. Introduce like_fixed_prefix_ci(), and use that for case-insensitive pattern prefixes. It uses multibyte and locale-aware logic, along with the new pg_iswcased() API introduced in 630706ced0. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com	2025-12-15 10:24:47 -08:00
Amit Kapila	0d2d4a0ec3	Add retry logic to pg_sync_replication_slots(). Previously, pg_sync_replication_slots() would finish without synchronizing slots that didn't meet requirements, rather than failing outright. This could leave some failover slots unsynchronized if required catalog rows or WAL segments were missing or at risk of removal, while the standby continued removing needed data. To address this, the function now waits for the primary slot to advance to a position where all required data is available on the standby before completing synchronization. It retries cyclically until all failover slots that existed on the primary at the start of the call are synchronized. Slots created after the function begins are not included. If the standby is promoted during this wait, the function exits gracefully and the temporary slots will be removed. Author: Ajin Cherian <itsajin@gmail.com> Author: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Shveta Malik <shveta.malik@gmail.com> Reviewed-by: Japin Li <japinli@hotmail.com> Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Yilin Zhang <jiezhilove@126.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAFPTHDZAA%2BgWDntpa5ucqKKba41%3DtXmoXqN3q4rpjO9cdxgQrw%40mail.gmail.com	2025-12-15 02:50:21 +00:00
Michael Paquier	4ba012a8ed	Allow cumulative statistics to read/write auxiliary data from/to disk Cumulative stats kinds gain the capability to write additional per-entry data when flushing the stats at shutdown, and read this data when loading back the stats at startup. This can be fit for example in the case of variable-length data (like normalized query strings), so as it becomes possible to link the shared memory stats entries to data that is stored in a different area, like a DSA segment. Three new optional callbacks are added to PgStat_KindInfo, available to variable-numbered stats kinds: * to_serialized_data: writes auxiliary data for an entry. * from_serialized_data: reads auxiliary data for an entry. * finish: performs actions after read/write/discard operations. This is invoked after processing all the entries of a kind, allowing extensions to close file handles and clean up resources. Stats kinds have the option to store this data in the existing pgstats file, but can as well store it in one or more additional files whose names can be built upon the entry keys. The new serialized callbacks are called once an entry key is read or written from the main stats file. A file descriptor to the main pgstats file is available in the arguments of the callbacks. Author: Sami Imseih <samimseih@gmail.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com	2025-12-15 09:40:56 +09:00
Tom Lane	58dad7f349	Update typedefs.list to match what the buildfarm currently reports. The current list from the buildfarm includes quite a few typedef names that it used to miss. The reason is a bit obscure, but it seems likely to have something to do with our recent increased use of palloc_object and palloc_array. In any case, this makes the relevant struct declarations be much more nicely formatted, so I'll take it. Install the current list and re-run pgindent to update affected code. Syncing with the current list also removes some obsolete typedef names and fixes some alphabetization errors. Discussion: https://postgr.es/m/1681301.1765742268@sss.pgh.pa.us	2025-12-14 17:03:53 -05:00
Tom Lane	ef5f559b95	Fix jsonb_object_agg crash after eliminating null-valued pairs. In commit b61aa76e4 I added an assumption in jsonb_object_agg_finalfn that it'd be okay to apply uniqueifyJsonbObject repeatedly to a JsonbValue. I should have studied that code more closely first, because in skip_nulls mode it removed leading nulls by changing the "pairs" array start pointer. This broke the data structure's invariants in two ways: pairs no longer references a repalloc-able chunk, and the distance from pairs to the end of its array is less than parseState->size. So any subsequent addition of more pairs is at high risk of clobbering memory and/or causing repalloc to crash. Unfortunately, adding more pairs is exactly what will happen when the aggregate is being used as a window function. Fix by rewriting uniqueifyJsonbObject to not do that. The prior coding had little to recommend it anyway. Reported-by: Alexander Lakhin <exclusion@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/ec5e96fb-ee49-4e5f-8a09-3f72b4780538@gmail.com	2025-12-13 16:18:29 -05:00

1 2 3 4 5 ...

9503 Commits