postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-23 23:07:15 +08:00

Author	SHA1	Message	Date
Tom Lane	6498287696	Handle constant inputs to corr() and related aggregates more precisely. The SQL standard says that corr() and friends should return NULL in the mathematically-undefined case where all the inputs in one of the columns have the same value. We were checking that by seeing if the sums Sxx and Syy were zero, but that approach is very vulnerable to roundoff error: if a sum is close to zero but not exactly that, we'd come out with a pretty silly non-NULL result. Instead, directly track whether the inputs are all equal by remembering the common value in each column. Once we detect that a new input is different from before, represent that by storing NaN for the common value. (An objection to this scheme is that if the inputs are all NaN, we will consider that they were not all equal. But under IEEE float arithmetic rules, one NaN is never equal to another, so this behavior is arguably correct. Moreover it matches what we did before in such cases.) Then, leave the sums at their exact value of zero for as long as we haven't detected different input values. This solution requires the aggregate transition state to contain 8 float values not 6, which is not problematic, and it seems to add less than 1% to the aggregates' runtime, which seems acceptable. While we're here, improve corr()'s final function to cope with overflow/underflow in the final calculation, and to clamp its result to [-1, 1] in case of roundoff error. Although this is arguably a bug fix, it requires a catversion bump due to the change in aggregates' initial states, so it can't be back-patched. Patch written by me, but many of the ideas are due to Dean Rasheed, who also did a deal of testing. Bug: #19340 Reported-by: Oleg Ivanov <o15611@gmail.com> Author: Tom Lane <tgl@sss.pgh.pa.us> Co-authored-by: Dean Rasheed <dean.a.rasheed@gmail.com> Discussion: https://postgr.es/m/19340-6fb9f6637f562092@postgresql.org	2025-12-06 18:31:26 -05:00
Amit Kapila	5db6a344ab	Rename column slotsync_skip_at to slotsync_last_skip. Commit 76b78721ca introduced two new columns in pg_stat_replication_slots to improve monitoring of slot synchronization. One of these columns was named slotsync_skip_at, which is inconsistent with the naming convention used for similar columns in other system views. Columns that store timestamps of the most recent event typically use the 'last_' in the column name (e.g., last_autovacuum, checksum_last_failure). Renaming slotsync_skip_at to slotsync_last_skip aligns with this pattern, making the purpose of the column clearer and improving overall consistency across the views. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: Michael Banck <mbanck@gmx.net> Discussion: https://postgr.es/m/20251128091552.GB13635@p46.dedyn.io;lightning.p46.dedyn.io Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-12-05 04:12:55 +00:00
Peter Eisentraut	c6be3daa05	Remove no longer needed casts to Pointer These casts used to be required when Pointer was char , but now it's void (commit 1b2bb5077e9), so they are not needed anymore. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-04 19:40:08 +01:00
Andres Freund	6c5c393b74	Rename BUFFERPIN wait event class to BUFFER In an upcoming patch more wait events will be added to the wait event class (for buffer locking), making the current name too specific. Alternatively we could introduce a dedicated wait event class for those, but it seems somewhat confusing to have a BUFFERPIN and a BUFFER wait event class. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Andres Freund	7902a47c20	Add pg_atomic_unlocked_write_u64 The 64bit equivalent of pg_atomic_unlocked_write_u32(), to be used in an upcoming patch converting BufferDesc.state into a 64bit atomic. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Andres Freund	156680055d	bufmgr: Turn BUFFER_LOCK_* into an enum It seems cleaner to use an enum to tie the different values together. It also helps to have a more descriptive type in the argument to various functions. Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/fvfmkr5kk4nyex56ejgxj3uzi63isfxovp2biecb4bspbjrze7@az2pljabhnff	2025-12-03 18:38:20 -05:00
Heikki Linnakangas	789d65364c	Set next multixid's offset when creating a new multixid With this commit, the next multixid's offset will always be set on the offsets page, by the time that a backend might try to read it, so we no longer need the waiting mechanism with the condition variable. In other words, this eliminates "corner case 2" mentioned in the comments. The waiting mechanism was broken in a few scenarios: - When nextMulti was advanced without WAL-logging the next multixid. For example, if a later multixid was already assigned and WAL-logged before the previous one was WAL-logged, and then the server crashed. In that case the next offset would never be set in the offsets SLRU, and a query trying to read it would get stuck waiting for it. Same thing could happen if pg_resetwal was used to forcibly advance nextMulti. - In hot standby mode, a deadlock could happen where one backend waits for the next multixid assignment record, but WAL replay is not advancing because of a recovery conflict with the waiting backend. The old TAP test used carefully placed injection points to exercise the old waiting code, but now that the waiting code is gone, much of the old test is no longer relevant. Rewrite the test to reproduce the IPC/MultixactCreation hang after crash recovery instead, and to verify that previously recorded multixids stay readable. Backpatch to all supported versions. In back-branches, we still need to be able to read WAL that was generated before this fix, so in the back-branches this includes a hack to initialize the next offsets page when replaying XLOG_MULTIXACT_CREATE_ID for the last multixid on a page. On 'master', bump XLOG_PAGE_MAGIC instead to indicate that the WAL is not compatible. Author: Andrey Borodin <amborodin@acm.org> Reviewed-by: Dmitry Yurichev <dsy.075@yandex.ru> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Reviewed-by: Ivan Bykov <i.bykov@modernsys.ru> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/172e5723-d65f-4eec-b512-14beacb326ce@yandex.ru Backpatch-through: 14	2025-12-03 19:15:08 +02:00
Peter Eisentraut	9790affcce	Fix stray references to SubscriptRef This type never existed. SubscriptingRef was meant instead. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/2eaa45e3-efc5-4d75-b082-f8159f51445f%40eisentraut.org	2025-12-03 14:44:14 +01:00
Peter Eisentraut	1b2bb5077e	Change Pointer to void * The comment for the Pointer type said 'XXX Pointer arithmetic is done with this, so it can't be void * under "true" ANSI compilers.'. This has been fixed in the previous commit 756a4368932. This now changes the definition of the type from char * to void *, as envisaged by that comment. Extension code that relies on using Pointer for pointer arithmetic will need to make changes similar to commit 756a4368932, but those changes would be backward compatible. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Bertrand Drouvot <bertranddrouvot.pg@gmail.com> Discussion: https://www.postgresql.org/message-id/4154950a-47ae-4223-bd01-1235cc50e933%40eisentraut.org	2025-12-03 10:22:17 +01:00
Nathan Bossart	f894acb24a	Show size of DSAs and dshashes in pg_dsm_registry_allocations. Presently, this view reports NULL for the size of DSAs and dshash tables because 1) the current backend might not be attached to them and 2) the registry doesn't save the pointers to the dsa_area or dshash_table in local memory. Also, the view doesn't show partially-initialized entries to avoid ambiguity, since those entries would report a NULL size as well. This commit introduces a function that looks up the size of a DSA given its handle (transiently attaching to the control segment if needed) and teaches pg_dsm_registry_allocations to use it to show the size of successfully-initialized DSA and dshash entries. Furthermore, the view now reports partially-initialized entries with a NULL size. Reviewed-by: Rahila Syed <rahilasyed90@gmail.com> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/aSeEDeznAsHR1_YF%40nathan	2025-12-02 10:29:45 -06:00
Michael Paquier	713d9a847e	Update some timestamp[tz] functions to use soft-error reporting This commit updates two functions that convert "timestamptz" to "timestamp", and vice-versa, to use the soft error reporting rather than a their own logic to do the same. These are now named as follows: - timestamp2timestamptz_safe() - timestamptz2timestamp_safe() These functions were suffixed with "_opt_overflow", previously. This shaves some code, as it is possible to detect how a timestamp[tz] overflowed based on the returned value rather than a custom state. It is optionally possible for the callers of these functions to rely on the error generated internally by these functions, depending on the error context. Similar work has been done in d03668ea0566 and 4246a977bad6. Reviewed-by: Amul Sul <sulamul@gmail.com> Discussion: https://postgr.es/m/aS09YF2GmVXjAxbJ@paquier.xyz	2025-12-02 09:30:23 +09:00
Jeff Davis	19b966243c	Make regex "max_chr" depend on encoding, not provider. The regex mechanism scans through the first "max_chr" character values to cache character property ranges (isalpha, etc.). For single-byte encodings, there's no sense in scanning beyond UCHAR_MAX; but for UTF-8 it makes sense to cache higher code point values (though not all of them; only up to MAX_SIMPLE_CHR). Prior to 5a38104b36, the logic about how many character values to scan was based on the pg_regex_strategy, which was dependent on the provider. Commit 5a38104b36 preserved that logic exactly, allowing different providers to define the "max_chr". Now, change it to depend only on the encoding and whether ctype_is_c. For this specific calculation, distinguishing between providers creates more complexity than it's worth. Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-12-01 11:06:17 -08:00
Michael Paquier	a87987cafc	Move WAL sequence code into its own file This split exists for most of the other RMGRs, and makes cleaner the separation between the WAL code, the redo code and the record description code (already in its own file) when it comes to the sequence RMGR. The redo and masking routines are moved to a new file, sequence_xlog.c. All the RMGR routines are now located in a new header, sequence_xlog.h. This separation is useful for a different patch related to sequences that I have been working on, where it makes a refactoring of sequence.c easier if its RMGR routines and its core routines are split. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/aSfTxIWjiXkTKh1E@paquier.xyz	2025-12-01 16:21:41 +09:00
Michael Paquier	d03668ea05	Switch some date/timestamp functions to use the soft error reporting This commit changes some functions related to the data types date and timestamp to use the soft error reporting rather than a custom boolean flag called "overflow", used to let the callers of these functions know if an overflow happens. This results in the removal of some boilerplate code, as it is possible to rely on an error context rather than a custom state, with the possibility to use the error generated inside the functions updated here, if necessary. These functions were suffixed with "_opt_overflow". They are now renamed to use "_safe" as suffix. This work is similar to 4246a977bad6. Author: Amul Sul <sulamul@gmail.com> Reviewed-by: Amit Langote <amitlangote09@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAAJ_b95HEmFyzHZfsdPquSHeswcopk8MCG1Q_vn4tVkZ+xxofw@mail.gmail.com	2025-12-01 15:22:20 +09:00
Peter Eisentraut	8b3e2c622a	Fix pg_isblank() There was a pg_isblank() function that claimed to be a replacement for the standard isblank() function, which was thought to be "not very portable yet". We can now assume that it's portable (it's in C99). But pg_isblank() actually diverged from the standard isblank() by also accepting '\r', while the standard one only accepts space and tab. This was added to support parsing pg_hba.conf under Windows. But the hba parsing code now works completely differently and already handles line endings before we get to pg_isblank(). The other user of pg_isblank() is for ident protocol message parsing, which also handles '\r' separately. So this behavior is now obsolete and confusing. To improve clarity, I separated those concerns. The ident parsing now gets its own function that hardcodes the whitespace characters mentioned by the relevant RFC. pg_isblank() is now static in hba.c and is a wrapper around the standard isblank(), with some extra logic to ensure robust treatment of non-ASCII characters. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org	2025-11-28 08:33:07 +01:00
Amit Kapila	e68b6adad9	Add slotsync_skip_reason column to pg_replication_slots view. Introduce a new column, slotsync_skip_reason, in the pg_replication_slots view. This column records the reason why the last slot synchronization was skipped. It is primarily relevant for logical replication slots on standby servers where the 'synced' field is true. The value is NULL when synchronization succeeds. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Hou Zhijie <houzj.fnst@fujitsu.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-11-28 05:21:35 +00:00
Michael Paquier	9660906dbd	Add routines for marking buffers dirty efficiently This commit introduces new internal bufmgr routines for marking shared buffers as dirty: * MarkDirtyUnpinnedBuffer() * MarkDirtyRelUnpinnedBuffers() * MarkDirtyAllUnpinnedBuffers() These functions provide an efficient mechanism to respectively mark one buffer, all the buffers of a relation, or the entire shared buffer pool as dirty, something that can be useful to force patterns for the checkpointer. MarkDirtyUnpinnedBufferInternal(), an extra routine, is used by these three, to mark as dirty an unpinned buffer. They are intended as developer tools to manipulate buffer dirtiness in bulk, and will be used in a follow-up commit. Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Aidar Imamov <a.imamov@postgrespro.ru> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Reviewed-by: Joseph Koshakow <koshy44@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Reviewed-by: Yuhang Qiu <iamqyh@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com> Discussion: https://postgr.es/m/CAN55FZ0h_YoSqqutxV6DES1RW8ig6wcA8CR9rJk358YRMxZFmw@mail.gmail.com	2025-11-28 07:39:33 +09:00
Peter Eisentraut	e7075a3405	Use C11 alignas in pg_atomic_uint64 definitions They were already using pg_attribute_aligned. This replaces that with alignas and moves that into the required syntactic position. This ends up making these three atomics implementations appear a bit more consistent, but shouldn't change anything otherwise. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-27 07:53:34 +01:00
David Rowley	0ca3b16973	Add parallelism support for TID Range Scans In v14, bb437f995 added support for scanning for ranges of TIDs using a dedicated executor node for the purpose. Here, we allow these scans to be parallelized. The range of blocks to scan is divvied up similarly to how a Parallel Seq Scans does that, where 'chunks' of blocks are allocated to each worker and the size of those chunks is slowly reduced down to 1 block per worker by the time we're nearing the end of the scan. Doing that means workers finish at roughly the same time. Allowing TID Range Scans to be parallelized removes the dilemma from the planner as to whether a Parallel Seq Scan will cost less than a non-parallel TID Range Scan due to the CPU concurrency of the Seq Scan (disk costs are not divided by the number of workers). It was possible the planner could choose the Parallel Seq Scan which would result in reading additional blocks during execution than the TID Scan would have. Allowing Parallel TID Range Scans removes the trade-off the planner makes when choosing between reduced CPU costs due to parallelism vs additional I/O from the Parallel Seq Scan due to it scanning blocks from outside of the required TID range. There is also, of course, the traditional parallelism performance benefits to be gained as well, which likely doesn't need to be explained here. Author: Cary Huang <cary.huang@highgo.ca> Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: Rafia Sabih <rafia.pghackers@gmail.com> Reviewed-by: Steven Niu <niushiji@gmail.com> Discussion: https://postgr.es/m/18f2c002a24.11bc2ab825151706.3749144144619388582@highgo.ca	2025-11-27 14:05:04 +13:00
David Rowley	42473b3b31	Have the planner replace COUNT(ANY) with COUNT(), when possible This adds SupportRequestSimplifyAggref to allow pg_proc.prosupport functions to receive an Aggref and allow them to determine if there is a way that the Aggref call can be optimized. Also added is a support function to allow transformation of COUNT(ANY) into COUNT(). This is possible to do when the given "ANY" cannot be NULL and also that there are no ORDER BY / DISTINCT clauses within the Aggref. This is a useful transformation to do as it is common that people write COUNT(1), which until now has added unneeded overhead. When counting a NOT NULL column. The overheads can be worse as that might mean deforming more of the tuple, which for large fact tables may be many columns in. It may be possible to add prosupport functions for other aggregates. We could consider if ORDER BY could be dropped for some calls, e.g. the ORDER BY is quite useless in MAX(c ORDER BY c). There is a little bit of passing fallout from adjusting expr_is_nonnullable() to handle Const which results in a plan change in the aggregates.out regression test. Previously, nothing was able to determine that "One-Time Filter: (100 IS NOT NULL)" was always true, therefore useless to include in the plan. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Matheus Alcantara <matheusssilv97@gmail.com> Discussion: https://postgr.es/m/CAApHDvqGcPTagXpKfH=CrmHBqALpziThJEDs_MrPqjKVeDF9wA@mail.gmail.com	2025-11-27 10:43:28 +13:00
Jeff Davis	8d299052fe	Add #define for UNICODE_CASEMAP_BUFSZ. Useful for mapping a single codepoint at a time into a statically-allocated buffer. Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-11-26 10:05:11 -08:00
Jeff Davis	ec4997a9d7	Inline pg_ascii_tolower() and pg_ascii_toupper(). Discussion: https://postgr.es/m/450ceb6260cad30d7afdf155d991a9caafee7c0d.camel@j-davis.com Reviewed-by: Chao Li <li.evan.chao@gmail.com>	2025-11-26 10:04:32 -08:00
Peter Eisentraut	8fe4aef829	Replace internal C function pg_hypot() by standard hypot() The code comment said, "It is expected that this routine will eventually be replaced with the C99 hypot() function.", so let's do that now. This function is tested via the geometry regression test, so if it is faulty on any platform, it will show up there. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/170308e6-a7a3-4484-87b2-f960bb564afa%40eisentraut.org	2025-11-26 07:48:29 +01:00
Amit Kapila	76b78721ca	Add slotsync skip statistics. This patch adds two new columns to the pg_stat_replication_slots view: slotsync_skip_count - the total number of times a slotsync operation was skipped. slotsync_skip_at - the timestamp of the most recent skip. These additions provide better visibility into replication slot synchronization behavior. A future patch will introduce the slotsync_skip_reason column in pg_replication_slots to capture the reason for skip. Author: Shlok Kyal <shlok.kyal.oss@gmail.com> Reviewed-by: shveta malik <shveta.malik@gmail.com> Reviewed-by: Hayato Kuroda <kuroda.hayato@fujitsu.com> Reviewed-by: Ashutosh Sharma <ashu.coek88@gmail.com> Reviewed-by: Amit Kapila <amit.kapila16@gmail.com> Discussion: https://postgr.es/m/CAE9k0PkhfKrTEAsGz4DjOhEj1nQ+hbQVfvWUxNacD38ibW3a1g@mail.gmail.com	2025-11-25 07:06:02 +00:00
Michael Paquier	ed823da128	Rename routines for write/read of pgstats file This commit renames write_chunk and read_chunk to respectively pgstat_write_chunk() and pgstat_read_chunk(), along with the *_s convenience macros. These are made available for plug-ins, so as any code that decides to write and/or read stats data can rely on a single code path for this work. Extracted from a larger patch by the same author. Author: Sami Imseih <samimseih@gmail.com> Discussion: https://postgr.es/m/CAA5RZ0s9SDOu+Z6veoJCHWk+kDeTktAtC-KY9fQ9Z6BJdDUirQ@mail.gmail.com	2025-11-25 10:55:40 +09:00
Tom Lane	698fa924b1	Improve detection of implicitly-temporary views. We've long had a practice of making views temporary by default if they reference any temporary tables. However the implementation was pretty incomplete, in that it only searched for RangeTblEntry references to temp relations. Uses of temporary types, regclass constants, etc were not detected even though the dependency mechanism considers them grounds for dropping the view. Thus a view not believed to be temp could silently go away at session exit anyhow. To improve matters, replace the ad-hoc isQueryUsingTempRelation() logic with use of the dependency-based infrastructure introduced by commit 572c40ba9. This is complete by definition, and it's less code overall. While we're at it, we can also extend the warning NOTICE (or ERROR in the case of a materialized view) to mention one of the temp objects motivating the classification of the view as temp, as was done for functions in 572c40ba9. Author: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de	2025-11-24 17:00:16 -05:00
Jacob Champion	0664aa4ff8	Reorganize pqcomm.h a bit Group the PG_PROTOCOL() codes, add a comment to AuthRequest now that the AUTH_REQ codes live in a different header, and make some small adjustments to spacing and comment style for the sake of scannability. Reviewed-by: Jelte Fennema-Nio <postgres@jeltef.nl> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CAOYmi%2B%3D6zg4oXXOQtifrVao_YKiujTDa3u6bxnU08r0FsSig4g%40mail.gmail.com	2025-11-24 10:01:30 -08:00
Jacob Champion	8934f2136c	Add pg_add_size_overflow() and friends Commit 600086f47 added (several bespoke copies of) size_t addition with overflow checks to libpq. Move this to common/int.h, along with its subtraction and multiplication counterparts. pg_neg_size_overflow() is intentionally omitted; I'm not sure we should add SSIZE_MAX to win32_port.h for the sake of a function with no callers. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CAOYmi%2B%3D%2BpqUd2MUitvgW1pAJuXgG_TKCVc3_Ek7pe8z9nkf%2BAg%40mail.gmail.com	2025-11-24 09:59:38 -08:00
Peter Eisentraut	d4c0f91f7d	C11 alignas instead of unions -- extended alignments This replaces some uses of pg_attribute_aligned() with the standard alignas() for cases where extended alignment (larger than max_align_t) is required. This patch stipulates that all supported compilers must support alignments up to PG_IO_ALIGN_SIZE, but that seems pretty likely. We can then also desupport the case where direct I/O is disabled because pg_attribute_aligned is not supported. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-24 07:39:37 +01:00
David Rowley	07d1dc3aeb	Fix incorrect IndexOptInfo header comment The comment incorrectly indicated that indexcollations[] stored collations for both key columns and INCLUDE columns, but in reality it only has elements for the key columns. canreturn[] didn't get a mention, so add that while we're here. Author: Junwang Zhao <zhjwpku@gmail.com> Reviewed-by: David Rowley <dgrowleyml@gmail.com> Discussion: https://postgr.es/m/CAEG8a3LwbZgMKOQ9CmZarX5DEipKivdHp5PZMOO-riL0w%3DL%3D4A%40mail.gmail.com Backpatch-through: 14	2025-11-24 17:00:01 +13:00
Tom Lane	572c40ba94	Issue a NOTICE if a created function depends on any temp objects. We don't have an official concept of temporary functions. (You can make one explicitly in pg_temp, but then you have to explicitly schema-qualify it on every call.) However, until now we were quite laissez-faire about whether a non-temporary function could depend on a temporary object, such as a temp table or view. If one does, it will silently go away at end of session, due to the automatic DROP ... CASCADE on the session's temporary objects. People have complained that that's surprising; however, we can't really forbid it because other people (including our own regression tests) rely on being able to do it. Let's compromise by emitting a NOTICE at CREATE FUNCTION time. This is somewhat comparable to our ancient practice of emitting a NOTICE when forcing a view to become temp because it depends on temp tables. Along the way, refactor recordDependencyOnExpr() so that the dependencies of an expression can be combined with other dependencies, instead of being emitted separately and perhaps duplicatively. We should probably make the implementation of temp-by-default views use the same infrastructure used here, but that's for another patch. It's unclear whether there are any other object classes that deserve similar treatment. Author: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/19cf6ae1-04cd-422c-a760-d7e75fe6cba9@uni-muenster.de	2025-11-23 15:02:55 -05:00
Tom Lane	b140c8d7a3	Add SupportRequestInlineInFrom planner support request. This request allows a support function to replace a function call appearing in FROM (typically a set-returning function) with an equivalent SELECT subquery. The subquery will then be subject to the planner's usual optimizations, potentially allowing a much better plan to be generated. While the planner has long done this automatically for simple SQL-language functions, it's now possible for extensions to do it for functions outside that group. Notably, this could be useful for functions that are presently implemented in PL/pgSQL and work by generating and then EXECUTE'ing a SQL query. Author: Paul A Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://postgr.es/m/09de6afa-c33d-4d94-a5cb-afc6cea0d2bb@illuminatedcomputing.com	2025-11-22 19:33:34 -05:00
Peter Eisentraut	5eed8ce50c	Add range_minus_multi and multirange_minus_multi functions The existing range_minus function raises an exception when the range is "split", because then the result can't be represented by a single range. For example '[0,10)'::int4range - '[4,5)' would be '[0,4)' and '[5,10)'. This commit adds new set-returning functions so that callers can get results even in the case of splits. There is no risk of an exception for multiranges, but a set-returning function lets us handle them the same way we handle ranges. Both functions return zero results if the subtraction would give an empty range/multirange. The main use-case for these functions is to implement UPDATE/DELETE FOR PORTION OF, which must compute the application-time of "temporal leftovers": the part of history in an updated/deleted row that was not changed. To preserve the untouched history, we will implicitly insert one record for each result returned by range/multirange_minus_multi. Using a set-returning function will also let us support user-defined types for application-time update/delete in the future. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/ec498c3d-5f2b-48ec-b989-5561c8aa2024%40illuminatedcomputing.com	2025-11-22 09:42:03 +01:00
Peter Eisentraut	97e04c74be	C11 alignas instead of unions This changes a few union members that only existed to ensure alignments and replaces them with the C11 alignas specifier. This change only uses fundamental alignments (meaning approximately alignments of basic types), which all C11 compilers must support. There are opportunities for similar changes using extended alignments, for example in PGIOAlignedBlock, but these are not necessarily supported by all compilers, so they are kept as a separate change. Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-21 10:08:24 +01:00
Melanie Plageman	1937ed7062	Refactor heap_page_prune_and_freeze() parameters into a struct heap_page_prune_and_freeze() had accumulated an unwieldy number of input parameters and upcoming work to handle VM updates in this function will add even more. Introduce a new PruneFreezeParams struct to group the function’s input parameters, improving readability and maintainability. Author: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Kirill Reshke <reshkekirill@gmail.com> Discussion: https://postgr.es/m/yn4zp35kkdsjx6wf47zcfmxgexxt4h2og47pvnw2x5ifyrs3qc%407uw6jyyxuyf7	2025-11-20 10:32:14 -05:00
Peter Eisentraut	300c8f5324	Add <stdalign.h> to c.h This allows using the C11 constructs alignas and alignof (not done in this patch). Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/46f05236-d4d4-4b4e-84d4-faa500f14691%40eisentraut.org	2025-11-19 08:18:25 +01:00
Alexander Korotkov	75e82b2f5a	Optimize shared memory usage for WaitLSNProcInfo We need separate pairing heaps for different WaitLSNType's, because there might be waiters for different LSN's at the same time. However, one process can wait only for one type of LSN at a time. So, no need for inHeap and heapNode fields to be arrays. Discussion: https://postgr.es/m/CAPpHfdsBR-7sDtXFJ1qpJtKiohfGoj%3DvqzKVjWxtWsWidx7G_A%40mail.gmail.com Author: Alexander Korotkov <aekorotkov@gmail.com> Reviewed-by: Xuneng Zhou <xunengzhou@gmail.com>	2025-11-18 09:50:12 +02:00
Amit Kapila	3edaf29fa5	Rename two columns in pg_stat_subscription_stats. This patch renames the sync_error_count column to sync_table_error_count in the pg_stat_subscription_stats view. The new name makes the purpose explicit now that a separate column exists to track sequence synchronization errors. Additionally, the column seq_sync_error_count is renamed to sync_seq_error_count to maintain a consistent naming pattern, making it easier for users to group, and query synchronization related counters. Author: Vignesh C <vignesh21@gmail.com> Reviewed-by: Peter Smith <smithpb2250@gmail.com> Reviewed-by: Chao Li <li.evan.chao@gmail.com> Discussion: https://postgr.es/m/CALDaNm3WwJmz=-4ybTkhniB-Nf3qmFG9Zx1uKjyLLoPF5NYYXA@mail.gmail.com	2025-11-18 03:58:55 +00:00
Michael Paquier	e76defbcf0	Rework output format of pg_dependencies The existing format of pg_dependencies uses a single-object JSON structure, with each key value embedding all the knowledge about the set attributes tracked, like: {"1 => 5": 1.000000, "5 => 1": 0.423130} While this is a very compact format, it is confusing to read and it is difficult to manipulate the values within the object, particularly when tracking multiple attributes. The new output format introduced in this commit is a JSON array of objects, with: - A key named "degree", with a float value. - A key named "attributes", with an array of attribute numbers. - A key named "dependency", with an attribute number. The values use the same underlying type as previously when printed, with a new output format that shows now as follows: [{"degree": 1.000000, "attributes": [1], "dependency": 5}, {"degree": 0.423130, "attributes": [5], "dependency": 1}] This new format will become handy for a follow-up set of changes, so as it becomes possible to inject extended statistics rather than require an ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new cluster. This format has been suggested by Tomas Vondra. The key names are defined in the header introduced by 1f927cce4498, to ease the integration of frontend-specific changes that are still under discussion. (Again a personal note: if anybody comes up with better name for the keys, of course feel free.) The bulk of the changes come from the regression tests, where jsonb_pretty() is now used to make the outputs generated easier to parse. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2025-11-17 10:44:26 +09:00
Michael Paquier	1f927cce44	Rework output format of pg_ndistinct The existing format of pg_ndistinct uses a single-object JSON structure where each key is itself a comma-separated list of attnums, like: {"3, 4": 11, "3, 6": 11, "4, 6": 11, "3, 4, 6": 11} While this is a very compact format, it is confusing to read and it is difficult to manipulate the values within the object. The new output format introduced in this commit is an array of objects, with: - A key named "attributes", that contains an array of attribute numbers. - A key named "ndistinct", represented as an integer. The values use the same underlying type as previously when printed, with a new output format that shows now as follows: [{"ndistinct": 11, "attributes": [3,4]}, {"ndistinct": 11, "attributes": [3,6]}, {"ndistinct": 11, "attributes": [4,6]}, {"ndistinct": 11, "attributes": [3,4,6]}] This new format will become handy for a follow-up set of changes, so as it becomes possible to inject extended statistics rather than require an ANALYZE, like in a dump/restore sequence or after pg_upgrade on a new cluster. This format has been suggested by Tomas Vondra. The key names are defined in a new header, to ease with the integration of frontend-specific changes that are still under discussion. (Personal note: I am not specifically wedded to these key names, but if there are better name suggestions for this release, feel free.) The bulk of the changes come from the regression tests, where jsonb_pretty() is now used to make the outputs generated easier to parse. Author: Corey Huinker <corey.huinker@gmail.com> Reviewed-by: Jian He <jian.universality@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/CADkLM=dpz3KFnqP-dgJ-zvRvtjsa8UZv8wDAQdqho=qN3kX0Zg@mail.gmail.com	2025-11-17 09:52:20 +09:00
David Rowley	586d63214e	Adjust MemSet macro to use size_t rather than long Likewise for MemSetAligned. "long" wasn't the most suitable type for these macros as with MSVC in 64-bit builds, sizeof(long) == 4, which is narrower than the processor's word size, therefore these macros had to perform twice as many loops as they otherwise might. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com	2025-11-17 12:27:00 +13:00
David Rowley	9c047da51f	Get rid of long datatype in CATCACHE_STATS enabled builds "long" is 32 bits on Windows 64-bit. Switch to a datatype that's 64-bit on all platforms. While we're there, use an unsigned type as these fields count things that have occurred, of which it's not possible to have negative numbers of. Author: David Rowley <dgrowleyml@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CAApHDvoGFjSA3aNyVQ3ivbyc4ST=CC5L-_VjEUQ92HbE2Cxovg@mail.gmail.com	2025-11-17 12:26:41 +13:00
Alexander Korotkov	23792d7381	Fix incorrect function name in comments Update comments to reference WaitForLSN() instead of the outdated WaitForLSNReplay() function name. Discussion: https://postgr.es/m/CABPTF7UieOYbOgH3EnQCasaqcT1T4N6V2wammwrWCohQTnD_Lw%40mail.gmail.com Author: Xuneng Zhou <xunengzhou@gmail.com> Reviewed-by: Alexander Korotkov <aekorotkov@gmail.com>	2025-11-15 12:27:42 +02:00
Michael Paquier	84fb27511d	Replace off_t by pgoff_t in I/O routines PostgreSQL's Windows port has never been able to handle files larger than 2GB due to the use of off_t for file offsets, only 32-bit on Windows. This causes signed integer overflow at exactly 2^31 bytes when trying to handle files larger than 2GB, for the routines touched by this commit. Note that large files are forbidden by ./configure (3c6248a828af) and meson (recent change, see 79cd66f28c65). This restriction also exists in v16 and older versions for the now-dead MSVC scripts. The code base already defines pgoff_t as __int64 (64-bit) on Windows for this purpose, and some function declarations in headers use it, but many internals still rely on off_t. This commit switches more routines to use pgoff_t, offering more portability, for areas mainly related to file extensions and storage. These are not critical for WAL segments yet, which have currently a maximum size allowed of 1GB (well, this opens the door at allowing a larger size for them). This matters more for segment files if we want to lift the large file restriction in ./configure and meson in the future, which would make sense to remove once/if all traces of off_t are gone from the tree. This can additionally matter for out-of-core code that may want files larger than 2GB in places where off_t is four bytes in size. Note that off_t is still used in other parts of the tree like buffile.c, WAL sender/receiver, base backup, pg_combinebackup, etc. These other code paths can be addressed separately, and their update will be required if we want to remove the large file restriction in the future. This commit is a good first cut in itself towards more portability, hopefully. On Unix-like systems, pgoff_t is defined as off_t, so this change only affects Windows behavior. Author: Bryan Green <dbryan.green@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/0f238ff4-c442-42f5-adb8-01b762c94ca1@gmail.com	2025-11-13 12:41:40 +09:00
Heikki Linnakangas	8eeb4a0f7c	Fix bug where we truncated CLOG that was still needed by LISTEN/NOTIFY The async notification queue contains the XID of the sender, and when processing notifications we call TransactionIdDidCommit() on the XID. But we had no safeguards to prevent the CLOG segments containing those XIDs from being truncated away. As a result, if a backend didn't for some reason process its notifications for a long time, or when a new backend issued LISTEN, you could get an error like: test=# listen c21; ERROR: 58P01: could not access status of transaction 14279685 DETAIL: Could not open file "pg_xact/000D": No such file or directory. LOCATION: SlruReportIOError, slru.c:1087 To fix, make VACUUM "freeze" the XIDs in the async notification queue before truncating the CLOG. Old XIDs are replaced with FrozenTransactionId or InvalidTransactionId. Note: This commit is not a full fix. A race condition remains, where a backend is executing asyncQueueReadAllNotifications() and has just made a local copy of an async SLRU page which contains old XIDs, while vacuum concurrently truncates the CLOG covering those XIDs. When the backend then calls TransactionIdDidCommit() on those XIDs from the local copy, you still get the error. The next commit will fix that remaining race condition. This was first reported by Sergey Zhuravlev in 2021, with many other people hitting the same issue later. Thanks to: - Alexandra Wang, Daniil Davydov, Andrei Varashen and Jacques Combrink for investigating and providing reproducable test cases, - Matheus Alcantara and Arseniy Mukhin for review and earlier proposed patches to fix this, - Álvaro Herrera and Masahiko Sawada for reviews, - Yura Sokolov aka funny-falcon for the idea of marking transactions as committed in the notification queue, and - Joel Jacobson for the final patch version. I hope I didn't forget anyone. Backpatch to all supported versions. I believe the bug goes back all the way to commit d1e027221d, which introduced the SLRU-based async notification queue. Discussion: https://www.postgresql.org/message-id/16961-25f29f95b3604a8a@postgresql.org Discussion: https://www.postgresql.org/message-id/18804-bccbbde5e77a68c2@postgresql.org Discussion: https://www.postgresql.org/message-id/CAK98qZ3wZLE-RZJN_Y%2BTFjiTRPPFPBwNBpBi5K5CU8hUHkzDpw@mail.gmail.com Backpatch-through: 14	2025-11-12 20:59:36 +02:00
Álvaro Herrera	877a024902	Split out innards of pg_tablespace_location() This creates a src/backend/catalog/pg_tablespace.c supporting file containing a new function get_tablespace_location(), which lets the code underlying pg_tablespace_location() be reused for other purposes. Author: Manni Wood <manni.wood@enterprisedb.com> Author: Nishant Sharma <nishant.sharma@enterprisedb.com> Reviewed-by: Vaibhav Dalvi <vaibhav.dalvi@enterprisedb.com> Reviewed-by: Ian Lawrence Barwick <barwick@gmail.com> Reviewed-by: Jim Jones <jim.jones@uni-muenster.de> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://postgr.es/m/CAKWEB6rmnmGKUA87Zmq-s=b3Scsnj02C0kObQjnbL2ajfPWGEw@mail.gmail.com	2025-11-12 16:39:55 +01:00
Peter Eisentraut	d2f24df19b	Clean up qsort comparison function for GUC entries guc_var_compare() is invoked from qsort() on an array of struct config_generic, but the function accesses these directly as strings (char *). This relies on the name being the first field, so this works. But we can write this more clearly by using the struct and then accessing the field through the struct. Before the reorganization of the GUC structs (commit a13833c35f9), the old code was probably more convenient, but now we can write this more clearly and correctly. After this change, it is no longer required that the name is the first field in struct config_generic, so remove that comment. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/2c961fa1-14f6-44a2-985c-e30b95654e8d%40eisentraut.org	2025-11-11 07:55:10 +01:00
Heikki Linnakangas	e510378358	Bump PG_CONTROL_VERSION for commit 3e0ae46d90 Commit 3e0ae46d90 added a field to ControlFileData and bumped CATALOG_VERSION_NO, but CATALOG_VERSION_NO is not the right version number for ControlFileData changes. Bumping either one will force an initdb, but PG_CONTROL_VERSION is more accurate. Bump PG_CONTROL_VERSION now. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/1874404.1762787779@sss.pgh.pa.us	2025-11-10 19:12:43 +02:00
Nathan Bossart	5e4fcbe531	Check for CREATE privilege on the schema in CREATE STATISTICS. This omission allowed table owners to create statistics in any schema, potentially leading to unexpected naming conflicts. For ALTER TABLE commands that require re-creating statistics objects, skip this check in case the user has since lost CREATE on the schema. The addition of a second parameter to CreateStatistics() breaks ABI compatibility, but we are unaware of any impacted third-party code. Reported-by: Jelte Fennema-Nio <postgres@jeltef.nl> Author: Jelte Fennema-Nio <postgres@jeltef.nl> Co-authored-by: Nathan Bossart <nathandbossart@gmail.com> Reviewed-by: Noah Misch <noah@leadboat.com> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Security: CVE-2025-12817 Backpatch-through: 13	2025-11-10 09:00:00 -06:00
Heikki Linnakangas	3e0ae46d90	Move SLRU_PAGES_PER_SEGMENT to pg_config_manual.h It seems plausible that someone might want to experiment with different values. The pressing reason though is that I'm reviewing a patch that requires pg_upgrade to manipulate SLRU files. That patch needs to access SLRU_PAGES_PER_SEGMENT from pg_upgrade code, and slru.h, where SLRU_PAGES_PER_SEGMENT is currently defined, cannot be included from frontend code. Moving it to pg_config_manual.h makes it accessible. Now that it's a little more likely that someone might change SLRU_PAGES_PER_SEGMENT, add a cluster compatibility check for it. Bump catalog version because of the new field in the control file. Reviewed-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Álvaro Herrera <alvherre@kurilemu.de> Discussion: https://www.postgresql.org/message-id/c7a4ea90-9f7b-4953-81be-b3fcb47db057@iki.fi	2025-11-10 16:11:41 +02:00

1 2 3 4 5 ...

12537 Commits