postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-08 07:27:37 +08:00

Author	SHA1	Message	Date
Robert Haas	e62e73f3a2	gist: fix typo "split(t)ed" -> "split" Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org	2024-01-02 12:24:28 -05:00
Robert Haas	0d9937d118	Fix typos in comments and in one isolation test. Dagfinn Ilmari Mannsåker, reviewed by Shubham Khanna. Some subtractions by me. Discussion: http://postgr.es/m/87le9fmi01.fsf@wibble.ilmari.org	2024-01-02 12:05:41 -05:00
Peter Eisentraut	141752bbb0	Fix typos in simplehash.h Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/18252-d46d27900a277d87@postgresql.org	2024-01-02 10:18:47 +01:00
Amit Kapila	9a17be1e24	Allow upgrades to preserve the full subscription's state. This feature will allow us to replicate the changes on subscriber nodes after the upgrade. Previously, only the subscription metadata information was preserved. Without the list of relations and their state, it's not possible to re-enable the subscriptions without missing some records as the list of relations can only be refreshed after enabling the subscription (and therefore starting the apply worker). Even if we added a way to refresh the subscription while enabling a publication, we still wouldn't know which relations are new on the publication side, and therefore should be fully synced, and which shouldn't. To preserve the subscription relations, this patch teaches pg_dump to restore the content of pg_subscription_rel from the old cluster by using binary_upgrade_add_sub_rel_state SQL function. This is supported only in binary upgrade mode. The subscription's replication origin is needed to ensure that we don't replicate anything twice. To preserve the replication origins, this patch teaches pg_dump to update the replication origin along with creating a subscription by using binary_upgrade_replorigin_advance SQL function to restore the underlying replication origin remote LSN. This is supported only in binary upgrade mode. pg_upgrade will check that all the subscription relations are in 'i' (init) or in 'r' (ready) state and will error out if that's not the case, logging the reason for the failure. This helps to avoid the risk of any dangling slot or origin after the upgrade. Author: Vignesh C, Julien Rouhaud, Shlok Kyal Reviewed-by: Peter Smith, Masahiko Sawada, Michael Paquier, Amit Kapila, Hayato Kuroda Discussion: https://postgr.es/m/20230217075433.u5mjly4d5cr4hcfe@jrouhaud	2024-01-02 08:08:46 +05:30
Tomas Vondra	6c63bcbf3c	Minor cleanup of the BRIN parallel build code Commit b437571714 added support for parallel builds for BRIN indexes, using code similar to BTREE parallel builds, and also a new tuplesort variant. This commit simplifies the new code in two ways: * The "spool" grouping tuplesort and the heap/index is not necessary. The heap/index are available as separate arguments, causing confusion. So remove the spool, and use the tuplesort directly. * The new tuplesort variant does not need the heap/index, as it sorts simply by the range block number, without accessing the tuple data. So simplify that too. Initial report and patch by Ranier Vilela, further cleanup by me. Author: Ranier Vilela Discussion: https://postgr.es/m/CAEudQAqD7f2i4iyEaAz-5o-bf6zXVX-AkNUBm-YjUXEemaEh6A%40mail.gmail.com	2023-12-30 23:15:04 +01:00
Peter Eisentraut	a740b213d4	Add GUC backtrace_on_internal_error When enabled (default off), this logs a backtrace anytime elog() or an equivalent ereport() for internal errors is called. This is not well covered by the existing backtrace_functions, because there are many equally-worded low-level errors in many functions. And if you find out where the error is, then you need to manually rewrite the elog() to ereport() to attach the errbacktrace(), which is annoying. Having a backtrace automatically on every elog() call could be very helpful during development for various kinds of common errors from palloc, syscache, node support, etc. Discussion: https://www.postgresql.org/message-id/flat/ba76c6bc-f03f-4285-bf16-47759cfcab9e@eisentraut.org	2023-12-30 11:43:57 +01:00
Peter Eisentraut	c538592959	Make all Perl warnings fatal There are a lot of Perl scripts in the tree, mostly code generation and TAP tests. Occasionally, these scripts produce warnings. These are probably always mistakes on the developer side (true positives). Typical examples are warnings from genbki.pl or related when you make a mess in the catalog files during development, or warnings from tests when they massage a config file that looks different on different hosts, or mistakes during merges (e.g., duplicate subroutine definitions), or just mistakes that weren't noticed because there is a lot of output in a verbose build. This changes all warnings into fatal errors, by replacing use warnings; by use warnings FATAL => 'all'; in all Perl files. Discussion: https://www.postgresql.org/message-id/flat/06f899fd-1826-05ab-42d6-adeb1fd5e200%40eisentraut.org	2023-12-29 18:20:00 +01:00
Tom Lane	58054de2d0	Improve the implementation of information_schema._pg_expandarray(). This function was originally coded with a handmade expansion of the array subscripts. We can do it a little faster and far more legibly today, by using unnest() WITH ORDINALITY. While at it, let's apply the rowcount estimation support that exists for the underlying unnest() function: reduce the default ROWS estimate to 100 and attach array_unnest_support. I'm not sure that array_unnest_support can do anything useful today with the call sites that exist in information_schema, but it can't hurt, and the existing default rowcount of 1000 is surely much too high for any of these cases. The psql.sql regression script is using _pg_expandarray() as a test case for \sf+. While we could keep doing so, the new one-line function body makes a poor test case for \sf+ row-numbering, so switch it to print another information_schema function. Discussion: https://postgr.es/m/1424303.1703355485@sss.pgh.pa.us	2023-12-27 15:55:46 -05:00
Alexander Korotkov	7e6fb5da41	Improvements and fixes for e0b1ee17dc e0b1ee17dc introduced optimization for matching B-tree scan keys required for the directional scan. However, it incorrectly assumed that all keys required for opposite direction scan are satisfied by _bt_first(). It has been illustrated that with multiple scan keys over the same column, a lesser one (according to the scan direction) could win leaving the other one unsatisfied. Instead of relying on _bt_first() this commit introduces code that memorizes whether there was at least one match on the page. If that's true we know that keys required for opposite-direction scan are satisfied as soon as corresponding values are not NULLs. Also, this commit simplifies the description for the optimization of keys required for the current direction scan. Now the flag used for this is named continuescanPrechecked and means exactly that *continuescan flag is known to be true for the last item on the page. Reported-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzn0LeLcb1PdBnK0xisz8NpHkxRrMr3NWJ%2BKOK-WZ%2BQtTQ%40mail.gmail.com Reviewed-by: Pavel Borisov	2023-12-27 14:35:08 +02:00
Alexander Korotkov	06b10f80ba	Remove BTScanOpaqueData.firstPage It's not necessary to keep the firstPage flag as a field of BTScanOpaqueData. This commit makes it an argument of the _bt_readpage() function. We can easily distinguish first-time and repeated calls (within the scan) of this function. Reported-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzk4SOsw%2BtHuTFiz8U9Jqj-R77rYPkhWKODCBb1mdHACXA%40mail.gmail.com Reviewed-by: Pavel Borisov	2023-12-27 14:21:49 +02:00
Alexander Korotkov	7d58f2342b	REALLOCATE_BITMAPSETS manual compile-time option This option forces each bitmapset modification to reallocate bitmapset. This is useful for debugging hangling pointers to bitmapset's. Discussion: https://postgr.es/m/CAMbWs4_wJthNtYBL%2BSsebpgF-5L2r5zFFk6xYbS0A78GKOTFHw%40mail.gmail.com Reviewed-by: Richard Guo, Andres Freund, Ashutosh Bapat, Andrei Lepikhov	2023-12-27 03:57:57 +02:00
Alexander Korotkov	12915a58ee	Enhance checkpointer restartpoint statistics Bhis commit introduces enhancements to the pg_stat_checkpointer view by adding three new columns: restartpoints_timed, restartpoints_req, and restartpoints_done. These additions aim to improve the visibility and monitoring of restartpoint processes on replicas. Previously, it was challenging to differentiate between successful and failed restartpoint requests. This limitation arises because restartpoints on replicas are dependent on checkpoint records from the primary, and cannot occur more frequently than these checkpoints. The new columns allow for clear distinction and tracking of restartpoint requests, their triggers, and successful completions. This enhancement aids database administrators and developers in better understanding and diagnosing issues related to restartpoint behavior, particularly in scenarios where restartpoint requests may fail. System catalog is changed. Catversion is bumped. Discussion: https://postgr.es/m/99b2ccd1-a77a-962a-0837-191cdf56c2b9%40inbox.ru Author: Anton A. Melnikov Reviewed-by: Kyotaro Horiguchi, Alexander Korotkov	2023-12-25 01:12:36 +02:00
Robert Haas	49f2194ed5	Fix numerous typos in incremental backup commits. Apparently, spell check would have been a really good idea. Alexander Lakhin, with a few additions as per an off-list report from Andres Freund. Discussion: http://postgr.es/m/f08f7c60-1ad3-0b57-d580-54b11f07cddf@gmail.com	2023-12-21 15:36:17 -05:00
Robert Haas	dc21234005	Add support for incremental backup. To take an incremental backup, you use the new replication command UPLOAD_MANIFEST to upload the manifest for the prior backup. This prior backup could either be a full backup or another incremental backup. You then use BASE_BACKUP with the INCREMENTAL option to take the backup. pg_basebackup now has an --incremental=PATH_TO_MANIFEST option to trigger this behavior. An incremental backup is like a regular full backup except that some relation files are replaced with files with names like INCREMENTAL.${ORIGINAL_NAME}, and the backup_label file contains additional lines identifying it as an incremental backup. The new pg_combinebackup tool can be used to reconstruct a data directory from a full backup and a series of incremental backups. Patch by me. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Thanks especially to Jakub for incredibly helpful and extensive testing. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com	2023-12-20 09:49:12 -05:00
Robert Haas	174c480508	Add a new WAL summarizer process. When active, this process writes WAL summary files to $PGDATA/pg_wal/summaries. Each summary file contains information for a certain range of LSNs on a certain TLI. For each relation, it stores a "limit block" which is 0 if a relation is created or destroyed within a certain range of WAL records, or otherwise the shortest length to which the relation was truncated during that range of WAL records, or otherwise InvalidBlockNumber. In addition, it stores a list of blocks which have been modified during that range of WAL records, but excluding blocks which were removed by truncation after they were modified and never subsequently modified again. In other words, it tells us which blocks need to copied in case of an incremental backup covering that range of WAL records. But this doesn't yet add the capability to actually perform an incremental backup; the next patch will do that. A new parameter summarize_wal enables or disables this new background process. The background process also automatically deletes summary files that are older than wal_summarize_keep_time, if that parameter has a non-zero value and the summarizer is configured to run. Patch by me, with some design help from Dilip Kumar and Andres Freund. Reviewed by Matthias van de Meent, Dilip Kumar, Jakub Wartak, Peter Eisentraut, and Álvaro Herrera. Discussion: http://postgr.es/m/CA+TgmoYOYZfMCyOXFyC-P+-mdrZqm5pP2N7S-r0z3_402h9rsA@mail.gmail.com	2023-12-20 08:42:28 -05:00
Robert Haas	aafc07c7a1	Move src/bin/pg_verifybackup/parse_manifest.c into src/common. This makes it possible for the code to be easily reused by other client-side tools, and/or by the server. Patch by me. Review of this patch in particular by at least Peter Eisentraut; reviewers for the patch series in general include Dilip Kumar, Andres Fruend, David Steele, Álvaro Herrera, and Jakub Wartak. Discussion: http://postgr.es/m/CA+TgmoZ6UGZVnSy5iak6s6+AXu_DewXovDjhLs3-su6nmU_x_g@mail.gmail.com	2023-12-19 15:24:31 -05:00
Tom Lane	7e1ce2b3de	Prevent integer overflow when forming tuple width estimates. It's at least theoretically possible to overflow int32 when adding up column width estimates to make a row width estimate. (The bug example isn't terribly convincing as a real use-case, but perhaps wide joins would provide a more plausible route to trouble.) This'd lead to assertion failures or silly planner behavior. To forestall it, make the relevant functions compute their running sums in int64 arithmetic and then clamp to int32 range at the end. We can reasonably assume that MaxAllocSize is a hard limit on actual tuple width, so clamping to that is simply a correction for dubious input values, and there's no need to go as far as widening width variables to int64 everywhere. Per bug #18247 from RekGRpth. There've been no reports of this issue arising in practical cases, so I feel no need to back-patch. Richard Guo and Tom Lane Discussion: https://postgr.es/m/18247-11ac477f02954422@postgresql.org	2023-12-19 11:12:16 -05:00
Peter Eisentraut	2a607fb822	Update comment for Cardinality typedef Author: Richard Guo <guofenglinux@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CAMbWs4-Zd7Yy80RL1NdskLLo-wz6QoqsbC5TKs%3D3yZxG3BT_aA%40mail.gmail.com	2023-12-19 14:58:47 +01:00
Heikki Linnakangas	3c080fb4fa	Simplify newNode() by removing special cases - Remove MemoryContextAllocZeroAligned(). It was supposed to be a faster version of MemoryContextAllocZero(), but modern compilers turn the MemSetLoop() into a call to memset() anyway, making it more or less identical to MemoryContextAllocZero(). That was the only user of MemSetTest, MemSetLoop, so remove those too, as well as palloc0fast(). - Convert newNode() to a static inline function. When this was originally originally written, it was written as a macro because testing showed that gcc didn't inline the size check as we intended. Modern compiler versions do, and now that it just calls palloc0() there is no size-check to inline anyway. One nice effect is that the palloc0() takes one less argument than MemoryContextAllocZeroAligned(), which saves a few instructions in the callers of newNode(). Reviewed-by: Peter Eisentraut, Tom Lane, John Naylor, Thomas Munro Discussion: https://www.postgresql.org/message-id/b51f1fa7-7e6a-4ecc-936d-90a8a1659e7c@iki.fi	2023-12-19 12:11:47 +02:00
Tom Lane	8b965c549d	compute_bitmap_pages' loop_count parameter should be double not int. The value was double in the original implementation of this logic. Commit da08a6598 pulled it out into a subroutine, but carelessly declared the parameter as int when it should have been double. On most platforms, the only ill effect would be to clamp the value to be not more than INT_MAX, which would seldom be exceeded and probably wouldn't change the estimates too much anyway. Nonetheless, it's wrong and can cause complaints from ubsan. While here, improve the comments and parameter names. This is an ABI change in a globally exposed subroutine, so back-patching would create some risk of breaking extensions. The value of the fix doesn't seem high enough to warrant taking that risk, so fix in HEAD only. Per report from Alexander Lakhin. Discussion: https://postgr.es/m/f5e15fe1-202d-1936-f47c-f0c69a936b72@gmail.com	2023-12-18 12:46:10 -05:00
Nathan Bossart	64b1fb5f03	Optimize pg_atomic_exchange_u32 and pg_atomic_exchange_u64. Presently, all platforms implement atomic exchanges by performing an atomic compare-and-swap in a loop until it succeeds. This can be especially expensive when there is contention on the atomic variable. This commit optimizes atomic exchanges on many platforms by using compiler intrinsics, which should compile into something much less expensive than a compare-and-swap loop. Since these intrinsics have been available for some time, the inline assembly implementations are omitted. Suggested-by: Andres Freund Reviewed-by: Andres Freund Discussion: https://postgr.es/m/20231129212905.GA1258737%40nathanxps13	2023-12-18 10:53:32 -06:00
Thomas Munro	4908c58720	Provide vectored variants of smgrread() and smgrwrite(). smgrreadv() and smgrwritev() and their md.c implementations call FileReadV() and FileWriteV(). A range of disk blocks beginning at 'blocknum' and extending for 'nblocks' can be scattered to or gathered from multiple buffers with a single system call. The traditional smgrread() and smgrwrite() functions are implemented in terms of the new functions. Later commits will introduce calls with nblocks > 1, but the following behavioral changes can be seen already: * After a short transfer we'll now retry until we eventually read 0 bytes (= EOF) or get ENOSPC, EDQUOT, EFBIG etc, where previously we would infer the reason. Retrying is consistent with xlog.c's treatment of large WAL writes, and arguably also xlog.c and fd.c's treatment of EINTR. Arbitrary short returns for larger transfers have been observed on several OSes, and might in theory also happen for transient reasons with our own pg_pv() fallback code. After unexpected EOF or -1, the error thrown now talks about a range even for the single block case, eg "blocks 42..42". Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-18 15:01:50 +13:00
Michael Paquier	3c9d9acae0	Refactor pgstat_prepare_io_time() with an input argument instead of a GUC Originally, this routine relied on track_io_timing to check if a time interval for an I/O operation stored in pg_stat_io should be initialized or not. However, the addition of WAL statistics to pg_stat_io requires that the initialization happens when track_wal_io_timing is enabled, which is dependent on the code path where the I/O operation happens. Author: Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ3AiQ+ZMxUuXnBpd0Rrh1YhwJ5FudkHg=JU0P+-W8T4Vg@mail.gmail.com	2023-12-16 20:16:20 +01:00
Alvaro Herrera	a6be0600ac	Remove useless LIMIT_OPTION_DEFAULT value from LimitOption During the development that led to commit 357889eb17bb, for a time we had the value LIMIT_OPTION_DEFAULT, which was mostly but not completely removed later on, before commit. Complete the removal now. Author: Zhang Mingli <avamingli@gmail.com> Discussion: https://postgr.es/m/59d61a1a-3858-475a-964f-24468c97cc67@Spark	2023-12-16 18:20:03 +01:00
Thomas Munro	b485ad7f07	Provide multi-block smgrprefetch(). Previously smgrprefetch() could issue POSIX_FADV_WILLNEED advice for a single block at a time. Add an nblocks argument so that we can do the same for a range of blocks. This usually produces a single system call, but might need to loop if it crosses a segment boundary. Initially it is only called with nblocks == 1, but proposed patches will make wider calls. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> (earlier version) Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-16 17:51:21 +13:00
Tom Lane	59bd34c2fa	Fix bugs in manipulation of large objects. In v16 and up (since commit afbfc0298), large object ownership checking has been broken because object_ownercheck() didn't take care of the discrepancy between our object-address representation of large objects (classId == LargeObjectRelationId) and the catalog where their ownership info is actually stored (LargeObjectMetadataRelationId). This resulted in failures such as "unrecognized class ID: 2613" when trying to update blob properties as a non-superuser. Poking around for related bugs, I found that AlterObjectOwner_internal would pass the wrong classId to the PostAlterHook in the no-op code path where the large object already has the desired owner. Also, recordExtObjInitPriv checked for the wrong classId; that bug is only latent because the stanza is dead code anyway, but as long as we're carrying it around it should be less wrong. These bugs are quite old. In HEAD, we can reduce the scope for future bugs of this ilk by changing AlterObjectOwner_internal's API to let the translation happen inside that function, rather than requiring callers to know about it. A more bulletproof fix, perhaps, would be to start using LargeObjectMetadataRelationId as the dependency and object-address classId for blobs. However that has substantial risk of breaking third-party code; even within our own code, it'd create hassles for pg_dump which would have to cope with a version-dependent representation. For now, keep the status quo. Discussion: https://postgr.es/m/2650449.1702497209@sss.pgh.pa.us	2023-12-15 13:55:05 -05:00
Thomas Munro	871fe4917e	Provide vectored variants of FileRead() and FileWrite(). FileReadV() and FileWriteV() adapt pg_preadv() and pg_pwritev() for fd.c's virtual file descriptors. The simple FileRead() and FileWrite() functions are now implemented in terms of the vectored functions, to avoid code duplication, and they are converted back to the corresponding simple system calls further down (commit 15c9ac36). Later work will make more interesting multi-iovec calls. The traditional behavior of reporting a "fake" ENOSPC error is simplified. It's now always set for non-failing writes, for the benefit of callers that expect to log a meaningful "%m" if they determine that the write was short. (Perhaps we should consider getting rid of that expectation one day.) Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-12 13:12:43 +13:00
Thomas Munro	0c6be59f5e	Provide helper for retrying partial vectored I/O. compute_remaining_iovec() is a re-usable routine for retrying after pg_readv() or pg_writev() reports a short transfer. This will gain new users in a later commit, but can already replace the open-coded equivalent code in the existing pg_pwritev_with_retry() function. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-12-12 10:57:18 +13:00
Thomas Munro	baf7c93ed5	Define unconstify() and unvolatize() for C++. These two macros wouldn't work if used in an inline function definition in a header seen by g++, because __builtin_types_compatible_p is only available in C. Redirect to standard C++ const_cast (which also adds/removes volatile despite its name). Per cpluspluscheck failure in a development branch. Suggested-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/CA%2BhUKGK3OXFjkOyZiw-DgL2bUqk9by1uGuCnViJX786W%2BfyDSw%40mail.gmail.com	2023-12-12 09:46:46 +13:00
Alvaro Herrera	d3fe6e90ba	Simplify productions for FORMAT JSON [ ENCODING name ] This removes the production json_encoding_clause_opt, instead merging it into json_format_clause. Also remove the auxiliary makeJsonEncoding() function. Reviewed-by: Amit Langote <amitlangote09@gmail.com> Discussion: https://postgr.es/m/202312071841.u2gueb5dsrbk%40alvherre.pgsql	2023-12-11 11:55:34 +01:00
Michael Paquier	c7a3e6b46d	Remove trace_recovery_messages This GUC was intended as a debugging help in the 9.0 area when hot standby and streaming replication were being developped, able to offer more information at LOG level rather than DEBUGn. There are more tools available these days that are able to offer rather equivalent information, like pg_waldump introduced in 9.3. It is not obvious how this facility is useful these days, so let's remove it. Author: Bharath Rupireddy Discussion: https://postgr.es/m/ZXEXEAUVFrvpquSd@paquier.xyz	2023-12-11 11:49:02 +01:00
Peter Eisentraut	90834ceccd	Remove some unnecessary includes of "access/xlog_internal.h" There were a few places where access/xlog_internal.h was apparently included unnecessarily. In some of those places, a more specific header file (that somehow came in via access/xlog_internal.h) can be used instead. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/a56a6eec-eb14-471b-9570-3cac23603964%40eisentraut.org	2023-12-10 07:46:06 +01:00
Jeff Davis	867dd2dc87	Cache opaque handle for GUC option to avoid repeasted lookups. When setting GUCs from proconfig, performance is important, and hash lookups in the GUC table are significant. Per suggestion from Robert Haas. Discussion: https://postgr.es/m/CA+TgmoYpKxhR3HOD9syK2XwcAUVPa0+ba0XPnwWBcYxtKLkyxA@mail.gmail.com Reviewed-by: John Naylor	2023-12-08 11:16:01 -08:00
Peter Geoghegan	c9c0589fda	Optimize nbtree backward scan boundary cases. Teach _bt_binsrch (and related helper routines like _bt_search and _bt_compare) about the initial positioning requirements of backward scans. Routines like _bt_binsrch already know all about "nextkey" searches, so it seems natural to teach them about "goback"/backward searches, too. These concepts are closely related, and are much easier to understand when discussed together. Now that certain implementation details are hidden from _bt_first, it's straightforward to add a new optimization: backward scans using the < strategy now avoid extra leaf page accesses in certain "boundary cases". Consider the following example, which uses the tenk1 table (and its tenk1_hundred index) from the standard regression tests: SELECT * FROM tenk1 WHERE hundred < 12 ORDER BY hundred DESC LIMIT 1; Before this commit, nbtree would scan two leaf pages, even though it was only really necessary to scan one leaf page. We'll now descend straight to the leaf page containing a (12, -inf) high key instead. The scan will locate matching non-pivot tuples with "hundred" values starting from the value 11. The scan won't waste a page access on the right sibling leaf page, which cannot possibly contain any matching tuples. You can think of the optimization added by this commit as disabling an optimization (the _bt_compare "!pivotsearch" behavior that was added to Postgres 12 in commit dd299df8) for a small subset of cases where it was always counterproductive. Equivalently, you can think of the new optimization as extending the "pivotsearch" behavior that page deletion by VACUUM has long required (since the aforementioned Postgres 12 commit went in) to other, similar cases. Obviously, this isn't strictly necessary for these new cases (unlike VACUUM, _bt_first is prepared to move the scan to the left once on the leaf level), but the underlying principle is the same. Author: Peter Geoghegan <pg@bowt.ie> Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com> Discussion: https://postgr.es/m/CAH2-Wz=XPzM8HzaLPq278Vms420mVSHfgs9wi5tjFKHcapZCEw@mail.gmail.com	2023-12-08 11:05:17 -08:00
Tomas Vondra	b437571714	Allow parallel CREATE INDEX for BRIN indexes Allow using multiple worker processes to build BRIN index, which until now was supported only for BTREE indexes. For large tables this often results in significant speedup when the build is CPU-bound. The work is split in a simple way - each worker builds BRIN summaries on a subset of the table, determined by the regular parallel scan used to read the data, and feeds them into a shared tuplesort which sorts them by blkno (start of the range). The leader then reads this sorted stream of ranges, merges duplicates (which may happen if the parallel scan does not align with BRIN pages_per_range), and adds the resulting ranges into the index. The number of duplicate results produced by workers (requiring merging in the leader process) should be fairly small, thanks to how parallel scans assign chunks to workers. The likelihood of duplicate results may increase for higher pages_per_range values, but then there are fewer page ranges in total. In any case, we expect the merging to be much cheaper than summarization, so this should be a win. Most of the parallelism infrastructure is a simplified copy of the code used by BTREE indexes, omitting the parts irrelevant for BRIN indexes (e.g. uniqueness checks). This also introduces a new index AM flag amcanbuildparallel, determining whether to attempt to start parallel workers for the index build. Original patch by me, with reviews and substantial reworks by Matthias van de Meent, certainly enough to make him a co-author. Author: Tomas Vondra, Matthias van de Meent Reviewed-by: Matthias van de Meent Discussion: https://postgr.es/m/c2ee7d69-ce17-43f2-d1a0-9811edbda6e6%40enterprisedb.com	2023-12-08 18:15:26 +01:00
Heikki Linnakangas	b31ba5310b	Rename ShmemVariableCache to TransamVariables The old name was misleading: It's not a cache, the values kept in the struct are the authoritative source. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi	2023-12-08 09:47:15 +02:00
Heikki Linnakangas	15916ffb04	Initialize ShmemVariableCache like other shmem areas For sake of consistency. Reviewed-by: Tristan Partin, Richard Guo Discussion: https://www.postgresql.org/message-id/6537d63d-4bb5-46f8-9b5d-73a8ba4720ab@iki.fi	2023-12-08 09:46:59 +02:00
Jeff Davis	719b342d36	Shrink Unicode category table. Missing entries can implicitly be considered "unassigned". Discussion: https://postgr.es/m/ff4c2f2f9c8fc7ca27c1c24ae37ecaeaeaff6b53.camel@j-davis.com	2023-12-07 15:44:03 -08:00
David Rowley	d16a0c1e2e	Verify that attribute counts match in ExecCopySlot tts_virtual_copyslot() contained an Assert that checked that the srcslot contained <= attributes than the dstslot. This seems to be backwards as if the srcslot contained fewer attributes then the dstslot could be left with stale Datum values from the previously stored tuple. It might make more sense to allow the source to contain additional attributes and only copy the leading ones that also exist in the destination, however, that's not what we're doing here. Here we just remove the Assert from tts_virtual_copyslot() and add an Assert to ExecCopySlot() to verify the attribute counts match. There does not seem to be any places where the destination contains fewer attributes, so instead of going to the trouble of making the code properly handle this, let's just ensure the attribute counts match. If this Assert fails then that will indicate that we do have cases that require us to handle the srcslot with more attributes than the dstslot. It seems better to only write this code if there's a genuine requirement for it rather than write it now only to leave it untested. Thanks to Andres Freund for helping with the analysis of this. Discussion: https://postgr.es/m/CAApHDvpMAvBL0T+TRORquyx1iqFQKMVTXtqNocOw0Pa2uh1heg@mail.gmail.com	2023-12-07 21:28:24 +13:00
Amit Kapila	0bf62460bb	Fix issues in binary_upgrade_logical_slot_has_caught_up(). The commit 29d0a77fa6 labelled binary_upgrade_logical_slot_has_caught_up() as a non-strict function to allow providing a better error message to callers in case the passed slot_name is NULL. On further discussion, it seems that it is not helpful to have a different error message for NULL input in this function, so this patch marks the function as strict. This patch also removes the explicit permission check to use replication slots as this function is invoked only by superusers and instead adds an Assert. Reported-by: Masahiko Sawada Author: Hayato Kuroda Reviewed-by: Vignesh C Discussion: https://postgr.es/m/CAD21AoDSyiBKkMXBxN_gUayZZUCOgyHnG8Ge8rcPXNP3Tf6B4g@mail.gmail.com	2023-12-07 08:42:48 +05:30
Heikki Linnakangas	e7c6efe305	Remove now-unnecessary Autovacuum[Launcher\|Worker]IAm functions After commit fd5e8b440d, InitProcess() is called later in the EXEC_BACKEND startup sequence, so it's enough to set the am_autovacuum_[launcher\|worker] variables at the same place as in the !EXEC_BACKEND case.	2023-12-04 15:34:37 +02:00
Peter Eisentraut	457428d9e9	Remove unnecessary include of <math.h> This was probably never necessary. (The header used to use random(), but that shouldn't require <math.h> either. In any case, that's gone, too.) Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org	2023-12-04 06:35:22 +01:00
Peter Eisentraut	da67cb0a44	Remove unnecessary include of <sys/socket.h> This was put here as part of a mechanical replacement of the old "getaddrinfo.h" with <netdb.h> plus <sys/socket.h> (commit 5579388d2d). But here, we only need netdb.h (for NI_MAXHOST), not sys/socket.h. Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org	2023-12-04 06:35:22 +01:00
Peter Eisentraut	dffb2b478f	Remove unnecessary includes of <signal.h> These were once needed for sig_atomic_t, but that no longer appears in these headers, so the include is not needed. Reviewed-by: Shubham Khanna <Shubham.Khanna@fujitsu.com> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/cff5475d-e0a9-4561-b094-794aa36bd031%40eisentraut.org	2023-12-04 06:35:22 +01:00
Michael Paquier	f21848de20	Add support for REINDEX in event triggers This commit adds support for REINDEX in event triggers, making this command react for the events ddl_command_start and ddl_command_end. The indexes rebuilt are collected with the ReindexStmt emitted by the caller, for the concurrent and non-concurrent paths. Thanks to that, it is possible to know a full list of the indexes that a single REINDEX command has worked on. Author: Garrett Thornburg, Jian He Reviewed-by: Jim Jones, Michael Paquier Discussion: https://postgr.es/m/CAEEqfk5bm32G7sbhzHbES9WejD8O8DCEOaLkxoBP7HNWxjPpvg@mail.gmail.com	2023-12-04 09:53:49 +09:00
Heikki Linnakangas	388491f1e5	Pass BackgroundWorker entry in the parameter file in EXEC_BACKEND mode The BackgroundWorker struct is now passed the same way as the Port struct. Seems more consistent. This makes it possible to move InitProcess later in SubPostmasterMain (in next commit), as we no longer need to access shared memory to read background worker entry. Reviewed-by: Tristan Partin, Andres Freund, Alexander Lakhin Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-12-03 16:38:54 +02:00
Heikki Linnakangas	69d903367c	Refactor CreateSharedMemoryAndSemaphores For clarity, have separate functions for creating the shared memory and semaphores at postmaster or single-user backend startup, and for attaching to existing shared memory structures in EXEC_BACKEND case. CreateSharedMemoryAndSemaphores() is now called only at postmaster startup, and a new AttachSharedMemoryStructs() function is called at backend startup in EXEC_BACKEND mode. Reviewed-by: Tristan Partin, Andres Freund Discussion: https://www.postgresql.org/message-id/7a59b073-5b5b-151e-7ed3-8b01ff7ce9ef@iki.fi	2023-12-03 16:09:42 +02:00
Thomas Munro	15c9ac3629	Optimize pg_readv/pg_pwritev single vector case. For the trivial case of iovcnt == 1, kernels are measurably slower at dealing with the more complex arguments of preadv/pwritev than the equivalent plain old pread/pwrite. The overheads are worth it for iovcnt > 1, but for 1 let's just redirect to the cheaper calls. While we could leave it to callers to worry about that, we already have to have our own pg_ wrappers for portability reasons so it seems reasonable to centralize this knowledge there (thanks to Heikki for this suggestion). Try to avoid function call overheads by making them inlinable, which might also allow the compiler to avoid the branch in some cases. For systems that don't have preadv and pwritev (currently: Windows and [closed] Solaris), we might as well pull the replacement functions up into the static inline functions too. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2023-11-29 17:19:25 +13:00
Alexander Korotkov	2cdf131c46	Use larger segment file names for pg_notify This avoids the wraparound in async.c and removes the corresponding code complexity. The maximum amount of allocated SLRU pages for NOTIFY / LISTEN queue is now determined by the max_notify_queue_pages GUC. The default value is 1048576. It allows to consume up to 8 GB of disk space which is exactly the limit we had previously. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com	2023-11-29 01:41:48 +02:00
Alexander Korotkov	4ed8f0913b	Index SLRUs by 64-bit integers rather than by 32-bit integers We've had repeated bugs in the area of handling SLRU wraparound in the past, some of which have caused data loss. Switching to an indexing system for SLRUs that does not wrap around should allow us to get rid of a whole bunch of problems and improve the overall reliability of the system. This particular patch however only changes the indexing and doesn't address the wraparound per se. This is going to be done in the following patches. Author: Maxim Orlov, Aleksander Alekseev, Alexander Korotkov, Teodor Sigaev Author: Nikita Glukhov, Pavel Borisov, Yura Sokolov Reviewed-by: Jacob Champion, Heikki Linnakangas, Alexander Korotkov Reviewed-by: Japin Li, Pavel Borisov, Tom Lane, Peter Eisentraut, Andres Freund Reviewed-by: Andrey Borodin, Dilip Kumar, Aleksander Alekseev Discussion: https://postgr.es/m/CACG%3DezZe1NQSCnfHOr78AtAZxJZeCvxrts0ygrxYwe%3DpyyjVWA%40mail.gmail.com Discussion: https://postgr.es/m/CAJ7c6TPDOYBYrnCAeyndkBktO0WG2xSdYduTF0nxq%2BvfkmTF5Q%40mail.gmail.com	2023-11-29 01:40:56 +02:00

1 2 3 4 5 ...

11213 Commits