postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-17 11:57:00 +08:00

Author	SHA1	Message	Date
Heikki Linnakangas	3c0fd64fec	Split ProcSleep function into JoinWaitQueue and ProcSleep Split ProcSleep into two functions: JoinWaitQueue and ProcSleep. JoinWaitQueue is called while holding the partition lock, and inserts the current process to the wait queue, while ProcSleep() does the actual sleeping. ProcSleep() is now called without holding the partition lock, and it no longer re-acquires the partition lock before returning. That makes the wakeup a little cheaper. Once upon a time, re-acquiring the partition lock was needed to prevent a signal handler from longjmping out at a bad time, but these days our signal handlers just set flags, and longjmping can only happen at points where we explicitly run CHECK_FOR_INTERRUPTS(). If JoinWaitQueue detects an "early deadlock" before even joining the wait queue, it returns without changing the shared lock entry, leaving the cleanup of the shared lock entry to the caller. This makes the handling of an early deadlock the same as the dontWait=true case. One small user-visible side-effect of this refactoring is that we now only set the 'ps' title to say "waiting" when we actually enter the sleep, not when the lock is skipped because dontWait=true, or when a deadlock is detected early before entering the sleep. This eliminates the 'lockAwaited' global variable in proc.c, which was largely redundant with 'awaitedLock' in lock.c Note: Updating the local lock table is now the caller's responsibility. JoinWaitQueue and ProcSleep are now only responsible for modifying the shared state. Seems a little nicer that way. Based on Thomas Munro's earlier patch and observation that ProcSleep doesn't really need to re-acquire the partition lock. Reviewed-by: Maxim Orlov Discussion: https://www.postgresql.org/message-id/7c2090cd-a72a-4e34-afaa-6dd2ef31440e@iki.fi	2024-11-04 17:59:24 +02:00
Heikki Linnakangas	a9c546a5a3	Use ProcNumbers instead of direct Latch pointers to address other procs This is in preparation for replacing Latches with a new abstraction. That's still work in progress, but this seems a little tidier anyway, so let's get this refactoring out of the way already. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c%40iki.fi	2024-11-01 13:47:20 +02:00
Noah Misch	243e9b40f1	For inplace update, send nontransactional invalidations. The inplace update survives ROLLBACK. The inval didn't, so another backend's DDL could then update the row without incorporating the inplace update. In the test this fixes, a mix of CREATE INDEX and ALTER TABLE resulted in a table with an index, yet relhasindex=f. That is a source of index corruption. Back-patch to v12 (all supported versions). The back branch versions don't change WAL, because those branches just added end-of-recovery SIResetAll(). All branches change the ABI of extern function PrepareToInvalidateCacheTuple(). No PGXN extension calls that, and there's no apparent use case in extensions. Reviewed by Nitin Motiani and (in earlier versions) Andres Freund. Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com	2024-10-25 06:51:02 -07:00
Andres Freund	755a4c10d1	bufmgr/smgr: Don't cross segment boundaries in StartReadBuffers() With real AIO it doesn't make sense to cross segment boundaries with one IO. Add smgrmaxcombine() to allow upper layers to query which buffers can be merged. We could continue to cross segment boundaries when not using AIO, but it doesn't really make sense, because md.c will never be able to perform the read across the segment boundary in one system call. Which means we'll mark more buffers as undergoing IO than really makes sense - if another backend desires to read the same blocks, it'll be blocked longer than necessary. So it seems better to just never cross the boundary. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Noah Misch <noah@leadboat.com> Discussion: https://postgr.es/m/1f6b50a7-38ef-4d87-8246-786d39f46ab9@iki.fi	2024-10-08 11:37:45 -04:00
Heikki Linnakangas	2bbc261ddb	Use an shmem_exit callback to remove backend from PMChildFlags on exit This seems nicer than having to duplicate the logic between InitProcess() and ProcKill() for which child processes have a PMChildFlags slot. Move the MarkPostmasterChildActive() call earlier in InitProcess(), out of the section protected by the spinlock. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/a102f15f-eac4-4ff2-af02-f9ff209ec66f@iki.fi	2024-10-08 15:06:34 +03:00
Heikki Linnakangas	094ae07160	Remove unneeded #include Unneeded since commit d72731a704. Discussion: https://www.postgresql.org/message-id/391abe21-413e-4d91-a650-b663af49500c@iki.fi	2024-10-05 15:09:32 +03:00
Peter Eisentraut	10b721821d	Use macro to define the number of enum values Refactoring in the interest of code consistency, a follow-up to 2e068db56e31. The argument against inserting a special enum value at the end of the enum definition is that a switch statement might generate a compiler warning unless it has a default clause. Aleksander Alekseev, reviewed by Michael Paquier, Dean Rasheed, Peter Eisentraut Discussion: https://postgr.es/m/CAJ7c6TMsiaV5urU_Pq6zJ2tXPDwk69-NKVh4AMN5XrRiM7N%2BGA%40mail.gmail.com	2024-10-01 09:30:24 -04:00
Noah Misch	aac2c9b4fd	For inplace update durability, make heap_update() callers wait. The previous commit fixed some ways of losing an inplace update. It remained possible to lose one when a backend working toward a heap_update() copied a tuple into memory just before inplace update of that tuple. In catalogs eligible for inplace update, use LOCKTAG_TUPLE to govern admission to the steps of copying an old tuple, modifying it, and issuing heap_update(). This includes MERGE commands. To avoid changing most of the pg_class DDL, don't require LOCKTAG_TUPLE when holding a relation lock sufficient to exclude inplace updaters. Back-patch to v12 (all supported versions). In v13 and v12, "UPDATE pg_class" or "UPDATE pg_database" can still lose an inplace update. The v14+ UPDATE fix needs commit 86dc90056dfdbd9d1b891718d2e5614e3e432f35, and it wasn't worth reimplementing that fix without such infrastructure. Reviewed by Nitin Motiani and (in earlier versions) Heikki Linnakangas. Discussion: https://postgr.es/m/20231027214946.79.nmisch@google.com	2024-09-24 15:25:18 -07:00
Tomas Vondra	c4d5cb71d2	Increase the number of fast-path lock slots Replace the fixed-size array of fast-path locks with arrays, sized on startup based on max_locks_per_transaction. This allows using fast-path locking for workloads that need more locks. The fast-path locking introduced in 9.2 allowed each backend to acquire a small number (16) of weak relation locks cheaply. If a backend needs to hold more locks, it has to insert them into the shared lock table. This is considerably more expensive, and may be subject to contention (especially on many-core systems). The limit of 16 fast-path locks was always rather low, because we have to lock all relations - not just tables, but also indexes, views, etc. For planning we need to lock all relations that might be used in the plan, not just those that actually get used in the final plan. So even with rather simple queries and schemas, we often need significantly more than 16 locks. As partitioning gets used more widely, and the number of partitions increases, this limit is trivial to hit. Complex queries may easily use hundreds or even thousands of locks. For workloads doing a lot of I/O this is not noticeable, but for workloads accessing only data in RAM, the access to the shared lock table may be a serious issue. This commit removes the hard-coded limit of the number of fast-path locks. Instead, the size of the fast-path arrays is calculated at startup, and can be set much higher than the original 16-lock limit. The overall fast-path locking protocol remains unchanged. The variable-sized fast-path arrays can no longer be part of PGPROC, but are allocated as a separate chunk of shared memory and then references from the PGPROC entries. The fast-path slots are organized as a 16-way set associative cache. You can imagine it as a hash table of 16-slot "groups". Each relation is mapped to exactly one group using hash(relid), and the group is then processed using linear search, just like the original fast-path cache. With only 16 entries this is cheap, with good locality. Treating this as a simple hash table with open addressing would not be efficient, especially once the hash table gets almost full. The usual remedy is to grow the table, but we can't do that here easily. The access would also be more random, with worse locality. The fast-path arrays are sized using the max_locks_per_transaction GUC. We try to have enough capacity for the number of locks specified in the GUC, using the traditional 2^n formula, with an upper limit of 1024 lock groups (i.e. 16k locks). The default value of max_locks_per_transaction is 64, which means those instances will have 64 fast-path slots. The main purpose of the max_locks_per_transaction GUC is to size the shared lock table. It is often set to the "average" number of locks needed by backends, with some backends using significantly more locks. This should not be a major issue, however. Some backens may have to insert locks into the shared lock table, but there can't be too many of them, limiting the contention. The only solution is to increase the GUC, even if the shared lock table already has sufficient capacity. That is not free, especially in terms of memory usage (the shared lock table entries are fairly large). It should only happen on machines with plenty of memory, though. In the future we may consider a separate GUC for the number of fast-path slots, but let's try without one first. Reviewed-by: Robert Haas, Jakub Wartak Discussion: https://postgr.es/m/510b887e-c0ce-4a0c-a17a-2c6abb8d9a5c@enterprisedb.com	2024-09-21 20:09:35 +02:00
Thomas Munro	70d38e3d8a	Allow ReadStream to be consumed as raw block numbers. Commits 041b9680 and 6377e12a changed the interface of scan_analyze_next_block() to take a ReadStream instead of a BlockNumber and a BufferAccessStrategy, and to return a value to indicate when the stream has run out of blocks. This caused integration problems for at least one known extension that uses specially encoded BlockNumber values that map to different underlying storage, because acquire_sample_rows() sets up the stream so that read_stream_next_buffer() reads blocks from the main fork of the relation's SMgrRelation. Provide read_stream_next_block(), as a way for such an extension to access the stream of raw BlockNumbers directly and forward them to its own ReadBuffer() calls after decoding, as it could in earlier releases. The new function returns the BlockNumber and BufferAccessStrategy that were previously passed directly to scan_analyze_next_block(). Alternatively, an extension could wrap the stream of BlockNumbers in another ReadStream with a callback that performs any decoding required to arrive at real storage manager BlockNumber values, so that it could benefit from the I/O combining and concurrency provided by read_stream.c. Another class of table access method that does nothing in scan_analyze_next_block() because it is not block-oriented could use this function to control the number of block sampling loops. It could match the previous behavior with "return read_stream_next_block(stream, &bas) != InvalidBlockNumber". Ongoing work is expected to provide better ANALYZE support for table access methods that don't behave like heapam with respect to storage blocks, but that will be for future releases. Back-patch to 17. Reported-by: Mats Kindahl <mats@timescale.com> Reviewed-by: Mats Kindahl <mats@timescale.com> Discussion: https://postgr.es/m/CA%2B14425%2BCcm07ocG97Fp%2BFrD9xUXqmBKFvecp0p%2BgV2YYR258Q%40mail.gmail.com	2024-09-18 11:34:28 +12:00
Noah Misch	c582b75851	Add block_range_read_stream_cb(), to deduplicate code. This replaces two functions for iterating over all blocks in a range. A pending patch will use this instead of adding a third. Nazir Bilal Yavuz Discussion: https://postgr.es/m/20240820184742.f2.nmisch@google.com	2024-09-03 10:46:20 -07:00
Peter Eisentraut	2b5f57977f	Add const qualifiers to XLogRegister*() functions Add const qualifiers to XLogRegisterData() and XLogRegisterBufData(). Several unconstify() calls can be removed. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org	2024-09-03 08:06:03 +02:00
Michael Paquier	4236825197	Fix typos and grammar in code comments and docs Author: Alexander Lakhin Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com	2024-09-03 14:49:04 +09:00
Heikki Linnakangas	478846e768	Rename some shared memory initialization routines To make them follow the usual naming convention where FoobarShmemSize() calculates the amount of shared memory needed by Foobar subsystem, and FoobarShmemInit() performs the initialization. I didn't rename CreateLWLocks() and InitShmmeIndex(), because they are a little special. They need to be called before any of the other ShmemInit() functions, because they set up the shared memory bookkeeping itself. I also didn't rename InitProcGlobal(), because unlike other Shmeminit functions, it's not called by individual backends. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:21 +03:00
Heikki Linnakangas	fbce7dfc77	Refactor lock manager initialization to make it a bit less special Split the shared and local initialization to separate functions, and follow the common naming conventions. With this, we no longer create the LockMethodLocalHash hash table in the postmaster process, which was always pointless. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:06 +03:00
Peter Eisentraut	5304fec4d8	Apply PGDLLIMPORT markings to some GUC variables According to the commit message in 8ec569479, we must have all variables in header files marked with PGDLLIMPORT. In commit d3cc5ffe81f6 some variables were moved from launch_backend.c file to several header files. This adds PGDLLIMPORT to moved variables. Author: Sofia Kopikova <s.kopikova@postgrespro.ru> Reviewed-by: Robert Haas <robertmhaas@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/e0b17014-5319-4dd6-91cd-93d9c8fc9539%40postgrespro.ru	2024-08-14 11:36:12 +02:00
Amit Kapila	b5df24e520	Fix typo in bufpage.h. Author: Senglee Choi Reviewed-by: Tender Wang Discussion: https://postgr.es/m/CACUsy79U0=S5zWEf6D57F=vB7rOEa86xFY6oovDZ58jRcROCxQ@mail.gmail.com	2024-08-05 14:38:00 +05:30
Alexander Korotkov	3c5db1d6b0	Implement pg_wal_replay_wait() stored procedure pg_wal_replay_wait() is to be used on standby and specifies waiting for the specific WAL location to be replayed. This option is useful when the user makes some data changes on primary and needs a guarantee to see these changes are on standby. The queue of waiters is stored in the shared memory as an LSN-ordered pairing heap, where the waiter with the nearest LSN stays on the top. During the replay of WAL, waiters whose LSNs have already been replayed are deleted from the shared memory pairing heap and woken up by setting their latches. pg_wal_replay_wait() needs to wait without any snapshot held. Otherwise, the snapshot could prevent the replay of WAL records, implying a kind of self-deadlock. This is why it is only possible to implement pg_wal_replay_wait() as a procedure working without an active snapshot, not a function. Catversion is bumped. Discussion: https://postgr.es/m/eb12f9b03851bb2583adab5df9579b4b%40postgrespro.ru Author: Kartyshov Ivan, Alexander Korotkov Reviewed-by: Michael Paquier, Peter Eisentraut, Dilip Kumar, Amit Kapila Reviewed-by: Alexander Lakhin, Bharath Rupireddy, Euler Taveira Reviewed-by: Heikki Linnakangas, Kyotaro Horiguchi	2024-08-02 21:16:56 +03:00
Thomas Munro	e25626677f	Remove --disable-spinlocks. A later change will require atomic support, so it wouldn't make sense for a hypothetical new system not to be able to implement spinlocks. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (concept, not the patch) Reviewed-by: Andres Freund <andres@anarazel.de> (concept, not the patch) Discussion: https://postgr.es/m/3351991.1697728588%40sss.pgh.pa.us	2024-07-30 22:58:37 +12:00
Heikki Linnakangas	9d9b9d46f3	Move cancel key generation to after forking the backend Move responsibility of generating the cancel key to the backend process. The cancel key is now generated after forking, and the backend advertises it in the ProcSignal array. When a cancel request arrives, the backend handling it scans the ProcSignal array to find the target pid and cancel key. This is similar to how this previously worked in the EXEC_BACKEND case with the ShmemBackendArray, just reusing the ProcSignal array. One notable change is that we no longer generate cancellation keys for non-backend processes. We generated them before just to prevent a malicious user from canceling them; the keys for non-backend processes were never actually given to anyone. There is now an explicit flag indicating whether a process has a valid key or not. I wrote this originally in preparation for supporting longer cancel keys, but it's a nice cleanup on its own. Reviewed-by: Jelte Fennema-Nio Discussion: https://www.postgresql.org/message-id/508d0505-8b7a-4864-a681-e7e5edfe32aa@iki.fi	2024-07-29 15:37:48 +03:00
Peter Eisentraut	774d47b6c0	Move all extern declarations for GUC variables to header files Add extern declarations in appropriate header files for global variables related to GUC. In many cases, this was handled quite inconsistently before, with some GUC variables declared in a header file and some only pulled in via ad-hoc extern declarations in various .c files. Also add PGDLLIMPORT qualifications to those variables. These were previously missing because src/tools/mark_pgdllimport.pl has only been used with header files. This also fixes -Wmissing-variable-declarations warnings for GUC variables (not yet part of the standard warning options). Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-24 06:31:07 +02:00
Peter Eisentraut	d3cc5ffe81	Move extern declarations for EXEC_BACKEND to header files This fixes warnings from -Wmissing-variable-declarations (not yet part of the standard warning options) under EXEC_BACKEND. The NON_EXEC_STATIC variables need a suitable declaration in a header file under EXEC_BACKEND. Also fix the inconsistent application of the volatile qualifier for PMSignalState, which was revealed by this change. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-23 15:07:10 +02:00
Noah Misch	a858be17c3	Add a way to create read stream object by using SMgrRelation. Currently read stream object can be created only by using Relation. Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	af07a827b9	Refactor PinBufferForBlock() to remove checks about persistence. There are checks in PinBufferForBlock() function to set persistence of the relation. This function is called for each block in the relation. Instead, set persistence of the relation before PinBufferForBlock(). Nazir Bilal Yavuz Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Noah Misch	e00c45f685	Remove "smgr_persistence == 0" dead code. Reaching that code would have required multiple processes performing relation extension during recovery, which does not happen. That caller has the persistence available, so pass it. This was dead code as soon as commit 210622c60e1a9db2e2730140b8106ab57d259d15 added it. Discussion: https://postgr.es/m/CAN55FZ0JKL6vk1xQp6rfOXiNFV1u1H0tJDPPGHWoiO3ea2Wc=A@mail.gmail.com	2024-07-20 04:22:12 -07:00
Heikki Linnakangas	98347b5a3a	Lift limitation that PGPROC->links must be the first field Since commit 5764f611e1, we've been using the ilist.h functions for handling the linked list. There's no need for 'links' to be the first element of the struct anymore, except for one call in InitProcess where we used a straight cast from the 'dlist_node ' to PGPROC , without the dlist_container() macro. That was just an oversight in commit 5764f611e1, fix it. There no imminent need to move 'links' from being the first field, but let's be tidy. Reviewed-by: Aleksander Alekseev, Andres Freund Discussion: https://www.postgresql.org/message-id/22aa749e-cc1a-424a-b455-21325473a794@iki.fi	2024-07-05 11:21:46 +03:00
Peter Eisentraut	8f8bcb8883	Improve some global variable declarations We have in launch_backend.c: /* * The following need to be available to the save/restore_backend_variables * functions. They are marked NON_EXEC_STATIC in their home modules. / extern slock_t ShmemLock; extern slock_t ProcStructLock; extern PGPROC AuxiliaryProcs; extern PMSignalData PMSignalState; extern pg_time_t first_syslogger_file_time; extern struct bkend ShmemBackendArray; extern bool redirection_done; That comment is not completely true: ShmemLock, ShmemBackendArray, and redirection_done are not in fact NON_EXEC_STATIC. ShmemLock once was, but was then needed elsewhere. ShmemBackendArray was static inside postmaster.c before launch_backend.c was created. redirection_done was never static. This patch moves the declaration of ShmemLock and redirection_done to a header file. ShmemBackendArray gets a NON_EXEC_STATIC. This doesn't make a difference, since it only exists if EXEC_BACKEND anyway, but it makes it consistent. After that, the comment is now correct. Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://www.postgresql.org/message-id/flat/e0a62134-83da-4ba4-8cdb-ceb0111c95ce@eisentraut.org	2024-07-02 07:26:22 +02:00
Tom Lane	edadeb0710	Remove support for HPPA (a/k/a PA-RISC) architecture. This old CPU architecture hasn't been produced in decades, and whatever instances might still survive are surely too underpowered for anyone to consider running Postgres on in production. We'd nonetheless continued to carry code support for it (largely at my insistence), because its unique implementation of spinlocks seemed like a good edge case for our spinlock infrastructure. However, our last buildfarm animal of this type was retired last year, and it seems quite unlikely that another will emerge. Without the ability to run tests, the argument that this is useful test code fails to hold water. Furthermore, carrying code support for an untestable architecture has costs not to be ignored. So, remove HPPA-specific code, in the same vein as commits 718aa43a4 and 92d70b77e. Discussion: https://postgr.es/m/3351991.1697728588@sss.pgh.pa.us	2024-07-01 13:55:52 -04:00
Noah Misch	0cecc908e9	Lock before setting relhassubclass on RELKIND_PARTITIONED_INDEX. Commit 5b562644fec696977df4a82790064e8287927891 added a comment that SetRelationHasSubclass() callers must hold this lock. When commit 17f206fbc824d2b4b14480199ca9ff7dea417eda extended use of this column to partitioned indexes, it didn't take the lock. As the latter commit message mentioned, we currently never reset a partitioned index to relhassubclass=f. That largely avoids harm from the lock omission. The cause for fixing this now is to unblock introducing a rule about locks required to heap_update() a pg_class row. This might cause more deadlocks. It gives minor user-visible benefits: - If an ALTER INDEX SET TABLESPACE runs concurrently with ALTER TABLE ATTACH PARTITION or CREATE PARTITION OF, one transaction blocks instead of failing with "tuple concurrently updated". (Many cases of DDL concurrency still fail that way.) - Match ALTER INDEX ATTACH PARTITION in choosing to lock the index. While not user-visible today, we'll need this if we ever make something set the flag to false for a partitioned index, like ANALYZE does today for tables. Back-patch to v12 (all supported versions), the plan for the commit relying on the new rule. In back branches, add LockOrStrongerHeldByMe() instead of adding a LockHeldByMe() parameter. Reviewed (in an earlier version) by Robert Haas. Discussion: https://postgr.es/m/20240611024525.9f.nmisch@google.com	2024-06-27 19:21:05 -07:00
Noah Misch	bb93640a68	Add wait event type "InjectionPoint", a custom type like "Extension". Both injection points and customization of type "Extension" are new in v17, so this just changes a detail of an unreleased feature. Reported by Robert Haas. Reviewed by Michael Paquier. Discussion: https://postgr.es/m/CA+TgmobfMU5pdXP36D5iAwxV5WKE_vuDLtp_1QyH+H5jMMt21g@mail.gmail.com	2024-06-27 19:21:05 -07:00
Heikki Linnakangas	cbfbda7841	Fix MVCC bug with prepared xact with subxacts on standby We did not recover the subtransaction IDs of prepared transactions when starting a hot standby from a shutdown checkpoint. As a result, such subtransactions were considered as aborted, rather than in-progress. That would lead to hint bits being set incorrectly, and the subtransactions suddenly becoming visible to old snapshots when the prepared transaction was committed. To fix, update pg_subtrans with prepared transactions's subxids when starting hot standby from a shutdown checkpoint. The snapshots taken from that state need to be marked as "suboverflowed", so that we also check the pg_subtrans. Backport to all supported versions. Discussion: https://www.postgresql.org/message-id/6b852e98-2d49-4ca1-9e95-db419a2696e0@iki.fi	2024-06-27 21:09:58 +03:00
Peter Geoghegan	6207f08f70	Harmonize function parameter names for Postgres 17. Make sure that function declarations use names that exactly match the corresponding names from function definitions in a few places. These inconsistencies were all introduced during Postgres 17 development. pg_bsd_indent still has a couple of similar inconsistencies, which I (pgeoghegan) have left untouched for now. This commit was written with help from clang-tidy, by mechanically applying the same rules as similar clean-up commits (the earliest such commit was commit 035ce1fe).	2024-06-12 17:01:51 -04:00
Peter Eisentraut	be5942aee7	Remove unused typedefs There were a few typedefs that were never used to define a variable or field. This had the effect that the buildfarm's typedefs.list would not include them, and so they would need to be re-added manually to keep the overall pgindent result perfectly clean. We can easily get rid of these typedefs to avoid the issue. In a few cases, we just let the enum or struct stand on its own without a typedef around it. In one case, an enum was abused to define flag bits; that's better done with macro definitions. This fixes all the remaining issues with missing entries in the buildfarm's typedefs.list. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/1919000.1715815925@sss.pgh.pa.us	2024-05-17 07:36:12 +02:00
Daniel Gustafsson	950d4a2cb1	Fix typos and duplicate words This fixes various typos, duplicated words, and tiny bits of whitespace mainly in code comments but also in docs. Author: Daniel Gustafsson <daniel@yesql.se> Author: Heikki Linnakangas <hlinnaka@iki.fi> Author: Alexander Lakhin <exclusion@gmail.com> Author: David Rowley <dgrowleyml@gmail.com> Author: Nazir Bilal Yavuz <byavuz81@gmail.com> Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se	2024-04-18 21:28:07 +02:00
Alexander Korotkov	772faafca1	Revert: Implement pg_wal_replay_wait() stored procedure This commit reverts 06c418e163, e37662f221, bf1e650806, 25f42429e2, ee79928441, and 74eaf66f98 per review by Heikki Linnakangas. Discussion: https://postgr.es/m/b155606b-e744-4218-bda5-29379779da1a%40iki.fi	2024-04-11 17:28:15 +03:00
Thomas Munro	13453eedd3	Add pg_buffercache_evict() function for testing. When testing buffer pool logic, it is useful to be able to evict arbitrary blocks. This function can be used in SQL queries over the pg_buffercache view to set up a wide range of buffer pool states. Of course, buffer mappings might change concurrently so you might evict a block other than the one you had in mind, and another session might bring it back in at any time. That's OK for the intended purpose of setting up developer testing scenarios, and more complicated interlocking schemes to give stronger guararantees about that would likely be less flexible for actual testing work anyway. Superuser-only. Author: Palak Chaturvedi <chaturvedipalak1911@gmail.com> Author: Thomas Munro <thomas.munro@gmail.com> (docs, small tweaks) Reviewed-by: Nitin Jadhav <nitinjadhavpostgres@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Cary Huang <cary.huang@highgo.ca> Reviewed-by: Cédric Villemain <cedric.villemain+pgsql@abcsql.com> Reviewed-by: Jim Nasby <jim.nasby@gmail.com> Reviewed-by: Maxim Orlov <orlovmg@gmail.com> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CALfch19pW48ZwWzUoRSpsaV9hqt0UPyaBPC4bOZ4W+c7FF566A@mail.gmail.com	2024-04-08 16:23:40 +12:00
Alexander Korotkov	25f42429e2	Use an LWLock instead of a spinlock in waitlsn.c This should prevent busy-waiting when number of waiting processes is high. Discussion: https://postgr.es/m/202404030658.hhj3vfxeyhft%40alvherre.pgsql Author: Alvaro Herrera	2024-04-07 00:49:53 +03:00
Thomas Munro	3bd8439ed6	Allow BufferAccessStrategy to limit pin count. While pinning extra buffers to look ahead, users of strategies are in danger of using too many buffers. For some strategies, that means "escaping" from the ring, and in others it means forcing dirty data to disk very frequently with associated WAL flushing. Since external code has no insight into any of that, allow individual strategy types to expose a clamp that should be applied when deciding how many buffers to pin at once. Reviewed-by: Andres Freund <andres@anarazel.de> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Discussion: https://postgr.es/m/CAAKRu_aJXnqsyZt6HwFLnxYEBgE17oypkxbKbT1t1geE_wvH2Q%40mail.gmail.com	2024-04-06 23:11:45 +13:00
Tom Lane	6faca9ae28	Avoid deadlock during orphan temp table removal. If temp tables have dependencies (such as sequences) then it's possible for autovacuum's cleanup of orphan temp tables to deadlock against an incoming backend that's trying to clean out the temp namespace for its own use. That can happen because RemoveTempRelations' performDeletion call can visit objects within the namespace in an order different from the order in which a per-table deletion will visit them. To fix, observe that performDeletion will begin by taking an exclusive lock on the temp namespace (even though it won't actually delete it). So, if we can get a shared lock on the namespace, we can be sure we're not running concurrently with RemoveTempRelations, while also not conflicting with ordinary use of the namespace. This requires introducing a conditional version of LockDatabaseObject, but that's no big deal. (It's surprising we've got along without that this long.) Report and patch by Mikhail Zhilin. Back-patch to all supported branches. Discussion: https://postgr.es/m/c43ce028-2bc2-4865-9b89-3f706246eed5@postgrespro.ru	2024-04-02 14:59:32 -04:00
Thomas Munro	b5a9b18cd0	Provide API for streaming relation data. Introduce an abstraction allowing relation data to be accessed as a stream of buffers, with an implementation that is more efficient than the equivalent sequence of ReadBuffer() calls. Client code supplies a callback that can say which block number it wants next, and then consumes individual buffers one at a time from the stream. This division puts read_stream.c in control of how far ahead it can see and allows it to read clusters of neighboring blocks with StartReadBuffers(). It also issues POSIX_FADV_WILLNEED advice ahead of time when random access is detected. Other variants of I/O stream will be proposed in future work (for example to support recovery, whose LsnReadQueue device in xlogprefetcher.c is a distant cousin of this code and should eventually be replaced by this), but this basic API is sufficient for many common executor usage patterns involving predictable access to a single fork of a single relation. Several patches using this API are proposed separately. This stream concept is loosely based on ideas from Andres Freund on how we should pave the way for later work on asynchronous I/O. Author: Thomas Munro <thomas.munro@gmail.com> Author: Heikki Linnakangas <hlinnaka@iki.fi> (contributions) Author: Melanie Plageman <melanieplageman@gmail.com> (contributions) Suggested-by: Andres Freund <andres@anarazel.de> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2024-04-03 00:49:46 +13:00
Thomas Munro	210622c60e	Provide vectored variant of ReadBuffer(). Break ReadBuffer() up into two steps. StartReadBuffers() and WaitReadBuffers() give us two main advantages: 1. Multiple consecutive blocks can be read with one system call. 2. Advice (hints of future reads) can optionally be issued to the kernel ahead of time. The traditional ReadBuffer() function is now implemented in terms of those functions, to avoid duplication. A new GUC io_combine_limit is defined, and the functions for limiting per-backend pin counts are made into public APIs. Those are provided for use by callers of StartReadBuffers(), when deciding how many buffers to read at once. The following commit will add a higher level mechanism for doing that automatically with a practical interface. With some more infrastructure in later work, StartReadBuffers() could be extended to start real asynchronous I/O instead of just issuing advice and leaving WaitReadBuffers() to do the work synchronously. Author: Thomas Munro <thomas.munro@gmail.com> Author: Andres Freund <andres@anarazel.de> (some optimization tweaks) Reviewed-by: Melanie Plageman <melanieplageman@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Nazir Bilal Yavuz <byavuz81@gmail.com> Reviewed-by: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Andres Freund <andres@anarazel.de> Tested-by: Tomas Vondra <tomas.vondra@enterprisedb.com> Discussion: https://postgr.es/m/CA+hUKGJkOiOCa+mag4BF+zHo7qo=o9CFheB8=g6uT5TUm2gkvA@mail.gmail.com	2024-04-03 00:23:20 +13:00
Masahiko Sawada	667e65aac3	Use TidStore for dead tuple TIDs storage during lazy vacuum. Previously, we used a simple array for storing dead tuple IDs during lazy vacuum, which had a number of problems: * The array used a single allocation and so was limited to 1GB. * The allocation was pessimistically sized according to table size. * Lookup with binary search was slow because of poor CPU cache and branch prediction behavior. This commit replaces that array with the TID store from commit 30e144287a. Since the backing radix tree makes small allocations as needed, the 1GB limit is now gone. Further, the total memory used is now often smaller by an order of magnitude or more, depending on the distribution of blocks and offsets. These two features should make multiple rounds of heap scanning and index cleanup an extremely rare event. TID lookup during index cleanup is also several times faster, even more so when index order is correlated with heap tuple order. Since there is no longer a predictable relationship between the number of dead tuples vacuumed and the space taken up by their TIDs, the number of tuples no longer provides any meaningful insights for users, nor is the maximum number predictable. For that reason this commit also changes to byte-based progress reporting, with the relevant columns of pg_stat_progress_vacuum renamed accordingly to max_dead_tuple_bytes and dead_tuple_bytes. For parallel vacuum, both the TID store and supplemental information specific to vacuum are shared among the parallel vacuum workers. As with the previous array, we don't take any locks on TidStore during parallel vacuum since writes are still only done by the leader process. Bump catalog version. Reviewed-by: John Naylor, (in an earlier version) Dilip Kumar Discussion: https://postgr.es/m/CAD21AoAfOZvmfR0j8VmZorZjL7RhTiQdVttNuC4W-Shdc2a-AA%40mail.gmail.com	2024-04-02 10:15:37 +09:00
Alvaro Herrera	da952b415f	Rework lwlocknames.txt to become lwlocklist.h This way, we can fold the list of lock names to occur in BuiltinTrancheNames instead of having its own separate array. This saves two lines of code in GetLWTrancheName and some space in BuiltinTrancheNames, as foreseen in commit 74a730631065, as well as removing the need for a separate lwlocknames.c file. We still have to build lwlocknames.h using Perl code, which initially I wanted to avoid, but it gives us the chance to cross-check wait_event_names.txt. Discussion: https://postgr.es/m/202401231025.gbv4nnte5fmm@alvherre.pgsql	2024-03-20 11:55:20 +01:00
Robert Haas	2346df6fc3	Allow a no-wait lock acquisition to succeed in more cases. We don't determine the position at which a process waiting for a lock should insert itself into the wait queue until we reach ProcSleep(), and we may at that point discover that we must insert ourselves ahead of everyone who wants a conflicting lock, in which case we obtain the lock immediately. Up until now, a no-wait lock acquisition would fail in such cases, erroneously claiming that the lock couldn't be obtained immediately. Fix that by trying ProcSleep even in the no-wait case. No back-patch for now, because I'm treating this as an improvement to the existing no-wait feature. It could instead be argued that it's a bug fix, on the theory that there should never be any case whatsoever where no-wait fails to obtain a lock that would have been obtained immediately without no-wait, but I'm reluctant to interpret the semantics of no-wait that strictly. Robert Haas and Jingxian Li Discussion: http://postgr.es/m/CA+TgmobCH-kMXGVpb0BB-iNMdtcNkTvcZ4JBxDJows3kYM+GDg@mail.gmail.com	2024-03-14 08:56:06 -04:00
Alexander Korotkov	e85662df44	Fix false reports in pg_visibility Currently, pg_visibility computes its xid horizon using the GetOldestNonRemovableTransactionId(). The problem is that this horizon can sometimes go backward. That can lead to reporting false errors. In order to fix that, this commit implements a new function GetStrictOldestNonRemovableTransactionId(). This function computes the xid horizon, which would be guaranteed to be newer or equal to any xid horizon computed before. We have to do the following to achieve this. 1. Ignore processes xmin's, because they consider connection to other databases that were ignored before. 2. Ignore KnownAssignedXids, because they are not database-aware. At the same time, the primary could compute its horizons database-aware. 3. Ignore walsender xmin, because it could go backward if some replication connections don't use replication slots. As a result, we're using only currently running xids to compute the horizon. Surely these would significantly sacrifice accuracy. But we have to do so to avoid reporting false errors. Inspired by earlier patch by Daniel Shelepanov and the following discussion with Robert Haas and Tom Lane. Discussion: https://postgr.es/m/1649062270.289865713%40f403.i.mail.ru Reviewed-by: Alexander Lakhin, Dmitry Koval	2024-03-14 13:12:05 +02:00
Heikki Linnakangas	024c521117	Replace BackendIds with 0-based ProcNumbers Now that BackendId was just another index into the proc array, it was redundant with the 0-based proc numbers used in other places. Replace all usage of backend IDs with proc numbers. The only place where the term "backend id" remains is in a few pgstat functions that expose backend IDs at the SQL level. Those IDs are now in fact 0-based ProcNumbers too, but the documentation still calls them "backend ids". That term still seems appropriate to describe what the numbers are, so I let it be. One user-visible effect is that pg_temp_0 is now a valid temp schema name, for backend with ProcNumber 0. Reviewed-by: Andres Freund Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:38:22 +02:00
Heikki Linnakangas	ab355e3a88	Redefine backend ID to be an index into the proc array Previously, backend ID was an index into the ProcState array, in the shared cache invalidation manager (sinvaladt.c). The entry in the ProcState array was reserved at backend startup by scanning the array for a free entry, and that was also when the backend got its backend ID. Things become slightly simpler if we redefine backend ID to be the index into the PGPROC array, and directly use it also as an index to the ProcState array. This uses a little more memory, as we reserve a few extra slots in the ProcState array for aux processes that don't need them, but the simplicity is worth it. Aux processes now also have a backend ID. This simplifies the reservation of BackendStatusArray and ProcSignal slots. You can now convert a backend ID into an index into the PGPROC array simply by subtracting 1. We still use 0-based "pgprocnos" in various places, for indexes into the PGPROC array, but the only difference now is that backend IDs start at 1 while pgprocnos start at 0. (The next commmit will get rid of the term "backend ID" altogether and make everything 0-based.) There is still a 'backendId' field in PGPROC, now part of 'vxid' which encapsulates the backend ID and local transaction ID together. It's needed for prepared xacts. For regular backends, the backendId is always equal to pgprocno + 1, but for prepared xact PGPROC entries, it's the ID of the original backend that processed the transaction. Reviewed-by: Andres Freund, Reid Thompson Discussion: https://www.postgresql.org/message-id/8171f1aa-496f-46a6-afc3-c46fe7a9b407@iki.fi	2024-03-03 19:37:28 +02:00
Thomas Munro	653b55b570	Return ssize_t in fd.c I/O functions. In the past, FileRead() and FileWrite() used types based on the Unix read() and write() functions from before C and POSIX standardization, though not exactly (we had int for amount instead of unsigned). In commit 2d4f1ba6 we changed to the appropriate standard C types, just like the modern POSIX functions they wrap, but again not exactly: the return type stayed as int. In theory, a ssize_t value could be returned by the underlying call that is too large for an int. That wasn't really a live bug, because we don't expect PostgreSQL code to perform reads or writes of gigabytes, and OSes probably apply internal caps smaller than that anyway. This change is done on the principle that the return might as well follow the standard interfaces consistently. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://postgr.es/m/1672202.1703441340%40sss.pgh.pa.us	2024-03-02 12:09:28 +13:00
Alvaro Herrera	53c2a97a92	Improve performance of subsystems on top of SLRU More precisely, what we do here is make the SLRU cache sizes configurable with new GUCs, so that sites with high concurrency and big ranges of transactions in flight (resp. multixacts/subtransactions) can benefit from bigger caches. In order for this to work with good performance, two additional changes are made: 1. the cache is divided in "banks" (to borrow terminology from CPU caches), and algorithms such as eviction buffer search only affect one specific bank. This forestalls the problem that linear searching for a specific buffer across the whole cache takes too long: we only have to search the specific bank, whose size is small. This work is authored by Andrey Borodin. 2. Change the locking regime for the SLRU banks, so that each bank uses a separate LWLock. This allows for increased scalability. This work is authored by Dilip Kumar. (A part of this was previously committed as d172b717c6f4.) Special care is taken so that the algorithms that can potentially traverse more than one bank release one bank's lock before acquiring the next. This should happen rarely, but particularly clog.c's group commit feature needed code adjustment to cope with this. I (Álvaro) also added lots of comments to make sure the design is sound. The new GUCs match the names introduced by bcdfa5f2e2f2 in the pg_stat_slru view. The default values for these parameters are similar to the previous sizes of each SLRU. commit_ts, clog and subtrans accept value 0, which means to adjust by dividing shared_buffers by 512 (so 2MB for every 1GB of shared_buffers), with a cap of 8MB. (A new slru.c function SimpleLruAutotuneBuffers() was added to support this.) The cap was previously 1MB for clog, so for sites with more than 512MB of shared memory the total memory used increases, which is likely a good tradeoff. However, other SLRUs (notably multixact ones) retain smaller sizes and don't support a configured value of 0. These values based on shared_buffers may need to be revisited, but that's an easy change. There was some resistance to adding these new GUCs: it would be better to adjust to memory pressure automatically somehow, for example by stealing memory from shared_buffers (where the caches can grow and shrink naturally). However, doing that seems to be a much larger project and one which has made virtually no progress in several years, and because this is such a pain point for so many users, here we take the pragmatic approach. Author: Andrey Borodin <x4mmm@yandex-team.ru> Author: Dilip Kumar <dilipbalaut@gmail.com> Reviewed-by: Amul Sul, Gilles Darold, Anastasia Lubennikova, Ivan Lazarev, Robert Haas, Thomas Munro, Tomas Vondra, Yura Sokolov, Васильев Дмитрий (Dmitry Vasiliev). Discussion: https://postgr.es/m/2BEC2B3F-9B61-4C1D-9FB5-5FAB0F05EF86@yandex-team.ru Discussion: https://postgr.es/m/CAFiTN-vzDvNz=ExGXz6gdyjtzGixKSqs0mKHMmaQ8sOSEFZ33A@mail.gmail.com	2024-02-28 17:05:31 +01:00
Heikki Linnakangas	0b16bb8776	Remove AIX support There isn't a lot of user demand for AIX support, we have a bunch of hacks to work around AIX-specific compiler bugs and idiosyncrasies, and no one has stepped up to the plate to properly maintain it. Remove support for AIX to get rid of that maintenance overhead. It's still supported for stable versions. The acute issue that triggered this decision was that after commit 8af2565248, the AIX buildfarm members have been hitting this assertion: TRAP: failed Assert("(uintptr_t) buffer == TYPEALIGN(PG_IO_ALIGN_SIZE, buffer)"), File: "md.c", Line: 472, PID: 2949728 Apperently the "pg_attribute_aligned(a)" attribute doesn't work on AIX for values larger than PG_IO_ALIGN_SIZE, for a static const variable. That could be worked around, but we decided to just drop the AIX support instead. Discussion: https://www.postgresql.org/message-id/20240224172345.32@rfd.leadboat.com Reviewed-by: Andres Freund, Noah Misch, Thomas Munro	2024-02-28 15:17:23 +04:00

1 2 3 4 5 ...

1444 Commits