postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-21 13:56:59 +08:00

Author	SHA1	Message	Date
Thomas Munro	7cdd0c2d7c	Fix lock assertions in dshash.c. dshash.c previously maintained flags to be able to assert that you didn't hold any partition lock. These flags could get out of sync with reality in error scenarios. Get rid of all that, and make assertions about the locks themselves instead. Since LWLockHeldByMe() loops internally, we don't want to put that inside another loop over all partition locks. Introduce a new debugging-only interface LWLockAnyHeldByMe() to avoid that. This problem was noted by Tom and Andres while reviewing changes to support the new shared memory stats system, and later showed up in reality while working on commit 389869af. Back-patch to 11, where dshash.c arrived. Reported-by: Tom Lane <tgl@sss.pgh.pa.us> Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> Reviewed-by: Zhihong Yu <zyu@yugabyte.com> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220311012712.botrpsikaufzteyt@alap3.anarazel.de Discussion: https://postgr.es/m/CA%2BhUKGJ31Wce6HJ7xnVTKWjFUWQZPBngxfJVx4q0E98pDr3kAw%40mail.gmail.com	2022-07-11 15:48:54 +12:00
Tom Lane	cfc86f9873	Fix SPI's handling of errors during transaction commit. SPI_commit previously left it up to the caller to recover from any error occurring during commit. Since that's complicated and requires use of low-level xact.c facilities, it's not too surprising that no caller got it right. Let's move the responsibility for cleanup into spi.c. Doing that requires redefining SPI_commit as starting a new transaction, so that it becomes equivalent to SPI_commit_and_chain except that you get default transaction characteristics instead of preserving the prior transaction's characteristics. We can make this pretty transparent API-wise by redefining SPI_start_transaction() as a no-op. Callers that expect to do something in between might be surprised, but available evidence is that no callers do so. Having made that API redefinition, we can fix this mess by having SPI_commit[_and_chain] trap errors and start a new, clean transaction before re-throwing the error. Likewise for SPI_rollback[_and_chain]. Some cleanup is also needed in AtEOXact_SPI, which was nowhere near smart enough to deal with SPI contexts nested inside a committing context. While plperl and pltcl need no changes beyond removing their now-useless SPI_start_transaction() calls, plpython needs some more work because it hadn't gotten the memo about catching commit/rollback errors in the first place. Such an error resulted in longjmp'ing out of the Python interpreter, which leaks Python stack entries at present and is reported to crash Python 3.11 altogether. Add the missing logic to catch such errors and convert them into Python exceptions. This is a back-patch of commit 2e517818f. That's now aged long enough to reduce the concerns about whether it will break something, and we do need to ensure that supported branches will work with Python 3.11. Peter Eisentraut and Tom Lane Discussion: https://postgr.es/m/3375ffd8-d71c-2565-e348-a597d6e739e3@enterprisedb.com Discussion: https://postgr.es/m/17416-ed8fe5d7213d6c25@postgresql.org	2022-06-22 12:12:00 -04:00
Amit Kapila	1f9a7738eb	Fix data inconsistency between publisher and subscriber. We were not updating the partition map cache in the subscriber even when the corresponding remote rel is changed. Due to this data was getting incorrectly replicated for partition tables after the publisher has changed the table schema. Fix it by resetting the required entries in the partition map cache after receiving a new relation mapping from the publisher. Reported-by: Shi Yu Author: Shi Yu, Hou Zhijie Reviewed-by: Amit Langote, Amit Kapila Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/OSZPR01MB6310F46CD425A967E4AEF736FDA49@OSZPR01MB6310.jpnprd01.prod.outlook.com	2022-06-16 08:24:22 +05:30
Alvaro Herrera	5fd0cccc11	Repurpose PROC_COPYABLE_FLAGS as PROC_XMIN_FLAGS This is a slight, convenient semantics change from what commit 0f0cfb494004 ("Fix parallel operations that prevent oldest xmin from advancing") introduced that lets us simplify the coding in the one place where it is used. Backpatch to 13. This is related to commit 6fea65508a1a ("Tighten ComputeXidHorizons' handling of walsenders") rewriting the code site where this is used, which has not yet been backpatched, but it may well be in the future. Reviewed-by: Masahiko Sawada <sawada.mshk@gmail.com> Discussion: https://postgr.es/m/202204191637.eldwa2exvguw@alvherre.pgsql	2022-05-19 16:20:32 +02:00
Amit Kapila	55558df237	Fix the logical replication timeout during large transactions. The problem is that we don't send keep-alive messages for a long time while processing large transactions during logical replication where we don't send any data of such transactions. This can happen when the table modified in the transaction is not published or because all the changes got filtered. We do try to send the keep_alive if necessary at the end of the transaction (via WalSndWriteData()) but by that time the subscriber-side can timeout and exit. To fix this we try to send the keepalive message if required after processing certain threshold of changes. Reported-by: Fabrice Chapuis Author: Wang wei and Amit Kapila Reviewed By: Masahiko Sawada, Euler Taveira, Hou Zhijie, Hayato Kuroda Backpatch-through: 10 Discussion: https://postgr.es/m/CAA5-nLARN7-3SLU_QUxfy510pmrYK6JJb=bk3hcgemAM_pAv+w@mail.gmail.com	2022-05-11 10:41:24 +05:30
Tom Lane	e6440d5007	Remove inadequate assertion check in CTE inlining. inline_cte() expected to find exactly as many references to the target CTE as its cterefcount indicates. While that should be accurate for the tree as emitted by the parser, there are some optimizations that occur upstream of here that could falsify it, notably removal of unused subquery output expressions. Trying to make the accounting 100% accurate seems expensive and doomed to future breakage. It's not really worth it, because all this code is protecting is downstream assumptions that every referenced CTE has a plan. Let's convert those assertions to regular test-and-elog just in case there's some actual problem, and then drop the failing assertion. Per report from Tomas Vondra (thanks also to Richard Guo for analysis). Back-patch to v12 where the faulty code came in. Discussion: https://postgr.es/m/29196a1e-ed47-c7ca-9be2-b1c636816183@enterprisedb.com	2022-04-21 17:58:52 -04:00
Robert Haas	d18c913b78	Rethink the delay-checkpoint-end mechanism in the back-branches. The back-patch of commit bbace5697df12398e87ffd9879171c39d27f5b33 had the unfortunate effect of changing the layout of PGPROC in the back-branches, which could break extensions. This happened because it changed the delayChkpt from type bool to type int. So, change it back, and add a new bool delayChkptEnd field instead. The new field should fall within what used to be padding space within the struct, and so hopefully won't cause any extensions to break. Per report from Markus Wanner and discussion with Tom Lane and others. Patch originally by me, somewhat revised by Markus Wanner per a suggestion from Michael Paquier. A very similar patch was developed by Kyotaro Horiguchi, but I failed to see the email in which that was posted before writing one of my own. Discussion: http://postgr.es/m/CA+Tgmoao-kUD9c5nG5sub3F7tbo39+cdr8jKaOVEs_1aBWcJ3Q@mail.gmail.com Discussion: http://postgr.es/m/20220406.164521.17171257901083417.horikyota.ntt@gmail.com	2022-04-14 11:10:11 -04:00
Tom Lane	44096c31ea	Prevent access to no-longer-pinned buffer in heapam_tuple_lock(). heap_fetch() used to have a "keep_buf" parameter that told it to return ownership of the buffer pin to the caller after finding that the requested tuple TID exists but is invisible to the specified snapshot. This was thoughtlessly removed in commit 5db6df0c0, which broke heapam_tuple_lock() (formerly EvalPlanQualFetch) because that function needs to do more accesses to the tuple even if it's invisible. The net effect is that we would continue to touch the page for a microsecond or two after releasing pin on the buffer. Usually no harm would result; but if a different session decided to defragment the page concurrently, we could see garbage data and mistakenly conclude that there's no newer tuple version to chain up to. (It's hard to say whether this has happened in the field. The bug was actually found thanks to a later change that allowed valgrind to detect accesses to non-pinned buffers.) The most reasonable way to fix this is to reintroduce keep_buf, although I made it behave slightly differently: buffer ownership is passed back only if there is a valid tuple at the requested TID. In HEAD, we can just add the parameter back to heap_fetch(). To avoid an API break in the back branches, introduce an additional function heap_fetch_extended() in those branches. In HEAD there is an additional, less obvious API change: tuple->t_data will be set to NULL in all cases where buffer ownership is not returned, in particular when the tuple exists but fails the time qual (and !keep_buf). This is to defend against any other callers attempting to access non-pinned buffers. We concluded that making that change in back branches would be more likely to introduce problems than cure any. In passing, remove a comment about heap_fetch that was obsoleted by 9a8ee1dc6. Per bug #17462 from Daniil Anisimov. Back-patch to v12 where the bug was introduced. Discussion: https://postgr.es/m/17462-9c98a0f00df9bd36@postgresql.org	2022-04-13 13:35:02 -04:00
Peter Eisentraut	8f4b5b0909	Remove obsolete comment accidentally left behind by 4cb658af70027c3544fb843d77b2e84028762747	2022-04-02 07:28:23 +02:00
Alvaro Herrera	a54ed2de80	Revert "Fix replay of create database records on standby" This reverts commit 49d9cfc68bf4. The approach taken by this patch has problems, so we'll come up with a radically different fix. Discussion: https://postgr.es/m/CA+TgmoYcUPL+WOJL2ZzhH=zmrhj0iOQ=iCFM0SuYqBbqZEamEg@mail.gmail.com	2022-03-29 15:36:21 +02:00
Tom Lane	4fbbea2c19	Suppress compiler warning in relptr_store(). clang 13 with -Wextra warns that "performing pointer subtraction with a null pointer has undefined behavior" in the places where freepage.c tries to set a relptr variable to constant NULL. This appears to be a compiler bug, but it's unlikely to get fixed instantly. Fortunately, we can work around it by introducing an inline support function, which seems like a good change anyway because it removes the macro's existing double-evaluation hazard. Backpatch to v10 where this code was introduced. Patch by me, based on an idea of Andres Freund's. Discussion: https://postgr.es/m/48826.1648310694@sss.pgh.pa.us	2022-03-26 14:29:48 -04:00
Alvaro Herrera	50e3ed8e91	Fix replay of create database records on standby Crash recovery on standby may encounter missing directories when replaying create database WAL records. Prior to this patch, the standby would fail to recover in such a case. However, the directories could be legitimately missing. Consider a sequence of WAL records as follows: CREATE DATABASE DROP DATABASE DROP TABLESPACE If, after replaying the last WAL record and removing the tablespace directory, the standby crashes and has to replay the create database record again, the crash recovery must be able to move on. This patch adds a mechanism similar to invalid-page tracking, to keep a tally of missing directories during crash recovery. If all the missing directory references are matched with corresponding drop records at the end of crash recovery, the standby can safely continue following the primary. Backpatch to 13, at least for now. The bug is older, but fixing it in older branches requires more careful study of the interactions with commit e6d8069522c8, which appeared in 13. A new TAP test file is added to verify the condition. However, because it depends on commit d6d317dbf615, it can only be added to branch master. I (Álvaro) manually verified that the code behaves as expected in branch 14. It's a bit nervous-making to leave the code uncovered by tests in older branches, but leaving the bug unfixed is even worse. Also, the main reason this fix took so long is precisely that we couldn't agree on a good strategy to approach testing for the bug, so perhaps this is the best we can do. Diagnosed-by: Paul Guo <paulguo@gmail.com> Author: Paul Guo <paulguo@gmail.com> Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Asim R Praveen <apraveen@pivotal.io> Discussion: https://postgr.es/m/CAEET0ZGx9AvioViLf7nbR_8tH9-=27DN5xWJ2P9-ROH16e4JUA@mail.gmail.com	2022-03-25 13:16:21 +01:00
Robert Haas	1ce14b6b2f	Fix possible recovery trouble if TRUNCATE overlaps a checkpoint. If TRUNCATE causes some buffers to be invalidated and thus the checkpoint does not flush them, TRUNCATE must also ensure that the corresponding files are truncated on disk. Otherwise, a replay from the checkpoint might find that the buffers exist but have the wrong contents, which may cause replay to fail. Report by Teja Mupparti. Patch by Kyotaro Horiguchi, per a design suggestion from Heikki Linnakangas, with some changes to the comments by me. Review of this and a prior patch that approached the issue differently by Heikki Linnakangas, Andres Freund, Álvaro Herrera, Masahiko Sawada, and Tom Lane. Discussion: http://postgr.es/m/BYAPR06MB6373BF50B469CA393C614257ABF00@BYAPR06MB6373.namprd06.prod.outlook.com	2022-03-24 14:36:06 -04:00
Thomas Munro	cfdb303be7	Fix waiting in RegisterSyncRequest(). If we run out of space in the checkpointer sync request queue (which is hopefully rare on real systems, but common with very small buffer pool), we wait for it to drain. While waiting, we should report that as a wait event so that users know what is going on, and also handle postmaster death, since otherwise the loop might never terminate if the checkpointer has exited. Back-patch to 12. Although the problem exists in earlier releases too, the code is structured differently before 12 so I haven't gone any further for now, in the absence of field complaints. Reported-by: Andres Freund <andres@anarazel.de> Reviewed-by: Andres Freund <andres@anarazel.de> Discussion: https://postgr.es/m/20220226213942.nb7uvb2pamyu26dj%40alap3.anarazel.de	2022-03-16 15:37:15 +13:00
Tom Lane	97031f4404	Fix bogus casting in BlockIdGetBlockNumber(). This macro cast the result to BlockNumber after shifting, not before, which is the wrong thing. Per the C spec, the uint16 fields would promote to int not unsigned int, so that (for 32-bit int) the shift potentially shifts a nonzero bit into the sign position. I doubt there are any production systems where this would actually end with the wrong answer, but it is undefined behavior per the C spec, and clang's -fsanitize=undefined option reputedly warns about it on some platforms. (I can't reproduce that right now, but the code is undeniably wrong per spec.) It's easy to fix by casting to BlockNumber (uint32) in the proper places. It's been wrong for ages, so back-patch to all supported branches. Report and patch by Zhihong Yu (cosmetic tweaking by me) Discussion: https://postgr.es/m/CALNJ-vT9r0DSsAOw9OXVJFxLENoVS_68kJ5x0p44atoYH+H4dg@mail.gmail.com	2022-03-03 19:03:42 -05:00
Tom Lane	46b738ffc2	Suppress warning about stack_base_ptr with late-model GCC. GCC 12 complains that set_stack_base is storing the address of a local variable in a long-lived pointer. This is an entirely reasonable warning (indeed, it just helped us find a bug); but that behavior is intentional here. We can work around it by using __builtin_frame_address(0) instead of a specific local variable; that produces an address a dozen or so bytes different, in my testing, but we don't care about such a small difference. Maybe someday a compiler lacking that function will start to issue a similar warning, but we'll worry about that when it happens. Patch by me, per a suggestion from Andres Freund. Back-patch to v12, which is as far back as the patch will go without some pain. (Recently-established project policy would permit a back-patch as far as 9.2, but I'm disinclined to expend the work until GCC 12 is much more widespread.) Discussion: https://postgr.es/m/3773792.1645141467@sss.pgh.pa.us	2022-02-17 22:45:34 -05:00
Tom Lane	5ad70564f4	Fix failure to validate the result of select_common_type(). Although select_common_type() has a failure-return convention, an apparent successful return just provides a type OID that might work as a common supertype; we've not validated that the required casts actually exist. In the mainstream use-cases that doesn't matter, because we'll proceed to invoke coerce_to_common_type() on each input, which will fail appropriately if the proposed common type doesn't actually work. However, a few callers didn't read the (nonexistent) fine print, and thought that if they got back a nonzero OID then the coercions were sure to work. This affects in particular the recently-added "anycompatible" polymorphic types; we might think that a function/operator using such types matches cases it really doesn't. A likely end result of that is unexpected "ambiguous operator" errors, as for example in bug #17387 from James Inform. Another, much older, case is that the parser might try to transform an "x IN (list)" construct to a ScalarArrayOpExpr even when the list elements don't actually have a common supertype. It doesn't seem desirable to add more checking to select_common_type itself, as that'd just slow down the mainstream use-cases. Instead, write a separate function verify_common_type that performs the missing checks, and add a call to that where necessary. Likewise add verify_common_type_from_oids to go with select_common_type_from_oids. Back-patch to v13 where the "anycompatible" types came in. (The symptom complained of in bug #17387 doesn't appear till v14, but that's just because we didn't get around to converting \|\| to use anycompatible till then.) In principle the "x IN (list)" fix could go back all the way, but I'm not currently convinced that it makes much difference in real-world cases, so I won't bother for now. Discussion: https://postgr.es/m/17387-5dfe54b988444963@postgresql.org	2022-01-29 11:41:12 -05:00
Tomas Vondra	e90f258aca	Fix ordering of XIDs in ProcArrayApplyRecoveryInfo Commit 8431e296ea reworked ProcArrayApplyRecoveryInfo to sort XIDs before adding them to KnownAssignedXids. But the XIDs are sorted using xidComparator, which compares the XIDs simply as uint32 values, not logically. KnownAssignedXidsAdd() however expects XIDs in logical order, and calls TransactionIdFollowsOrEquals() to enforce that. If there are XIDs for which the two orderings disagree, an error is raised and the recovery fails/restarts. Hitting this issue is fairly easy - you just need two transactions, one started before the 4B limit (e.g. XID 4294967290), the other sometime after it (e.g. XID 1000). Logically (4294967290 <= 1000) but when compared using xidComparator we try to add them in the opposite order. Which makes KnownAssignedXidsAdd() fail with an error like this: ERROR: out-of-order XID insertion in KnownAssignedXids This only happens during replica startup, while processing RUNNING_XACTS records to build the snapshot. Once we reach STANDBY_SNAPSHOT_READY, we skip these records. So this does not affect already running replicas, but if you restart (or create) a replica while there are transactions with XIDs for which the two orderings disagree, you may hit this. Long-running transactions and frequent replica restarts increase the likelihood of hitting this issue. Once the replica gets into this state, it can't be started (even if the old transactions are terminated). Fixed by sorting the XIDs logically - this is fine because we're dealing with normal XIDs (because it's XIDs assigned to backends) and from the same wraparound epoch (otherwise the backends could not be running at the same time on the primary node). So there are no problems with the triangle inequality, which is why xidComparator compares raw values. Investigation and root cause analysis by Abhijit Menon-Sen. Patch by me. This issue is present in all releases since 9.4, however releases up to 9.6 are EOL already so backpatch to 10 only. Reviewed-by: Abhijit Menon-Sen Reviewed-by: Alvaro Herrera Backpatch-through: 10 Discussion: https://postgr.es/m/36b8a501-5d73-277c-4972-f58a4dce088a%40enterprisedb.com	2022-01-27 20:16:39 +01:00
Tom Lane	d67354d870	Fix limitations on what SQL commands can be issued to a walsender. In logical replication mode, a WalSender is supposed to be able to execute any regular SQL command, as well as the special replication commands. Poor design of the replication-command parser caused it to fail in various cases, notably: * semicolons embedded in a command, or multiple SQL commands sent in a single message; * dollar-quoted literals containing odd numbers of single or double quote marks; * commands starting with a comment. The basic problem here is that we're trying to run repl_scanner.l across the entire input string even when it's not a replication command. Since repl_scanner.l does not understand all of the token types known to the core lexer, this is doomed to have failure modes. We certainly don't want to make repl_scanner.l as big as scan.l, so instead rejigger stuff so that we only lex the first token of a non-replication command. That will usually look like an IDENT to repl_scanner.l, though a comment would end up getting reported as a '-' or '/' single-character token. If the token is a replication command keyword, we push it back and proceed normally with repl_gram.y parsing. Otherwise, we can drop out of exec_replication_command() without examining the rest of the string. (It's still theoretically possible for repl_scanner.l to fail on the first token; but that could only happen if it's an unterminated single- or double-quoted string, in which case you'd have gotten largely the same error from the core lexer too.) In this way, repl_gram.y isn't involved at all in handling general SQL commands, so we can get rid of the SQLCmd node type. (In the back branches, we can't remove it because renumbering enum NodeTag would be an ABI break; so just leave it sit there unused.) I failed to resist the temptation to clean up some other sloppy coding in repl_scanner.l while at it. The only externally-visible behavior change from that is it now accepts \r and \f as whitespace, same as the core lexer. Per bug #17379 from Greg Rychlewski. Back-patch to all supported branches. Discussion: https://postgr.es/m/17379-6a5c6cfb3f1f5e77@postgresql.org	2022-01-24 15:33:34 -05:00
Tom Lane	20d08b2c61	Fix index-only scan plans, take 2. Commit 4ace45677 failed to fix the problem fully, because the same issue of attempting to fetch a non-returnable index column can occur when rechecking the indexqual after using a lossy index operator. Moreover, it broke EXPLAIN for such indexquals (which indicates a gap in our test cases :-(). Revert the code changes of 4ace45677 in favor of adding a new field to struct IndexOnlyScan, containing a version of the indexqual that can be executed against the index-returned tuple without using any non-returnable columns. (The restrictions imposed by check_index_only guarantee this is possible, although we may have to recompute indexed expressions.) Support construction of that during setrefs.c processing by marking IndexOnlyScan.indextlist entries as resjunk if they can't be returned, rather than removing them entirely. (We could alternatively require setrefs.c to look up the IndexOptInfo again, but abusing resjunk this way seems like a reasonably safe way to avoid needing to do that.) This solution isn't great from an API-stability standpoint: if there are any extensions out there that build IndexOnlyScan structs directly, they'll be broken in the next minor releases. However, only a very invasive extension would be likely to do such a thing. There's no change in the Path representation, so typical planner extensions shouldn't have a problem. As before, back-patch to all supported branches. Discussion: https://postgr.es/m/3179992.1641150853@sss.pgh.pa.us Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org	2022-01-03 15:42:27 -05:00
Tom Lane	45ae427141	Fix index-only scan plans when not all index columns can be returned. If an index has both returnable and non-returnable columns, and one of the non-returnable columns is an expression using a Var that is in a returnable column, then a query returning that expression could result in an index-only scan plan that attempts to read the non-returnable column, instead of recomputing the expression from the returnable column as intended. To fix, redefine the "indextlist" list of an IndexOnlyScan plan node as containing null Consts in place of any non-returnable columns. This solves the problem by preventing setrefs.c from falsely matching to such entries. The executor is happy since it only cares about the exposed types of the entries, and ruleutils.c doesn't care because a correct plan won't reference those entries. I considered some other ways to prevent setrefs.c from doing the wrong thing, but this way seems good since (a) it allows a very localized fix, (b) it makes the indextlist structure more compact in many cases, and (c) the indextlist is now a more faithful representation of what the index AM will actually produce, viz. nulls for any non-returnable columns. This is easier to hit since we introduced included columns, but it's possible to construct failing examples without that, as per the added regression test. Hence, back-patch to all supported branches. Per bug #17350 from Louis Jachiet. Discussion: https://postgr.es/m/17350-b5bdcf476e5badbb@postgresql.org	2022-01-01 16:12:03 -05:00
Michael Paquier	28e1e5c2a9	Correct comment and some documentation about REPLICA_IDENTITY_INDEX catalog/pg_class.h was stating that REPLICA_IDENTITY_INDEX with a dropped index is equivalent to REPLICA_IDENTITY_DEFAULT. The code tells a different story, as it is equivalent to REPLICA_IDENTITY_NOTHING. The behavior exists since the introduction of replica identities, and fe7fd4e even added tests for this case but I somewhat forgot to fix this comment. While on it, this commit reorganizes the documentation about replica identities on the ALTER TABLE page, and a note is added about the case of dropped indexes with REPLICA_IDENTITY_INDEX. Author: Michael Paquier, Wei Wang Reviewed-by: Euler Taveira Discussion: https://postgr.es/m/OS3PR01MB6275464AD0A681A0793F56879E759@OS3PR01MB6275.jpnprd01.prod.outlook.com Backpatch-through: 10	2021-12-22 16:38:42 +09:00
Alvaro Herrera	f76fd05bae	Harden be-gssapi-common.h for headerscheck Surround the contents with a test that the feature is enabled by configure, to silence header checking tools on systems without GSSAPI installed. Backpatch to 12, where the file appeared. Discussion: https://postgr.es/m/202111161709.u3pbx5lxdimt@alvherre.pgsql	2021-11-26 17:00:29 -03:00
Amit Kapila	33b6dd83e2	Fix parallel operations that prevent oldest xmin from advancing. While determining xid horizons, we skip over backends that are running Vacuum. We also ignore Create Index Concurrently, or Reindex Concurrently for the purposes of computing Xmin for Vacuum. But we were not setting the flags corresponding to these operations when they are performed in parallel which was preventing Xid horizon from advancing. The optimization related to skipping Create Index Concurrently, or Reindex Concurrently operations was implemented in PG-14 but the fix is the same for the Parallel Vacuum as well so back-patched till PG-13. Author: Masahiko Sawada Reviewed-by: Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/CAD21AoCLQqgM1sXh9BrDFq0uzd3RBFKi=Vfo6cjjKODm0Onr5w@mail.gmail.com	2021-11-19 09:24:00 +05:30
Tom Lane	e92ed93e8e	Reject extraneous data after SSL or GSS encryption handshake. The server collects up to a bufferload of data whenever it reads data from the client socket. When SSL or GSS encryption is requested during startup, any additional data received with the initial request message remained in the buffer, and would be treated as already-decrypted data once the encryption handshake completed. Thus, a man-in-the-middle with the ability to inject data into the TCP connection could stuff some cleartext data into the start of a supposedly encryption-protected database session. This could be abused to send faked SQL commands to the server, although that would only work if the server did not demand any authentication data. (However, a server relying on SSL certificate authentication might well not do so.) To fix, throw a protocol-violation error if the internal buffer is not empty after the encryption handshake. Our thanks to Jacob Champion for reporting this problem. Security: CVE-2021-23214	2021-11-08 11:01:43 -05:00
Tom Lane	0151af40cd	Avoid O(N^2) behavior in SyncPostCheckpoint(). As in commits 6301c3ada and e9d9ba2a4, avoid doing repetitive list_delete_first() operations, since that would be expensive when there are many files waiting to be unlinked. This is a slightly larger change than in those cases. We have to keep the list state valid for calls to AbsorbSyncRequests(), so it's necessary to invent a "canceled" field instead of immediately deleting PendingUnlinkEntry entries. Also, because we might not be able to process all the entries, we need a new list primitive list_delete_first_n(). list_delete_first_n() is almost list_copy_tail(), but it modifies the input List instead of making a new copy. I found a couple of existing uses of the latter that could profitably use the new function. (There might be more, but the other callers look like they probably shouldn't overwrite the input List.) As before, back-patch to v13. Discussion: https://postgr.es/m/CD2F0E7F-9822-45EC-A411-AE56F14DEA9F@amazon.com	2021-11-02 11:31:54 -04:00
Noah Misch	a9d0a54094	Fix CREATE INDEX CONCURRENTLY for the newest prepared transactions. The purpose of commit 8a54e12a38d1545d249f1402f66c8cde2837d97c was to fix this, and it sufficed when the PREPARE TRANSACTION completed before the CIC looked for lock conflicts. Otherwise, things still broke. As before, in a cluster having used CIC while having enabled prepared transactions, queries that use the resulting index can silently fail to find rows. It may be necessary to reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices. Fix this for future index builds by making CIC wait for arbitrarily-recent prepared transactions and for ordinary transactions that may yet PREPARE TRANSACTION. As part of that, have PREPARE TRANSACTION transfer locks to its dummy PGPROC before it calls ProcArrayClearTransaction(). Back-patch to 9.6 (all supported versions). Andrey Borodin, reviewed (in earlier versions) by Andres Freund. Discussion: https://postgr.es/m/01824242-AA92-4FE9-9BA7-AEBAFFEA3D0C@yandex-team.ru	2021-10-23 18:36:42 -07:00
Noah Misch	2e33b43599	Avoid race in RelationBuildDesc() affecting CREATE INDEX CONCURRENTLY. CIC and REINDEX CONCURRENTLY assume backends see their catalog changes no later than each backend's next transaction start. That failed to hold when a backend absorbed a relevant invalidation in the middle of running RelationBuildDesc() on the CIC index. Queries that use the resulting index can silently fail to find rows. Fix this for future index builds by making RelationBuildDesc() loop until it finishes without accepting a relevant invalidation. It may be necessary to reindex to recover from past occurrences; REINDEX CONCURRENTLY suffices. Back-patch to 9.6 (all supported versions). Noah Misch and Andrey Borodin, reviewed (in earlier versions) by Andres Freund. Discussion: https://postgr.es/m/20210730022548.GA1940096@gust.leadboat.com	2021-10-23 18:36:42 -07:00
Tom Lane	2e01d050d9	Fix frontend version of sh_error() in simplehash.h. The code does not expect sh_error() to return, but the patch that made this header usable in frontend didn't get that memo. While here, plaster unlikely() on the tests that decide whether to invoke sh_error(), and add our standard copyright notice. Noted by Andres Freund. Back-patch to v13 where this frontend support came in. Discussion: https://postgr.es/m/0D54435C-1199-4361-9D74-2FBDCF8EA164@anarazel.de	2021-10-22 16:43:38 -04:00
Michael Paquier	8f4fe8d7f8	Reset properly snapshot export state during transaction abort During a replication slot creation, an ERROR generated in the same transaction as the one creating a to-be-exported snapshot would have left the backend in an inconsistent state, as the associated static export snapshot state was not being reset on transaction abort, but only on the follow-up command received by the WAL sender that created this snapshot on replication slot creation. This would trigger inconsistency failures if this session tried to export again a snapshot, like during the creation of a replication slot. Note that a snapshot export cannot happen in a transaction block, so there is no need to worry resetting this state for subtransaction aborts. Also, this inconsistent state would very unlikely show up to users. For example, one case where this could happen is an out-of-memory error when building the initial snapshot to-be-exported. Dilip found this problem while poking at a different patch, that caused an error in this code path for reasons unrelated to HEAD. Author: Dilip Kumar Reviewed-by: Michael Paquier, Zhihong Yu Discussion: https://postgr.es/m/CAFiTN-s0zA1Kj0ozGHwkYkHwa5U0zUE94RSc_g81WrpcETB5=w@mail.gmail.com Backpatch-through: 9.6	2021-10-18 11:56:52 +09:00
Tom Lane	04ef2021e3	Fix Portal snapshot tracking to handle subtransactions properly. Commit 84f5c2908 forgot to consider the possibility that EnsurePortalSnapshotExists could run inside a subtransaction with lifespan shorter than the Portal's. In that case, the new active snapshot would be popped at the end of the subtransaction, leaving a dangling pointer in the Portal, with mayhem ensuing. To fix, make sure the ActiveSnapshot stack entry is marked with the same subtransaction nesting level as the associated Portal. It's certainly safe to do so since we won't be here at all unless the stack is empty; hence we can't create an out-of-order stack. Let's also apply this logic in the case where PortalRunUtility sets portalSnapshot, just to be sure that path can't cause similar problems. It's slightly less clear that that path can't create an out-of-order stack, so add an assertion guarding it. Report and patch by Bertrand Drouvot (with kibitzing by me). Back-patch to v11, like the previous commit. Discussion: https://postgr.es/m/ff82b8c5-77f4-3fe7-6028-fcf3303e82dd@amazon.com	2021-10-01 11:10:12 -04:00
Alvaro Herrera	1d97d3d086	Fix WAL replay in presence of an incomplete record Physical replication always ships WAL segment files to replicas once they are complete. This is a problem if one WAL record is split across a segment boundary and the primary server crashes before writing down the segment with the next portion of the WAL record: WAL writing after crash recovery would happily resume at the point where the broken record started, overwriting that record ... but any standby or backup may have already received a copy of that segment, and they are not rewinding. This causes standbys to stop following the primary after the latter crashes: LOG: invalid contrecord length 7262 at A8/D9FFFBC8 because the standby is still trying to read the continuation record (contrecord) for the original long WAL record, but it is not there and it will never be. A workaround is to stop the replica, delete the WAL file, and restart it -- at which point a fresh copy is brought over from the primary. But that's pretty labor intensive, and I bet many users would just give up and re-clone the standby instead. A fix for this problem was already attempted in commit 515e3d84a0b5, but it only addressed the case for the scenario of WAL archiving, so streaming replication would still be a problem (as well as other things such as taking a filesystem-level backup while the server is down after having crashed), and it had performance scalability problems too; so it had to be reverted. This commit fixes the problem using an approach suggested by Andres Freund, whereby the initial portion(s) of the split-up WAL record are kept, and a special type of WAL record is written where the contrecord was lost, so that WAL replay in the replica knows to skip the broken parts. With this approach, we can continue to stream/archive segment files as soon as they are complete, and replay of the broken records will proceed across the crash point without a hitch. Because a new type of WAL record is added, users should be careful to upgrade standbys first, primaries later. Otherwise they risk the standby being unable to start if the primary happens to write such a record. A new TAP test that exercises this is added, but the portability of it is yet to be seen. This has been wrong since the introduction of physical replication, so backpatch all the way back. In stable branches, keep the new XLogReaderState members at the end of the struct, to avoid an ABI break. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Nathan Bossart <bossartn@amazon.com> Discussion: https://postgr.es/m/202108232252.dh7uxf6oxwcy@alvherre.pgsql	2021-09-29 11:21:51 -03:00
Alexander Korotkov	fd22aec631	Split macros from visibilitymap.h into a separate header That allows to include just visibilitymapdefs.h from file.c, and in turn, remove include of postgres.h from relcache.h. Reported-by: Andres Freund Discussion: https://postgr.es/m/20210913232614.czafiubr435l6egi%40alap3.anarazel.de Author: Alexander Korotkov Reviewed-by: Andres Freund, Tom Lane, Alvaro Herrera Backpatch-through: 13	2021-09-23 20:12:25 +03:00
Amit Kapila	f09a81f1c0	Invalidate all partitions for a partitioned table in publication. Updates/Deletes on a partition were allowed even without replica identity after the parent table was added to a publication. This would later lead to an error on subscribers. The reason was that we were not invalidating the partition's relcache and the publication information for partitions was not getting rebuilt. Similarly, we were not invalidating the partitions' relcache after dropping a partitioned table from a publication which will prohibit Updates/Deletes on its partition without replica identity even without any publication. Reported-by: Haiying Tang Author: Hou Zhijie and Vignesh C Reviewed-by: Vignesh C and Amit Kapila Backpatch-through: 13 Discussion: https://postgr.es/m/OS0PR01MB6113D77F583C922F1CEAA1C3FBD29@OS0PR01MB6113.jpnprd01.prod.outlook.com	2021-09-22 08:24:20 +05:30
Tom Lane	63f28776cb	Send NOTIFY signals during CommitTransaction. Formerly, we sent signals for outgoing NOTIFY messages within ProcessCompletedNotifies, which was also responsible for sending relevant ones of those messages to our connected client. It therefore had to run during the main-loop processing that occurs just before going idle. This arrangement had two big disadvantages: * Now that procedures allow intra-command COMMITs, it would be useful to send NOTIFYs to other sessions immediately at COMMIT (though, for reasons of wire-protocol stability, we still shouldn't forward them to our client until end of command). * Background processes such as replication workers would not send NOTIFYs at all, since they never execute the client communication loop. We've had requests to allow triggers running in replication workers to send NOTIFYs, so that's a problem. To fix these things, move transmission of outgoing NOTIFY signals into AtCommit_Notify, where it will happen during CommitTransaction. Also move the possible call of asyncQueueAdvanceTail there, to ensure we don't bloat the async SLRU if a background worker sends many NOTIFYs with no one listening. We can also drop the call of asyncQueueReadAllNotifications, allowing ProcessCompletedNotifies to go away entirely. That's because commit 790026972 added a call of ProcessNotifyInterrupt adjacent to PostgresMain's call of ProcessCompletedNotifies, and that does its own call of asyncQueueReadAllNotifications, meaning that we were uselessly doing two such calls (inside two separate transactions) whenever inbound notify signals coincided with an outbound notify. We need only set notifyInterruptPending to ensure that ProcessNotifyInterrupt runs, and we're done. The existing documentation suggests that custom background workers should call ProcessCompletedNotifies if they want to send NOTIFY messages. To avoid an ABI break in the back branches, reduce it to an empty routine rather than removing it entirely. Removal will occur in v15. Although the problems mentioned above have existed for awhile, I don't feel comfortable back-patching this any further than v13. There was quite a bit of churn in adjacent code between 12 and 13. At minimum we'd have to also backpatch 51004c717, and a good deal of other adjustment would also be needed, so the benefit-to-risk ratio doesn't look attractive. Per bug #15293 from Michael Powers (and similar gripes from others). Artur Zakirov and Tom Lane Discussion: https://postgr.es/m/153243441449.1404.2274116228506175596@wrigleys.postgresql.org	2021-09-14 17:18:25 -04:00
Andres Freund	c49e6f9d97	jit: Do not try to shut down LLVM state in case of LLVM triggered errors. If an allocation failed within LLVM it is not safe to call back into LLVM as LLVM is not generally safe against exceptions / stack-unwinding. Thus errors while in LLVM code are promoted to FATAL. However llvm_shutdown() did call back into LLVM even in such cases, while llvm_release_context() was careful not to do so. We cannot generally skip shutting down LLVM, as that can break profiling. But it's OK to do so if there was an error from within LLVM. Reported-By: Jelte Fennema <Jelte.Fennema@microsoft.com> Author: Andres Freund <andres@anarazel.de> Author: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/AM5PR83MB0178C52CCA0A8DEA0207DC14F7FF9@AM5PR83MB0178.EURPRD83.prod.outlook.com Backpatch: 11-, where jit was introduced	2021-09-13 18:26:18 -07:00
Alvaro Herrera	518621c40b	Revert "Avoid creating archive status ".ready" files too early" This reverts commit 515e3d84a0b5 and equivalent commits in back branches. This solution to the problem has a number of problems, so we'll try again with a different approach. Per note from Andres Freund Discussion: https://postgr.es/m/20210831042949.52eqp5xwbxgrfank@alap3.anarazel.de	2021-09-04 12:14:30 -04:00
Amit Kapila	794025eff0	Fix toast rewrites in logical decoding. Commit 325f2ec555 introduced pg_class.relwrite to skip operations on tables created as part of a heap rewrite during DDL. It links such transient heaps to the original relation OID via this new field in pg_class but forgot to do anything about toast tables. So, logical decoding was not able to skip operations on internally created toast tables. This leads to an error when we tried to decode the WAL for the next operation for which it appeared that there is a toast data where actually it didn't have any toast data. To fix this, we set pg_class.relwrite for internally created toast tables as well which allowed skipping operations on them during logical decoding. Author: Bertrand Drouvot Reviewed-by: David Zhang, Amit Kapila Backpatch-through: 11, where it was introduced Discussion: https://postgr.es/m/b5146fb1-ad9e-7d6e-f980-98ed68744a7c@amazon.com	2021-08-25 09:23:27 +05:30
Alvaro Herrera	ad1231171f	Avoid creating archive status ".ready" files too early WAL records may span multiple segments, but XLogWrite() does not wait for the entire record to be written out to disk before creating archive status files. Instead, as soon as the last WAL page of the segment is written, the archive status file is created, and the archiver may process it. If PostgreSQL crashes before it is able to write and flush the rest of the record (in the next WAL segment), the wrong version of the first segment file lingers in the archive, which causes operations such as point-in-time restores to fail. To fix this, keep track of records that span across segments and ensure that segments are only marked ready-for-archival once such records have been completely written to disk. This has always been wrong, so backpatch all the way back. Author: Nathan Bossart <bossartn@amazon.com> Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Ryo Matsumura <matsumura.ryo@fujitsu.com> Reviewed-by: Andrey Borodin <x4mmm@yandex-team.ru> Discussion: https://postgr.es/m/CBDDFA01-6E40-46BB-9F98-9340F4379505@amazon.com	2021-08-23 15:50:35 -04:00
Tom Lane	7fa367d96b	Avoid trying to lock OLD/NEW in a rule with FOR UPDATE. transformLockingClause neglected to exclude the pseudo-RTEs for OLD/NEW when processing a rule's query. This led to odd errors or even crashes later on. This bug is very ancient, but it's not terribly surprising that nobody noticed, since the use-case for SELECT FOR UPDATE in a non-view rule is somewhere between thin and non-existent. Still, crashing is not OK. Per bug #17151 from Zhiyong Wu. Thanks to Masahiko Sawada for analysis of the problem. Discussion: https://postgr.es/m/17151-c03a3e6e4ec9aadb@postgresql.org	2021-08-19 12:12:35 -04:00
Tom Lane	7b01246e1d	Prevent ALTER TYPE/DOMAIN/OPERATOR from changing extension membership. If recordDependencyOnCurrentExtension is invoked on a pre-existing, free-standing object during an extension update script, that object will become owned by the extension. In our current code this is possible in three cases: * Replacing a "shell" type or operator. * CREATE OR REPLACE overwriting an existing object. * ALTER TYPE SET, ALTER DOMAIN SET, and ALTER OPERATOR SET. The first of these cases is intentional behavior, as noted by the existing comments for GenerateTypeDependencies. It seems like appropriate behavior for CREATE OR REPLACE too; at least, the obvious alternatives are not better. However, the fact that it happens during ALTER is an artifact of trying to share code (GenerateTypeDependencies and makeOperatorDependencies) between the CREATE and ALTER cases. Since an extension script would be unlikely to ALTER an object that didn't already belong to the extension, this behavior is not very troubling for the direct target object ... but ALTER TYPE SET will recurse to dependent domains, and it is very uncool for those to become owned by the extension if they were not already. Let's fix this by redefining the ALTER cases to never change extension membership, full stop. We could minimize the behavioral change by only changing the behavior when ALTER TYPE SET is recursing to a domain, but that would complicate the code and it does not seem like a better definition. Per bug #17144 from Alex Kozhemyakin. Back-patch to v13 where ALTER TYPE SET was added. (The other cases are older, but since they only affect the directly-named object, there's not enough of a problem to justify changing the behavior further back.) Discussion: https://postgr.es/m/17144-e67d7a8f049de9af@postgresql.org	2021-08-17 14:29:22 -04:00
Tom Lane	48695decc2	Add RISC-V spinlock support in s_lock.h. Like the ARM case, just use gcc's __sync_lock_test_and_set(); that will compile into AMOSWAP.W.AQ which does what we need. At some point it might be worth doing some work on atomic ops for RISC-V, but this should be enough for a creditable port. Back-patch to all supported branches, just in case somebody wants to try them on RISC-V. Marek Szuba Discussion: https://postgr.es/m/dea97b6d-f55f-1f6d-9109-504aa7dfa421@gentoo.org	2021-08-13 13:59:06 -04:00
David Rowley	4873da79da	Fix incorrect hash table resizing code in simplehash.h This fixes a bug in simplehash.h which caused an incorrect size mask to be used when the hash table grew to SH_MAX_SIZE (2^32). The code was incorrectly setting the size mask to 0 when the hash tables reached the maximum possible number of buckets. This would result always trying to use the 0th bucket causing an infinite loop of trying to grow the hash table due to there being too many collisions. Seemingly it's not that common for simplehash tables to ever grow this big as this bug dates back to v10 and nobody seems to have noticed it before. However, probably the most likely place that people would notice it would be doing a large in-memory Hash Aggregate with something close to at least 2^31 groups. After this fix, the code now works correctly with up to within 98% of 2^32 groups and will fail with the following error when trying to insert any more items into the hash table: ERROR: hash table size exceeded However, the work_mem (or hash_mem_multiplier in newer versions) settings will generally cause Hash Aggregates to spill to disk long before reaching that many groups. The minimal test case I did took a work_mem setting of over 192GB to hit the bug. simplehash hash tables are used in a few other places such as Bitmap Index Scans, however, again the size that the hash table can become there is also limited to work_mem and it would take a relation of around 16TB (2^31) pages and a very large work_mem setting to hit this. With smaller work_mem values the table would become lossy and never grow large enough to hit the problem. Author: Yura Sokolov Reviewed-by: David Rowley, Ranier Vilela Discussion: https://postgr.es/m/b1f7f32737c3438136f64b26f4852b96@postgrespro.ru Backpatch-through: 10, where simplehash.h was added	2021-08-13 16:42:35 +12:00
Tom Lane	2b8f3f5a7c	Get rid of artificial restriction on hash table sizes on Windows. The point of introducing the hash_mem_multiplier GUC was to let users reproduce the old behavior of hash aggregation, i.e. that it could use more than work_mem at need. However, the implementation failed to get the job done on Win64, where work_mem is clamped to 2GB to protect various places that calculate memory sizes using "long int". As written, the same clamp was applied to hash_mem. This resulted in severe performance regressions for queries requiring a bit more than 2GB for hash aggregation, as they now spill to disk and there's no way to stop that. Getting rid of the work_mem restriction seems like a good idea, but it's a big job and could not conceivably be back-patched. However, there's only a fairly small number of places that are concerned with the hash_mem value, and it turns out to be possible to remove the restriction there without too much code churn or any ABI breaks. So, let's do that for now to fix the regression, and leave the larger task for another day. This patch does introduce a bit more infrastructure that should help with the larger task, namely pg_bitutils.h support for working with size_t values. Per gripe from Laurent Hasson. Back-patch to v13 where the behavior change came in. Discussion: https://postgr.es/m/997817.1627074924@sss.pgh.pa.us Discussion: https://postgr.es/m/MN2PR15MB25601E80A9B6D1BA6F592B1985E39@MN2PR15MB2560.namprd15.prod.outlook.com	2021-07-25 14:02:27 -04:00
Alvaro Herrera	c31516ae5b	Preserve firing-on state when cloning row triggers to partitions When triggers are cloned from partitioned tables to their partitions, the 'tgenabled' flag (origin/replica/always/disable) was not propagated. Make it so that the flag on the trigger on partition is initially set to the same value as on the partitioned table. Add a test case to verify the behavior. Backpatch to 11, where this appeared in commit 86f575948c77. Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Justin Pryzby <pryzby@telsasoft.com> Discussion: https://postgr.es/m/20200930223450.GA14848@telsasoft.com	2021-07-16 13:01:43 -04:00
Alvaro Herrera	866237a6fa	Advance old-segment horizon properly after slot invalidation When some slots are invalidated due to the max_slot_wal_keep_size limit, the old segment horizon should move forward to stay within the limit. However, in commit c6550776394e we forgot to call KeepLogSeg again to recompute the horizon after invalidating replication slots. In cases where other slots remained, the limits would be recomputed eventually for other reasons, but if all slots were invalidated, the limits would not move at all afterwards. Repair. Backpatch to 13 where the feature was introduced. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reported-by: Marcin Krupowicz <mk@071.ovh> Discussion: https://postgr.es/m/17103-004130e8f27782c9@postgresql.org	2021-07-16 12:07:30 -04:00
Tom Lane	55cccdfdf1	Update configure's probe for libldap to work with OpenLDAP 2.5. The separate libldap_r is gone and libldap itself is now always thread-safe. Unfortunately there seems no easy way to tell by inspection whether libldap is thread-safe, so we have to take it on faith that libldap is thread-safe if there's no libldap_r. That should be okay, as it appears that libldap_r was a standard part of the installation going back at least 20 years. Report and patch by Adrian Ho. Back-patch to all supported branches, since people might try to build any of them with a newer OpenLDAP. Discussion: https://postgr.es/m/17083-a19190d9591946a7@postgresql.org	2021-07-09 12:38:55 -04:00
Tom Lane	7fc97752d5	Don't try to print data type names in slot_store_error_callback(). The existing code tried to do syscache lookups in an already-failed transaction, which is problematic to say the least. After some consideration of alternatives, the best fix seems to be to just drop type names from the error message altogether. The table and column names seem like sufficient localization. If the user is unsure what types are involved, she can check the local and remote table definitions. Having done that, we can also discard the LogicalRepTypMap hash table, which had no other use. Arguably, LOGICAL_REP_MSG_TYPE replication messages are now obsolete as well; but we should probably keep them in case some other use emerges. (The complexity of removing something from the replication protocol would likely outweigh any savings anyhow.) Masahiko Sawada and Bharath Rupireddy, per complaint from Andres Freund. Back-patch to v10 where this code originated. Discussion: https://postgr.es/m/20210106020229.ne5xnuu6wlondjpe@alap3.anarazel.de	2021-07-02 16:04:54 -04:00
Amit Kapila	56e366f675	Fix ABI break introduced by commit 4daa140a2f. Move the newly defined enum value REORDER_BUFFER_CHANGE_INTERNAL_SPEC_ABORT at the end to avoid ABI break in the back branches. We need to back-patch this till v11 because before that it is already at the end. Reported-by: Tomas Vondra Backpatch-through: 11 Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com	2021-06-24 15:21:50 +05:30
Amit Kapila	602a32a685	Fix decoding of speculative aborts. During decoding for speculative inserts, we were relying for cleaning toast hash on confirmation records or next change records. But that could lead to multiple problems (a) memory leak if there is neither a confirmation record nor any other record after toast insertion for a speculative insert in the transaction, (b) error and assertion failures if the next operation is not an insert/update on the same table. The fix is to start queuing spec abort change and clean up toast hash and change record during its processing. Currently, we are queuing the spec aborts for both toast and main table even though we perform cleanup while processing the main table's spec abort record. Later, if we have a way to distinguish between the spec abort record of toast and the main table, we can avoid queuing the change for spec aborts of toast tables. Reported-by: Ashutosh Bapat Author: Dilip Kumar Reviewed-by: Amit Kapila Backpatch-through: 9.6, where it was introduced Discussion: https://postgr.es/m/CAExHW5sPKF-Oovx_qZe4p5oM6Dvof7_P+XgsNAViug15Fm99jA@mail.gmail.com	2021-06-15 08:41:16 +05:30

1 2 3 4 5 ...

9472 Commits