postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-03-03 02:37:00 +08:00

Author	SHA1	Message	Date
Tom Lane	36c07fc299	Fix make rules that generate multiple output files. For years, our makefiles have correctly observed that "there is no correct way to write a rule that generates two files". However, what we did is to provide empty rules that "generate" the secondary output files from the primary one, and that's not right either. Depending on the details of the creating process, the primary file might end up timestamped later than one or more secondary files, causing subsequent make runs to consider the secondary file(s) out of date. That's harmless in a plain build, since make will just re-execute the empty rule and nothing happens. But it's fatal in a VPATH build, since make will expect the secondary file to be rebuilt in the build directory. This would manifest as "file not found" failures during VPATH builds from tarballs, if we were ever unlucky enough to ship a tarball with apparently out-of-date secondary files. (It's not clear whether that has ever actually happened, but it definitely could.) To ensure that secondary output files have timestamps >= their primary's, change our makefile convention to be that we provide a "touch $@" action not an empty rule. Also, make sure that this rule actually gets invoked during a distprep run, else the hazard remains. It's been like this a long time, so back-patch to all supported branches. In HEAD, I skipped the changes in src/backend/catalog/Makefile, because those rules are due to get replaced soon in the bootstrap data format patch, and there seems no need to create a merge issue for that patch. If for some reason we fail to land that patch in v11, we'll need to back-fill the changes in that one makefile from v10. Discussion: https://postgr.es/m/18556.1521668179@sss.pgh.pa.us	2018-03-23 13:45:38 -04:00
Tom Lane	a030b997ae	Remove restriction on SQL block length in isolationtester scanner. specscanner.l had a fixed limit of 1024 bytes on the length of individual SQL stanzas in an isolation test script. People are starting to run into that, so fix it by making the buffer resizable. Once we allow this in HEAD, it seems inevitable that somebody will try to back-patch a test that exceeds the old limit, so back-patch this change as a preventive measure. Daniel Gustafsson Discussion: https://postgr.es/m/8D628BE4-6606-4FF6-A3FF-8B2B0E9B43D0@yesql.se	2018-02-28 16:57:37 -05:00
Tom Lane	795f2112ea	Fix misbehavior of CTE-used-in-a-subplan during EPQ rechecks. An updating query that reads a CTE within an InitPlan or SubPlan could get incorrect results if it updates rows that are concurrently being modified. This is caused by CteScanNext supposing that nothing inside its recursive ExecProcNode call could change which read pointer is selected in the CTE's shared tuplestore. While that's normally true because of scoping considerations, it can break down if an EPQ plan tree gets built during the call, because EvalPlanQualStart builds execution trees for all subplans whether they're going to be used during the recheck or not. And it seems like a pretty shaky assumption anyway, so let's just reselect our own read pointer here. Per bug #14870 from Andrei Gorita. This has been broken since CTEs were implemented, so back-patch to all supported branches. Discussion: https://postgr.es/m/20171024155358.1471.82377@wrigleys.postgresql.org	2018-02-19 16:00:18 -05:00
Tom Lane	c2a7044a51	Avoid unnecessary failure in SELECT concurrent with ALTER NO INHERIT. If a query against an inheritance tree runs concurrently with an ALTER TABLE that's disinheriting one of the tree members, it's possible to get a "could not find inherited attribute" error because after obtaining lock on the removed member, make_inh_translation_list sees that its columns have attinhcount=0 and decides they aren't the columns it's looking for. An ideal fix, perhaps, would avoid including such a just-removed member table in the query at all; but there seems no way to accomplish that without adding expensive catalog rechecks or creating a likelihood of deadlocks. Instead, let's just drop the check on attinhcount. In this way, a query that's included a just-disinherited child will still succeed, which is not a completely unreasonable behavior. This problem has existed for a long time, so back-patch to all supported branches. Also add an isolation test verifying related behaviors. Patch by me; the new isolation test is based on Kyotaro Horiguchi's work. Discussion: https://postgr.es/m/20170626.174612.23936762.horiguchi.kyotaro@lab.ntt.co.jp	2018-01-12 15:46:37 -05:00
Alvaro Herrera	b64d6c9341	Revert "Fix isolation test to be less timing-dependent" This reverts commit 2268e6afd596. It turned out that inconsistency in the report is still possible, so go back to the simpler formulation of the test and instead add an alternate expected output. Discussion: https://postgr.es/m/20180103193728.ysqpcp2xjnqpiep7@alvherre.pgsql	2018-01-03 18:16:43 -03:00
Alvaro Herrera	0f38a1d85a	Fix isolation test to be less timing-dependent I did this by adding another locking process, which makes the other two wait. This way the output should be stable enough. Per buildfarm and Andres Freund Discussion: https://postgr.es/m/20180103034445.t3utrtrnrevfsghm@alap3.anarazel.de	2018-01-03 12:06:56 -03:00
Alvaro Herrera	fb7b43903e	Fix deadlock hazard in CREATE INDEX CONCURRENTLY Multiple sessions doing CREATE INDEX CONCURRENTLY simultaneously are supposed to be able to work in parallel, as evidenced by fixes in commit c3d09b3bd23f specifically to support this case. In reality, one of the sessions would be aborted by a misterious "deadlock detected" error. Jeff Janes diagnosed that this is because of leftover snapshots used for system catalog scans -- this was broken by 8aa3e47510b9 keeping track of (registering) the catalog snapshot. To fix the deadlocks, it's enough to de-register that snapshot prior to waiting. Backpatch to 9.4, which introduced MVCC catalog scans. Include an isolationtester spec that 8 out of 10 times reproduces the deadlock with the unpatched code for me (Álvaro). Author: Jeff Janes Diagnosed-by: Jeff Janes Reported-by: Jeremy Finzel Discussion: https://postgr.es/m/CAMa1XUhHjCv8Qkx0WOr1Mpm_R4qxN26EibwCrj0Oor2YBUFUTg%40mail.gmail.com	2018-01-02 19:16:16 -03:00
Andres Freund	937494c0e1	Fix pruning of locked and updated tuples. Previously it was possible that a tuple was not pruned during vacuum, even though its update xmax (i.e. the updating xid in a multixact with both key share lockers and an updater) was below the cutoff horizon. As the freezing code assumed, rightly so, that that's not supposed to happen, xmax would be preserved (as a member of a new multixact or xmax directly). That causes two problems: For one the tuple is below the xmin horizon, which can cause problems if the clog is truncated or once there's an xid wraparound. The bigger problem is that that will break HOT chains, which in turn can lead two to breakages: First, failing index lookups, which in turn can e.g lead to constraints being violated. Second, future hot prunes / vacuums can end up making invisible tuples visible again. There's other harmful scenarios. Fix the problem by recognizing that tuples can be DEAD instead of RECENTLY_DEAD, even if the multixactid has alive members, if the update_xid is below the xmin horizon. That's safe because newer versions of the tuple will contain the locking xids. A followup commit will harden the code somewhat against future similar bugs and already corrupted data. Author: Andres Freund, with changes by Alvaro Herrera Reported-By: Daniel Wood Analyzed-By: Andres Freund, Alvaro Herrera, Robert Haas, Peter Geoghegan, Daniel Wood, Yi Wen Wong, Michael Paquier Reviewed-By: Alvaro Herrera, Robert Haas, Michael Paquier Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de Backpatch: 9.3-	2017-12-14 18:20:48 -08:00
Alvaro Herrera	08ba67d596	Revert bogus fixes of HOT-freezing bug It turns out we misdiagnosed what the real problem was. Revert the previous changes, because they may have worse consequences going forward. A better fix is forthcoming. The simplistic test case is kept, though disabled. Discussion: https://postgr.es/m/20171102112019.33wb7g5wp4zpjelu@alap3.anarazel.de	2017-11-02 15:51:05 +01:00
Alvaro Herrera	3a135bdb1b	Fix freezing of a dead HOT-updated tuple Vacuum calls page-level HOT prune to remove dead HOT tuples before doing liveness checks (HeapTupleSatisfiesVacuum) on the remaining tuples. But concurrent transaction commit/abort may turn DEAD some of the HOT tuples that survived the prune, before HeapTupleSatisfiesVacuum tests them. This happens to activate the code that decides to freeze the tuple ... which resuscitates it, duplicating data. (This is especially bad if there's any unique constraints, because those are now internally violated due to the duplicate entries, though you won't know until you try to REINDEX or dump/restore the table.) One possible fix would be to simply skip doing anything to the tuple, and hope that the next HOT prune would remove it. But there is a problem: if the tuple is older than freeze horizon, this would leave an unfrozen XID behind, and if no HOT prune happens to clean it up before the containing pg_clog segment is truncated away, it'd later cause an error when the XID is looked up. Fix the problem by having the tuple freezing routines cope with the situation: don't freeze the tuple (and keep it dead). In the cases that the XID is older than the freeze age, set the HEAP_XMAX_COMMITTED flag so that there is no need to look up the XID in pg_clog later on. An isolation test is included, authored by Michael Paquier, loosely based on Daniel Wood's original reproducer. It only tests one particular scenario, though, not all the possible ways for this problem to surface; it be good to have a more reliable way to test this more fully, but it'd require more work. In message https://postgr.es/m/20170911140103.5akxptyrwgpc25bw@alvherre.pgsql I outlined another test case (more closely matching Dan Wood's) that exposed a few more ways for the problem to occur. Backpatch all the way back to 9.3, where this problem was introduced by multixact juggling. In branches 9.3 and 9.4, this includes a backpatch of commit e5ff9fefcd50 (of 9.5 era), since the original is not correctable without matching the coding pattern in 9.5 up. Reported-by: Daniel Wood Diagnosed-by: Daniel Wood Reviewed-by: Yi Wen Wong, Michaël Paquier Discussion: https://postgr.es/m/E5711E62-8FDF-4DCA-A888-C200BF6B5742@amazon.com	2017-09-28 16:44:01 +02:00
Andrew Gierth	733488dc6b	Repair test for vacuum reltuples fix. Concurrent auto-analyze could be holding a snapshot, affecting the removal of deleted row versions. Remove the deletion to avoid this happening. Per buildfarm. In passing, make the test independent of assumptions of physical row order, just out of sheer paranoia.	2017-03-17 14:46:15 +00:00
Andrew Gierth	9b626f6c33	Avoid having vacuum set reltuples to 0 on non-empty relations in the presence of page pins, which leads to serious estimation errors in the planner. This particularly affects small heavily-accessed tables, especially where locking (e.g. from FK constraints) forces frequent vacuums for mxid cleanup. Fix by keeping separate track of pages whose live tuples were actually counted vs. pages that were only scanned for freezing purposes. Thus, reltuples can only be set to 0 if all pages of the relation were actually counted. Backpatch to all supported versions. Per bug #14057 from Nicolas Baccelli, analyzed by me. Discussion: https://postgr.es/m/20160331103739.8956.94469@wrigleys.postgresql.org	2017-03-16 22:31:49 +00:00
Heikki Linnakangas	90e8599219	Fix typos in comments. Backpatch to all supported versions, where applicable, to make backpatching of future fixes go more smoothly. Josh Soref Discussion: https://www.postgresql.org/message-id/CACZqfqCf+5qRztLPgmmosr-B0Ye4srWzzw_mo4c_8_B_mtjmJQ@mail.gmail.com	2017-02-06 11:34:15 +02:00
Tom Lane	c4016fcb1f	Don't throw serialization errors for self-conflicts in INSERT ON CONFLICT. A transaction that conflicts against itself, for example INSERT INTO t(pk) VALUES (1),(1) ON CONFLICT DO NOTHING; should behave the same regardless of isolation level. It certainly shouldn't throw a serialization error, as retrying will not help. We got this wrong due to the ON CONFLICT logic not considering the case, as reported by Jason Dusek. Core of this patch is by Peter Geoghegan (based on an earlier patch by Thomas Munro), though I didn't take his proposed code refactoring for fear that it might have unexpected side-effects. Test cases by Thomas Munro and myself. Report: <CAO3NbwOycQjt2Oqy2VW-eLTq2M5uGMyHnGm=RNga4mjqcYD7gQ@mail.gmail.com> Related-Discussion: <57EE93C8.8080504@postgrespro.ru>	2016-10-23 18:36:13 -04:00
Tom Lane	a88fe25f50	Be sure to rewind the tuplestore read pointer in non-leader CTEScan nodes. ExecInitCteScan supposed that it didn't have to do anything to the extra tuplestore read pointer it gets from tuplestore_alloc_read_pointer. However, it needs this read pointer to be positioned at the start of the tuplestore, while tuplestore_alloc_read_pointer is actually defined as cloning the current position of read pointer 0. In normal situations that accidentally works because we initialize the whole plan tree at once, before anything gets read. But it fails in an EvalPlanQual recheck, as illustrated in bug #14328 from Dima Pavlov. To fix, just forcibly rewind the pointer after tuplestore_alloc_read_pointer. The cost of doing so is negligible unless the tuplestore is already in TSS_READFILE state, which wouldn't happen in normal cases. We could consider altering tuplestore's API to make that case cheaper, but that would make for a more invasive back-patch and it doesn't seem worth it. This has been broken probably for as long as we've had CTEs, so back-patch to all supported branches. Discussion: <32468.1474548308@sss.pgh.pa.us>	2016-09-22 11:34:44 -04:00
Andres Freund	0d5afd3f21	Add alternative output for ON CONFLICT toast isolation test. On some buildfarm animals the isolationtest added in 07ef0351 failed, as the order in which processes are run after unlocking is not guaranteed. Add an alternative output for that. Discussion: <7969.1471484738@sss.pgh.pa.us> Backpatch: 9.6, like the test in the aforementioned commit	2016-08-18 17:34:16 -07:00
Andres Freund	e79aaebccd	Fix deletion of speculatively inserted TOAST on conflict INSERT .. ON CONFLICT runs a pre-check of the possible conflicting constraints before performing the actual speculative insertion. In case the inserted tuple included TOASTed columns the ON CONFLICT condition would be handled correctly in case the conflict was caught by the pre-check, but if two transactions entered the speculative insertion phase at the same time, one would have to re-try, and the code for aborting a speculative insertion did not handle deleting the speculatively inserted TOAST datums correctly. TOAST deletion would fail with "ERROR: attempted to delete invisible tuple" as we attempted to remove the TOAST tuples using simple_heap_delete which reasoned that the given tuples should not be visible to the command that wrote them. This commit updates the heap_abort_speculative() function which aborts the conflicting tuple to use itself, via toast_delete, for deleting associated TOAST datums. Like before, the inserted toast rows are not marked as being speculative. This commit also adds a isolationtester spec test, exercising the relevant code path. Unfortunately 9.5 cannot handle two waiting sessions, and thus cannot execute this test. Reported-By: Viren Negi, Oskari Saarenmaa Author: Oskari Saarenmaa, edited a bit by me Bug: #14150 Discussion: <20160519123338.12513.20271@wrigleys.postgresql.org> Backpatch: 9.5, where ON CONFLICT was introduced	2016-08-17 17:03:36 -07:00
Tom Lane	18555b1323	Establish conventions about global object names used in regression tests. To ensure that "make installcheck" can be used safely against an existing installation, we need to be careful about what global object names (database, role, and tablespace names) we use; otherwise we might accidentally clobber important objects. There's been a weak consensus that test databases should have names including "regression", and that test role names should start with "regress_", but we didn't have any particular rule about tablespace names; and neither of the other rules was followed with any consistency either. This commit moves us a long way towards having a hard-and-fast rule that regression test databases must have names including "regression", and that test role and tablespace names must start with "regress_". It's not completely there because I did not touch some test cases in rolenames.sql that test creation of special role names like "session_user". That will require some rethinking of exactly what we want to test, whereas the intent of this patch is just to hit all the cases in which the needed renamings are cosmetic. There is no enforcement mechanism in this patch either, but if we don't add one we can expect that the tests will soon be violating the convention again. Again, that's not such a cosmetic change and it will require discussion. (But I did use a quick-hack enforcement patch to find these cases.) Discussion: <16638.1468620817@sss.pgh.pa.us>	2016-07-17 18:42:43 -04:00
Alvaro Herrera	533e9c6b06	Avoid serializability errors when locking a tuple with a committed update When key-share locking a tuple that has been not-key-updated, and the update is a committed transaction, in some cases we raised serializability errors: ERROR: could not serialize access due to concurrent update Because the key-share doesn't conflict with the update, the error is unnecessary and inconsistent with the case that the update hasn't committed yet. This causes problems for some usage patterns, even if it can be claimed that it's sufficient to retry the aborted transaction: given a steady stream of updating transactions and a long locking transaction, the long transaction can be starved indefinitely despite multiple retries. To fix, we recognize that HeapTupleSatisfiesUpdate can return HeapTupleUpdated when an updating transaction has committed, and that we need to deal with that case exactly as if it were a non-committed update: verify whether the two operations conflict, and if not, carry on normally. If they do conflict, however, there is a difference: in the HeapTupleBeingUpdated case we can just sleep until the concurrent transaction is gone, while in the HeapTupleUpdated case this is not possible and we must raise an error instead. Per trouble report from Olivier Dony. In addition to a couple of test cases that verify the changed behavior, I added a test case to verify the behavior that remains unchanged, namely that errors are raised when a update that modifies the key is used. That must still generate serializability errors. One pre-existing test case changes behavior; per discussion, the new behavior is actually the desired one. Discussion: https://www.postgresql.org/message-id/560AA479.4080807@odoo.com https://www.postgresql.org/message-id/20151014164844.3019.25750@wrigleys.postgresql.org Backpatch to 9.3, where the problem appeared.	2016-07-15 14:17:20 -04:00
Tom Lane	ad520ec4ac	Use memmove() not memcpy() to slide some pointers down. The previous coding here was formally undefined, though it seems to accidentally work on most platforms in the buildfarm. Caught by some OpenBSD platforms in which libc contains an assertion check for overlapping areas passed to memcpy(). Thomas Munro	2016-04-27 18:19:28 -04:00
Peter Eisentraut	339025c68f	Replace printf format %i by %d see also ce8d7bb6440710058503d213b2aafcdf56a5b481	2016-04-08 12:42:58 -04:00
Kevin Grittner	fcff8a5751	Detect SSI conflicts before reporting constraint violations While prior to this patch the user-visible effect on the database of any set of successfully committed serializable transactions was always consistent with some one-at-a-time order of execution of those transactions, the presence of declarative constraints could allow errors to occur which were not possible in any such ordering, and developers had no good workarounds to prevent user-facing errors where they were not necessary or desired. This patch adds a check for serialization failure ahead of duplicate key checking so that if a developer explicitly (redundantly) checks for the pre-existing value they will get the desired serialization failure where the problem is caused by a concurrent serializable transaction; otherwise they will get a duplicate key error. While it would be better if the reads performed by the constraints could count as part of the work of the transaction for serialization failure checking, and we will hopefully get there some day, this patch allows a clean and reliable way for developers to work around the issue. In many cases existing code will already be doing the right thing for this to "just work". Author: Thomas Munro, with minor editing of docs by me Reviewed-by: Marko Tiikkaja, Kevin Grittner	2016-04-07 11:12:35 -05:00
Tom Lane	71404af2a2	Fix EvalPlanQual bug when query contains both locked and not-locked rels. In commit afb9249d06f47d7a, we (probably I) made ExecLockRows assign null test tuples to all relations of the query while setting up to do an EvalPlanQual recheck for a newly-updated locked row. This was sheerest brain fade: we should only set test tuples for relations that are lockable by the LockRows node, and in particular empty test tuples are only sensible for inheritance child relations that weren't the source of the current tuple from their inheritance tree. Setting a null test tuple for an unrelated table causes it to return NULLs when it should not, as exhibited in bug #14034 from Bronislav Houdek. To add insult to injury, doing it the wrong way required two loops where one would suffice; so the corrected code is even a bit shorter and faster. Add a regression test case based on his example, and back-patch to 9.5 where the bug was introduced.	2016-03-22 17:56:20 -04:00
Peter Eisentraut	a40814d7aa	Handle invalid libpq sockets in more places Also, make error messages consistent. From: Michael Paquier <michael.paquier@gmail.com>	2016-03-08 21:10:33 -05:00
Tom Lane	3d523564c5	Suppress scary-looking log messages from async-notify isolation test. I noticed that the async-notify test results in log messages like these: LOG: could not send data to client: Broken pipe FATAL: connection to client lost This is because it unceremoniously disconnects a client session that is about to have some NOTIFY messages delivered to it. Such log messages during a regression test might well cause people to go looking for a problem that doesn't really exist (it did cause me to waste some time that way). We can shut it up by adding an UNLISTEN command to session teardown. Patch HEAD only; this doesn't seem significant enough to back-patch.	2016-02-29 19:29:19 -05:00
Alvaro Herrera	54638f5708	Make new isolationtester test more stable The original coding of the test was relying too much on the ordering in which backends are awakened once an advisory lock which they wait for is released. Change the code so that each backend uses its own advisory lock instead, so that the output becomes stable. Also add a few seconds of sleep between lock releases, so that the test isn't broken in overloaded buildfarm animals, as suggested by Tom Lane. Per buildfarm members spoonbill and guaibasaurus. Discussion: https://www.postgresql.org/message-id/19294.1456551587%40sss.pgh.pa.us	2016-02-29 16:34:56 -03:00
Andrew Dunstan	87cc6b57a9	Respect TEMP_CONFIG when pg_regress_check and friends are called This reverts commit 9117985b6ba9beda4f280f596035649fc23b6233 in favor of a more general solution.	2016-02-27 12:28:21 -05:00
Alvaro Herrera	c9578135f7	Add isolationtester spec for old heapam.c bug In 0e5680f4737a, I fixed a bug in heapam that caused spurious deadlocks when multiple updates concurrently attempted to modify the old version of an updated tuple whose new version was key-share locked. I proposed an isolationtester spec file that reproduced the bug, but back then isolationtester wasn't mature enough to be able to run it. Now that 38f8bdcac498 is in the tree, we can have this spec file too. Discussion: https://www.postgresql.org/message-id/20141212205254.GC1768%40alvh.no-ip.org	2016-02-26 17:11:15 -03:00
Tom Lane	52f5d578d6	Create a function to reliably identify which sessions block which others. This patch introduces "pg_blocking_pids(int) returns int[]", which returns the PIDs of any sessions that are blocking the session with the given PID. Historically people have obtained such information using a self-join on the pg_locks view, but it's unreasonably tedious to do it that way with any modicum of correctness, and the addition of parallel queries has pretty much broken that approach altogether. (Given some more columns in the view than there are today, you could imagine handling parallel-query cases with a 4-way join; but ugh.) The new function has the following behaviors that are painful or impossible to get right via pg_locks: 1. Correctly understands which lock modes block which other ones. 2. In soft-block situations (two processes both waiting for conflicting lock modes), only the one that's in front in the wait queue is reported to block the other. 3. In parallel-query cases, reports all sessions blocking any member of the given PID's lock group, and reports a session by naming its leader process's PID, which will be the pg_backend_pid() value visible to clients. The motivation for doing this right now is mostly to fix the isolation tests. Commit 38f8bdcac4982215beb9f65a19debecaf22fd470 lobotomized isolationtester's is-it-waiting query by removing its ability to recognize nonconflicting lock modes, as a crude workaround for the inability to handle soft-block situations properly. But even without the lock mode tests, the old query was excessively slow, particularly in CLOBBER_CACHE_ALWAYS builds; some of our buildfarm animals fail the new deadlock-hard test because the deadlock timeout elapses before they can probe the waiting status of all eight sessions. Replacing the pg_locks self-join with use of pg_blocking_pids() is not only much more correct, but a lot faster: I measure it at about 9X faster in a typical dev build with Asserts, and 3X faster in CLOBBER_CACHE_ALWAYS builds. That should provide enough headroom for the slower CLOBBER_CACHE_ALWAYS animals to pass the test, without having to lengthen deadlock_timeout yet more and thus slow down the test for everyone else.	2016-02-22 14:31:43 -05:00
Tom Lane	e84e06d2b3	Increase deadlock_timeout some more in the deadlock-hard isolation test. The previous value of 5s is inadequate for the buildfarm's CLOBBER_CACHE_ALWAYS animals: they take long enough to do the is-it-waiting queries that the timeout expires, allowing the database state to change, before isolationtester is done looking. Perhaps 10s will be enough. (If it isn't, I'm inclined to reduce the number of sessions involved.)	2016-02-12 17:22:42 -05:00
Tom Lane	dca369320f	Revert "isolationtester: don't repeat the is-it-waiting query when retrying a step." This mostly reverts commit 9c9782f066e0ce5424b8706df2cce147cb78170f. I left in the parts that rearranged removal of completed waiting steps; but the idea of not rechecking a step's blocked-ness isn't working.	2016-02-12 17:12:23 -05:00
Tom Lane	3992188c2a	Revert "Still further tweaking of deadlock isolation tests." This reverts commit d03130d378b5fb071d231a7822784ad87268583a. That was dependent on an isolationtester.c change that now proves to be broken; we will need to find another solution.	2016-02-12 17:02:59 -05:00
Tom Lane	d03130d378	Still further tweaking of deadlock isolation tests. It turns out that there is a second race condition in the new deadlock-hard test: once the deadlock detector fires, it's uncertain whether step s7a8 or step s8a1 will report first, because killing s8's transaction unblocks s7. So far, s7 has only been seen to report first in CLOBBER_CACHE_ALWAYS builds, but it's pretty reproducible there, and in theory it should sometimes occur in normal builds too. If s7 were a bit slower than usual, that could also break the test, since the existing expected-file assumes that we'll see s7a8 report the first time we check it after s8a1 completes. To fix, add a post-lock delay to s7a8.	2016-02-12 14:19:57 -05:00
Tom Lane	9c9782f066	isolationtester: don't repeat the is-it-waiting query when retrying a step. If we're retrying a step, then we already decided it was blocked on a lock, and there's no need to recheck that. The original coding of commit 38f8bdcac4982215beb9f65a19debecaf22fd470 resulted in a large number of is-it-waiting queries when dealing with multiple concurrently-blocked sessions, which is fairly pointless and also results in test failures in CLOBBER_CACHE_ALWAYS builds, where the is-it-waiting query is quite slow. This definition also permits appending pg_sleep() calls to steps where it's needed to control the order of finish of concurrent steps. Before, that did not work nicely because we'd decide that a step performing a sleep was not blocked and hang up waiting for it to finish, rather than noticing the completion of the concurrent step we're supposed to notice first. In passing, revise handling of removal of completed waiting steps to make it a bit less messy.	2016-02-12 14:10:36 -05:00
Tom Lane	a361490806	Re-pgindent isolationtester.c. Need to do some more hacking on this, and got annoyed that it's not indent clean.	2016-02-12 13:36:13 -05:00
Peter Eisentraut	29b4b7bda6	Fix whitespace	2016-02-12 12:08:40 -05:00
Tom Lane	caefc11ef6	Further tweaking of deadlock isolation tests. The new deadlock-soft-2 test has a timing dependency too: it supposes that isolationtester will detect step s1b as waiting before the deadlock detector runs and grants it the lock. Adjust deadlock_timeout to ensure that that's true even in CLOBBER_CACHE_ALWAYS builds, where the wait detection query is quite slow. Per buildfarm member jaguarundi.	2016-02-11 23:21:33 -05:00
Tom Lane	b11d07b6a3	Make new deadlock isolation test more reproducible. The original formulation of 4c9864b9b4d87d02f07f40bb27976da737afdcab was extremely timing-sensitive, because it arranged for the deadlock detector to be running (and possibly unblocking the current query) at almost exactly the same time as isolationtester would be probing to see if the query is blocked. The committed expected-file assumed that the deadlock detection would finish first, but we see the opposite on both fast and slow buildfarm animals. Adjust the deadlock timeout settings to make it predictable that isolationtester will see the query as waiting before deadlock detection unblocks it. I used a 5s timeout for the same reasons mentioned in a7921f71a3c747141344d8604f6a6d7b4cddb2a9.	2016-02-11 11:59:11 -05:00
Tom Lane	d9dc2b4149	Code review for isolationtester changes. Fix a few oversights in 38f8bdcac4982215beb9f65a19debecaf22fd470: don't leak memory in run_permutation(), remember when we've issued a cancel rather than issuing another one every 10ms, fix some typos in comments.	2016-02-11 11:30:52 -05:00
Robert Haas	4c9864b9b4	Add some isolation tests for deadlock detection and resolution. Previously, we had no test coverage for the deadlock detector.	2016-02-11 08:38:09 -05:00
Robert Haas	38f8bdcac4	Modify the isolation tester so that multiple sessions can wait. This allows testing of deadlock scenarios. Scenarios that would previously have been considered invalid are now simply taken as a scenario in which more than one backend will wait.	2016-02-11 08:36:30 -05:00
Robert Haas	c9882c60f4	Specify permutations for isolation tests with "invalid" permutations. This is a necessary prerequisite for forthcoming changes to allow deadlock scenarios to be tested by the isolation tester. It is also a good idea on general principle, since these scenarios add no useful test coverage not provided by other scenarios, but do to take time to execute.	2016-02-11 08:33:24 -05:00
Bruce Momjian	ee94300446	Update copyright for 2016 Backpatch certain files through 9.1	2016-01-02 13:33:40 -05:00
Tom Lane	5884b92a84	Fix errors in commit a04bb65f70dafdf462e0478ad19e6de56df89bfc. Not a lot of commentary needed here really.	2015-09-30 23:37:26 -04:00
Robert Haas	8bd42fe5c7	Remove unused expected-output file.	2015-08-14 23:13:13 -04:00
Robert Haas	43b4a16817	Reject isolation test specifications with duplicate step names. alter-table-1.spec has such a case, so change one instance of step rx1 to rx3 instead.	2015-08-14 22:10:46 -04:00
Tom Lane	6a1e14c62b	Temporarily(?) remove BRIN isolation test. Commit 2834855cb added a not-very-carefully-thought-out isolation test to check a BRIN index bug fix. The test depended on the availability of the pageinspect contrib module, which meant it did not work in several common testing scenarios such as "make check-world". It's not clear whether we want a core test depending on a contrib module like that, but in any case, failing to deal with the possibility that the module isn't present in the installation-under-test is not acceptable. Remove that test pending some better solution.	2015-08-10 10:22:37 -04:00
Alvaro Herrera	2834855cb9	Fix BRIN to use SnapshotAny during summarization For correctness of summarization results, it is critical that the snapshot used during the summarization scan is able to see all tuples that are live to all transactions -- including tuples inserted or deleted by in-progress transactions. Otherwise, it would be possible for a transaction to insert a tuple, then idle for a long time while a concurrent transaction executes summarization of the range: this would result in the inserted value not being considered in the summary. Previously we were trying to use a MVCC snapshot in conjunction with adding a "placeholder" tuple in the index: the snapshot would see all committed tuples, and the placeholder tuple would catch insertions by any new inserters. The hole is that prior insertions by transactions that are still in progress by the time the MVCC snapshot was taken were ignored. Kevin Grittner reported this as a bogus error message during vacuum with default transaction isolation mode set to repeatable read (because the error report mentioned a function name not being invoked during), but the problem is larger than that. To fix, tweak IndexBuildHeapRangeScan to have a new mode that behaves the way we need using SnapshotAny visibility rules. This change simplifies the BRIN code a bit, mainly by removing large comments that were mistaken. Instead, rely on the SnapshotAny semantics to provide what it needs. (The business about a placeholder tuple needs to remain: that covers the case that a transaction inserts a a tuple in a page that summarization already scanned.) Discussion: https://www.postgresql.org/message-id/20150731175700.GX2441@postgresql.org In passing, remove a couple of unused declarations from brin.h and reword a comment to be proper English. This part submitted by Kevin Grittner. Backpatch to 9.5, where BRIN was introduced.	2015-08-05 16:20:50 -03:00
Tom Lane	342a1ffa21	Add some test coverage of EvalPlanQual with non-locked tables. A Salesforce colleague of mine griped that the regression tests don't exercise EvalPlanQualFetchRowMarks() and allied routines. Which is a fair complaint. Add test cases that go through the REFERENCE and COPY code paths. Unfortunately we don't have sufficient infrastructure right now to exercise the FDW code path in the isolation tests, but this is surely better than before.	2015-07-29 13:27:56 -04:00
Robert Haas	a04bb65f70	Add new function pg_notification_queue_usage. This tells you what fraction of NOTIFY's queue is currently filled. Brendan Jurd, reviewed by Merlin Moncure and Gurjeet Singh. A few further tweaks by me.	2015-07-17 09:12:03 -04:00

1 2 3

150 Commits