postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-10 18:27:42 +08:00

Author	SHA1	Message	Date
Alvaro Herrera	bfb12cd2b5	Fix bogus completion tag usage in walsender Since commit fd5942c18f97 (2012, 9.3-era), walsender has been sending completion tags for certain replication commands twice -- and they're not even consistent. Apparently neither libpq nor JDBC have a problem with it, but it's not kosher. Fix by remove the EndCommand() call in the common code path for them all, and inserting specific calls to EndReplicationCommand() specifically in those places where it's needed. EndReplicationCommand() is a new simple function to send the completion tag for replication commands. Do this instead of sending a generic SELECT completion tag for them all, which was also pretty bogus (if innocuous). While at it, change StartReplication() to use EndReplicationCommand() instead of pg_puttextmessage(). In commit 2f9661311b83, I failed to realize that replication commands are not close-enough kin of regular SQL commands, so the DROP_REPLICATION_SLOT tag I added is undeserved and a type pun. Take it out. Backpatch to 13, where the latter commit appeared. The duplicate tag has been sent since 9.3, but since nothing is broken, it doesn't seem worth fixing. Per complaints from Tom Lane. Discussion: https://postgr.es/m/1347966.1600195735@sss.pgh.pa.us	2020-09-16 21:16:25 -03:00
Tom Lane	3e3f8f2020	Fix bogus cache-invalidation logic in logical replication worker. The code recorded cache invalidation events by zeroing the "localreloid" field of affected cache entries. However, it's possible for an inval event to occur even while we have the entry open and locked. So an ill-timed inval could result in "cache lookup failed for relation 0" errors, if the worker's code tried to use the cleared field. We can fix that by creating a separate bool field to record whether the entry needs to be revalidated. (In the back branches, cram the bool into what had been padding space, to avoid an ABI break in the somewhat unlikely event that any extension is looking at this struct.) Also, rearrange the logic in logicalrep_rel_open so that it does the right thing in cases where table_open would fail. We should retry the lookup by name in that case, but we didn't. The real-world impact of this is probably small. In the first place, the error conditions are very low probability, and in the second place, the worker would just exit and get restarted. We only noticed because in a CLOBBER_CACHE_ALWAYS build, the failure can occur repeatedly, preventing the worker from making progress. Nonetheless, it's clearly a bug, and it impedes a useful type of testing; so back-patch to v10 where this code was introduced. Discussion: https://postgr.es/m/1032727.1600096803@sss.pgh.pa.us	2020-09-16 12:07:31 -04:00
Jeff Davis	93106d71a1	logtape.c: do not preallocate for tapes when sorting The preallocation logic is only useful for HashAgg, so disable it when sorting. Also, adjust an out-of-date comment. Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/CAH2-Wzn_o7tE2+hRVvwSFghRb75AJ5g-nqGzDUqLYMexjOAe=g@mail.gmail.com Backpatch-through: 13	2020-09-11 17:21:00 -07:00
Bruce Momjian	7574af1e1b	C comment: correct use of 64-"byte" cache line size Reported-by: Kelly Min Discussion: https://postgr.es/m/CAPSbxatOiQO90LYpSC3+svAU9-sHgDfEP4oFhcEUt_X=DqFA9g@mail.gmail.com Backpatch-through: 9.5	2020-09-04 13:27:52 -04:00
Bruce Momjian	787ccf5a5f	C comment: remove mention of use of t_hoff WAL structure member Reported-by: Antonin Houska Discussion: https://postgr.es/m/21643.1595353537@antos Backpatch-through: 9.5	2020-08-31 17:51:31 -04:00
Tom Lane	845cfe012f	Mark factorial operator, and postfix operators in general, as deprecated. Per discussion, we're planning to remove parser support for postfix operators in order to simplify the grammar. So it behooves us to put out a deprecation notice at least one release before that. There is only one built-in postfix operator, ! for factorial. Label it deprecated in the docs and in pg_description, and adjust some examples that formerly relied on it. (The sister prefix operator !! is also deprecated. We don't really have to remove that one, but since we're suggesting that people use factorial() instead, it seems better to remove both operators.) Also state in the CREATE OPERATOR ref page that postfix operators in general are going away. Although this changes the initial contents of pg_description, I did not force a catversion bump; it doesn't seem essential. In v13, also back-patch 4c5cf5431, so that there's someplace for the <link>s to point to. Mark Dilger and John Naylor, with some adjustments by me Discussion: https://postgr.es/m/BE2DF53D-251A-4E26-972F-930E523580E9@enterprisedb.com	2020-08-30 14:37:24 -04:00
Tom Lane	894f5dea76	Fix handling of CREATE TABLE LIKE with inheritance. If a CREATE TABLE command uses both LIKE and traditional inheritance, Vars in CHECK constraints and expression indexes that are absorbed from a LIKE parent table tended to get mis-numbered, resulting in wrong answers and/or bizarre error messages (though probably not any actual crashes, thanks to validation occurring in the executor). In v12 and up, the same could happen to Vars in GENERATED expressions, even in cases with no LIKE clause but multiple traditional-inheritance parents. The cause of the problem for LIKE is that parse_utilcmd.c supposed it could renumber such Vars correctly during transformCreateStmt(), which it cannot since we have not yet accounted for columns added via inheritance. Fix that by postponing processing of LIKE INCLUDING CONSTRAINTS, DEFAULTS, GENERATED, INDEXES till after we've performed DefineRelation(). The error with GENERATED and multiple inheritance is a simple oversight in MergeAttributes(); it knows it has to renumber Vars in inherited CHECK constraints, but forgot to apply the same processing to inherited GENERATED expressions (a/k/a defaults). Per bug #16272 from Tom Gottfried. The non-GENERATED variants of the issue are ancient, presumably dating right back to the addition of CREATE TABLE LIKE; hence back-patch to all supported branches. Discussion: https://postgr.es/m/16272-6e32da020e9a9381@postgresql.org	2020-08-21 15:00:48 -04:00
David Rowley	0d7437de73	Fix a few typos in JIT comments and README Reviewed-by: Abhijit Menon-Sen Reviewed-by: Andres Freund Discussion: https://postgr.es/m/CAApHDvobgmCs6CohqhKTUf7D8vffoZXQTCBTERo9gbOeZmvLTw%40mail.gmail.com Backpatch-through: 11, where JIT was added	2020-08-21 09:34:47 +12:00
Tom Lane	69ffc2f838	Suppress unnecessary RelabelType nodes in yet more cases. Commit a477bfc1d fixed eval_const_expressions() to ensure that it didn't generate unnecessary RelabelType nodes, but I failed to notice that some other places in the planner had the same issue. Really noplace in the planner should be using plain makeRelabelType(), for fear of generating expressions that should be equal() to semantically equivalent trees, but aren't. An example is that because canonicalize_ec_expression() failed to be careful about this, we could end up with an equivalence class containing both a plain Const, and a Const-with-RelabelType representing exactly the same value. So far as I can tell this led to no visible misbehavior, but we did waste a bunch of cycles generating and evaluating "Const = Const-with-RelabelType" to prove such entries are redundant. Hence, move the support function added by a477bfc1d to where it can be more generally useful, and use it in the places where planner code previously used makeRelabelType. Back-patch to v12, like the previous patch. While I have no concrete evidence of any real misbehavior here, it's certainly possible that I overlooked a case where equivalent expressions that aren't equal() could cause a user-visible problem. In any case carrying extra RelabelType nodes through planning to execution isn't very desirable. Discussion: https://postgr.es/m/1311836.1597781384@sss.pgh.pa.us	2020-08-19 14:07:49 -04:00
Noah Misch	592a589a04	Prevent concurrent SimpleLruTruncate() for any given SLRU. The SimpleLruTruncate() header comment states the new coding rule. To achieve this, add locktype "frozenid" and two LWLocks. This closes a rare opportunity for data loss, which manifested as "apparent wraparound" or "could not access status of transaction" errors. Data loss is more likely in pg_multixact, due to released branches' thin margin between multiStopLimit and multiWrapLimit. If a user's physical replication primary logged ": apparent wraparound" messages, the user should rebuild standbys of that primary regardless of symptoms. At less risk is a cluster having emitted "not accepting commands" errors or "must be vacuumed" warnings at some point. One can test a cluster for this data loss by running VACUUM FREEZE in every database. Back-patch to 9.5 (all supported versions). Discussion: https://postgr.es/m/20190218073103.GA1434723@rfd.leadboat.com	2020-08-15 10:15:56 -07:00
Tom Lane	b538e83f17	Be more careful about the shape of hashable subplan clauses. nodeSubplan.c expects that the testexpr for a hashable ANY SubPlan has the form of one or more OpExprs whose LHS is an expression of the outer query's, while the RHS is an expression over Params representing output columns of the subquery. However, the planner only went as far as verifying that the clauses were all binary OpExprs. This works 99.99% of the time, because the clauses have the right shape when emitted by the parser --- but it's possible for function inlining to break that, as reported by PegoraroF10. To fix, teach the planner to check that the LHS and RHS contain the right things, or more accurately don't contain the wrong things. Given that this has been broken for years without anyone noticing, it seems sufficient to just give up hashing when it happens, rather than go to the trouble of commuting the clauses back again (which wouldn't necessarily work anyway). While poking at that, I also noticed that nodeSubplan.c had a baked-in assumption that the number of hash clauses is identical to the number of subquery output columns. Again, that's fine as far as parser output goes, but it's not hard to break it via function inlining. There seems little reason for that assumption though --- AFAICS, the only thing it's buying us is not having to store the number of hash clauses explicitly. Adding code to the planner to reject such cases would take more code than getting nodeSubplan.c to cope, so I fixed it that way. This has been broken for as long as we've had hashable SubPlans, so back-patch to all supported branches. Discussion: https://postgr.es/m/1549209182255-0.post@n3.nabble.com	2020-08-14 22:14:03 -04:00
Tom Lane	1c6066fbdb	Fix postmaster's behavior during smart shutdown. Up to now, upon receipt of a SIGTERM ("smart shutdown" command), the postmaster has immediately killed all "optional" background processes, and subsequently refused to launch new ones while it's waiting for foreground client processes to exit. No doubt this seemed like an OK policy at some point; but it's a pretty bad one now, because it makes for a seriously degraded environment for the remaining clients: * Parallel queries are killed, and new ones fail to launch. (And our parallel-query infrastructure utterly fails to deal with the case in a reasonable way --- it just hangs waiting for workers that are not going to arrive. There is more work needed in that area IMO.) * Autovacuum ceases to function. We can tolerate that for awhile, but if bulk-update queries continue to run in the surviving client sessions, there's eventually going to be a mess. In the worst case the system could reach a forced shutdown to prevent XID wraparound. * The bgwriter and walwriter are also stopped immediately, likely resulting in performance degradation. Hence, let's rearrange things so that the only immediate change in behavior is refusing to let in new normal connections. Once the last normal connection is gone, shut everything down as though we'd received a "fast" shutdown. To implement this, remove the PM_WAIT_BACKUP and PM_WAIT_READONLY states, instead staying in PM_RUN or PM_HOT_STANDBY while normal connections remain. A subsidiary state variable tracks whether or not we're letting in new connections in those states. This also allows having just one copy of the logic for killing child processes in smart and fast shutdown modes. I moved that logic into PostmasterStateMachine() by inventing a new state PM_STOP_BACKENDS. Back-patch to 9.6 where parallel query was added. In principle this'd be a good idea in 9.5 as well, but the risk/reward ratio is not as good there, since lack of autovacuum is not a problem during typical uses of smart shutdown. Per report from Bharath Rupireddy. Patch by me, reviewed by Thomas Munro Discussion: https://postgr.es/m/CALj2ACXAZ5vKxT9P7P89D87i3MDO9bfS+_bjMHgnWJs8uwUOOw@mail.gmail.com	2020-08-14 13:26:57 -04:00
Noah Misch	41dae35532	Move connect.h from fe_utils to src/include/common. Any libpq client can use the header. Clients include backend components postgres_fdw, dblink, and logical replication apply worker. Back-patch to v10, because another fix needs this. In released branches, just copy the header and keep the original.	2020-08-10 09:22:58 -07:00
David Rowley	22c105595f	Use int64 instead of long in incremental sort code Windows 64bit has 4-byte long values which is not suitable for tracking disk space usage in the incremental sort code. Let's just make all these fields int64s. Author: James Coleman Discussion: https://postgr.es/m/CAApHDvpky%2BUhof8mryPf5i%3D6e6fib2dxHqBrhp0Qhu0NeBhLJw%40mail.gmail.com Backpatch-through: 13, where the incremental sort code was added	2020-08-02 14:23:57 +12:00
Peter Geoghegan	78530c8e7a	Add hash_mem_multiplier GUC. Add a GUC that acts as a multiplier on work_mem. It gets applied when sizing executor node hash tables that were previously size constrained using work_mem alone. The new GUC can be used to preferentially give hash-based nodes more memory than the generic work_mem limit. It is intended to enable admin tuning of the executor's memory usage. Overall system throughput and system responsiveness can be improved by giving hash-based executor nodes more memory (especially over sort-based alternatives, which are often much less sensitive to being memory constrained). The default value for hash_mem_multiplier is 1.0, which is also the minimum valid value. This means that hash-based nodes continue to apply work_mem in the traditional way by default. hash_mem_multiplier is generally useful. However, it is being added now due to concerns about hash aggregate performance stability for users that upgrade to Postgres 13 (which added disk-based hash aggregation in commit 1f39bce0). While the old hash aggregate behavior risked out-of-memory errors, it is nevertheless likely that many users actually benefited. Hash agg's previous indifference to work_mem during query execution was not just faster; it also accidentally made aggregation resilient to grouping estimate problems (at least in cases where this didn't create destabilizing memory pressure). hash_mem_multiplier can provide a certain kind of continuity with the behavior of Postgres 12 hash aggregates in cases where the planner incorrectly estimates that all groups (plus related allocations) will fit in work_mem/hash_mem. This seems necessary because hash-based aggregation is usually much slower when only a small fraction of all groups can fit. Even when it isn't possible to totally avoid hash aggregates that spill, giving hash aggregation more memory will reliably improve performance (the same cannot be said for external sort operations, which appear to be almost unaffected by memory availability provided it's at least possible to get a single merge pass). The PostgreSQL 13 release notes should advise users that increasing hash_mem_multiplier can help with performance regressions associated with hash aggregation. That can be taken care of by a later commit. Author: Peter Geoghegan Reviewed-By: Álvaro Herrera, Jeff Davis Discussion: https://postgr.es/m/20200625203629.7m6yvut7eqblgmfo@alap3.anarazel.de Discussion: https://postgr.es/m/CAH2-WzmD%2Bi1pG6rc1%2BCjc4V6EaFJ_qSuKCCHVnH%3DoruqD-zqow%40mail.gmail.com Backpatch: 13-, where disk-based hash aggregation was introduced.	2020-07-29 14:14:57 -07:00
Jeff Davis	3a232a3183	HashAgg: use better cardinality estimate for recursive spilling. Use HyperLogLog to estimate the group cardinality in a spilled partition. This estimate is used to choose the number of partitions if we recurse. The previous behavior was to use the number of tuples in a spilled partition as the estimate for the number of groups, which lead to overpartitioning. That could cause the number of batches to be much higher than expected (with each batch being very small), which made it harder to interpret EXPLAIN ANALYZE results. Reviewed-by: Peter Geoghegan Discussion: https://postgr.es/m/a856635f9284bc36f7a77d02f47bbb6aaf7b59b3.camel@j-davis.com Backpatch-through: 13	2020-07-28 23:17:23 -07:00
Peter Geoghegan	5a6cc6ffa9	Remove hashagg_avoid_disk_plan GUC. Note: This GUC was originally named enable_hashagg_disk when it appeared in commit 1f39bce0, which added disk-based hash aggregation. It was subsequently renamed in commit 92c58fd9. Author: Peter Geoghegan Reviewed-By: Jeff Davis, Álvaro Herrera Discussion: https://postgr.es/m/9d9d1e1252a52ea1bad84ea40dbebfd54e672a0f.camel%40j-davis.com Backpatch: 13-, where disk-based hash aggregation was introduced.	2020-07-27 17:53:17 -07:00
Jeff Davis	7f5f2249b2	Fix LookupTupleHashEntryHash() pipeline-stall issue. Refactor hash lookups in nodeAgg.c to improve performance. Author: Andres Freund and Jeff Davis Discussion: https://postgr.es/m/20200612213715.op4ye4q7gktqvpuo%40alap3.anarazel.de Backpatch-through: 13	2020-07-26 16:08:41 -07:00
Tom Lane	70eca6a9a6	Replace TS_execute's TS_EXEC_CALC_NOT flag with TS_EXEC_SKIP_NOT. It's fairly silly that ignoring NOT subexpressions is TS_execute's default behavior. It's wrong on its face and it encourages errors of omission. Moreover, the only two remaining callers that aren't specifying CALC_NOT are in ts_headline calculations, and it's very arguable that those are bugs: if you've specified "!foo" in your query, why would you want to get a headline that includes "foo"? Hence, rip that out and change the default behavior to be to calculate NOT accurately. As a concession to the slim chance that there is still somebody somewhere who needs the incorrect behavior, provide a new SKIP_NOT flag to explicitly request that. Back-patch into v13, mainly because it seems better to change this at the same time as the previous commit's rejiggering of TS_execute related APIs. Any outside callers affected by this change are probably also affected by that one. Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com	2020-07-24 15:43:56 -04:00
Tom Lane	92fe6895d6	Fix assorted bugs by changing TS_execute's callback API to ternary logic. Text search sometimes failed to find valid matches, for instance '!crew:A'::tsquery might fail to locate 'crew:1B'::tsvector during an index search. The root of the issue is that TS_execute's callback functions were not changed to use ternary (yes/no/maybe) reporting when we made the search logic itself do so. It's somewhat annoying to break that API, but on the other hand we now see that any code using plain boolean logic is almost certainly broken since the addition of phrase search. There seem to be very few outside callers of this code anyway, so we'll just break them intentionally to get them to adapt. This allows removal of tsginidx.c's private re-implementation of TS_execute, since that's now entirely duplicative. It's also no longer necessary to avoid use of CALC_NOT in tsgistidx.c, since the underlying callbacks can now do something reasonable. Back-patch into v13. We can't change this in stable branches, but it seems not quite too late to fix it in v13. Tom Lane and Pavel Borisov Discussion: https://postgr.es/m/CALT9ZEE-aLotzBg-pOp2GFTesGWVYzXA3=mZKzRDa_OKnLF7Mg@mail.gmail.com	2020-07-24 15:26:51 -04:00
Tom Lane	e5372b48b9	Correctly mark pg_subscription_rel.srsublsn as nullable. The code has always set this column to NULL when it's not valid, but the catalog header's description failed to reflect that, as did the SGML docs, as did some of the code. To prevent future coding errors of the same ilk, let's hide the field from C code as though it were variable-length (which, in a sense, it is). As with commit 72eab84a5, we can only fix this cleanly in HEAD and v13; the problem extends further back but we'll need some klugery in the released branches. Discussion: https://postgr.es/m/367660.1595202498@sss.pgh.pa.us	2020-07-20 14:55:56 -04:00
Fujii Masao	f5dff45962	Rename wal_keep_segments to wal_keep_size. max_slot_wal_keep_size that was added in v13 and wal_keep_segments are the GUC parameters to specify how much WAL files to retain for the standby servers. While max_slot_wal_keep_size accepts the number of bytes of WAL files, wal_keep_segments accepts the number of WAL files. This difference of setting units between those similar parameters could be confusing to users. To alleviate this situation, this commit renames wal_keep_segments to wal_keep_size, and make users specify the WAL size in it instead of the number of WAL files. There was also the idea to rename max_slot_wal_keep_size to max_slot_wal_keep_segments, in the discussion. But we have been moving away from measuring in segments, for example, checkpoint_segments was replaced by max_wal_size. So we concluded to rename wal_keep_segments to wal_keep_size. Back-patch to v13 where max_slot_wal_keep_size was added. Author: Fujii Masao Reviewed-by: Álvaro Herrera, Kyotaro Horiguchi, David Steele Discussion: https://postgr.es/m/574b4ea3-e0f9-b175-ead2-ebea7faea855@oss.nttdata.com	2020-07-20 13:33:45 +09:00
Tom Lane	914d2383ae	Correctly mark pg_subscription.subslotname as nullable. Due to the layout of this catalog, subslotname has to be explicitly marked BKI_FORCE_NULL, else initdb will default to the assumption that it's non-nullable. Since, in fact, CREATE/ALTER SUBSCRIPTION will store null values there, the existing marking is just wrong, and has been since this catalog was invented. We haven't noticed because not much in the system actually depends on attnotnull being truthful. However, JIT'ed tuple deconstruction does depend on that in some cases, allowing crashes or wrong answers in queries that inspect pg_subscription. Commit 9de77b545 quite accidentally exposed this on the buildfarm members that force JIT activation. Back-patch to v13. The problem goes further back, but we cannot force initdb in released branches, so some klugier solution will be needed there. Before working on that, push this simple fix to try to get the buildfarm back to green. Discussion: https://postgr.es/m/4118109.1595096139@sss.pgh.pa.us	2020-07-19 12:37:23 -04:00
Michael Paquier	9678c08184	Fix comments related to table AMs Incorrect function names were referenced. As this fixes some portions of tableam.h, that is mentioned in the docs as something to look at when implementing a table AM, backpatch down to 12 where this has been introduced. Author: Hironobu Suzuki Discussion: https://postgr.es/m/8fe6d672-28dd-3f1d-7aed-ac2f6d599d3f@interdb.jp Backpatch-through: 12	2020-07-14 13:17:31 +09:00
Jeff Davis	d8a7ce2450	HashAgg: before spilling tuples, set unneeded columns to NULL. This is a replacement for 4cad2534. Instead of projecting all tuples going into a HashAgg, only remove unnecessary attributes when actually spilling. This avoids the regression for the in-memory case. Discussion: https://postgr.es/m/a2fb7dfeb4f50aa0a123e42151ee3013933cb802.camel%40j-davis.com Backpatch-through: 13	2020-07-12 23:08:16 -07:00
Amit Kapila	b074813d48	Revert "Track statistics for spilling of changes from ReorderBuffer". The stats with this commit was available only for WALSenders, however, users might want to see for backends doing logical decoding via SQL API. Then, users might want to reset and access these stats across server restart which was not possible with the current patch. List of commits reverted: caa3c4242c Don't call elog() while holding spinlock. e641b2a995 Doc: Update the documentation for spilled transaction statistics. 5883f5fe27 Fix unportable printf format introduced in commit 9290ad198. 9290ad198b Track statistics for spilling of changes from ReorderBuffer. Additionaly, remove the release notes entry for this feature. Backpatch-through: 13, where it was introduced Discussion: https://postgr.es/m/CA+fd4k5_pPAYRTDrO2PbtTOe0eHQpBvuqmCr8ic39uTNmR49Eg@mail.gmail.com	2020-07-13 08:46:56 +05:30
Alvaro Herrera	c54b5891f4	Morph pg_replication_slots.min_safe_lsn to safe_wal_size The previous definition of the column was almost universally disliked, so provide this updated definition which is more useful for monitoring purposes: a large positive value is good, while zero or a negative value means danger. This should be operationally more convenient. Backpatch to 13, where the new column to pg_replication_slots (and the feature it represents) were added. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Author: Álvaro Herrera <alvherre@alvh.no-ip.org> Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Discussion: https://postgr.es/m/9ddfbf8c-2f67-904d-44ed-cf8bc5916228@oss.nttdata.com	2020-07-07 13:08:00 -04:00
Peter Eisentraut	94e454cddf	Rename enable_incrementalsort for clarity Author: James Coleman <jtc331@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/df652910-e985-9547-152c-9d4357dc3979%402ndquadrant.com	2020-07-05 11:42:29 +02:00
Tom Lane	e2bcd99be1	Add hints about protocol-version-related SSL connection failures. OpenSSL's native reports about problems related to protocol version restrictions are pretty opaque and inconsistent. When we get an SSL error that is plausibly due to this, emit a hint message that includes the range of SSL protocol versions we (think we) are allowing. This should at least get the user thinking in the right direction to resolve the problem, even if the hint isn't totally accurate, which it might not be for assorted reasons. Back-patch to v13 where we increased the default minimum protocol version, thereby increasing the risk of this class of failure. Patch by me, reviewed by Daniel Gustafsson Discussion: https://postgr.es/m/a9408304-4381-a5af-d259-e55d349ae4ce@2ndquadrant.com	2020-06-27 12:47:58 -04:00
Peter Geoghegan	8c2010f123	Fix misuse of table_index_fetch_tuple_check(). Commit 0d861bbb, which added deduplication to nbtree, had _bt_check_unique() pass a TID to table_index_fetch_tuple_check() that isn't safe to mutate. table_index_fetch_tuple_check()'s tid argument is modified when the TID in question is not the latest visible tuple in a hot chain, though this wasn't documented. To fix, go back to using a local copy of the TID in _bt_check_unique(), and update comments above table_index_fetch_tuple_check(). Backpatch: 13-, where B-Tree deduplication was introduced.	2020-06-25 10:55:26 -07:00
Alvaro Herrera	6f7a862bed	Adjust max_slot_wal_keep_size behavior per review In pg_replication_slot, change output from normal/reserved/lost to reserved/extended/unreserved/ lost, which better expresses the possible states particularly near the time where segments are no longer safe but checkpoint has not run yet. Under the new definition, reserved means the slot is consuming WAL that's still under the normal WAL size constraints; extended means it's consuming WAL that's being protected by wal_keep_segments or the slot itself, whose size is below max_slot_wal_keep_size; unreserved means the WAL is no longer safe, but checkpoint has not yet removed those files. Such as slot is in imminent danger, but can still continue for a little while and may catch up to the reserved WAL space. Also, there were some bugs in the calculations used to report the status; fixed those. Backpatch to 13. Reported-by: Fujii Masao <masao.fujii@oss.nttdata.com> Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Fujii Masao <masao.fujii@oss.nttdata.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200616.120236.1809496990963386593.horikyota.ntt@gmail.com	2020-06-24 14:23:39 -04:00
Alvaro Herrera	12e52ba5a7	Save slot's restart_lsn when invalidated due to size We put it aside as invalidated_at, which let us show "lost" in pg_replication slot. Prior to this change, the state value was reported as NULL. Backpatch to 13. Author: Kyotaro Horiguchi <horikyota.ntt@gmail.com> Reviewed-by: Álvaro Herrera <alvherre@alvh.no-ip.org> Discussion: https://postgr.es/m/20200617.101707.1735599255100002667.horikyota.ntt@gmail.com Discussion: https://postgr.es/m/20200407.120905.1507671100168805403.horikyota.ntt@gmail.com	2020-06-24 14:15:17 -04:00
Peter Geoghegan	dedb92d4a3	Fix deduplication "single value" strategy bug. It was possible for deduplication's single value strategy to mistakenly believe that a very small duplicate tuple counts as one of the six large tuples that it aims to leave behind after the page finally splits. This could cause slightly suboptimal space utilization with very low cardinality indexes, though only under fairly narrow conditions. To fix, be particular about what kind of tuple counts as a maxpostingsize-capped tuple. This avoids confusion in the event of a small tuple that gets "wedged" between two large tuples, where all tuples on the page are duplicates of the same value. Discussion: https://postgr.es/m/CAH2-Wz=Y+sgSFc-O3LpiZX-POx2bC+okec2KafERHuzdVa7-rQ@mail.gmail.com Backpatch: 13-, where deduplication was introduced (by commit 0d861bbb)	2020-06-19 08:57:23 -07:00
David Rowley	bdee4af8e0	Fix EXPLAIN ANALYZE for parallel HashAgg plans Since 1f39bce02, HashAgg nodes have had the ability to spill to disk when memory consumption exceeds work_mem. That commit added new properties to EXPLAIN ANALYZE to show the maximum memory usage and disk usage, however, it didn't quite go as far as showing that information for parallel workers. Since workers may have experienced something very different from the main process, we should show this information per worker, as is done in Sort. Reviewed-by: Justin Pryzby Reviewed-by: Jeff Davis Discussion: https://postgr.es/m/CAApHDvpEKbfZa18mM1TD7qV6PG+w97pwCWq5tVD0dX7e11gRJw@mail.gmail.com Backpatch-through: 13, where the hashagg spilling code was added.	2020-06-19 17:25:07 +12:00
Peter Geoghegan	6b29610292	Fix nbtree.h dedup state comment. Oversight in commit 0d861bbb.	2020-06-17 15:23:54 -07:00
Andres Freund	09bff91b31	Avoid potential spinlock in a signal handler as part of global barriers. On platforms without support for 64bit atomic operations where we also cannot rely on 64bit reads to have single copy atomicity, such atomics are implemented using a spinlock based fallback. That means it's not safe to even read such atomics from within a signal handler (since the signal handler might run when the spinlock already is held). To avoid this issue defer global barrier processing out of the signal handler. Instead of checking local / shared barrier generation to determine whether to set ProcSignalBarrierPending, introduce PROCSIGNAL_BARRIER and always set ProcSignalBarrierPending when receiving such a signal. Additionally avoid redundant work in ProcessProcSignalBarrier if ProcSignalBarrierPending is unnecessarily. Also do a small amount of other polishing. Author: Andres Freund Reviewed-By: Robert Haas Discussion: https://postgr.es/m/20200609193723.eu5ilsjxwdpyxhgz@alap3.anarazel.de Backpatch: 13-, where the code was introduced.	2020-06-17 12:47:13 -07:00
David Rowley	095f2d95c9	Add missing extern keyword for a couple of numutils functions In passing, also remove a few surplus empty lines from pg_ltoa and pg_ulltoa_n in numutils.c Reported-by: Andrew Gierth Discussion: https://postgr.es/m/87y2ou3xuh.fsf@news-spur.riddles.org.uk Backpatch-through: 13, where these changes were introduced	2020-06-13 11:28:12 +12:00
Jeff Davis	13e0fa7ae5	Rework HashAgg GUCs. Eliminate enable_groupingsets_hash_disk, which was primarily useful for testing grouping sets that use HashAgg and spill. Instead, hack the table stats to convince the planner to choose hashed aggregation for grouping sets that will spill to disk. Suggested by Melanie Plageman. Rename enable_hashagg_disk to hashagg_avoid_disk_plan, and invert the meaning of on/off. The new name indicates more strongly that it only affects the planner. Also, the word "avoid" is less definite, which should avoid surprises when HashAgg still needs to use the disk. Change suggested by Justin Pryzby, though I chose a different GUC name. Discussion: https://postgr.es/m/CAAKRu_aisiENMsPM2gC4oUY1hHG3yrCwY-fXUg22C6_MJUwQdA%40mail.gmail.com Discussion: https://postgr.es/m/20200610021544.GA14879@telsasoft.com Backpatch-through: 13	2020-06-11 13:05:37 -07:00
Michael Paquier	8d8b89266c	Move frontend-side archive APIs from src/common/ to src/fe_utils/ fe_archive.c was compiled only for the frontend in src/common/, but as it will never share anything with the backend, it makes most sense to move this file to src/fe_utils/. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/e9766d71-8655-ac86-bdf6-77e0e7169977@2ndquadrant.com Backpatch-through: 13	2020-06-11 15:48:56 +09:00
Andres Freund	de4a259896	Avoid need for valgrind suppressions for pg_atomic_init_u64 on some platforms. Previously we used pg_atomic_write_64_impl inside pg_atomic_init_u64. That works correctly, but on platforms without 64bit single copy atomicity it could trigger spurious valgrind errors about uninitialized memory, because we use compare_and_swap for atomic writes on such platforms. I previously suppressed one instance of this problem (6c878edc1df), but as Tom reports that wasn't enough. As the atomic variable cannot yet be concurrently accessible during initialization, it seems better to have pg_atomic_init_64_impl set the value directly. Change pg_atomic_init_u32_impl for symmetry. Reported-By: Tom Lane Author: Andres Freund Discussion: https://postgr.es/m/1714601.1591503815@sss.pgh.pa.us Backpatch: 9.5-	2020-06-08 20:02:49 -07:00
Michael Paquier	10ffe0fa72	Fix crash in WAL sender when starting physical replication Since database connections can be used with WAL senders in 9.4, it is possible to use physical replication. This commit fixes a crash when starting physical replication with a WAL sender using a database connection, caused by the refactoring done in 850196b. There have been discussions about forbidding the use of physical replication in a database connection, but this is left for later, taking care only of the crash new to 13. While on it, add a test to check for a failure when attempting logical replication if the WAL sender does not have a database connection. This part is extracted from a larger patch by Kyotaro Horiguchi. Reported-by: Vladimir Sitnikov Author: Michael Paquier, Kyotaro Horiguchi Reviewed-by: Kyotaro Horiguchi, Álvaro Herrera Discussion: https://postgr.es/m/CAB=Je-GOWMj1PTPkeUhjqQp-4W3=nW-pXe2Hjax6rJFffB5_Aw@mail.gmail.com Backpatch-through: 13	2020-06-08 10:12:31 +09:00
Tom Lane	b5d69b7c22	pgindent run prior to branching v13. pgperltidy and reformat-dat-files too, though those didn't find anything to change.	2020-06-07 16:57:08 -04:00
Peter Eisentraut	0fd2a79a63	Spelling adjustments	2020-06-07 15:06:51 +02:00
Tom Lane	0c882e52a8	Improve ineq_histogram_selectivity's behavior for non-default orderings. ineq_histogram_selectivity() can be invoked in situations where the ordering we care about is not that of the column's histogram. We could be considering some other collation, or even more drastically, the query operator might not agree at all with what was used to construct the histogram. (We'll get here for anything using scalarineqsel-based estimators, so that's quite likely to happen for extension operators.) Up to now we just ignored this issue and assumed we were dealing with an operator/collation whose sort order exactly matches the histogram, possibly resulting in junk estimates if the binary search gets confused. It's past time to improve that, since the use of nondefault collations is increasing. What we can do is verify that the given operator and collation match what's recorded in pg_statistic, and use the existing code only if so. When they don't match, instead execute the operator against each histogram entry, and take the fraction of successes as our selectivity estimate. This gives an estimate that is probably good to about 1/histogram_size, with no assumptions about ordering. (The quality of the estimate is likely to degrade near the ends of the value range, since the two orderings probably don't agree on what is an extremal value; but this is surely going to be more reliable than what we did before.) At some point we might further improve matters by storing more than one histogram calculated according to different orderings. But this code would still be good fallback logic when no matches exist, so that is not an argument for not doing this. While here, also improve get_variable_range() to deal more honestly with non-default collations. This isn't back-patchable, because it requires adding another argument to ineq_histogram_selectivity, and because it might have significant impact on the estimation results for extension operators relying on scalarineqsel --- mostly for the better, one hopes, but in any case destabilizing plan choices in back branches is best avoided. Per investigation of a report from James Lucas. Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com	2020-06-05 16:55:27 -04:00
Joe Conway	87fb04af1e	Add unlikely() to CHECK_FOR_INTERRUPTS() Add the unlikely() branch hint macro to CHECK_FOR_INTERRUPTS(). Backpatch to REL_10_STABLE where we first started using unlikely(). Discussion: https://www.postgresql.org/message-id/flat/8692553c-7fe8-17d9-cbc1-7cddb758f4c6%40joeconway.com	2020-06-05 16:49:25 -04:00
Tom Lane	044c99bc56	Use query collation, not column's collation, while examining statistics. Commit 5e0928005 changed the planner so that, instead of blindly using DEFAULT_COLLATION_OID when invoking operators for selectivity estimation, it would use the collation of the column whose statistics we're considering. This was recognized as still being not quite the right thing, but it seemed like a good incremental improvement. However, shortly thereafter we introduced nondeterministic collations, and that creates cases where operators can fail if they're passed the wrong collation. We don't want planning to fail in cases where the query itself would work, so this means that we must use the query's collation when invoking operators for estimation purposes. The only real problem this creates is in ineq_histogram_selectivity, where the binary search might produce a garbage answer if we perform comparisons using a different collation than the column's histogram is ordered with. However, when the query's collation is significantly different from the column's default collation, the estimate we previously generated would be pretty irrelevant anyway; so it's not clear that this will result in noticeably worse estimates in practice. (A follow-on patch will improve this situation in HEAD, but it seems too invasive for back-patch.) The patch requires changing the signatures of mcv_selectivity and allied functions, which are exported and very possibly are used by extensions. In HEAD, I just did that, but an API/ABI break of this sort isn't acceptable in stable branches. Therefore, in v12 the patch introduces "mcv_selectivity_ext" and so on, with signatures matching HEAD, and makes the old functions into wrappers that assume DEFAULT_COLLATION_OID should be used. That does not match the prior behavior, but it should avoid risk of failure in most cases. (In practice, I think most extension datatypes aren't collation-aware, so the change probably doesn't matter to them.) Per report from James Lucas. Back-patch to v12 where the problem was introduced. Discussion: https://postgr.es/m/CAAFmbbOvfi=wMM=3qRsPunBSLb8BFREno2oOzSBS=mzfLPKABw@mail.gmail.com	2020-06-05 16:18:50 -04:00
Tom Lane	a9632830bb	Reject "23:59:60.nnn" in datetime input. It's intentional that we don't allow values greater than 24 hours, while we do allow "24:00:00" as well as "23:59:60" as inputs. However, the range check was miscoded in such a way that it would accept "23:59:60.nnn" with a nonzero fraction. For time or timetz, the stored result would then be greater than "24:00:00" which would fail dump/reload, not to mention possibly confusing other operations. Fix by explicitly calculating the result and making sure it does not exceed 24 hours. (This calculation is redundant with what will happen later in tm2time or tm2timetz. Maybe someday somebody will find that annoying enough to justify refactoring to avoid the duplication; but that seems too invasive for a back-patched bug fix, and the cost is probably unmeasurable anyway.) Note that this change also rejects such input as the time portion of a timestamp(tz) value. Back-patch to v10. The bug is far older, but to change this pre-v10 we'd need to ensure that the logic behaves sanely with float timestamps, which is possibly nontrivial due to roundoff considerations. Doesn't really seem worth troubling with. Per report from Christoph Berg. Discussion: https://postgr.es/m/20200520125807.GB296739@msg.df7cb.de	2020-06-04 16:42:23 -04:00
Tom Lane	f88bd3139f	Don't call palloc() while holding a spinlock, either. Fix some more violations of the "only straight-line code inside a spinlock" rule. These are hazardous not only because they risk holding the lock for an excessively long time, but because it's possible for palloc to throw elog(ERROR), leaving a stuck spinlock behind. copy_replication_slot() had two separate places that did pallocs while holding a spinlock. We can make the code simpler and safer by copying the whole ReplicationSlot struct into a local variable while holding the spinlock, and then referencing that copy. (While that's arguably more cycles than we really need to spend holding the lock, the struct isn't all that big, and this way seems far more maintainable than copying fields piecemeal. Anyway this is surely much cheaper than a palloc.) That bug goes back to v12. InvalidateObsoleteReplicationSlots() not only did a palloc while holding a spinlock, but for extra sloppiness then leaked the memory --- probably for the lifetime of the checkpointer process, though I didn't try to verify that. Fortunately that silliness is new in HEAD. pg_get_replication_slots() had a cosmetic violation of the rule, in that it only assumed it's safe to call namecpy() while holding a spinlock. Still, that's a hazard waiting to bite somebody, and there were some other cosmetic coding-rule violations in the same function, so clean it up. I back-patched this as far as v10; the code exists before that but it looks different, and this didn't seem important enough to adapt the patch further back. Discussion: https://postgr.es/m/20200602.161518.1399689010416646074.horikyota.ntt@gmail.com	2020-06-03 12:36:23 -04:00
Michael Paquier	f93bb0ce64	Fix some comments in xlogreader.h segment_open and segment_close were mentioned with incorrect names. Discussion: https://postgr.es/m/20200525234944.GA1573@paquier.xyz	2020-05-28 16:40:07 +09:00
Tom Lane	3048898e73	Mop-up for wait event naming issues. Synchronize the event names for parallel hash join waits with other event names, by getting rid of the slashes and dropping "-ing" suffixes. Rename ClogGroupUpdate to XactGroupUpdate, to match the new SLRU name. Move the ProcSignalBarrier event to the IPC category; it doesn't belong under IO. Also a bit more wordsmithing in the wait event documentation tables. Discussion: https://postgr.es/m/4505.1589640417@sss.pgh.pa.us	2020-05-16 21:00:11 -04:00

1 2 3 4 5 ...

9373 Commits