It's intentional that we don't allow values greater than 24 hours,
while we do allow "24:00:00" as well as "23:59:60" as inputs.
However, the range check was miscoded in such a way that it would
accept "23:59:60.nnn" with a nonzero fraction. For time or timetz,
the stored result would then be greater than "24:00:00" which would
fail dump/reload, not to mention possibly confusing other operations.
Fix by explicitly calculating the result and making sure it does not
exceed 24 hours. (This calculation is redundant with what will happen
later in tm2time or tm2timetz. Maybe someday somebody will find that
annoying enough to justify refactoring to avoid the duplication; but
that seems too invasive for a back-patched bug fix, and the cost is
probably unmeasurable anyway.)
Note that this change also rejects such input as the time portion
of a timestamp(tz) value.
Back-patch to v10. The bug is far older, but to change this pre-v10
we'd need to ensure that the logic behaves sanely with float timestamps,
which is possibly nontrivial due to roundoff considerations.
Doesn't really seem worth troubling with.
Per report from Christoph Berg.
Discussion: https://postgr.es/m/20200520125807.GB296739@msg.df7cb.de
A table rewritten by ALTER TABLE would lose tracking of an index usable
for CLUSTER. This setting is tracked by pg_index.indisclustered and is
controlled by ALTER TABLE, so some extra work was needed to restore it
properly. Note that ALTER TABLE only marks the index that can be used
for clustering, and does not do the actual operation.
Author: Amit Langote, Justin Pryzby
Reviewed-by: Ibrar Ahmed, Michael Paquier
Discussion: https://postgr.es/m/20200202161718.GI13621@telsasoft.com
Backpatch-through: 9.5
This reverts commit cb2fd7eac285b1b0a24eeb2b8ed4456b66c5a09f. Per
numerous buildfarm members, it was incompatible with parallel query, and
a test case assumed LP64. Back-patch to 9.5 (all supported versions).
Discussion: https://postgr.es/m/20200321224920.GB1763544@rfd.leadboat.com
Until now, only selected bulk operations (e.g. COPY) did this. If a
given relfilenode received both a WAL-skipping COPY and a WAL-logged
operation (e.g. INSERT), recovery could lose tuples from the COPY. See
src/backend/access/transam/README section "Skipping WAL for New
RelFileNode" for the new coding rules. Maintainers of table access
methods should examine that section.
To maintain data durability, just before commit, we choose between an
fsync of the relfilenode and copying its contents to WAL. A new GUC,
wal_skip_threshold, guides that choice. If this change slows a workload
that creates small, permanent relfilenodes under wal_level=minimal, try
adjusting wal_skip_threshold. Users setting a timeout on COMMIT may
need to adjust that timeout, and log_min_duration_statement analysis
will reflect time consumption moving to COMMIT from commands like COPY.
Internally, this requires a reliable determination of whether
RollbackAndReleaseCurrentSubTransaction() would unlink a relation's
current relfilenode. Introduce rd_firstRelfilenodeSubid. Amend the
specification of rd_createSubid such that the field is zero when a new
rel has an old rd_node. Make relcache.c retain entries for certain
dropped relations until end of transaction.
Back-patch to 9.5 (all supported versions). This introduces a new WAL
record type, XLOG_GIST_ASSIGN_LSN, without bumping XLOG_PAGE_MAGIC. As
always, update standby systems before master systems. This changes
sizeof(RelationData) and sizeof(IndexStmt), breaking binary
compatibility for affected extensions. (The most recent commit to
affect the same class of extensions was
089e4d405d0f3b94c74a2c6a54357a84a681754b.)
Kyotaro Horiguchi, reviewed (in earlier, similar versions) by Robert
Haas. Heikki Linnakangas and Michael Paquier implemented earlier
designs that materially clarified the problem. Reviewed, in earlier
designs, by Andrew Dunstan, Andres Freund, Alvaro Herrera, Tom Lane,
Fujii Masao, and Simon Riggs. Reported by Martijn van Oosterhout.
Discussion: https://postgr.es/m/20150702220524.GA9392@svana.org
Remove an obsolete comment from AtEOXact_cleanup(). Restore formatting
of a comment in struct RelationData, mangled by the pgindent run in
commit 9af4159fce6654aa0e081b00d02bca40b978745c. Back-patch to 9.5 (all
supported versions), because another fix stacks on this.
Commit fc7695891 changed CheckAttributeType to recurse into ranges,
but made it pass down the wrong collation (always InvalidOid, since
ranges as such have no collation). This would result in guaranteed
failure when considering a range type whose subtype is collatable.
Embarrassingly, we lack any regression tests that would expose such
a problem (but fortunately, somebody noticed before we shipped this
bug in any release).
Fix it to pass down the range's subtype collation property instead,
and add some regression test cases to exercise collatable-subtype
ranges a bit more. Back-patch to all supported branches, as the
previous patch was.
Report and patch by Julien Rouhaud, test cases tweaked by me
Discussion: https://postgr.es/m/CAOBaU_aBWqNweiGUFX0guzBKkcfJ8mnnyyGC_KBQmO12Mj5f_A@mail.gmail.com
We implement ON COMMIT DELETE ROWS by truncating tables marked that
way, which requires also truncating/rebuilding their indexes. But
RelationTruncateIndexes asks the relcache for up-to-date copies of any
index expressions, which may cause execution of eval_const_expressions
on them, which can result in actual execution of subexpressions.
This is a bad thing to have happening during ON COMMIT. Manuel Rigger
reported that use of a SQL function resulted in crashes due to
expectations that ActiveSnapshot would be set, which it isn't.
The most obvious fix perhaps would be to push a snapshot during
PreCommit_on_commit_actions, but I think that would just open the door
to more problems: CommitTransaction explicitly expects that no
user-defined code can be running at this point.
Fortunately, since we know that no tuples exist to be indexed, there
seems no need to use the real index expressions or predicates during
RelationTruncateIndexes. We can set up dummy index expressions
instead (we do need something that will expose the right data type,
as there are places that build index tupdescs based on this), and
just ignore predicates and exclusion constraints.
In a green field it'd likely be better to reimplement ON COMMIT DELETE
ROWS using the same "init fork" infrastructure used for unlogged
relations. That seems impractical without catalog changes though,
and even without that it'd be too big a change to back-patch.
So for now do it like this.
Per private report from Manuel Rigger. This has been broken forever,
so back-patch to all supported branches.
If the bitstring length is not a multiple of 8, we'd shift the
rightmost bits into the pad space, which must be zeroes --- bit_cmp,
for one, depends on that. This'd lead to the result failing to
compare equal to what it should compare equal to, as reported in
bug #16013 from Daryl Waycott.
This is, if memory serves, not the first such bug in the bitstring
functions. In hopes of making it the last one, do a bit more work
than minimally necessary to fix the bug:
* Add assertion checks to bit_out() and varbit_out() to complain if
they are given incorrectly-padded input. This will improve the
odds that manual testing of any new patch finds problems.
* Encapsulate the padding-related logic in macros to make it
easier to use.
Also, remove unnecessary padding logic from bit_or() and bitxor().
Somebody had already noted that we need not re-pad the result of
bit_and() since the inputs are required to be the same length,
but failed to extrapolate that to the other two.
Also, move a comment block that once was near the head of varbit.c
(but people kept putting other stuff in front of it), to put it in
the header block.
Note for the release notes: if anyone has inconsistent data as a
result of saving the output of bitshiftright() in a table, it's
possible to fix it with something like
UPDATE mytab SET bitcol = ~(~bitcol) WHERE bitcol != ~(~bitcol);
This has been broken since day one, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/16013-c2765b6996aacae9@postgresql.org
RelationAllowsEarlyPruning() performed a catalog scan, but is used
in two contexts where that was a bad idea:
1. In heap_page_prune_opt(), which runs very frequently in some large
scans. This caused major performance problems in a field report
that was easy to reproduce.
2. In TestForOldSnapshot(), which runs while we hold a buffer content
lock. It's not clear if this was guaranteed to be free of buffer
deadlock risk.
The check was introduced in commit 2cc41acd8 and defended against a
real problem: 9.6's hash indexes have no page LSN and so we can't
allow early pruning (ie the snapshot-too-old feature). We can remove
the check from all later releases though: hash indexes are now logged,
and there is no way to create UNLOGGED indexes on regular logged
tables.
If a future release allows such a combination, it might need to put
a similar check in place, but it'll need some more thought.
Back-patch to 10.
Author: Thomas Munro
Reviewed-by: Tom Lane, who spotted the second problem
Discussion: https://postgr.es/m/CA%2BhUKGKT8oTkp5jw_U4p0S-7UG9zsvtw_M47Y285bER6a2gD%2Bg%40mail.gmail.com
Discussion: https://postgr.es/m/CAA4eK1%2BWy%2BN4eE5zPm765h68LrkWc3Biu_8rzzi%2BOYX4j%2BiHRw%40mail.gmail.com
UBSan complains about this. Instead, cast to a suitable type requiring
only 4-byte alignment. DatumGetAnyArrayP() already assumes one can cast
between AnyArrayType and ArrayType, so this doesn't introduce a new
assumption. Back-patch to 9.5, where AnyArrayType was introduced.
Reviewed by Tom Lane.
Discussion: https://postgr.es/m/20190629210334.GA1244217@rfd.leadboat.com
The original coding of generate_partition_qual() just copied the list
of predicate expressions into the global CacheMemoryContext, making it
effectively impossible to clean up when the owning relcache entry is
destroyed --- the relevant code in RelationDestroyRelation() only managed
to free the topmost List header :-(. This resulted in a session-lifespan
memory leak whenever a table partition's relcache entry is rebuilt.
Fortunately, that's not normally a large data structure, and rebuilds
shouldn't occur all that often in production situations; but this is
still a bug worth fixing back to v10 where the code was introduced.
To fix, put the cached expression tree into its own small memory context,
as we do with other complicated substructures of relcache entries.
Also, deal more honestly with the case that a partition has an empty
partcheck list; while that probably isn't a case that's very interesting
for production use, it's legal.
In passing, clarify comments about how partitioning-related relcache
data structures are managed, and add some Asserts that we're not leaking
old copies when we overwrite these data fields.
Amit Langote and Tom Lane
Discussion: https://postgr.es/m/7961.1552498252@sss.pgh.pa.us
This patch fixes several related cases in which pg_shdepend entries were
never made, or were lost, for references to roles appearing in the ACLs of
schemas and/or types. While that did no immediate harm, if a referenced
role were later dropped, the drop would be allowed and would leave a
dangling reference in the object's ACL. That still wasn't a big problem
for normal database usage, but it would cause obscure failures in
subsequent dump/reload or pg_upgrade attempts, taking the form of
attempts to grant privileges to all-numeric role names. (I think I've
seen field reports matching that symptom, but can't find any right now.)
Several cases are fixed here:
1. ALTER DOMAIN SET/DROP DEFAULT would lose the dependencies for any
existing ACL entries for the domain. This case is ancient, dating
back as far as we've had pg_shdepend tracking at all.
2. If a default type privilege applies, CREATE TYPE recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for types in 9.2.
3. If a default schema privilege applies, CREATE SCHEMA recorded the
ACL properly but forgot to install dependency entries for it.
This dates to the addition of default privileges for schemas in v10
(commit ab89e465c).
Another somewhat-related problem is that when creating a relation
rowtype or implicit array type, TypeCreate would apply any available
default type privileges to that type, which we don't really want
since such an object isn't supposed to have privileges of its own.
(You can't, for example, drop such privileges once they've been added
to an array type.)
ab89e465c is also to blame for a race condition in the regression tests:
privileges.sql transiently installed globally-applicable default
privileges on schemas, which sometimes got absorbed into the ACLs of
schemas created by concurrent test scripts. This should have resulted
in failures when privileges.sql tried to drop the role holding such
privileges; but thanks to the bug fixed here, it instead led to dangling
ACLs in the final state of the regression database. We'd managed not to
notice that, but it became obvious in the wake of commit da906766c, which
allowed the race condition to occur in pg_upgrade tests.
To fix, add a function recordDependencyOnNewAcl to encapsulate what
callers of get_user_default_acl need to do; while the original call
sites got that right via ad-hoc code, none of the later-added ones
have. Also change GenerateTypeDependencies to generate these
dependencies, which requires adding the typacl to its parameter list.
(That might be annoying if there are any extensions calling that
function directly; but if there are, they're most likely buggy in the
same way as the core callers were, so they need work anyway.) While
I was at it, I changed GenerateTypeDependencies to accept most of its
parameters in the form of a Form_pg_type pointer, making its parameter
list a bit less unwieldy and mistake-prone.
The test race condition is fixed just by wrapping the addition and
removal of default privileges into a single transaction, so that that
state is never visible externally. We might eventually prefer to
separate out tests of default privileges into a script that runs by
itself, but that would be a bigger change and would make the tests
run slower overall.
Back-patch relevant parts to all supported branches.
Discussion: https://postgr.es/m/15719.1541725287@sss.pgh.pa.us
Historically we forbade datatype-specific comparison functions from
returning INT_MIN, so that it would be safe to invert the sort order
just by negating the comparison result. However, this was never
really safe for comparison functions that directly return the result
of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction
on those library functions. Buildfarm results show that at least on
recent Linux on s390x, memcmp() actually does return INT_MIN sometimes,
causing sort failures.
The agreed-on answer is to remove this restriction and fix relevant
call sites to not make such an assumption; code such as "res = -res"
should be replaced by "INVERT_COMPARE_RESULT(res)". The same is needed
in a few places that just directly negated the result of memcmp or
strcmp.
To help find places having this problem, I've also added a compile option
to nbtcompare.c that causes some of the commonly used comparators to
return INT_MIN/INT_MAX instead of their usual -1/+1. It'd likely be
a good idea to have at least one buildfarm member running with
"-DSTRESS_SORT_INT_MIN". That's far from a complete test of course,
but it should help to prevent fresh introductions of such bugs.
This is a longstanding portability hazard, so back-patch to all supported
branches.
Discussion: https://postgr.es/m/20180928185215.ffoq2xrq5d3pafna@alap3.anarazel.de
It failed if passed a nonexistent relation OID, or one that was a non-heap
relation, because of blindly applying heap_open to a user-supplied OID.
This is not OK behavior for a SQL-exposed function; we have a project
policy that we should return NULL in such cases. Moreover, since
pg_get_partition_constraintdef ought now to work on indexes, restricting
it to heaps is flat wrong anyway.
The underlying function generate_partition_qual() wasn't on board with
indexes having partition quals either, nor for that matter with rels
having relispartition set but yet null relpartbound. (One wonders
whether the person who wrote the function comment blocks claiming that
these functions allow a missing relpartbound had ever tested it.)
Fix by testing relispartition before opening the rel, and by using
relation_open not heap_open. (If any other relkinds ever grow the
ability to have relispartition set, the code will work with them
automatically.) Also, don't reject null relpartbound in
generate_partition_qual.
Back-patch to v11, and all but the null-relpartbound change to v10.
(It's not really necessary to change generate_partition_qual at all
in v10, but I thought s/heap_open/relation_open/ would be a good
idea anyway just to keep the code in sync with later branches.)
Per report from Justin Pryzby.
Discussion: https://postgr.es/m/20180927200020.GJ776@telsasoft.com
Commits 742869946 et al turn out to be a couple bricks shy of a load.
We were dumping the stored values of GUC_LIST_QUOTE variables as they
appear in proconfig or setconfig catalog columns. However, although that
quoting rule looks a lot like SQL-identifier double quotes, there are two
critical differences: empty strings ("") are legal, and depending on which
variable you're considering, values longer than NAMEDATALEN might be valid
too. So the current technique fails altogether on empty-string list
entries (as reported by Steven Winfield in bug #15248) and it also risks
truncating file pathnames during dump/reload of GUC values that are lists
of pathnames.
To fix, split the stored value without any downcasing or truncation,
and then emit each element as a SQL string literal.
This is a tad annoying, because we now have three copies of the
comma-separated-string splitting logic in varlena.c as well as a fourth
one in dumputils.c. (Not to mention the randomly-different-from-those
splitting logic in libpq...) I looked at unifying these, but it would
be rather a mess unless we're willing to tweak the API definitions of
SplitIdentifierString, SplitDirectoriesString, or both. That might be
worth doing in future; but it seems pretty unsafe for a back-patched
bug fix, so for now accept the duplication.
Back-patch to all supported branches, as the previous fix was.
Discussion: https://postgr.es/m/7585.1529435872@sss.pgh.pa.us
Also, fix the pg_settings view to display source filename and line
number when invoked by a pg_read_all_settings member. This addition by
me (Álvaro).
Also, fix wording of the comment in GetConfigOption regarding the
restriction it implements, renaming the parameter for extra clarity.
Noted by Michaël.
These were all oversight in commit 25fff40798fc; backpatch to pg10,
where that commit first appeared.
Author: Laurenz Albe
Reviewed-by: Michaël Paquier, Álvaro Herrera
Discussion: https://postgr.es/m/1519917758.6586.8.camel@cybertec.at
Code that prints out the contents of setconfig or proconfig arrays in
SQL format needs to handle GUC_LIST_QUOTE variables differently from
other ones, because for those variables, flatten_set_variable_args()
already applied a layer of quoting. The value can therefore safely
be printed as-is, and indeed must be, or flatten_set_variable_args()
will muck it up completely on reload. For all other GUC variables,
it's necessary and sufficient to quote the value as a SQL literal.
We'd recognized the need for this long ago, but mis-analyzed the
need slightly, thinking that all GUC_LIST_INPUT variables needed
the special treatment. That's actually wrong, since a valid value
of a LIST variable might include characters that need quoting,
although no existing variables accept such values.
More to the point, we hadn't made any particular effort to keep the
various places that deal with this up-to-date with the set of variables
that actually need special treatment, meaning that we'd do the wrong
thing with, for example, temp_tablespaces values. This affects dumping
of SET clauses attached to functions, as well as ALTER DATABASE/ROLE SET
commands.
In ruleutils.c we can fix it reasonably honestly by exporting a guc.c
function that allows discovering the flags for a given GUC variable.
But pg_dump doesn't have easy access to that, so continue the old method
of having a hard-wired list of affected variable names. At least we can
fix it to have just one list not two, and update the list to match
current reality.
A remaining problem with this is that it only works for built-in
GUC variables. pg_dump's list obvious knows nothing of third-party
extensions, and even the "ask guc.c" method isn't bulletproof since
the relevant extension might not be loaded. There's no obvious
solution to that, so for now, we'll just have to discourage extension
authors from inventing custom GUCs that need GUC_LIST_QUOTE.
This has been busted for a long time, so back-patch to all supported
branches.
Michael Paquier and Tom Lane, reviewed by Kyotaro Horiguchi and
Pavel Stehule
Discussion: https://postgr.es/m/20180111064900.GA51030@paquier.xyz
refresh_by_match_merge() has some issues in the way it builds a SQL
query to construct the "diff" table:
1. It doesn't require the selected unique index(es) to be indimmediate.
2. It doesn't pay attention to the particular equality semantics enforced
by a given index, but just assumes that they must be those of the column
datatype's default btree opclass.
3. It doesn't check that the indexes are btrees.
4. It's insufficiently careful to ensure that the parser will pick the
intended operator when parsing the query. (This would have been a
security bug before CVE-2018-1058.)
5. It's not careful about indexes on system columns.
The way to fix#4 is to make use of the existing code in ri_triggers.c
for generating an arbitrary binary operator clause. I chose to move
that to ruleutils.c, since that seems a more reasonable place to be
exporting such functionality from than ri_triggers.c.
While #1, #3, and #5 are just latent given existing feature restrictions,
and #2 doesn't arise in the core system for lack of alternate opclasses
with different equality behaviors, #4 seems like an issue worth
back-patching. That's the bulk of the change anyway, so just back-patch
the whole thing to 9.4 where this code was introduced.
Discussion: https://postgr.es/m/13836.1521413227@sss.pgh.pa.us
If convert_to_scalar is passed a pair of datatypes it can't cope with,
its former behavior was just to elog(ERROR). While this is OK so far as
the core code is concerned, there's extension code that would like to use
scalarltsel/scalargtsel/etc as selectivity estimators for operators that
work on non-core datatypes, and this behavior is a show-stopper for that
use-case. If we simply allow convert_to_scalar to return FALSE instead of
outright failing, then the main logic of scalarltsel/scalargtsel will work
fine for any operator that behaves like a scalar inequality comparison.
The lack of conversion capability will mean that we can't estimate to
better than histogram-bin-width precision, since the code will effectively
assume that the comparison constant falls at the middle of its bin. But
that's still a lot better than nothing. (Someday we should provide a way
for extension code to supply a custom version of convert_to_scalar, but
today is not that day.)
While poking at this issue, we noted that the existing code for handling
type bytea in convert_to_scalar is several bricks shy of a load.
It assumes without checking that if the comparison value is type bytea,
the bounds values are too; in the worst case this could lead to a crash.
It also fails to detoast the input values, so that the comparison result is
complete garbage if any input is toasted out-of-line, compressed, or even
just short-header. I'm not sure how often such cases actually occur ---
the bounds values, at least, are probably safe since they are elements of
an array and hence can't be toasted. But that doesn't make this code OK.
Back-patch to all supported branches, partly because author requested that,
but mostly because of the bytea bugs. The change in API for the exposed
routine convert_network_to_scalar() is theoretically a back-patch hazard,
but it seems pretty unlikely that any third-party code is calling that
function directly.
Tomas Vondra, with some adjustments by me
Discussion: https://postgr.es/m/b68441b6-d18f-13ab-b43b-9a72188a4e02@2ndquadrant.com
This backpatches 935dee9ad5a8d12f4d3b772a6e6c99d245e5ad44 to the
the branches requested by extension authors.
Original-Author: Metin Doslu
Original-Committer: Robert Haas
Author: Brian Cloutier
Without this fix, dropping a role can sometimes result in parallel
query failures in sessions that have used "SET ROLE" to assume the
dropped role, even if that setting isn't active any more.
Report by Pavan Deolasee. Patch by Amit Kapila, reviewed by me.
Discussion: http://postgr.es/m/CABOikdOomRcZsLsLK+Z+qENM1zxyaWnAvFh3MJZzZnnKiF+REg@mail.gmail.com
Up to now async.c has used TransactionIdIsInProgress() to detect whether
a notify message's source transaction is still running. However, that
function has a quick-exit path that reports that XIDs before RecentXmin
are no longer running. If a listening backend is doing nothing but
listening, and not running any queries, there is nothing that will advance
its value of RecentXmin. Once 2 billion transactions elapse, the
RecentXmin check causes active transactions to be reported as not running.
If they aren't committed yet according to CLOG, async.c decides they
aborted and discards their messages. The timing for that is a bit tight
but it can happen when multiple backends are sending notifies concurrently.
The net symptom therefore is that a sufficiently-long-surviving
listen-only backend starts to miss some fraction of NOTIFY traffic,
but only under heavy load.
The only function that updates RecentXmin is GetSnapshotData().
A brute-force fix would therefore be to take a snapshot before
processing incoming notify messages. But that would add cycles,
as well as contention for the ProcArrayLock. We can be smarter:
having taken the snapshot, let's use that to check for running
XIDs, and not call TransactionIdIsInProgress() at all. In this
way we reduce the number of ProcArrayLock acquisitions from one
per message to one per notify interrupt; that's the same under
light load but should be a benefit under heavy load. Light testing
says that this change is a wash performance-wise for normal loads.
I looked around for other callers of TransactionIdIsInProgress()
that might be at similar risk, and didn't find any; all of them
are inside transactions that presumably have already taken a
snapshot.
Problem report and diagnosis by Marko Tiikkaja, patch by me.
Back-patch to all supported branches, since it's been like this
since 9.0.
Discussion: https://postgr.es/m/20170926182935.14128.65278@wrigleys.postgresql.org
The order in which GRANTs are output is important as GRANTs which have
been GRANT'd by individuals via WITH GRANT OPTION GRANTs have to come
after the GRANT which included the WITH GRANT OPTION. This happens
naturally in the backend during normal operation as we only change
existing ACLs in-place, only add new ACLs to the end, and when removing
an ACL we remove any which depend on it also.
Also, adjust the comments in acl.h to make this clear.
Unfortunately, the updates to pg_dump to handle initial privileges
involved pulling apart ACLs and then combining them back together and
could end up putting them back together in an invalid order, leading to
dumps which wouldn't restore.
Fix this by adjusting the queries used by pg_dump to ensure that the
ACLs are rebuilt in the same order in which they were originally.
Back-patch to 9.6 where the changes for initial privileges were done.
Pre-C99 platforms may lack <stdint.h> and thereby SIZE_MAX. We have
a couple of places using the hack "(size_t) -1" as a fallback, but
it wasn't universally available; which means the code added in commit
2e70d6b5e fails to compile everywhere. Move that hack to c.h so that
we can rely on having SIZE_MAX everywhere.
Per discussion, it'd be a good idea to make the macro's value safe
for use in #if-tests, but that will take a bit more work. This is
just a quick expedient to get the buildfarm green again.
Back-patch to all supported branches, like the previous commit.
Discussion: https://postgr.es/m/15883.1504278595@sss.pgh.pa.us
The sole useful effect of this function, to check that no catcache
entries have positive refcounts at transaction end, has really been
obsolete since we introduced ResourceOwners in PG 8.1. We reduced the
checks to assertions years ago, so that the function was a complete
no-op in production builds. There have been previous discussions about
removing it entirely, but consensus up to now was that it had some small
value as a cross-check for bugs in the ResourceOwner logic.
However, it now emerges that it's possible to trigger these assertions
if you hit an assert-enabled backend with SIGTERM during a call to
SearchCatCacheList, because that function temporarily increases the
refcounts of entries it's intending to add to a catcache list construct.
In a normal ERROR scenario, the extra refcounts are cleaned up by
SearchCatCacheList's PG_CATCH block; but in a FATAL exit we do a
transaction abort and exit without ever executing PG_CATCH handlers.
There's a case to be made that this is a generic hazard and we should
consider restructuring elog(FATAL) handling so that pending PG_CATCH
handlers do get run. That's pretty scary though: it could easily create
more problems than it solves. Preliminary stress testing by Andreas
Seltenreich suggests that there are not many live problems of this ilk,
so we rejected that idea.
There are more-localized ways to fix the problem; the most principled
one would be to use PG_ENSURE_ERROR_CLEANUP instead of plain PG_TRY.
But adding cycles to SearchCatCacheList isn't very appealing. We could
also weaken the assertions in AtEOXact_CatCache in some more or less
ad-hoc way, but that just makes its raison d'etre even less compelling.
In the end, the most reasonable solution seems to be to just remove
AtEOXact_CatCache altogether, on the grounds that it's not worth trying
to fix it. It hasn't found any bugs for us in many years.
Per report from Jeevan Chalke. Back-patch to all supported branches.
Discussion: https://postgr.es/m/CAM2+6=VEE30YtRQCZX7_sCFsEpoUkFBV1gZazL70fqLn8rcvBA@mail.gmail.com
The previous message didn't mention the name of the table or the
bounds. Put the table name in the primary error message and the
bounds in the detail message.
Amit Langote, changed slightly by me. Suggestions on the exac
phrasing from Tom Lane, David G. Johnston, and Dean Rasheed.
Discussion: http://postgr.es/m/CA+Tgmoae6bpwVa-1BMaVcwvCCeOoJ5B9Q9-RHWo-1gJxfPBZ5Q@mail.gmail.com
Traditionally, "pg_ctl start -w" has waited for the server to become
ready to accept connections by attempting a connection once per second.
That has the major problem that connection issues (for instance, a
kernel packet filter blocking traffic) can't be reliably told apart
from server startup issues, and the minor problem that if server startup
isn't quick, we accumulate "the database system is starting up" spam
in the server log. We've hacked around many of the possible connection
issues, but it resulted in ugly and complicated code in pg_ctl.c.
In commit c61559ec3, I changed the probe rate to every tenth of a second.
That prompted Jeff Janes to complain that the log-spam problem had become
much worse. In the ensuing discussion, Andres Freund pointed out that
we could dispense with connection attempts altogether if the postmaster
were changed to report its status in postmaster.pid, which "pg_ctl start"
already relies on being able to read. This patch implements that, teaching
postmaster.c to report a status string into the pidfile at the same
state-change points already identified as being of interest for systemd
status reporting (cf commit 7d17e683f). pg_ctl no longer needs to link
with libpq at all; all its functions now depend on reading server files.
In support of this, teach AddToDataDirLockFile() to allow addition of
postmaster.pid lines in not-necessarily-sequential order. This is needed
on Windows where the SHMEM_KEY line will never be written at all. We still
have the restriction that we don't want to truncate the pidfile; document
the reasons for that a bit better.
Also, fix the pg_ctl TAP tests so they'll notice if "start -w" mode
is broken --- before, they'd just wait out the sixty seconds until
the loop gives up, and then report success anyway. (Yes, I found that
out the hard way.)
While at it, arrange for pg_ctl to not need to #include miscadmin.h;
as a rather low-level backend header, requiring that to be compilable
client-side is pretty dubious. This requires moving the #define's
associated with the pidfile into a new header file, and moving
PG_BACKEND_VERSIONSTR someplace else. For lack of a clearly better
"someplace else", I put it into port.h, beside the declaration of
find_other_exec(), since most users of that macro are passing the value to
find_other_exec(). (initdb still depends on miscadmin.h, but at least
pg_ctl and pg_upgrade no longer do.)
In passing, fix main.c so that PG_BACKEND_VERSIONSTR actually defines the
output of "postgres -V", which remarkably it had never done before.
Discussion: https://postgr.es/m/CAMkU=1xJW8e+CTotojOMBd-yzUvD0e_JZu2xHo=MnuZ4__m7Pg@mail.gmail.com
Callers of icu_to_uchar() neglected to pfree the result string when done
with it. This results in catastrophic memory leaks in varstr_cmp(),
because of our prevailing assumption that btree comparison functions don't
leak memory. For safety, make all the call sites clean up leaks, though
I suspect that we could get away without it in formatting.c. I audited
callers of icu_from_uchar() as well, but found no places that seemed to
have a comparable issue.
Add function API specifications for icu_to_uchar() and icu_from_uchar();
the lack of any thought-through specification is perhaps not unrelated
to the existence of this bug in the first place. Fix icu_to_uchar()
to guarantee a nul-terminated result; although no existing caller appears
to care, the fact that it would have been nul-terminated except in
extreme corner cases seems ideally designed to bite someone on the rear
someday. Fix ucnv_fromUChars() destCapacity argument --- in the worst
case, that could perhaps have led to a non-nul-terminated result, too.
Fix icu_from_uchar() to have a more reasonable definition of the function
result --- no callers are actually paying attention, so this isn't a live
bug, but it's certainly sloppily designed. Const-ify icu_from_uchar()'s
input string for consistency.
That is not the end of what needs to be done to these functions, but
it's as much as I have the patience for right now.
Discussion: https://postgr.es/m/1955.1498181798@sss.pgh.pa.us
Don't move parenthesized lines to the left, even if that means they
flow past the right margin.
By default, BSD indent lines up statement continuation lines that are
within parentheses so that they start just to the right of the preceding
left parenthesis. However, traditionally, if that resulted in the
continuation line extending to the right of the desired right margin,
then indent would push it left just far enough to not overrun the margin,
if it could do so without making the continuation line start to the left of
the current statement indent. That makes for a weird mix of indentations
unless one has been completely rigid about never violating the 80-column
limit.
This behavior has been pretty universally panned by Postgres developers.
Hence, disable it with indent's new -lpl switch, so that parenthesized
lines are always lined up with the preceding left paren.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Change pg_bsd_indent to follow upstream rules for placement of comments
to the right of code, and remove pgindent hack that caused comments
following #endif to not obey the general rule.
Commit e3860ffa4dd0dad0dd9eea4be9cc1412373a8c89 wasn't actually using
the published version of pg_bsd_indent, but a hacked-up version that
tried to minimize the amount of movement of comments to the right of
code. The situation of interest is where such a comment has to be
moved to the right of its default placement at column 33 because there's
code there. BSD indent has always moved right in units of tab stops
in such cases --- but in the previous incarnation, indent was working
in 8-space tab stops, while now it knows we use 4-space tabs. So the
net result is that in about half the cases, such comments are placed
one tab stop left of before. This is better all around: it leaves
more room on the line for comment text, and it means that in such
cases the comment uniformly starts at the next 4-space tab stop after
the code, rather than sometimes one and sometimes two tabs after.
Also, ensure that comments following #endif are indented the same
as comments following other preprocessor commands such as #else.
That inconsistency turns out to have been self-inflicted damage
from a poorly-thought-through post-indent "fixup" in pgindent.
This patch is much less interesting than the first round of indent
changes, but also bulkier, so I thought it best to separate the effects.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
The new indent version includes numerous fixes thanks to Piotr Stefaniak.
The main changes visible in this commit are:
* Nicer formatting of function-pointer declarations.
* No longer unexpectedly removes spaces in expressions using casts,
sizeof, or offsetof.
* No longer wants to add a space in "struct structname *varname", as
well as some similar cases for const- or volatile-qualified pointers.
* Declarations using PG_USED_FOR_ASSERTS_ONLY are formatted more nicely.
* Fixes bug where comments following declarations were sometimes placed
with no space separating them from the code.
* Fixes some odd decisions for comments following case labels.
* Fixes some cases where comments following code were indented to less
than the expected column 33.
On the less good side, it now tends to put more whitespace around typedef
names that are not listed in typedefs.list. This might encourage us to
put more effort into typedef name collection; it's not really a bug in
indent itself.
There are more changes coming after this round, having to do with comment
indentation and alignment of lines appearing within parentheses. I wanted
to limit the size of the diffs to something that could be reviewed without
one's eyes completely glazing over, so it seemed better to split up the
changes as much as practical.
Discussion: https://postgr.es/m/E1dAmxK-0006EE-1r@gemulon.postgresql.org
Discussion: https://postgr.es/m/30527.1495162840@sss.pgh.pa.us
Consistently refer to such an entry as a "statistics object", not just
"statistics" or "extended statistics". Previously we had a mismash of
terms, accompanied by utter confusion as to whether the term was
singular or plural. That's not only grating (at least to the ear of
a native English speaker) but could be outright misleading, eg in error
messages that seemed to be referring to multiple objects where only one
could be meant.
This commit fixes the code and a lot of comments (though I may have
missed a few). I also renamed two new SQL functions,
pg_get_statisticsextdef -> pg_get_statisticsobjdef
pg_statistic_ext_is_visible -> pg_statistics_obj_is_visible
to conform better with this terminology.
I have not touched the SGML docs other than fixing those function
names; the docs certainly need work but it seems like a separable task.
Discussion: https://postgr.es/m/22676.1494557205@sss.pgh.pa.us
The mess cleaned up in commit da0759600 is clear evidence that it's a
bug hazard to expect the caller of get_attstatsslot()/free_attstatsslot()
to provide the correct type OID for the array elements in the slot.
Moreover, we weren't even getting any performance benefit from that,
since get_attstatsslot() was extracting the real type OID from the array
anyway. So we ought to get rid of that requirement; indeed, it would
make more sense for get_attstatsslot() to pass back the type OID it found,
in case the caller isn't sure what to expect, which is likely in binary-
compatible-operator cases.
Another problem with the current implementation is that if the stats array
element type is pass-by-reference, we incur a palloc/memcpy/pfree cycle
for each element. That seemed acceptable when the code was written because
we were targeting O(10) array sizes --- but these days, stats arrays are
almost always bigger than that, sometimes much bigger. We can save a
significant number of cycles by doing one palloc/memcpy/pfree of the whole
array. Indeed, in the now-probably-common case where the array is toasted,
that happens anyway so this method is basically free. (Note: although the
catcache code will inline any out-of-line toasted values, it doesn't
decompress them. At the other end of the size range, it doesn't expand
short-header datums either. In either case, DatumGetArrayTypeP would have
to make a copy. We do end up using an extra array copy step if the element
type is pass-by-value and the array length is neither small enough for a
short header nor large enough to have suffered compression. But that
seems like a very acceptable price for winning in pass-by-ref cases.)
Hence, redesign to take these insights into account. While at it,
convert to an API in which we fill a struct rather than passing a bunch
of pointers to individual output arguments. That will make it less
painful if we ever want further expansion of what get_attstatsslot can
pass back.
It's certainly arguable that this is new development and not something to
push post-feature-freeze. However, I view it as primarily bug-proofing
and therefore something that's better to have sooner not later. Since
we aren't quite at beta phase yet, let's put it in.
Discussion: https://postgr.es/m/16364.1494520862@sss.pgh.pa.us
We have now grown enough registerable syscache-invalidation callback
functions that the original assumption that there would be few of them
is causing performance problems. In particular, let's fix things so that
CallSyscacheCallbacks doesn't have to search the whole array to find
which callback(s) to invoke for a given cache ID. Preserve the original
behavior that callbacks are called in order of registration, just in
case there's someplace that depends on that (which I doubt).
In support of this, export the number of syscaches from syscache.h.
People could have found that out anyway from the enum, but adding a
#define makes that much safer.
This provides a useful additional speedup in Mathieu Fenniak's
logical-decoding test case, although we're reaching the point of
diminishing returns there. I think any further improvement will have
to come from reducing the number of cache invalidations that are
triggered in the first place. Still, we can hope that this change
gives some incremental benefit for all invalidation scenarios.
Back-patch to 9.4 where logical decoding was introduced.
Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
A test case provided by Mathieu Fenniak shows that the initial search for
the target catcache in CatalogCacheIdInvalidate consumes a very significant
amount of overhead in cases where cache invalidation is triggered but has
little useful work to do. There is no good reason for that search to exist
at all, as the index array maintained by syscache.c allows direct lookup of
the catcache from its ID. We just need a frontend function in syscache.c,
matching the division of labor for most other cache-accessing operations.
While there's more that can be done in this area, this patch alone reduces
the runtime of Mathieu's example by 2X. We can hope that it offers some
useful benefit in other cases too, although usually cache invalidation
overhead is not such a striking fraction of the total runtime.
Back-patch to 9.4 where logical decoding was introduced. It might be
worth going further back, but presently the only case we know of where
cache invalidation is really a significant burden is in logical decoding.
Also, older branches have fewer catcaches, reducing the possible benefit.
(Note: although this nominally changes catcache's API, we have always
documented CatalogCacheIdInvalidate as a private function, so I would
have little sympathy for an external module calling it directly. So
backpatching should be fine.)
Discussion: https://postgr.es/m/CAHoiPjzea6N0zuCi=+f9v_j94nfsy6y8SU7-=bp4=7qw6_i=Rg@mail.gmail.com
Some selectivity estimation functions run user-supplied operators over
data obtained from pg_statistic without security checks, which allows
those operators to leak pg_statistic data without having privileges on
the underlying tables. Fix by checking that one of the following is
satisfied: (1) the user has table or column privileges on the table
underlying the pg_statistic data, or (2) the function implementing the
user-supplied operator is leak-proof. If neither is satisfied, planning
will proceed as if there are no statistics available.
At least one of these is satisfied in most cases in practice. The only
situations that are negatively impacted are user-defined or
not-leak-proof operators on a security-barrier view.
Reported-by: Robert Haas <robertmhaas@gmail.com>
Author: Peter Eisentraut <peter_e@gmx.net>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Security: CVE-2017-7484
create_singleton_array() was not really as useful as we perhaps thought
when we added it. It had never accreted more than one call site, and is
only saving a dozen lines of code at that one, which is considerably less
bulk than the function itself. Moreover, because of its insistence on
using the caller's fn_extra cache space, it's arguably a coding hazard.
text_to_array_internal() does not currently use fn_extra in any other way,
but if it did it would be subtly broken, since the conflicting fn_extra
uses could be needed within a single query, in the seldom-tested case that
the field separator varies during the query. The same objection seems
likely to apply to any other potential caller.
The replacement code is a bit uglier, because it hardwires knowledge of
the storage parameters of type TEXT, but it's not like we haven't got
dozens or hundreds of other places that do the same. Uglier seems like
a good tradeoff for smaller, faster, and safer.
Per discussion with Neha Khatri.
Discussion: https://postgr.es/m/CAFO0U+_fS5SRhzq6uPG+4fbERhoA9N2+nPrtvaC9mmeWivxbsA@mail.gmail.com
This commit also does
- add REPLICATION_SUBSCRIBERS into config_group
- mark max_logical_replication_workers and max_sync_workers_per_subscription
as REPLICATION_SUBSCRIBERS parameters
- move those parameters into "Subscribers" section in postgresql.conf.sample
Author: Masahiko Sawada, Petr Jelinek and me
Reported-by: Masahiko Sawada
Discussion: http://postgr.es/m/CAD21AoAonSCoa=v=87ZO3vhfUZA1k_E2XRNHTt=xioWGUa+0ug@mail.gmail.com
Add a "copy" argument to make it optional to receive a copy of caller
tuple that is safe to use following a subsequent manipulating of
tuplesort's state. This is a performance optimization. Most existing
tuplesort_gettupleslot() callers are made to opt out of copying.
Existing callers that happen to rely on the validity of tuple memory
beyond subsequent manipulations of the tuplesort request their own
copy.
This brings tuplesort_gettupleslot() in line with
tuplestore_gettupleslot(). In the future, a "copy"
tuplesort_getdatum() argument may be added, that similarly allows
callers to opt out of receiving their own copy of tuple.
In passing, clarify assumptions that callers of other tuplesort fetch
routines may make about tuple memory validity, per gripe from Tom
Lane.
Author: Peter Geoghegan
Discussion: CAM3SWZQWZZ_N=DmmL7tKy_OUjGH_5mN=N=A6h7kHyyDvEhg2DA@mail.gmail.com
This is the SQL standard-conforming variant of PostgreSQL's serial
columns. It fixes a few usability issues that serial columns have:
- CREATE TABLE / LIKE copies default but refers to same sequence
- cannot add/drop serialness with ALTER TABLE
- dropping default does not drop sequence
- need to grant separate privileges to sequence
- other slight weirdnesses because serial is some kind of special macro
Reviewed-by: Vitaly Burovoy <vitaly.burovoy@gmail.com>
For normal commits and aborts we already reset PgXact->xmin,
so we can simply avoid running SnapshotResetXmin() twice.
During performance tests by Alexander Korotkov, diagnosis
by Andres Freund showed PgXact array as a bottleneck. After
manual analysis by me of the code paths that touch those
memory locations, I was able to identify extraneous code
in the main transaction commit path.
Avoiding touching highly contented shmem improves concurrent
performance slightly on all workloads, confirmed by tests
run by Ashutosh Sharma and Alexander Korotkov.
Simon Riggs
Discussion: CANP8+jJdXE9b+b9F8CQT-LuxxO0PBCB-SZFfMVAdp+akqo4zfg@mail.gmail.com