This provides a mechanism for specifying conversions between SQL data
types and procedural languages. As examples, there are transforms
for hstore and ltree for PL/Perl and PL/Python.
reviews by Pavel Stěhule and Andres Freund
Now that we use CRC-32C in WAL and the control file, the "traditional" and
"legacy" CRC-32 variants are not used in any frontend programs anymore.
Move the code for those back from src/common to src/backend/utils/hash.
Also move the slicing-by-8 implementation (back) to src/port. This is in
preparation for next patch that will add another implementation that uses
Intel SSE 4.2 instructions to calculate CRC-32C, where available.
It now also reports temporary objects dropped that are local to the
backend. Previously we weren't reporting any temp objects because it
was deemed unnecessary; but as it turns out, it is necessary if we want
to keep close track of DDL command execution inside one session. Temp
objects are reported as living in schema pg_temp, which works because
such a schema-qualification always refers to the temp objects of the
current session.
This is useful to control autovacuum log volume, for situations where
monitoring only a set of tables is necessary.
Author: Michael Paquier
Reviewed by: A team led by Naoya Anzai (also including Akira Kurosawa,
Taiki Kondo, Huong Dangminh), Fujii Masao.
This commit also adds pg_malloc_extended for frontend. These interfaces
can be used to control at a lower level memory allocation using an interface
similar to MemoryContextAllocExtended. For example, the callers can specify
MCXT_ALLOC_NO_OOM if they want to suppress the "out of memory" error while
allocating the memory and handle a NULL return value.
Michael Paquier, reviewed by me.
In 83ff1618 we defined integer limits iff they're not provided by the
system. That turns out not to be the greatest idea because there's
different ways some datatypes can be represented. E.g. on OSX PG's 64bit
datatype will be a 'long int', but OSX unconditionally uses 'long
long'. That disparity then can lead to warnings, e.g. around printf
formats.
One way to fix that would be to back int64 using stdint.h's
int64_t. While a good idea it's not that easy to implement. We would
e.g. need to include stdint.h in our external headers, which we don't
today. Also computing the correct int64 printf formats in that case is
nontrivial.
Instead simply prefix the integer limits with PG_ and define them
unconditionally. I've adjusted all the references to them in code, but
not the ones in comments; the latter seems unnecessary to me.
Discussion: 20150331141423.GK4878@alap3.anarazel.de
... and rename it and its sibling array_offsets to array_position and
array_positions, to account for the changed behavior.
Having the functions return subscripts better matches existing practice,
and is better suited to using the result value as a subscript into the
array directly. For one-based arrays, the new definition is identical
to what was originally committed.
(We use the term "subscript" in the documentation, which is what we use
whenever we talk about arrays; but the functions themselves are named
using the word "position" to match the standard-defined POSITION()
functions.)
Author: Pavel Stěhule
Behavioral problem noted by Dean Rasheed.
This improves on commit bbfd7edae5aa5ad5553d3c7e102f2e450d4380d4 by
making two simple changes:
* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces
pgindent's tendency to misformat declarations involving them.
* attributes are now always attached to function declarations, not
definitions. Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway. (It does little good to put a
noreturn annotation where callers can't see it.) In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.
I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
This adds a new GiST opclass method, 'fetch', which is used to reconstruct
the original Datum from the value stored in the index. Also, the 'canreturn'
index AM interface function gains a new 'attno' argument. That makes it
possible to use index-only scans on a multi-column index where some of the
opclasses support index-only scans but some do not.
This patch adds support in the box and point opclasses. Other opclasses
can added later as follow-on patches (btree_gist would be particularly
interesting).
Anastasia Lubennikova, with additional fixes and modifications by me.
Several submitted and even committed patches have run into the problem
that C89, our baseline, does not provide minimum/maximum values for
various integer datatypes. C99's stdint.h does, but we can't rely on
it.
Several parts of the code defined limits locally, so instead centralize
the definitions to c.h.
This patch also changes the more obvious usages of literal limit values;
there's more places that could be changed, but it's less clear whether
it's beneficial to change those.
Author: Andrew Gierth
Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk
On platforms where we support 128bit integers, use them to implement
faster transition functions for sum(int8), avg(int8),
var_*(int2/int4),stdev_*(int2/int4). Where not supported continue to use
numeric as a transition type.
In some synthetic benchmarks this has been shown to provide significant
speedups.
Bumps catversion.
Discussion: 544BB5F1.50709@proxel.se
Author: Andreas Karlsson
Reviewed-By: Peter Geoghegan, Petr Jelinek, Andres Freund,
Oskari Saarenmaa, David Rowley
Commit df630b0dd5ea2de52972d456f5978a012436115e moved enum LockWaitPolicy
into its very own header file utils/lockwaitpolicy.h, which does not seem
like a great idea from here. First, it's still a node-related declaration,
and second, a file named like that can never sensibly be used for anything
else. I do not think we want to encourage a one-typedef-per-header-file
approach. The upcoming foreign table inheritance patch was doubling down
on this bad idea by moving enum LockClauseStrength into its *own*
can-never-be-used-for-anything-else file. Instead, let's put them both in
a file named nodes/lockoptions.h. (They do seem to need a separate header
file because we need them in both parsenodes.h and plannodes.h, and we
don't want either of those including the other. Past practice might
suggest adding them to nodes/nodes.h, but they don't seem sufficiently
globally useful to justify that.)
Committed separately since there's no functional change here, just some
header-file refactoring.
Until now __attribute__() was defined to be empty for all compilers but
gcc. That's problematic because it prevents using it in other compilers;
which is necessary e.g. for atomics portability. It's also just
generally dubious to do so in a header as widely included as c.h.
Instead add pg_attribute_format_arg, pg_attribute_printf,
pg_attribute_noreturn macros which are implemented in the compilers that
understand them. Also add pg_attribute_noreturn and pg_attribute_packed,
but don't provide fallbacks, since they can affect functionality.
This means that external code that, possibly unwittingly, relied on
__attribute__ defined to be empty on !gcc compilers may now run into
warnings or errors on those compilers. But there shouldn't be many
occurances of that and it's hard to work around...
Discussion: 54B58BA3.8040302@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.
Commands such as ALTER USER, ALTER GROUP, ALTER ROLE, GRANT, and the
various ALTER OBJECT / OWNER TO, as well as ad-hoc clauses related to
roles such as the AUTHORIZATION clause of CREATE SCHEMA, the FOR clause
of CREATE USER MAPPING, and the FOR ROLE clause of ALTER DEFAULT
PRIVILEGES can now take the keywords CURRENT_USER and SESSION_USER as
user specifiers in place of an explicit user name.
This commit also fixes some quite ugly handling of special standards-
mandated syntax in CREATE USER MAPPING, which in particular would fail
to work in presence of a role named "current_user".
The special role specifiers PUBLIC and NONE also have more consistent
handling now.
Also take the opportunity to add location tracking to user specifiers.
Authors: Kyotaro Horiguchi. Heavily reworked by Álvaro Herrera.
Reviewed by: Rushabh Lathia, Adam Brightwell, Marti Raudsepp.
Previously, we cached domain constraints for the life of a query, or
really for the life of the FmgrInfo struct that was used to invoke
domain_in() or domain_check(). But plpgsql (and probably other places)
are set up to cache such FmgrInfos for the whole lifespan of a session,
which meant they could be enforcing really stale sets of constraints.
On the other hand, searching pg_constraint once per query gets kind of
expensive too: testing says that as much as half the runtime of a
trivial query such as "SELECT 0::domaintype" went into that.
To fix this, delegate the responsibility for tracking a domain's
constraints to the typcache, which has the infrastructure needed to
detect syscache invalidation events that signal possible changes.
This not only removes unnecessary repeat reads of pg_constraint,
but ensures that we never apply stale constraint data: whatever we
use is the current data according to syscache rules.
Unfortunately, the current configuration of the system catalogs means
we have to flush cached domain-constraint data whenever either pg_type
or pg_constraint changes, which happens rather a lot (eg, creation or
deletion of a temp table will do it). It might be worth rearranging
things to split pg_constraint into two catalogs, of which the domain
constraint one would probably be very low-traffic. That's a job for
another patch though, and in any case this patch should improve matters
materially even with that handicap.
This patch makes use of the recently-added memory context reset callback
feature to manage the lifespan of domain constraint caches, so that we
don't risk deleting a cache that might be in the midst of evaluation.
Although this is a bug fix as well as a performance improvement, no
back-patch. There haven't been many if any field complaints about
stale domain constraint checks, so it doesn't seem worth taking the
risk of modifying data structures as basic as MemoryContexts in back
branches.
This makes "ALTER TABLE tabname ALTER tscol TYPE ... USING tscol AT TIME
ZONE 'UTC'" skip rewriting the table when altering from "timestamp" to
"timestamptz" or vice versa. While it would be nicer still to optimize
this in the absence of the USING clause given timezone==UTC, transform
functions must consult IMMUTABLE facts only.
Initial experience with this feature suggests that instances of
MemoryContextCallback are likely to propagate into some widely-used headers
over time. As things stood, that would result in pulling memutils.h or
at least memnodes.h into common headers, which does not seem desirable.
Instead, let's decide that this feature is part of the "ordinary palloc
user" API rather than the "specialized context management" API, and as
such should be declared in palloc.h not memutils.h.
That is, MemoryContextReset() now means what was formerly meant by
MemoryContextResetAndDeleteChildren(), and the latter is now just a macro
alias for the former. If you really want the functionality that was
formerly provided by MemoryContextReset(), what you have to do is
MemoryContextResetChildren() plus MemoryContextResetOnly() (which is a
new API to reset *only* the named context and not touch its children).
The reason for this change is that near fifteen years of experience has
proven that there is noplace where old-style MemoryContextReset() is
actually what you want. Making that the default behavior has led to lots
of context-leakage bugs, while we've not found anyplace where it's actually
necessary to keep the child contexts; at least the standard regression
tests do not reveal anyplace where this change breaks anything. And there
are upcoming patches that will introduce additional reasons why child
contexts need to be removed.
We could change existing calls of MemoryContextResetAndDeleteChildren to be
just MemoryContextReset, but for the moment I'll leave them alone; they're
not costing anything.
This allows cleanup actions to be registered to be called just before a
particular memory context's contents are flushed (either by deletion or
MemoryContextReset). The patch in itself has no use-cases for this, but
several likely reasons for wanting this exist.
In passing, per discussion, rearrange some boolean fields in struct
MemoryContextData so as to avoid wasted padding space. For safety,
this requires making allowInCritSection's existence unconditional;
but I think that's a better approach than what was there anyway.
Instead of having a single knob (checkpoint_segments) that both triggers
checkpoints, and determines how many checkpoints to recycle, they are now
separate concerns. There is still an internal variable called
CheckpointSegments, which triggers checkpoints. But it no longer determines
how many segments to recycle at a checkpoint. That is now auto-tuned by
keeping a moving average of the distance between checkpoints (in bytes),
and trying to keep that many segments in reserve. The advantage of this is
that you can set max_wal_size very high, but the system won't actually
consume that much space if there isn't any need for it. The min_wal_size
sets a floor for that; you can effectively disable the auto-tuning behavior
by setting min_wal_size equal to max_wal_size.
The max_wal_size setting is now the actual target size of WAL at which a
new checkpoint is triggered, instead of the distance between checkpoints.
Previously, you could calculate the actual WAL usage with the formula
"(2 + checkpoint_completion_target) * checkpoint_segments + 1". With this
patch, you set the desired WAL usage with max_wal_size, and the system
calculates the appropriate CheckpointSegments with the reverse of that
formula. That's a lot more intuitive for administrators to set.
Reviewed by Amit Kapila and Venkata Balaji N.
Replace the if-switch-case constructs with two conversion tables,
containing all the supported conversions between human-readable unit
strings and the base units used in GUC variables. This makes the code
easier to read, and makes adding new units simpler.
Previously, each new array created a new memory context that started
out at 8kB. This is incredibly wasteful when there are lots of small
groups of just a few elements each.
Change initArrayResult() and friends to accept a "subcontext" argument
to indicate whether the caller wants the ArrayBuildState allocated in
a new subcontext or not. If not, it can no longer be released
separately from the rest of the memory context.
Fixes bug report by Frank van Vugt on 2013-10-19.
Tomas Vondra. Reviewed by Ali Akbar, Tom Lane, and me.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.
This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.
Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).
Michael Paquier, review and additional fixes by me.
There wasn't any good reason for a single C function to implement both
these SQL functions: it saved very little code overall, and it required
significant pushups to re-determine at runtime which case applied. Redoing
it as two functions ends up with just slightly more lines of code, but it's
simpler to understand, and faster too because we need not repeat syscache
lookups on every call.
An important side benefit is that this eliminates the only case in which
different aliases of the same C function had both anyarray and anyelement
arguments at the same position, which would almost always be a mistake.
The opr_sanity regression test will now notice such mistakes since there's
no longer a valid case where it happens.
The four functions array_ref, array_set, array_get_slice, array_set_slice
have traditionally declared their array inputs and results as being of type
"ArrayType *". This is a lie, and has been since Berkeley days, because
they actually also support "fixed-length array" types such as "name" and
"point"; not to mention that the inputs could be toasted. These values
should be declared Datum instead to avoid confusion. The current coding
already risks possible misoptimization by compilers, and it'll get worse
when "expanded" array representations become a valid alternative.
However, there's a fair amount of code using array_ref and array_set with
arrays that *are* known to be ArrayType structures, and there might be more
such places in third-party code. Rather than cluttering those call sites
with PointerGetDatum/DatumGetArrayTypeP cruft, what I did was to rename the
existing functions to array_get_element/array_set_element, fix their
signatures, then reincarnate array_ref/array_set as backwards compatibility
wrappers.
array_get_slice/array_set_slice have no such constituency in the core code,
and probably not in third-party code either, so I just changed their APIs.
To get CRC functionality in a client program, you now need to link with
libpgcommon instead of libpgport. The CRC code has nothing to do with
portability, so libpgcommon is a better home. (libpgcommon didn't exist
when pg_crc.c was originally moved to src/port.)
Remove the possibility to get CRC functionality by just #including
pg_crc_tables.h. I'm not aware of any extensions that actually did that and
couldn't simply link with libpgcommon.
This also moves the pg_crc.h header file from src/include/utils to
src/include/common, which will require changes to any external programs
that currently does #include "utils/pg_crc.h". That seems acceptable, as
include/common is clearly the right home for it now, and the change needed
to any such programs is trivial.
The meta data of PGLZ symbolized by PGLZ_Header is removed, to make
the compression and decompression code independent on the backend-only
varlena facility. PGLZ_Header is being used to store some meta data
related to the data being compressed like the raw length of the uncompressed
record or some varlena-related data, making it unpluggable once PGLZ is
stored in src/common as it contains some backend-only code paths with
the management of varlena structures. The APIs of PGLZ are reworked
at the same time to do only compression and decompression of buffers
without the meta-data layer, simplifying its use for a more general usage.
On-disk format is preserved as well, so there is no incompatibility with
previous major versions of PostgreSQL for TOAST entries.
Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. Especially this commit is required
for upcoming WAL compression feature so that the WAL reader facility can
decompress the WAL data by using pglz_decompress.
Michael Paquier, reviewed by me.
This field has been unreferenced since 1998, and does not appear in lseg
values stored on disk (since sizeof(lseg) is only 32 bytes according to
pg_type). There was apparently some idea of maintaining it just in values
appearing in memory, but the bookkeeping required to make that work would
surely far outweigh the cost of recalculating the line's slope when needed.
Remove it to (a) simplify matters and (b) suppress some uninitialized-field
whining from Coverity.
Using the new interface MemoryContextAllocExtended, callers can
specify MCXT_ALLOC_NO_OOM if they are prepared to handle a NULL
return value.
Michael Paquier, reviewed and somewhat revised by me.
While building error messages to return to the user,
BuildIndexValueDescription, ExecBuildSlotValueDescription and
ri_ReportViolation would happily include the entire key or entire row in
the result returned to the user, even if the user didn't have access to
view all of the columns being included.
Instead, include only those columns which the user is providing or which
the user has select rights on. If the user does not have any rights
to view the table or any of the columns involved then no detail is
provided and a NULL value is returned from BuildIndexValueDescription
and ExecBuildSlotValueDescription. Note that, for key cases, the user
must have access to all of the columns for the key to be shown; a
partial key will not be returned.
Further, in master only, do not return any data for cases where row
security is enabled on the relation and row security should be applied
for the user. This required a bit of refactoring and moving of things
around related to RLS- note the addition of utils/misc/rls.c.
Back-patch all the way, as column-level privileges are now in all
supported versions.
This has been assigned CVE-2014-8161, but since the issue and the patch
have already been publicized on pgsql-hackers, there's no point in trying
to hide this commit.
This commit extends the SortSupport infrastructure to allow operator
classes the option to provide abbreviated representations of Datums;
in the case of text, we abbreviate by taking the first few characters
of the strxfrm() blob. If the abbreviated comparison is insufficent
to resolve the comparison, we fall back on the normal comparator.
This can be much faster than the old way of doing sorting if the
first few bytes of the string are usually sufficient to resolve the
comparison.
There is the potential for a performance regression if all of the
strings to be sorted are identical for the first 8+ characters and
differ only in later positions; therefore, the SortSupport machinery
now provides an infrastructure to abort the use of abbreviation if
it appears that abbreviation is producing comparatively few distinct
keys. HyperLogLog, a streaming cardinality estimator, is included in
this commit and used to make that determination for text.
Peter Geoghegan, reviewed by me.
Currently, a backend will reset it's PGXACT->xmin value when it doesn't
have any registered snapshots left. That covered the common case that a
transaction in read committed mode runs several queries, one after each
other, as there would be no snapshots active between those queries.
However, if you hold cursors across each of the query, we didn't get a
chance to reset xmin.
To make that better, keep all the registered snapshots in a pairing heap,
ordered by xmin so that it's always quick to find the snapshot with the
smallest xmin. That allows us to advance PGXACT->xmin whenever the oldest
snapshot is deregistered, even if there are others still active.
Per discussion originally started by Jeff Davis back in 2009 and more
recently by Robert Haas.
As of 9.3, ruleutils.c goes to some lengths to ensure that table and column
aliases used in its output are unique. Of course this takes more time than
was required before, which in itself isn't fatal. However, EXPLAIN was set
up so that recalculation of the unique aliases was repeated for each
subexpression printed in a plan. That results in O(N^2) time and memory
consumption for large plan trees, which did not happen in older branches.
Fortunately, the expensive work is the same across a whole plan tree,
so there is no need to repeat it; we can do most of the initialization
just once per query and re-use it for each subexpression. This buys
back most (not all) of the performance loss since 9.2.
We need an extra ExplainState field to hold the precalculated deparse
context. That's no problem in HEAD, but in the back branches, expanding
sizeof(ExplainState) seems risky because third-party extensions might
have local variables of that struct type. So, in 9.4 and 9.3, introduce
an auxiliary struct to keep sizeof(ExplainState) the same. We should
refactor the APIs to avoid such local variables in future, but that's
material for a separate HEAD-only commit.
Per gripe from Alexey Bashtanov. Back-patch to 9.3 where the issue
was introduced.
This function returns object type and objname/objargs arrays, which can
be passed to pg_get_object_address. This is especially useful because
the textual representation can be copied to a remote server in order to
obtain the corresponding OID-based address. In essence, this function
is the inverse of recently added pg_get_object_address().
Catalog version bumped due to the addition of the new function.
Also add docs to pg_get_object_address.
This reverts commit 60838df922345b26a616e49ac9fab808a35d1f85.
That change needs a bit more thought to be workable. In view of
the potentially machine-dependent stuff that went in today,
we need all of the buildfarm to be testing those other changes.
Hiding context messages usually is not a good idea - except for rather
verbose debugging/development utensils like LOG_DEBUG. There the
amount of repeated context messages just bloat the log without adding
information.
Exposing compression and decompression APIs of pglz makes possible its
use by extensions and contrib modules. pglz_decompress contained a call
to elog to emit an error message in case of corrupted data. This function
is changed to return a status code to let its callers return an error instead.
This commit is required for upcoming WAL compression feature so that
the WAL reader facility can decompress the WAL data by using pglz_decompress.
Michael Paquier
This reverts commit 1826987a46d079458007b7b6bbcbbd852353adbb.
The overall design was deemed unacceptable, in discussion following the
previous commit message; we might find some parts of it still
salvageable, but I don't want to be on the hook for fixing it, so let's
wait until we have a new patch.
This allows access to get_object_address from SQL, which is useful to
obtain OID addressing information from data equivalent to that emitted
by the parser. This is necessary infrastructure of a project to let
replication systems propagate object dropping events to remote servers,
where the schema might be different than the server originating the
DROP.
This patch also adds support for OBJECT_DEFAULT to get_object_address;
that is, it is now possible to refer to a column's default value.
Catalog version bumped due to the new function.
Reviewed by Stephen Frost, Heikki Linnakangas, Robert Haas, Andres
Freund, Abhijit Menon-Sen, Adam Brightwell.
The previous representation using a boolean column for each attribute
would not scale as well as we want to add further attributes.
Extra auxilliary functions are added to go along with this change, to
make up for the lost convenience of access of the old representation.
Catalog version bumped due to change in catalogs and the new functions.
Author: Adam Brightwell, minor tweaks by Álvaro
Reviewed by: Stephen Frost, Andres Freund, Álvaro Herrera
Previously, if you wanted anything besides C-string hash keys, you had to
specify a custom hashing function to hash_create(). Nearly all such
callers were specifying tag_hash or oid_hash; which is tedious, and rather
error-prone, since a caller could easily miss the opportunity to optimize
by using hash_uint32 when appropriate. Replace this with a design whereby
callers using simple binary-data keys just specify HASH_BLOBS and don't
need to mess with specific support functions. hash_create() itself will
take care of optimizing when the key size is four bytes.
This nets out saving a few hundred bytes of code space, and offers
a measurable performance improvement in tidbitmap.c (which was not
exploiting the opportunity to use hash_uint32 for its 4-byte keys).
There might be some wins elsewhere too, I didn't analyze closely.
In future we could look into offering a similar optimized hashing function
for 8-byte keys. Under this design that could be done in a centralized
and machine-independent fashion, whereas getting it right for keys of
platform-dependent sizes would've been notationally painful before.
For the moment, the old way still works fine, so as not to break source
code compatibility for loadable modules. Eventually we might want to
remove tag_hash and friends from the exported API altogether, since there's
no real need for them to be explicitly referenced from outside dynahash.c.
Teodor Sigaev and Tom Lane