If a standby is broadcasting reply messages and we have named
one or more standbys in synchronous_standby_names then allow
users who set synchronous_replication to wait for commit, which
then provides strict data integrity guarantees. Design avoids
sending and receiving transaction state information so minimises
bookkeeping overheads. We synchronize with the highest priority
standby that is connected and ready to synchronize. Other standbys
can be defined to takeover in case of standby failure.
This version has very strict behaviour; more relaxed options
may be added at a later date.
Simon Riggs and Fujii Masao, with reviews by Yeb Havinga, Jaime
Casanova, Heikki Linnakangas and Robert Haas, plus the assistance
of many other design reviewers.
The original scheme for this was to symlink plpython.$DLSUFFIX to
plpython2.$DLSUFFIX, but that doesn't work on Windows, and only
accidentally failed to fail because of the way that CREATE LANGUAGE created
or didn't create new C functions. My changes of yesterday exposed the
weakness of that approach. To fix, get rid of the symlink and make
pg_pltemplate show what's really going on.
This mostly just involves creating control, install, and
update-from-unpackaged scripts for them. However, I had to adjust plperl
and plpython to not share the same support functions between variants,
because we can't put the same function into multiple extensions.
catversion bump forced due to new contents of pg_pltemplate, and because
initdb now installs plpgsql as an extension not a bare language.
Add support for regression testing these as extensions not bare
languages.
Fix a couple of other issues that popped up while testing this: my initial
hack at pg_dump binary-upgrade support didn't work right, and we don't want
an extra schema permissions test after all.
Documentation changes still to come, but I'm committing now to see
whether the MSVC build scripts need work (likely they do).
It is possible that an expression ends up with a collatable type but
without a collation. CREATE TABLE AS could then create a table based
on that. But such a column cannot be dumped with valid SQL syntax, so
we disallow creating such a column.
per test report from Noah Misch
Remove the unconditional superuser permissions check in CREATE EXTENSION,
and instead define a "superuser" extension property, which when false
(not the default) skips the superuser permissions check. In this case
the calling user only needs enough permissions to execute the commands
in the extension's installation script. The superuser property is also
enforced in the same way for ALTER EXTENSION UPDATE cases.
In other ALTER EXTENSION cases and DROP EXTENSION, test ownership of
the extension rather than superuserness. ALTER EXTENSION ADD/DROP needs
to insist on ownership of the target object as well; to do that without
duplicating code, refactor comment.c's big switch for permissions checks
into a separate function in objectaddress.c.
I also removed the superuserness checks in pg_available_extensions and
related functions; there's no strong reason why everybody shouldn't
be able to see that info.
Also invent an IF NOT EXISTS variant of CREATE EXTENSION, and use that
in pg_dump, so that dumps won't fail for installed-by-default extensions.
We don't have any of those yet, but we will soon.
This is all per discussion of wrapping the standard procedural languages
into extensions. I'll make those changes in a separate commit; this is
just putting the core infrastructure in place.
Instead of manually maintaining the "implementation of XXX operator"
comments in pg_proc.h, delete all those entries and let initdb create
them via a join. To let initdb figure out which name to use when there
is a conflict, change the comments for deprecated operators to say they
are deprecated --- which seems like a good thing to do anyway.
Historically, we've not had separate comments for built-in pg_operator
entries, but relied on the comments for the underlying functions. The
trouble with this approach is that there isn't much of anything to suggest
to users that they'd be better off using the operators instead. So, move
all the relevant comments into pg_operator, and give each underlying
function a comment that just says "implementation of XXX operator".
There are only about half a dozen cases where it seems reasonable to use
the underlying function interchangeably with the operator; in these cases
I left the same comment in place on the function as on the operator.
While at it, establish a policy that every built-in function and operator
entry should have a comment: there are now queries in the opr_sanity
regression test that will complain if one doesn't. This only required
adding a dozen or two more entries than would have been there anyway.
I also spent some time trying to eliminate gratuitous inconsistencies in
the style of the comments, though it's hopeless to suppose that more won't
creep in soon enough.
Per my proposal of 2010-10-15.
This patch implements data-modifying WITH queries according to the
semantics that the updates all happen with the same command counter value,
and in an unspecified order. Therefore one WITH clause can't see the
effects of another, nor can the outer query see the effects other than
through the RETURNING values. And attempts to do conflicting updates will
have unpredictable results. We'll need to document all that.
This commit just fixes the code; documentation updates are waiting on
author.
Marko Tiikkaja and Hitoshi Harada
The recent additions for FDW support required checking foreign-table-ness
in several places in the parse/plan chain. While it's not clear whether
that would really result in a noticeable slowdown, it seems best to avoid
any performance risk by keeping a copy of the relation's relkind in
RangeTblEntry. That might have some other uses later, anyway.
Per discussion.
void_send is useful for the same reason that void_out doesn't throw error,
namely that someone might do "select void_returning_func(...)" from a
client that prefers to operate in binary mode. The void_recv function may
or may not have any practical use, but we provide it for symmetry.
Radosław Smogura
Add a fdwhandler column to pg_foreign_data_wrapper, plus HANDLER options
in the CREATE FOREIGN DATA WRAPPER and ALTER FOREIGN DATA WRAPPER commands,
plus pg_dump support for same. Also invent a new pseudotype fdw_handler
with properties similar to language_handler.
This is split out of the "FDW API" patch for ease of review; it's all stuff
we will certainly need, regardless of any other details of the FDW API.
FDW handler functions will not actually get called yet.
In passing, fix some omissions and infelicities in foreigncmds.c.
Shigeru Hanada, Jan Urbanski, Heikki Linnakangas
They share the same locking namespace with the existing session-level
advisory locks, but they are automatically released at the end of the
current transaction and cannot be released explicitly via unlock
functions.
Marko Tiikkaja, reviewed by me.
ts_typanalyze.c computes MCE statistics as fractions of the non-null rows,
which seems fairly reasonable, and anyway changing it in released versions
wouldn't be a good idea. But then ts_selfuncs.c has to account for that.
Failure to do so results in overestimates in columns with a significant
fraction of null documents. Back-patch to 8.4 where this stuff was
introduced.
Jesper Krogh
These are needed to support reloading dumps of 9.0 installations containing
contrib/intarray or contrib/tsearch2. Since not only regular dump/reload
but binary upgrade would fail, it seems worth the trouble to carry these
stubs for awhile. Note that the contrib opclasses referencing these
functions will still work fine, since GIN doesn't actually pay any
attention to the declared signature of a support function.
The original design of pg_available_extensions did not consider the
possibility of version-specific control files. Split it into two views:
pg_available_extensions shows information that is generic about an
extension, while pg_available_extension_versions shows all available
versions together with information that could be version-dependent.
Also, add an SRF pg_extension_update_paths() to assist in checking that
a collection of update scripts provide sane update path sequences.
- collowner field
- CREATE COLLATION
- ALTER COLLATION
- DROP COLLATION
- COMMENT ON COLLATION
- integration with extensions
- pg_dump support for the above
- dependency management
- psql tab completion
- psql \dO command
This follows recent discussions, so it's quite a bit different from
Dimitri's original. There will probably be more changes once we get a bit
of experience with it, but let's get it in and start playing with it.
This is still just core code. I'll start converting contrib modules
shortly.
Dimitri Fontaine and Tom Lane
the standby has written, flushed, and applied the WAL. At the moment, this
is for informational purposes only, the values are only shown in
pg_stat_replication system view, but in the future they will also be needed
for synchronous replication.
Extracted from Simon riggs' synchronous replication patch by Robert Haas, with
some tweaking by me.
Tracks one counter for each database, which is reset whenever
the statistics for any individual object inside the database is
reset, and one counter for the background writer.
Tomas Vondra, reviewed by Greg Smith
This patch adds the server infrastructure to support extensions.
There is still one significant loose end, namely how to make it play nice
with pg_upgrade, so I am not yet committing the changes that would make
all the contrib modules depend on this feature.
In passing, fix a disturbingly large amount of breakage in
AlterObjectNamespace() and callers.
Dimitri Fontaine, reviewed by Anssi Kääriäinen,
Itagaki Takahiro, Tom Lane, and numerous others
This adds collation support for columns and domains, a COLLATE clause
to override it per expression, and B-tree index support.
Peter Eisentraut
reviewed by Pavel Stehule, Itagaki Takahiro, Robert Haas, Noah Misch
new recovery.conf parameter recovery_target_name allows PITR to
specify named points as recovery targets.
Jaime Casanova, reviewed by Euler Taveira de Oliveira, plus minor edits
FK constraints that are marked NOT VALID may later be VALIDATED, which uses an
ShareUpdateExclusiveLock on constraint table and RowShareLock on referenced
table. Significantly reduces lock strength and duration when adding FKs.
New state visible from psql.
Simon Riggs, with reviews from Marko Tiikkaja and Robert Haas
Until now, our Serializable mode has in fact been what's called Snapshot
Isolation, which allows some anomalies that could not occur in any
serialized ordering of the transactions. This patch fixes that using a
method called Serializable Snapshot Isolation, based on research papers by
Michael J. Cahill (see README-SSI for full references). In Serializable
Snapshot Isolation, transactions run like they do in Snapshot Isolation,
but a predicate lock manager observes the reads and writes performed and
aborts transactions if it detects that an anomaly might occur. This method
produces some false positives, ie. it sometimes aborts transactions even
though there is no anomaly.
To track reads we implement predicate locking, see storage/lmgr/predicate.c.
Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared
memory is finite, so when a transaction takes many tuple-level locks on a
page, the locks are promoted to a single page-level lock, and further to a
single relation level lock if necessary. To lock key values with no matching
tuple, a sequential scan always takes a relation-level lock, and an index
scan acquires a page-level lock that covers the search key, whether or not
there are any matching keys at the moment.
A predicate lock doesn't conflict with any regular locks or with another
predicate locks in the normal sense. They're only used by the predicate lock
manager to detect the danger of anomalies. Only serializable transactions
participate in predicate locking, so there should be no extra overhead for
for other transactions.
Predicate locks can't be released at commit, but must be remembered until
all the transactions that overlapped with it have completed. That means that
we need to remember an unbounded amount of predicate locks, so we apply a
lossy but conservative method of tracking locks for committed transactions.
If we run short of shared memory, we overflow to a new "pg_serial" SLRU
pool.
We don't currently allow Serializable transactions in Hot Standby mode.
That would be hard, because even read-only transactions can cause anomalies
that wouldn't otherwise occur.
Serializable isolation mode now means the new fully serializable level.
Repeatable Read gives you the old Snapshot Isolation level that we have
always had.
Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and
Anssi Kääriäinen
There isn't any need to track this state on a table-wide basis, and trying
to do so introduces undesirable semantic fuzziness. Move the flag to
pg_index, where it clearly describes just a single index and can be
immutable after index creation.
This feature allows a unique or pkey constraint to be created using an
already-existing unique index. While the constraint isn't very
functionally different from the bare index, it's nice to be able to do that
for documentation purposes. The main advantage over just issuing a plain
ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with
CREATE INDEX CONCURRENTLY, so that there is not a long interval where the
table is locked against updates.
On the way, refactor some of the code in DefineIndex() and index_create()
so that we don't have to pass through those functions in order to create
the index constraint's catalog entries. Also, in parse_utilcmd.c, pass
around the ParseState pointer in struct CreateStmtContext to save on
notation, and add error location pointers to some error reports that didn't
have one before.
Gurjeet Singh, reviewed by Steve Singer and Tom Lane
Failure to do so can lead to constraint violations. This was broken by
commit 1ddc2703a936d03953657f43345460b9242bbed1 on 2010-02-07, so
back-patch to 9.0.
Noah Misch. Regression test by me.
The only use we have had for amindexnulls is in determining whether an
index is safe to cluster on; but since the addition of the amclusterable
flag, that usage is pretty redundant.
In passing, clean up assorted sloppiness from the last patch that touched
pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.
Add more "internal" arguments so that these pg_proc entries reflect the
current preferred API. This is purely a cosmetic change, since GIN doesn't
actually consult the pg_proc entry when calling a support function.
Accordingly, no catversion bump.
This can be overriden by using NOREPLICATION on the CREATE ROLE
statement, but by default they will have it, making it backwards
compatible and "less surprising" (given that superusers normally
override all checks).
Add new function pg_sequence_parameters that returns a sequence's start,
minimum, maximum, increment, and cycle values, and use that in the view.
(bug #5662; design suggestion by Tom Lane)
Also slightly adjust the view's column order and permissions after review of
SQL standard.
Foreign tables are a core component of SQL/MED. This commit does
not provide a working SQL/MED infrastructure, because foreign tables
cannot yet be queried. Support for foreign table scans will need to
be added in a future patch. However, this patch creates the necessary
system catalog structure, syntax support, and support for ancillary
operations such as COMMENT and SECURITY LABEL.
Shigeru Hanada, heavily revised by Robert Haas
The contents of an unlogged table are WAL-logged; thus, they are not
available on standby servers and are truncated whenever the database
system enters recovery. Indexes on unlogged tables are also unlogged.
Unlogged GiST indexes are not currently supported.
This privilege is required to do Streaming Replication, instead of
superuser, making it possible to set up a SR slave that doesn't
have write permissions on the master.
Superuser privileges do NOT override this check, so in order to
use the default superuser account for replication it must be
explicitly granted the REPLICATION permissions. This is backwards
incompatible change, in the interest of higher default security.
This is to avoid use of the C++ keywords "bitand" and "bitor" in
the header file utils/varbit.h. Note the functions' SQL-level
names are not changed, only their C-level names.
In passing, make some comments in varbit.c conform to project-standard
layout.