Commit Graph

5048 Commits

Author SHA1 Message Date
9a75803b1a Remove CatalogCacheFlushRelation, and the reloidattr infrastructure that was
needed by nothing else.

The restructuring I just finished doing on cache management exposed to me how
silly this routine was.  Its function was to go into the catcache and blow
away all entries related to a given relation when there was a relcache flush
on that relation.  However, there is no point in removing a catcache entry
if the catalog row it represents is still valid --- and if it isn't valid,
there must have been a catcache entry flush on it, because that's triggered
directly by heap_update or heap_delete on the catalog row.  So this routine
accomplished nothing except to blow away valid cache entries that we'd very
likely be wanting in the near future to help reconstruct the relcache entry.
Dumb.

On top of which, it required a subtle and easy-to-get-wrong attribute in
syscache definitions, ie, the column containing the OID of the related
relation if any.  Removing that is a very useful maintenance simplification.
2010-02-08 05:53:55 +00:00
0a469c8769 Remove old-style VACUUM FULL (which was known for a little while as
VACUUM FULL INPLACE), along with a boatload of subsidiary code and complexity.
Per discussion, the use case for this method of vacuuming is no longer large
enough to justify maintaining it; not to mention that we don't wish to invest
the work that would be needed to make it play nicely with Hot Standby.

Aside from the code directly related to old-style VACUUM FULL, this commit
removes support for certain WAL record types that could only be generated
within VACUUM FULL, redirect-pointer removal in heap_page_prune, and
nontransactional generation of cache invalidation sinval messages (the last
being the sticking point for Hot Standby).

We still have to retain all code that copes with finding HEAP_MOVED_OFF and
HEAP_MOVED_IN flag bits on existing tuples.  This can't be removed as long
as we want to support in-place update from pre-9.0 databases.
2010-02-08 04:33:55 +00:00
1ddc2703a9 Work around deadlock problems with VACUUM FULL/CLUSTER on system catalogs,
as per my recent proposal.

First, teach IndexBuildHeapScan to not wait for INSERT_IN_PROGRESS or
DELETE_IN_PROGRESS tuples to commit unless the index build is checking
uniqueness/exclusion constraints.  If it isn't, there's no harm in just
indexing the in-doubt tuple.

Second, modify VACUUM FULL/CLUSTER to suppress reverifying
uniqueness/exclusion constraint properties while rebuilding indexes of
the target relation.  This is reasonable because these commands aren't
meant to deal with corrupted-data situations.  Constraint properties
will still be rechecked when an index is rebuilt by a REINDEX command.

This gets us out of the problem that new-style VACUUM FULL would often
wait for other transactions while holding exclusive lock on a system
catalog, leading to probable deadlock because those other transactions
need to look at the catalogs too.  Although the real ultimate cause of
the problem is a debatable choice to release locks early after modifying
system catalogs, changing that choice would require pretty serious
analysis and is not something to be undertaken lightly or on a tight
schedule.  The present patch fixes the problem in a fairly reasonable
way and should also improve the speed of VACUUM FULL/CLUSTER a little bit.
2010-02-07 22:40:33 +00:00
b9b8831ad6 Create a "relation mapping" infrastructure to support changing the relfilenodes
of shared or nailed system catalogs.  This has two key benefits:

* The new CLUSTER-based VACUUM FULL can be applied safely to all catalogs.

* We no longer have to use an unsafe reindex-in-place approach for reindexing
  shared catalogs.

CLUSTER on nailed catalogs now works too, although I left it disabled on
shared catalogs because the resulting pg_index.indisclustered update would
only be visible in one database.

Since reindexing shared system catalogs is now fully transactional and
crash-safe, the former special cases in REINDEX behavior have been removed;
shared catalogs are treated the same as non-shared.

This commit does not do anything about the recently-discussed problem of
deadlocks between VACUUM FULL/CLUSTER on a system catalog and other
concurrent queries; will address that in a separate patch.  As a stopgap,
parallel_schedule has been tweaked to run vacuum.sql by itself, to avoid
such failures during the regression tests.
2010-02-07 20:48:13 +00:00
9727c583fe Restructure CLUSTER/newstyle VACUUM FULL/ALTER TABLE support so that swapping
of old and new toast tables can be done either at the logical level (by
swapping the heaps' reltoastrelid links) or at the physical level (by swapping
the relfilenodes of the toast tables and their indexes).  This is necessary
infrastructure for upcoming changes to support CLUSTER/VAC FULL on shared
system catalogs, where we cannot change reltoastrelid.  The physical swap
saves a few catalog updates too.

We unfortunately have to keep the logical-level swap logic because in some
cases we will be adding or deleting a toast table, so there's no possibility
of a physical swap.  However, that only happens as a consequence of schema
changes in the table, which we do not need to support for system catalogs,
so such cases aren't an obstacle for that.

In passing, refactor the cluster support functions a little bit to eliminate
unnecessarily-duplicated code; and fix the problem that while CLUSTER had
been taught to rename the final toast table at need, ALTER TABLE had not.
2010-02-04 00:09:14 +00:00
808969d0e7 Add a message type header to the CopyData messages sent from primary
to standby in streaming replication. While we only have one message type
at the moment, adding a message type header makes this easier to extend.
2010-02-03 09:47:19 +00:00
70a2b05a59 Assorted cleanups in preparation for using a map file to support altering
the relfilenode of currently-not-relocatable system catalogs.

1. Get rid of inval.c's dependency on relfilenode, by not having it emit
smgr invalidations as a result of relcache flushes.  Instead, smgr sinval
messages are sent directly from smgr.c when an actual relation delete or
truncate is done.  This makes considerably more structural sense and allows
elimination of a large number of useless smgr inval messages that were
formerly sent even in cases where nothing was changing at the
physical-relation level.  Note that this reintroduces the concept of
nontransactional inval messages, but that's okay --- because the messages
are sent by smgr.c, they will be sent in Hot Standby slaves, just from a
lower logical level than before.

2. Move setNewRelfilenode out of catalog/index.c, where it never logically
belonged, into relcache.c; which is a somewhat debatable choice as well but
better than before.  (I considered catalog/storage.c, but that seemed too
low level.)  Rename to RelationSetNewRelfilenode.

3. Cosmetic cleanups of some other relfilenode manipulations.
2010-02-03 01:14:17 +00:00
0a27347141 Make RADIUS authentication use pg_getaddrinfo_all() to get address of
the server.

Gets rid of a fairly ugly hack for Solaris, and also provides hostname
and IPV6 support.
2010-02-02 19:09:37 +00:00
d8db6a6096 Fold FindConversion() into FindConversionByName() and remove ACL check.
All callers of FindConversionByName() already do suitable permissions
checking already apart from this function, but this is not just dead
code removal: the unnecessary permissions check can actually lead to
spurious failures - there's no reason why inability to execute the
underlying function should prohibit renaming the conversion, for example.
(The error messages in these cases were also rather poor:
FindConversion would return InvalidOid, eventually leading to a complaint
that the conversion "did not exist", which was not correct.)

KaiGai Kohei
2010-02-02 18:52:33 +00:00
63f9282f6e Tighten integrity checks on ALTER TABLE ... ALTER COLUMN ... RENAME.
When a column is renamed, we recursively rename the same column in
all descendent tables.  But if one of those tables also inherits that
column from a table outside the inheritance hierarchy rooted at the
named table, we must throw an error.  The previous coding correctly
prohibited the rename when the parent had inherited the column from
elsewhere, but overlooked the case where the parent was OK but a child
table also inherited the same column from a second, unrelated parent.

For now, not backpatched due to lack of complaints from the field.

KaiGai Kohei, with further changes by me.
Reviewed by Bernd Helme and Tom Lane.
2010-02-01 19:28:56 +00:00
42a8ab0a14 Augment EXPLAIN output with more details on Hash nodes.
We show the number of buckets, the number of batches (and also the original
number if it has changed), and the peak space used by the hash table.  Minor
executor changes to track peak space used.
2010-02-01 15:43:36 +00:00
296578feb4 Revoke augmentation of WAL records for btree delete, per discussion. 2010-02-01 13:40:28 +00:00
9ea9918e37 Add string_agg aggregate functions. The one argument version concatenates
the input values into a string. The two argument version also does the same
thing, but inserts delimiters between elements.

Original patch by Pavel Stehule, reviewed by David E. Wheeler and me.
2010-02-01 03:14:45 +00:00
c85c941470 Detect early deadlock in Hot Standby when Startup is already waiting. First
stage of required deadlock detection to allow re-enabling max_standby_delay
setting of -1, which is now essential in the absence of improved relation-
specific conflict resoluton. Requested by Greg Stark et al.
2010-01-31 19:01:11 +00:00
0c0203d0a9 Parenthesize this macro, just in case. 2010-01-31 17:35:46 +00:00
6d2bc0a6cf Augment WAL records for btree delete with GetOldestXmin() to reduce
false positives during Hot Standby conflict processing. Simple
patch to enhance conflict processing, following previous discussions.
Controlled by parameter minimize_standby_conflicts = on | off, with
default off allows measurement of performance impact to see whether
it should be set on all the time.
2010-01-29 18:39:05 +00:00
76be0c81cc Filter recovery conflicts based upon dboid from relfilenode of WAL
records for heap and btree. Minor change, mostly API changes to
pass through the required values. This is a simple change though
also provides the refactoring required for further enhancements
to conflict processing using the relOid. Changes only have effect
during Hot Standby.
2010-01-29 17:10:05 +00:00
e7b3349a8a Type table feature
This adds the CREATE TABLE name OF type command, per SQL standard.
2010-01-28 23:21:13 +00:00
083e1b0f27 Add functions to reset the statistics counter for a single table/index or
a single function.
2010-01-28 14:25:41 +00:00
21d3ae09da Define INADDR_NONE on Solaris when it's missing. Per a couple of buildfarm
members complaining.
2010-01-28 11:36:14 +00:00
e0e8b96345 Change a few remaining calls of XLogArchivingActive() to use
XLogIsNeeded() instead, to determine if an otherwise non-logged operation
needs to be logged in WAL for standby servers.

Fujii Masao
2010-01-28 07:31:42 +00:00
1bb2558046 Make standby server continuously retry restoring the next WAL segment with
restore_command, if the connection to the primary server is lost. This
ensures that the standby can recover automatically, if the connection is
lost for a long time and standby falls behind so much that the required
WAL segments have been archived and deleted in the master.

This also makes standby_mode useful without streaming replication; the
server will keep retrying restore_command every few seconds until the
trigger file is found. That's the same basic functionality pg_standby
offers, but without the bells and whistles.

To implement that, refactor the ReadRecord/FetchRecord functions. The
FetchRecord() function introduced in the original streaming replication
patch is removed, and all the retry logic is now in a new function called
XLogReadPage(). XLogReadPage() is now responsible for executing
restore_command, launching walreceiver, and waiting for new WAL to arrive
from primary, as required.

This also changes the life cycle of walreceiver. When launched, it now only
tries to connect to the master once, and exits if the connection fails, or
is lost during streaming for any reason. The startup process detects the
death, and re-launches walreceiver if necessary.
2010-01-27 15:27:51 +00:00
b3daac5a9c Add support for RADIUS authentication. 2010-01-27 12:12:00 +00:00
d879697cd2 Remove the default_do_language parameter, instead making DO use a hardwired
default of "plpgsql".  This is more reasonable than it was when the DO patch
was written, because we have since decided that plpgsql should be installed
by default.  Per discussion, having a parameter for this doesn't seem useful
enough to justify the risk of application breakage if the value is changed
unexpectedly.
2010-01-26 16:33:40 +00:00
9507c8a1db Add get_bit/set_bit functions for bit strings, paralleling those for bytea,
and implement OVERLAY() for bit strings and bytea.

In passing also convert text OVERLAY() to a true built-in, instead of
relying on a SQL function.

Leonardo F, reviewed by Kevin Grittner
2010-01-25 20:55:32 +00:00
959ac58c04 In HS, Startup process sets SIGALRM when waiting for buffer pin. If
woken by alarm we send SIGUSR1 to all backends requesting that they
check to see if they are blocking Startup process. If so, they throw
ERROR/FATAL as for other conflict resolutions. Deadlock stop gap
removed. max_standby_delay = -1 option removed to prevent deadlock.
2010-01-23 16:37:12 +00:00
d779199175 Fix several oversights in previous commit - attribute options patch.
I failed to 'cvs add' the new files and also neglected to bump catversion.
2010-01-22 16:42:31 +00:00
76a47c0e74 Replace ALTER TABLE ... SET STATISTICS DISTINCT with a more general mechanism.
Attributes can now have options, just as relations and tablespaces do, and
the reloptions code is used to parse, validate, and store them.  For
simplicity and because these options are not performance critical, we store
them in a separate cache rather than the main relcache.

Thanks to Alex Hunsaker for the review.
2010-01-22 16:40:19 +00:00
adb7764030 PL/Python DO handler
Also cleaned up some redundancies between the primary error messages and the
error context in PL/Python.

Hannu Valtonen
2010-01-22 15:45:15 +00:00
09b115f706 Write a WAL record whenever we perform an operation without WAL-logging
that would've been WAL-logged if archiving was enabled. If we encounter
such records in archive recovery anyway, we know that some data is
missing from the log. A WARNING is emitted in that case.

Original patch by Fujii Masao, with changes by me.
2010-01-20 19:43:40 +00:00
47ce95a7b9 Now that much of walreceiver has been pulled back into the postgres
binary, revert PGDLLIMPORT decoration of global variables. I'm not sure
if there's any real harm from unnecessary PGDLLIMPORTs, but these are all
internal variables that external modules really shouldn't be messing
with. ThisTimeLineID still needs PGDLLIMPORT.
2010-01-20 18:54:27 +00:00
32bc08b1d4 Rethink the way walreceiver is linked into the backend. Instead than shoving
walreceiver as whole into a dynamically loaded module, split the
libpq-specific parts of it into dynamically loaded module and keep the rest
in the main backend binary.

Although Tom fixed the Windows compilation problems with the old walreceiver
module already, this is a cleaner division of labour and makes the code
more readable. There's also the prospect of adding new transport methods
as pluggable modules in the future, which this patch makes easier, though for
now the API between libpqwalreceiver and walreceiver process should be
considered private.

The libpq-specific module is now in src/backend/replication/libpqwalreceiver,
and the part linked with postgres binary is in
src/backend/replication/walreceiver.c.
2010-01-20 09:16:24 +00:00
7e40cdc075 Add pg_stat_reset_shared('bgwriter') to reset the cluster-wide shared
statistics of the bgwriter.

Greg Smith
2010-01-19 14:11:32 +00:00
4f15699d70 Add pg_table_size() and pg_indexes_size() to provide more user-friendly
wrappers around the pg_relation_size() function.

Bernd Helmle, reviewed by Greg Smith
2010-01-19 05:50:18 +00:00
9a915e596f Improve the handling of SET CONSTRAINTS commands by having them search
pg_constraint before searching pg_trigger.  This allows saner handling of
corner cases; in particular we now say "constraint is not deferrable"
rather than "constraint does not exist" when the command is applied to
a constraint that's inherently non-deferrable.  Per a gripe several months
ago from hubert depesz lubaczewski.

To make this work without breaking user-defined constraint triggers,
we have to add entries for them to pg_constraint.  However, in return
we can remove the pgconstrname column from pg_constraint, which represents
a fairly sizable space savings.  I also replaced the tgisconstraint column
with tgisinternal; the old meaning of tgisconstraint can now be had by
testing for nonzero tgconstraint, while there is no other way to get
the old meaning of nonzero tgconstraint, namely that the trigger was
internally generated rather than being user-created.

In passing, fix an old misstatement in the docs and comments, namely that
pg_trigger.tgdeferrable is exactly redundant with pg_constraint.condeferrable.
Actually, we mark RI action triggers as nondeferrable even when they belong to
a nominally deferrable FK constraint.  The SET CONSTRAINTS code now relies on
that instead of hard-coding a list of exception OIDs.
2010-01-17 22:56:23 +00:00
a8ce974cdd Teach standby conflict resolution to use SIGUSR1
Conflict reason is passed through directly to the backend, so we can
take decisions about the effect of the conflict based upon the local
state. No specific changes, as yet, though this prepares for later work.
CancelVirtualTransaction() sends signals while holding ProcArrayLock.
Introduce errdetail_abort() to give message detail explaining that the
abort was caused by conflict processing. Remove CONFLICT_MODE states
in favour of using PROCSIG_RECOVERY_CONFLICT states directly, for clarity.
2010-01-16 10:05:59 +00:00
c9dc53be77 Huh, apparently on cygwin we HAVE_SIGPROCMASK, so both variants of
the BlockSig/UnBlockSig declaration have to be PGDLLIMPORT'ified.
Per buildfarm results.
2010-01-16 05:52:29 +00:00
47a09eda89 PGDLLIMPORT-ize the remaining variables needed by walreceiver. 2010-01-16 00:04:41 +00:00
08f8d478eb Do parse analysis of an EXPLAIN's contained statement during the normal
parse analysis phase, rather than at execution time.  This makes parameter
handling work the same as it does in ordinary plannable queries, and in
particular fixes the incompatibility that Pavel pointed out with plpgsql's
new handling of variable references.  plancache.c gets a little bit
grottier, but the alternatives seem worse.
2010-01-15 22:36:35 +00:00
40f908bdcd Introduce Streaming Replication.
This includes two new kinds of postmaster processes, walsenders and
walreceiver. Walreceiver is responsible for connecting to the primary server
and streaming WAL to disk, while walsender runs in the primary server and
streams WAL from disk to the client.

Documentation still needs work, but the basics are there. We will probably
pull the replication section to a new chapter later on, as well as the
sections describing file-based replication. But let's do that as a separate
patch, so that it's easier to see what has been added/changed. This patch
also adds a new section to the chapter about FE/BE protocol, documenting the
protocol used by walsender/walreceivxer.

Bump catalog version because of two new functions,
pg_last_xlog_receive_location() and pg_last_xlog_replay_location(), for
monitoring the progress of replication.

Fujii Masao, with additional hacking by me
2010-01-15 09:19:10 +00:00
4cbe473938 Add point_ops opclass for GiST. 2010-01-14 16:31:09 +00:00
e99767bc28 First part of refactoring of code for ResolveRecoveryConflict. Purposes
of this are to centralise the conflict code to allow further change,
as well as to allow passing through the full reason for the conflict
through to the conflicting backends. Backend state alters how we
can handle different types of conflict so this is now required.
As originally suggested by Heikki, no longer optional.
2010-01-14 11:08:02 +00:00
228170410d Please tablespace directories in their own subdirectory so pg_migrator
can upgrade clusters without renaming the tablespace directories.  New
directory structure format is, e.g.:

	$PGDATA/pg_tblspc/20981/PG_8.5_201001061/719849/83292814
2010-01-12 02:42:52 +00:00
6164b4bc19 Some trivial adjustments in comments for struct RelationData. 2010-01-10 22:19:17 +00:00
3bfcccc295 During Hot Standby, fix drop database when sessions idle.
Previously we only cancelled sessions that were in-transaction.

Simple fix is to just cancel all sessions without waiting. Doing
it this way avoids complicating common code paths, which would
not be worth the trouble to cover this rare case.

Problem report and fix by Andres Freund, edited somewhat by me
2010-01-10 15:44:28 +00:00
87091cb1f1 Create typedef pgsocket for storing socket descriptors.
This silences some warnings on Win64. Not using the proper SOCKET datatype
was actually wrong on Win32 as well, but didn't cause any warnings there.

Also create define PGINVALID_SOCKET to indicate an invalid/non-existing
socket, instead of using a hardcoded -1 value.
2010-01-10 14:16:08 +00:00
84b6d5f359 Remove partial, broken support for NULL pointers when fetching attributes.
Previously, fastgetattr() and heap_getattr() tested their fourth argument
against a null pointer, but any attempt to use them with a literal-NULL
fourth argument evaluated to *(void *)0, resulting in a compiler error.
Remove these NULL tests to avoid leading future readers of this code to
believe that this has a chance of working.  Also clean up related legacy
code in nocachegetattr(), heap_getsysattr(), and nocache_index_getattr().

The new coding standard is that any code which calls a getattr-type
function or macro which takes an isnull argument MUST pass a valid
boolean pointer.  Per discussion with Bruce Momjian, Tom Lane, Alvaro
Herrera.
2010-01-10 04:26:36 +00:00
42edbd16fb During Hot Standby, set DatabasePath correctly during relcache init file
deletion, so that we attempt to unlink the correct filepath. unlink()
errors are ignorable there, so lack of a DatabasePath initialization step
did not cause visible problems until a related bug showed up on Solaris.

Code refactored from xact_redo_commit() to
ProcessCommittedInvalidationMessages() in inval.c. Recovery may replay
shared invalidation messages for many databases, so we cannot
SetDatabasePath() once as we do in normal backends. Read the databaseid
from the shared invalidation messages, then set DatabasePath
temporarily before calling RelationCacheInitFileInvalidate().

Problem report by Robert Treat, analysis and fix by me.
2010-01-09 16:49:27 +00:00
83a5a338bf pgBufferUsage needs PGDLLIMPORT for pg_stat_statements on Windows. 2010-01-08 00:48:56 +00:00
50626efe0a Fix 3-parameter form of bit substring() to throw error for negative length,
as required by SQL standard.
2010-01-07 20:17:44 +00:00