Commit Graph

621 Commits

cdd46c7654 Start background writer during archive recovery. Background writer now performs
its usual buffer cleaning duties during archive recovery, and it's responsible
for performing restartpoints.

This requires some changes in postmaster. When the startup process has done
all the initialization and is ready to start WAL redo, it signals the
postmaster to launch the background writer. The postmaster is signaled again
when the point in recovery is reached where we know that the database is in a
consistent state. The postmaster isn't interested in that at the moment, but
that's the point where we could let other backends in to perform read-only
queries. The postmaster is signaled a third time when the recovery has ended,
so that the postmaster knows that it's safe to start accepting connections.

The startup process now traps SIGTERM, and performs a "clean" shutdown. If
you do a fast shutdown during recovery, a shutdown restartpoint is performed,
like a shutdown checkpoint, and postmaster kills the processes cleanly. You
still have to continue the recovery at next startup, though.

Currently, the background writer is only launched during archive recovery.
We could launch it during crash recovery as well, but it seems better to keep
that codepath as simple as possible, for the sake of robustness. And it
couldn't do any restartpoints during crash recovery anyway, so it wouldn't be
that useful.

log_restartpoints is gone. Use log_checkpoints instead. This is yet to be
documented.

This whole operation is a prerequisite for Hot Standby, but has some value of
its own whether or not the hot standby patch makes 8.4.

Simon Riggs, with lots of modifications by me.
2009-02-18 15:58:41 +00:00
3a5b773715 Allow reloption names to have qualifiers, initially supporting a TOAST
qualifier, and add support for this in pg_dump.

This allows TOAST tables to have user-defined fillfactor, and will also
enable us to move the autovacuum parameters to reloptions without taking
away the possibility of setting values for TOAST tables.
2009-02-02 19:31:40 +00:00
c0f92b57dc Allow extracting and parsing of reloptions from a bare pg_class tuple, and
refactor the relcache code that used to do that.  This allows other callers
(particularly autovacuum) to do the same without necessarily having to open
and lock a table.
2009-01-26 19:41:06 +00:00
b2a667b9ee Add a new option to RestoreBkpBlocks() to indicate if a cleanup lock should
be used instead of the normal exclusive lock, and make WAL redo functions
responsible for calling RestoreBkpBlocks(). They know better what kind of
lock they need.

At the moment, this just moves things around with no functional change, but
makes the hot standby patch that's under review cleaner.
2009-01-20 18:59:37 +00:00
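
For illustration, a redo routine under the new convention might look roughly
like this; the RestoreBkpBlocks() argument order is recalled from memory and
the function name is invented:

    /* Sketch only: example_rm_redo is illustrative, not a real rmgr entry. */
    static void
    example_rm_redo(XLogRecPtr lsn, XLogRecord *record)
    {
        /* false = normal exclusive lock; pass true when a cleanup lock is needed */
        RestoreBkpBlocks(lsn, record, false);

        /* ... then replay the record contents as before ... */
    }
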
8ebe1e356c Simplify the writing of amoptions routines by introducing a convenience
fillRelOptions routine that stores the parsed values in the struct using a
table-based approach.  Per Tom's suggestion.  Also remove the "continue"
in the HANDLE_*_RELOPTION macros, which was useless and in spirit assumed
too much about how the macros were going to be used.  (Note that these
macros are now unused, but the intention is to introduce some usage in a
future autovacuum patch, which is why they weren't completely removed.)

Also, do not call the string validation routine when not validating.  It seems
less error-prone this way, per commentary on the amoptions SGML docs.
2009-01-12 21:02:15 +00:00
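
As an illustrative sketch of the table-based approach: the relopt_parse_elt
layout and the fillRelOptions() signature below are reconstructed from memory
of this era's reloptions code and are not guaranteed exact; the option list
and variable names are placeholders.

    #include "access/reloptions.h"
    #include "utils/rel.h"

    /* map option names to fields of the result struct */
    static const relopt_parse_elt my_elems[] = {
        {"fillfactor", RELOPT_TYPE_INT, offsetof(StdRdOptions, fillfactor)},
    };

    /* fragment: assumes the generic parser already produced options[]/numoptions */
    StdRdOptions *rdopts = palloc0(sizeof(StdRdOptions));
    fillRelOptions(rdopts, sizeof(StdRdOptions), options, numoptions,
                   validate, my_elems, lengthof(my_elems));
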
43a57cf365 Revise the TIDBitmap API to support multiple concurrent iterations over a
bitmap.  This is extracted from Greg Stark's posix_fadvise patch; it seems
worth committing separately, since it's potentially useful independently of
posix_fadvise.
2009-01-10 21:08:36 +00:00
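
A minimal sketch of what the revised API allows, assuming the
tbm_begin_iterate()/tbm_iterate()/tbm_end_iterate() entry points and the
TBMIterateResult field names follow the shape described here (recalled from
memory):

    #include "nodes/tidbitmap.h"

    static void
    consume_bitmap_twice(TIDBitmap *tbm)
    {
        /* each caller now gets its own iterator, so two can run concurrently */
        TBMIterator *main_it = tbm_begin_iterate(tbm);
        TBMIterator *ahead_it = tbm_begin_iterate(tbm);
        TBMIterateResult *page;

        while ((page = tbm_iterate(ahead_it)) != NULL)
        {
            /* e.g. a read-ahead/prefetch pass could look at page->blockno here */
        }
        while ((page = tbm_iterate(main_it)) != NULL)
        {
            /* the main scan visits the same pages independently */
        }

        tbm_end_iterate(ahead_it);
        tbm_end_iterate(main_it);
    }
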
b813c8daca A couple further reloptions improvements, per KaiGai Kohei: add a validation
function to the string type and add a couple of macros for string handling.

In passing, fix an off-by-one bug of mine.
2009-01-08 19:34:41 +00:00
b25433da5d Fix string reloption handling, per KaiGai Kohei. 2009-01-06 14:47:37 +00:00
ba748f7a11 Change the reloptions machinery to use a table-based parser, and provide
a more complete framework for writing custom option processing routines
by user-defined access methods.

Catalog version bumped due to the general API changes, which are going to
affect user-defined "amoptions" routines.
2009-01-05 17:14:28 +00:00
511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
4942ea2870 The flag to mark dead tuples is nowadays called LP_DEAD, not LP_DELETE.
Simon Riggs.
2008-12-30 16:24:37 +00:00
0f864a63ea Reduce some rel.h inclusions, and add pg_list.h to pg_proc_fn.h. 2008-12-12 22:56:00 +00:00
608195a3a3 Introduce visibility map. The visibility map is a bitmap with one bit per
heap page, where a set bit indicates that all tuples on the page are
visible to all transactions, and the page therefore doesn't need
vacuuming. It is stored in a new relation fork.

Lazy vacuum uses the visibility map to skip pages that don't need
vacuuming. Vacuum is also responsible for setting the bits in the map.
In the future, this can hopefully be used to implement index-only-scans,
but we can't currently guarantee that the visibility map is always 100%
up-to-date.

In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on
each heap page, also indicating that all tuples on the page are visible to
all transactions. It's important that this flag is kept up-to-date. It
is also used to skip visibility tests in sequential scans, which gives a
small performance gain on seqscans.
2008-12-03 13:05:22 +00:00
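
For example, the seqscan-side check can be reduced to a page-level test,
roughly like this sketch (the helper name is invented; PageIsAllVisible() is
assumed to test the new PD_ALL_VISIBLE flag):

    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    /* Sketch: decide whether per-tuple visibility tests can be skipped. */
    static bool
    page_skips_visibility_checks(Buffer buf)
    {
        Page page = BufferGetPage(buf);

        /* PD_ALL_VISIBLE set => every tuple on this page is visible to all */
        return PageIsAllVisible(page);
    }
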
c1f3073333 Clean up the API for DestReceiver objects by eliminating the assumption
that a Portal is a useful and sufficient additional argument for
CreateDestReceiver --- it just isn't, in most cases.  Instead formalize
the approach of passing any needed parameters to the receiver separately.

One unexpected benefit of this change is that we can declare typedef Portal
in a less surprising location.

This patch is just code rearrangement and doesn't change any functionality.
I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.
2008-11-30 20:51:25 +00:00
3396000684 Rethink the way FSM truncation works. Instead of WAL-logging FSM
truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To
make that cleaner from a modularity point of view, move the WAL-logging one
level up to RelationTruncate, and move RelationTruncate and all the
related WAL-logging to new src/backend/catalog/storage.c file. Introduce
new RelationCreateStorage and RelationDropStorage functions that are used
instead of calling smgrcreate/smgrscheduleunlink directly. Move the
pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new
functions. This leaves smgr.c as a thin wrapper around md.c; all the
transactional stuff is now in storage.c.

This will make it easier to add new forks with similar truncation logic,
like the visibility map.
2008-11-19 10:34:52 +00:00
03e5248d0f Replace the usage of heap_addheader to create pg_attribute tuples with regular
heap_form_tuple.  Since this removes the last remaining caller of
heap_addheader, remove it.

Extracted from the column privileges patch from Stephen Frost, with further
code cleanups by me.
2008-11-14 01:57:42 +00:00
85e2cedf98 Improve bulk-insert performance by keeping the current target buffer pinned
(but not locked, as that would risk deadlocks).  Also, make it work in a small
ring of buffers to avoid having bulk inserts trash the whole buffer arena.

Robert Haas, after an idea of Simon Riggs'.
2008-11-06 20:51:15 +00:00
b4eae023bb Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage
by splitting it into three functions with better-defined behaviors.

Zdenek Kotala
2008-11-03 20:47:49 +00:00
902d1cb35f Remove all uses of the deprecated functions heap_formtuple, heap_modifytuple,
and heap_deformtuple in favor of the newer functions heap_form_tuple et al
(which do the same things but use bool control flags instead of arbitrary
char values).  Eliminate the former duplicate coding of these functions,
reducing the deprecated functions to mere wrappers around the newer ones.
We can't get rid of them entirely because add-on modules probably still
contain many instances of the old coding style.

Kris Jurka
2008-11-02 01:45:28 +00:00
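
A minimal sketch of the newer interface, with an invented two-column
descriptor; the point is that NULLs are flagged with a bool array instead of
'n'/' ' characters:

    #include "postgres.h"
    #include "access/htup.h"

    static HeapTuple
    build_example_tuple(TupleDesc tupdesc)     /* assumed to have two columns */
    {
        Datum   values[2];
        bool    isnull[2];

        values[0] = Int32GetDatum(42);
        isnull[0] = false;
        values[1] = (Datum) 0;
        isnull[1] = true;                      /* second column is NULL */

        return heap_form_tuple(tupdesc, values, isnull);
    }
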
19c8dc839b Unite the ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer
functions into one ReadBufferExtended function that takes the strategy
and mode as arguments. There are three modes: RBM_NORMAL, which is the default
used by plain ReadBuffer(); RBM_ZERO, which replaces ZeroOrReadBuffer; and a
new mode, RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages
without throwing an error. The FSM needs the new mode to recover from
corrupt pages, which could happen if we crash after extending an FSM file
and the new page is "torn".

Add the fork number to some error messages in bufmgr.c that still lacked it.
2008-10-31 15:05:00 +00:00
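
A sketch of the FSM-style usage described above: read a possibly-torn page
with RBM_ZERO_ON_ERROR and reinitialize it if it comes back zeroed. The
helper name and the choice of fork are illustrative.

    #include "storage/bufmgr.h"
    #include "storage/bufpage.h"

    static Buffer
    read_fsm_page_safely(Relation rel, BlockNumber blkno)
    {
        Buffer  buf;
        Page    page;

        /* a corrupt page comes back zeroed instead of raising an error */
        buf = ReadBufferExtended(rel, FSM_FORKNUM, blkno, RBM_ZERO_ON_ERROR, NULL);
        LockBuffer(buf, BUFFER_LOCK_EXCLUSIVE);

        page = BufferGetPage(buf);
        if (PageIsNew(page))
            PageInit(page, BufferGetPageSize(buf), 0);  /* rebuild the torn page */

        return buf;
    }
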
d26bf23f34 Arrange to squeeze out the MINIMAL_TUPLE_PADDING in the tuple representation
written to temp files by tuplesort.c and tuplestore.c.  This saves 2 bytes per
row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems
worth the slight additional uglification of the tuple read/write routines.
2008-10-28 15:51:03 +00:00
b9856b67a7 Fix GiST tuple killing: GISTScanOpaque->curpos wasn't
correctly set. As a result, killtuple() marked the wrong
tuple on the page as dead. The bug was introduced by me while fixing
possible duplicates during GiST index scans.
2008-10-22 12:53:56 +00:00
06da3c570f Rework subtransaction commit protocol for hot standby.
This patch eliminates the marking of subtransactions as SUBCOMMITTED in pg_clog
during their commit; instead they remain in-progress until main transaction
commit.  At main transaction commit, the commit protocol is atomic-by-page
instead of one transaction at a time.  To avoid a race condition with some
subtransactions appearing committed before others in the case where they span
more than one pg_clog page, we retain the logic that marks them subcommitted
before marking the parent committed.

Simon Riggs with minor help from me
2008-10-20 19:18:18 +00:00
77db9d9ff2 Remove mark/restore support in GIN and GiST indexes.
Per Tom's comment.
Also remove the useless GISTScanOpaque->flags field.
2008-10-20 13:39:44 +00:00
af59a0650b Remove useless mark/restore support in hash index AM, per discussion.
(I'm leaving GiST/GIN cleanup to Teodor.)
2008-10-17 23:50:57 +00:00
beeb3562dd During a repeated rescan of a GiST index it's possible that a scan key
is NULL but SK_SEARCHNULL is not set. Add a check for NULL keys during key
initialization so that the flag gets set. If a key is NULL and SK_SEARCHNULL is
not set then nothing can be satisfied.
With assert-enabled compilation this caused a core dump.

Bug was introduced in 8.3 by support of IS NULL index scan.
2008-10-17 17:02:21 +00:00
3437286356 Modify the parser's error reporting to include a specific hint for the case
of referencing a WITH item that's not yet in scope according to the SQL
spec's semantics.  This seems to be an easy error to make, and the bare
"relation doesn't exist" message doesn't lead one's mind in the correct
direction to fix it.
2008-10-08 01:14:44 +00:00
15c121b3ed Rewrite the FSM. Instead of relying on a fixed-size shared memory segment,
free space information is now stored in a dedicated FSM relation fork for each
relation (except for hash indexes, which don't use the FSM).

This eliminates the max_fsm_relations and max_fsm_pages GUC options; all traces
of them are removed from the backend, initdb, and the documentation.

Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also
introduce a new variant of the get_raw_page(regclass, int4, int4) function in
contrib/pageinspect that lets you return pages from any relation fork, and
a new fsm_page_contents() function to inspect the new FSM pages.
2008-09-30 10:52:14 +00:00
4adc2f72a4 Change hash indexes to store only the hash code rather than the whole indexed
value.  This means that hash index lookups are always lossy and have to be
rechecked when the heap is visited; however, the gain in index compactness
outweighs this when the indexed values are wide.  Also, we only need to
perform datatype comparisons when the hash codes match exactly, rather than
for every entry in the hash bucket; so it could also win for datatypes that
have expensive comparison functions.  A small additional win is gained by
keeping hash index pages sorted by hash code and using binary search to reduce
the number of index tuples we have to look at.

Xiao Meng

This commit also incorporates Zdenek Kotala's patch to isolate hash metapages
and hash bitmaps a bit better from the page header datastructures.
2008-09-15 18:43:41 +00:00
1dcf6fdf1b Fix possible duplicate tuples during a GiST scan. Now a page is processed
all at once and the ItemPointers are collected in memory.

Remove the killing of a tuple by killtuple() if the tuple was moved to another
page - it could produce unacceptable overhead.

Backpatch as far back as 8.1, because the bug was introduced by GiST's concurrency support.
2008-08-23 10:37:24 +00:00
3f0e808c4a Introduce the concept of relation forks. An smgr relation can now consist
of multiple forks, and each fork can be created and grown separately.

The bulk of this patch is about changing the smgr API to include an extra
ForkNumber argument in every smgr function. Also, smgrscheduleunlink and
smgrdounlink no longer implicitly call smgrclose, because other forks might
still exist after unlinking one. The callers of those functions have been
modified to call smgrclose instead.

This patch in itself doesn't have any user-visible effect, but provides the
infrastructure needed for upcoming patches. The additional forks envisioned
are a rewritten FSM implementation that doesn't rely on a fixed-size shared
memory block, and a visibility map to allow skipping portions of a table in
VACUUM that have no dead tuples.
2008-08-11 11:05:11 +00:00
6816577a78 Change the PageGetContents() macro to guarantee its result is maxalign'd,
thereby forestalling any problems with alignment of the data structure placed
there.  Since SizeOfPageHeaderData is maxalign'd anyway in 8.3 and HEAD, this
does not actually change anything right now, but it is foreseeable that the
header size will change again someday.  I had to fix a couple of places that
were assuming that the content offset is just SizeOfPageHeaderData rather than
MAXALIGN(SizeOfPageHeaderData).  Per discussion of Zdenek's page-macros patch.
2008-07-13 21:50:04 +00:00
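
A small sketch of the intended usage pattern; MyPageData is an invented
example struct stored in a page's content area:

    #include "storage/bufpage.h"

    typedef struct MyPageData
    {
        uint32      magic;
        uint32      nentries;
    } MyPageData;

    static MyPageData *
    get_my_page_data(Page page)
    {
        /*
         * Correct: PageGetContents() now returns a MAXALIGN'd pointer.
         * Wrong:   (MyPageData *) ((char *) page + SizeOfPageHeaderData),
         *          which assumes the content starts exactly at the header size.
         */
        return (MyPageData *) PageGetContents(page);
    }
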
9d035f4254 Clean up the use of some page-header-access macros: principally, use
SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that
makes the code clearer, and avoid casting between Page and PageHeader where
possible.  Zdenek Kotala, with some additional cleanup by Heikki Linnakangas.

I did not apply the parts of the proposed patch that would have resulted in
slightly changing the on-disk format of hash indexes; it seems to me that's
not a win as long as there's any chance of having in-place upgrade for 8.4.
2008-07-13 20:45:47 +00:00
27cb66fdfe Multi-column GIN indexes. Teodor Sigaev 2008-07-11 21:06:29 +00:00
a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
a213f1ee6c Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation
forks. XLogOpenRelation() and the associated light-weight relation cache in
xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument,
instead of Relation.

For functions that still need a Relation struct during WAL replay, there's a
new function called CreateFakeRelcacheEntry() that returns a fake entry like
XLogOpenRelation() used to.
2008-06-12 09:12:31 +00:00
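
A sketch of a redo function that still needs a Relation; it assumes, purely
for illustration, that this record's payload begins with the target
RelFileNode:

    #include "access/xlogutils.h"

    static void
    example_redo(XLogRecPtr lsn, XLogRecord *record)
    {
        RelFileNode rnode;
        Relation    rel;

        /* assumption: the record data starts with the RelFileNode */
        memcpy(&rnode, XLogRecGetData(record), sizeof(RelFileNode));

        rel = CreateFakeRelcacheEntry(rnode);
        /* ... replay work that wants a Relation goes here ... */
        FreeFakeRelcacheEntry(rel);
    }
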
281a724d5c Rewrite DROP's dependency traversal algorithm into an honest two-pass
algorithm, replacing the original intention of a one-pass search, which
had been hacked up over time to be partially two-pass in hopes of handling
various corner cases better.  It still wasn't quite there, especially as
regards emitting unwanted NOTICE messages.  More importantly, this approach
lets us fix a number of open bugs concerning concurrent DROP scenarios,
because we can take locks during the first pass and avoid traversing to
dependent objects that were just deleted by someone else.

There is more that can be done here, but I'll go ahead and commit the
base patch before working on the options.
2008-06-08 22:41:04 +00:00
e4ca6cac43 Change xlog.h to xlogdefs.h in bufpage.h, and fix fallout. 2008-06-06 22:35:22 +00:00
1a604b4e31 Fix a subtle bug exposed by recent wal_sync_method rearrangements.
Formerly, the default value of wal_sync_method was determined inside xlog.c,
but now it is determined inside guc.c.  guc.c was reading xlogdefs.h
without having read <fcntl.h>, leading to wrong determination of
DEFAULT_SYNC_METHOD.  Obviously xlogdefs.h needs to include <fcntl.h>
for itself to ensure stable results.
2008-05-17 17:24:57 +00:00
55f6f8f2aa Remove DEFAULT_SYNC_FLAGBIT ... not used anymore. 2008-05-17 16:49:23 +00:00
e6dbcb72fa Extend GIN to support partial-match searches, and extend tsquery to support
prefix matching using this facility.

Teodor Sigaev and Oleg Bartunov
2008-05-16 16:31:02 +00:00
f99760c19f Convert wal_sync_method to guc enum. 2008-05-12 08:35:05 +00:00
f8c4d7db60 Restructure some header files a bit, in particular heapam.h, by removing some
unnecessary #include lines in it.  Also, move some tuple routine prototypes and
macros to htup.h, which allows removal of heapam.h inclusion from some .c
files.

For this to work, a new header file access/sysattr.h needed to be created,
initially containing attribute numbers of system columns, for pg_dump usage.

While at it, make contrib ltree, intarray and hstore header files more
consistent with our header style.
2008-05-12 00:00:54 +00:00
cf23b75b4d Fix a bug where too many LWLocks were used, reported by Craig Ringer
<craig@postnewspapers.com.au>.
It was my mistake: I missed the limit on the number of held locks. GIN no longer
uses continuous locks, but still holds buffers pinned to prevent interference
with vacuum's deletion algorithm.

Backpatch is needed.
2008-04-22 17:52:43 +00:00
d1cbd26ded Repair two places where SIGTERM exit could leave shared memory state
corrupted.  (Neither is very important if SIGTERM is used to shut down the
whole database cluster together, but there's a problem if someone tries to
SIGTERM individual backends.)  To do this, introduce new infrastructure
macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care
of transiently pushing an on_shmem_exit cleanup hook.  Also use this method
for createdb cleanup --- that wasn't a shared-memory-corruption problem,
but SIGTERM abort of createdb could leave orphaned files lying around.

Backpatch as far as 8.2.  The shmem corruption cases don't exist in 8.1,
and the createdb usage doesn't seem important enough to risk backpatching
further.
2008-04-16 23:59:40 +00:00
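
A minimal sketch of the new macros; the callback uses the on_shmem_exit
signature, and the guarded work and callback names are invented:

    #include "storage/ipc.h"

    static void
    my_cleanup(int code, Datum arg)
    {
        /* undo the shared-memory state set up below */
    }

    static void
    do_guarded_work(void)
    {
        PG_ENSURE_ERROR_CLEANUP(my_cleanup, (Datum) 0);
        {
            /* touch shared state that must not be left inconsistent on SIGTERM */
        }
        PG_END_ENSURE_ERROR_CLEANUP(my_cleanup, (Datum) 0);
    }
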
24558da14a Phase 2 of project to make index operator lossiness be determined at runtime
instead of plan time.  Extend the amgettuple API so that the index AM returns
a boolean indicating whether the indexquals need to be rechecked, and make
that rechecking happen in nodeIndexscan.c (currently the only place where
it's expected to be needed; other callers of index_getnext are just erroring
out for now).  For the moment, GIN and GIST have stub logic that just always
sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing
that control down to the opclass consistent() functions.  The planner no
longer pays any attention to amopreqcheck, and that catalog column will go
away in due course.
2008-04-13 19:18:14 +00:00
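
A sketch of what the extended contract looks like from inside an AM's
gettuple path; the xs_recheck and xs_ctup field names are recalled from
memory, and the helper is invented:

    /* Sketch: report one match and tell the executor to recheck the quals. */
    static bool
    report_lossy_match(IndexScanDesc scan, ItemPointer heap_tid)
    {
        scan->xs_ctup.t_self = *heap_tid;   /* heap TID of the matching tuple */
        scan->xs_recheck = true;            /* lossy: nodeIndexscan.c rechecks */
        return true;
    }
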
ec498cdcbb Create new routines systable_beginscan_ordered, systable_getnext_ordered,
systable_endscan_ordered that have API similar to systable_beginscan etc
(in particular, the passed-in scankeys have heap not index attnums),
but guarantee ordered output, unlike the existing functions.  For the moment
these are just very thin wrappers around index_beginscan/index_getnext/etc.
Someday they might need to get smarter; but for now this is just a code
refactoring exercise to reduce the number of direct callers of index_getnext,
in preparation for changing that function's API.

In passing, remove index_getnext_indexitem, which has been dead code for
quite some time, and will have even less use than that in the presence
of run-time-lossy indexes.
2008-04-12 23:14:21 +00:00
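
A sketch of the new helpers, using pg_class as an arbitrary example catalog;
note the scan key carries the heap attribute number, as with
systable_beginscan(). SnapshotNow and the specific key are era-appropriate
placeholders, not part of the commit itself.

    #include "access/genam.h"
    #include "utils/fmgroids.h"

    static void
    scan_pg_class_in_index_order(Relation heapRel, Relation indexRel, Oid nspoid)
    {
        ScanKeyData key;
        SysScanDesc scan;
        HeapTuple   tuple;

        ScanKeyInit(&key,
                    Anum_pg_class_relnamespace,    /* heap attno, not index attno */
                    BTEqualStrategyNumber, F_OIDEQ,
                    ObjectIdGetDatum(nspoid));

        scan = systable_beginscan_ordered(heapRel, indexRel, SnapshotNow, 1, &key);
        while ((tuple = systable_getnext_ordered(scan, ForwardScanDirection)) != NULL)
        {
            /* tuples arrive in index order here */
        }
        systable_endscan_ordered(scan);
    }
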
4e82a95476 Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole
indexscan always occurs in one call, and the results are returned in a
TIDBitmap instead of a limited-size array of TIDs.  This should improve
speed a little by reducing AM entry/exit overhead, and it is necessary
infrastructure if we are ever to support bitmap indexes.

In an only slightly related change, add support for TIDBitmaps to preserve
(somewhat lossily) the knowledge that particular TIDs reported by an index
need to have their quals rechecked when the heap is visited.  This facility
is not really used yet; we'll need to extend the forced-recheck feature to
plain indexscans before it's useful, and that hasn't been coded yet.
The intent is to use it to clean up 8.3's horrid @@@ kluge for text search
with weighted queries.  There might be other uses in future, but that one
alone is sufficient reason.

Heikki Linnakangas, with some adjustments by me.
2008-04-10 22:25:26 +00:00
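
A sketch of the bitmap-filling side; the index-traversal loop is elided and
the TID is a placeholder, but the tbm_add_tuples() call shows the new
per-TID recheck flag:

    #include "nodes/tidbitmap.h"
    #include "storage/itemptr.h"

    static int64
    example_getbitmap(IndexScanDesc scan, TIDBitmap *tbm)
    {
        int64           ntids = 0;
        ItemPointerData tid;

        /* for each matching index entry (traversal elided): */
        ItemPointerSet(&tid, 0, FirstOffsetNumber);     /* placeholder TID */
        tbm_add_tuples(tbm, &tid, 1, false);            /* false = no recheck */
        ntids++;

        return ntids;
    }
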
ceb5db69d4 Defend against JOINs having more than 32K columns altogether. We cannot
currently support this because we must be able to build Vars referencing
join columns, and varattno is only 16 bits wide.  Perhaps this should be
improved in future, but considering that it never came up before, I'm not
sure the problem is worth much effort.  Per bug #4070 from Marcello
Ceschia.

The problem seems largely academic in 8.0 and 7.4, because they have
(different) O(N^2) performance issues with such wide joins, but
back-patch all the way anyway.
2008-04-05 01:58:20 +00:00
b03271590e Remove heap_release_fetch, which is no longer used anywhere; this simplifies
heap_fetch a little.
2008-04-03 17:12:27 +00:00