postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-15 02:46:59 +08:00

Author	SHA1	Message	Date
Alvaro Herrera	b25433da5d	Fix string reloption handling, per KaiGai Kohei.	2009-01-06 14:47:37 +00:00
Alvaro Herrera	ba748f7a11	Change the reloptions machinery to use a table-based parser, and provide a more complete framework for writing custom option processing routines by user-defined access methods. Catalog version bumped due to the general API changes, which are going to affect user-defined "amoptions" routines.	2009-01-05 17:14:28 +00:00
Bruce Momjian	511db38ace	Update copyright for 2009.	2009-01-01 17:24:05 +00:00
Heikki Linnakangas	4942ea2870	The flag to mark dead tuples is nowadays called LP_DEAD, not LP_DELETE. Simon Riggs.	2008-12-30 16:24:37 +00:00
Alvaro Herrera	0f864a63ea	Reduce some rel.h inclusions, and add pg_list.h to pg_proc_fn.h.	2008-12-12 22:56:00 +00:00
Heikki Linnakangas	608195a3a3	Introduce visibility map. The visibility map is a bitmap with one bit per heap page, where a set bit indicates that all tuples on the page are visible to all transactions, and the page therefore doesn't need vacuuming. It is stored in a new relation fork. Lazy vacuum uses the visibility map to skip pages that don't need vacuuming. Vacuum is also responsible for setting the bits in the map. In the future, this can hopefully be used to implement index-only-scans, but we can't currently guarantee that the visibility map is always 100% up-to-date. In addition to the visibility map, there's a new PD_ALL_VISIBLE flag on each heap page, also indicating that all tuples on the page are visible to all transactions. It's important that this flag is kept up-to-date. It is also used to skip visibility tests in sequential scans, which gives a small performance gain on seqscans.	2008-12-03 13:05:22 +00:00
Tom Lane	c1f3073333	Clean up the API for DestReceiver objects by eliminating the assumption that a Portal is a useful and sufficient additional argument for CreateDestReceiver --- it just isn't, in most cases. Instead formalize the approach of passing any needed parameters to the receiver separately. One unexpected benefit of this change is that we can declare typedef Portal in a less surprising location. This patch is just code rearrangement and doesn't change any functionality. I'll tackle the HOLD-cursor-vs-toast problem in a follow-on patch.	2008-11-30 20:51:25 +00:00
Heikki Linnakangas	3396000684	Rethink the way FSM truncation works. Instead of WAL-logging FSM truncations in FSM code, call FreeSpaceMapTruncateRel from smgr_redo. To make that cleaner from modularity point of view, move the WAL-logging one level up to RelationTruncate, and move RelationTruncate and all the related WAL-logging to new src/backend/catalog/storage.c file. Introduce new RelationCreateStorage and RelationDropStorage functions that are used instead of calling smgrcreate/smgrscheduleunlink directly. Move the pending rel deletion stuff from smgrcreate/smgrscheduleunlink to the new functions. This leaves smgr.c as a thin wrapper around md.c; all the transactional stuff is now in storage.c. This will make it easier to add new forks with similar truncation logic, like the visibility map.	2008-11-19 10:34:52 +00:00
Alvaro Herrera	03e5248d0f	Replace the usage of heap_addheader to create pg_attribute tuples with regular heap_form_tuple. Since this removes the last remaining caller of heap_addheader, remove it. Extracted from the column privileges patch from Stephen Frost, with further code cleanups by me.	2008-11-14 01:57:42 +00:00
Tom Lane	85e2cedf98	Improve bulk-insert performance by keeping the current target buffer pinned (but not locked, as that would risk deadlocks). Also, make it work in a small ring of buffers to avoid having bulk inserts trash the whole buffer arena. Robert Haas, after an idea of Simon Riggs'.	2008-11-06 20:51:15 +00:00
Tom Lane	b4eae023bb	Clean up the messy semantics (not to mention inefficiency) of PageGetTempPage by splitting it into three functions with better-defined behaviors. Zdenek Kotala	2008-11-03 20:47:49 +00:00
Tom Lane	902d1cb35f	Remove all uses of the deprecated functions heap_formtuple, heap_modifytuple, and heap_deformtuple in favor of the newer functions heap_form_tuple et al (which do the same things but use bool control flags instead of arbitrary char values). Eliminate the former duplicate coding of these functions, reducing the deprecated functions to mere wrappers around the newer ones. We can't get rid of them entirely because add-on modules probably still contain many instances of the old coding style. Kris Jurka	2008-11-02 01:45:28 +00:00
Heikki Linnakangas	19c8dc839b	Unite ReadBufferWithFork, ReadBufferWithStrategy, and ZeroOrReadBuffer functions into one ReadBufferExtended function, that takes the strategy and mode as argument. There's three modes, RBM_NORMAL which is the default used by plain ReadBuffer(), RBM_ZERO, which replaces ZeroOrReadBuffer, and a new mode RBM_ZERO_ON_ERROR, which allows callers to read corrupt pages without throwing an error. The FSM needs the new mode to recover from corrupt pages, which could happend if we crash after extending an FSM file, and the new page is "torn". Add fork number to some error messages in bufmgr.c, that still lacked it.	2008-10-31 15:05:00 +00:00
Tom Lane	d26bf23f34	Arrange to squeeze out the MINIMAL_TUPLE_PADDING in the tuple representation written to temp files by tuplesort.c and tuplestore.c. This saves 2 bytes per row for 32-bit machines, and 6 bytes per row for 64-bit machines, which seems worth the slight additional uglification of the tuple read/write routines.	2008-10-28 15:51:03 +00:00
Teodor Sigaev	b9856b67a7	Fix GiST's killing tuple: GISTScanOpaque->curpos wasn't correctly set. As result, killtuple() marks as dead wrong tuple on page. Bug was introduced by me while fixing possible duplicates during GiST index scan.	2008-10-22 12:53:56 +00:00
Alvaro Herrera	06da3c570f	Rework subtransaction commit protocol for hot standby. This patch eliminates the marking of subtransactions as SUBCOMMITTED in pg_clog during their commit; instead they remain in-progress until main transaction commit. At main transaction commit, the commit protocol is atomic-by-page instead of one transaction at a time. To avoid a race condition with some subtransactions appearing committed before others in the case where they span more than one pg_clog page, we conserve the logic that marks them subcommitted before marking the parent committed. Simon Riggs with minor help from me	2008-10-20 19:18:18 +00:00
Teodor Sigaev	77db9d9ff2	Remove mark/restore support in GIN and GiST indexes. Per Tom's comment. Also revome useless GISTScanOpaque->flags field.	2008-10-20 13:39:44 +00:00
Tom Lane	af59a0650b	Remove useless mark/restore support in hash index AM, per discussion. (I'm leaving GiST/GIN cleanup to Teodor.)	2008-10-17 23:50:57 +00:00
Teodor Sigaev	beeb3562dd	During repeated rescan of GiST index it's possible that scan key is NULL but SK_SEARCHNULL is not set. Add checking IS NULL of keys to set during key initialization. If key is NULL and SK_SEARCHNULL is not set then nothnig can be satisfied. With assert-enabled compilation that causes coredump. Bug was introduced in 8.3 by support of IS NULL index scan.	2008-10-17 17:02:21 +00:00
Tom Lane	3437286356	Modify the parser's error reporting to include a specific hint for the case of referencing a WITH item that's not yet in scope according to the SQL spec's semantics. This seems to be an easy error to make, and the bare "relation doesn't exist" message doesn't lead one's mind in the correct direction to fix it.	2008-10-08 01:14:44 +00:00
Heikki Linnakangas	15c121b3ed	Rewrite the FSM. Instead of relying on a fixed-size shared memory segment, the free space information is stored in a dedicated FSM relation fork, with each relation (except for hash indexes; they don't use FSM). This eliminates the max_fsm_relations and max_fsm_pages GUC options; remove any trace of them from the backend, initdb, and documentation. Rewrite contrib/pg_freespacemap to match the new FSM implementation. Also introduce a new variant of the get_raw_page(regclass, int4, int4) function in contrib/pageinspect that let's you to return pages from any relation fork, and a new fsm_page_contents() function to inspect the new FSM pages.	2008-09-30 10:52:14 +00:00
Tom Lane	4adc2f72a4	Change hash indexes to store only the hash code rather than the whole indexed value. This means that hash index lookups are always lossy and have to be rechecked when the heap is visited; however, the gain in index compactness outweighs this when the indexed values are wide. Also, we only need to perform datatype comparisons when the hash codes match exactly, rather than for every entry in the hash bucket; so it could also win for datatypes that have expensive comparison functions. A small additional win is gained by keeping hash index pages sorted by hash code and using binary search to reduce the number of index tuples we have to look at. Xiao Meng This commit also incorporates Zdenek Kotala's patch to isolate hash metapages and hash bitmaps a bit better from the page header datastructures.	2008-09-15 18:43:41 +00:00
Teodor Sigaev	1dcf6fdf1b	Fix possible duplicate tuples while GiST scan. Now page is processed at once and ItemPointers are collected in memory. Remove tuple's killing by killtuple() if tuple was moved to another page - it could produce unaceptable overhead. Backpatch up to 8.1 because the bug was introduced by GiST's concurrency support.	2008-08-23 10:37:24 +00:00
Heikki Linnakangas	3f0e808c4a	Introduce the concept of relation forks. An smgr relation can now consist of multiple forks, and each fork can be created and grown separately. The bulk of this patch is about changing the smgr API to include an extra ForkNumber argument in every smgr function. Also, smgrscheduleunlink and smgrdounlink no longer implicitly call smgrclose, because other forks might still exist after unlinking one. The callers of those functions have been modified to call smgrclose instead. This patch in itself doesn't have any user-visible effect, but provides the infrastructure needed for upcoming patches. The additional forks envisioned are a rewritten FSM implementation that doesn't rely on a fixed-size shared memory block, and a visibility map to allow skipping portions of a table in VACUUM that have no dead tuples.	2008-08-11 11:05:11 +00:00
Tom Lane	6816577a78	Change the PageGetContents() macro to guarantee its result is maxalign'd, thereby forestalling any problems with alignment of the data structure placed there. Since SizeOfPageHeaderData is maxalign'd anyway in 8.3 and HEAD, this does not actually change anything right now, but it is foreseeable that the header size will change again someday. I had to fix a couple of places that were assuming that the content offset is just SizeOfPageHeaderData rather than MAXALIGN(SizeOfPageHeaderData). Per discussion of Zdenek's page-macros patch.	2008-07-13 21:50:04 +00:00
Tom Lane	9d035f4254	Clean up the use of some page-header-access macros: principally, use SizeOfPageHeaderData instead of sizeof(PageHeaderData) in places where that makes the code clearer, and avoid casting between Page and PageHeader where possible. Zdenek Kotala, with some additional cleanup by Heikki Linnakangas. I did not apply the parts of the proposed patch that would have resulted in slightly changing the on-disk format of hash indexes; it seems to me that's not a win as long as there's any chance of having in-place upgrade for 8.4.	2008-07-13 20:45:47 +00:00
Tom Lane	27cb66fdfe	Multi-column GIN indexes. Teodor Sigaev	2008-07-11 21:06:29 +00:00
Alvaro Herrera	a3540b0f65	Improve our #include situation by moving pointer types away from the corresponding struct definitions. This allows other headers to avoid including certain highly-loaded headers such as rel.h and relscan.h, instead using just relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less unnecessary dependencies.	2008-06-19 00:46:06 +00:00
Heikki Linnakangas	a213f1ee6c	Refactor XLogOpenRelation() and XLogReadBuffer() in preparation for relation forks. XLogOpenRelation() and the associated light-weight relation cache in xlogutils.c is gone, and XLogReadBuffer() now takes a RelFileNode as argument, instead of Relation. For functions that still need a Relation struct during WAL replay, there's a new function called CreateFakeRelcacheEntry() that returns a fake entry like XLogOpenRelation() used to.	2008-06-12 09:12:31 +00:00
Tom Lane	281a724d5c	Rewrite DROP's dependency traversal algorithm into an honest two-pass algorithm, replacing the original intention of a one-pass search, which had been hacked up over time to be partially two-pass in hopes of handling various corner cases better. It still wasn't quite there, especially as regards emitting unwanted NOTICE messages. More importantly, this approach lets us fix a number of open bugs concerning concurrent DROP scenarios, because we can take locks during the first pass and avoid traversing to dependent objects that were just deleted by someone else. There is more that can be done here, but I'll go ahead and commit the base patch before working on the options.	2008-06-08 22:41:04 +00:00
Alvaro Herrera	e4ca6cac43	Change xlog.h to xlogdefs.h in bufpage.h, and fix fallout.	2008-06-06 22:35:22 +00:00
Tom Lane	1a604b4e31	Fix a subtle bug exposed by recent wal_sync_method rearrangements. Formerly, the default value of wal_sync_method was determined inside xlog.c, but now it is determined inside guc.c. guc.c was reading xlogdefs.h without having read <fcntl.h>, leading to wrong determination of DEFAULT_SYNC_METHOD. Obviously xlogdefs.h needs to include <fcntl.h> for itself to ensure stable results.	2008-05-17 17:24:57 +00:00
Tom Lane	55f6f8f2aa	Remove DEFAULT_SYNC_FLAGBIT ... not used anymore.	2008-05-17 16:49:23 +00:00
Tom Lane	e6dbcb72fa	Extend GIN to support partial-match searches, and extend tsquery to support prefix matching using this facility. Teodor Sigaev and Oleg Bartunov	2008-05-16 16:31:02 +00:00
Magnus Hagander	f99760c19f	Convert wal_sync_method to guc enum.	2008-05-12 08:35:05 +00:00
Alvaro Herrera	f8c4d7db60	Restructure some header files a bit, in particular heapam.h, by removing some unnecessary #include lines in it. Also, move some tuple routine prototypes and macros to htup.h, which allows removal of heapam.h inclusion from some .c files. For this to work, a new header file access/sysattr.h needed to be created, initially containing attribute numbers of system columns, for pg_dump usage. While at it, make contrib ltree, intarray and hstore header files more consistent with our header style.	2008-05-12 00:00:54 +00:00
Teodor Sigaev	cf23b75b4d	Fix using too many LWLocks bug, reported by Craig Ringer <craig@postnewspapers.com.au>. It was my mistake, I missed limitation of number of held locks, now GIN doesn't use continiuous locks, but still hold buffers pinned to prevent interference with vacuum's deletion algorithm. Backpatch is needed.	2008-04-22 17:52:43 +00:00
Tom Lane	d1cbd26ded	Repair two places where SIGTERM exit could leave shared memory state corrupted. (Neither is very important if SIGTERM is used to shut down the whole database cluster together, but there's a problem if someone tries to SIGTERM individual backends.) To do this, introduce new infrastructure macros PG_ENSURE_ERROR_CLEANUP/PG_END_ENSURE_ERROR_CLEANUP that take care of transiently pushing an on_shmem_exit cleanup hook. Also use this method for createdb cleanup --- that wasn't a shared-memory-corruption problem, but SIGTERM abort of createdb could leave orphaned files lying around. Backpatch as far as 8.2. The shmem corruption cases don't exist in 8.1, and the createdb usage doesn't seem important enough to risk backpatching further.	2008-04-16 23:59:40 +00:00
Tom Lane	24558da14a	Phase 2 of project to make index operator lossiness be determined at runtime instead of plan time. Extend the amgettuple API so that the index AM returns a boolean indicating whether the indexquals need to be rechecked, and make that rechecking happen in nodeIndexscan.c (currently the only place where it's expected to be needed; other callers of index_getnext are just erroring out for now). For the moment, GIN and GIST have stub logic that just always sets the recheck flag to TRUE --- I'm hoping to get Teodor to handle pushing that control down to the opclass consistent() functions. The planner no longer pays any attention to amopreqcheck, and that catalog column will go away in due course.	2008-04-13 19:18:14 +00:00
Tom Lane	ec498cdcbb	Create new routines systable_beginscan_ordered, systable_getnext_ordered, systable_endscan_ordered that have API similar to systable_beginscan etc (in particular, the passed-in scankeys have heap not index attnums), but guarantee ordered output, unlike the existing functions. For the moment these are just very thin wrappers around index_beginscan/index_getnext/etc. Someday they might need to get smarter; but for now this is just a code refactoring exercise to reduce the number of direct callers of index_getnext, in preparation for changing that function's API. In passing, remove index_getnext_indexitem, which has been dead code for quite some time, and will have even less use than that in the presence of run-time-lossy indexes.	2008-04-12 23:14:21 +00:00
Tom Lane	4e82a95476	Replace "amgetmulti" AM functions with "amgetbitmap", in which the whole indexscan always occurs in one call, and the results are returned in a TIDBitmap instead of a limited-size array of TIDs. This should improve speed a little by reducing AM entry/exit overhead, and it is necessary infrastructure if we are ever to support bitmap indexes. In an only slightly related change, add support for TIDBitmaps to preserve (somewhat lossily) the knowledge that particular TIDs reported by an index need to have their quals rechecked when the heap is visited. This facility is not really used yet; we'll need to extend the forced-recheck feature to plain indexscans before it's useful, and that hasn't been coded yet. The intent is to use it to clean up 8.3's horrid @@@ kluge for text search with weighted queries. There might be other uses in future, but that one alone is sufficient reason. Heikki Linnakangas, with some adjustments by me.	2008-04-10 22:25:26 +00:00
Tom Lane	ceb5db69d4	Defend against JOINs having more than 32K columns altogether. We cannot currently support this because we must be able to build Vars referencing join columns, and varattno is only 16 bits wide. Perhaps this should be improved in future, but considering that it never came up before, I'm not sure the problem is worth much effort. Per bug #4070 from Marcello Ceschia. The problem seems largely academic in 8.0 and 7.4, because they have (different) O(N^2) performance issues with such wide joins, but back-patch all the way anyway.	2008-04-05 01:58:20 +00:00
Tom Lane	b03271590e	Remove heap_release_fetch, which is no longer used anywhere; this simplifies heap_fetch a little.	2008-04-03 17:12:27 +00:00
Alvaro Herrera	73b0300b2a	Move the HTSU_Result enum definition into snapshot.h, to avoid including tqual.h into heapam.h. This makes all inclusion of tqual.h explicit. I also sorted alphabetically the includes on some source files.	2008-03-26 21:10:39 +00:00
Alvaro Herrera	d43b085d57	Separate snapshot management code from tuple visibility code, create a snapmgmt.c file for the former. The header files have also been reorganized in three parts: the most basic snapshot definitions are now in a new file snapshot.h, and the also new snapmgmt.h keeps the definitions for snapmgmt.c. tqual.h has been reduced to the bare minimum. This patch is just a first step towards managing live snapshots within a transaction; there is no functionality change. Per my proposal to pgsql-patches on 20080318191940.GB27458@alvh.no-ip.org and subsequent discussion.	2008-03-26 16:20:48 +00:00
Tom Lane	787eba734b	When creating a large hash index, pre-sort the index entries by estimated bucket number, so as to ensure locality of access to the index during the insertion step. Without this, building an index significantly larger than available RAM takes a very long time because of thrashing. On the other hand, sorting is just useless overhead when the index does fit in RAM. We choose to sort when the initial index size exceeds effective_cache_size. This is a revised version of work by Tom Raney and Shreya Bhargava.	2008-03-16 23:15:08 +00:00
Tom Lane	c9a1cc694a	Change hash index creation so that rather than always establishing exactly two buckets at the start, we create a number of buckets appropriate for the estimated size of the table. This avoids a lot of expensive bucket-split actions during initial index build on an already-populated table. This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch to reduce hash index build time. I'm committing it separately to make it easier for people to test the effects of this separately from the effects of their other core idea (pre-sorting the index entries by bucket number).	2008-03-15 20:46:31 +00:00
Tom Lane	611b4393f2	Make TransactionIdIsInProgress check transam.c's single-item XID status cache before it goes groveling through the ProcArray. In situations where the same recently-committed transaction ID is checked repeatedly by tqual.c, this saves a lot of shared-memory searches. And it's cheap enough that it shouldn't hurt noticeably when it doesn't help. Concept and patch by Simon, some minor tweaking and comment-cleanup by Tom.	2008-03-11 20:20:35 +00:00
Tom Lane	6f10eb2111	Refactor heap_page_prune so that instead of changing item states on-the-fly, it accumulates the set of changes to be made and then applies them. It had to accumulate the set of changes anyway to prepare a WAL record for the pruning action, so this isn't an enormous change; the only new complexity is to not doubly mark tuples that are visited twice in the scan. The main advantage is that we can substantially reduce the scope of the critical section in which the changes are applied, thus avoiding PANIC in foreseeable cases like running out of memory in inval.c. A nice secondary advantage is that it is now far clearer that WAL replay will actually do the same thing that the original pruning did. This commit doesn't do anything about the open problem that CacheInvalidateHeapTuple doesn't have the right semantics for a CTID change caused by collapsing out a redirect pointer. But whatever we do about that, it'll be a good idea to not do it inside a critical section.	2008-03-08 21:57:59 +00:00
Tom Lane	7d6e6e2e97	Fix PREPARE TRANSACTION to reject the case where the transaction has dropped a temporary table; we can't support that because there's no way to clean up the source backend's internal state if the eventual COMMIT PREPARED is done by another backend. This was checked correctly in 8.1 but I broke it in 8.2 :-(. Patch by Heikki Linnakangas, original trouble report by John Smith.	2008-03-04 19:54:06 +00:00

1 2 3 4 5 ...

614 Commits