postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-05 13:17:32 +08:00

Author	SHA1	Message	Date
Heikki Linnakangas	dafaa3efb7	Implement genuine serializable isolation level. Until now, our Serializable mode has in fact been what's called Snapshot Isolation, which allows some anomalies that could not occur in any serialized ordering of the transactions. This patch fixes that using a method called Serializable Snapshot Isolation, based on research papers by Michael J. Cahill (see README-SSI for full references). In Serializable Snapshot Isolation, transactions run like they do in Snapshot Isolation, but a predicate lock manager observes the reads and writes performed and aborts transactions if it detects that an anomaly might occur. This method produces some false positives, ie. it sometimes aborts transactions even though there is no anomaly. To track reads we implement predicate locking, see storage/lmgr/predicate.c. Whenever a tuple is read, a predicate lock is acquired on the tuple. Shared memory is finite, so when a transaction takes many tuple-level locks on a page, the locks are promoted to a single page-level lock, and further to a single relation level lock if necessary. To lock key values with no matching tuple, a sequential scan always takes a relation-level lock, and an index scan acquires a page-level lock that covers the search key, whether or not there are any matching keys at the moment. A predicate lock doesn't conflict with any regular locks or with another predicate locks in the normal sense. They're only used by the predicate lock manager to detect the danger of anomalies. Only serializable transactions participate in predicate locking, so there should be no extra overhead for for other transactions. Predicate locks can't be released at commit, but must be remembered until all the transactions that overlapped with it have completed. That means that we need to remember an unbounded amount of predicate locks, so we apply a lossy but conservative method of tracking locks for committed transactions. If we run short of shared memory, we overflow to a new "pg_serial" SLRU pool. We don't currently allow Serializable transactions in Hot Standby mode. That would be hard, because even read-only transactions can cause anomalies that wouldn't otherwise occur. Serializable isolation mode now means the new fully serializable level. Repeatable Read gives you the old Snapshot Isolation level that we have always had. Kevin Grittner and Dan Ports, reviewed by Jeff Davis, Heikki Linnakangas and Anssi Kääriäinen	2011-02-08 00:09:08 +02:00
Robert Haas	65377e0b9c	Tighten ALTER FOREIGN TABLE .. SET DATA TYPE checks. If the foreign table's rowtype is being used as the type of a column in another table, we can't just up and change its data type. This was already checked for composite types and ordinary tables, but we previously failed to enforce it for foreign tables.	2011-02-06 00:26:27 -05:00
Robert Haas	356f2cbbb4	Make handling of errcodes.h more consistent with other generated headers. This fixes make distprep, and seems more robust in other ways as well. Some special handling is required because errcodes.txt is needed by some stuff in src/port, but just by src/backend as is the case for the other generated headers. While I'm at it, fix a few other things that were overlooked in the original patch.	2011-02-04 09:29:10 -05:00
Robert Haas	ddfe26f644	Avoid maintaining three separate copies of the error codes list. src/pl/plpgsql/src/plerrcodes.h, src/include/utils/errcodes.h, and a big chunk of errcodes.sgml are now automatically generated from a single file, src/backend/utils/errcodes.txt. Jan Urbański, reviewed by Tom Lane.	2011-02-03 22:32:49 -05:00
Bruce Momjian	35b0a6b205	Simplify code used in is_absolute_path() macro; also add comment about 'E:abc' Win32 path handling.	2011-02-03 10:47:06 -05:00
Bruce Momjian	426227850b	Rename function to first_path_var_separator() to clarify it works with path variables, not directory paths.	2011-02-02 22:49:54 -05:00
Peter Eisentraut	15f55cc38a	Add validator to PL/Python Jan Urbański, reviewed by Hitoshi Harada	2011-02-01 22:55:04 +02:00
Magnus Hagander	5273f21434	Undefine setlocale() macro on Win32 New versions of libintl redefine setlocale() to a macro which causes problems when the backend and libintl are linked against different versions of the runtime, which is often the case in msvc builds. Hiroshi Inoue, slightly updated comment by me	2011-02-01 13:19:18 +01:00
Simon Riggs	56b21b7ae3	Re-classify ERRCODE_DATABASE_DROPPED to 57P04	2011-02-01 08:44:01 +00:00
Simon Riggs	9e95c9ad55	Create new errcode for recovery conflict caused by db drop on master. Previously reported as ERRCODE_ADMIN_SHUTDOWN, this case is now reported as ERRCODE_T_R_DATABASE_DROPPED. No message text change. Unlikely to happen on most servers, so low impact change to allow session poolers to correctly handle this situation. Tatsuo Ishii, edits by me, review by Robert Haas	2011-02-01 00:20:53 +00:00
Heikki Linnakangas	997b48ed96	Support multiple concurrent pg_basebackup backups. With this patch, pg_basebackup doesn't write a backup_label file in the data directory, so it doesn't interfere with a pg_start/stop_backup() based backup anymore. backup_label is still included in the backup, but it is injected directly into the tar stream. Heikki Linnakangas, reviewed by Fujii Masao and Magnus Hagander.	2011-01-31 18:25:39 +02:00
Andrew Dunstan	48c9de8028	Fix typo	2011-01-30 20:34:05 -05:00
Andrew Dunstan	91812df4ed	Enable building with the Mingw64 compiler. This can be used to build 64 bit Windows binaries, not only on 64 bit Windows but on supported cross-compiling hosts including 32 bit Windows, Cygwin, Darwin and Linux.	2011-01-30 19:56:46 -05:00
Magnus Hagander	507069de6d	Add option to include WAL in base backup When included, this makes the base backup a complete working "clone" of the initial database, ready to have a postmaster started against it without the need to set up any log archiving or similar. Magnus Hagander, reviewed by Fujii Masao and Heikki Linnakangas	2011-01-30 21:30:09 +01:00
Peter Eisentraut	6fe5e4e63e	autoreconf Synchronize pg_config.h.in with configure.in (someone must have forgotten to run autoheader or autoreconf), and clean up some spurious change in configure introduced by the last commit there.	2011-01-27 01:19:45 +02:00
Tom Lane	bd1ad1b019	Replace pg_class.relhasexclusion with pg_index.indisexclusion. There isn't any need to track this state on a table-wide basis, and trying to do so introduces undesirable semantic fuzziness. Move the flag to pg_index, where it clearly describes just a single index and can be immutable after index creation.	2011-01-25 17:51:59 -05:00
Tom Lane	88452d5ba6	Implement ALTER TABLE ADD UNIQUE/PRIMARY KEY USING INDEX. This feature allows a unique or pkey constraint to be created using an already-existing unique index. While the constraint isn't very functionally different from the bare index, it's nice to be able to do that for documentation purposes. The main advantage over just issuing a plain ALTER TABLE ADD UNIQUE/PRIMARY KEY is that the index can be created with CREATE INDEX CONCURRENTLY, so that there is not a long interval where the table is locked against updates. On the way, refactor some of the code in DefineIndex() and index_create() so that we don't have to pass through those functions in order to create the index constraint's catalog entries. Also, in parse_utilcmd.c, pass around the ParseState pointer in struct CreateStmtContext to save on notation, and add error location pointers to some error reports that didn't have one before. Gurjeet Singh, reviewed by Steve Singer and Tom Lane	2011-01-25 15:43:05 -05:00
Magnus Hagander	e5487f65fd	Make walsender options order-independent While doing this, also move base backup options into a struct instead of increasing the number of parameters to multiple functions for each new option.	2011-01-23 23:39:18 +01:00
Magnus Hagander	048d148fe6	Add pg_basebackup tool for streaming base backups This tool makes it possible to do the pg_start_backup/ copy files/pg_stop_backup step in a single command. There are still some steps to be done before this is a complete backup solution, such as the ability to stream the required WAL logs, but it's still usable, and could do with some buildfarm coverage. In passing, make the checkpoint request optionally fast instead of hardcoding it. Magnus Hagander, reviewed by Fujii Masao and Dimitri Fontaine	2011-01-23 12:21:23 +01:00
Robert Haas	6f59777c65	Code cleanup for assign_transaction_read_only. As in commit fb4c5d2798730f60b102d775f22fb53c26a6445d on 2011-01-21, this avoids spurious debug messages and allows idempotent changes at any time. Along the way, make assign_XactIsoLevel allow idempotent changes even when not within a subtransaction, to be consistent with the new coding of assign_transaction_read_only and because there's no compelling reason to do otherwise. Kevin Grittner, with some adjustments.	2011-01-22 20:55:50 -05:00
Robert Haas	fb4c5d2798	Code cleanup for assign_XactIsoLevel. The new coding avoids a spurious debug message when a transaction that has changed the isolation level has been rolled back. It also allows the property to be freely changed to the current value within a subtransaction. Kevin Grittner, with one small change by me.	2011-01-21 21:49:19 -05:00
Robert Haas	8ceb245680	Make ALTER TABLE revalidate uniqueness and exclusion constraints. Failure to do so can lead to constraint violations. This was broken by commit 1ddc2703a936d03953657f43345460b9242bbed1 on 2010-02-07, so back-patch to 9.0. Noah Misch. Regression test by me.	2011-01-20 22:44:10 -05:00
Tom Lane	6ca452ba7f	Move a couple of declarations to reflect where the routines really are.	2011-01-15 16:09:05 -05:00
Heikki Linnakangas	8f5d65e916	Treat a WAL sender process that hasn't started streaming yet as a regular backend, as far as the postmaster shutdown logic is concerned. That means, fast shutdown will wait for WAL sender processes to exit before signaling bgwriter to finish. This avoids race conditions between a base backup stopping or starting, and bgwriter writing the shutdown checkpoint WAL record. We don't want e.g the end-of-backup WAL record to be written after the shutdown checkpoint.	2011-01-15 16:38:21 +02:00
Magnus Hagander	fcd810c69a	Use a lexer and grammar for parsing walsender commands Makes it easier to parse mainly the BASE_BACKUP command with it's options, and avoids having to manually deal with quoted identifiers in the label (previously broken), and makes it easier to add new commands and options in the future. In passing, refactor the case statement in the walsender to put each command in it's own function.	2011-01-14 16:30:33 +01:00
Magnus Hagander	688423d004	Exit from base backups when shutdown is requested When the exit waits until the whole backup completes, it may take a very long time. In passing, add back an error check in the main loop so we detect clients that disconnect much earlier if the backup is large.	2011-01-14 12:36:45 +01:00
Tom Lane	52948169bc	Code review for postmaster.pid contents changes. Fix broken test for pre-existing postmaster, caused by wrong code for appending lines to the lockfile; don't write a failed listen_address setting into the lockfile; don't arbitrarily change the location of the data directory in the lockfile compared to previous releases; provide more consistent and useful definitions of the socket path and listen_address entries; avoid assuming that pg_ctl has the same DEFAULT_PGSOCKET_DIR as the postmaster; assorted code style improvements.	2011-01-13 19:01:28 -05:00
Tom Lane	d487afbb81	Fix PlanRowMark/ExecRowMark structures to handle inheritance correctly. In an inherited UPDATE/DELETE, each target table has its own subplan, because it might have a column set different from other targets. This means that the resjunk columns we add to support EvalPlanQual might be at different physical column numbers in each subplan. The EvalPlanQual rewrite I did for 9.0 failed to account for this, resulting in possible misbehavior or even crashes during concurrent updates to the same row, as seen in a recent report from Gordon Shannon. Revise the data structure so that we track resjunk column numbers separately for each subplan. I also chose to move responsibility for identifying the physical column numbers back to executor startup, instead of assuming that numbers derived during preprocess_targetlist would stay valid throughout subsequent massaging of the plan. That's a bit slower, so we might want to consider undoing it someday; but it would complicate the patch considerably and didn't seem justifiable in a bug fix that has to be back-patched to 9.0.	2011-01-12 20:47:02 -05:00
Magnus Hagander	4c8e20f815	Track walsender state in shared memory and expose in pg_stat_replication	2011-01-11 21:25:28 +01:00
Magnus Hagander	0eb59c4591	Backend support for streaming base backups Add BASE_BACKUP command to walsender, allowing it to stream a base backup to the client (in tar format). The syntax is still far from ideal, that will be fixed in the switch to use a proper grammar for walsender. No client included yet, will come as a separate commit. Magnus Hagander and Heikki Linnakangas	2011-01-10 14:04:19 +01:00
Magnus Hagander	4448917d51	Split pg_start_backup() and pg_stop_backup() into two pieces Move the actual functionality into a separate function that's easier to call internally, and change the SQL-callable function to be a wrapper calling this. Also create a pg_abort_backup() function, only callable internally, that does only the most vital parts of pg_stop_backup(), making it safe(r) to call from error handlers.	2011-01-09 21:00:28 +01:00
Tom Lane	52fd2d65a3	Fix up core tsquery GIN support for new extractQuery API. No need for the empty-prefix-match kluge to force a full scan anymore.	2011-01-09 14:34:50 -05:00
Magnus Hagander	db4d22d0ef	Add pgreadlink() on Windows to read junction points Add support for reading back information about the symbolic links we've created with pgsymlink(), which are actually Junction Points. Just like pgsymlink() can only create directory symlinks, pgreadlink() can only read directory symlinks.	2011-01-09 15:09:19 +01:00
Tom Lane	adf328c0e1	Add array_contains_nulls() function in arrayfuncs.c. This will support fixing contrib/intarray (and probably other places) so that they don't have to fail on arrays that contain a null bitmap but no live null entries.	2011-01-08 20:26:14 -05:00
Tom Lane	7e2f906201	Remove pg_am.amindexnulls. The only use we have had for amindexnulls is in determining whether an index is safe to cluster on; but since the addition of the amclusterable flag, that usage is pretty redundant. In passing, clean up assorted sloppiness from the last patch that touched pg_am.h: Natts_pg_am was wrong, and ambuildempty was not documented.	2011-01-08 16:08:05 -05:00
Tom Lane	56a57473a9	Refactor GIN's handling of duplicate search entries. The original coding could combine duplicate entries only when they originated from the same qual condition. In particular it could not combine cases where multiple qual conditions all give rise to full-index scan requests, which is an expensive case well worth optimizing. Refactor so that duplicates are recognized across all the quals.	2011-01-08 14:48:08 -05:00
Tom Lane	a032d50128	Fix the built-in GIN support procedure declarations in pg_proc.h. Add more "internal" arguments so that these pg_proc entries reflect the current preferred API. This is purely a cosmetic change, since GIN doesn't actually consult the pg_proc entry when calling a support function. Accordingly, no catversion bump.	2011-01-07 20:40:48 -05:00
Tom Lane	73912e7fbd	Fix GIN to support null keys, empty and null items, and full index scans. Per my recent proposal(s). Null key datums can now be returned by extractValue and extractQuery functions, and will be stored in the index. Also, placeholder entries are made for indexable items that are NULL or contain no keys according to extractValue. This means that the index is now always complete, having at least one entry for every indexed heap TID, and so we can get rid of the prohibition on full-index scans. A full-index scan is implemented much the same way as partial-match scans were already: we build a bitmap representing all the TIDs found in the index, and then drive the results off that. Also, introduce a concept of a "search mode" that can be requested by extractQuery when the operator requires matching to empty items (this is just as cheap as matching to a single key) or requires a full index scan (which is not so cheap, but it sure beats failing or giving wrong answers). The behavior remains backward compatible for opclasses that don't return any null keys or request a non-default search mode. Using these features, we can now make the GIN index opclass for anyarray behave in a way that matches the actual anyarray operators for &&, <@, @>, and = ... which it failed to do before in assorted corner cases. This commit fixes the core GIN code and ginarrayprocs.c, updates the documentation, and adds some simple regression test cases for the new behaviors using the array operators. The tsearch and contrib GIN opclass support functions still need to be looked over and probably fixed. Another thing I intend to fix separately is that this is pretty inefficient for cases where more than one scan condition needs a full-index search: we'll run duplicate GinScanEntrys, each one of which builds a large bitmap. There is some existing logic to merge duplicate GinScanEntrys but it needs refactoring to make it work for entries belonging to different scan keys. Note that most of gin.h has been split out into a new file gin_private.h, so that gin.h doesn't export anything that's not supposed to be used by GIN opclasses or the rest of the backend. I did quite a bit of other code beautification work as well, mostly fixing comments and choosing more appropriate names for things.	2011-01-07 19:16:24 -05:00
Robert Haas	9b4271deb9	Document pg_stat_replication, bump catversion since that was overlooked. Itagaki Takahiro, edited by me.	2011-01-07 11:06:55 -05:00
Itagaki Takahiro	a755ea33ae	New system view pg_stat_replication displays activity of wal sender processes. Itagaki Takahiro and Simon Riggs.	2011-01-07 20:35:38 +09:00
Magnus Hagander	66a8a0428d	Give superusers REPLIACTION permission by default This can be overriden by using NOREPLICATION on the CREATE ROLE statement, but by default they will have it, making it backwards compatible and "less surprising" (given that superusers normally override all checks).	2011-01-05 14:24:17 +01:00
Robert Haas	7f60be72b0	Fix crash in ALTER OPERATOR CLASS/FAMILY .. SET SCHEMA. In the previous coding, the parser emitted a List containing a C string, which is no good, because copyObject() can't handle it. Dimitri Fontaine	2011-01-03 22:08:55 -05:00
Magnus Hagander	77745cc7f1	Bump catversion, forgot in previous commit.	2011-01-03 12:50:30 +01:00
Magnus Hagander	40d9e94bd7	Add views and functions to monitor hot standby query conflicts Add the view pg_stat_database_conflicts and a column to pg_stat_database, and the underlying functions to provide the information.	2011-01-03 12:46:03 +01:00
Peter Eisentraut	39b8843296	Implement remaining fields of information_schema.sequences view Add new function pg_sequence_parameters that returns a sequence's start, minimum, maximum, increment, and cycle values, and use that in the view. (bug #5662; design suggestion by Tom Lane) Also slightly adjust the view's column order and permissions after review of SQL standard.	2011-01-02 15:15:21 +02:00
Robert Haas	0d692a0dc9	Basic foreign table support. Foreign tables are a core component of SQL/MED. This commit does not provide a working SQL/MED infrastructure, because foreign tables cannot yet be queried. Support for foreign table scans will need to be added in a future patch. However, this patch creates the necessary system catalog structure, syntax support, and support for ancillary operations such as COMMENT and SECURITY LABEL. Shigeru Hanada, heavily revised by Robert Haas	2011-01-01 23:48:11 -05:00
Bruce Momjian	5d950e3b0c	Stamp copyrights for year 2011.	2011-01-01 13:18:15 -05:00
Bruce Momjian	30aeda4394	Include the first valid listen address in pg_ctl to improve server start "wait" detection and add postmaster start time to help determine if the postmaster is actually using the specified data directory.	2010-12-31 17:25:02 -05:00
Tom Lane	7b46401557	Move symbols for ExecMergeJoin's state machine into nodeMergejoin.c. There's no reason for these values to be known anywhere else. After doing this, executor/execdefs.h is vestigial and can be removed.	2010-12-30 22:12:40 -05:00
Tom Lane	f4e4b32743	Support RIGHT and FULL OUTER JOIN in hash joins. This is advantageous first because it allows us to hash the smaller table regardless of the outer-join type, and second because hash join can be more flexible than merge join in dealing with arbitrary join quals in a FULL join. For merge join all the join quals have to be mergejoinable, but hash join will work so long as there's at least one hashjoinable qual --- the others can be any condition. (This is true essentially because we don't keep per-inner-tuple match flags in merge join, while hash join can do so.) To do this, we need a has-it-been-matched flag for each tuple in the hashtable, not just one for the current outer tuple. The key idea that makes this practical is that we can store the match flag in the tuple's infomask, since there are lots of bits there that are of no interest for a MinimalTuple. So we aren't increasing the size of the hashtable at all for the feature. To write this without turning the hash code into even more of a pile of spaghetti than it already was, I rewrote ExecHashJoin in a state-machine style, similar to ExecMergeJoin. Other than that decision, it was pretty straightforward.	2010-12-30 20:26:08 -05:00

1 2 3 4 5 ...

5323 Commits