133 Commits

Author SHA1 Message Date
3d9e73ea5f Update copyright years in some recently added files 2017-01-25 12:32:05 -05:00
b480086760 Add more includes so header files are self-contained 2017-01-21 15:49:53 -05:00
665d1fad99 Logical replication
- Add PUBLICATION catalogs and DDL
- Add SUBSCRIPTION catalog and DDL
- Define logical replication protocol and output plugin
- Add logical replication workers

From: Petr Jelinek <petr@2ndquadrant.com>
Reviewed-by: Steve Singer <steve@ssinger.info>
Reviewed-by: Andres Freund <andres@anarazel.de>
Reviewed-by: Erik Rijkers <er@xs4all.nl>
Reviewed-by: Peter Eisentraut <peter.eisentraut@2ndquadrant.com>
2017-01-20 09:04:49 -05:00
352a24a1f9 Generate fmgr prototypes automatically
Gen_fmgrtab.pl creates a new file fmgrprotos.h, which contains
prototypes for all functions registered in pg_proc.h.  This avoids
having to manually maintain these prototypes across a random variety of
header files.  It also automatically enforces a correct function
signature, and since there are warnings about missing prototypes, it
will detect functions that are defined but not registered in
pg_proc.h (or otherwise used).

Reviewed-by: Pavel Stehule <pavel.stehule@gmail.com>
2017-01-17 14:06:07 -05:00
1d25779284 Update copyright via script for 2017 2017-01-03 13:48:53 -05:00
3901fd70cc Support quorum-based synchronous replication.
This feature is also known as "quorum commit" especially in discussion
on pgsql-hackers.

This commit adds the following new syntax to the synchronous_standby_names
GUC. Using the FIRST and ANY keywords, users can specify the method for
choosing synchronous standbys from the listed servers.

  FIRST num_sync (standby_name [, ...])
  ANY num_sync (standby_name [, ...])

The keyword FIRST specifies priority-based synchronous replication,
which was also available in 9.6 and earlier. This method makes transaction
commits wait until their WAL records are replicated to num_sync
synchronous standbys chosen based on their priorities.

The keyword ANY specifies quorum-based synchronous replication
and makes transaction commits wait until their WAL records are
replicated to *at least* num_sync listed standbys. In this method,
the values of pg_stat_replication.sync_state for the listed standbys
are reported as "quorum". A priority is still assigned to each standby,
but it is not used in this method.

The existing syntax with neither the FIRST nor the ANY keyword is still
supported. It is the same as the new syntax with the FIRST keyword, i.e.,
priority-based synchronous replication.

Author: Masahiko Sawada
Reviewed-By: Michael Paquier, Amit Kapila and me
Discussion: <CAD21AoAACi9NeC_ecm+Vahm+MMA6nYh=Kqs3KB3np+MBOS_gZg@mail.gmail.com>

Many thanks to the various individuals who were involved in
discussing and developing this feature.
2016-12-19 21:15:30 +09:00
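
The difference between FIRST and ANY described in 3901fd70cc can be sketched as a small model. This is an illustration only, not the server's code: the function name `releasable_lsn`, its parameters, and the use of plain integers for WAL positions are all assumptions made for the example.

```python
from typing import Dict, List, Optional


def releasable_lsn(method: str, num_sync: int, listed: List[str],
                   flushed: Dict[str, int]) -> Optional[int]:
    """Return the highest position up to which waiting commits may be released.

    listed  -- standby names in the order given in synchronous_standby_names
    flushed -- position reported by each currently connected standby
    Returns None when fewer than num_sync listed standbys are connected.
    """
    connected = [name for name in listed if name in flushed]
    if len(connected) < num_sync:
        return None
    if method == "FIRST":
        # Priority-based: the first num_sync connected names are synchronous,
        # and a commit must have reached every one of them.
        chosen = connected[:num_sync]
        return min(flushed[name] for name in chosen)
    if method == "ANY":
        # Quorum-based: any num_sync listed standbys suffice, so the answer
        # is the num_sync-th largest reported position.
        positions = sorted((flushed[name] for name in connected), reverse=True)
        return positions[num_sync - 1]
    raise ValueError(f"unknown method: {method!r}")
```

With standbys s1, s2, s3 listed in that order and reporting positions 100, 300 and 200, `FIRST 2` releases commits up to 100 (the slower of s1 and s2), while `ANY 2` releases commits up to 200 (two standbys have reached it).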
a924c327e2 Add support for temporary replication slots
This allows creating temporary replication slots that are removed
automatically at the end of the session or on error.

From: Petr Jelinek <petr.jelinek@2ndquadrant.com>
2016-12-12 08:38:17 -05:00
5714931b07 Fix Windows build for 78c8c814390f
Author: Petr Jelínek
2016-12-02 09:40:36 -03:00
78c8c81439 Refactor libpqwalreceiver
The whole walreceiver API is now wrapped into a struct, like most of our
other loadable module APIs.  The libpq connection is no longer a global
variable in libpqwalreceiver.  Instead, it is encapsulated into a struct
that is passed around the functions.  This allows multiple walreceivers
to run at the same time.

Add some rudimentary support for logical replication connections to
libpqwalreceiver.

These changes are mostly cosmetic and are going to be useful for the
future logical replication patches.

From: Petr Jelinek <petr@2ndquadrant.com>
2016-12-01 20:23:28 -05:00
597a87ccc9 Use latch instead of select() in walreceiver
Replace use of poll()/select() by WaitLatchOrSocket(), which is more
portable and flexible.

Also change walreceiver to use its procLatch instead of a custom latch.

From: Petr Jelinek <petr@2ndquadrant.com>
2016-12-01 20:23:28 -05:00
9ed551e0a4 Add conninfo to pg_stat_wal_receiver
Commit b1a9bad9e744 introduced a stats view to provide insight into the
running WAL receiver, but neglected to include the connection string in
it, as reported by Michaël Paquier.  This commit fixes that omission.
(No security-sensitive information is disclosed.)

While at it, close the mild security hole that we were exposing the
password in the connection string in shared memory.  This isn't
user-accessible, but it still looks like a good idea to avoid having the
cleartext password in memory.

Author: Michaël Paquier, Álvaro Herrera
Review by: Vik Fearing

Discussion: https://www.postgresql.org/message-id/CAB7nPqStg4M561obo7ryZ5G+fUydG4v1Ajs1xZT1ujtu+woRag@mail.gmail.com
2016-06-29 16:57:17 -04:00
4bc424b968 pgindent run for 9.6 2016-06-09 18:02:36 -04:00
6b60916829 Fix thinko in comment
Pointed out by Andres Freund
2016-05-02 16:46:42 -03:00
234a266066 Fix code comments regarding logical decoding
Back in 3b02ea4f0780 I added some comments in various places to explain
how logical decoding and other things worked.  Not all of the changes
were welcome, because they were misleading or wrong.  This changes them
a little bit to make them more accurate.

Some other comments are also changed to be more accurate.  Also, fix a
bunch of typos.

Author: Álvaro Herrera, Craig Ringer

Andres Freund reviewed some parts of this.
2016-05-02 16:04:29 -03:00
4c804fbdfb Clean up parsing of synchronous_standby_names GUC variable.
Commit 989be0810dffd08b added a flex/bison lexer/parser to interpret
synchronous_standby_names.  It was done in a pretty crufty way, though,
making assorted end-use sites responsible for calling the parser at the
right times.  That was not only vulnerable to errors of omission, but made
it possible that lexer/parser errors occur at very undesirable times,
and created memory leakages even if there was no error.

Instead, perform the parsing once during check_synchronous_standby_names
and let guc.c manage the resulting data.  To do that, we have to flatten
the parsed representation into a single hunk of malloc'd memory, but that
is not very hard.

While at it, work a little harder on making useful error reports for
parsing problems; the previous code felt that "synchronous_standby_names
parser returned 1" was an appropriate user-facing error message.  (To
be fair, it did also log a syntax error message, but separately from the
GUC problem report, which is at best confusing.)  It had some outright
bugs in the face of invalid input, too.

I (tgl) also concluded that we need to restrict unquoted names in
synchronous_standby_names to be just SQL identifiers.  The previous coding
would accept darn near anything, which (1) makes the quoting convention
both nearly-unnecessary and formally ambiguous, (2) makes it very hard to
understand what is a syntax error and what is a creative interpretation of
the input as a standby name, and (3) makes it impossible to further extend
the syntax in future without a compatibility break.  I presume that we're
intending future extensions of the syntax, else this parsing infrastructure
is massive overkill, so (3) is an important objection.  Since we've taken
a compatibility hit for non-identifier names with this change anyway, we
might as well lock things down now and insist that users use double quotes
for standby names that aren't identifiers.

Kyotaro Horiguchi and Tom Lane
2016-04-27 17:55:25 -04:00
c6ff84b06a Emit invalidations to standby for transactions without xid.
So far, when a transaction with pending invalidations, but without an
assigned xid, committed, we simply ignored those invalidation
messages. That's problematic, because those are actually sent for a
reason.

Known symptoms of this include that existing sessions on a hot-standby
replica sometimes fail to notice new concurrently built indexes and
visibility map updates.

The solution is to WAL log such invalidations in transactions without an
xid. We considered force-assigning an xid instead, but that'd be
problematic for vacuum, which might be run in systems with few xids.

Important: This adds a new WAL record, but as the patch has to be
back-patched, we can't bump the WAL page magic. This means that standbys
have to be updated before primaries; otherwise
"PANIC: standby_redo: unknown op code 32" errors can be encountered.

XXX:

Reported-By: Васильев Дмитрий, Masahiko Sawada
Discussion:
    CAB-SwXY6oH=9twBkXJtgR4UC1NqT-vpYAtxCseME62ADwyK5OA@mail.gmail.com
    CAD21AoDpZ6Xjg=gFrGPnSn4oTRRcwK1EBrWCq9OqOHuAcMMC=w@mail.gmail.com
2016-04-26 20:21:54 -07:00
c2dc194bdb Adjust signature of walrcv_receive hook.
Commit 314cbfc5da988eff redefined the signature of this hook as
typedef int (*walrcv_receive_type) (char **buffer, int *wait_fd);

But in fact the type of the "wait_fd" variable ought to be pgsocket,
which is what WaitLatchOrSocket expects, and which is necessary if
we want to be able to assign PGINVALID_SOCKET to it on Windows.
So fix that.
2016-04-14 13:49:37 -04:00
be65eddd80 Add required database and origin filtering for logical messages.
Logical messages, added in 3fe3511d05, during decoding failed to filter
messages emitted in other databases and messages emitted "under" a
replication origin the output plugin isn't interested in.

Add tests to verify that both types of filtering actually work. While
touching message.sql remove hunk obsoleted by d25379e.

Bump XLOG_PAGE_MAGIC because xl_logical_message changed and because
3fe3511d05 had omitted doing so. 3fe3511d05 additionally didn't bump
catversion, but 7a542700d has done so since.

Author: Petr Jelinek
Reported-By: Andres Freund
Discussion: 20160406142513.wotqy3ba3kanr423@alap3.anarazel.de
2016-04-13 17:38:54 -07:00
3fe3511d05 Generic Messages for Logical Decoding
API and mechanism to allow generic messages to be inserted into WAL that are
intended to be read by logical decoding plugins. This commit adds an optional
new callback to the logical decoding API.

Messages are either text or bytea. Messages can be transactional, or not, and
are identified by a prefix to allow multiple concurrent decoding plugins.

(Not to be confused with Generic WAL records, which are intended to allow crash
recovery of extensible objects.)

Author: Petr Jelinek and Andres Freund
Reviewers: Artur Zakirov, Tomas Vondra, Simon Riggs
Discussion: 5685F999.6010202@2ndquadrant.com
2016-04-06 10:05:41 +01:00
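
A toy model of the message flow described in 3fe3511d05 may help. It assumes, beyond what the commit message states, that transactional messages are held until their transaction commits and discarded on abort, while non-transactional messages are handed over immediately. The class `MessageDecoder` and its methods are hypothetical names, and the prefix check that a real output plugin would do in its own callback is folded into the model for brevity.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

Message = Tuple[str, bytes]  # (prefix, payload)


class MessageDecoder:
    """Hypothetical stand-in for one plugin's view of logical messages."""

    def __init__(self, deliver: Callable[[str, bytes], None], wanted_prefix: str):
        self.deliver = deliver
        self.wanted_prefix = wanted_prefix
        self.pending: Dict[int, List[Message]] = defaultdict(list)

    def on_message(self, xid: int, transactional: bool,
                   prefix: str, payload: bytes) -> None:
        if prefix != self.wanted_prefix:
            return  # the prefix says this message is meant for another plugin
        if transactional:
            # Assumed behaviour: queue until the transaction's fate is known.
            self.pending[xid].append((prefix, payload))
        else:
            self.deliver(prefix, payload)

    def on_commit(self, xid: int) -> None:
        for prefix, payload in self.pending.pop(xid, []):
            self.deliver(prefix, payload)

    def on_abort(self, xid: int) -> None:
        self.pending.pop(xid, None)
```

The prefix is what lets several plugins decode the same WAL stream concurrently without reacting to each other's messages.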
989be0810d Support multiple synchronous standby servers.
Previously synchronous replication offered only the ability to confirm
that all changes made by a transaction had been transferred to at most
one synchronous standby server.

This commit extends synchronous replication so that it supports multiple
synchronous standby servers. It enables users to consider one or more
standby servers as synchronous, and increase the level of transaction
durability by ensuring that transaction commits wait for replies from
all of those synchronous standbys.

Multiple synchronous standby servers are configured in
synchronous_standby_names which is extended to support new syntax of
'num_sync ( standby_name [ , ... ] )', where num_sync specifies
the number of synchronous standbys that transaction commits need to
wait for replies from and standby_name is the name of a standby
server.

The syntax of 'standby_name [ , ... ]' which was used in 9.5 or before
is also still supported. It's the same as new syntax with num_sync=1.

This commit doesn't include "quorum commit" feature which was discussed
in pgsql-hackers. Synchronous standbys are chosen based on their priorities.
synchronous_standby_names determines the priority of each standby for
being chosen as a synchronous standby. The standbys whose names appear
earlier in the list are given higher priority and will be considered as
synchronous. Other standby servers appearing later in this list
represent potential synchronous standbys.

The regression test for multiple synchronous standbys is not included
in this commit. It should come later.

Authors: Sawada Masahiko, Beena Emerson, Michael Paquier, Fujii Masao
Reviewed-By: Kyotaro Horiguchi, Amit Kapila, Robert Haas, Simon Riggs,
Amit Langote, Thomas Munro, Sameer Thakur, Suraj Kharage, Abhijit Menon-Sen,
Rajeev Rastogi

Many thanks to the various individuals who were involved in
discussing and developing this feature.
2016-04-06 17:18:25 +09:00
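
The two accepted forms described in 989be0810d can be illustrated with a deliberately small parser. This is a sketch, not the server's parser: `parse_standby_names` is a hypothetical helper, and it ignores quoting, the empty setting, and any later syntax extensions.

```python
import re
from typing import List, Tuple

# Matches the 'num_sync ( ... )' form; anything else is treated as a bare list.
_NUM_SYNC_FORM = re.compile(r"^\s*(\d+)\s*\((.*)\)\s*$")


def parse_standby_names(value: str) -> Tuple[int, List[str]]:
    """Parse 'num_sync ( name [, ...] )' or the older 'name [, ...]' form."""
    match = _NUM_SYNC_FORM.match(value)
    if match:
        num_sync, body = int(match.group(1)), match.group(2)
    else:
        # Older syntax: a bare list behaves like num_sync = 1.
        num_sync, body = 1, value
    names = [part.strip() for part in body.split(",")]
    if not all(names):
        raise ValueError(f"empty standby name in {value!r}")
    return num_sync, names
```

The order of the returned names matters: as the commit message says, standbys that appear earlier in the list have higher priority.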
3dd0792ae0 Blind attempt at fixing Win32 issue on 24c5f1a103c
As best as I can tell, MyReplicationSlot needs to be PGDLLIMPORT in
order for the new test_slot_timelines test module to compile.

Per buildfarm
2016-03-30 23:12:20 -03:00
314cbfc5da Add new replication mode synchronous_commit = 'remote_apply'.
In this mode, the master waits for the transaction to be applied on
the remote side, not just written to disk.  That means that you can
count on a transaction started on the standby to see all commits
previously acknowledged by the master.

To make this work, the standby sends a reply after replaying each
commit record generated with synchronous_commit >= 'remote_apply'.
This introduces a small inefficiency: the extra replies will be sent
even by standbys that aren't the current synchronous standby.  But
previously-existing synchronous_commit levels make no attempt at all
to optimize which replies are sent based on what the primary cares
about, so this is no worse, and at least avoids any extra replies for
people not using the feature at all.

Thomas Munro, reviewed by Michael Paquier and by me.  Some additional
tweaks by me.
2016-03-29 21:29:49 -04:00
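
How 'remote_apply' in 314cbfc5da differs from the earlier levels can be sketched as a lookup of which reported position a commit waits for. The names `StandbyReply`, `WAIT_FIELD` and `commit_acknowledged` are assumptions made for this example, and positions are plain integers rather than real LSNs.

```python
from dataclasses import dataclass


@dataclass
class StandbyReply:
    write_lsn: int   # WAL the standby has written out
    flush_lsn: int   # WAL the standby has flushed to durable storage
    apply_lsn: int   # WAL the standby has replayed


# Which reported position a commit waits for at each synchronous_commit level.
WAIT_FIELD = {
    "remote_write": "write_lsn",
    "on": "flush_lsn",
    "remote_apply": "apply_lsn",
}


def commit_acknowledged(level: str, commit_lsn: int, reply: StandbyReply) -> bool:
    """True once the standby's reply covers commit_lsn for the given level."""
    if level in ("off", "local"):
        return True  # these levels do not wait for a standby at all
    return getattr(reply, WAIT_FIELD[level]) >= commit_lsn
```

Only under 'remote_apply' does an acknowledged commit imply that a query started on the standby afterwards will see it, because replay usually lags behind flush.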
c8f621c43a logical decoding: Fix handling of large old tuples with replica identity full.
When decoding the old version of an UPDATE or DELETE change, and if that
tuple was bigger than MaxHeapTupleSize, we either Assert'ed out, or
failed in more subtle ways in non-assert builds.  Normally individual
tuples aren't bigger than MaxHeapTupleSize, with big datums toasted.
But that's not the case for the old version of a tuple for logical
decoding; the replica identity is logged as one piece. With the default
replica identity btree limits that to small tuples, but that's not the
case for FULL.

Change the tuple buffer infrastructure to separately allocate over-large
tuples, instead of always going through the slab cache.

This unfortunately requires changing the ReorderBufferTupleBuf
definition, because we need to store the allocated size someplace. To avoid
requiring output plugins to recompile, don't store HeapTupleHeaderData
directly after HeapTupleData, but point to it via t_data; that leaves
room for the allocated size.  As there's no reason for an output plugin
to look at ReorderBufferTupleBuf->t_data.header, remove the field. It
was just a minor convenience having it directly accessible.

Reported-By: Adam Dratwiński
Discussion: CAKg6ypLd7773AOX4DiOGRwQk1TVOQKhNwjYiVjJnpq8Wo+i62Q@mail.gmail.com
2016-03-05 18:02:20 -08:00
d9e903f3cb logical decoding: Tell reorderbuffer about all xids.
Logical decoding's reorderbuffer keeps transactions in an LSN ordered
list for efficiency. To make that efficiently possible, upper-level
xids are forced to be logged before nested subtransaction xids.  That
only works, though, if these records are all looked at. Unfortunately we
didn't do so for e.g. row level locks, which are otherwise uninteresting
for logical decoding.

This could lead to errors like:
"ERROR: subxact logged without previous toplevel record".

It's not sufficient to just look at row locking records, the xid could
appear first due to a lot of other types of records (which will trigger
the transaction to be marked logged with MarkCurrentTransactionIdLoggedIfAny).
So invent infrastructure to tell reorderbuffer about xids seen, when
they'd otherwise not pass through reorderbuffer.c.

Reported-By: Jarred Ward
Bug: #13844
Discussion: 20160105033249.1087.66040@wrigleys.postgresql.org
Backpatch: 9.4, where logical decoding was added
2016-03-05 18:02:20 -08:00
343f709c06 Fix typos
Backpatch to: 9.4
2016-02-25 20:50:20 -03:00
2251179e6a Migrate replication slot I/O locks into a separate tranche.
This is following in a long train of similar changes and for the same
reasons - see b319356f0e94a6482c726cf4af96597c211d8d6e and
fe702a7b3f9f2bc5bf6d173166d7d55226af82c8 inter alia.

Author: Amit Kapila
Reviewed-by: Alexander Korotkov, Robert Haas
2016-01-29 09:45:38 -05:00
b1a9bad9e7 pgstat: add WAL receiver status view & SRF
This new view provides insight into the state of a running WAL receiver
in a hot standby node.
The information returned includes the PID of the WAL receiver process,
its status (stopped, starting, streaming, etc), start LSN and TLI, last
received LSN and TLI, timestamp of last message send and receipt, latest
end-of-WAL LSN and time, and the name of the slot (if any).

Access to the detailed data is only granted to superusers; others only
get the PID.

Author: Michael Paquier
Reviewer: Haribabu Kommi
2016-01-07 16:21:19 -03:00
ee94300446 Update copyright for 2016
Backpatch certain files through 9.1
2016-01-02 13:33:40 -05:00
17f5831c81 Fix "sesssion" typo
It was introduced alongside replication origins, by commit
5aa2350426c, so backpatch to 9.5.

Pointed out by Fujii Masao
2015-09-28 19:13:42 -03:00
4901b2f495 Don't include rel.h when relcache.h is sufficient
Trivial change to reduce exposure of rel.h.
2015-08-11 13:03:14 -03:00
6fcd88511f Allow pg_create_physical_replication_slot() to reserve WAL.
When creating a physical slot it's often useful to immediately reserve
the current WAL position instead of only doing so after the first feedback
message arrives. That allows slots, for example, to guarantee that all the
WAL for a base backup will be available afterwards.

Logical slots already have to reserve WAL during creation, so generalize
that logic into being usable for both physical and logical slots.

Catversion bump because of the new parameter.

Author: Gurjeet Singh
Reviewed-By: Andres Freund
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:34:31 +02:00
093d0c83c1 Introduce macros determining if a replication slot is physical or logical.
These make the code a bit easier to read, and make it easier to add a
more explicit notion of a slot's type at some point in the future.

Author: Gurjeet Singh
Discussion: CABwTF4Wh_dBCzTU=49pFXR6coR4NW1ynb+vBqT+Po=7fuq5iCw@mail.gmail.com
2015-08-11 12:32:48 +02:00
3a145757a0 Improve includes introduced in the replication origins patch.
pg_resetxlog.h contained two superfluous includes, origin.h superfluously
depended on logical.h, and pg_xlogdump's rmgrdesc.h only indirectly
included origin.h.

Backpatch: 9.5, where replication origins were introduced.
2015-08-06 12:41:46 +02:00
807b9e0dff pgindent run for 9.5 2015-05-23 21:35:49 -04:00
4fc72cc7bb Collection of typo fixes.
Use "a" and "an" correctly, mostly in comments. Two error messages were
also fixed (they were just elogs, so no translation work required). Two
function comments in pg_proc.h were also fixed. Etsuro Fujita reported one
of these, but I found a lot more with grep.

Also fix a few other typos spotted while grepping for the a/an typos.
For example, "consists out of ..." -> "consists of ...". Plus a "though"/
"through" mixup reported by Euler Taveira.

Many of these typos were in old code, which would be nice to backpatch to
make future backpatching easier. But much of the code was new, and I didn't
feel like crafting separate patches for each branch. So no backpatching.
2015-05-20 16:56:22 +03:00
72d422a522 Map basebackup tablespaces using a tablespace_map file
Windows can't reliably restore symbolic links from a tar format, so
instead during backup start we create a tablespace_map file, which is
used by the restoring postgres to create the correct links in pg_tblspc.
The backup protocol also now has an option to request this file to be
included in the backup stream, and this is used by pg_basebackup when
operating in tar mode.

This is done on all platforms, not just Windows.

This means that pg_basebackup will not work in tar mode against 9.4
and older servers, as this protocol option isn't implemented there.

Amit Kapila, reviewed by Dilip Kumar, with a little editing from me.
2015-05-12 09:29:10 -04:00
168d5805e4 Add support for INSERT ... ON CONFLICT DO NOTHING/UPDATE.
The newly added ON CONFLICT clause allows specifying an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using an
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint.  DO NOTHING avoids the
constraint violation, without touching the pre-existing row.  DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed.  The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.

This feature is often referred to as upsert.

This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert.  If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made.  If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.

To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.

Bumps catversion as stored rules change.

Author: Peter Geoghegan, with significant contributions from Heikki
    Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
    Dean Rasheed, Stephen Frost and many others.
2015-05-08 05:43:10 +02:00
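
The DO NOTHING and DO UPDATE behaviour described in 168d5805e4 can be tried out with a short script. To stay self-contained it runs against SQLite through Python's `sqlite3` module, which accepts the same clause shape (an assumption of this sketch: the bundled SQLite is 3.24 or newer). It is not PostgreSQL, and the table `counters` and the helper functions are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (name TEXT PRIMARY KEY, hits INTEGER NOT NULL)")


def add_hits(name: str, hits: int) -> None:
    # DO UPDATE: 'excluded' is the row proposed for insertion,
    # 'counters' refers to the pre-existing row.
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES (?, ?) "
        "ON CONFLICT (name) DO UPDATE SET hits = counters.hits + excluded.hits",
        (name, hits),
    )


def insert_if_absent(name: str, hits: int) -> None:
    # DO NOTHING: leave the pre-existing row untouched and raise no error.
    conn.execute(
        "INSERT INTO counters (name, hits) VALUES (?, ?) "
        "ON CONFLICT (name) DO NOTHING",
        (name, hits),
    )


def hits_for(name: str) -> int:
    row = conn.execute("SELECT hits FROM counters WHERE name = ?", (name,)).fetchone()
    return row[0]
```

The `(name)` after ON CONFLICT plays the role of the inference clause: it identifies the unique constraint by its columns rather than by its name.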
e0f26fc765 Correct replication origin's use of UINT16_MAX to PG_UINT16_MAX.
We can't rely on UINT16_MAX being present, which is why we introduced
PG_UINT16_MAX...

Buildfarm animal bowerbird via Andrew Gierth.
2015-04-30 00:19:36 +02:00
5aa2350426 Introduce replication progress tracking infrastructure.
When implementing a replication solution on top of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
  e.g. to avoid loops in bi-directional replication setups

The solution to these problems, as implemented here, consists of
three parts:

1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
   replication origin, how far replay has progressed in an efficient and
   crash safe manner.
3) The ability to filter out changes performed at the behest of a
   replication origin during logical decoding; this allows complex
   replication topologies, e.g. by filtering all replayed changes out.

Most of this could also be implemented in "userspace", e.g. by inserting
additional rows containing origin information, but that ends up being much
less efficient and more complicated.  We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.

This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL.  Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.

For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.

Bumps both catversion and wal page magic.

Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
    20140923182422.GA15776@alap3.anarazel.de,
    20131114172632.GE7522@alap2.anarazel.de
2015-04-29 19:30:53 +02:00
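
The three parts listed in 5aa2350426 can be modelled in a few lines to show why origin filtering prevents loops between two nodes that replicate to each other. Everything here is a toy: `Node`, `Change`, `LOCAL_ORIGIN` and the list standing in for WAL are hypothetical, and the progress map is kept in memory rather than in a crash-safe way.

```python
from dataclasses import dataclass, field
from typing import Dict, List

LOCAL_ORIGIN = 0  # hypothetical id for changes made by local clients


@dataclass
class Change:
    origin: int   # replication origin the change was applied under
    lsn: int      # position of the change in the owning node's log
    payload: str


@dataclass
class Node:
    origin_id: int
    wal: List[Change] = field(default_factory=list)
    # origin id -> last remote position replayed from that origin
    progress: Dict[int, int] = field(default_factory=dict)

    def local_write(self, payload: str) -> None:
        self.wal.append(Change(LOCAL_ORIGIN, len(self.wal) + 1, payload))

    def decode(self) -> List[Change]:
        # Part 3: filter out everything that was replayed for some origin.
        return [c for c in self.wal if c.origin == LOCAL_ORIGIN]

    def replay_from(self, peer: "Node") -> None:
        done = self.progress.get(peer.origin_id, 0)
        for change in peer.decode():
            if change.lsn <= done:
                continue  # already replayed according to tracked progress
            # Part 1: tag the replayed change with the peer's origin.
            self.wal.append(Change(peer.origin_id, len(self.wal) + 1, change.payload))
            # Part 2: remember how far replay from this origin has got.
            self.progress[peer.origin_id] = change.lsn
```

If node A writes a row and node B replays it, B's copy carries A's origin, so B's decoder skips it and the row never travels back to A.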
d811c037ce Add 'active_in' column to pg_replication_slots.
Right now it is visible whether a replication slot is active in any
session, but not in which.  Adding the active_in column, containing the
pid of the backend having acquired the slot, makes it much easier to
associate pg_replication_slots entries with the corresponding
pg_stat_replication/pg_stat_activity row.

This should have been done from the start, but I (Andres) dropped the
ball there somehow.

Author: Craig Ringer, revised by me
Discussion: CAMsr+YFKgZca5_7_ouaMWxA5PneJC9LNViPzpDHusaPhU9pA7g@mail.gmail.com
2015-04-21 11:51:06 +02:00
785941cdc3 Tweak __attribute__-wrapping macros for better pgindent results.
This improves on commit bbfd7edae5aa5ad5553d3c7e102f2e450d4380d4 by
making two simple changes:

* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed().  This reduces
pgindent's tendency to misformat declarations involving them.

* attributes are now always attached to function declarations, not
definitions.  Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway.  (It does little good to put a
noreturn annotation where callers can't see it.)  In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.

I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
2015-03-26 14:03:25 -04:00
656ea810e5 Make SyncRepWakeQueue a static function
It is only used in src/backend/replication/syncrep.c.

Back-patch to all supported branches except 9.1 which declares the
function as static.
2015-03-26 10:34:08 +09:00
bbfd7edae5 Add macros wrapping all usage of gcc's __attribute__.
Until now __attribute__() was defined to be empty for all compilers but
gcc. That's problematic because it prevents using it in other compilers;
which is necessary e.g. for atomics portability.  It's also just
generally dubious to do so in a header as widely included as c.h.

Instead add pg_attribute_format_arg, pg_attribute_printf,
pg_attribute_noreturn macros which are implemented in the compilers that
understand them. Also add pg_attribute_aligned and pg_attribute_packed,
but don't provide fallbacks, since they can affect functionality.

This means that external code that, possibly unwittingly, relied on
__attribute__ defined to be empty on !gcc compilers may now run into
warnings or errors on those compilers. But there shouldn't be many
occurrences of that and it's hard to work around...

Discussion: 54B58BA3.8040302@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.
2015-03-11 14:30:01 +01:00
e1a11d9311 Use FLEXIBLE_ARRAY_MEMBER for HeapTupleHeaderData.t_bits[].
This requires changing quite a few places that were depending on
sizeof(HeapTupleHeaderData), but it seems for the best.

Michael Paquier, some adjustments by me
2015-02-21 15:13:06 -05:00
f2874feb7c Some more FLEXIBLE_ARRAY_MEMBER fixes. 2015-02-21 01:46:43 -05:00
09d8d110a6 Use FLEXIBLE_ARRAY_MEMBER in a bunch more places.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.

This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.

Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent.  Instead use offsetof(struct, lastfield).  Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).

Michael Paquier, review and additional fixes by me.
2015-02-20 00:11:42 -05:00
ff44fba46c Replace walsender's latch with the general shared latch.
Relying on the normal shared latch simplifies interrupt/signal
handling because we can rely on all signal handlers setting the proc
latch. That in turn allows us to avoid the use of
ImmediateInterruptOK, which arguably isn't correct because
WaitLatchOrSocket isn't declared to be immediately interruptible.

Also change sections that wait on the walsender's latch to notice
interrupts quicker/more reliably and make them more consistent with
each other.

This is part of a larger "get rid of ImmediateInterruptOK" series.

Discussion: 20150115020335.GZ5245@awork2.anarazel.de
2015-01-17 13:00:42 +01:00
4baaf863ec Update copyright for 2015
Backpatch certain files through 9.0
2015-01-06 11:43:47 -05:00
b1332e98c4 Put the logic to decide which synchronous standby is active into a function.
This avoids duplicating the code.

Michael Paquier, reviewed by Simon Riggs and me
2014-12-12 14:26:42 +02:00
2c03216d83 Revamp the WAL record format.
Each WAL record now carries information about the modified relation and
block(s) in a standardized format. That makes it easier to write tools that
need that information, like pg_rewind, prefetching the blocks to speed up
recovery, etc.

There's a whole new API for building WAL records, replacing the XLogRecData
chains used previously. The new API consists of XLogRegister* functions,
which are called for each buffer and chunk of data that is added to the
record. The new API also gives more control over when a full-page image is
written, by passing flags to the XLogRegisterBuffer function.

This also simplifies the XLogReadBufferForRedo() calls. The function can dig
the relation and block number from the WAL record, so they no longer need to
be passed as arguments.

For the convenience of redo routines, XLogReader now dissects each WAL record
after reading it, copying the main data part and the per-block data into
MAXALIGNed buffers. The data chunks are not aligned within the WAL record,
but the redo routines can assume that the pointers returned by XLogRecGet*
functions are. Redo routines are now passed the XLogReaderState, which
contains the record in the already-dissected format, instead of the plain
XLogRecord.

The new record format also makes the fixed size XLogRecord header smaller,
by removing the xl_len field. The length of the "main data" portion is now
stored at the end of the WAL record, and there's a separate header after
XLogRecord for it. The alignment padding at the end of XLogRecord is also
removed. This compensates for the fact that the new format would otherwise
be more bulky than the old format.

Reviewed by Andres Freund, Amit Kapila, Michael Paquier, Alvaro Herrera,
Fujii Masao.
2014-11-20 18:46:41 +02:00
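
The register-then-insert style that 2c03216d83 describes can be pictured with a small model. This is not the C API: `RecordBuilder`, `BlockRef`, `WalRecord` and `blocks_touched` are hypothetical names that only mimic the shape of the XLogRegister* calls, and relations are plain strings.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class BlockRef:
    relation: str
    block_no: int
    force_image: bool = False              # stands in for a full-page-image flag
    data: List[bytes] = field(default_factory=list)


@dataclass
class WalRecord:
    rmgr: str
    info: int
    main_data: bytes
    blocks: Dict[int, BlockRef]


class RecordBuilder:
    """Toy stand-in for building a record one registered piece at a time."""

    def __init__(self) -> None:
        self._main: List[bytes] = []
        self._blocks: Dict[int, BlockRef] = {}

    def register_buffer(self, block_id: int, relation: str, block_no: int,
                        force_image: bool = False) -> None:
        self._blocks[block_id] = BlockRef(relation, block_no, force_image)

    def register_buf_data(self, block_id: int, chunk: bytes) -> None:
        self._blocks[block_id].data.append(chunk)

    def register_data(self, chunk: bytes) -> None:
        self._main.append(chunk)

    def insert(self, rmgr: str, info: int) -> WalRecord:
        record = WalRecord(rmgr, info, b"".join(self._main), self._blocks)
        self._main, self._blocks = [], {}  # start clean for the next record
        return record


def blocks_touched(record: WalRecord) -> List[Tuple[str, int]]:
    # Generic tools can read this without knowing the record type.
    return sorted((ref.relation, ref.block_no) for ref in record.blocks.values())
```

Because every record lists its blocks in the same way, a function like `blocks_touched` needs no per-record-type knowledge, which is the property the commit message credits for making tools such as pg_rewind easier to write.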