Commit Graph

1291 Commits

Author SHA1 Message Date
ea92368cd1 Move max_wal_senders out of max_connections for connection slot handling
Since its introduction, max_wal_senders is counted as part of
max_connections when it comes to define how many connection slots can be
used for replication connections with a WAL sender context.  This can
lead to confusion for some users, as it could be possible to block a
base backup or replication from happening because other backend sessions
are already taken for other purposes by an application, and
superuser-only connection slots are not a correct solution to handle
that case.

This commit makes max_wal_senders independent of max_connections for its
handling of PGPROC entries in ProcGlobal, meaning that connection slots
for WAL senders are handled using their own free queue, like autovacuum
workers and bgworkers.

One compatibility issue that this change creates is that a standby now
requires to have a value of max_wal_senders at least equal to its
primary.  So, if a standby created enforces the value of
max_wal_senders to be lower than that, then this could break failovers.
Normally this should not be an issue though, as any settings of a
standby are inherited from its primary as postgresql.conf gets normally
copied as part of a base backup, so parameters would be consistent.

Author: Alexander Kukushkin
Reviewed-by: Kyotaro Horiguchi, Petr Jelínek, Masahiko Sawada, Oleksii
Kliukin
Discussion: https://postgr.es/m/CAFh8B=nBzHQeYAu0b8fjK-AF1X4+_p6GRtwG+cCgs6Vci2uRuQ@mail.gmail.com
2019-02-12 10:07:56 +09:00
171e0418b0 Fix heap_getattr() handling of fast defaults.
Previously heap_getattr() returned NULL for attributes with a fast
default value (c.f. 16828d5c0273), as it had no handling whatsoever
for that case.

A previous fix, 7636e5c60f, attempted to fix issues caused by this
oversight, but just expanding OLD tuples for triggers doesn't actually
solve the underlying issue.

One known consequence of this bug is that the check for HOT updates
can return the wrong result, when a previously fast-default'ed column
is set to NULL. Which in turn means that an index over a column with
fast default'ed columns might be corrupt if the underlying column(s)
allow NULLs.

Fix by handling fast default columns in heap_getattr(), remove now
superfluous expansion in GetTupleForTrigger().

Author: Andres Freund
Discussion: https://postgr.es/m/20190201162404.onngi77f26baem4g@alap3.anarazel.de
Backpatch: 11, where fast defaults were introduced
2019-02-06 01:09:32 -08:00
fa2cf164aa Rename nodes/relation.h to nodes/pathnodes.h.
The old name of this file was never a very good indication of what it
was for.  Now that there's also access/relation.h, we have a potential
confusion hazard as well, so let's rename it to something more apropos.
Per discussion, "pathnodes.h" is reasonable, since a good fraction of
the file is Path node definitions.

While at it, tweak a couple of other headers that were gratuitously
importing relation.h into modules that don't need it.

Discussion: https://postgr.es/m/7719.1548688728@sss.pgh.pa.us
2019-01-29 16:49:25 -05:00
c9b75c5838 Simplify restriction handling of two-phase commit for temporary objects
There were two flags used to track the access to temporary tables and
to the temporary namespace of a session which are used to restrict
PREPARE TRANSACTION, however the first control flag is a concept
included in the second.  This removes the flag for temporary table
tracking, keeping around only the one at namespace level.

Author: Michael Paquier
Reviewed-by: Álvaro Herrera
Discussion: https://postgr.es/m/20190118053126.GH1883@paquier.xyz
2019-01-26 10:45:23 +09:00
c91560defc Move remaining code from tqual.[ch] to heapam.h / heapam_visibility.c.
Given these routines are heap specific, and that there will be more
generic visibility support in via table AM, it makes sense to move the
prototypes to heapam.h (routines like HeapTupleSatisfiesVacuum will
not be exposed in a generic fashion, because they are too storage
specific).

Similarly, the code in tqual.c is specific to heap, so moving it into
access/heap/ makes sense.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-21 17:07:10 -08:00
4b21acf522 Introduce access/{table.h, relation.h}, for generic functions from heapam.h.
access/heapam contains functions that are very storage specific (say
heap_insert() and a lot of lower level functions), and fairly generic
infrastructure like relation_open(), heap_open() etc.  In the upcoming
pluggable storage work we're introducing a layer between table
accesses in general and heapam, to allow for different storage
methods. For a bit cleaner separation it thus seems advantageous to
move generic functions like the aforementioned to their own headers.

access/relation.h will contain relation_open() etc, and access/table.h
will contain table_open() (formerly known as heap_open()). I've decided
for table.h not to include relation.h, but we might change that at a
later stage.

relation.h already exists in another directory, but the other
plausible name (rel.h) also conflicts. It'd be nice if there were a
non-conflicting name, but nobody came up with a suggestion. It's
possible that the appropriate way to address the naming conflict would
be to rename nodes/relation.h, which isn't particularly well named.

To avoid breaking a lot of extensions that just use heap_open() etc,
table.h has macros mapping the old names to the new ones, and heapam.h
includes relation, table.h.  That also allows to keep the
bulk renaming of existing callers in a separate commit.

Author: Andres Freund
Discussion: https://postgr.es/m/20190111000539.xbv7s6w7ilcvm7dp@alap3.anarazel.de
2019-01-21 10:51:36 -08:00
c5660e0aa5 Restrict the use of temporary namespace in two-phase transactions
Attempting to use a temporary table within a two-phase transaction is
forbidden for ages.  However, there have been uncovered grounds for
a couple of other object types and commands which work on temporary
objects with two-phase commit.  In short, trying to create, lock or drop
an object on a temporary schema should not be authorized within a
two-phase transaction, as it would cause its state to create
dependencies with other sessions, causing all sorts of side effects with
the existing session or other sessions spawned later on trying to use
the same temporary schema name.

Regression tests are added to cover all the grounds found, the original
report mentioned function creation, but monitoring closer there are many
other patterns with LOCK, DROP or CREATE EXTENSION which are involved.
One of the symptoms resulting in combining both is that the session
which used the temporary schema is not able to shut down completely,
waiting for being able to drop the temporary schema, something that it
cannot complete because of the two-phase transaction involved with
temporary objects.  In this case the client is able to disconnect but
the session remains alive on the backend-side, potentially blocking
connection backend slots from being used.  Other problems reported could
also involve server crashes.

This is back-patched down to v10, which is where 9b013dc has introduced
MyXactFlags, something that this patch relies on.

Reported-by: Alexey Bashtanov
Author: Michael Paquier
Reviewed-by: Masahiko Sawada
Discussion: https://postgr.es/m/5d910e2e-0db8-ec06-dd5f-baec420513c3@imap.cc
Backpatch-through: 10
2019-01-18 09:21:44 +09:00
285d8e1205 Move vacuumlazy.c into access/heap.
It's heap table storage specific code that can't realistically be
generalized into table AM agnostic code.

Author: Andres Freund
Discussion: https://postgr.es/m/20180703070645.wchpu5muyto5n647@alap3.anarazel.de
2019-01-15 12:06:19 -08:00
1c53c4dec3 Finish reverting "recheck_on_update" patch.
This reverts commit c203d6cf8 and some follow-on fixes, completing the
task begun in commit 5d28c9bd7.  If that feature is ever resurrected,
the code will look quite a bit different from this, so it seems best
to start from a clean slate.

The v11 branch is not touched; in that branch, the recheck_on_update
storage option remains present, but nonfunctional and undocumented.

Discussion: https://postgr.es/m/20190114223409.3tcvejfhlvbucrv5@alap3.anarazel.de
2019-01-15 12:07:10 -05:00
0944ec54de Don't include genam.h from execnodes.h and relscan.h anymore.
This is the genam.h equivalent of 4c850ecec649c (which removed
heapam.h from a lot of other headers).  There's still a few header
includes of genam.h, but not from central headers anymore.

As a few headers are not indirectly included anymore, execnodes.h and
relscan.h need a few additional includes. Some of the depended on
types were replacable by using the underlying structs, but e.g. for
Snapshot in execnodes.h that'd have gotten more invasive than
reasonable in this commit.

Like the aforementioned commit 4c850ecec649c, this requires adding new
genam.h includes to a number of backend files, which likely is also
required in a few external projects.

Author: Andres Freund
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 17:02:12 -08:00
774a975c9a Make naming of tupdesc related structs more consistent with the rest of PG.
We usually don't change the name of structs between the struct name
itself and the name of the typedef. Additionally, structs that are
usually used via a typedef that hides being a pointer, are commonly
suffixed Data.  Change tupdesc code to follow those convention.

This is triggered by a future patch that intends to forward declare
TupleDescData in another header - keeping with the naming scheme makes
that easier to understand.

Author: Andres Freund
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 16:25:50 -08:00
e451dd5521 Remove too generically named MissingPtr typedef.
As there's only a single user of the typedef in the entire codebase,
just use the underlying struct directly.

Per complaint from Alvaro Herrera

Author: Andres Freund
Discussion: https://postgr.es/m/201901141836.oxtm4uzc63j3@alvherre.pgsql
2019-01-14 16:25:50 -08:00
4c850ecec6 Don't include heapam.h from others headers.
heapam.h previously was included in a number of widely used
headers (e.g. execnodes.h, indirectly in executor.h, ...). That's
problematic on its own, as heapam.h contains a lot of low-level
details that don't need to be exposed that widely, but becomes more
problematic with the upcoming introduction of pluggable table storage
- it seems inappropriate for heapam.h to be included that widely
afterwards.

heapam.h was largely only included in other headers to get the
HeapScanDesc typedef (which was defined in heapam.h, even though
HeapScanDescData is defined in relscan.h). The better solution here
seems to be to just use the underlying struct (forward declared where
necessary). Similar for BulkInsertState.

Another problem was that LockTupleMode was used in executor.h - parts
of the file tried to cope without heapam.h, but due to the fact that
it indirectly included it, several subsequent violations of that goal
were not not noticed. We could just reuse the approach of declaring
parameters as int, but it seems nicer to move LockTupleMode to
lockoptions.h - that's not a perfect location, but also doesn't seem
bad.

As a number of files relied on implicitly included heapam.h, a
significant number of files grew an explicit include. It's quite
probably that a few external projects will need to do the same.

Author: Andres Freund
Reviewed-By: Alvaro Herrera
Discussion: https://postgr.es/m/20190114000701.y4ttcb74jpskkcfb@alap3.anarazel.de
2019-01-14 16:24:41 -08:00
97c39498e5 Update copyright for 2019
Backpatch-through: certain files through 9.4
2019-01-02 12:44:25 -05:00
b450abd255 Remove entry tree root conflict checking from GIN predicate locking
According to README we acquire predicate locks on entry tree leafs and posting
tree roots.  However, when ginFindLeafPage() is going to lock leaf in exclusive
mode, then it checks root for conflicts regardless whether it's a entry or
posting tree.  Assuming that we never place predicate lock on entry tree root
(excluding corner case when root is leaf), this check is redundant.  This
commit removes this check.  Now, root conflict checking is controlled by
separate argument of ginFindLeafPage().

Discussion: https://postgr.es/m/CAPpHfdv7rrDyy%3DMgsaK-L9kk0AH7az0B-mdC3w3p0FSb9uoyEg%40mail.gmail.com
Author: Alexander Korotkov
Backpatch-through: 11
2018-12-27 04:24:20 +03:00
c952eae52a Check for conflicting queries during replay of gistvacuumpage()
013ebc0a7b implements so-called GiST microvacuum.  That is gistgettuple() marks
index tuples as dead when kill_prior_tuple is set.  Later, when new tuple
insertion claims page space, those dead index tuples are physically deleted
from page.  When this deletion is replayed on standby, it might conflict with
read-only queries.  But 013ebc0a7b doesn't handle this.  That may lead to
disappearance of some tuples from read-only snapshots on standby.

This commit implements resolving of conflicts between replay of GiST microvacuum
and standby queries.  On the master we implement new WAL record type
XLOG_GIST_DELETE, which comprises necessary information.  On stable releases
we've to be tricky to keep WAL compatibility.  Information required for conflict
processing is just appended to data of XLOG_GIST_PAGE_UPDATE record.  So,
PostgreSQL version, which doesn't know about conflict processing, will just
ignore that.

Reported-by: Andres Freund
Diagnosed-by: Andres Freund
Discussion: https://postgr.es/m/20181212224524.scafnlyjindmrbe6%40alap3.anarazel.de
Author: Alexander Korotkov
Backpatch-through: 9.6
2018-12-21 02:37:37 +03:00
09568ec3d3 Create a separate oid range for oids assigned by genbki.pl.
The changes I made in 578b229718e assigned oids below
FirstBootstrapObjectId to objects in include/catalog/*.dat files that
did not have an oid assigned, starting at the max oid explicitly
assigned.  Tom criticized that for mainly two reasons:
1) It's not clear which values are manually and which explicitly
   assigned.
2) The space below FirstBootstrapObjectId gets pretty crowded, and
   some PostgreSQL forks have used oids >= 9000 for their own objects,
   to avoid conflicting.

Thus create a new range for objects not assigned explicit oids, but
assigned by genbki.pl. For now 1-9999 is for explicitly assigned oids,
FirstGenbkiObjectId (10000) to FirstBootstrapObjectId (1200) -1 is for
genbki.pl assigned oids, and < FirstNormalObjectId (16384) is for oids
assigned during bootstrap.  It's possible that we'll have to adjust
these boundaries, but there's some headroom for now.

Add a note suggesting that oids in forks should be assigned in the
9000-9999 range.

Catversion bump for obvious reasons.

Per complaint from Tom Lane.

Author: Andres Freund
Discussion: https://postgr.es/m/16845.1544393682@sss.pgh.pa.us
2018-12-13 14:50:57 -08:00
52ac6cd2d0 Prevent GIN deleted pages from being reclaimed too early
When GIN vacuum deletes a posting tree page, it assumes that no concurrent
searchers can access it, thanks to ginStepRight() locking two pages at once.
However, since 9.4 searches can skip parts of posting trees descending from the
root.  That leads to the risk that page is deleted and reclaimed before
concurrent search can access it.

This commit prevents the risk of above by waiting for every transaction, which
might wait to reference this page, to finish.  Due to binary compatibility
we can't change GinPageOpaqueData to store corresponding transaction id.
Instead we reuse page header pd_prune_xid field, which is unused in index pages.

Discussion: https://postgr.es/m/31a702a.14dd.166c1366ac1.Coremail.chjischj%40163.com
Author: Andrey Borodin, Alexander Korotkov
Reviewed-by: Alexander Korotkov
Backpatch-through: 9.4
2018-12-13 06:55:34 +03:00
a243c55326 Cleanup comments in xlog compression
Skipping over the "hole" in full page images in the XLOG code was
described as being a form of compression, but this got a bit confusing
since we now have PGLZ-based compression happening, so adjust the
wording to discuss "removing" the "hole" and keeping the talk about
compression to where we're talking about using PGLZ-based compression of
the full page images.

Reviewed-By: Kyotaro Horiguchi
Discussion: https://postgr.es/m/20181127234341.GM3415@tamriel.snowman.net
2018-12-06 11:05:39 -05:00
2dedf4d9a8 Integrate recovery.conf into postgresql.conf
recovery.conf settings are now set in postgresql.conf (or other GUC
sources).  Currently, all the affected settings are PGC_POSTMASTER;
this could be refined in the future case by case.

Recovery is now initiated by a file recovery.signal.  Standby mode is
initiated by a file standby.signal.  The standby_mode setting is
gone.  If a recovery.conf file is found, an error is issued.

The trigger_file setting has been renamed to promote_trigger_file as
part of the move.

The documentation chapter "Recovery Configuration" has been integrated
into "Server Configuration".

pg_basebackup -R now appends settings to postgresql.auto.conf and
creates a standby.signal file.

Author: Fujii Masao <masao.fujii@gmail.com>
Author: Simon Riggs <simon@2ndquadrant.com>
Author: Abhijit Menon-Sen <ams@2ndquadrant.com>
Author: Sergei Kornilov <sk@zsrv.org>
Discussion: https://www.postgresql.org/message-id/flat/607741529606767@web3g.yandex.ru/
2018-11-25 16:33:40 +01:00
578b229718 Remove WITH OIDS support, change oid catalog column visibility.
Previously tables declared WITH OIDS, including a significant fraction
of the catalog tables, stored the oid column not as a normal column,
but as part of the tuple header.

This special column was not shown by default, which was somewhat odd,
as it's often (consider e.g. pg_class.oid) one of the more important
parts of a row.  Neither pg_dump nor COPY included the contents of the
oid column by default.

The fact that the oid column was not an ordinary column necessitated a
significant amount of special case code to support oid columns. That
already was painful for the existing, but upcoming work aiming to make
table storage pluggable, would have required expanding and duplicating
that "specialness" significantly.

WITH OIDS has been deprecated since 2005 (commit ff02d0a05280e0).
Remove it.

Removing includes:
- CREATE TABLE and ALTER TABLE syntax for declaring the table to be
  WITH OIDS has been removed (WITH (oids[ = true]) will error out)
- pg_dump does not support dumping tables declared WITH OIDS and will
  issue a warning when dumping one (and ignore the oid column).
- restoring an pg_dump archive with pg_restore will warn when
  restoring a table with oid contents (and ignore the oid column)
- COPY will refuse to load binary dump that includes oids.
- pg_upgrade will error out when encountering tables declared WITH
  OIDS, they have to be altered to remove the oid column first.
- Functionality to access the oid of the last inserted row (like
  plpgsql's RESULT_OID, spi's SPI_lastoid, ...) has been removed.

The syntax for declaring a table WITHOUT OIDS (or WITH (oids = false)
for CREATE TABLE) is still supported. While that requires a bit of
support code, it seems unnecessary to break applications / dumps that
do not use oids, and are explicit about not using them.

The biggest user of WITH OID columns was postgres' catalog. This
commit changes all 'magic' oid columns to be columns that are normally
declared and stored. To reduce unnecessary query breakage all the
newly added columns are still named 'oid', even if a table's column
naming scheme would indicate 'reloid' or such.  This obviously
requires adapting a lot code, mostly replacing oid access via
HeapTupleGetOid() with access to the underlying Form_pg_*->oid column.

The bootstrap process now assigns oids for all oid columns in
genbki.pl that do not have an explicit value (starting at the largest
oid previously used), only oids assigned later by oids will be above
FirstBootstrapObjectId. As the oid column now is a normal column the
special bootstrap syntax for oids has been removed.

Oids are not automatically assigned during insertion anymore, all
backend code explicitly assigns oids with GetNewOidWithIndex(). For
the rare case that insertions into the catalog via SQL are called for
the new pg_nextoid() function can be used (which only works on catalog
tables).

The fact that oid columns on system tables are now normal columns
means that they will be included in the set of columns expanded
by * (i.e. SELECT * FROM pg_class will now include the table's oid,
previously it did not). It'd not technically be hard to hide oid
column by default, but that'd mean confusing behavior would either
have to be carried forward forever, or it'd cause breakage down the
line.

While it's not unlikely that further adjustments are needed, the
scope/invasiveness of the patch makes it worthwhile to get merge this
now. It's painful to maintain externally, too complicated to commit
after the code code freeze, and a dependency of a number of other
patches.

Catversion bump, for obvious reasons.

Author: Andres Freund, with contributions by John Naylor
Discussion: https://postgr.es/m/20180930034810.ywp2c7awz7opzcfr@alap3.anarazel.de
2018-11-20 16:00:17 -08:00
0999ac4792 Improve description of buffer used to store records in WAL reader
The dedicated private buffer to store records is used only for these
crossing a page boundary since 285bd0ac, but its description did not
match completely the reality.

Reported-by: Andrey Lepikhov
Author: Michael Paquier
Discussion: https://postgr.es/m/49518b48-2036-5e43-1818-0f594e375e76@postgrespro.ru
2018-11-21 08:43:32 +09:00
4da597edf1 Make TupleTableSlots extensible, finish split of existing slot type.
This commit completes the work prepared in 1a0586de36, splitting the
old TupleTableSlot implementation (which could store buffer, heap,
minimal and virtual slots) into four different slot types.  As
described in the aforementioned commit, this is done with the goal of
making tuple table slots extensible, to allow for pluggable table
access methods.

To achieve runtime extensibility for TupleTableSlots, operations on
slots that can differ between types of slots are performed using the
TupleTableSlotOps struct provided at slot creation time.  That
includes information from the size of TupleTableSlot struct to be
allocated, initialization, deforming etc.  See the struct's definition
for more detailed information about callbacks TupleTableSlotOps.

I decided to rename TTSOpsBufferTuple to TTSOpsBufferHeapTuple and
ExecCopySlotTuple to ExecCopySlotHeapTuple, as that seems more
consistent with other naming introduced in recent patches.

There's plenty optimization potential in the slot implementation, but
according to benchmarking the state after this commit has similar
performance characteristics to before this set of changes, which seems
sufficient.

There's a few changes in execReplication.c that currently need to poke
through the slot abstraction, that'll be repaired once the pluggable
storage patchset provides the necessary infrastructure.

Author: Andres Freund and  Ashutosh Bapat, with changes by Amit Khandekar
Discussion: https://postgr.es/m/20181105210039.hh4vvi4vwoq5ba2q@alap3.anarazel.de
2018-11-16 16:35:15 -08:00
3ce1201894 Fix incorrect routine name in xlog_heapam.h
s/xl_heap_delete/xl_heap_truncate/ in a comment block referring to flags
for truncation.

Discussion: https://postgr.es/m/20180413034734.GE1552@paquier.xyz
2018-11-10 08:58:55 +09:00
003c68a3b4 Rename rbtree.c functions to use "rbt" prefix not "rb" prefix.
The "rb" prefix is used by Ruby, so that our existing code results
in name collisions that break plruby.  We discussed ways to prevent
that by adjusting dynamic linker options, but it seems that at best
we'd move the pain to other cases.  Renaming to avoid the collision
is the only portable fix anyway.  Fortunately, our rbtree code is
not (yet?) widely used --- in core, there's only a single usage
in GIN --- so it seems likely that we can get away with a rename.

I chose to do this basically as s/rb/rbt/g, except for places where
there already was a "t" after "rb".  The patch could have been made
smaller by only touching linker-visible symbols, but it would have
resulted in oddly inconsistent-looking code.  Better to make it look
like "rbt" was the plan all along.

Back-patch to v10.  The rbtree.c code exists back to 9.5, but
rb_iterate() which is the actual immediate source of pain was added
in v10, so it seems like changing the names before that would have
more risk than benefit.

Per report from Pavel Raiskup.

Discussion: https://postgr.es/m/4738198.8KVIIDhgEB@nb.usersys.redhat.com
2018-11-06 13:25:24 -05:00
10074651e3 Add pg_promote function
This function is able to promote a standby with this new SQL-callable
function.  Execution access can be granted to non-superusers so that
failover tools can observe the principle of least privilege.

Catalog version is bumped.

Author: Laurenz Albe
Reviewed-by: Michael Paquier, Masahiko Sawada
Discussion: https://postgr.es/m/6e7c79b3ec916cf49742fb8849ed17cd87aed620.camel@cybertec.at
2018-10-25 09:46:00 +09:00
9d906f1119 Move generic slot support functions from heaptuple.c into execTuples.c.
heaptuple.c was never a particular good fit for slot_getattr(),
slot_getsomeattrs() and slot_getmissingattrs(), but in upcoming
changes slots will be made more abstract (allowing slots that contain
different types of tuples), making it clearly the wrong place.

Note that slot_deform_tuple() remains in it's current place, as it
clearly deals with a HeapTuple.  getmissingattrs() also remains, but
it's less clear that that's correct - but execTuples.c wouldn't be the
right place.

Author: Ashutosh Bapat.
Discussion: https://postgr.es/m/20180220224318.gw4oe5jadhpmcdnm@alap3.anarazel.de
2018-10-15 15:17:04 -07:00
e9edc1ba0b Fix logical decoding error when system table w/ toast is repeatedly rewritten.
Repeatedly rewriting a mapped catalog table with VACUUM FULL or
CLUSTER could cause logical decoding to fail with:
ERROR, "could not map filenode \"%s\" to relation OID"

To trigger the problem the rewritten catalog had to have live tuples
with toasted columns.

The problem was triggered as during catalog table rewrites the
heap_insert() check that prevents logical decoding information to be
emitted for system catalogs, failed to treat the new heap's toast table
as a system catalog (because the new heap is not recognized as a
catalog table via RelationIsLogicallyLogged()). The relmapper, in
contrast to the normal catalog contents, does not contain historical
information. After a single rewrite of a mapped table the new relation
is known to the relmapper, but if the table is rewritten twice before
logical decoding occurs, the relfilenode cannot be mapped to a
relation anymore.  Which then leads us to error out.   This only
happens for toast tables, because the main table contents aren't
re-inserted with heap_insert().

The fix is simple, add a new heap_insert() flag that prevents logical
decoding information from being emitted, and accept during decoding
that there might not be tuple data for toast tables.

Unfortunately that does not fix pre-existing logical decoding
errors. Doing so would require not throwing an error when a filenode
cannot be mapped to a relation during decoding, and that seems too
likely to hide bugs.  If it's crucial to fix decoding for an existing
slot, temporarily changing the ERROR in ReorderBufferCommit() to a
WARNING appears to be the best fix.

Author: Andres Freund
Discussion: https://postgr.es/m/20180914021046.oi7dm4ra3ot2g2kt@alap3.anarazel.de
Backpatch: 9.4-, where logical decoding was introduced
2018-10-10 13:53:02 -07:00
07ee62ce9e Propagate xactStartTimestamp and stmtStartTimestamp to parallel workers.
Previously, a worker process would establish values for these based on
its own start time.  In v10 and up, this can trivially be shown to cause
misbehavior of transaction_timestamp(), timestamp_in(), and related
functions which are (perhaps unwisely?) marked parallel-safe.  It seems
likely that other behaviors might diverge from what happens in the parent
as well.

It's not as trivial to demonstrate problems in 9.6 or 9.5, but I'm sure
it's still possible, so back-patch to all branches containing parallel
worker infrastructure.

In HEAD only, mark now() and statement_timestamp() as parallel-safe
(other affected functions already were).  While in theory we could
still squeeze that change into v11, it doesn't seem important enough
to force a last-minute catversion bump.

Konstantin Knizhnik, whacked around a bit by me

Discussion: https://postgr.es/m/6406dbd2-5d37-4cb6-6eb2-9c44172c7e7c@postgrespro.ru
2018-10-06 12:00:09 -04:00
c87cb5f7a6 Allow btree comparison functions to return INT_MIN.
Historically we forbade datatype-specific comparison functions from
returning INT_MIN, so that it would be safe to invert the sort order
just by negating the comparison result.  However, this was never
really safe for comparison functions that directly return the result
of memcmp(), strcmp(), etc, as POSIX doesn't place any such restriction
on those library functions.  Buildfarm results show that at least on
recent Linux on s390x, memcmp() actually does return INT_MIN sometimes,
causing sort failures.

The agreed-on answer is to remove this restriction and fix relevant
call sites to not make such an assumption; code such as "res = -res"
should be replaced by "INVERT_COMPARE_RESULT(res)".  The same is needed
in a few places that just directly negated the result of memcmp or
strcmp.

To help find places having this problem, I've also added a compile option
to nbtcompare.c that causes some of the commonly used comparators to
return INT_MIN/INT_MAX instead of their usual -1/+1.  It'd likely be
a good idea to have at least one buildfarm member running with
"-DSTRESS_SORT_INT_MIN".  That's far from a complete test of course,
but it should help to prevent fresh introductions of such bugs.

This is a longstanding portability hazard, so back-patch to all supported
branches.

Discussion: https://postgr.es/m/20180928185215.ffoq2xrq5d3pafna@alap3.anarazel.de
2018-10-05 16:01:29 -04:00
cc2905e963 Use slots more widely in tuple mapping code and make naming more consistent.
It's inefficient to use a single slot for mapping between tuple
descriptors for multiple tuples, as previously done when using
ConvertPartitionTupleSlot(), as that means the slot's tuple descriptors
change for every tuple.

Previously we also, via ConvertPartitionTupleSlot(), built new tuples
after the mapping even in cases where we, immediately afterwards,
access individual columns again.

Refactor the code so one slot, on demand, is used for each
partition. That avoids having to change the descriptor (and allows to
use the more efficient "fixed" tuple slots). Then use slot->slot
mapping, to avoid unnecessarily forming a tuple.

As the naming between the tuple and slot mapping functions wasn't
consistent, rename them to execute_attr_map_{tuple,slot}.  It's likely
that we'll also rename convert_tuples_by_* to denote that these
functions "only" build a map, but that's left for later.

Author: Amit Khandekar and Amit Langote, editorialized by me
Reviewed-By: Amit Langote, Amit Khandekar, Andres Freund
Discussion:
    https://postgr.es/m/CAJ3gD9fR0wRNeAE8VqffNTyONS_UfFPRpqxhnD9Q42vZB+Jvpg@mail.gmail.com
    https://postgr.es/m/e4f9d743-cd4b-efb0-7574-da21d86a7f36%40lab.ntt.co.jp
Backpatch: -
2018-10-02 11:14:26 -07:00
2a6368343f Add support for nearest-neighbor (KNN) searches to SP-GiST
Currently, KNN searches were supported only by GiST.  SP-GiST also capable to
support them.  This commit implements that support.  SP-GiST scan stack is
replaced with queue, which serves as stack if no ordering is specified.  KNN
support is provided for three SP-GIST opclasses: quad_point_ops, kd_point_ops
and poly_ops (catversion is bumped).  Some common parts between GiST and SP-GiST
KNNs are extracted into separate functions.

Discussion: https://postgr.es/m/570825e8-47d0-4732-2bf6-88d67d2d51c8%40postgrespro.ru
Author: Nikita Glukhov, Alexander Korotkov based on GSoC work by Vlad Sterzhanov
Review: Andrey Borodin, Alexander Korotkov
2018-09-19 01:54:10 +03:00
ac27c74def Fix the overrun in hash index metapage for smaller block sizes.
The commit 620b49a1 changed the value of HASH_MAX_BITMAPS with the intent
to allow many non-unique values in hash indexes without worrying to reach
the limit of the number of overflow pages.  At that time, this didn't
occur to us that it can overrun the block for smaller block sizes.

Choose the value of HASH_MAX_BITMAPS based on BLCKSZ such that it gives
the same answer as now for the cases where the overrun doesn't occur, and
some other sufficiently-value for the cases where an overrun currently
does occur.  This allows us not to change the behavior in any case that
currently works, so there's really no reason for a HASH_VERSION bump.

Author: Dilip Kumar
Reviewed-by: Amit Kapila
Backpatch-through: 10
Discussion: https://postgr.es/m/CAA4eK1LtF4VmU4mx_+i72ff1MdNZ8XaJMGkt2HV8+uSWcn8t4A@mail.gmail.com
2018-09-06 09:27:19 +05:30
e75733d46c Comment fix for rewriteheap.h.
The description of the filename for mapping files did not match the
code.
2018-08-25 09:17:14 -07:00
b19495772e doc: Update uses of the word "procedure"
Historically, the term procedure was used as a synonym for function in
Postgres/PostgreSQL.  Now we have procedures as separate objects from
functions, so we need to clean up the documentation to not mix those
terms.

In particular, mentions of "trigger procedures" are changed to "trigger
functions", and access method "support procedures" are changed to
"support functions".  (The latter already used FUNCTION in the SQL
syntax anyway.)  Also, the terminology in the SPI chapter has been
cleaned up.

A few tests, examples, and code comments are also adjusted to be
consistent with documentation changes, but not everything.

Reported-by: Peter Geoghegan <pg@bowt.ie>
Reviewed-by: Jonathan S. Katz <jonathan.katz@excoventures.com>
2018-08-22 14:44:49 +02:00
a22445ff0b Flip argument order in XLogSegNoOffsetToRecPtr
Commit fc49e24fa69a added an input argument after the existing output
argument.  Flip those.

Author: Álvaro Herrera <alvherre@alvh.no-ip.org>
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/20180708182345.imdgovmkffgtihhk@alvherre.pgsql
2018-07-09 14:33:38 -04:00
1e9c858090 pgindent run prior to branching 2018-06-30 12:25:49 -04:00
8121ab88e7 Cosmetic improvements for faster column addition.
Changed the name of few structure members for the sake of clarity and
removed spurious whitespace.

Reported-by: Amit Kapila
Author: Amit Kapila, based on suggestion by Andrew Dunstan
Reviewed-by: Alvaro Herrera
Discussion: https://postgr.es/m/CAA4eK1K2znsFpC+NQ9A4vxT4uDxADN4RmvHX0L6Y=aHVo9gB4Q@mail.gmail.com
2018-06-27 08:16:13 +05:30
08186dc05b Move _bt_upgrademetapage() into critical section.
Any changes on page should be done in critical section, so move
_bt_upgrademetapage into critical section. Improve comment. Found by Amit
Kapila during post-commit review of 857f9c36.

Author: Amit Kapila
2018-05-30 19:45:39 +03:00
0668719801 Fix scenario where streaming standby gets stuck at a continuation record.
If a continuation record is split so that its first half has already been
removed from the master, and is only present in pg_wal, and there is a
recycled WAL segment in the standby server that looks like it would
contain the second half, recovery would get stuck. The code in
XLogPageRead() incorrectly started streaming at the beginning of the
WAL record, even if we had already read the first page.

Backpatch to 9.4. In principle, older versions have the same problem, but
without replication slots, there was no straightforward mechanism to
prevent the master from recycling old WAL that was still needed by standby.
Without such a mechanism, I think it's reasonable to assume that there's
enough slack in how many old segments are kept around to not run into this,
or you have a WAL archive.

Reported by Jonathon Nelson. Analysis and patch by Kyotaro HORIGUCHI, with
some extra comments by me.

Discussion: https://www.postgresql.org/message-id/CACJqAM3xVz0JY1XFDKPP%2BJoJAjoGx%3DGNuOAshEDWCext7BFvCQ%40mail.gmail.com
2018-05-05 01:34:53 +03:00
0bef1c0678 Re-think predicate locking on GIN indexes.
The principle behind the locking was not very well thought-out, and not
documented. Add a section in the README to explain how it's supposed to
work, and change the code so that it actually works that way.

This fixes two bugs:

1. If fast update was turned on concurrently, subsequent inserts to the
   pending list would not conflict with predicate locks that were acquired
   earlier, on entry pages. The included 'predicate-gin-fastupdate' test
   demonstrates that. To fix, make all scans acquire a predicate lock on
   the metapage. That lock represents a scan of the pending list, whether
   or not there is a pending list at the moment. Forget about the
   optimization to skip locking/checking for locks, when fastupdate=off.
2. If a scan finds no match, it still needs to lock the entry page. The
   point of predicate locks is to lock the gabs between values, whether
   or not there is a match. The included 'predicate-gin-nomatch' test
   tests that case.

In addition to those two bug fixes, this removes some unnecessary locking,
following the principle laid out in the README. Because all items in
a posting tree have the same key value, a lock on the posting tree root is
enough to cover all the items. (With a very large posting tree, it would
possibly be better to lock the posting tree leaf pages instead, so that a
"skip scan" with a query like "A & B", you could avoid unnecessary conflict
if a new tuple is inserted with A but !B. But let's keep this simple.)

Also, some spelling  fixes.

Author: Heikki Linnakangas with some editorization by me
Review: Andrey Borodin, Alexander Korotkov
Discussion: https://www.postgresql.org/message-id/0b3ad2c2-2692-62a9-3a04-5724f2af9114@iki.fi
2018-05-04 11:27:50 +03:00
1667148a4d Improve representation of 'moved partitions' indicator on deleted tuples.
Previously a tuple that has been moved to a different partition (see
f16241bef7c), set the block number on the old tuple to an invalid
value to indicate that fact. But the tuple offset was left
untouched. That turned out to trigger a wal_consistency_checking
failure as reported by Peter Geoghegan, as the offset wasn't
always overwritten during WAL replay.

Heikki observed that we're wasting valuable data by not putting
information also in the offset. Thus set that to
MovedPartitionsOffsetNumber when a tuple indicates it has moved.

We continue to set the block number to MovedPartitionsBlockNumber, as
that seems more likely to cause problems for code not updated to know
about moved tuples.

As t_ctid's offset number is now always set, this refinement also
fixes the wal_consistency_checking issue.

This technically is a minor disk format break, with previously created
moved tuples not being recognized anymore. But since there not even
has been a beta release since f16241bef7c...

Reported-By: Peter Geoghegan
Author: Heikki Linnakangas, Amul Sul
Discussion: https://postgr.es/m/CAH2-Wzm9ty+1BX7-GMNJ=xPRg67oJTVeDNdA9LSyJJtMgRiCMA@mail.gmail.com
2018-05-01 13:30:12 -07:00
bdf46af748 Post-feature-freeze pgindent run.
Discussion: https://postgr.es/m/15719.1523984266@sss.pgh.pa.us
2018-04-26 14:47:16 -04:00
6db4b49986 Fix wrong validation of top-parent pointer during page deletion in Btree.
After introducing usage of t_tid of inner or page high key for storing
number of attributes of tuple, validation of tuple's ItemPointer with
ItemPointerIsValid becomes incorrect, it's need to validate only blocknumber of
ItemPointer. Missing this causes a incorrect page deletion, fix that. Test is
added.

BTW, current contrib/amcheck doesn't fail on index corrupted by this way.

Also introduce BTreeTupleGetTopParent/BTreeTupleSetTopParent macroses to improve
code readability and to avoid possible confusion with page high key: high key
is used to store top-parent link for branch to remove.

Bug found by Michael Paquier, but bug doesn't exist in previous versions because
t_tid was set to P_HIKEY.

Author: Teodor Sigaev
Reviewer: Peter Geoghegan
Discussion: https://www.postgresql.org/message-id/flat/20180419052436.GA16000%40paquier.xyz
2018-04-23 15:55:10 +03:00
ff4943042f Fix datatype for number of heap tuples during last cleanup
It appears that new fields introduced in 857f9c36 have inconsistent datatypes:
BTMetaPageData.btm_last_cleanup_num_heap_tuples is of float4 type,
while xl_btree_metadata.last_cleanup_num_heap_tuples is of double type.
IndexVacuumInfo.num_heap_tuples, which is a source of values for
both former fields is of double type.  So, make both those fields in
BTMetaPageData and xl_btree_metadata use float8 type in order to match the
precision of the source.  That shouldn't be double type, because we always
use types with explicit width in WAL.

Patch introduces incompatibility of on-disk format since 857f9c36 commit, but
that versions never was released, so just bump catalog version to avoid
possible confusion.

Author: Alexander Korortkov
2018-04-19 11:28:03 +03:00
075aade436 Adjust INCLUDE index truncation comments and code.
Add several assertions that ensure that we're dealing with a pivot tuple
without non-key attributes where that's expected.  Also, remove the
assertion within _bt_isequal(), restoring the v10 function signature.  A
similar check will be performed for the page highkey within
_bt_moveright() in most cases.  Also avoid dropping all objects within
regression tests, to increase pg_dump test coverage for INCLUDE indexes.

Rather than using infrastructure that's generally intended to be used
with reference counted heap tuple descriptors during truncation, use the
same function that was introduced to store flat TupleDescs in shared
memory (we use a temp palloc'd buffer).  This isn't strictly necessary,
but seems more future-proof than the old approach.  It also lets us
avoid including rel.h within indextuple.c, which was arguably a
modularity violation.  Also, we now call index_deform_tuple() with the
truncated TupleDesc, not the source TupleDesc, since that's more robust,
and saves a few cycles.

In passing, fix a memory leak by pfree'ing truncated pivot tuple memory
during CREATE INDEX.  Also pfree during a page split, just to be
consistent.

Refactor _bt_check_natts() to be more readable.

Author: Peter Geoghegan with some editorization by me
Reviewed by: Alexander Korotkov, Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/CAH2-Wz%3DkCWuXeMrBCopC-tFs3FbiVxQNjjgNKdG2sHxZ5k2y3w%40mail.gmail.com
2018-04-19 08:45:58 +03:00
cf5a189059 Fix confusion on the padding of GIDs in on commit and abort records.
Review of commit 1eb6d652: It's pointless to add padding to the GID fields,
when the code that follows assumes that there is no alignment, and uses
memcpy(). Remove the pointless padding.

Update comments to note the new fields in the WAL records.

Reviewed-by: Michael Paquier
Discussion: https://www.postgresql.org/message-id/33b787bf-dc20-1161-54e9-3f3b607bf59d%40iki.fi
2018-04-17 16:10:42 -04:00
2a67d6440d Add commentary explaining why MaxIndexTuplesPerPage calculation is safe.
MaxIndexTuplesPerPage ignores the fact that btree indexes sometimes
store tuples with no data payload.  But it also ignores the possibility
of "special space" on index pages, which offsets that, so that the
result isn't an underestimate.  This all seems worth documenting, though.

In passing, remove #define MinIndexTupleSize, which was added by
commit 2c03216d8 but not used in that commit nor later ones.

Comment text by me; issue noticed by Peter Geoghegan.

Discussion: https://postgr.es/m/CAH2-WzkQmb54Kbx-YHXstRKXcNc+_87jwV3DRb54xcybLR7Oig@mail.gmail.com
2018-04-14 12:33:15 -04:00
08ea7a2291 Revert MERGE patch
This reverts commits d204ef63776b8a00ca220adec23979091564e465,
83454e3c2b28141c0db01c7d2027e01040df5249 and a few more commits thereafter
(complete list at the end) related to MERGE feature.

While the feature was fully functional, with sufficient test coverage and
necessary documentation, it was felt that some parts of the executor and
parse-analyzer can use a different design and it wasn't possible to do that in
the available time. So it was decided to revert the patch for PG11 and retry
again in the future.

Thanks again to all reviewers and bug reporters.

List of commits reverted, in reverse chronological order:

 f1464c5380 Improve parse representation for MERGE
 ddb4158579 MERGE syntax diagram correction
 530e69e59b Allow cpluspluscheck to pass by renaming variable
 01b88b4df5 MERGE minor errata
 3af7b2b0d4 MERGE fix variable warning in non-assert builds
 a5d86181ec MERGE INSERT allows only one VALUES clause
 4b2d44031f MERGE post-commit review
 4923550c20 Tab completion for MERGE
 aa3faa3c7a WITH support in MERGE
 83454e3c2b New files for MERGE
 d204ef6377 MERGE SQL Command following SQL:2016

Author: Pavan Deolasee
Reviewed-by: Michael Paquier
2018-04-12 11:22:56 +01:00
a228cc13ae Revert "Allow on-line enabling and disabling of data checksums"
This reverts the backend sides of commit 1fde38beaa0c3e66c340efc7cc0dc272d6254bb0.
I have, at least for now, left the pg_verify_checksums tool in place, as
this tool can be very valuable without the rest of the patch as well,
and since it's a read-only tool that only runs when the cluster is down
it should be a lot safer.
2018-04-09 19:03:42 +02:00