This is following in a long train of similar changes and for the same
reasons - see b319356f0e94a6482c726cf4af96597c211d8d6e and
fe702a7b3f9f2bc5bf6d173166d7d55226af82c8 inter alia.
Author: Amit Kapila
Reviewed-by: Alexander Korotkov, Robert Haas
Previously, each PGPROC's backendLock was part of the main tranche,
and the PGPROC just contained a pointer. Now, the actual LWLock is
part of the PGPROC.
As with previous, similar patches, this makes it significantly easier
to identify these lwlocks in LWLOCK_STATS or Trace_lwlocks output
and improves modularity.
Author: Ildus Kurbangaliev
Reviewed-by: Amit Kapila, Robert Haas
Move the content lock directly into the BufferDesc, so that locking and
pinning a buffer touches only one cache line rather than two. Adjust
the definition of BufferDesc slightly so that this doesn't make the
BufferDesc any larger than one cache line (at least on platforms where
a spinlock is only 1 or 2 bytes).
We can't fit the I/O locks into the BufferDesc and stay within one
cache line, so move those to a completely separate tranche. This
leaves a relatively limited number of LWLocks in the main tranche, so
increase the padding of those remaining locks to a full cache line,
rather than allowing adjacent locks to share a cache line, hopefully
reducing false sharing.
Performance testing shows that these changes make little difference
on laptop-class machines, but help significantly on larger servers,
especially those with more than 2 sockets.
Andres Freund, originally based on an earlier patch by Simon Riggs.
Review and cosmetic adjustments (including heavy rewriting of the
comments) by me.
It's a bit cumbersome to use LWLockNewTrancheId(), because the returned
value needs to be shared between backends so that each backend can call
LWLockRegisterTranche() with the correct ID. So, for built-in tranches,
use a hard-coded value instead.
This is motivated by an upcoming patch adding further built-in tranches.
Andres Freund and Robert Haas
At the end of crash recovery, unlogged relations are reset to the empty
state, using their init fork as the template. The init fork is copied to
the main fork without going through shared buffers. Unfortunately WAL
replay so far has not necessarily flushed writes from shared buffers to
disk at that point. In normal crash recovery, and before the
introduction of 'fast promotions' in fd4ced523 / 9.3, the
END_OF_RECOVERY checkpoint flushes the buffers out in time. But with
fast promotions that's not the case anymore.
To fix, force WAL writes targeting the init fork to be flushed
immediately (using the new FlushOneBuffer() function). In 9.5+ that
flush can centrally be triggered from the code dealing with restoring
full page writes (XLogReadBufferForRedoExtended), in earlier releases
that responsibility is in the hands of XLOG_HEAP_NEWPAGE's replay
function.
Backpatch to 9.1, even if this currently is only known to trigger in
9.3+. Flushing earlier is more robust, and it is advantageous to keep
the branches similar.
Typical symptoms of this bug are errors like
'ERROR: index "..." contains unexpected zero page at block 0'
shortly after promoting a node.
Reported-By: Thom Brown
Author: Andres Freund and Michael Paquier
Discussion: 20150326175024.GJ451@alap3.anarazel.de
Backpatch: 9.1-
Prior to commit 0709b7ee72e4bc71ad07b7120acd117265ab51d0, access to
variables within a spinlock-protected critical section had to be done
through a volatile pointer, but that should no longer be necessary.
Review by Andres Freund
Postmaster child processes that aren't supposed to be attached to shared
memory were not bothering to close the shared memory mapping handle they
inherit from the postmaster process. That's mostly harmless, since the
handle vanishes anyway when the child process exits -- but the syslogger
process, if used, doesn't get killed and restarted during recovery from a
backend crash. That meant that Windows doesn't see the shared memory
mapping as becoming free, so it doesn't delete it and the postmaster is
unable to create a new one, resulting in failure to recover from crashes
whenever logging_collector is turned on.
Per report from Dmitry Vasilyev. It's a bit astonishing that we'd not
figured this out long ago, since it's been broken from the very beginnings
of out native Windows support; probably some previously-unexplained trouble
reports trace to this.
A secondary problem is that on Cygwin (perhaps only in older versions?),
exec() may not detach from the shared memory segment after all, in which
case these child processes did remain attached to shared memory, posing
the risk of an unexpected shared memory clobber if they went off the rails
somehow. That may be a long-gone bug, but we can deal with it now if it's
still live, by detaching within the infrastructure introduced here to deal
with closing the handle.
Back-patch to all supported branches.
Tom Lane and Amit Kapila
This flag has proven to be a recipe for bugs, and it doesn't seem like
it can really buy anything in terms of performance. So let's just
*always* set the process latch when we receive SIGUSR1 instead of
trying to do it only when needed.
Per my recent proposal on pgsql-hackers.
The shm_mq mechanism was built to send error (and notice) messages and
tuples between backends. However, shm_mq itself only deals in raw
bytes. Since commit 2bd9e412f92bc6a68f3e8bcb18e04955cc35001d, we have
had infrastructure for one message to redirect protocol messages to a
queue and for another backend to parse them and do useful things with
them. This commit introduces a somewhat analogous facility for tuples
by adding a new type of DestReceiver, DestTupleQueue, which writes
each tuple generated by a query into a shm_mq, and a new
TupleQueueFunnel facility which reads raw tuples out of the queue and
reconstructs the HeapTuple format expected by the executor.
The TupleQueueFunnel abstraction supports reading from multiple tuple
streams at the same time, but only in round-robin fashion. Someone
could imaginably want other policies, but this should be good enough
to meet our short-term needs related to parallel query, and we can
always extend it later.
This also makes one minor addition to the shm_mq API that didn'
seem worth breaking out as a separate patch.
Extracted from Amit Kapila's parallel sequential scan patch. This
code was originally written by me, and then it was revised by Amit,
and then it was revised some more by me.
Naming the individual lwlocks seems like something that may be useful
for other types of debugging, monitoring, or instrumentation output,
but this commit just implements it for the specific case of
trace_lwlocks.
Patch by me, reviewed by Amit Kapila and Kyotaro Horiguchi
Per discussion, nowadays it is possible to have tablespaces that have
wildly different I/O characteristics from others. Setting different
effective_io_concurrency parameters for those has been measured to
improve performance.
Author: Julien Rouhaud
Reviewed by: Andres Freund
Post-commit review by Andres Freund discovered a couple of concurrency
bugs in the original patch: specifically, if the leader cleared a
follower's XID before it reached PGSemaphoreLock, the semaphore would be
left in the wrong state; and if another process did PGSemaphoreUnlock
for some unrelated reason, we might resume execution before the fact
that our XID was cleared was globally visible.
Also, improve the wording of some comments, rename nextClearXidElem
to firstClearXidElem in PROC_HDR for clarity, and drop some volatile
qualifiers that aren't necessary.
Amit Kapila, reviewed and slightly revised by me.
Some googling turned up multiple sources saying that older versions of icc
do not accept gcc-compatible asm blocks on IA64, though asm does work on
x86[_64]. This is apparently fixed as of icc version 12.0 or so, but that
doesn't help us much; if we have to carry the extra implementation anyway,
we may as well just use it for icc rather than add a compiler version test.
Hence, revert commit 2c713d6ea29c91cd2cbd92fa801a61e55ea2a3c4 (though I
separated the icc code from the gcc code completely, producing what seems
cleaner code). Document the state of affairs more explicitly, both in
s_lock.h and postgres.c, and make some cosmetic adjustments around the
IA64 code in s_lock.h.
On recent AIX it's necessary to configure gcc to use the native assembler
(because the GNU assembler hasn't been updated to handle AIX 6+). This
caused PG builds to fail with assembler syntax errors, because we'd try
to compile s_lock.h's gcc asm fragment for PPC, and that assembly code
relied on GNU-style local labels. We can't substitute normal labels
because it would fail in any file containing more than one inlined use of
tas(). Fortunately, that code is stable enough, and the PPC ISA is simple
enough, that it doesn't seem like too much of a maintenance burden to just
hand-code the branch offsets, removing the need for any labels.
Note that the AIX assembler only accepts "$" for the location counter
pseudo-symbol. The usual GNU convention is "."; but it appears that all
versions of gas for PPC also accept "$", so in theory this patch will not
break any other PPC platforms.
This has been reported by a few people, but Steve Underwood gets the credit
for being the first to pursue the problem far enough to understand why it
was failing. Thanks also to Noah Misch for additional testing.
Some frontend code like e.g. pg_xlogdump or pg_resetxlog, has to use
backend headers. Unfortunately until now that code includes most of the
locking code. It's generally not nice to expose such low level details,
but de6fd1c898 made that a hard problem. We fall back to defining
'inline' away if the compiler doesn't support it - that can cause linker
errors like on buildfarm animal pademelon if a inline function
references backend only code.
To fix that problem separate definitions from lock.h that are required
from frontend code into lockdefs.h and use it in the relevant
places. I've only removed the minimal amount of necessary definitions
for now - it might turn out that we want more for other reasons.
To avoid such details being exposed again put some checks against being
included from frontend code into atomics.h, lock.h, lwlock.h and
s_lock.h. It's otherwise fairly easy to indirectly include these
headers.
Discussion: 20150806070902.GE12214@awork2.anarazel.de
When a write transaction commits, it must clear its XID advertised via
the ProcArray, which requires that we hold ProcArrayLock in exclusive
mode in order to prevent concurrent processes running GetSnapshotData
from seeing inconsistent results. When many processes try to commit
at once, ProcArrayLock must change hands repeatedly, with each
concurrent process trying to commit waking up to acquire the lock in
turn. To make things more efficient, when more than one backend is
trying to commit a write transaction at the same time, have just one
of them acquire ProcArrayLock in exclusive mode and clear the XIDs of
all processes in the group. Benchmarking reveals that this is much
more efficient at very high client counts.
Amit Kapila, heavily revised by me, with some review also from Pavan
Deolasee.
The lwlock scalability work introduced two race conditions into the
lwlock variable support provided for xlog.c. First, and harmlessly on
most platforms, it set/read the variable without the spinlock in some
places. Secondly, due to the removal of the spinlock, it was possible
that a backend missed changes to the variable's state if it changed in
the wrong moment because checking the lock's state, the variable's state
and the queuing are not protected by a single spinlock acquisition
anymore.
To fix first move resetting the variable's from LWLockAcquireWithVar to
WALInsertLockRelease, via a new function LWLockReleaseClearVar. That
prevents issues around waiting for a variable's value to change when a
new locker has acquired the lock, but not yet set the value. Secondly
re-check that the variable hasn't changed after enqueing, that prevents
the issue that the lock has been released and already re-acquired by the
time the woken up backend checks for the lock's state.
Reported-By: Jeff Janes
Analyzed-By: Heikki Linnakangas
Reviewed-By: Heikki Linnakangas
Discussion: 5592DB35.2060401@iki.fi
Backpatch: 9.5, where the lwlock scalability went in
This code was originally written as part of parallel query effort, but
it seems to have independent value, because if we make one decision
about where to get a PGPROC when we allocate and then put it back on a
different list at backend-exit time, bad things happen. This isn't
just a theoretical risk; we fixed an actual problem of this type in
commit e280c630a87e1b8325770c6073097d109d79a00f.
_Asm_sched_fence() is just a compiler barrier, not a memory barrier. But
spinlock release on IA64 needs, at the very least, release
semantics. Use a full barrier instead.
This might be the cause for the occasional failures on buildfarm member
anole.
Discussion: 20150629101108.GB17640@alap3.anarazel.de
Commit 2ce439f3379aed857517c8ce207485655000fc8e introduced a rather serious
regression, namely that if its scan of the data directory came across any
un-fsync-able files, it would fail and thereby prevent database startup.
Worse yet, symlinks to such files also caused the problem, which meant that
crash restart was guaranteed to fail on certain common installations such
as older Debian.
After discussion, we agreed that (1) failure to start is worse than any
consequence of not fsync'ing is likely to be, therefore treat all errors
in this code as nonfatal; (2) we should not chase symlinks other than
those that are expected to exist, namely pg_xlog/ and tablespace links
under pg_tblspc/. The latter restriction avoids possibly fsync'ing a
much larger part of the filesystem than intended, if the user has left
random symlinks hanging about in the data directory.
This commit takes care of that and also does some code beautification,
mainly moving the relevant code into fd.c, which seems a much better place
for it than xlog.c, and making sure that the conditional compilation for
the pre_sync_fname pass has something to do with whether pg_flush_data
works.
I also relocated the call site in xlog.c down a few lines; it seems a
bit silly to be doing this before ValidateXLOGDirectoryStructure().
The similar logic in initdb.c ought to be made to match this, but that
change is noncritical and will be dealt with separately.
Back-patch to all active branches, like the prior commit.
Abhijit Menon-Sen and Tom Lane
We want this struct to be exactly a series of 3 int16 words, no more
and no less. Historically, at least, some ARM compilers preferred to
pad it to 8 bytes unless coerced. Our old way of doing that was just
to use __attribute__((packed)), but as pointed out by Piotr Stefaniak,
that does too much: it also licenses the compiler to give the struct
only byte-alignment. We don't want that because it adds access overhead,
possibly quite significant overhead. According to the GCC manual, what
we want requires also specifying __attribute__((align(2))). It's not
entirely clear if all the relevant compilers accept this pragma as well,
but we can hope the buildfarm will tell us if not. We can also add a
static assertion that should fire if the compiler padded the struct.
Since the combination of these pragmas should define exactly what we
want on any compiler that accepts them, let's try using them wherever
we think they exist, not only for __arm__. (This is likely to expose
that the conditional definitions in c.h are inadequate, but finding
that out would be a good thing.)
The immediate motivation for this is that the current definition of
ExecRowMark allows its curCtid field to be misaligned. It is not clear
whether there are any other uses of ItemPointerData with a similar hazard.
We could change the definition of ExecRowMark if this doesn't work, but
it would be far better to have a future-proof fix.
Piotr Stefaniak, some further hacking by me
The newly added ON CONFLICT clause allows to specify an alternative to
raising a unique or exclusion constraint violation error when inserting.
ON CONFLICT refers to constraints that can either be specified using a
inference clause (by specifying the columns of a unique constraint) or
by naming a unique or exclusion constraint. DO NOTHING avoids the
constraint violation, without touching the pre-existing row. DO UPDATE
SET ... [WHERE ...] updates the pre-existing tuple, and has access to
both the tuple proposed for insertion and the existing tuple; the
optional WHERE clause can be used to prevent an update from being
executed. The UPDATE SET and WHERE clauses have access to the tuple
proposed for insertion using the "magic" EXCLUDED alias, and to the
pre-existing tuple using the table name or its alias.
This feature is often referred to as upsert.
This is implemented using a new infrastructure called "speculative
insertion". It is an optimistic variant of regular insertion that first
does a pre-check for existing tuples and then attempts an insert. If a
violating tuple was inserted concurrently, the speculatively inserted
tuple is deleted and a new attempt is made. If the pre-check finds a
matching tuple the alternative DO NOTHING or DO UPDATE action is taken.
If the insertion succeeds without detecting a conflict, the tuple is
deemed inserted.
To handle the possible ambiguity between the excluded alias and a table
named excluded, and for convenience with long relation names, INSERT
INTO now can alias its target table.
Bumps catversion as stored rules change.
Author: Peter Geoghegan, with significant contributions from Heikki
Linnakangas and Andres Freund. Testing infrastructure by Jeff Janes.
Reviewed-By: Heikki Linnakangas, Andres Freund, Robert Haas, Simon Riggs,
Dean Rasheed, Stephen Frost and many others.
Otherwise, if there's another crash, some writes from after the first
crash might make it to disk while writes from before the crash fail
to make it to disk. This could lead to data corruption.
Back-patch to all supported versions.
Abhijit Menon-Sen, reviewed by Andres Freund and slightly revised
by me.
This does four basic things. First, it provides convenience routines
to coordinate the startup and shutdown of parallel workers. Second,
it synchronizes various pieces of state (e.g. GUCs, combo CID
mappings, transaction snapshot) from the parallel group leader to the
worker processes. Third, it prohibits various operations that would
result in unsafe changes to that state while parallelism is active.
Finally, it propagates events that would result in an ErrorResponse,
NoticeResponse, or NotifyResponse message being sent to the client
from the parallel workers back to the master, from which they can then
be sent on to the client.
Robert Haas, Amit Kapila, Noah Misch, Rushabh Lathia, Jeevan Chalke.
Suggestions and review from Andres Freund, Heikki Linnakangas, Noah
Misch, Simon Riggs, Euler Taveira, and Jim Nasby.
When implementing a replication solution ontop of logical decoding, two
related problems exist:
* How to safely keep track of replication progress
* How to change replication behavior, based on the origin of a row;
e.g. to avoid loops in bi-directional replication setups
The solution to these problems, as implemented here, consist out of
three parts:
1) 'replication origins', which identify nodes in a replication setup.
2) 'replication progress tracking', which remembers, for each
replication origin, how far replay has progressed in a efficient and
crash safe manner.
3) The ability to filter out changes performed on the behest of a
replication origin during logical decoding; this allows complex
replication topologies. E.g. by filtering all replayed changes out.
Most of this could also be implemented in "userspace", e.g. by inserting
additional rows contain origin information, but that ends up being much
less efficient and more complicated. We don't want to require various
replication solutions to reimplement logic for this independently. The
infrastructure is intended to be generic enough to be reusable.
This infrastructure also replaces the 'nodeid' infrastructure of commit
timestamps. It is intended to provide all the former capabilities,
except that there's only 2^16 different origins; but now they integrate
with logical decoding. Additionally more functionality is accessible via
SQL. Since the commit timestamp infrastructure has also been introduced
in 9.5 (commit 73c986add) changing the API is not a problem.
For now the number of origins for which the replication progress can be
tracked simultaneously is determined by the max_replication_slots
GUC. That GUC is not a perfect match to configure this, but there
doesn't seem to be sufficient reason to introduce a separate new one.
Bumps both catversion and wal page magic.
Author: Andres Freund, with contributions from Petr Jelinek and Craig Ringer
Reviewed-By: Heikki Linnakangas, Petr Jelinek, Robert Haas, Steve Singer
Discussion: 20150216002155.GI15326@awork2.anarazel.de,
20140923182422.GA15776@alap3.anarazel.de,
20131114172632.GE7522@alap2.anarazel.de
In 83ff1618 we defined integer limits iff they're not provided by the
system. That turns out not to be the greatest idea because there's
different ways some datatypes can be represented. E.g. on OSX PG's 64bit
datatype will be a 'long int', but OSX unconditionally uses 'long
long'. That disparity then can lead to warnings, e.g. around printf
formats.
One way to fix that would be to back int64 using stdint.h's
int64_t. While a good idea it's not that easy to implement. We would
e.g. need to include stdint.h in our external headers, which we don't
today. Also computing the correct int64 printf formats in that case is
nontrivial.
Instead simply prefix the integer limits with PG_ and define them
unconditionally. I've adjusted all the references to them in code, but
not the ones in comments; the latter seems unnecessary to me.
Discussion: 20150331141423.GK4878@alap3.anarazel.de
This improves on commit bbfd7edae5aa5ad5553d3c7e102f2e450d4380d4 by
making two simple changes:
* pg_attribute_noreturn now takes parentheses, ie pg_attribute_noreturn().
Likewise pg_attribute_unused(), pg_attribute_packed(). This reduces
pgindent's tendency to misformat declarations involving them.
* attributes are now always attached to function declarations, not
definitions. Previously some places were taking creative shortcuts,
which were not merely candidates for bad misformatting by pgindent
but often were outright wrong anyway. (It does little good to put a
noreturn annotation where callers can't see it.) In any case, if
we would like to believe that these macros can be used with non-gcc
compilers, we should avoid gratuitous variance in usage patterns.
I also went through and manually improved the formatting of a lot of
declarations, and got rid of excessively repetitive (and now obsolete
anyway) comments informing the reader what pg_attribute_printf is for.
Several submitted and even committed patches have run into the problem
that C89, our baseline, does not provide minimum/maximum values for
various integer datatypes. C99's stdint.h does, but we can't rely on
it.
Several parts of the code defined limits locally, so instead centralize
the definitions to c.h.
This patch also changes the more obvious usages of literal limit values;
there's more places that could be changed, but it's less clear whether
it's beneficial to change those.
Author: Andrew Gierth
Discussion: 87619tc5wc.fsf@news-spur.riddles.org.uk
Right now, there's only one flag, DSM_CREATE_NULL_IF_MAXSEGMENTS,
which suppresses the error that would normally be thrown when the
maximum number of segments already exists, instead returning NULL.
It might be useful to add more flags in the future, such as one to
ignore allocation errors, but I haven't done that here.
Until now __attribute__() was defined to be empty for all compilers but
gcc. That's problematic because it prevents using it in other compilers;
which is necessary e.g. for atomics portability. It's also just
generally dubious to do so in a header as widely included as c.h.
Instead add pg_attribute_format_arg, pg_attribute_printf,
pg_attribute_noreturn macros which are implemented in the compilers that
understand them. Also add pg_attribute_noreturn and pg_attribute_packed,
but don't provide fallbacks, since they can affect functionality.
This means that external code that, possibly unwittingly, relied on
__attribute__ defined to be empty on !gcc compilers may now run into
warnings or errors on those compilers. But there shouldn't be many
occurances of that and it's hard to work around...
Discussion: 54B58BA3.8040302@ohmu.fi
Author: Oskari Saarenmaa, with some minor changes by me.
Replace some bogus "x[1]" declarations with "x[FLEXIBLE_ARRAY_MEMBER]".
Aside from being more self-documenting, this should help prevent bogus
warnings from static code analyzers and perhaps compiler misoptimizations.
This patch is just a down payment on eliminating the whole problem, but
it gets rid of a lot of easy-to-fix cases.
Note that the main problem with doing this is that one must no longer rely
on computing sizeof(the containing struct), since the result would be
compiler-dependent. Instead use offsetof(struct, lastfield). Autoconf
also warns against spelling that offsetof(struct, lastfield[0]).
Michael Paquier, review and additional fixes by me.
The remaining caller (lwlocks) doesn't need that facility, and we plan
to remove ImmedidateInterruptOK entirely. That means that interrupts
can't be serviced race-free and portably anyway, so there's little
reason for keeping the feature.
Reviewed-By: Heikki Linnakangas
Deadlock checking was performed inside signal handlers up to
now. While it's a remarkable feat to have made this work reliably,
it's quite complex to understand why that is the case. Partially it
worked due to the assumption that semaphores are signal safe - which
is not actually documented to be the case for sysv semaphores.
The reason we had to rely on performing this work inside signal
handlers is that semaphores aren't guaranteed to be interruptable by
signals on all platforms. But now that latches provide a somewhat
similar API, which actually has the guarantee of being interruptible,
we can avoid doing so.
Signalling between ProcSleep, ProcWakeup, ProcWaitForSignal and
ProcSendSignal is now done using latches. This increases the
likelihood of spurious wakeups. As spurious wakeup already were
possible and aren't likely to be frequent enough to be an actual
problem, this seems acceptable.
This change would allow for further simplification of the deadlock
checking, now that it doesn't have to run in a signal handler. But
even if I were motivated to do so right now, it would still be better
to do that separately. Such a cleanup shouldn't have to be reviewed a
the same time as the more fundamental changes in this commit.
There is one possible usability regression due to this commit. Namely
it is more likely than before that log_lock_waits messages are output
more than once.
Reviewed-By: Heikki Linnakangas
Up to now large swathes of backend code ran inside signal handlers
while reading commands from the client, to allow for speedy reaction to
asynchronous events. Most prominently shared invalidation and NOTIFY
handling. That means that complex code like the starting/stopping of
transactions is run in signal handlers... The required code was
fragile and verbose, and is likely to contain bugs.
That approach also severely limited what could be done while
communicating with the client. As the read might be from within
openssl it wasn't safely possible to trigger an error, e.g. to cancel
a backend in idle-in-transaction state. We did that in some cases,
namely fatal errors, nonetheless.
Now that FE/BE communication in the backend employs non-blocking
sockets and latches to block, we can quite simply interrupt reads from
signal handlers by setting the latch. That allows us to signal an
interrupted read, which is supposed to be retried after returning from
within the ssl library.
As signal handlers now only need to set the latch to guarantee timely
interrupt processing, remove a fair amount of complicated & fragile
code from async.c and sinval.c.
We could now actually start to process some kinds of interrupts, like
sinval ones, more often that before, but that seems better done
separately.
This work will hopefully allow to handle cases like being blocked by
sending data, interrupting idle transactions and similar to be
implemented without too much effort. In addition to allowing getting
rid of ImmediateInterruptOK, that is.
Author: Andres Freund
Reviewed-By: Heikki Linnakangas
Benchmarks has shown that aligning the buffer descriptor array to
cache lines is important for scalability; especially on bigger,
multi-socket, machines.
Currently the array sometimes already happens to be aligned by
happenstance, depending how large previous shared memory allocations
were. That can lead to wildly varying performance results after minor
configuration changes.
In addition to aligning the start of descriptor array, also force the
size of individual descriptors to be of a common cache line size (64
bytes). That happens to already be the case on 64bit platforms, but
this way we can change the struct BufferDesc more easily.
As the alignment primarily matters in highly concurrent workloads
which probably all are 64bit these days, and the space wastage of
element alignment would be a bit more noticeable on 32bit systems, we
don't force the stride to be cacheline sized on 32bit platforms for
now. If somebody does actual performance testing, we can reevaluate
that decision by changing the definition of BUFFERDESC_PADDED_SIZE.
Discussion: 20140202151319.GD32123@awork2.anarazel.de
Per discussion with Bruce Momjan, Tom Lane, Robert Haas, and Peter
Geoghegan.
The possibly, depending on compiler settings, generated warning was
"warning: `S_UNLOCK' redefined".
The hppa spinlock implementation doesn't follow the rules of s_lock.h
and provides a gcc specific implementation outside of the the part of
the file that's supposed to do that. It does so to avoid duplication
between the HP compiler and gcc. That unfortunately means that
S_UNLOCK is already defined when the HPPA specific section is reached.
Undefine the generic fallback S_UNLOCK definition inside the HPPA
section. That's far from pretty, but has the big advantage of being
simple. If somebody is interested to fix this in a prettier way...
This presumably got broken in the course of 0709b7ee72.
Discussion: 20150114225919.GY5245@awork2.anarazel.de
Per complaint from Tom Lane.
To do so, move InitializeLatchSupport() into the new common process
initialization functions, and add a new global variable MyLatch.
MyLatch is usable as soon InitPostmasterChild() has been called
(i.e. very early during startup). Initially it points to a process
local latch that exists in all processes. InitProcess/InitAuxiliaryProcess
then replaces that local latch with PGPROC->procLatch. During shutdown
the reverse happens.
This is primarily advantageous for two reasons: For one it simplifies
dealing with the shared process latch, especially in signal handlers,
because instead of having to check for MyProc, MyLatch can be used
unconditionally. For another, a later patch that makes FEs/BE
communication use latches, now can rely on the existence of a latch,
even before having gone through InitProcess.
Discussion: 20140927191243.GD5423@alap3.anarazel.de