Commit Graph

11456 Commits

850d341ff7 Remove arbitrary limitation on length of common name in SSL certificates.
Both libpq and the backend would truncate a common name extracted from a
certificate at 32 bytes.  Replace that fixed-size buffer with a dynamically
allocated string so that there is no hard limit.  While at it, remove the
code for extracting peer_dn, which we weren't using for anything; and
don't bother to store peer_cn longer than we need it in libpq.

This limit was not so terribly unreasonable when the code was written,
because we weren't using the result for anything critical, just logging it.
But now that there are options for checking the common name against the
server host name (in libpq) or using it as the user's name (in the server),
this could result in undesirable failures.  In the worst case it even seems
possible to spoof a server name or user name, if the correct name is
exactly 32 bytes and the attacker can persuade a trusted CA to issue a
certificate in which that string is a prefix of the certificate's common
name.  (To exploit this for a server name, he'd also have to send the
connection astray via phony DNS data or some such.)  The case for this
being a realistic security threat is a bit thin, but nonetheless we'll
treat it as one.

Back-patch to 8.4.  Older releases contain the faulty code, but it's not
a security problem because the common name wasn't used for anything
interesting.

Reported and patched by Heikki Linnakangas

Security: CVE-2012-0867
2012-02-23 15:48:14 -05:00
de323d534c Require execute permission on the trigger function for CREATE TRIGGER.
This check was overlooked when we added function execute permissions to the
system years ago.  For an ordinary trigger function it's not a big deal,
since trigger functions execute with the permissions of the table owner,
so they couldn't do anything the user issuing the CREATE TRIGGER couldn't
have done anyway.  However, if a trigger function is SECURITY DEFINER,
that is not the case.  The lack of checking would allow another user to
install it on his own table and then invoke it with, essentially, forged
input data, which the trigger function is unlikely to detect; it might then
do something undesirable, for instance inserting false entries in an audit
log table.
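
As a hedged illustration of the new check (all object names, including the
audit_log table, are hypothetical):

	-- as role alice: a SECURITY DEFINER trigger function, not granted out
	CREATE FUNCTION audit_trig() RETURNS trigger
	    LANGUAGE plpgsql SECURITY DEFINER AS $$
	    BEGIN
	        INSERT INTO audit_log VALUES (now(), TG_TABLE_NAME);
	        RETURN NEW;
	    END;
	    $$;
	REVOKE EXECUTE ON FUNCTION audit_trig() FROM PUBLIC;

	-- as role bob, on bob's own table: now rejected
	CREATE TRIGGER t BEFORE INSERT ON bobs_table
	    FOR EACH ROW EXECUTE PROCEDURE audit_trig();
	-- ERROR:  permission denied for function audit_trig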

Reported by Dinesh Kumar, patch by Robert Haas

Security: CVE-2012-0866
2012-02-23 15:39:07 -05:00
144fcf754f Translation updates 2012-02-23 20:36:36 +02:00
31f26140b3 Remove inappropriate quotes
And adjust wording for consistency.
2012-02-23 12:51:33 +02:00
140766dff6 REASSIGN OWNED: Support foreign data wrappers and servers
This was overlooked when implementing those kinds of objects, in commit
cae565e503c42a0942ca1771665243b4453c5770.

Per report from Pawel Casperek.
2012-02-22 17:32:23 -03:00
315cb2f967 Correctly initialise shared recoveryLastRecPtr in recovery.
Previously we used ReadRecPtr rather than EndRecPtr, which was
not a serious error but caused pg_stat_replication to report
incorrect replay_location until at least one WAL record had been replayed.

Fujii Masao
2012-02-22 13:55:04 +00:00
e0eb63238a Don't clear btpo_cycleid during _bt_vacuum_one_page.
When "vacuuming" a single btree page by removing LP_DEAD tuples, we are not
actually within a vacuum operation, but rather in an ordinary insertion
process that could well be running concurrently with a vacuum.  So clearing
the cycleid is incorrect, and could cause the concurrent vacuum to miss
removing tuples that it needs to remove.  This is a longstanding bug
introduced by commit e6284649b9e30372b3990107a082bc7520325676 of
2006-07-25.  I believe it explains Maxim Boguk's recent report of index
corruption, and probably some other previously unexplained reports.

In 9.0 and up this is a one-line fix; before that we need to introduce a
flag to tell _bt_delitems what to do.
2012-02-21 15:03:50 -05:00
d1ed3363f6 Avoid double close of file handle in syslogger on win32
This causes an exception when running under a debugger or in particular
when running on a debug version of Windows.

Patch from MauMau
2012-02-21 17:14:14 +01:00
1fee1bf042 Fix regex back-references that are directly quantified with *.
The syntax "\n*", that is, a backref with a * quantifier directly applied
to it, has never worked correctly in Spencer's library.  This has been an
open bug in the Tcl bug tracker since 2005:
https://sourceforge.net/tracker/index.php?func=detail&aid=1115587&group_id=10894&atid=110894

The core of the problem is in parseqatom(), which first changes "\n*" to
"\n+|" and then applies repeat() to the NFA representing the backref atom.
repeat() thinks that any arc leading into its "rp" argument is part of the
sub-NFA to be repeated.  Unfortunately, since parseqatom() already created
the arc that was intended to represent the empty bypass around "\n+", this
arc gets moved too, so that it now leads into the state loop created by
repeat().  Thus, what was supposed to be an "empty" bypass gets turned into
something that represents zero or more repetitions of the NFA representing
the backref atom.  In the original example, in place of
	^([bc])\1*$
we now have something that acts like
	^([bc])(\1+|[bc]*)$
At runtime, the branch involving the actual backref fails, as it's supposed
to, but then the other branch succeeds anyway.
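
A concrete check of that behavior (a hypothetical psql session; if
standard_conforming_strings is off, double the backslashes):

	SELECT 'bcb' ~ '^([bc])\1*$';  -- correct: f; the buggy code returned t,
	                               -- because the spurious [bc]* branch matched
	SELECT 'bbb' ~ '^([bc])\1*$';  -- t either way: \1* repeats the captured 'b'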

We could no doubt fix this by some rearrangement of the operations in
parseqatom(), but that code is plenty ugly already, and what's more the
whole business of converting "x*" to "x+|" probably needs to go away to fix
another problem I'll mention in a moment.  Instead, this patch suppresses
the *-conversion when the target is a simple backref atom, leaving the case
of m == 0 to be handled at runtime.  This makes the patch in regcomp.c a
one-liner, at the cost of having to tweak cbrdissect() a little.  In the
event I went a bit further than that and rewrote cbrdissect() to check all
the string-length-related conditions before it starts comparing characters.
It seems a bit stupid to possibly iterate through many copies of an
n-character backreference, only to fail at the end because the target
string's length isn't a multiple of n --- we could have found that out
before starting.  The existing coding could only be a win if integer
division is hugely expensive compared to character comparison, but I don't
know of any modern machine where that might be true.

This does not fix all the problems with quantified back-references.  In
particular, the code is still broken for back-references that appear within
a larger expression that is quantified (so that direct insertion of the
quantification limits into the BACKREF node doesn't apply).  I think fixing
that will take some major surgery on the NFA code, specifically introducing
an explicit iteration node type instead of trying to transform iteration
into concatenation of modified regexps.

Back-patch to all supported branches.  In HEAD, also add a regression test
case for this.  (It may seem a bit silly to create a regression test file
for just one test case; but I'm expecting that we will soon import a whole
bunch of regex regression tests from Tcl, so might as well create the
infrastructure now.)
2012-02-20 00:52:49 -05:00
cc0f6fff1a Fix postmaster to attempt restart after a hot-standby crash.
The postmaster was coded to treat any unexpected exit of the startup
process (i.e., the WAL replay process) as a catastrophic crash, and not try
to restart it. This was OK so long as the startup process could not have
any sibling postmaster children.  However, if a hot-standby backend
crashes, we SIGQUIT the startup process along with everything else, and the
resulting exit is hardly "unexpected".  Treating it as such meant we failed
to restart a standby server after any child crash at all, not only a crash
of the WAL replay process as intended.  Adjust that.  Back-patch to 9.0
where hot standby was introduced.
2012-02-06 15:29:52 -05:00
07e36415e5 Avoid throwing ERROR during WAL replay of DROP TABLESPACE.
Although we will not even issue an XLOG_TBLSPC_DROP WAL record unless
removal of the tablespace's directories succeeds, that does not guarantee
that the same operation will succeed during WAL replay.  Foreseeable
reasons for it to fail include temp files created in the tablespace by Hot
Standby backends, wrong directory permissions on a standby server, and so on.
The original coding threw ERROR if replay failed to remove the directories,
but that is a serious overreaction.  Throwing an error aborts recovery,
and, worse, means that manual intervention will be needed to get the database
to start again, since otherwise the same error will recur on subsequent
attempts to replay the same WAL record.  And the consequence of failing to
remove the directories is only that some probably-small amount of disk
space is wasted, so it hardly seems justified to throw an error.
Accordingly, arrange to report such failures as LOG messages and keep going
when a failure occurs during replay.

Back-patch to 9.0 where Hot Standby was introduced.  In principle such
problems can occur in earlier releases, but Hot Standby increases the odds
of trouble significantly.  Given the lack of field reports of such issues,
I'm satisfied with patching back as far as the patch applies easily.
2012-02-06 14:44:10 -05:00
852a5cded3 Avoid problems with OID wraparound during WAL replay.
Fix a longstanding thinko in replay of NEXTOID and checkpoint records: we
tried to advance nextOid only if it was behind the value in the WAL record,
but the comparison would draw the wrong conclusion if OID wraparound had
occurred since the previous value.  Better to just unconditionally assign
the new value, since OID assignment shouldn't be happening during replay
anyway.

The consequences of a failure to update nextOid would be pretty minimal,
since we have long had the code set up to obtain another OID and try again
if the generated value is already in use.  But in the worst case there
could be significant performance glitches while such loops iterate through
many already-used OIDs before finding a free one.

The odds of a wraparound happening during WAL replay would be small in a
crash-recovery scenario, and the length of any ensuing OID-assignment stall
quite limited anyway.  But neither of these statements holds true for a
replication slave that follows a WAL stream for a long period; its behavior
upon going live could be almost unboundedly bad.  Hence it seems worth
back-patching this fix into all supported branches.

Already fixed in HEAD in commit c6d76d7c82ebebb7210029f7382c0ebe2c558bca.
2012-02-06 13:14:52 -05:00
2b196f01ef Fix transient clobbering of shared buffers during WAL replay.
RestoreBkpBlocks was in the habit of zeroing and refilling the target
buffer; which was perfectly safe when the code was written, but is unsafe
during Hot Standby operation.  The reason is that we have coding rules
that allow backends to continue accessing a tuple in a heap relation while
holding only a pin on its buffer.  Such a backend could see transiently
zeroed data, if WAL replay had occasion to change other data on the page.
This has been shown to be the cause of bug #6425 from Duncan Rance (who
deserves kudos for developing a sufficiently-reproducible test case) as
well as Bridget Frey's re-report of bug #6200.  It most likely explains the
original report as well, though we don't yet have confirmation of that.

To fix, change the code so that only bytes that are supposed to change will
change, even transiently.  This actually saves cycles in RestoreBkpBlocks,
since it's not writing the same bytes twice.

Also fix seq_redo, which has the same disease, though it has to work a bit
harder to meet the requirement.

So far as I can tell, no other WAL replay routines have this type of bug.
In particular, the index-related replay routines, which would certainly be
broken if they had to meet the same standard, are not at risk because we
do not have coding rules that allow access to an index page when not
holding a buffer lock on it.

Back-patch to 9.0 where Hot Standby was added.
2012-02-05 15:49:46 -05:00
a286b6f6c7 Resolve timing issue with logging locks for Hot Standby.
We log AccessExclusiveLocks for replay onto standby nodes,
but because of timing issues in the ProcArray it is possible to
log a lock that is still held by a just-committed transaction
that is very soon to be removed.  To sidestep the timing issue, we
avoid applying locks taken by transactions with InvalidXid.

Simon Riggs; bug report by Tom Lane, diagnosis by Pavan Deolasee
2012-02-01 09:33:16 +00:00
8bff6407ba Accept a non-existent value in "ALTER USER/DATABASE SET ..." command.
When the default_text_search_config, default_tablespace, or temp_tablespaces
setting is set per-user or per-database with an "ALTER USER/DATABASE SET
..." statement, don't throw an error if the text search configuration or
tablespace does not exist.  In the case of a text search configuration, even
if it doesn't exist in the current database, it might exist in another
database, where the setting is intended to have its effect.  This behavior
is now the same as search_path's.

Tablespaces are cluster-wide, so the same argument doesn't hold for
tablespaces, but there's a problem with pg_dumpall: it dumps "ALTER USER
SET ..." statements before the "CREATE TABLESPACE" statements. Arguably
that's pg_dumpall's fault - it should dump the statements in such an order
that the tablespace is created first and then the "ALTER USER SET
default_tablespace ..." statements after that - but it seems better to be
consistent with search_path and default_text_search_config anyway. Besides,
you could still create a dump that throws an error, by creating the
tablespace, running "ALTER USER SET default_tablespace", then dropping the
tablespace and running pg_dumpall on that.
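
For example (hypothetical names), both of these are now accepted even if the
referenced objects don't exist yet, the values being checked only when they
take effect:

	ALTER USER alice SET default_tablespace = 'ts_not_created_yet';
	ALTER DATABASE appdb SET default_text_search_config = 'public.my_config';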

Backpatch to all supported versions.
2012-01-30 11:39:26 +02:00
e5f97c5f81 Fix CLUSTER/VACUUM FULL for toast values owned by recently-updated rows.
In commit 7b0d0e9356963d5c3e4d329a917f5fbb82a2ef05, I made CLUSTER and
VACUUM FULL try to preserve toast value OIDs from the original toast table
to the new one.  However, if we have to copy both live and recently-dead
versions of a row that has a toasted column, those versions may well
reference the same toast value with the same OID.  The patch then led to
duplicate-key failures as we tried to insert the toast value twice with the
same OID.  (The previous behavior was not very desirable either, since it
would have silently inserted the same value twice with different OIDs.
That wastes space, but what's worse is that the toast values inserted for
already-dead heap rows would not be reclaimed by subsequent ordinary
VACUUMs, since they go into the new toast table marked live, not deleted.)

To fix, check if the copied OID already exists in the new toast table, and
if so, assume that it stores the desired value.  This is reasonably safe
since the only case where we will copy an OID from a previous toast pointer
is when toast_insert_or_update was given that toast pointer and so we just
pulled the data from the old table; if we got two different values that way
then we have big problems anyway.  We do have to assume that no other
backend is inserting items into the new toast table concurrently, but
that's surely safe for CLUSTER and VACUUM FULL.
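
A sketch of the failure scenario (hypothetical names; a concurrent session
must hold an open snapshot so that the old row version counts as recently
dead rather than removable):

	CREATE TABLE t (id int PRIMARY KEY, blob text);
	ALTER TABLE t ALTER COLUMN blob SET STORAGE EXTERNAL;  -- force out-of-line
	INSERT INTO t VALUES (1, repeat('x', 100000));
	UPDATE t SET id = 2;  -- old and new row versions share the toast value OID
	VACUUM FULL t;        -- formerly could fail with a duplicate-key error on
	                      -- the new toast table's index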

Per bug #6393 from Maxim Boguk.  Back-patch to 9.0, same as the previous
patch.
2012-01-12 16:40:24 -05:00
c024a3b3be Make executor's SELECT INTO code save and restore original tuple receiver.
As previously coded, the QueryDesc's dest pointer was left dangling
(pointing at an already-freed receiver object) after ExecutorEnd.  It's a
bit astonishing that it took us this long to notice, and I'm not sure that
the known problem case with SQL functions is the only one.  Fix it by
saving and restoring the original receiver pointer, which seems the most
bulletproof way of ensuring any related bugs are also covered.

Per bug #6379 from Paul Ramsey.  Back-patch to 8.4 where the current
handling of SELECT INTO was introduced.
2012-01-04 18:31:08 -05:00
7443ab2b34 Update per-column ACLs, not only per-table ACL, when changing table owner.
We forgot to modify column ACLs, so privileges were still shown as having
been granted by the old owner.  This meant that neither the new owner nor
a superuser could revoke the now-untraceable-to-table-owner permissions.
Per bug #6350 from Marc Balmer.
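
A minimal illustration (hypothetical names):

	GRANT UPDATE (price) ON items TO bob;   -- column-level grant by old owner
	ALTER TABLE items OWNER TO carol;
	-- formerly the column ACL still named the old owner as grantor, so this
	-- REVOKE could not clean it up; now it works:
	REVOKE UPDATE (price) ON items FROM bob;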

This has been wrong since column ACLs were added, so back-patch to 8.4.
2011-12-21 18:23:24 -05:00
61dd2ffaff Avoid crashing when we have problems unlinking files post-commit.
smgrdounlink takes care to not throw an ERROR if it fails to unlink
something, but that caution was rendered useless by commit
3396000684b41e7e9467d1abc67152b39e697035, which put an smgrexists call in
front of it; smgrexists *does* throw error if anything looks funny, such
as getting a permissions error from trying to open the file.  If that
happens post-commit, you get a PANIC, and what's worse the same logic
appears in the WAL replay code, so the database even fails to restart.

Restore the intended behavior by removing the smgrexists call --- it isn't
accomplishing anything that we can't do better by adjusting mdunlink's
ideas of whether it ought to warn about ENOENT or not.

Per report from Joseph Shraibman of unrecoverable crash after trying to
drop a table whose FSM fork had somehow gotten chmod'd to 000 permissions.
Backpatch to 8.4, where the bogus coding was introduced.
2011-12-20 15:00:47 -05:00
6c0a375adf Revert the behavior of inet/cidr functions to not unpack the arguments.
I forgot to change the functions to use the PG_GETARG_INET_PP() macro
when I changed DatumGetInetP() to unpack the datum, like Datum*P macros
usually do. Also, I screwed up the definition of the PG_GETARG_INET_PP()
macro, and didn't notice because it wasn't used.

This fixes the memory leak when sorting inet values, as reported
by Jochen Erwied and debugged by Andres Freund. Backpatch to 8.3, like
the previous patch that broke it.
2011-12-12 10:05:24 +02:00
94b18c60c7 Don't set reachedMinRecoveryPoint during crash recovery. In crash recovery,
we don't reach consistency before replaying all of the WAL. Rename the
variable to reachedConsistency, to make its intention clearer.

In master, that was an active bug because of the recent patch to
immediately PANIC if a reference to a missing page is found in WAL after
reaching consistency, as Tom Lane's test case demonstrated. In 9.1 and 9.0,
the only consequence was a misleading "consistent recovery state reached at
%X/%X" message in the log at the beginning of crash recovery (the database
is not consistent at that point yet). In 8.4, the log message was not
printed in crash recovery, even though there was a similar
reachedMinRecoveryPoint local variable that was also set early. So,
backpatch to 9.1 and 9.0.
2011-12-09 15:50:19 +02:00
698bb4ec4f Translation updates 2011-12-01 22:59:40 +02:00
efb7423793 Fix getTypeIOParam to support type record[].
Since record[] uses array_in, it needs to have its element type passed
as typioparam.  In HEAD and 9.1, this fix essentially reverts commit
9bc933b2125a5358722490acbc50889887bf7680, which was a hack that is no
longer needed since domains don't set their typelem anymore.  Before
that, adjust the logic so that only domains are excluded from being
treated like arrays, rather than assuming that only base types should
be included.  Add a regression test to demonstrate the need for this.
Per report from Maxim Boguk.

Back-patch to 8.4, where type record[] was added.
2011-12-01 12:44:28 -05:00
b06c6f52d0 Tweak previous patch to ensure edata->filename always gets initialized.
On a platform that isn't supplying __FILE__, the previous coding would
either crash or give a stale result for the filename string.  Not sure how
likely that is, but the original code catered for it, so let's keep doing so.
2011-11-30 00:37:20 -05:00
75b6183694 Strip file names reported in error messages in vpath builds
In vpath builds, the __FILE__ macro that is used in verbose error
reports contains the full absolute file name, which makes the error
messages excessively verbose.  So keep only the base name, thus
matching the behavior of non-vpath builds.
2011-11-29 23:51:28 +02:00
b09fc7c39d Ensure that whole-row junk Vars are always of composite type.
The EvalPlanQual machinery assumes that whole-row Vars generated for the
outputs of non-table RTEs will be of composite types.  However, for the
case where the RTE is a function call returning a scalar type, we were
doing the wrong thing, as a result of sharing code with a parser case
where the function's scalar output is wanted.  (Or at least, that's what
that case has done historically; it does seem a bit inconsistent.)

To fix, extend makeWholeRowVar's API so that it can support both use-cases.
This fixes Belinda Cussen's report of crashes during concurrent execution
of UPDATEs involving joins to the result of UNNEST() --- in READ COMMITTED
mode, we'd run the EvalPlanQual machinery after a conflicting row update
commits, and it was expecting to get a HeapTuple not a scalar datum from
the "wholerowN" variable referencing the function RTE.

Back-patch to 9.0 where the current EvalPlanQual implementation appeared.

In 9.1 and up, this patch also fixes failure to attach the correct
collation to the Var generated for a scalar-result case.  An example:
regression=# select upper(x.*) from textcat('ab', 'cd') x;
ERROR:  could not determine which collation to use for upper() function
2011-11-27 22:27:39 -05:00
80cbf3401c Fix erroneous replay of GIN_UPDATE_META_PAGE WAL records.
A simple thinko in ginRedoUpdateMetapage, namely failing to increment a
loop counter, led to inserting records into the last pending-list page in
the wrong order (the opposite of that intended).  So far as I can tell,
this would not upset the code that eventually flushes pending items into
the main part of the GIN index.  But it did break the code that searched
the pending list for matches, resulting in transient failure to find
matching entries during index lookups, as illustrated in bug #6307 from
Maksym Boguk.

Back-patch to 8.4 where the incorrect code was introduced.
2011-11-25 13:59:22 -05:00
3c4f293dd5 Avoid floating-point underflow while tracking buffer allocation rate.
When the system is idle for a while after activity, the "smoothed_alloc"
state variable in BgBufferSync converges slowly to zero.  With standard
IEEE float arithmetic this results in several iterations with denormalized
values, which causes kernel traps and annoying log messages on some
poorly-designed platforms.  There's no real need to track such small values
of smoothed_alloc, so we can prevent the kernel traps by forcing it to zero
as soon as it's too small to be interesting for our purposes.  This issue
is purely cosmetic, since the iterations don't happen fast enough for the
kernel traps to pose any meaningful performance problem, but still it seems
worth shutting up the log messages.

The kernel log messages were previously reported by a number of people,
but kudos to Greg Matthews for tracking down exactly where they were coming
from.
2011-11-19 00:36:16 -05:00
a062ee4257 Make DatumGetInetP() unpack inet datums with a 1-byte header, and add
a new macro, DatumGetInetPP(), that does not. This brings these macros
in line with other DatumGet*P() macros.

Backpatch to 8.3, where 1-byte header varlenas were introduced.
2011-11-08 22:45:28 +02:00
b07b2bdc57 Don't assume that a tuple's header size is unchanged during toasting.
This assumption can be wrong when the toaster is passed a raw on-disk
tuple, because the tuple might pre-date an ALTER TABLE ADD COLUMN operation
that added columns without rewriting the table.  In such a case the tuple's
natts value is smaller than what we expect from the tuple descriptor, and
so its t_hoff value could be smaller too.  In fact, the tuple might not
have a null bitmap at all, and yet our current opinion of it is that it
contains some trailing nulls.

In such a situation, toast_insert_or_update did the wrong thing, because
to save a few lines of code it would use the old t_hoff value as the offset
where heap_fill_tuple should start filling data.  This did not leave enough
room for the new nulls bitmap, with the result that the first few bytes of
data could be overwritten with null flag bits, as in a recent report from
Hubert Depesz Lubaczewski.

The particular case reported requires ALTER TABLE ADD COLUMN followed by
CREATE TABLE AS SELECT * FROM ... or INSERT ... SELECT * FROM ..., and
further requires that there be some out-of-line toasted fields in one of
the tuples to be copied; else we'll not reach the troublesome code.
The problem can only manifest in this form in 8.4 and later, because
before commit a77eaa6a95009a3441e0d475d1980259d45da072, CREATE TABLE AS or
INSERT/SELECT wouldn't result in raw disk tuples getting passed directly
to heap_insert --- there would always have been at least a junkfilter in
between, and that would reconstitute the tuple header with an up-to-date
t_natts and hence t_hoff.  But I'm backpatching the tuptoaster change all
the way anyway, because I'm not convinced there are no older code paths
that present a similar risk.
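
A sketch of the reported recipe (hypothetical names):

	CREATE TABLE src (a int, big text);
	ALTER TABLE src ALTER COLUMN big SET STORAGE EXTERNAL;  -- out-of-line field
	INSERT INTO src VALUES (1, repeat('x', 100000));
	ALTER TABLE src ADD COLUMN b int;       -- no rewrite: the stored tuple
	                                        -- keeps its original natts/t_hoff
	CREATE TABLE dst AS SELECT * FROM src;  -- raw disk tuples reach the toaster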
2011-11-04 23:23:16 -04:00
9b3c35a9db Fix inline_set_returning_function() to allow multiple OUT parameters.
inline_set_returning_function failed to distinguish functions returning
generic RECORD (which require a column list in the RTE, as well as run-time
type checking) from those with multiple OUT parameters (which do not).
This prevented inlining from happening.  Per complaint from Jay Levitt.
Back-patch to 8.4 where this capability was introduced.
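
As a hedged sketch (hypothetical names): a SQL function with multiple OUT
parameters has a fully determined result rowtype, so it needs no column list
in the FROM clause and, with this fix, can be inlined:

	CREATE FUNCTION list_parts(OUT id int, OUT name text) RETURNS SETOF record
	    LANGUAGE sql STABLE AS $$ SELECT id, name FROM part_table $$;

	SELECT * FROM list_parts() WHERE id < 10;  -- no column list required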
2011-11-03 17:53:26 -04:00
42f77244e7 Revert "Stop btree indexscans upon reaching nulls in either direction."
This reverts commit 7357f92a3ef1faff0ce85f55f3c5a7f4dc820eaa.
As pointed out by Naoya Anzai, we need to do more work to make that
idea handle end-of-index cases, and it is looking like too much risk
for a back-patch.  So bug #6278 is only going to be fixed in HEAD.
2011-11-02 13:36:31 -04:00
656bba95af Derive oldestActiveXid at correct time for Hot Standby.
There was a timing window between when oldestActiveXid was derived and
when it should have been derived, one that only shows itself under heavy
load.  Move code around to ensure the derivation happens at the correct
time.  No change to the StartupSUBTRANS() code, which is where the failure
appeared.

Bug report by Chris Redekop
2011-11-02 08:52:59 +00:00
ff8451aa14 Start Hot Standby faster when initial snapshot is incomplete.
If the initial snapshot had overflowed, we can now start whenever the
latest snapshot is empty or not overflowed, in addition to the existing
rule of starting once the xmin on the primary is higher than the xmax of
our starting snapshot, which proves we have full snapshot data.

Bug report by Chris Redekop
2011-11-02 08:27:00 +00:00
2f55c535e1 Fix timing of Startup CLOG and MultiXact during Hot Standby
Patch by me, bug report by Chris Redekop, analysis by Florian Pflug
2011-11-02 08:03:21 +00:00
7f797d27fe Fix race condition with toast table access from a stale syscache entry.
If a tuple in a syscache contains an out-of-line toasted field, and we
try to fetch that field shortly after some other transaction has committed
an update or deletion of the tuple, there is a race condition: vacuum
could come along and remove the toast tuples before we can fetch them.
This leads to transient failures like "missing chunk number 0 for toast
value NNNNN in pg_toast_2619", as seen in recent reports from Andrew
Hammond and Tim Uckun.

The design idea of syscache is that access to stale syscache entries
should be prevented by relation-level locks, but that fails for at least
two cases where toasted fields are possible: ANALYZE updates pg_statistic
rows without locking out sessions that might want to plan queries on the
same table, and CREATE OR REPLACE FUNCTION updates pg_proc rows without
any meaningful lock at all.

The least risky fix seems to be an idea that Heikki suggested when we
were dealing with a related problem back in August: forcibly detoast any
out-of-line fields before putting a tuple into syscache in the first place.
This avoids the problem because at the time we fetch the parent tuple from
the catalog, we should be holding an MVCC snapshot that will prevent
removal of the toast tuples, even if the parent tuple is outdated
immediately after we fetch it.  (Note: I'm not convinced that this
statement holds true at every instant where we could be fetching a syscache
entry at all, but it does appear to hold true at the times where we could
fetch an entry that could have a toasted field.  We will need to be a bit
wary of adding toast tables to low-level catalogs that don't have them
already.)  An additional benefit is that subsequent uses of the syscache
entry should be faster, since they won't have to detoast the field.

Back-patch to all supported versions.  The problem is significantly harder
to reproduce in pre-9.0 releases, because of their willingness to flush
every entry in a syscache whenever the underlying catalog is vacuumed
(cf CatalogCacheFlushRelation); but there is still a window for trouble.
2011-11-01 19:48:49 -04:00
7357f92a3e Stop btree indexscans upon reaching nulls in either direction.
The existing scan-direction-sensitive tests were overly complex, and
failed to stop the scan in cases where it's perfectly legitimate to do so.
Per bug #6278 from Maksym Boguk.

Back-patch to 8.3, which is as far back as the patch applies easily.
Doesn't seem worth sweating over a relatively minor performance issue in
8.2 at this late date.  (But note that this was a performance regression
from 8.1 and before, so 8.2 is being left as an outlier.)
2011-10-31 16:40:16 -04:00
5093944311 Fix assorted bogosities in cash_in() and cash_out().
cash_out failed to handle multiple-byte thousands separators, as per bug
#6277 from Alexander Law.  In addition, cash_in didn't handle that either,
nor could it handle multiple-byte positive_sign.  Both routines failed to
support multiple-byte mon_decimal_point, which I did not think was worth
changing, but at least now they check for the possibility and fall back to
using '.' rather than emitting invalid output.  Also, make cash_in handle
trailing negative signs, which formerly it would reject.  Since cash_out
generates trailing negative signs whenever the locale tells it to, this
last omission represents a fail-to-reload-dumped-data bug.  IMO that
justifies patching this all the way back.
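
For instance (a hedged example; exact formatting depends on lc_monetary):

	SELECT '1234.56-'::money;  -- a trailing negative sign, as cash_out emits
	                           -- in some locales, is now accepted by cash_in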
2011-10-29 14:31:03 -04:00
68b0997017 Change FK trigger creation order to better support self-referential FKs.
When a foreign-key constraint references another column of the same table,
row updates will queue both the PK's ON UPDATE action and the FK's CHECK
action in the same event.  The ON UPDATE action must execute first, else
the CHECK will check a non-final state of the row and possibly throw an
inappropriate error, as seen in bug #6268 from Roman Lytovchenko.
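
A sketch of the trouble case (hypothetical names):

	CREATE TABLE node (
	    id     int PRIMARY KEY,
	    parent int REFERENCES node (id) ON UPDATE CASCADE
	);
	INSERT INTO node VALUES (1, 1);       -- row references itself
	UPDATE node SET id = 2 WHERE id = 1;  -- the ON UPDATE action must fire
	                                      -- before the FK check, else a
	                                      -- spurious violation is raised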

Now, the firing order of multiple triggers for the same event is determined
by the sort order of their pg_trigger.tgname values, and the auto-generated
names
we use for FK triggers are "RI_ConstraintTrigger_NNNN" where NNNN is the
trigger OID.  So most of the time the firing order is the same as creation
order, and so rearranging the creation order fixes it.

This patch will fail to fix the problem if the OID counter wraps around or
adds a decimal digit (eg, from 99999 to 100000) while we are creating the
triggers for an FK constraint.  Given the small odds of that, and the low
usage of self-referential FKs, we'll live with that solution in the back
branches.  A better fix is to change the auto-generated names for FK
triggers, but it seems unwise to do that in stable branches because there
may be client code that depends on the naming convention.  We'll fix it
that way in HEAD in a separate patch.

Back-patch to all supported branches, since this bug has existed for a long
time.
2011-10-26 13:02:40 -04:00
9fbb3edff0 Don't trust deferred-unique indexes for join removal.
The uniqueness condition might fail to hold intra-transaction, and assuming
that it does hold can give incorrect query results.  Per report from Marti
Raudsepp, though this is not his proposed patch.

Back-patch to 9.0, where both these features were introduced.  In the
released branches, add the new IndexOptInfo field to the end of the struct,
to try to minimize ABI breakage for third-party code that may be examining
that struct.
2011-10-23 00:43:52 -04:00
9ca46f5bb6 Fix bugs in information_schema.referential_constraints view.
This view was being insufficiently careful about matching the FK constraint
to the depended-on primary or unique key constraint.  That could result in
failure to show an FK constraint at all, or showing it multiple times, or
claiming that it depended on a different constraint than the one it really
does.  Fix by joining via pg_depend to ensure that we find only the correct
dependency.

Back-patch, but don't bump catversion because we can't force initdb in back
branches.  The next minor-version release notes should explain that if you
need to fix this in an existing installation, you can drop the
information_schema schema then re-create it by sourcing
$SHAREDIR/information_schema.sql in each database (as a superuser of
course).
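
A hedged sketch of that repair, run in each database as a superuser
(substitute the actual directory, e.g. from pg_config --sharedir):

	DROP SCHEMA information_schema CASCADE;
	\i $SHAREDIR/information_schema.sql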
2011-10-14 20:24:32 -04:00
c02e52dfdc Don't let transform_null_equals=on affect CASE foo WHEN NULL ... constructs.
transform_null_equals is only supposed to affect "foo = NULL" expressions
given directly by the user, not the internal "foo = NULL" expression
generated from CASE-WHEN.
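
For example, with the fix in place:

	SET transform_null_equals = on;
	SELECT CASE NULL::int WHEN NULL THEN 'matched' ELSE 'no match' END;
	-- correct result: 'no match', since the implicit "NULL = NULL" yields
	-- NULL; the buggy transform made it "NULL IS NULL" and returned 'matched'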

This fixes bug #6242, reported by Sergey. Backpatch to all supported
branches.
2011-10-08 11:21:04 +03:00
ddc36df7af Add sourcefile/sourceline data to EXEC_BACKEND GUC transmission files.
This oversight meant that on Windows, the pg_settings view would not
display source file or line number information for values coming from
postgresql.conf, unless the backend had received a SIGHUP since starting.

In passing, also make the error detection in read_nondefault_variables a
tad more thorough, and fix it to not lose precision on float GUCs (these
changes are already in HEAD as of my previous commit).
2011-10-04 17:01:06 -04:00
f994bf965d ProcedureCreate neglected to record dependencies on default expressions.
Thus, an object referenced in a default expression could be dropped while
the function remained present.  This was unaccountably missed in the
original patch to add default parameters for functions.  Reported by
Pavel Stehule.
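
As an illustration of the dependency now recorded (hypothetical names):

	CREATE SEQUENCE f_seq;
	CREATE FUNCTION f(x int DEFAULT nextval('f_seq')) RETURNS int
	    LANGUAGE sql AS $$ SELECT $1 $$;
	DROP SEQUENCE f_seq;  -- now fails: the default expression for f()
	                      -- depends on the sequence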
2011-10-03 12:13:46 -04:00
b43bb707cc Translation updates 2011-09-22 23:10:16 +03:00
b04214f6cf gistendscan() forgot to free so->giststate.
This oversight led to a massive memory leak --- upwards of 10KB per tuple
--- during creation-time verification of an exclusion constraint based on a
GIST index.  In most other scenarios it'd just be a leak of 10KB that would
be recovered at end of query, so not too significant; though perhaps the
leak would be noticeable in a situation where a GIST index was being used
in a nestloop inner indexscan.  In any case, it's a real leak of long
standing, so patch all supported branches.  Per report from Harald Fuchs.
2011-09-16 04:28:01 -04:00
cac73320ef deflist_to_tuplestore dumped core on an option with no value.
Make it return NULL for the option_value instead.

Per report from Frank van Vugt.  Back-patch to 8.4 where this code was
added.
2011-09-13 11:36:57 -04:00
6e7a3c364b Fix corner case bug in numeric to_char().
Trailing-zero stripping applied by the FM specifier could strip zeroes
to the left of the decimal point, for a format with no digit positions
after the decimal point (such as "FM999.").
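
For example:

	SELECT to_char(100::numeric, 'FM999.');
	-- correct output: '100.'; the buggy code also stripped the zeroes
	-- to the left of the decimal point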

Reported and diagnosed by Marti Raudsepp, though I didn't use his patch.
2011-09-07 17:06:26 -04:00
d5e429b128 Avoid possibly accessing off the end of memory in SJIS2004 conversion.
The code in shift_jis_20042euc_jis_2004() would fetch two bytes even when
only one remained in the string.  Since conversion functions aren't
supposed to assume null-terminated input, this poses a small risk of
fetching past the end of memory and incurring SIGSEGV.  No such crash has
been identified in the field, but we've certainly seen the equivalent
happen in other code paths, so patch this one all the way back.

Report and patch by Noah Misch.
2011-09-06 14:50:56 -04:00
ad1e8274eb Avoid possibly accessing off the end of memory in examine_attribute().
Since the last couple of columns of pg_type are often NULL,
sizeof(FormData_pg_type) can be an overestimate of the actual size of the
tuple data part.  Therefore memcpy'ing that much out of the catalog cache,
as analyze.c was doing, poses a small risk of copying past the end of
memory and incurring SIGSEGV.  No such crash has been identified in the
field, but we've certainly seen the equivalent happen in other code paths,
so patch this one all the way back.

Per valgrind testing by Noah Misch, though this is not his proposed patch.
I chose to use SearchSysCacheCopy1 rather than inventing special-purpose
infrastructure for copying only the minimal part of a pg_type tuple.
2011-09-06 14:37:48 -04:00