rows will normally never obtain an XID at all. We already did things this way
for subtransactions, but this patch extends the concept to top-level
transactions. In applications where there are lots of short read-only
transactions, this should improve performance noticeably; not so much from
removal of the actual XID-assignments, as from reduction of overhead that's
driven by the rate of XID consumption. We add a concept of a "virtual
transaction ID" so that active transactions can be uniquely identified even
if they don't have a regular XID. This is a much lighter-weight concept:
uniqueness of VXIDs is only guaranteed over the short term, and no on-disk
record is made about them.
Florian Pflug, with some editorialization by Tom.
(Actually, it works as a plain statement too, but I didn't document that
because it seems a bit useless.) Unify VariableResetStmt with
VariableSetStmt, and clean up some ancient cruft in the representation of
same.
initcap style --- the vast majority of the existing descriptions do not use
an initial cap. I didn't change places where the first word was all-cap.
initdb not forced because this doesn't change any regression test results.
operator-family rewrite. I had mistakenly supposed that these could use the
pg_amproc entries for text[] and inet[] respectively. However, binary
compatibility of the underlying types does not make two array types binary
compatible (since they must differ in the header field that gives the element
type OID), and so the index support code doesn't consider those entries
applicable. Add back the missing pg_amproc entries, and add an opr_sanity
query to try to catch such mistakes in future. Per report from Gregory
Maxwell.
There are still some loose ends: I didn't do anything about the SET FROM
CURRENT idea yet, and it's not real clear whether we are happy with the
interaction of SET LOCAL with function-local settings. The documentation
is a bit spartan, too.
the number of rows likely to be produced by a query such as
SELECT * FROM t1 LEFT JOIN t2 USING (key) WHERE t2.key IS NULL;
What this is doing is selecting for t1 rows with no match in t2, and thus
it may produce a significant number of rows even if the t2.key table column
contains no nulls at all. 8.2 thinks the table column's null fraction is
relevant and thus may estimate no rows out, which results in terrible plans
if there are more joins above this one. A proper fix for this will involve
passing much more information about the context of a clause to the selectivity
estimator functions than we ever have. There's no time left to write such a
patch for 8.3, and it wouldn't be back-patchable into 8.2 anyway. Instead,
put in an ad-hoc test to defeat the normal table-stats-based estimation when
an IS NULL test is evaluated at an outer join, and just use a constant
estimate instead --- I went with 0.5 for lack of a better idea. This won't
catch every case but it will catch the typical ways of writing such queries,
and it seems unlikely to make things worse for other queries.
sets for outer joins, in the light of bug #3588 and additional thought and
experimentation. The original methodology was fatally flawed for nests of
more than two outer joins: it got the relationships between adjacent joins
right, but didn't always come to the right conclusions about whether a join
could be interchanged with one two or more levels below it. This was largely
caused by a mistaken idea that we should use the min_lefthand + min_righthand
sets of a sub-join as the minimum left or right input set of an upper join
when we conclude that the sub-join can't commute with the upper one. If
there's a still-lower join that the sub-join *can* commute with, this method
led us to think that that one could commute with the topmost join; which it
can't. Another problem (not directly connected to bug #3588) was that
make_outerjoininfo's processing-order-dependent method for enforcing outer
join identity #3 didn't work right: if we decided that join A could safely
commute with lower join B, we dropped all information about sub-joins under B
that join A could perhaps not safely commute with, because we removed B's
entire min_righthand from A's.
To fix, make an explicit computation of all inner join combinations that occur
below an outer join, and add to that the full syntactic relsets of any lower
outer joins that we determine it can't commute with. This method gives much
more direct enforcement of the outer join rearrangement identities, and it
turns out not to cost a lot of additional bookkeeping.
Thanks to Richard Harris for the bug report and test case.
namespace isn't necessarily first in the search path (there could be implicit
schemas ahead of it). Examples are
test=# set search_path TO s1;
test=# create view pg_timezone_names as select * from pg_timezone_names();
ERROR: "pg_timezone_names" is already a view
test=# create table pg_class (f1 int primary key);
ERROR: permission denied: "pg_class" is a system catalog
You'd expect these commands to create the requested objects in s1, since
names beginning with pg_ aren't supposed to be reserved anymore. What is
happening is that we create the requested base table and then execute
additional commands (here, CREATE RULE or CREATE INDEX), and that code is
passed the same RangeVar that was in the original command. Since that
RangeVar has schemaname = NULL, the secondary commands think they should do a
path search, and that means they find system catalogs that are implicitly in
front of s1 in the search path.
This is perilously close to being a security hole: if the secondary command
failed to apply a permission check then it'd be possible for unprivileged
users to make schema modifications to system catalogs. But as far as I can
find, there is no code path in which a check doesn't occur. Which makes it
just a weird corner-case bug for people who are silly enough to want to
name their tables the same as a system catalog.
The relevant code has changed quite a bit since 8.2, which means this patch
wouldn't work as-is in the back branches. Since it's a corner case no one
has reported from the field, I'm not going to bother trying to back-patch.
days that was obsolete the moment we had IN (SELECT ...) capability.
It's arguably a security hole since it applied no permissions check to
the table it searched, and since it was never documented anywhere,
removing it seems more appropriate than fixing it.
- ispell initialization crashed on empty dictionary file
- ispell initialization crashed on affix file with prefixes but no suffixes
- stop words file was run through pg_verify_mbstr, with database
encoding, but it's supposed to be UTF-8; similar bug for synonym files
- bunch of comments added, typos fixed, and other cleanup
Introduced consistent encoding checking/conversion of data read from tsearch
configuration files, by doing this in a single t_readline() subroutine
(replacing direct usages of fgets). Cleaned up API for readstopwords too.
Heikki Linnakangas
This prevents needing to do complex and poorly-defined updates of the
mapping table if the new parser has different token types than the old.
Per discussion.
init options of the template as top-level options in the syntax. This also
makes ALTER a bit easier to use, since options can be replaced individually.
I also made these statements verify that the tmplinit method will accept
the new settings before they get stored; in the original coding you didn't
find out about mistakes until the dictionary got invoked.
Under the hood, init methods now get options as a List of DefElem instead
of a raw text string --- that lets tsearch use existing options-pushing code
instead of duplicating functionality.
Oleg Bartunov and Teodor Sigaev, but I did a lot of editorializing,
so anything that's broken is probably my fault.
Documentation is nonexistent as yet, but let's land the patch so we can
get some portability testing done.
are not one of the query's defined result relations, but nonetheless have
triggers fired against them while the query is active. This was formerly
impossible but can now occur because of my recent patch to fix the firing
order for RI triggers. Caching a ResultRelInfo avoids duplicating work by
repeatedly opening and closing the same relation, and also allows EXPLAIN
ANALYZE to "see" and report on these extra triggers. Use the same mechanism
to cache open relations when firing deferred triggers at transaction shutdown;
this replaces the former one-element-cache strategy used in that case, and
should improve performance a bit when there are deferred triggers on a number
of relations.
row within one query: we were firing check triggers before all the updates
were done, leading to bogus failures. Fix by making the triggers queued by
an RI update go at the end of the outer query's trigger event list, thereby
effectively making the processing "breadth-first". This was indeed how it
worked pre-8.0, so the bug does not occur in the 7.x branches.
Per report from Pavel Stehule.
that still thought they could set HEAP_XMAX_COMMITTED immediately after
seeing the other transaction commit. Make them use the same logic as
tqual.c does to determine if the hint bit can be set yet.
child memory contexts is indented two spaces to the right of its
parent context. This should make it easier to deduce the memory
context hierarchy from the output of MemoryContextStats().
between the setting of log_line_prefix and the setting of log_timezone. We
can't realistically set log_timezone any earlier than we do now, so the best
behavior seems to be to use GMT zone if any timestamps are to be logged during
early startup. Create a dummy zone variable with a minimal definition of GMT
(in particular it will never know about leap seconds), so that we can set it
up without reference to any external files.
displayed in the postmaster log. This avoids Windows-specific problems with
localized time zone names that are in the wrong encoding, and generally seems
like a good idea to forestall other potential platform-dependent issues.
To preserve the existing behavior that all backends will log in the same time
zone, create a new GUC variable log_timezone that can only be changed on a
system-wide basis, and reference log-related calculations to that zone instead
of the TimeZone variable.
This fixes the issue reported by Hiroshi Saito that timestamps printed by
xlog.c startup could be improperly localized on Windows. We still need a
simpler patch for that problem in the back branches, however.
so that we will be able to create a cookie for all processes for CSVlogs.
It is set wherever MyProcPid is set. Take the opportunity to remove the now
unnecessary session-only restriction on the %s and %c escapes in log_line_prefix.
before reporting a transaction committed. Data consistency is still
guaranteed (unlike setting fsync = off), but a crash may lose the effects
of the last few transactions. Patch by Simon, some editorialization by Tom.
with the recent patch to log temp file sizes at removal time. Doesn't seem
worth fixing since it's unused.
In passing, make a few elog messages conform to the message style guide.
named pg_toast_temp_nnn, alongside the pg_temp_nnn schemas used for the temp
tables themselves. This allows low-level code such as the relcache to
recognize that these tables are indeed temporary, which enables various
optimizations such as not WAL-logging changes and using local rather than
shared buffers for access. Aside from obvious performance benefits, this
provides a solution to bug #3483, in which other backends unexpectedly held
open file references to temporary tables. The scheme preserves the property
that TOAST tables are not in any schema that's normally in the search path,
so they don't conflict with user table names.
initdb forced because of changes in system view definitions.
and fsync WAL at convenient intervals. For the moment it just tries to
offload this work from backends, but soon it will be responsible for
guaranteeing a maximum delay before asynchronously-committed transactions
will be flushed to disk.
This is a portion of Simon Riggs' async-commit patch, committed to CVS
separately because a background WAL writer seems like it might be a good idea
independently of the async-commit feature. I rebased walwriter.c on
bgwriter.c because it seemed like a more appropriate way of handling signals;
while the startup/shutdown logic in postmaster.c is more like autovac because
we want walwriter to quit before we start the shutdown checkpoint.
against a Unix server, and Windows-specific server-side authentication
using SSPI "negotiate" method (Kerberos or NTLM).
Only builds properly with MSVC for now.
we don't know at that point which relation OID to tell pgstat to forget.
The code was passing the relfilenode, which is incorrect, and could possibly
cause some other relation's stats to be zeroed out. While we could try to
clean this up, it seems much simpler and more reliable to let the next
invocation of pgstat_vacuum_tabstat() fix things; which indeed is how it
worked before I introduced the buggy code into 8.1.3 and later :-(.
Problem noticed by Itagaki Takahiro, fix is per subsequent discussion.
PGconn. Invent a new libpq connection-status function,
PQconnectionUsedPassword() that returns true if the server
demanded a password during authentication, false otherwise.
This may be useful to clients in general, but is immediately
useful to help plug a privilege escalation path in dblink.
Per list discussion and design proposed by Tom Lane.
unwarranted liberties with int8 vs float8 values for these types.
Specifically, be sure to apply either hashint8 or hashfloat8 depending
on HAVE_INT64_TIMESTAMP. Per my gripe of even date.
Sequences and views could previously be renamed using ALTER TABLE, but
this was a repeated source of confusion for users. Update the docs,
and psql tab completion. Patch from David Fetter; various minor fixes
by myself.
This is a Linux kernel bug that apparently exists in every extant kernel
version: sometimes shmctl() will fail with EIDRM when EINVAL is correct.
We were assuming that EIDRM indicates a possible conflict with pre-existing
backends, and refusing to start the postmaster when this happens. Fortunately,
there does not seem to be any case where Linux can legitimately return EIDRM
(it doesn't track shmem segments in a way that would allow that), so we can
get away with just assuming that EIDRM means EINVAL on this platform.
Per reports from Michael Fuhr and Jon Lapham --- it's a bit surprising
we have not seen more reports, actually.