Get rid of the fundamentally indefensible assumption that "long long int"
exists and is exactly 64 bits wide on every platform Postgres runs on.
Instead let the configure script select the type to use for "pg_int64".
This is a bit of a pain in the rear since we do not want to pollute client
namespace with all the random symbols that pg_config.h defines; instead
we have to create a separate generated header file, "pg_config_ext.h".
But now that the infrastructure is there, we might have the ability to
add some other stuff that's long been wanting in this area.
4TB large objects (standard 8KB BLCKSZ case). For this purpose new
libpq API lo_lseek64, lo_tell64 and lo_truncate64 are added. Also
corresponding new backend functions lo_lseek64, lo_tell64 and
lo_truncate64 are added. inv_api.c is changed to handle 64-bit
offsets.
Patch contributed by Nozomi Anzai (backend side) and Yugo Nagata
(frontend side, docs, regression tests and example program). Reviewed
by Kohei Kaigai. Committed by Tatsuo Ishii with minor editings.
The regular backend's main loop handles signal handling and error recovery
better than the current WAL sender command loop does. For example, if the
client hangs and a SIGTERM is received before starting streaming, the
walsender will now terminate immediately, rather than hang until the
connection times out.
Per discussion, schema-element subcommands are not allowed together with
this option, since it's not very obvious what should happen to the element
objects.
Fabrízio de Royes Mello
Remove duplicate implementation of catalog munging and miscellaneous
privilege and consistency checks. Instead rely on already existing data
in objectaddress.c to do the work.
Author: KaiGai Kohei
Tweaked by me
Reviewed by Robert Haas
Instead of having each object type implement the catalog munging
independently, centralize knowledge about how to do it and expand the
existing table in objectaddress.c with enough data about each object
type to support this operation.
Author: KaiGai Kohei
Tweaks by me
Reviewed by Robert Haas
This is just refactoring, to make the functions accessible outside xlog.c.
A followup patch will make use of that, to allow fetching timeline history
files over streaming replication.
On reflection (especially after noticing how many buildfarm critters have
__builtin_types_compatible_p but not _Static_assert), it seems like we
ought to try a bit harder to make these macros do something everywhere.
The initial cut at it would have been no help to code that is compiled only
on platforms without _Static_assert, for instance; and in any case not all
our contributors do their initial coding on the latest gcc version.
Some googling about static assertions turns up quite a bit of prior art
for making it work in compilers that lack _Static_assert. The method
that seems closest to our needs involves defining a struct with a bit-field
that has negative width if the assertion condition fails. There seems no
reliable way to get the error message string to be output, but throwing a
compile error with a confusing message is better than missing the problem
altogether.
In the same spirit, if we don't have __builtin_types_compatible_p we can at
least insist that the variable have the same width as the type. This won't
catch errors such as "wrong pointer type", but it's far better than
nothing.
In addition to changing the macro definitions, adjust a
compile-time-constant Assert in contrib/hstore to use StaticAssertStmt,
so we can get some buildfarm coverage on whether that macro behaves sanely
or not. There's surely more places that could be converted, but this is
the first one I came across.
Currently, the macros only work with fairly recent gcc versions, but there
is room to expand them to other compilers that have comparable features.
Heavily revised and autoconfiscated version of a patch by Andres Freund.
This fixes another error in commit 9e8da0f75731aaa7605cf4656c21ea09e84d2eb1.
I neglected to make the mark/restore functionality save and restore the
current set of array key values, which led to strange behavior if an
IndexScan with ScalarArrayOpExpr quals was used as the inner side of a
mergejoin. Per bug #7570 from Melese Tesfaye.
This allows easily splitting configuration into many files, deployed in a
directory.
Magnus Hagander, Greg Smith, Selena Deckelmann, reviewed by Noah Misch.
Produce a NOTICE when the label already exists, for consistency with other
CREATE IF NOT EXISTS commands. Also, fix the code so it produces something
more user-friendly than an index violation when the label already exists.
This not incidentally enables making a regression test that the previous
patch didn't make for fear of exposing an unpredictable OID in the results.
Also some wordsmithing on the documentation.
If the label is already in the enum the statement becomes a no-op.
This will reduce the pain that comes from our not allowing this
operation inside a transaction block.
Andrew Dunstan, reviewed by Tom Lane and Magnus Hagander.
The previous scheme had bugs in some corner cases involving tables that had
been renamed since a view was made. This could result in dumped views that
failed to reload or reloaded incorrectly, as seen in bug #7553 from Lloyd
Albin, as well as in some pgsql-hackers discussion back in January. Also,
its behavior for printing EXPLAIN plans was sometimes confusing because of
willingness to use the same alias for multiple RTEs (it was Ashutosh
Bapat's complaint about that aspect that started the January thread).
To fix, ensure that each RTE in the query has a unique unqualified alias,
by modifying the alias if necessary (we add "_" and digits as needed to
create a non-conflicting name). Then we can just print its variables with
that alias, avoiding the confusing and bug-prone scheme of sometimes
schema-qualifying variable names. In EXPLAIN, it proves to be expedient to
take the further step of only assigning such aliases to RTEs that are
actually referenced in the query, since the planner has a habit of
generating extra RTEs with the same alias in situations such as
inheritance-tree expansion.
Although this fixes a bug of very long standing, I'm hesitant to back-patch
such a noticeable behavioral change. My experiments while creating a
regression test convinced me that actually incorrect output (as opposed to
confusing output) occurs only in very narrow cases, which is backed up by
the lack of previous complaints from the field. So we may be better off
living with it in released branches; and in any case it'd be smart to let
this ripen awhile in HEAD before we consider back-patching it.
Similar changes were done to pg_hba.conf earlier already, this commit makes
pg_ident.conf to behave the same as pg_hba.conf.
This has two user-visible effects. First, if pg_ident.conf contains multiple
errors, the whole file is parsed at postmaster startup time and all the
errors are immediately reported. Before this patch, the file was parsed and
the errors were reported only when someone tries to connect using an
authentication method that uses the file, and the parsing stopped on first
error. Second, if you SIGHUP to reload the config files, and the new
pg_ident.conf file contains an error, the error is logged but the old file
stays in effect.
Also, regular expressions in pg_ident.conf are now compiled only once when
the file is loaded, rather than every time the a user is authenticated. That
should speed up authentication if you have a lot of regexps in the file.
Amit Kapila
The planner previously assumed that parameter Vars having the same absolute
query level, varno, and varattno could safely be assigned the same runtime
PARAM_EXEC slot, even though they might be different Vars appearing in
different subqueries. This was (probably) safe before the introduction of
CTEs, but the lazy-evalution mechanism used for CTEs means that a CTE can
be executed during execution of some other subquery, causing the lifespan
of Params at the same syntactic nesting level as the CTE to overlap with
use of the same slots inside the CTE. In 9.1 we created additional hazards
by using the same parameter-assignment technology for nestloop inner scan
parameters, but it was broken before that, as illustrated by the added
regression test.
To fix, restructure the planner's management of PlannerParamItems so that
items having different semantic lifespans are kept rigorously separated.
This will probably result in complex queries using more runtime PARAM_EXEC
slots than before, but the slots are cheap enough that this hardly matters.
Also, stop generating PlannerParamItems containing Params for subquery
outputs: all we really need to do is reserve the PARAM_EXEC slot number,
and that now only takes incrementing a counter. The planning code is
simpler and probably faster than before, as well as being more correct.
Per report from Vik Reykja.
These changes will mostly also need to be made in the back branches, but
I'm going to hold off on that until after 9.2.0 wraps.
The cascading replication code assumed that the current RecoveryTargetTLI
never changes, but that's not true with recovery_target_timeline='latest'.
The obvious upshot of that is that RecoveryTargetTLI in shared memory needs
to be protected by a lock. A less obvious consequence is that when a
cascading standby is connected, and the standby switches to a new target
timeline after scanning the archive, it will continue to stream WAL to the
cascading standby, but from a wrong file, ie. the file of the previous
timeline. For example, if the standby is currently streaming from the middle
of file 000000010000000000000005, and the timeline changes, the standby
will continue to stream from that file. However, the WAL on the new
timeline is in file 000000020000000000000005, so the standby sends garbage
from 000000010000000000000005 to the cascading standby, instead of the
correct WAL from file 000000020000000000000005.
This also fixes a related bug where a partial WAL segment is restored from
the archive and streamed to a cascading standby. The code assumed that when
a WAL segment is copied from the archive, it can immediately be fully
streamed to a cascading standby. However, if the segment is only partially
filled, ie. has the right size, but only N first bytes contain valid WAL,
that's not safe. That can happen if a partial WAL segment is manually copied
to the archive, or if a partial WAL segment is archived because a server is
started up on a new timeline within that segment. The cascading standby will
get confused if the WAL it received is not valid, and will get stuck until
it's restarted. This patch fixes that problem by not allowing WAL restored
from the archive to be streamed to a cascading standby until it's been
replayed, and thus validated.
We can detect whether the planner top level is going to care at all about
cheap startup cost (it will only do so if query_planner's tuple_fraction
argument is greater than zero). If it isn't, we might as well discard
paths immediately whose only advantage over others is cheap startup cost.
This turns out to get rid of quite a lot of paths in complex queries ---
I saw planner runtime reduction of more than a third on one large query.
Since add_path isn't currently passed the PlannerInfo "root", the easiest
way to tell it whether to do this was to add a bool flag to RelOptInfo.
That's a bit redundant, since all relations in a given query level will
have the same setting. But in the future it's possible that we'd refine
the control decision to work on a per-relation basis, so this seems like
a good arrangement anyway.
Per my suggestion of a few months ago.
We previously supposed that any given platform would supply both or neither
of these functions, so that one configure test would be sufficient. It now
appears that at least on AIX this is not the case ... which is likely an
AIX bug, but nonetheless we need to cope with it. So use separate tests.
Per bug #6758; thanks to Andrew Hastie for doing the followup testing
needed to confirm what was happening.
Backpatch to 9.1, where we began using these functions.
This reduces unnecessary exposure of other headers through htup.h, which
is very widely included by many files.
I have chosen to move the function prototypes to the new file as well,
because that means htup.h no longer needs to include tupdesc.h. In
itself this doesn't have much effect in indirect inclusion of tupdesc.h
throughout the tree, because it's also required by execnodes.h; but it's
something to explore in the future, and it seemed best to do the htup.h
change now while I'm busy with it.
Given a query such as
SELECT * FROM foo JOIN LATERAL (SELECT foo.var1) ss(x) ON ss.x = foo.var2
the existence of the join clause "ss.x = foo.var2" encourages indxpath.c to
build a parameterized path for foo using any index available for foo.var2.
This is completely useless activity, though, since foo has got to be on the
outside not the inside of any nestloop join with ss. It's reasonably
inexpensive to add tests that prevent creation of such paths, so let's do
that.
In the initial cut at LATERAL, I kept the rule that cheapest_total_path
was always unparameterized, which meant it had to be NULL if the relation
has no unparameterized paths. It turns out to work much more nicely if
we always have *some* path nominated as cheapest-total for each relation.
In particular, let's still say it's the cheapest unparameterized path if
there is one; if not, take the cheapest-total-cost path among those of
the minimum available parameterization. (The first rule is actually
a special case of the second.)
This allows reversion of some temporary lobotomizations I'd put in place.
In particular, the planner can now consider hash and merge joins for
joins below a parameter-supplying nestloop, even if there aren't any
unparameterized paths available. This should bring planning of
LATERAL-containing queries to the same level as queries not using that
feature.
Along the way, simplify management of parameterized paths in add_path()
and friends. In the original coding for parameterized paths in 9.2,
I tried to minimize the logic changes in add_path(), so it just treated
parameterization as yet another dimension of comparison for paths.
We later made it ignore pathkeys (sort ordering) of parameterized paths,
on the grounds that ordering isn't a useful property for the path on the
inside of a nestloop, so we might as well get rid of useless parameterized
paths as quickly as possible. But we didn't take that reasoning as far as
we should have. Startup cost isn't a useful property inside a nestloop
either, so add_path() ought to discount startup cost of parameterized paths
as well. Having done that, the secondary sorting I'd implemented (in
add_parameterized_path) is no longer needed --- any parameterized path that
survives add_path() at all is worth considering at higher levels. So this
should be a bit faster as well as simpler.
The heapam XLog functions are used by other modules, not all of which
are interested in the rest of the heapam API. With this, we let them
get just the XLog stuff in which they are interested and not pollute
them with unrelated includes.
Also, since heapam.h no longer requires xlog.h, many files that do
include heapam.h no longer get xlog.h automatically, including a few
headers. This is useful because heapam.h is getting pulled in by
execnodes.h, which is in turn included by a lot of files.
This enables selectivity estimation of the <<, >>, &<, &> and && operators,
as well as the normal inequality operators: <, <=, >=, >. "range @> element"
is also supported, but the range-variant @> and <@ operators are not,
because they cannot be sensibly estimated with lower and upper bound
histograms alone. We would need to make some assumption about the lengths of
the ranges for that. Alexander's patch included a separate histogram of
lengths for that, but I left that out of the patch for simplicity. Hopefully
that will be added as a followup patch.
The fraction of empty ranges is also calculated and used in estimation.
Alexander Korotkov, heavily modified by me.
This patch takes care of a number of problems having to do with failure
to choose valid join orders and incorrect handling of lateral references
pulled up from subqueries. Notable changes:
* Add a LateralJoinInfo data structure similar to SpecialJoinInfo, to
represent join ordering constraints created by lateral references.
(I first considered extending the SpecialJoinInfo structure, but the
semantics are different enough that a separate data structure seems
better.) Extend join_is_legal() and related functions to prevent trying
to form unworkable joins, and to ensure that we will consider joins that
satisfy lateral references even if the joins would be clauseless.
* Fill in the infrastructure needed for the last few types of relation scan
paths to support parameterization. We'd have wanted this eventually
anyway, but it is necessary now because a relation that gets pulled up out
of a UNION ALL subquery may acquire a reltargetlist containing lateral
references, meaning that its paths *have* to be parameterized whether or
not we have any code that can push join quals down into the scan.
* Compute data about lateral references early in query_planner(), and save
in RelOptInfo nodes, to avoid repetitive calculations later.
* Assorted corner-case bug fixes.
There's probably still some bugs left, but this is a lot closer to being
real than it was before.
When elevel >= ERROR, we add an abort() call to the ereport() macro to
give the compiler a hint that the ereport() expansion will not return,
but the abort() isn't actually reached because the longjmp happens in
errfinish().
Because the effect of ereport() varies with the elevel, we cannot use
standard compiler attributes such as noreturn for this.
We had put a test for libxml2's xmlStructuredErrorContext variable in
configure, but of course that doesn't work on Windows builds. The next
best alternative seems to be to test the LIBXML_VERSION symbol provided
by xmlversion.h.
Per report from Talha Bin Rizwan, though this fixes it in a different way
than his proposed patch.
The implementation is a quad-tree, largely copied from the quad-tree
implementation for points. The lower and upper bound of ranges are the 2d
coordinates, with some extra code to handle empty ranges.
I left out the support for adjacent operator, -|-, from the original patch.
Not because there was necessarily anything wrong with it, but it was more
complicated than the other operators, and I only have limited time for
reviewing. That will follow as a separate patch.
Alexander Korotkov, reviewed by Jeff Davis and me.
We use a hash table to track the parents of inner pages, but when inserting
to a leaf page, the caller of gistbufferinginserttuples() must pass a
correct block number of the leaf's parent page. Before gistProcessItup()
descends to a child page, it checks if the downlink needs to be adjusted to
accommodate the new tuple, and updates the downlink if necessary. However,
updating the downlink might require splitting the page, which might move the
downlink to a page to the right. gistProcessItup() doesn't realize that, so
when it descends to the leaf page, it might pass an out-of-date parent block
number as a result. Fix that by returning the block a tuple was inserted to
from gistbufferinginserttuples().
This fixes the bug reported by Zdeněk Jílovec.
Re-allow subquery pullup for LATERAL subqueries, except when the subquery
is below an outer join and contains lateral references to relations outside
that outer join. If we pull up in such a case, we risk introducing lateral
cross-references into outer joins' ON quals, which is something the code is
entirely unprepared to cope with right now; and I'm not sure it'll ever be
worth coping with.
Support lateral refs in VALUES (this seems to be the only additional path
type that needs such support as a consequence of re-allowing subquery
pullup).
Put in a slightly hacky fix for joinpath.c's refusal to consider
parameterized join paths even when there cannot be any unparameterized
ones. This was causing "could not devise a query plan for the given query"
failures in queries involving more than two FROM items.
Put in an even more hacky fix for distribute_qual_to_rels() being unhappy
with join quals that contain references to rels outside their syntactic
scope; which is to say, disable that test altogether. Need to think about
how to preserve some sort of debugging cross-check here, while not
expending more cycles than befits a debugging cross-check.
This command generated new pg_depend entries linking the index to the
constraint and the constraint to the table, which match the entries made
when a unique or primary key constraint is built de novo. However, it did
not bother to get rid of the entries linking the index directly to the
table. We had considered the issue when the ADD CONSTRAINT USING INDEX
patch was written, and concluded that we didn't need to get rid of the
extra entries. But this is wrong: ALTER COLUMN TYPE wasn't expecting such
redundant dependencies to exist, as reported by Hubert Depesz Lubaczewski.
On reflection it seems rather likely to break other things as well, since
there are many bits of code that crawl pg_depend for one purpose or
another, and most of them are pretty naive about what relationships they're
expecting to find. Fortunately it's not that hard to get rid of the extra
dependency entries, so let's do that.
Back-patch to 9.1, where ALTER TABLE ADD CONSTRAINT USING INDEX was added.
Replace unix_socket_directory with unix_socket_directories, which is a list
of socket directories, and adjust postmaster's code to allow zero or more
Unix-domain sockets to be created.
This is mostly a straightforward change, but since the Unix sockets ought
to be created after the TCP/IP sockets for safety reasons (better chance
of detecting a port number conflict), AddToDataDirLockFile needs to be
fixed to support out-of-order updates of data directory lockfile lines.
That's a change that had been foreseen to be necessary someday anyway.
Honza Horak, reviewed and revised by Tom Lane
Formerly we relied on checking after-the-fact to see if an expression
contained aggregates, window functions, or sub-selects when it shouldn't.
This is grotty, easily forgotten (indeed, we had forgotten to teach
DefineIndex about rejecting window functions), and none too efficient
since it requires extra traversals of the parse tree. To improve matters,
define an enum type that classifies all SQL sub-expressions, store it in
ParseState to show what kind of expression we are currently parsing, and
make transformAggregateCall, transformWindowFuncCall, and transformSubLink
check the expression type and throw error if the type indicates the
construct is disallowed. This allows removal of a large number of ad-hoc
checks scattered around the code base. The enum type is sufficiently
fine-grained that we can still produce error messages of at least the
same specificity as before.
Bringing these error checks together revealed that we'd been none too
consistent about phrasing of the error messages, so standardize the wording
a bit.
Also, rewrite checking of aggregate arguments so that it requires only one
traversal of the arguments, rather than up to three as before.
In passing, clean up some more comments left over from add_missing_from
support, and annotate some tests that I think are dead code now that that's
gone. (I didn't risk actually removing said dead code, though.)
Now that we are storing structs in these lists, the distinction between
the two lists can be represented with a couple of extra flags while using
only a single list. This simplifies the code and should save a little
bit of palloc traffic, since the majority of RTEs are represented in both
lists anyway.
This patch implements the standard syntax of LATERAL attached to a
sub-SELECT in FROM, and also allows LATERAL attached to a function in FROM,
since set-returning function calls are expected to be one of the principal
use-cases.
The main change here is a rewrite of the mechanism for keeping track of
which relations are visible for column references while the FROM clause is
being scanned. The parser "namespace" lists are no longer lists of bare
RTEs, but are lists of ParseNamespaceItem structs, which carry an RTE
pointer as well as some visibility-controlling flags. Aside from
supporting LATERAL correctly, this lets us get rid of the ancient hacks
that required rechecking subqueries and JOIN/ON and function-in-FROM
expressions for invalid references after they were initially parsed.
Invalid column references are now always correctly detected on sight.
In passing, remove assorted parser error checks that are now dead code by
virtue of our having gotten rid of add_missing_from, as well as some
comments that are obsolete for the same reason. (It was mainly
add_missing_from that caused so much fudging here in the first place.)
The planner support for this feature is very minimal, and will be improved
in future patches. It works well enough for testing purposes, though.
catversion bump forced due to new field in RangeTblEntry.
The correct test for whether a redirection tuple is removable is whether
tuple's xid < RecentGlobalXmin, not OldestXmin; the previous coding
failed to protect index searches being done in concurrent transactions that
have no XID. This mirrors the recent fix in btree's page recycling logic
made in commit d3abbbebe52eb1e59e621c880ad57df9d40d13f2.
Also, WAL-log the newest XID of any removed redirection tuple on an index
page, and apply ResolveRecoveryConflictWithSnapshot during InHotStandby WAL
replay. This protects against concurrent Hot Standby transactions possibly
needing to see the redirection tuple(s).
Per my query of 2012-03-12 and subsequent discussion.