(a/k/a SELECT INTO). Instead, flush and fsync the whole relation before
committing. We do still need the WAL log when PITR is active, however.
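For reference, the statement forms affected look like this (a sketch; table names are hypothetical):

    -- tuple-by-tuple WAL logging is skipped for these; the whole new
    -- relation is flushed and fsync'd before commit instead, unless
    -- PITR archiving requires the WAL records
    CREATE TABLE t_copy AS SELECT * FROM t;
    SELECT * INTO t_copy2 FROM t;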
Simon Riggs and Tom Lane.
work if either of the join relations is empty. The logic is:
(1) if the inner relation's startup cost is less than the outer
relation's startup cost and this is not an outer join, read
a single tuple from the inner relation via ExecHash()
- if NULL, we're done
(2) read a single tuple from the outer relation
- if NULL, we're done
(3) build the hash table on the inner relation
- if hash table is empty and this is not an outer join,
we're done
(4) otherwise, do hash join as usual
The implementation uses the new MultiExecProcNode API, per a
suggestion from Tom: invoking ExecHash() now produces the first
tuple from the Hash node's child node, whereas MultiExecHash()
builds the hash table.
I had to put in a bit of a kludge to get the row count returned
for EXPLAIN ANALYZE to be correct: since ExecHash() is invoked to
return a tuple, and then MultiExecHash() is invoked, we would
return one too many tuples to EXPLAIN ANALYZE. I hacked around
this by just manually detecting this situation and subtracting 1
from the EXPLAIN ANALYZE row count.
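To illustrate the cheap exit, a hedged sketch (table names hypothetical):

    -- the outer relation is empty, so step (2) reads a NULL tuple and
    -- the join finishes without ever building the hash table
    CREATE TABLE empty_outer (id int);
    CREATE TABLE big_inner (id int);
    INSERT INTO big_inner SELECT * FROM generate_series(1, 100000);
    EXPLAIN ANALYZE
      SELECT * FROM empty_outer o JOIN big_inner i ON o.id = i.id;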
nonconsecutive columns of a multicolumn index, as per discussion around
mid-May (pghackers thread "Best way to scan on-disk bitmaps"). This
turns out to require only minimal changes in btree, and so far as I can
see none at all in GiST. btcostestimate did need some work, but its
original assumption that index selectivity == heap selectivity was
quite bogus even before this.
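A minimal sketch of the newly-indexable case (names hypothetical):

    CREATE TABLE t (a int, b int, c int);
    CREATE INDEX t_abc_idx ON t (a, b, c);
    -- both quals can now be used with the index even though they skip
    -- over column b: the scan is bounded by a = 42, and c = 7 is
    -- checked against each index tuple
    EXPLAIN SELECT * FROM t WHERE a = 42 AND c = 7;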
to a subquery if the outer query is simple enough that the LIMIT can
be reflected directly to the subquery. This didn't use to be very
interesting, because a subquery that couldn't have been flattened into
the upper query was usually not going to be very responsive to
tuple_fraction anyway. But with new code that allows UNION ALL subqueries
to pay attention to tuple_fraction, this is useful to do. In particular
this lets the optimization occur when the UNION ALL is directly inside
a view.
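For instance (a hedged sketch, names hypothetical):

    CREATE VIEW v AS
      SELECT x FROM t1
      UNION ALL
      SELECT x FROM t2;
    -- the LIMIT is now reflected into each UNION ALL arm, letting the
    -- planner pick fast-start plans instead of reading t1 and t2 in full
    SELECT x FROM v LIMIT 10;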
of a relation in a flat 'joininfo' list. The former arrangement grouped
the join clauses according to the set of unjoined relids used in each;
however, profiling on test cases involving lots of joins proves that
that data structure is a net loss. It takes more time to group the
join clauses together than is saved by avoiding duplicate tests later.
It doesn't help any that there are usually not more than one or two
clauses per group ...
other_rel_list with a single array indexed by rangetable index.
This reduces find_base_rel from O(N) to O(1) without any real penalty.
While find_base_rel isn't one of the major bottlenecks in any profile
I've seen so far, it was starting to creep up on the radar screen
for complex queries --- so might as well fix it.
a new PlannerInfo struct, which is passed around instead of the bare
Query in all the planning code. This commit is essentially just a
code-beautification exercise, but it does open the door to making
larger changes to the planner data structures without having to muck
with the widely-known Query struct.
representation as the jointree) with two lists of RTEs, one showing
the RTEs accessible by qualified names, and the other showing the RTEs
accessible by unqualified names. I think this is conceptually simpler
than what we did before, and it's sure a whole lot easier to search.
This seems to eliminate the parse-time bottleneck for deeply nested
JOIN structures that was exhibited by phil@vodafone.
performance problem pointed out by phil@vodafone: to wit, we were
spending O(N^2) time to check dropped-ness in an N-deep join tree,
even in the case where the tree was freshly constructed and couldn't
possibly mention any dropped columns. Instead of recursing in
get_rte_attribute_is_dropped(), change the data structure definition:
the joinaliasvars list of a JOIN RTE must have a NULL Const instead
of a Var at any position that references a now-dropped column. This
costs nothing during normal parse-rewrite-plan path, and instead we
have a linear-time update to make when loading a stored rule that
might contain now-dropped columns. While at it, move the responsibility
for acquiring locks on relations referenced by rules into this separate
function (which I therefore chose to call AcquireRewriteLocks).
This saves effort --- namely, duplicated lock grabs in parser and rewriter
--- in the normal path at a cost of one extra non-locked heap_open()
in the stored-rule path; seems a good tradeoff. A fringe benefit is
that it is now *much* clearer that we acquire lock on relations referenced
in rules before we make any rewriter decisions based on their properties.
(I don't know of any bug of that ilk, but it wasn't exactly clear before.)
When one side of the join has a NULL, we don't want to uselessly try
to match it against every remaining tuple of the other side. While
at it, rewrite the comparison machinery to avoid multiple evaluations
of the left and right input expressions and to use a btree comparator
where available, instead of double operator calls. Also revise the
state machine to eliminate redundant comparisons and hopefully make it
more readable too.
startup to end, rather than re-opening it in each MultiExecBitmapIndexScan
call. I had foolishly thought that opening/closing wouldn't be much
more expensive than a rescan call, but that was sheer brain fade.
This seems to fix about half of the performance lossage reported by
Sergey Koposov. I'm still not sure where the other half went.
to eliminate unnecessary deadlocks. This commit adds SELECT ... FOR SHARE
paralleling SELECT ... FOR UPDATE. The implementation uses a new SLRU
data structure (managed much like pg_subtrans) to represent multiple-
transaction-ID sets. When more than one transaction is holding a shared
lock on a particular row, we create a MultiXactId representing that set
of transactions and store its ID in the row's XMAX. This scheme allows
an effectively unlimited number of row locks, just as we did before,
while not costing any extra overhead except when a shared lock actually
has to be shared. Still TODO: use the regular lock manager to control
the grant order when multiple backends are waiting for a row lock.
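A hedged sketch of the user-visible behavior (table name hypothetical):

    -- session 1:
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR SHARE;

    -- session 2, concurrently; does not block:
    BEGIN;
    SELECT * FROM accounts WHERE id = 1 FOR SHARE;
    -- both transaction IDs are folded into a MultiXactId stored in the
    -- row's XMAX; an UPDATE of the row would wait for both to finish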
Alvaro Herrera and Tom Lane.
node, as this behavior is now better done as a bitmap OR indexscan.
This allows considerable simplification in nodeIndexscan.c itself as
well as several planner modules concerned with indexscan plan generation.
Also we can improve the sharing of code between regular and bitmap
indexscans, since they are now working with nigh-identical Plan nodes.
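The plan shape to expect for an OR across two indexed columns is now roughly this (a sketch; index names hypothetical):

    EXPLAIN SELECT * FROM t WHERE a = 1 OR b = 2;
    --  Bitmap Heap Scan on t
    --    Recheck Cond: ((a = 1) OR (b = 2))
    --    ->  BitmapOr
    --          ->  Bitmap Index Scan on t_a_idx
    --                Index Cond: (a = 1)
    --          ->  Bitmap Index Scan on t_b_idx
    --                Index Cond: (b = 2)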
but just to open and close it during MultiExecBitmapIndexScan. This
avoids acquiring duplicate resources (eg, multiple locks on the same
relation) in a tree with many bitmap scans. Also, don't bother to
lock the parent heap at all here, since we must be underneath a
BitmapHeapScan node that will be holding a suitable lock.
but the code is basically working. Along the way, rewrite the entire
approach to processing OR index conditions, and make it work in join
cases for the first time ever. orindxpath.c is now basically obsolete,
but I left it in for the time being to allow easy comparison testing
against the old implementation.
logic operations during planning. Seems cleaner to create two new Path
node types, instead --- this avoids duplication of cost-estimation code.
Also, create an enable_bitmapscan GUC parameter to control use of bitmap
plans.
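The new parameter works like the other enable_* planner switches, e.g. for comparison testing:

    SET enable_bitmapscan = off;  -- steer the planner away from bitmap plans
    EXPLAIN SELECT * FROM t WHERE a = 1 AND b = 2;
    SET enable_bitmapscan = on;   -- the default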
scans, using in-memory tuple ID bitmaps as the intermediary. The planner
frontend (path creation and cost estimation) is not there yet, so none
of this code can be executed. I have tested it using some hacked planner
code that is far too ugly to see the light of day, however. Committing
now so that the bulk of the infrastructure changes go in before the tree
drifts under me.
in UPDATE. We also now issue a NOTICE if a query has _any_ implicit
range table entries -- in the past, we would only warn about implicit
RTEs in SELECTs with at least one explicit RTE.
As a result of the warning change, 25 of the regression tests had to
be updated. I also took the opportunity to remove some bogus whitespace
differences between some of the float4 and float8 variants. I believe
I have correctly updated all the platform-specific variants, but let
me know if that's not the case.
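The new syntax, sketched with hypothetical tables:

    -- USING in DELETE parallels FROM in UPDATE: delete rows of one
    -- table based on a join against another
    DELETE FROM orders USING customers c
     WHERE orders.customer_id = c.id AND c.status = 'closed';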
Original patch for DELETE ... USING from Euler Taveira de Oliveira,
reworked by Neil Conway.
few palloc's. I also chose to eliminate the restype and restypmod fields
entirely, since they are redundant with information stored in the node's
contained expression; re-examining the expression at need seems simpler
and more reliable than trying to keep restype/restypmod up to date.
initdb forced due to change in contents of stored rules.
implement any new feature; it just pushes the 'not implemented' error
message deeper into the backend. I also tweaked the grammar to accept
Oracle-ish parameter syntax (parameter name first), as well as the
SQL99 standard syntax (parameter mode first), since it was easy and
people will doubtless try to use both anyway.
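Both orderings now get past the grammar (a sketch; OUT parameters themselves still fail with 'not implemented'):

    -- SQL99 standard syntax: parameter mode first
    CREATE FUNCTION add_one(IN x integer) RETURNS integer
      AS 'SELECT $1 + 1' LANGUAGE sql;
    -- Oracle-ish syntax: parameter name first
    CREATE FUNCTION add_two(x IN integer) RETURNS integer
      AS 'SELECT $1 + 2' LANGUAGE sql;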
structs. There are many places in the planner where we were passing
both a rel and an index to subroutines, and now need only pass the
index struct. Notationally simpler, and perhaps a tad faster.
executing a statement that fires triggers. Formerly this time was
included in "Total runtime" but not otherwise accounted for.
As a side benefit, we avoid re-opening relations when firing non-deferred
AFTER triggers, because the trigger code can re-use the main executor's
ResultRelInfo data structure.
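A hedged sketch of the new output (names and timings hypothetical):

    EXPLAIN ANALYZE UPDATE orders SET status = 'shipped' WHERE id = 1;
    -- trigger time is now reported separately, roughly like:
    --   Trigger for constraint orders_customer_fkey: time=0.110 calls=1
    --   Total runtime: ...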
of tuples when passing data up through multiple plan nodes. A slot can now
hold either a normal "physical" HeapTuple, or a "virtual" tuple consisting
of Datum/isnull arrays. Upper plan levels can usually just copy the Datum
arrays, avoiding heap_formtuple() and possible subsequent nocachegetattr()
calls to extract the data again. This work extends Atsushi Ogawa's earlier
patch, which provided the key idea of adding Datum arrays to TupleTableSlots.
(I believe however that something like this was foreseen way back in Berkeley
days --- see the old comment on ExecProject.) A test case involving many
levels of join of fairly wide tables (about 80 columns altogether) showed
about 3x overall speedup, though simple queries will probably not be
helped very much.
I have also duplicated some code in heaptuple.c in order to provide versions
of heap_formtuple and friends that use "bool" arrays to indicate null
attributes, instead of the old convention of "char" arrays containing either
'n' or ' '. This provides a better match to the convention used by
ExecEvalExpr. While I have not made a concerted effort to get rid of uses
of the old routines, I think they should be deprecated and eventually removed.
whether or not it is a security definer. Changing a function's strictness
is required by SQL2003, and the other capabilities make sense. Also, allow
an optional RESTRICT noise word to be specified, for SQL conformance.
Some trivial regression tests added and the documentation has been
updated.
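Examples of the forms now accepted (function signature hypothetical):

    ALTER FUNCTION add_one(integer) STRICT;
    ALTER FUNCTION add_one(integer) CALLED ON NULL INPUT;
    ALTER FUNCTION add_one(integer) SECURITY DEFINER RESTRICT;
    -- RESTRICT is accepted as a noise word for SQL conformance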
Formerly, if such a clause contained no aggregate functions, we mistakenly
treated it as equivalent to WHERE. Per spec it must cause the query to
be treated as a grouped query of a single group, the same as appearance
of aggregate functions would do. Also, the HAVING filter must execute
after aggregate function computation even if it itself contains no
aggregate functions.
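A hedged illustration (hypothetical table t):

    -- HAVING forces grouped-query semantics even with no aggregates:
    -- all rows form a single group, so at most one row comes out
    SELECT 1 AS result FROM t HAVING 1 = 1;  -- one row, even if t is empty
    SELECT 1 AS result FROM t HAVING 1 = 0;  -- zero rows
    -- by contrast, WHERE 1 = 1 would return one row per row of t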
on-the-fly, and thereby avoid blowing out memory when the planner has
underestimated the hash table size. Hash join will now obey the
work_mem limit with some faithfulness. Per my recent proposal
(hash aggregate part isn't done yet though).
command. This is useful because we can allow truncation of tables
referenced by foreign keys, so long as the referencing table is
truncated in the same command.
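For example (hypothetical tables):

    CREATE TABLE parent (id int PRIMARY KEY);
    CREATE TABLE child (parent_id int REFERENCES parent);
    TRUNCATE parent;          -- still an error: child references parent
    TRUNCATE parent, child;   -- allowed: both truncated in one command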
Alvaro Herrera
Also performed an initial run-through of updating our Copyright dates to
extend to 2005. This first pass was very simple: change everything where
a grep matched both '1996-2004' and the word 'Copyright'. I scanned the
generated list with 'less' before and after the change, to make sure that
I only picked up the right entries.
of an inheritance child table is binary-compatible with the rowtype of
its parent, invent an expression node type that does the conversion
correctly. Fixes the new bug exhibited by Kris Shannon as well as a
lot of old bugs that would only show up when using multiple inheritance
or after altering the parent table.
a relation's number of blocks, rather than the possibly-obsolete value
in pg_class.relpages. Scale the value in pg_class.reltuples correspondingly
to arrive at a hopefully more accurate number of rows. When pg_class
contains 0/0, estimate a tuple width from the column datatypes and divide
that into current file size to estimate number of rows. This improved
methodology allows us to jettison the ancient hacks that put bogus default
values into pg_class when a table is first created. Also, per a suggestion
from Simon, make VACUUM (but not VACUUM FULL or ANALYZE) adjust the value
it puts into pg_class.reltuples to try to represent the mean tuple density
instead of the minimal density that actually prevails just after VACUUM.
These changes alter the plans selected for certain regression tests, so
update the expected files accordingly. (I removed join_1.out because
it's not clear if it still applies; we can add back any variant versions
as they are shown to be needed.)
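As a worked example of the scaling: if pg_class records relpages = 100 and
reltuples = 100000 but the file is currently 150 blocks long, the planner
now estimates 100000 * (150/100) = 150000 rows.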
clause implicitly whenever one is not given explicitly. Remove concept
of a schema having an associated tablespace, and simplify the rules for
selecting a default tablespace for a table or index. It's now just
(a) explicit TABLESPACE clause; (b) default_tablespace if that's not an
empty string; (c) database's default. This will allow pg_dump to use
SET commands instead of tablespace clauses to determine object locations
(but I didn't actually make it do so). All per recent discussions.
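A hedged sketch (tablespace names hypothetical):

    SET default_tablespace = 'fastspace';
    CREATE TABLE t1 (a int);                      -- placed in fastspace
    CREATE TABLE t2 (a int) TABLESPACE bigspace;  -- explicit clause wins
    SET default_tablespace = '';                  -- back to the database default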
columns. The returned tuple needs to have appropriate NULL columns
inserted so that it actually matches the declared rowtype. It seemed
convenient to use a JunkFilter for this, so I made some cleanups and
simplifications in the JunkFilter code to allow it to support this
additional functionality. (That in turn exposed a latent bug in
nodeAppend.c, which is that it was returning a tuple slot whose
descriptor didn't match its data.) Also, move check_sql_fn_retval
out of pg_proc.c and into functions.c, where it seems to more naturally
belong.
(1) Replace while loop with the new forboth() construct in
parser/analyze.c
(2) Replace lcons() with lappend() in SearchCatCacheList(). Since these
now have the same performance, there is no reason to prefer lcons() in
this case, and using lappend() leads to cleaner code.
(3) Improve the name of the second parameter to for_each_cell()
presence of dropped columns. Document the already-presumed fact that
eref aliases in relation RTEs are supposed to have entries for dropped
columns; cause the user alias structs to have such entries too, so that
there's always a one-to-one mapping to the underlying physical attnums.
Adjust expandRTE() and related code to handle the case where a column
that is part of a JOIN has been dropped. Generalize expandRTE()'s API
so that it can be used in a couple of places that formerly rolled their
own implementation of the same logic. Fix ruleutils.c to suppress
display of aliases for columns that were dropped since the rule was made.
to the physical layout of the rowtype, ie, there are dummy arguments
corresponding to any dropped columns in the rowtype. We formerly had a
couple of places that did it this way and several others that did not.
Fixes Gaetano Mendola's "cache lookup failed for type 0" bug of 5-Aug.
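A hedged sketch of the pattern that used to fail (names hypothetical):

    CREATE TABLE t (a int, b int);
    CREATE FUNCTION get_a(t) RETURNS int AS 'SELECT $1.a' LANGUAGE sql;
    ALTER TABLE t DROP COLUMN b;
    -- composite datums of type t now carry a dummy slot for the dropped
    -- column, so this no longer dies with "cache lookup failed for type 0"
    SELECT get_a(t) FROM t;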