Commit Graph

387 Commits

Author SHA1 Message Date
f5bc74192d Make GEQO's planning deterministic by having it start from a predictable
random number seed each time.  This is how it used to work years ago, but
we got rid of the seed reset because it was resetting the main random()
sequence and thus having undesirable effects on the rest of the system.
To fix, establish a private random number state for each execution of
geqo(), and initialize the state using the new GUC variable geqo_seed.
People who want to experiment with different random searches can do so
by changing geqo_seed, but you'll always get the same plan for the same
value of geqo_seed (if holding all other planner inputs constant, of course).

The new state is kept in PlannerInfo by adding a "void *" field reserved
for use by join_search hooks.  Most of the rather bulky code changes in
this commit are just arranging to pass PlannerInfo around to all the GEQO
functions (many of which formerly didn't receive it).

Andres Freund, with some editorialization by Tom
2009-07-16 20:55:44 +00:00
de160e2c00 Make backend header files C++ safe
This alters various incidental uses of C++ key words to use other similar
identifiers, so that a C++ compiler won't choke outright.  You still
(probably) need extern "C" { }; around the inclusion of backend headers.

based on a patch by Kurt Harriman <harriman@acm.org>

Also add a script cpluspluscheck to check for C++ compatibility in the
future.  As of right now, this passes without error for me.
2009-07-16 06:33:46 +00:00
9b27eab71c Fix set_append_rel_pathlist() to deal intelligently with cases where
substituting a child rel's output expressions into the appendrel's restriction
clauses yields a pseudoconstant restriction.  We might be able to skip scanning
that child rel entirely (if we get constant FALSE), or generate a one-time
filter.  8.3 more or less accidentally generated plans that weren't completely
stupid in these cases, but that was only because an extra recursive level of
subquery_planner() always occurred and allowed const-simplification to happen.
8.4's ability to pull up appendrel members with non-Var outputs exposes the
fact that we need to work harder here.  Per gripe from Sergey Burladyan.
2009-07-06 18:26:30 +00:00
d747140279 8.4 pgindent run, with new combined Linux/FreeBSD/MinGW typedef list
provided by Andrew.
2009-06-11 14:49:15 +00:00
0ada559187 Do some minor code refactoring in preparation for changing the APIs of
find_inheritance_children() and find_all_inheritors().  I got annoyed that
these are buried inside the planner but mostly used elsewhere.  So, create
a new file catalog/pg_inherits.c and put them there, along with a couple
of other functions that search pg_inherits.

The code that modifies pg_inherits is (still) in tablecmds.c --- it's
kind of entangled with unrelated code that modifies pg_depend and other
stuff, so pulling it out seemed like a bigger change than I wanted to make
right now.  But this file provides a natural home for it if anyone ever
gets around to that.

This commit just moves code around; it doesn't change anything, except
I succumbed to the temptation to make a couple of trivial optimizations
in typeInheritsFrom().
2009-05-12 00:56:05 +00:00
8dcf18414b Fix cost_nestloop and cost_hashjoin to model the behavior of semi and anti
joins a bit better, ie, understand the differing cost functions for matched
and unmatched outer tuples.  There is more that could be done in cost_hashjoin
but this already helps a great deal.  Per discussions with Robert Haas.
2009-05-09 22:51:41 +00:00
c59d8dd44d Improve pull_up_subqueries logic so that it doesn't insert unnecessary
PlaceHolderVar nodes in join quals appearing in or below the lowest
outer join that could null the subquery being pulled up.  This improves
the planner's ability to recognize constant join quals, and probably
helps with detection of common sort keys (equivalence classes) as well.
2009-04-28 21:31:16 +00:00
1d97c19a0f Fix estimate_num_groups() to not fail on PlaceHolderVars, per report from
Stefan Kaltenbrunner.  The most reasonable behavior (at least for the near
term) seems to be to ignore the PlaceHolderVar and examine its argument
instead.  In support of this, change the API of pull_var_clause() to allow
callers to request recursion into PlaceHolderVars.  Currently
estimate_num_groups() is the only customer for that behavior, but where
there's one there may be others.
2009-04-19 19:46:33 +00:00
d7a6a04dc7 Fix planner to restore its previous level of intelligence about pushing
constants through full joins, as in

	select * from tenk1 a full join tenk1 b using (unique1)
	where unique1 = 42;

which should generate a fairly cheap plan where we apply the constraint
unique1 = 42 in each relation scan.  This had been broken by my patch of
2008-06-27, which is now reverted in favor of a more invasive but hopefully
less incorrect approach.  That patch was meant to prevent incorrect extraction
of OR'd indexclauses from OR conditions above an outer join.  To do that
correctly we need more information than the outerjoin_delay flag can provide,
so add a nullable_relids field to RestrictInfo that records exactly which
relations are nulled by outer joins that are underneath a particular qual
clause.  A side benefit is that we can make the test in create_or_index_quals
more specific: it is now smart enough to extract an OR'd indexclause into the
outer side of an outer join, even though it must not do so in the inner side.
The old coding couldn't distinguish these cases so it could not do either.
2009-04-16 20:42:16 +00:00
e549722a8b Get rid of the rather fuzzily defined FlattenedSubLink node type in favor of
making pull_up_sublinks() construct a full-blown JoinExpr tree representation
of IN/EXISTS SubLinks that it is able to convert to semi or anti joins.
This makes pull_up_sublinks() a shade more complex, but the gain in semantic
clarity is worth it.  I still have more to do in this area to address the
previously-discussed problems, but this commit in itself fixes at least one
bug in HEAD, as shown by added regression test case.
2009-02-25 03:30:38 +00:00
d04db37072 Arrange for function default arguments to be processed properly in expressions
that are set up for execution with ExecPrepareExpr rather than going through
the full planner process.  By introducing an explicit notion of "expression
planning", this patch also lays a bit of groundwork for maybe someday
allowing sub-selects in standalone expressions.
2009-01-09 15:46:11 +00:00
445ce15702 Create a third option named "partition" for constraint_exclusion, and make it
the default.  This setting enables constraint exclusion checks only for
appendrel members (ie, inheritance children and UNION ALL arms), which are
the cases in which constraint exclusion is most likely to be useful.  Avoiding
the overhead for simple queries that are unlikely to benefit should bring
the cost down to the point where this is a reasonable default setting.
Per today's discussion.
2009-01-07 22:40:49 +00:00
511db38ace Update copyright for 2009. 2009-01-01 17:24:05 +00:00
8e8854daa2 Add some basic support for window frame clauses to the window-functions
patch.  This includes the ability to force the frame to cover the whole
partition, and the ability to make the frame end exactly on the current row
rather than its last ORDER BY peer.  Supporting any more of the full SQL
frame-clause syntax will require nontrivial hacking on the window aggregate
code, so it'll have to wait for 8.5 or beyond.
2008-12-31 00:08:39 +00:00
95b07bc7f5 Support window functions a la SQL:2008.
Hitoshi Harada, with some kibitzing from Heikki and Tom.
2008-12-28 18:54:01 +00:00
0436679969 Get rid of adjust_appendrel_attr_needed(), which has been broken ever since
we extended the appendrel mechanism to support UNION ALL optimization.  The
reason nobody noticed was that we are not actually using attr_needed data for
appendrel children; hence it seems more reasonable to rip it out than fix it.
Back-patch to 8.2 because an Assert failure is possible in corner cases.
Per examination of an example from Jim Nasby.

In HEAD, also get rid of AppendRelInfo.col_mappings, which is quite inadequate
to represent UNION ALL situations; depend entirely on translated_vars instead.
2008-11-11 18:13:32 +00:00
e6ae3b5dbf Add a concept of "placeholder" variables to the planner. These are variables
that represent some expression that we desire to compute below the top level
of the plan, and then let that value "bubble up" as though it were a plain
Var (ie, a column value).

The immediate application is to allow sub-selects to be flattened even when
they are below an outer join and have non-nullable output expressions.
Formerly we couldn't flatten because such an expression wouldn't properly
go to NULL when evaluated above the outer join.  Now, we wrap it in a
PlaceHolderVar and arrange for the actual evaluation to occur below the outer
join.  When the resulting Var bubbles up through the join, it will be set to
NULL if necessary, yielding the correct results.  This fixes a planner
limitation that's existed since 7.1.

In future we might want to use this mechanism to re-introduce some form of
Hellerstein's "expensive functions" optimization, ie place the evaluation of
an expensive function at the most suitable point in the plan tree.
2008-10-21 20:42:53 +00:00
76e6602417 Improve the recently-added code for inlining set-returning functions so that
it can handle functions returning setof record.  The case was left undone
originally, but it turns out to be simple to fix.
2008-10-09 19:27:40 +00:00
0d115dde82 Extend CTE patch to support recursive UNION (ie, without ALL). The
implementation uses an in-memory hash table, so it will poop out for very
large recursive results ... but the performance characteristics of a
sort-based implementation would be pretty unpleasant too.
2008-10-07 19:27:04 +00:00
44d5be0e53 Implement SQL-standard WITH clauses, including WITH RECURSIVE.
There are some unimplemented aspects: recursive queries must use UNION ALL
(should allow UNION too), and we don't have SEARCH or CYCLE clauses.
These might or might not get done for 8.4, but even without them it's a
pretty useful feature.

There are also a couple of small loose ends and definitional quibbles,
which I'll send a memo about to pgsql-hackers shortly.  But let's land
the patch now so we can get on with other development.

Yoshiyuki Asaba, with lots of help from Tatsuo Ishii and Tom Lane
2008-10-04 21:56:55 +00:00
ee33b95d9c Improve the plan cache invalidation mechanism to make it invalidate plans
when user-defined functions used in a plan are modified.  Also invalidate
plans when schemas, operators, or operator classes are modified; but for these
cases we just invalidate everything rather than tracking exact dependencies,
since these types of objects seldom change in a production database.

Tom Lane; loosely based on a patch by Martin Pihlak.
2008-09-09 18:58:09 +00:00
b153c09209 Add a bunch of new error location reports to parse-analysis error messages.
There are still some weak spots around JOIN USING and relation alias lists,
but most errors reported within backend/parser/ now have locations.
2008-09-01 20:42:46 +00:00
e5536e77a5 Move exprType(), exprTypmod(), expression_tree_walker(), and related routines
into nodes/nodeFuncs, so as to reduce wanton cross-subsystem #includes inside
the backend.  There's probably more that should be done along this line,
but this is a start anyway.
2008-08-25 22:42:34 +00:00
bd3daddaf2 Arrange to convert EXISTS subqueries that are equivalent to hashable IN
subqueries into the same thing you'd have gotten from IN (except always with
unknownEqFalse = true, so as to get the proper semantics for an EXISTS).
I believe this fixes the last case within CVS HEAD in which an EXISTS could
give worse performance than an equivalent IN subquery.

The tricky part of this is that if the upper query probes the EXISTS for only
a few rows, the hashing implementation can actually be worse than the default,
and therefore we need to make a cost-based decision about which way to use.
But at the time when the planner generates plans for subqueries, it doesn't
really know how many times the subquery will be executed.  The least invasive
solution seems to be to generate both plans and postpone the choice until
execution.  Therefore, in a query that has been optimized this way, EXPLAIN
will show two subplans for the EXISTS, of which only one will actually get
executed.

There is a lot more that could be done based on this infrastructure: in
particular it's interesting to consider switching to the hash plan if we start
out using the non-hashed plan but find a lot more upper rows going by than we
expected.  I have therefore left some minor inefficiencies in place, such as
initializing both subplans even though we will currently only use one.
2008-08-22 00:16:04 +00:00
19e34b6239 Improve sublink pullup code to handle ANY/EXISTS sublinks that are at top
level of a JOIN/ON clause, not only at top level of WHERE.  (However, we
can't do this in an outer join's ON clause, unless the ANY/EXISTS refers
only to the nullable side of the outer join, so that it can effectively
be pushed down into the nullable side.)  Per request from Kevin Grittner.

In passing, fix a bug in the initial implementation of EXISTS pullup:
it would Assert if the EXIST's WHERE clause used a join alias variable.
Since we haven't yet flattened join aliases when this transformation
happens, it's necessary to include join relids in the computed set of
RHS relids.
2008-08-17 01:20:00 +00:00
d4af2a6481 Clean up the loose ends in selectivity estimation left by my patch for semi
and anti joins.  To do this, pass the SpecialJoinInfo struct for the current
join as an additional optional argument to operator join selectivity
estimation functions.  This allows the estimator to tell not only what kind
of join is being formed, but which variable is on which side of the join;
a requirement long recognized but not dealt with till now.  This also leaves
the door open for future improvements in the estimators, such as accounting
for the null-insertion effects of lower outer joins.  I didn't do anything
about that in the current patch but the information is in principle deducible
from what's passed.

The patch also clarifies the definition of join selectivity for semi/anti
joins: it's the fraction of the left input that has (at least one) match
in the right input.  This allows getting rid of some very fuzzy thinking
that I had committed in the original 7.4-era IN-optimization patch.
There's probably room to estimate this better than the present patch does,
but at least we know what to estimate.

Since I had to touch CREATE OPERATOR anyway to allow a variant signature
for join estimator functions, I took the opportunity to add a couple of
additional checks that were missing, per my recent message to -hackers:
* Check that estimator functions return float8;
* Require execute permission at the time of CREATE OPERATOR on the
operator's function as well as the estimator functions;
* Require ownership of any pre-existing operator that's modified by
the command.
I also moved the lookup of the functions out of OperatorCreate() and
into operatorcmds.c, since that seemed more consistent with most of
the other catalog object creation processes, eg CREATE TYPE.
2008-08-16 00:01:38 +00:00
e006a24ad1 Implement SEMI and ANTI joins in the planner and executor. (Semijoins replace
the old JOIN_IN code, but antijoins are new functionality.)  Teach the planner
to convert appropriate EXISTS and NOT EXISTS subqueries into semi and anti
joins respectively.  Also, LEFT JOINs with suitable upper-level IS NULL
filters are recognized as being anti joins.  Unify the InClauseInfo and
OuterJoinInfo infrastructure into "SpecialJoinInfo".  With that change,
it becomes possible to associate a SpecialJoinInfo with every join attempt,
which permits some cleanup of join selectivity estimation.  That needs to be
taken much further than this patch does, but the next step is to change the
API for oprjoin selectivity functions, which seems like material for a
separate patch.  So for the moment the output size estimates for semi and
especially anti joins are quite bogus.
2008-08-14 18:48:00 +00:00
af95d7aa63 Improve INTERSECT/EXCEPT hashing by realizing that we don't need to make any
hashtable entries for tuples that are found only in the second input: they
can never contribute to the output.  Furthermore, this implies that the
planner should endeavor to put first the smaller (in number of groups) input
relation for an INTERSECT.  Implement that, and upgrade prepunion's estimation
of the number of rows returned by setops so that there's some amount of sanity
in the estimate of which one is smaller.
2008-08-07 19:35:02 +00:00
368df30427 Support hashing for duplicate-elimination in INTERSECT and EXCEPT queries.
This completes my project of improving usage of hashing for duplicate
elimination (aggregate functions with DISTINCT remain undone, but that's
for some other day).

As with the previous patches, this means we can INTERSECT/EXCEPT on datatypes
that can hash but not sort, and it means that INTERSECT/EXCEPT without ORDER
BY are no longer certain to produce sorted output.
2008-08-07 03:04:04 +00:00
2d1d96b1ce Teach the system how to use hashing for UNION. (INTERSECT/EXCEPT will follow,
but seem like a separate patch since most of the remaining work is on the
executor side.)  I took the opportunity to push selection of the grouping
operators for set operations into the parser where it belongs.  Otherwise this
is just a small exercise in making prepunion.c consider both alternatives.

As with the recent DISTINCT patch, this means we can UNION on datatypes that
can hash but not sort, and it means that UNION without ORDER BY is no longer
certain to produce sorted output.
2008-08-07 01:11:52 +00:00
9511304752 Rearrange the querytree representation of ORDER BY/GROUP BY/DISTINCT items
as per my recent proposal:

1. Fold SortClause and GroupClause into a single node type SortGroupClause.
We were already relying on them to be struct-equivalent, so using two node
tags wasn't accomplishing much except to get in the way of comparing items
with equal().

2. Add an "eqop" field to SortGroupClause to carry the associated equality
operator.  This is cheap for the parser to get at the same time it's looking
up the sort operator, and storing it eliminates the need for repeated
not-so-cheap lookups during planning.  In future this will also let us
represent GROUP/DISTINCT operations on datatypes that have hash opclasses
but no btree opclasses (ie, they have equality but no natural sort order).
The previous representation simply didn't work for that, since its only
indicator of comparison semantics was a sort operator.

3. Add a hasDistinctOn boolean to struct Query to explicitly record whether
the distinctClause came from DISTINCT or DISTINCT ON.  This allows removing
some complicated and not 100% bulletproof code that attempted to figure
that out from the distinctClause alone.

This patch doesn't in itself create any new capability, but it's necessary
infrastructure for future attempts to use hash-based grouping for DISTINCT
and UNION/INTERSECT/EXCEPT.
2008-08-02 21:32:01 +00:00
eaf1b5d348 Tighten up SS_finalize_plan's computation of valid_params to exclude Params of
the current query level that aren't in fact output parameters of the current
initPlans.  (This means, for example, output parameters of regular subplans.)
To make this work correctly for output parameters coming from sibling
initplans requires rejiggering the API of SS_finalize_plan just a bit:
we need the siblings to be visible to it, rather than hidden as
SS_make_initplan_from_plan had been doing.  This is really part of my response
to bug #4290, but I concluded this part probably shouldn't be back-patched,
since all that it's doing is to make a debugging cross-check tighter.
2008-07-10 02:14:03 +00:00
a3540b0f65 Improve our #include situation by moving pointer types away from the
corresponding struct definitions.  This allows other headers to avoid including
certain highly-loaded headers such as rel.h and relscan.h, instead using just
relcache.h, heapam.h or genam.h, which are more lightweight and thus cause less
unnecessary dependencies.
2008-06-19 00:46:06 +00:00
db147b3483 Allow the planner's estimate of the fraction of a cursor's rows that will be
retrieved to be controlled through a GUC variable.

Robert Hell
2008-05-02 21:26:10 +00:00
25e46a504b Fix a couple of oversights associated with the "physical tlist" optimization:
we had several code paths where a physical tlist could be used for the input
to a Sort node, which is a dumb idea because any unneeded table columns will
increase the volume of data the sort has to push around.

(Unfortunately the easy-looking fix of calling disuse_physical_tlist during
make_sort_xxx doesn't work because in most cases we're already committed to
the current input tlist --- it's been marked with sort column numbers, or
we've built grouping column numbers using it, etc.  The tlist has to be
selected properly at the calling level before we start constructing sort-col
information.  This is easy enough to do, we were just failing to take the
point into consideration.)

Back-patch to 8.3.  I believe the problem probably exists clear back to 7.4
when the physical tlist optimization was added, but I'm afraid to back-patch
further than 8.3 without a great deal more study than I want to put into it.
The code in this area has drifted a lot over time.  The real-world importance
of these code paths is uncertain anyway --- I think in many cases we'd
probably prefer hash-based methods.
2008-04-17 21:22:14 +00:00
6b73d7e567 Fix an oversight I made in a cleanup patch over a year ago:
eval_const_expressions needs to be passed the PlannerInfo ("root") structure,
because in some cases we want it to substitute values for Param nodes.
(So "constant" is not so constant as all that ...)  This mistake partially
disabled optimization of unnamed extended-Query statements in 8.3: in
particular the LIKE-to-indexscan optimization would never be applied if the
LIKE pattern was passed as a parameter, and constraint exclusion depending
on a parameter value didn't work either.
2008-04-01 00:48:33 +00:00
d344115519 Apply my original fix for Taiki Yamaguchi's bug report about DISTINCT MAX().
Add some regression tests for plausible failures in this area.
2008-03-31 16:59:26 +00:00
0d49838df6 Arrange to "inline" SQL functions that appear in a query's FROM clause,
are declared to return set, and consist of just a single SELECT.  We
can replace the FROM-item with a sub-SELECT and then optimize much as
if we were dealing with a view.  Patch from Richard Rowell, cleaned up
by me.
2008-03-18 22:04:14 +00:00
c9a1cc694a Change hash index creation so that rather than always establishing exactly
two buckets at the start, we create a number of buckets appropriate for the
estimated size of the table.  This avoids a lot of expensive bucket-split
actions during initial index build on an already-populated table.

This is one of the two core ideas of Tom Raney and Shreya Bhargava's patch
to reduce hash index build time.  I'm committing it separately to make it
easier for people to test the effects of this separately from the effects
of their other core idea (pre-sorting the index entries by bucket number).
2008-03-15 20:46:31 +00:00
9098ab9e32 Update copyrights in source tree to 2008. 2008-01-01 19:46:01 +00:00
f6e8730d11 Re-run pgindent with updated list of typedefs. (Updated README should
avoid this problem in the future.)
2007-11-15 22:25:18 +00:00
fdf5a5efb7 pgindent run for 8.3. 2007-11-15 21:14:46 +00:00
c291203ca3 Fix EquivalenceClass code to handle volatile sort expressions in a more
predictable manner; in particular that if you say ORDER BY output-column-ref,
it will in fact sort by that specific column even if there are multiple
syntactic matches.  An example is
	SELECT random() AS a, random() AS b FROM ... ORDER BY b, a;
While the use-case for this might be a bit debatable, it worked as expected
in earlier releases, so we should preserve the behavior for 8.3.  Per my
recent proposal.

While at it, fix convert_subquery_pathkeys() to handle RelabelType stripping
in both directions; it needs this for the same reasons make_sort_from_pathkeys
does.
2007-11-08 21:49:48 +00:00
1be0601681 Last week's patch for make_sort_from_pathkeys wasn't good enough: it has
to be able to discard top-level RelabelType nodes on *both* sides of the
equivalence-class-to-target-list comparison, since make_pathkey_from_sortinfo
might either add or remove a RelabelType.  Also fix the latter to do the
removal case cleanly.  Per example from Peter.
2007-11-08 19:25:37 +00:00
82d8ab6fc4 Fix the plan-invalidation mechanism to treat regclass constants that refer to
a relation as a reason to invalidate a plan when the relation changes.  This
handles scenarios such as dropping/recreating a sequence that is referenced by
nextval('seq') in a cached plan.  Rather than teach plancache.c all about
digging through plan trees to find regclass Consts, we charge the planner's
setrefs.c with making a list of the relation OIDs on which each plan depends.
That way the list can be built cheaply during a plan tree traversal that has
to happen anyway.  Per bug #3662 and subsequent discussion.
2007-10-11 18:05:27 +00:00
89db887b1e Keep the planner from failing on "WHERE false AND something IN (SELECT ...)".
eval_const_expressions simplifies this to just "WHERE false", but we have
already done pull_up_IN_clauses so the IN join will be done, or at least
planned, anyway.  The trouble case comes when the sub-SELECT is itself a join
and we decide to implement the IN by unique-ifying the sub-SELECT outputs:
with no remaining reference to the output Vars in WHERE, we won't have
propagated the Vars up to the upper join point, leading to "variable not found
in subplan target lists" error.  Fix by adding an extra scan of in_info_list
and forcing all Vars mentioned therein to be propagated up to the IN join
point.  Per bug report from Miroslav Sulc.
2007-10-04 20:44:47 +00:00
cdf0231c88 Create a function variable "join_search_hook" to let plugins override the
join search order portion of the planner; this is specifically intended to
simplify developing a replacement for GEQO planning.  Patch by Julius
Stroffek, editorialized on by me.  I renamed make_one_rel_by_joins to
standard_join_search and make_rels_by_joins to join_search_one_level to better
reflect their place within this scheme.
2007-09-26 18:51:51 +00:00
7125687511 Fix cost estimates for EXISTS subqueries that are evaluated as initPlans
(because they are uncorrelated with the immediate parent query).  We were
charging the full run cost to the parent node, disregarding the fact that
only one row need be fetched for EXISTS.  While this would only be a
cosmetic issue in most cases, it might possibly affect planning outcomes
if the parent query were itself a subquery to some upper query.
Per recent discussion with Steve Crawford.
2007-09-22 21:36:40 +00:00
282d2a03dd HOT updates. When we update a tuple without changing any of its indexed
columns, and the new version can be stored on the same heap page, we no longer
generate extra index entries for the new version.  Instead, index searches
follow the HOT-chain links to ensure they find the correct tuple version.

In addition, this patch introduces the ability to "prune" dead tuples on a
per-page basis, without having to do a complete VACUUM pass to recover space.
VACUUM is still needed to clean up dead index entries, however.

Pavan Deolasee, with help from a bunch of other people.
2007-09-20 17:56:33 +00:00
906b2e1b37 Rename DLLIMPORT macro to PGDLLIMPORT to avoid conflict with
third party includes (like tcl) that define DLLIMPORT.
2007-07-25 12:22:54 +00:00