Before executing a cached generic plan, AcquireExecutorLocks() in
plancache.c locks all relations in the plan's range table to ensure the
plan is safe for execution. However, this means it also locks
runtime-prunable relations that "initial" runtime pruning will later
discard anyway, introducing unnecessary overhead.
This commit defers locking for such relations to executor startup and
ensures that if the CachedPlan is invalidated due to concurrent DDL
during this window, replanning is triggered. Deferring these locks
avoids unnecessary locking overhead for pruned partitions, resulting
in significant speedup, particularly when many partitions are pruned
during initial runtime pruning.
* Changes to locking when executing generic plans:
AcquireExecutorLocks() now locks only unprunable relations, that is,
those found in PlannedStmt.unprunableRelids (introduced in commit
cbc127917e), to avoid locking runtime-prunable partitions
unnecessarily. The remaining locks are taken by
ExecDoInitialPruning(), which acquires them only for partitions that
survive pruning.
This deferral does not affect the locks required for permission
checking in InitPlan(), which takes place before initial pruning.
ExecCheckPermissions() now includes an Assert to verify that all
relations undergoing permission checks, none of which can be in the
set of runtime-prunable relations, are properly locked.
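As an illustration, the split now works roughly like the sketch below
(simplified; the variable names and loop body are illustrative, not the
actual plancache.c code):

	/* In AcquireExecutorLocks(): lock only unprunable relations. */
	int			rti = 0;
	ListCell   *lc;

	foreach(lc, plannedstmt->rtable)
	{
		RangeTblEntry *rte = lfirst_node(RangeTblEntry, lc);

		rti++;					/* range table indexes are 1-based */
		if (rte->rtekind != RTE_RELATION)
			continue;
		if (!bms_is_member(rti, plannedstmt->unprunableRelids))
			continue;			/* deferred to ExecDoInitialPruning() */
		LockRelationOid(rte->relid, rte->rellockmode);
	}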
* Plan invalidation handling:
Deferring locks introduces a window where prunable relations may be
altered by concurrent DDL, invalidating the plan. A new function,
ExecutorStartCachedPlan(), wraps ExecutorStart() to detect and handle
invalidation caused by deferred locking. If invalidation occurs,
ExecutorStartCachedPlan() updates CachedPlan using the new
UpdateCachedPlan() function and retries execution with the updated
plan. To ensure that every code path affected by this handles
invalidation properly, all callers of ExecutorStart() that may execute
a PlannedStmt from a CachedPlan have been updated to use
ExecutorStartCachedPlan() instead.
UpdateCachedPlan() replaces stale plans in CachedPlan.stmt_list. A new
CachedPlan.stmt_context, created as a child of CachedPlan.context,
allows freeing old PlannedStmts while preserving the CachedPlan
structure and its statement list. This ensures that loops over
statements in upstream callers of ExecutorStartCachedPlan() remain
intact.
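Conceptually, the retry loop works as sketched below. This is a
simplification that assumes UpdateCachedPlan() returns the replacement
PlannedStmt for the query being started; the real code also manages
snapshots and plan references:

	/* Sketch of ExecutorStartCachedPlan(), heavily simplified */
	for (;;)
	{
		if (ExecutorStart(queryDesc, eflags))
			break;				/* planstate tree built; plan still valid */

		/* Concurrent DDL invalidated the plan while locks were deferred. */
		queryDesc->plannedstmt = UpdateCachedPlan(plansource, query_index,
												  queryDesc->queryEnv);
	}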
ExecutorStart() and ExecutorStart_hook implementations now return a
boolean: true if plan initialization succeeded and QueryDesc.planstate
contains a valid PlanState tree, false otherwise, in which case
QueryDesc.planstate is NULL. Hook implementations must call
standard_ExecutorStart() (directly or via the previous hook) at the
beginning and, if it returns false, return false themselves without
doing any further processing.
* Testing:
To verify these changes, the delay_execution module now tests scenarios
in which a cached plan becomes invalid due to concurrent changes to
prunable relations while their locks are deferred.
* Note to extension authors:
ExecutorStart_hook implementations must verify plan validity after
calling standard_ExecutorStart(), as explained earlier. For example:
	if (prev_ExecutorStart)
		plan_valid = prev_ExecutorStart(queryDesc, eflags);
	else
		plan_valid = standard_ExecutorStart(queryDesc, eflags);

	if (!plan_valid)
		return false;

	<extension-code>

	return true;
Extensions that access child relations, especially prunable partitions,
via ExecGetRangeTableRelation() must now ensure that the relations' RT
indexes are present in es_unpruned_relids (introduced in commit
cbc127917e); otherwise an error is raised. This is a strict requirement
after this change, because only relations in that set are locked.
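A defensive pattern for such extension code might look like the sketch
below (illustrative only; rti stands for the relation's RT index, and
exact function signatures may differ in the final tree):

	/* Only touch the relation if it is in the unpruned set. */
	if (bms_is_member(rti, estate->es_unpruned_relids))
	{
		Relation	rel = ExecGetRangeTableRelation(estate, rti);

		/* rel is locked and survived initial pruning; safe to use */
	}
	else
	{
		/* pruned away, or lock still deferred: must not be accessed */
	}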
The idea of deferring some locks to executor startup, allowing locks
for prunable partitions to be skipped, was first proposed by Tom Lane.
Reviewed-by: Robert Haas <robertmhaas@gmail.com> (earlier versions)
Reviewed-by: David Rowley <dgrowleyml@gmail.com> (earlier versions)
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> (earlier versions)
Reviewed-by: Tomas Vondra <tomas@vondra.me>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CA+HiwqFGkMSge6TgC9KQzde0ohpAycLQuV7ooitEEpbKB0O_mg@mail.gmail.com
/*-------------------------------------------------------------------------
 *
 * portal.h
 *	  POSTGRES portal definitions.
 *
 * A portal is an abstraction which represents the execution state of
 * a running or runnable query.  Portals support both SQL-level CURSORs
 * and protocol-level portals.
 *
 * Scrolling (nonsequential access) and suspension of execution are allowed
 * only for portals that contain a single SELECT-type query.  We do not want
 * to let the client suspend an update-type query partway through!  Because
 * the query rewriter does not allow arbitrary ON SELECT rewrite rules,
 * only queries that were originally update-type could produce multiple
 * plan trees; so the restriction to a single query is not a problem
 * in practice.
 *
 * For SQL cursors, we support three kinds of scroll behavior:
 *
 * (1) Neither NO SCROLL nor SCROLL was specified: to remain backward
 *	   compatible, we allow backward fetches here, unless it would
 *	   impose additional runtime overhead to do so.
 *
 * (2) NO SCROLL was specified: don't allow any backward fetches.
 *
 * (3) SCROLL was specified: allow all kinds of backward fetches, even
 *	   if we need to take a performance hit to do so.  (The planner sticks
 *	   a Materialize node atop the query plan if needed.)
 *
 * Case #1 is converted to #2 or #3 by looking at the query itself and
 * determining if scrollability can be supported without additional
 * overhead.
 *
 * Protocol-level portals have no nonsequential-fetch API and so the
 * distinction doesn't matter for them.  They are always initialized
 * to look like NO SCROLL cursors.
 *
 *
 * Portions Copyright (c) 1996-2025, PostgreSQL Global Development Group
 * Portions Copyright (c) 1994, Regents of the University of California
 *
 * src/include/utils/portal.h
 *
 *-------------------------------------------------------------------------
 */
#ifndef PORTAL_H
#define PORTAL_H

#include "datatype/timestamp.h"
#include "executor/execdesc.h"
#include "tcop/cmdtag.h"
#include "utils/plancache.h"
#include "utils/resowner.h"

/*
 * We have several execution strategies for Portals, depending on what
 * query or queries are to be executed.  (Note: in all cases, a Portal
 * executes just a single source-SQL query, and thus produces just a
 * single result from the user's viewpoint.  However, the rule rewriter
 * may expand the single source query to zero or many actual queries.)
 *
 * PORTAL_ONE_SELECT: the portal contains one single SELECT query.  We run
 * the Executor incrementally as results are demanded.  This strategy also
 * supports holdable cursors (the Executor results can be dumped into a
 * tuplestore for access after transaction completion).
 *
 * PORTAL_ONE_RETURNING: the portal contains a single INSERT/UPDATE/DELETE/
 * MERGE query with a RETURNING clause (plus possibly auxiliary queries added
 * by rule rewriting).  On first execution, we run the portal to completion
 * and dump the primary query's results into the portal tuplestore; the
 * results are then returned to the client as demanded.  (We can't support
 * suspension of the query partway through, because the AFTER TRIGGER code
 * can't cope, and also because we don't want to risk failing to execute
 * all the auxiliary queries.)
 *
 * PORTAL_ONE_MOD_WITH: the portal contains one single SELECT query, but
 * it has data-modifying CTEs.  This is currently treated the same as the
 * PORTAL_ONE_RETURNING case because of the possibility of needing to fire
 * triggers.  It may act more like PORTAL_ONE_SELECT in future.
 *
 * PORTAL_UTIL_SELECT: the portal contains a utility statement that returns
 * a SELECT-like result (for example, EXPLAIN or SHOW).  On first execution,
 * we run the statement and dump its results into the portal tuplestore;
 * the results are then returned to the client as demanded.
 *
 * PORTAL_MULTI_QUERY: all other cases.  Here, we do not support partial
 * execution: the portal's queries will be run to completion on first call.
 */
typedef enum PortalStrategy
{
	PORTAL_ONE_SELECT,
	PORTAL_ONE_RETURNING,
	PORTAL_ONE_MOD_WITH,
	PORTAL_UTIL_SELECT,
	PORTAL_MULTI_QUERY,
} PortalStrategy;

/*
 * A portal is always in one of these states.  It is possible to transit
 * from ACTIVE back to READY if the query is not run to completion;
 * otherwise we never back up in status.
 */
typedef enum PortalStatus
{
	PORTAL_NEW,					/* freshly created */
	PORTAL_DEFINED,				/* PortalDefineQuery done */
	PORTAL_READY,				/* PortalStart complete, can run it */
	PORTAL_ACTIVE,				/* portal is running (can't delete it) */
	PORTAL_DONE,				/* portal is finished (don't re-run it) */
	PORTAL_FAILED,				/* portal got error (can't re-run it) */
} PortalStatus;

typedef struct PortalData *Portal;

typedef struct PortalData
{
	/* Bookkeeping data */
	const char *name;			/* portal's name */
	const char *prepStmtName;	/* source prepared statement (NULL if none) */
	MemoryContext portalContext;	/* subsidiary memory for portal */
	ResourceOwner resowner;		/* resources owned by portal */
	void		(*cleanup) (Portal portal); /* cleanup hook */

	/*
	 * State data for remembering which subtransaction(s) the portal was
	 * created or used in.  If the portal is held over from a previous
	 * transaction, both subxids are InvalidSubTransactionId.  Otherwise,
	 * createSubid is the creating subxact and activeSubid is the last
	 * subxact in which we ran the portal.
	 */
	SubTransactionId createSubid;	/* the creating subxact */
	SubTransactionId activeSubid;	/* the last subxact with activity */
	int			createLevel;	/* creating subxact's nesting level */

	/* The query or queries the portal will execute */
	const char *sourceText;		/* text of query (as of 8.4, never NULL) */
	CommandTag	commandTag;		/* command tag for original query */
	QueryCompletion qc;			/* command completion data for executed query */
	List	   *stmts;			/* list of PlannedStmts */
	CachedPlan *cplan;			/* CachedPlan, if stmts are from one */
	CachedPlanSource *plansource;	/* CachedPlanSource, for cplan */

	ParamListInfo portalParams; /* params to pass to query */
	QueryEnvironment *queryEnv; /* environment for query */

	/* Features/options */
	PortalStrategy strategy;	/* see above */
	int			cursorOptions;	/* DECLARE CURSOR option bits */

	/* Status data */
	PortalStatus status;		/* see above */
	bool		portalPinned;	/* a pinned portal can't be dropped */
	bool		autoHeld;		/* was automatically converted from pinned to
								 * held (see HoldPinnedPortals()) */

	/* If not NULL, Executor is active; call ExecutorEnd eventually: */
	QueryDesc  *queryDesc;		/* info needed for executor invocation */

	/* If portal returns tuples, this is their tupdesc: */
	TupleDesc	tupDesc;		/* descriptor for result tuples */
	/* and these are the format codes to use for the columns: */
	int16	   *formats;		/* a format code for each column */

	/*
	 * Outermost ActiveSnapshot for execution of the portal's queries.  For
	 * all but a few utility commands, we require such a snapshot to exist.
	 * This ensures that TOAST references in query results can be detoasted,
	 * and helps to reduce thrashing of the process's exposed xmin.
	 */
	Snapshot	portalSnapshot; /* active snapshot, or NULL if none */

	/*
	 * Where we store tuples for a held cursor or a PORTAL_ONE_RETURNING,
	 * PORTAL_ONE_MOD_WITH, or PORTAL_UTIL_SELECT query.  (A cursor held past
	 * the end of its transaction no longer has any active executor state.)
	 */
	Tuplestorestate *holdStore; /* store for holdable cursors */
	MemoryContext holdContext;	/* memory containing holdStore */

	/*
	 * Snapshot under which tuples in the holdStore were read.  We must keep
	 * a reference to this snapshot if there is any possibility that the
	 * tuples contain TOAST references, because releasing the snapshot could
	 * allow recently-dead rows to be vacuumed away, along with any toast
	 * data belonging to them.  In the case of a held cursor, we avoid
	 * needing to keep such a snapshot by forcibly detoasting the data.
	 */
	Snapshot	holdSnapshot;	/* registered snapshot, or NULL if none */

	/*
	 * atStart, atEnd and portalPos indicate the current cursor position.
	 * portalPos is zero before the first row, N after fetching N'th row of
	 * query.  After we run off the end, portalPos = # of rows in query, and
	 * atEnd is true.  Note that atStart implies portalPos == 0, but not the
	 * reverse: we might have backed up only as far as the first row, not to
	 * the start.  Also note that various code inspects atStart and atEnd,
	 * but only the portal movement routines should touch portalPos.
	 */
	bool		atStart;
	bool		atEnd;
	uint64		portalPos;

	/* Presentation data, primarily used by the pg_cursors system view */
	TimestampTz creation_time;	/* time at which this portal was defined */
	bool		visible;		/* include this portal in pg_cursors? */
} PortalData;

/*
 * PortalIsValid
 *		True iff portal is valid.
 */
#define PortalIsValid(p) PointerIsValid(p)


/* Prototypes for functions in utils/mmgr/portalmem.c */
extern void EnablePortalManager(void);
extern bool PreCommit_Portals(bool isPrepare);
extern void AtAbort_Portals(void);
extern void AtCleanup_Portals(void);
extern void PortalErrorCleanup(void);
extern void AtSubCommit_Portals(SubTransactionId mySubid,
								SubTransactionId parentSubid,
								int parentLevel,
								ResourceOwner parentXactOwner);
extern void AtSubAbort_Portals(SubTransactionId mySubid,
							   SubTransactionId parentSubid,
							   ResourceOwner myXactOwner,
							   ResourceOwner parentXactOwner);
extern void AtSubCleanup_Portals(SubTransactionId mySubid);
extern Portal CreatePortal(const char *name, bool allowDup, bool dupSilent);
extern Portal CreateNewPortal(void);
extern void PinPortal(Portal portal);
extern void UnpinPortal(Portal portal);
extern void MarkPortalActive(Portal portal);
extern void MarkPortalDone(Portal portal);
extern void MarkPortalFailed(Portal portal);
extern void PortalDrop(Portal portal, bool isTopCommit);
extern Portal GetPortalByName(const char *name);
extern void PortalDefineQuery(Portal portal,
							  const char *prepStmtName,
							  const char *sourceText,
							  CommandTag commandTag,
							  List *stmts,
							  CachedPlan *cplan,
							  CachedPlanSource *plansource);
extern PlannedStmt *PortalGetPrimaryStmt(Portal portal);
extern void PortalCreateHoldStore(Portal portal);
extern void PortalHashTableDeleteAll(void);
extern bool ThereAreNoReadyPortals(void);
extern void HoldPinnedPortals(void);
extern void ForgetPortalSnapshots(void);

#endif							/* PORTAL_H */