Commit Graph

7781 Commits

Author SHA1 Message Date
05abb0f830 Fix type in test_escape test
On machines where char is unsigned this could lead to option parsing looping
endlessly. It's also too narrow a type on other hardware.

Found via Tom Lane's monitoring of the buildfarm.

Reported-by: Tom Lane <tgl@sss.pgh.pa.us>
Security: CVE-2025-1094
Backpatch-through: 13
2025-02-10 12:12:53 -05:00
02d4d87ac2 Add test of various escape functions
As highlighted by the prior commit, writing correct escape functions is less
trivial than one might hope.

This test module tries to verify that different escaping functions behave
reasonably. It e.g. tests:

- Invalidly encoded input to an escape function leads to invalidly encoded
  output

- Trailing incomplete multi-byte characters are handled sensibly

- Escaped strings are parsed as single statement by psql's parser (which
  derives from the backend parser)

There are further tests that would be good to add. But even in the current
state it was rather useful for writing the fix in the prior commit.

Reviewed-by: Noah Misch <noah@leadboat.com>
Backpatch-through: 13
Security: CVE-2025-1094
2025-02-10 10:03:38 -05:00
7d43ca6fe0 Add pg_encoding_set_invalid()
There are cases where we cannot / do not want to error out for invalidly
encoded input. In such cases it can be useful to replace e.g. an incomplete
multi-byte characters with bytes that will trigger an error when getting
validated as part of a larger string.

Unfortunately, until now, for some encoding no such sequence existed. For
those encodings this commit removes one previously accepted input combination
- we consider that to be ok, as the chosen bytes are outside of the valid
ranges for the encodings, we just previously failed to detect that.

As we cannot add a new field to pg_wchar_table without breaking ABI, this is
implemented "in-line" in the newly added function.

Author: Noah Misch <noah@leadboat.com>
Reviewed-by: Andres Freund <andres@anarazel.de>
Backpatch-through: 13
Security: CVE-2025-1094
2025-02-10 10:03:38 -05:00
09df23cf32 meson: Add missing dependencies to libpq_pipeline test
The missing dependency was, e.g., visible when doing
  ninja clean && ninja meson-test-prereq && meson test --no-rebuild --suite setup --suite libpq_pipeline

Apply this to all branches with meson support, as part of an effort to fix
incorrect test dependencies that can lead to test failures.

Discussion: https://postgr.es/m/CAGECzQSvM3iSDmjF+=Kof5an6jN8UbkP_4cKKT9w6GZavmb5yQ@mail.gmail.com
Discussion: https://postgr.es/m/bdba588f-69a9-4f3e-9b95-62d07210a32e@eisentraut.org
Backpatch: 16-, where meson support was added
2025-02-04 17:56:20 -05:00
493398de2c meson: Add test dependencies for test_json_parser
This is required to ensure correct test dependencies, previously
the test binaries would not necessarily be built.

The missing dependency was, e.g., visible when doing
  ninja clean && ninja meson-test-prereq && m test --no-rebuild --suite setup --suite test_json_parser

Apply this to all branches with meson support, as part of an effort to fix
incorrect test dependencies that can lead to test failures.

Author: Peter Eisentraut <peter@eisentraut.org>
Author: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CAGECzQSvM3iSDmjF+=Kof5an6jN8UbkP_4cKKT9w6GZavmb5yQ@mail.gmail.com
Discussion: https://postgr.es/m/bdba588f-69a9-4f3e-9b95-62d07210a32e@eisentraut.org
Backpatch: 17-, where test_json_parser was added
2025-02-04 17:56:20 -05:00
f4af4515bb At update of non-LP_NORMAL TID, fail instead of corrupting page header.
The right mix of DDL and VACUUM could corrupt a catalog page header such
that PageIsVerified() durably fails, requiring a restore from backup.
This affects only catalogs that both have a syscache and have DDL code
that uses syscache tuples to construct updates.  One of the test
permutations shows a variant not yet fixed.

This makes !TransactionIdIsValid(TM_FailureData.xmax) possible with
TM_Deleted.  I think core and PGXN are indifferent to that.

Per bug #17821 from Alexander Lakhin.  Back-patch to v13 (all supported
versions).  The test case is v17+, since it uses INJECTION_POINT.

Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
2025-01-25 11:28:18 -08:00
eca456edc1 Disable runningcheck for src/test/modules/injection_points/specs.
Directory "injection_points" has specified NO_INSTALLCHECK since before
commit c35f419d6efbdf1a050250d84b687e6705917711 added the specs, but
that commit neglected to disable the corresponding meson runningcheck.
The alternative would be to enable "make installcheck" for ISOLATION,
but the GNU make build system lacks a concept of setting NO_INSTALLCHECK
for REGRESS without also setting it for ISOLATION.  Back-patch to v17,
where that commit first appeared, to avoid surprises when back-patching
additional specs.

Discussion: https://postgr.es/m/17821-dd8c334263399284@postgresql.org
2025-01-25 11:28:18 -08:00
dbc3499a0b Reword recent error messages: "should" -> "must"
Most were introduced in the 17 timeframe.  The ones in wparser_def.c are
very old.

I also changed "JSON path expression for column \"%s\" should return
single item without wrapper" to "JSON path expression for column \"%s\"
must return single item when no wrapper is requested" to avoid
ambiguity.

Backpatch to 17.

Crickets: https://postgr.es/m/202501131819.26ors7oouafu@alvherre.pgsql
2025-01-21 15:24:49 +01:00
2f30847d1d Fix detach of a partition that has a toplevel FK to a partitioned table
In common cases, foreign keys are defined on the toplevel partitioned
table; but if instead one is defined on a partition and references a
partitioned table, and the referencing partition is detached, we would
examine the pg_constraint row on the partition being detached, and fail
to realize that the sub-constraints must be left alone.  This causes the
ALTER TABLE DETACH process to fail with

 ERROR:  could not find ON INSERT check triggers of foreign key constraint NNN

This is similar but not quite the same as what was fixed by
53af9491a043.  This bug doesn't affect branches earlier than 15, because
the detach procedure was different there, so we only backpatch down to
15.

Fix by skipping such modifying constraints that are children of other
constraints being detached.

Author: Amul Sul <sulamul@gmail.com>
Diagnosys-by: Sami Imseih <samimseih@gmail.com>
Discussion: https://postgr.es/m/CAAJ_b97GuPh6wQPbxQS-Zpy16Oh+0aMv-w64QcGrLhCOZZ6p+g@mail.gmail.com
2025-01-21 14:53:46 +01:00
b7bad919e1 Avoid using timezone Asia/Manila in regression tests.
The freshly-released 2025a version of tzdata has a refined estimate
for the longitude of Manila, changing their value for LMT in
pre-standardized-timezone days.  This changes the output of one of
our test cases.  Since we need to be able to run with system tzdata
files that may or may not contain this update, we'd better stop
making that specific test.

I switched it to use Asia/Singapore, which has a roughly similar UTC
offset.  That LMT value hasn't changed in tzdb since 2003, so we can
hope that it's well established.

I also noticed that this set of make_timestamptz tests only exercises
zones east of Greenwich, which seems rather sad, and was not the
original intent AFAICS.  (We've already changed these tests once
to stabilize their results across tzdata updates, cf 66b737cd9;
it looks like I failed to consider the UTC-offset-sign aspect then.)
To improve that, add a test with Pacific/Honolulu.  That LMT offset
is also quite old in tzdb, so we'll cross our fingers that it doesn't
get improved.

Reported-by: Christoph Berg <cb@df7cb.de>
Discussion: https://postgr.es/m/Z46inkznCxesvDEb@msg.df7cb.de
Backpatch-through: 13
2025-01-20 15:47:53 -05:00
e6767c0ed1 Fix header check for continuation records where standbys could be stuck
XLogPageRead() checks immediately for an invalid WAL record header on a
standby, to be able to handle the case of continuation records that need
to be read across two different sources.  As written, the check was too
generic, applying to any target LSN.  Based on an analysis by Kyotaro
Horiguchi, what really matters is to make sure that the page header is
checked when attempting to read a LSN at the boundary of a segment, to
handle the case of a continuation record that spawns across multiple
pages when dealing with multiple segments, as WAL receivers are spawned
they request WAL from the beginning of a segment.  This fix has been
proposed by Kyotaro Horiguchi.

This could cause standbys to loop infinitely when dealing with a
continuation record during a timeline jump, in the case where the
contents of the record in the follow-up page are invalid.

Some regression tests are added to check such scenarios, able to
reproduce the original problem.  In the test, the contents of a
continuation record are overwritten with junk zeros on its follow-up
page, and replayed on standbys.  This is inspired by 039_end_of_wal.pl,
and is enough to show how standbys should react on promotion by not
being stuck.  Without the fix, the test would fail with a timeout.  The
test to reproduce the problem has been written by Alexander Kukushkin.

The original check has been introduced in 066871980183, for a similar
problem.

Author: Kyotaro Horiguchi, Alexander Kukushkin
Reviewed-by: Michael Paquier
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
2025-01-20 09:30:33 +09:00
d1bf86a622 Revert recent changes related to handling of 2PC files at recovery
This commit reverts 8f67f994e8ea (down to v13) and c3de0f9eed38 (down to
v17), as these are proving to not be completely correct regarding two
aspects:
- In v17 and newer branches, c3de0f9eed38's check for epoch handling is
incorrect, and does not correctly handle frozen epochs.  A logic closer
to widen_snapshot_xid() should be used.  The 2PC code should try to
integrate deeper with FullTransactionIds, 5a1dfde8334b being not enough.
- In v13 and newer branches, 8f67f994e8ea is a workaround for the real
issue, which is that we should not attempt CLOG lookups without reaching
consistency.  This exists since 728bd991c3c4, and this is reachable with
ProcessTwoPhaseBuffer() called by restoreTwoPhaseData() at the beginning
of recovery.

Per discussion with Noah Misch.

Discussion: https://postgr.es/m/20250116010051.f3.nmisch@google.com
Backpatch-through: 13
2025-01-17 13:27:42 +09:00
0671a71e02 Fix setrefs.c's failure to do expression processing on prune steps.
We should run the expression subtrees of PartitionedRelPruneInfo
structs through fix_scan_expr.  Failure to do so means that
AlternativeSubPlans within those expressions won't be cleaned up
properly, resulting in "unrecognized node type" errors since v14.

It seems fairly likely that at least some of the other steps done
by fix_scan_expr are important here as well, resulting in as-yet-
undetected bugs.  Therefore, I've chosen to back-patch this to
all supported branches including v13, even though the known
symptom doesn't manifest in v13.

Per bug #18778 from Alexander Lakhin.

Discussion: https://postgr.es/m/18778-24cd399df6c806af@postgresql.org
2025-01-16 20:40:07 -05:00
149ed87e22 Move routines to manipulate WAL into PostgreSQL::Test::Cluster
These facilities were originally in the recovery TAP test
039_end_of_wal.pl.  A follow-up bug fix with a TAP test doing similar
WAL manipulations requires them, and all these had better not be
duplicated due to their complexity.  The routine names are tweaked to
use "wal" more consistently, similarly to the existing "advance_wal".

In v14 and v13, the new routines are moved to PostgresNode.pm.
039_end_of_wal.pl is updated to use the refactored routines, without
changing its coverage.

Reviewed-by: Alexander Kukushkin
Discussion: https://postgr.es/m/CAFh8B=mozC+e1wGJq0H=0O65goZju+6ab5AU7DEWCSUA2OtwDg@mail.gmail.com
Backpatch-through: 13
2025-01-16 09:26:25 +09:00
96e61b2792 Fix catcache invalidation of a list entry that's being built
If a new catalog tuple is inserted that belongs to a catcache list
entry, and cache invalidation happens while the list entry is being
built, the list entry might miss the newly inserted tuple.

To fix, change the way we detect concurrent invalidations while a
catcache entry is being built. Keep a stack of entries that are being
built, and apply cache invalidation to those entries in addition to
the real catcache entries. This is similar to the in-progress list in
relcache.c.

Back-patch to all supported versions.

Reviewed-by: Noah Misch
Discussion: https://www.postgresql.org/message-id/2234dc98-06fe-42ed-b5db-ac17384dc880@iki.fi
2025-01-14 14:35:11 +02:00
d037cc2af1 Fix JsonExpr deparsing to quote variable names in the PASSING clause.
When deparsing a JsonExpr, variable names in the PASSING clause were
not quoted. However, since they are parsed as ColLabel tokens, some
variable names require double quotes to ensure that they are properly
interpreted. Fix by using quote_identifier() in the deparsing code.

This oversight was limited to the SQL/JSON query functions
JSON_EXISTS(), JSON_QUERY(), and JSON_VALUE().

Back-patch to v17, where these functions were added.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCXTpAS%3DncfLNTZ7YS6O5puHeLg_SUYAit%2Bcs7wsrd9Msg%40mail.gmail.com
2025-01-12 13:36:44 +00:00
61b12135f5 Fix XMLTABLE() deparsing to quote namespace names if necessary.
When deparsing an XMLTABLE() expression, XML namespace names were not
quoted. However, since they are parsed as ColLabel tokens, some names
require double quotes to ensure that they are properly interpreted.
Fix by using quote_identifier() in the deparsing code.

Back-patch to all supported versions.

Dean Rasheed, reviewed by Tom Lane.

Discussion: https://postgr.es/m/CAEZATCXTpAS%3DncfLNTZ7YS6O5puHeLg_SUYAit%2Bcs7wsrd9Msg%40mail.gmail.com
2025-01-12 12:56:52 +00:00
c559f61b54 Fix an ALTER GROUP ... DROP USER error message.
This error message stated the privileges required to add a member
to a group even if the user was trying to drop a member:

	postgres=> alter group a drop user b;
	ERROR:  permission denied to alter role
	DETAIL:  Only roles with the ADMIN option on role "a" may add members.

Since the required privileges for both operations are the same, we
can fix this by modifying the message to mention both adding and
dropping members:

	postgres=> alter group a drop user b;
	ERROR:  permission denied to alter role
	DETAIL:  Only roles with the ADMIN option on role "a" may add or drop members.

Author: ChangAo Chen
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/tencent_FAA0D00E3514AAF0BBB6322542A6094FEF05%40qq.com
Backpatch-through: 16
2025-01-09 17:10:13 -06:00
d7905aa1da Fix error message wording
The originals are ambiguous and a bit out of style.

Reviewed-by: Amit Langote <amitlangote09@gmail.com>
Discussion: https://postgr.es/m/202412141243.efesjyyvzxsz@alvherre.pgsql
2025-01-07 20:07:32 +01:00
297b280abd Ignore nullingrels when looking up statistics
When looking up statistical data about an expression, we do not need
to concern ourselves with the outer joins that could null the
Vars/PHVs contained in the expression.  Accounting for nullingrels in
the expression could cause estimate_num_groups to count the same Var
multiple times if it's marked with different nullingrels.  This is
incorrect, and could lead to "ERROR:  corrupt MVNDistinct entry" when
searching for multivariate n-distinct.

Furthermore, the nullingrels could prevent us from matching an
expression to expressional index columns or to the expressions in
extended statistics, leading to inaccurate estimates.

To fix, strip out all the nullingrels from the expression before we
look up statistical data about it.  There is one ensuing plan change
in the regression tests, but it looks reasonable and does not
compromise its original purpose.

This patch could result in plan changes, but it fixes an actual bug,
so back-patch to v16 where the outer-join-aware-Var infrastructure was
introduced.

Author: Richard Guo
Discussion: https://postgr.es/m/CAMbWs4-2Z4k+nFTiZe0Qbu5n8juUWenDAtMzi98bAZQtwHx0-w@mail.gmail.com
2025-01-02 17:59:32 +09:00
c3de0f9eed Fix failures with incorrect epoch handling for 2PC files at recovery
At the beginning of recovery, an orphaned two-phase file in an epoch
different than the one defined in the checkpoint record could not be
removed based on the assumptions that AdjustToFullTransactionId() relies
on, assuming that all files would be either from the current epoch or
from the previous epoch.

If the checkpoint epoch was 0 while the 2PC file was orphaned and in the
future, AdjustToFullTransactionId() would underflow the epoch used to
build the 2PC file path.  In non-assert builds, this would create a
WARNING message referring to a 2PC file with an epoch of "FFFFFFFF" (or
UINT32_MAX), as an effect of the underflow calculation, leaving the
orphaned file around.

Some tests are added with dummy 2PC files in the past and the future,
checking that these are properly removed.

Issue introduced by 5a1dfde8334b, that has switched two-phase state
files to use FullTransactionIds.

Reported-by: Vitaly Davydov
Author: Michael Paquier
Reviewed-by: Vitaly Davydov
Discussion: https://postgr.es/m/13b5b6-676c3080-4d-531db900@47931709
Backpatch-through: 17
2024-12-30 09:58:09 +09:00
15b4c46c32 Exclude parallel workers from connection privilege/limit checks.
Cause parallel workers to not check datallowconn, rolcanlogin, and
ACL_CONNECT privileges.  The leader already checked these things
(except for rolcanlogin which might have been checked for a different
role).  Re-checking can accomplish little except to induce unexpected
failures in applications that might not even be aware that their query
has been parallelized.  We already had the principle that parallel
workers rely on their leader to pass a valid set of authorization
information, so this change just extends that a bit further.

Also, modify the ReservedConnections, datconnlimit and rolconnlimit
logic so that these limits are only enforced against regular backends,
and only regular backends are counted while checking if the limits
were already reached.  Previously, background processes that had an
assigned database or role were subject to these limits (with rather
random exclusions for autovac workers and walsenders), and the set of
existing processes that counted against each limit was quite haphazard
as well.  The point of these limits, AFAICS, is to ensure the
availability of PGPROC slots for regular backends.  Since all other
types of processes have their own separate pools of PGPROC slots, it
makes no sense either to enforce these limits against them or to count
them while enforcing the limit.

While edge-case failures of these sorts have been possible for a
long time, the problem got a good deal worse with commit 5a2fed911
(CVE-2024-10978), which caused parallel workers to make some of these
checks using the leader's current role where before we had used its
AuthenticatedUserId, thus allowing parallel queries to fail after
SET ROLE.  The previous behavior was fairly accidental and I have
no desire to return to it.

This patch includes reverting 73c9f91a1, which was an emergency hack
to suppress these same checks in some cases.  It wasn't complete,
as shown by a recent bug report from Laurenz Albe.  We can also revert
fd4d93d26 and 492217301, which hacked around the same problems in one
regression test.

In passing, remove the special case for autovac workers in
CheckMyDatabase; it seems cleaner to have AutoVacWorkerMain pass
the INIT_PG_OVERRIDE_ALLOW_CONNS flag, now that that does what's
needed.

Like 5a2fed911, back-patch to supported branches (which sadly no
longer includes v12).

Discussion: https://postgr.es/m/1808397.1735156190@sss.pgh.pa.us
2024-12-28 16:08:50 -05:00
fa61313770 In REASSIGN OWNED of a database, lock the tuple as mandated.
Commit aac2c9b4fde889d13f859c233c2523345e72d32b mandated such locking
and attempted to fulfill that mandate, but it missed REASSIGN OWNED.
Hence, it remained possible to lose VACUUM's inplace update of
datfrozenxid if a REASSIGN OWNED processed that database at the same
time.  This didn't affect the other inplace-updated catalog, pg_class.
For pg_class, REASSIGN OWNED calls ATExecChangeOwner() instead of the
generic AlterObjectOwner_internal(), and ATExecChangeOwner() fulfills
the locking mandate.

Like in GRANT, implement this by following the locking protocol for any
catalog subject to the generic AlterObjectOwner_internal().  It would
suffice to do this for IsInplaceUpdateOid() catalogs only.  Back-patch
to v13 (all supported versions).

Kirill Reshke.  Reported by Alexander Kukushkin.

Discussion: https://postgr.es/m/CAFh8B=mpKjAy4Cuun-HP-f_vRzh2HSvYFG3rhVfYbfEBUhBAGg@mail.gmail.com
2024-12-28 07:16:26 -08:00
7b8d45d278 Fix Assert failure in WITH RECURSIVE UNION queries
If the non-recursive part of a recursive CTE ended up using
TTSOpsBufferHeapTuple as the table slot type, then a duplicate value
could cause an Assert failure in CheckOpSlotCompatibility() when
checking the hash table for the duplicate value.  The expected slot type
for the deform step was TTSOpsMinimalTuple so the Assert failed when the
TTSOpsBufferHeapTuple slot was used.

This is a long-standing bug which we likely didn't notice because it
seems much more likely that the non-recursive term would have required
projection and used a TTSOpsVirtual slot, which CheckOpSlotCompatibility
is ok with.

There doesn't seem to be any harm done here other than the Assert
failure.  Both TTSOpsMinimalTuple and TTSOpsBufferHeapTuple slot types
require tuple deformation, so the EEOP_*_FETCHSOME ExprState step would
have properly existed in the ExprState.

The solution is to pass NULL for the ExecBuildGroupingEqual's 'lops'
parameter.  This means the ExprState's EEOP_*_FETCHSOME step won't
expect a fixed slot type.  This makes CheckOpSlotCompatibility() happy as
no checking is performed when the ExprEvalStep is not expecting a fixed
slot type.

Reported-by: Richard Guo
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/CAMbWs4-8U9q2LAtf8+ghV11zeUReA3AmrYkxzBEv0vKnDxwkKA@mail.gmail.com
Backpatch-through: 13, all supported versions
2024-12-19 13:12:18 +13:00
d1253c67fa Make 009_twophase.pl test pass with recovery_min_apply_delay set
The test failed if you ran the regression tests with TEMP_CONFIG with
recovery_min_apply_delay = '500ms'. Fix the race condition by waiting
for transaction to be applied in the replica, like in a few other
tests.

The failing test was introduced in commit cbfbda7841. Backpatch to all
supported versions like that commit (except v12, which is no longer
supported).

Reported-by: Alexander Lakhin
Discussion: https://www.postgresql.org/message-id/09e2a70a-a6c2-4b5c-aeae-040a7449c9f2@gmail.com
2024-12-16 15:59:27 +02:00
9d5ce4f1a0 Fix possible crash during WindowAgg evaluation
When short-circuiting WindowAgg node evaluation on the top-level
WindowAgg node using quals on monotonic window functions, because the
WindowAgg run condition can mean there's no need to evaluate subsequent
window function results in the same partition once the run condition
becomes false, it was possible that the executor would use stale results
from the previous invocation of the window function in some cases.

A fix for this was partially done by a5832722, but that commit only
fixed the issue for non-top-level WindowAgg nodes.  I mistakenly thought
that the top-level WindowAgg didn't have this issue, but Jayesh's example
case clearly shows that's incorrect.  At the time, I also thought that
this only affected 32-bit systems as all window functions which then
supported run conditions returned BIGINT, however, that's wrong as
ExecProject is still called and that could cause evaluation of any other
window function belonging to the same WindowAgg node, one of which may
return a byref type.

The only queries affected by this are WindowAggs with a "Run Condition"
which contains at least one window function with a byref result type,
such as lead() or lag() on a byref column.  The window clause must also
contain a PARTITION BY clause (without a PARTITION BY, execution of the
WindowAgg stops immediately when the run condition becomes false and
there's no risk of using the stale results).

Reported-by: Jayesh Dehankar
Discussion: https://postgr.es/m/193261e2c4d.3dd3cd7c1842.871636075166132237@zohocorp.com
Backpatch-through: 15, where WindowAgg run conditions were added
2024-12-09 14:24:07 +13:00
765f76d8cd Fix is_digit labeling of to_timestamp's FFn format codes.
These format codes produce or consume strings of digits, so they
should be labeled with is_digit = true, but they were not.
This has effect in only one place, where is_next_separator()
is checked to see if the preceding format code should slurp up
all the available digits.  Thus, with a format such as '...SSFF3'
with remaining input '12345', the 'SS' code would consume all
five digits (and then complain about seconds being out of range)
when it should eat only two digits.

Per report from Nick Davies.  This bug goes back to d589f9446
where the FFn codes were introduced, so back-patch to v13.

Discussion: https://postgr.es/m/AM8PR08MB6356AC979252CFEA78B56678B6312@AM8PR08MB6356.eurprd08.prod.outlook.com
2024-12-07 13:12:32 -05:00
78883cd905 Avoid mislabeling of lateral references, redux.
As I'd feared, commit 5c9d8636d was still a few bricks shy of a load.
We can't just leave pulled-up lateral-reference Vars with no new
nullingrels: we have to carefully compute what subset of the
to-be-replaced Var's nullingrels apply to them, else we still get
"wrong varnullingrels" errors.  This is a bit tedious, but it looks
like we can use the nullingrel data this patch computes for other
purposes, enabling better optimization.  We don't want to inject
unnecessary plan changes into stable branches though, so leave that
idea for a later HEAD-only patch.

Patch by me, but thanks to Richard Guo for devising a test case that
broke 5c9d8636d, and for preliminary investigation about how to fix
it.  As before, back-patch to v16.

Discussion: https://postgr.es/m/E1tGn4j-0003zi-MP@gemulon.postgresql.org
2024-11-30 12:42:20 -05:00
72822a99d4 Avoid mislabeling of lateral references when pulling up a subquery.
If we are pulling up a subquery that's under an outer join, and
the subquery's target list contains a strict expression that uses
both a subquery variable and a lateral-reference variable, it's okay
to pull up the expression without wrapping it in a PlaceHolderVar.
That's safe because if the subquery variable is forced to NULL
by the outer join, the expression result will come out as NULL too,
so we don't have to force that outcome by evaluating the expression
below the outer join.  It'd be correct to wrap in a PHV, but that can
lead to very significantly worse plans, since we'd then have to use
a nestloop plan to pass down the lateral reference to where the
expression will be evaluated.

However, when we do that, we should not mark the lateral reference
variable as being nulled by the outer join, because it isn't after
we pull up the expression in this way.  So the marking logic added
by cb8e50a4a was incorrect in this detail, leading to "wrong
varnullingrels" errors from the consistency-checking logic in
setrefs.c.  It seems to be sufficient to just not mark lateral
references at all in this case.  (I have a nagging feeling that more
complexity may be needed in cases where there are several levels of
outer join, but some attempts to break it with that didn't succeed.)

Per report from Bertrand Mamasam.  Back-patch to v16, as the previous
patch was.

Discussion: https://postgr.es/m/CACZ67_UA_EVrqiFXJu9XK50baEpH=ofEPJswa2kFxg6xuSw-ww@mail.gmail.com
2024-11-28 17:33:16 -05:00
6e793582bc Fix pg_get_constraintdef for NOT NULL constraints on domains
We added pg_constraint rows for all not-null constraints, first for
tables and later for domains; but while the ones for tables were
reverted, the ones for domains were not.  However, we did accidentally
revert ruleutils.c support for the ones on domains in 6f8bb7c1e961,
which breaks running pg_get_constraintdef() on them.  Put that back.

This is only needed in branch 17, because we've reinstated this code in
branch master with commit 14e87ffa5c54.  Add some new tests in both
branches.

I couldn't find anything else that needs de-reverting.

Reported-by: Erki Eessaar <erki.eessaar@taltech.ee>
Reviewed-by: Magnus Hagander <magnus@hagander.net>
Discussion: https://postgr.es/m/AS8PR01MB75110350415AAB8BBABBA1ECFE222@AS8PR01MB7511.eurprd01.prod.exchangelabs.com
2024-11-27 13:50:27 +01:00
97be02ad00 Fix NULLIF()'s handling of read-write expanded objects.
If passed a read-write expanded object pointer, the EEOP_NULLIF
code would hand that same pointer to the equality function
and then (unless equality was reported) also return the same
pointer as its value.  This is no good, because a function that
receives a read-write expanded object pointer is fully entitled
to scribble on or even delete the object, thus corrupting the
NULLIF output.  (This problem is likely unobservable with the
equality functions provided in core Postgres, but it's easy to
demonstrate with one coded in plpgsql.)

To fix, make sure the pointer passed to the equality function
is read-only.  We can still return the original read-write
pointer as the NULLIF result, allowing optimization of later
operations.

Per bug #18722 from Alexander Lakhin.  This has been wrong
since we invented expanded objects, so back-patch to all
supported branches.

Discussion: https://postgr.es/m/18722-fd9e645448cc78b4@postgresql.org
2024-11-25 18:09:10 -05:00
718af10dab Avoid "you don't own a lock of type ExclusiveLock" in GRANT TABLESPACE.
This WARNING appeared because SearchSysCacheLocked1() read
cc_relisshared before catcache initialization, when the field is false
unconditionally.  On the basis of reading false there, it constructed a
locktag as though pg_tablespace weren't relisshared.  Only shared
catalogs could be affected, and only GRANT TABLESPACE was affected in
practice.  SearchSysCacheLocked1() callers use one other shared-relation
syscache, DATABASEOID.  DATABASEOID is initialized by the end of
CheckMyDatabase(), making the problem unreachable for pg_database.

Back-patch to v13 (all supported versions).  This has no known impact
before v16, where ExecGrant_common() first appeared.  Earlier branches
avoid trouble by having a separate ExecGrant_Tablespace() that doesn't
use LOCKTAG_TUPLE.  However, leaving this unfixed in v15 could ensnare a
future back-patch of a SearchSysCacheLocked1() call.

Reported by Aya Iwata.

Discussion: https://postgr.es/m/OS7PR01MB11964507B5548245A7EE54E70EA212@OS7PR01MB11964.jpnprd01.prod.outlook.com
2024-11-25 14:42:38 -08:00
1c05004a89 Fix per-session activation of ALTER {ROLE|DATABASE} SET role.
After commit 5a2fed911a85ed6d8a015a6bafe3a0d9a69334ae, the catalog state
resulting from these commands ceased to affect sessions.  Restore the
longstanding behavior, which is like beginning the session with a SET
ROLE command.  If cherry-picking the CVE-2024-10978 fixes, default to
including this, too.  (This fixes an unintended side effect of fixing
CVE-2024-10978.)  Back-patch to v12, like that commit.  The release team
decided to include v12, despite the original intent to halt v12 commits
earlier this week.

Tom Lane and Noah Misch.  Reported by Etienne LAFARGE.

Discussion: https://postgr.es/m/CADOZwSb0UsEr4_UTFXC5k7=fyyK8uKXekucd+-uuGjJsGBfxgw@mail.gmail.com
2024-11-15 20:39:59 -08:00
5f28e6ba7f Avoid assertion due to disconnected NFA sub-graphs in regex parsing.
In commit 08c0d6ad6 which introduced "rainbow" arcs in regex NFAs,
I didn't think terribly hard about what to do when creating the color
complement of a rainbow arc.  Clearly, the complement cannot match any
characters, and I took the easy way out by just not building any arcs
at all in the complement arc set.  That mostly works, but Nikolay
Shaplov found a case where it doesn't: if we decide to delete that
sub-NFA later because it's inside a "{0}" quantifier, delsub()
suffered an assertion failure.  That's because delsub() relies on
the target sub-NFA being fully connected.  That was always true
before, and the best fix seems to be to restore that property.
Hence, invent a new arc type CANTMATCH that can be generated in
place of an empty color complement, and drop it again later when we
start NFA optimization.  (At that point we don't need to do delsub()
any more, and besides there are other cases where NFA optimization can
lead to disconnected subgraphs.)

It appears that this bug has no consequences in a non-assert-enabled
build: there will be some transiently leaked NFA states/arcs, but
they'll get cleaned up eventually.  Still, we don't like assertion
failures, so back-patch to v14 where rainbow arcs were introduced.

Per bug #18708 from Nikolay Shaplov.

Discussion: https://postgr.es/m/18708-f94f2599c9d2c005@postgresql.org
2024-11-15 18:23:38 -05:00
cd82afdda5 Fix improper interactions between session_authorization and role.
The SQL spec mandates that SET SESSION AUTHORIZATION implies
SET ROLE NONE.  We tried to implement that within the lowest-level
functions that manipulate these settings, but that was a bad idea.
In particular, guc.c assumes that it doesn't matter in what order
it applies GUC variable updates, but that was not the case for these
two variables.  This problem, compounded by some hackish attempts to
work around it, led to some security-grade issues:

* Rolling back a transaction that had done SET SESSION AUTHORIZATION
would revert to SET ROLE NONE, even if that had not been the previous
state, so that the effective user ID might now be different from what
it had been.

* The same for SET SESSION AUTHORIZATION in a function SET clause.

* If a parallel worker inspected current_setting('role'), it saw
"none" even when it should see something else.

Also, although the parallel worker startup code intended to cope
with the current role's pg_authid row having disappeared, its
implementation of that was incomplete so it would still fail.

Fix by fully separating the miscinit.c functions that assign
session_authorization from those that assign role.  To implement the
spec's requirement, teach set_config_option itself to perform "SET
ROLE NONE" when it sets session_authorization.  (This is undoubtedly
ugly, but the alternatives seem worse.  In particular, there's no way
to do it within assign_session_authorization without incompatible
changes in the API for GUC assign hooks.)  Also, improve
ParallelWorkerMain to directly set all the relevant user-ID variables
instead of relying on some of them to get set indirectly.  That
allows us to survive not finding the pg_authid row during worker
startup.

In v16 and earlier, this includes back-patching 9987a7bf3 which
fixed a violation of GUC coding rules: SetSessionAuthorization
is not an appropriate place to be throwing errors from.

Security: CVE-2024-10978
2024-11-11 10:29:54 -05:00
edcda9bb4c Ensure cached plans are correctly marked as dependent on role.
If a CTE, subquery, sublink, security invoker view, or coercion
projection references a table with row-level security policies, we
neglected to mark the plan as potentially dependent on which role
is executing it.  This could lead to later executions in the same
session returning or hiding rows that should have been hidden or
returned instead.

Reported-by: Wolfgang Walther
Reviewed-by: Noah Misch
Security: CVE-2024-10976
Backpatch-through: 12
2024-11-11 09:00:00 -06:00
3ebcfa54db Block environment variable mutations from trusted PL/Perl.
Many process environment variables (e.g. PATH), bypass the containment
expected of a trusted PL.  Hence, trusted PLs must not offer features
that achieve setenv().  Otherwise, an attacker having USAGE privilege on
the language often can achieve arbitrary code execution, even if the
attacker lacks a database server operating system user.

To fix PL/Perl, replace trusted PL/Perl %ENV with a tied hash that just
replaces each modification attempt with a warning.  Sites that reach
these warnings should evaluate the application-specific implications of
proceeding without the environment modification:

  Can the application reasonably proceed without the modification?

    If no, switch to plperlu or another approach.

    If yes, the application should change the code to stop attempting
    environment modifications.  If that's too difficult, add "untie
    %main::ENV" in any code executed before the warning.  For example,
    one might add it to the start of the affected function or even to
    the plperl.on_plperl_init setting.

In passing, link to Perl's guidance about the Perl features behind the
security posture of PL/Perl.

Back-patch to v12 (all supported versions).

Andrew Dunstan and Noah Misch

Security: CVE-2024-10979
2024-11-11 06:23:46 -08:00
a0cdfc8893 Disallow partitionwise join when collations don't match
If the collation of any join key column doesn’t match the collation of
the corresponding partition key, partitionwise joins can yield incorrect
results. For example, rows that would match under the join key collation
might be located in different partitions due to the partitioning
collation. In such cases, a partitionwise join would yield different
results from a non-partitionwise join, so disallow it in such cases.

Reported-by: Tender Wang <tndrwang@gmail.com>
Author: Jian He <jian.universality@gmail.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Backpatch-through: 12
2024-11-08 17:19:35 +09:00
b6484ca953 Disallow partitionwise grouping when collations don't match
If the collation of any grouping column doesn’t match the collation of
the corresponding partition key, partitionwise grouping can yield
incorrect results. For example, rows that would be grouped under the
grouping collation may end up in different partitions under the
partitioning collation. In such cases, full partitionwise grouping
would produce results that differ from those without partitionwise
grouping, so disallowed that.

Partial partitionwise aggregation is still allowed, as the Finalize
step reconciles partition-level aggregates with grouping requirements
across all partitions, ensuring that the final output remains
consistent.

This commit also fixes group_by_has_partkey() by ensuring the
RelabelType node is stripped from grouping expressions when matching
them to partition key expressions to avoid false mismatches.

Bug: #18568
Reported-by: Webbo Han <1105066510@qq.com>
Author: Webbo Han <1105066510@qq.com>
Reviewed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Reviewed-by: Jian He <jian.universality@gmail.com>
Discussion: https://postgr.es/m/18568-2a9afb6b9f7e6ed3@postgresql.org
Discussion: https://postgr.es/m/tencent_9D9103CDA420C07768349CC1DFF88465F90A@qq.com
Discussion: https://postgr.es/m/CAHewXNno_HKiQ6PqyLYfuqDtwp7KKHZiH1J7Pqyz0nr+PS2Dwg@mail.gmail.com
Backpatch-through: 12
2024-11-08 16:07:13 +09:00
c1099dd745 Revert "For inplace update, send nontransactional invalidations."
This reverts commit 95c5acb3fc261067ab65ddc0b2dca8e162f09442 (v17) and
counterparts in each other non-master branch.  If released, that commit
would have caused a worst-in-years minor release regression, via
undetected LWLock self-deadlock.  This commit and its self-deadlock fix
warrant more bake time in the master branch.

Reported by Alexander Lakhin.

Discussion: https://postgr.es/m/10ec0bc3-5933-1189-6bb8-5dec4114558e@gmail.com
2024-11-02 09:05:00 -07:00
a358019159 Stabilize jsonb_path_query test case.
An operation like '12:34:56'::time_tz takes the UTC offset from
the prevailing time zone, which means that the results change
across DST transitions.  One of the test cases added in ed055d249
failed to consider this.

Per report from Bernhard Wiedemann.  Back-patch to v17, as the
test case was.

Discussion: https://postgr.es/m/ba8e1bc0-8a99-45b7-8397-3f2e94415e03@suse.de
2024-10-30 11:42:34 -04:00
936ab6de95 Fix some more bugs in foreign keys connecting partitioned tables
* In DetachPartitionFinalize() we were applying a tuple conversion map
  to tuples that didn't need one, which can lead to erratic behavior if
  a partitioned table has a partition with a different column order, as
  reported by Alexander Lakhin. This was introduced by 53af9491a043.
  Don't do that.  Also, modify a recently added test case to exercise
  this.

* The same function as well as CloneFkReferenced() were acquiring
  AccessShareLock on a partition, only to have CreateTrigger() later
  acquire ShareRowExclusiveLock on it.  This can lead to deadlock by
  lock escalation, unnecessarily.  Avoid that by acquiring the stronger
  lock to begin with.  This probably dates back to branch 12, but I have
  never seen a report of this being a problem in the field.

* Innocuous but wasteful: also introduced by 53af9491a043, we were
  reading a pg_constraint tuple from syscache that we don't need, as
  reported by Tender Wang.  Don't.

Backpatch to 15.

Discussion: https://postgr.es/m/461e9c26-2076-8224-e119-84998b6a784e@gmail.com
2024-10-30 10:54:03 +01:00
bb584e8312 Fix dependency of partitioned table and table AM with CREATE TABLE .. USING
A pg_depend entry between a partitioned table and its table access
method was missing when using CREATE TABLE .. USING with an unpinned
access method.  DROP ACCESS METHOD could be used, while it should be
blocked if CASCADE is not specified, even if there was a partitioned
table that depends on the table access method.  pg_class.relam would
then hold an orphaned OID value still pointing to the AM dropped.

The problem is fixed by adding a dependency between the partitioned
table and its table access method if set when the relation is created.
A test checking the contents of pg_depend in this case is added.

Issue introduced in 374c7a229042, that has added support for CREATE
TABLE .. USING for partitioned tables.

Reviewed-by: Alexander Lakhin
Discussion: https://postgr.es/m/18674-1ef01eceec278fab@postgresql.org
Backpatch-through: 17
2024-10-29 08:41:53 +09:00
95c5acb3fc For inplace update, send nontransactional invalidations.
The inplace update survives ROLLBACK.  The inval didn't, so another
backend's DDL could then update the row without incorporating the
inplace update.  In the test this fixes, a mix of CREATE INDEX and ALTER
TABLE resulted in a table with an index, yet relhasindex=f.  That is a
source of index corruption.  Back-patch to v12 (all supported versions).
The back branch versions don't change WAL, because those branches just
added end-of-recovery SIResetAll().  All branches change the ABI of
extern function PrepareToInvalidateCacheTuple().  No PGXN extension
calls that, and there's no apparent use case in extensions.

Reviewed by Nitin Motiani and (in earlier versions) Andres Freund.

Discussion: https://postgr.es/m/20240523000548.58.nmisch@google.com
2024-10-25 06:51:06 -07:00
e119076828 Stop reading uninitialized memory in heap_inplace_lock().
Stop computing a never-used value.  This removes the read; the read had
no functional implications.  Back-patch to v12, like commit
a07e03fd8fa7daf4d1356f7cb501ffe784ea6257.

Reported by Alexander Lakhin.

Discussion: https://postgr.es/m/6c92f59b-f5bc-e58c-9bdd-d1f21c17c786@gmail.com
2024-10-24 09:16:17 -07:00
5914a22f6e Restructure foreign key handling code for ATTACH/DETACH
... to fix bugs when the referenced table is partitioned.

The catalog representation we chose for foreign keys connecting
partitioned tables (in commit f56f8f8da6af) is inconvenient, in the
sense that a standalone table has a different way to represent the
constraint when referencing a partitioned table, than when the same
table becomes a partition (and vice versa).  Because of this, we need to
create additional catalog rows on detach (pg_constraint and pg_trigger),
and remove them on attach.  We were doing some of those things, but not
all of them, leading to missing catalog rows in certain cases.

The worst problem seems to be that we are missing action triggers after
detaching a partition, which means that you could update/delete rows
from the referenced partitioned table that still had referencing rows on
that table, the server failing to throw the required errors.

!!!
Note that this means existing databases with FKs that reference
partitioned tables might have rows that break relational integrity, on
tables that were once partitions on the referencing side of the FK.

Another possible problem is that trying to reattach a table
that had been detached would fail indicating that internal triggers
cannot be found, which from the user's point of view is nonsensical.

In branches 15 and above, we fix this by creating a new helper function
addFkConstraint() which is in charge of creating a standalone
pg_constraint row, and repurposing addFkRecurseReferencing() and
addFkRecurseReferenced() so that they're only the recursive routine for
each side of the FK, and they call addFkConstraint() to create
pg_constraint at each partitioning level and add the necessary triggers.
These new routines can be used during partition creation, partition
attach and detach, and foreign key creation.  This reduces redundant
code and simplifies the flow.

In branches 14 and 13, we have a much simpler fix that consists on
simply removing the constraint on detach.  The reason is that those
branches are missing commit f4566345cf40, which reworked the way this
works in a way that we didn't consider back-patchable at the time.

We opted to leave branch 12 alone, because it's different from branch 13
enough that the fix doesn't apply; and because it is going in EOL mode
very soon, patching it now might be worse since there's no way to undo
the damage if it goes wrong.

Existing databases might need to be repaired.

In the future we might want to rethink the catalog representation to
avoid this problem, but for now the code seems to do what's required to
make the constraints operate correctly.

Co-authored-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Co-authored-by: Tender Wang <tndrwang@gmail.com>
Co-authored-by: Alvaro Herrera <alvherre@alvh.no-ip.org>
Reported-by: Guillaume Lelarge <guillaume@lelarge.info>
Reported-by: Jehan-Guillaume de Rorthais <jgdr@dalibo.com>
Reported-by: Thomas Baehler (SBB CFF FFS) <thomas.baehler2@sbb.ch>
Discussion: https://postgr.es/m/20230420144344.40744130@karst
Discussion: https://postgr.es/m/20230705233028.2f554f73@karst
Discussion: https://postgr.es/m/GVAP278MB02787E7134FD691861635A8BC9032@GVAP278MB0278.CHEP278.PROD.OUTLOOK.COM
Discussion: https://postgr.es/m/18541-628a61bc267cd2d3@postgresql.org
2024-10-22 16:01:18 +02:00
3685ad6188 Fix wrong assertion and poor error messages in "COPY (query) TO".
If the query is rewritten into a NOTIFY command by a DO INSTEAD
rule, we'd get an assertion failure, or in non-assert builds
issue a rather confusing error message.  Improve that.

Also fix a longstanding grammar mistake in a nearby error message.

Per bug #18664 from Alexander Lakhin.  Back-patch to all supported
branches.

Tender Wang and Tom Lane

Discussion: https://postgr.es/m/18664-ffd0ebc2386598df@postgresql.org
2024-10-21 15:08:22 -04:00
7148cb3e30 SQL/JSON: Fix some oversights in commit b6e1157e7
The decision in b6e1157e7 to ignore raw_expr when evaluating a
JsonValueExpr was incorrect.  While its value is not ultimately
used (since formatted_expr's value is), failing to initialize it
can lead to problems, for instance,  when the expression tree in
raw_expr contains Aggref nodes, which must be initialized to
ensure the parent Agg node works correctly.

Also, optimize eval_const_expressions_mutator()'s handling of
JsonValueExpr a bit.  Currently, when formatted_expr cannot be folded
into a constant, we end up processing it twice -- once directly in
eval_const_expressions_mutator() and again recursively via
ece_generic_processing().  This recursive processing is required to
handle raw_expr. To avoid the redundant processing of formatted_expr,
we now  process raw_expr directly in eval_const_expressions_mutator().

Finally, update the comment of JsonValueExpr to describe the roles of
raw_expr and formatted_expr more clearly.

Bug: #18657
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Diagnosed-by: Fabio R. Sluzala <fabio3rs@gmail.com>
Diagnosed-by: Tender Wang <tndrwang@gmail.com>
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://postgr.es/m/18657-1b90ccce2b16bdb8@postgresql.org
Backpatch-through: 16
2024-10-20 12:21:12 +09:00
b8d08aafc0 Fix description of PostgreSQL::Test::Cluster::wait_for_event()
The arguments of the function were listed in an incorrect order in the
description of the routine.  This information can be seen with perldoc.

Issue spotted while working on this area of the code.

Backpatch-through: 17
2024-10-18 13:50:07 +09:00
e90d108823 Fix whitespace 2024-10-17 08:42:58 +02:00