This header is semi-private, being used only in files related to
raw parsing, so move to the backend directory where those files
live. This allows removal of Makefile rules that symlink gram.h to
src/include/parser, since gramparse.h can now include gram.h from
within the same directory. This has the side-effect of no longer
installing gram.h and gramparse.h, but there doesn't seem to be a
good reason to continue doing so.
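For illustration, the include in gramparse.h presumably becomes a plain
same-directory include along these lines (a sketch; the exact line may
differ):

    /* src/backend/parser/gramparse.h */
    #include "gram.h"        /* generated into the same directory */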
Per suggestion from Andres Freund and Peter Eisentraut
Discussion: https://www.postgresql.org/message-id/20220904181759.px6uosll6zbxcum5%40awork3.anarazel.de
PG_COMPRESSION_OPTION_LEVEL is removed from the compression
specification logic, and instead the compression level is always
assigned with each library's default if nothing is directly given. This
centralizes the checks on the compression methods supported by a given
build, and always assigns a default compression level when parsing a
compression specification. A build that does not support a given
compression method now complains at an earlier stage than previously,
i.e. when parsing a specification in the backend or the frontend rather
than when processing it. zstd, lz4 and zlib can each handle a default
value in the routines that set up their compression level, so the
backend and frontend code (pg_receivewal or pg_basebackup) no longer
needs to know what the default compression level should be when nothing
is specified: the specification parsing now assigns it. A default can
also be enforced by passing down a "level" set to the default value,
which the backend accepts (the replication protocol is, for example,
able to handle a command like BASE_BACKUP (COMPRESSION_DETAIL
'gzip:level=-1')).
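As a hedged sketch of the parsing behavior described above (simplified;
"spec" and "level_given" are illustrative names, and the structure of
the real function differs):

    /* If the user did not give a level, fill in the library's default. */
    if (!level_given)
    {
        switch (spec->algorithm)
        {
            case PG_COMPRESSION_GZIP:
                spec->level = Z_DEFAULT_COMPRESSION;    /* -1 */
                break;
            case PG_COMPRESSION_LZ4:
                spec->level = 0;            /* lz4's own default */
                break;
            case PG_COMPRESSION_ZSTD:
                spec->level = ZSTD_CLEVEL_DEFAULT;
                break;
            case PG_COMPRESSION_NONE:
                spec->level = 0;
                break;
        }
    }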
This code simplification fixes an issue with pg_basebackup --gzip
introduced by ffd5365, where the tarball of the streamed WAL segments
would be created as of pg_wal.tar.gz with uncompressed contents, while
the intention is to compress the segments with gzip at a default level.
The confusion originated in the handling of gzip's default compression
level (-1, or Z_DEFAULT_COMPRESSION): a value of 0 was getting assigned
instead, which walmethods.c treats as equivalent to no compression when
streaming WAL segments with its tar methods. Always assigning the
compression level removes the ambiguity of some code paths treating a
value of 0 in a specification as either no compression or a default
compression level.
Note that 010_pg_basebackup.pl has to be adjusted to skip a few tests
where the shape of the compression detail string for client and
server-side compression was checked using gzip. This is a result of the
code simplification, as gzip specifications can no longer be used in a
build that does not support gzip.
Reported-by: Tom Lane
Reviewed-by: Tom Lane
Discussion: https://postgr.es/m/1400032.1662217889@sss.pgh.pa.us
Backpatch-through: 15
guc.c has grown to be one of our largest .c files, making it
a bottleneck for compilation. It's also acquired a bunch of
knowledge that'd be better kept elsewhere, because of our not
very good habit of putting variable-specific check hooks here.
Hence, split it up along these lines:
* guc.c itself retains just the core GUC housekeeping mechanisms.
* New file guc_funcs.c contains the SET/SHOW interfaces and some
SQL-accessible functions for GUC manipulation.
* New file guc_tables.c contains the data arrays that define the
built-in GUC variables, along with some already-exported constant
tables.
* GUC check/assign/show hook functions are moved to the variable's
home module, whenever that's clearly identifiable. A few hard-
to-classify hooks ended up in commands/variable.c, which was
already a home for miscellaneous GUC hook functions.
To avoid cluttering a lot more header files with #include "guc.h",
I also invented a new header file utils/guc_hooks.h and put all
the GUC hook functions' declarations there, regardless of their
originating module. That allowed removal of #include "guc.h"
from some existing headers. The fallout from that (hopefully
all caught here) demonstrates clearly why such inclusions are
best minimized: there are a lot of files that, for example,
were getting array.h at two or more levels of remove, despite
not having any connection at all to GUCs in themselves.
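For instance, the declarations gathered in utils/guc_hooks.h follow the
usual GUC hook signatures, roughly like this (a small excerpt-style
sketch, not the full list):

    extern bool check_maintenance_io_concurrency(int *newval, void **extra,
                                                 GucSource source);
    extern void assign_maintenance_io_concurrency(int newval, void *extra);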
There is some very minor code beautification here, such as
renaming a couple of inconsistently-named hook functions
and improving some comments. But mostly this just moves
code from point A to point B and deals with the ensuing
needs for #include adjustments and exporting a few functions
that previously weren't exported.
Patch by me, per a suggestion from Andres Freund; thanks also
to Michael Paquier for the idea to invent guc_funcs.c.
Discussion: https://postgr.es/m/587607.1662836699@sss.pgh.pa.us
Commit d8594d123 updated the list of non-spacing codepoints used
for calculating display width, but in doing so inadvertently removed
some, since the script used for that commit only considered combining
characters.
For complete coverage for zero-width characters, include codepoints in
the category Cf (Format). To reflect the wider purpose, also rename files
and update comments that referred specifically to combining characters.
Some of these ranges have been missing since v12, but due to the lack
of field complaints it was deemed not important enough to justify adding
special-case logic to the backbranches.
Kyotaro Horiguchi
Report by Pavel Stehule
Discussion: https://www.postgresql.org/message-id/flat/CAFj8pRBE8yvpQ0FSkPCoe0Ny1jAAsAQ6j3qMgVwWvkqAoaaNmQ%40mail.gmail.com
This adds additional variants of palloc, pg_malloc, etc. that
encapsulate common usage patterns and provide more type safety.
Specifically, this adds palloc_object(), palloc_array(), and
repalloc_array(), which take the type name of the object to be
allocated as their first argument and cast the return value to a pointer to
that type. There are also palloc0_object() and palloc0_array()
variants for initializing with zero, and pg_malloc_*() variants of all
of the above.
Inspired by the talloc library.
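For example (MyStruct here is just a placeholder type):

    MyStruct   *obj = palloc_object(MyStruct);      /* one typed object */
    MyStruct   *zeroed = palloc0_object(MyStruct);  /* zero-initialized */

    int        *vals = palloc_array(int, 16);       /* typed array */
    vals = repalloc_array(vals, int, 32);           /* grow it */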
Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us>
Discussion: https://www.postgresql.org/message-id/flat/bb755632-2a43-d523-36f8-a1e7a389a907@enterprisedb.com
We should process completed IOs *before* trying to start more, so that
it is always possible to decode one more record when the decoded record
queue is empty, even if maintenance_io_concurrency is set so low that a
single earlier WAL record might have saturated the IO queue.
That bug was hidden because the effect of maintenance_io_concurrency was
arbitrarily clamped to be at least 2. Fix the ordering, and also remove
that clamp. We need a special case for 0, which is now treated the same
as recovery_prefetch=off, but otherwise the number is used directly.
This allows for testing with 1, which would have made the problem
obvious in simple test scenarios.
Also add an explicit error message for missing contrecords. It was a
bit strange that we didn't report an error already, and it became a latent
bug with prefetching, since the internal state that tracks aborted
contrecords would not survive retrying, as revealed by
026_overwrite_contrecord.pl with this adjustment. Reporting an error
prevents that.
Back-patch to 15.
Reported-by: Justin Pryzby <pryzby@telsasoft.com>
Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Discussion: https://postgr.es/m/20220831140128.GS31833%40telsasoft.com
The addition of published column names forgot to filter on attisdropped,
leading to cases where you could see "........pg.dropped.1........"
or the like as a reportedly-published column.
While we're here, rewrite the new subquery to get a more efficient plan
for it.
Hou Zhijie, per report from Jaime Casanova. Back-patch to v15 where
the bug was introduced. (Sadly, this means we need a post-beta4
catversion bump before beta4 has even hit the streets. I see no
good alternative though.)
Discussion: https://postgr.es/m/Yxa1SU4nH2HfN3/i@ahch-to
XLogPageRead() can retry internally after a pread() system call has
succeeded, in the case of short reads, and page validation failures
while in standby mode (see commit 0668719801). Due to an oversight in
commit 3f1ce973, these cases could leave stale data in the internal
cache of xlogreader.c without marking it invalid. The main defense
against stale cached data on failure to read a page was in the error
handling path of the calling function ReadPageInternal(), but that
wasn't quite enough for errors handled internally by XLogPageRead()'s
retry loop if we then exited with XLREAD_WOULDBLOCK.
1. ReadPageInternal() now marks the cache invalid before calling the
page_read callback, by setting state->readLen to 0. It'll be set to
a non-zero value only after a successful read. It'll stay valid as
long as the caller requests data in the cached range (see the sketch
after this list).
2. XLogPageRead() no longer performs internal retries while reading
ahead. While such retries should work, the general philosophy is
that we should give up prefetching if anything unusual happens so we
can handle it when recovery catches up, to reduce the complexity of
the system. Let's do that here too.
3. While here, a new function XLogReaderResetError() improves the
separation between xlogrecovery.c and xlogreader.c, where the former
previously clobbered the latter's internal error buffer directly.
The new function makes this more explicit, and also clears a related
flag, without which a standby would needlessly retry in the outer
function.
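A simplified sketch of point 1 (illustrative only; the variable and
callback names approximate the real xlogreader code):

    /* Mark the cache invalid before calling the page_read callback. */
    state->readLen = 0;
    readLen = state->routine.page_read(state, targetPagePtr, reqLen,
                                       targetRecPtr, state->readBuf);
    if (readLen > 0)
        state->readLen = readLen;   /* valid again only after success */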
Thanks to Noah Misch for tracking down the conditions required for a
rare build farm failure in src/bin/pg_ctl/t/003_promote.pl, and
providing a reproducer.
Back-patch to 15.
Reported-by: Noah Misch <noah@leadboat.com>
Discussion: https://postgr.es/m/20220807003627.GA4168930%40rfd.leadboat.com
The planner has to special-case indexes on boolean columns, because
what we need for an indexscan on such a column is a qual of the shape
of "boolvar = pseudoconstant". For plain bool constants, previous
simplification will have reduced this to "boolvar" or "NOT boolvar",
and we have to reverse that if we want to make an indexqual. There is
existing code to do so, but it only fires when the index's opfamily
is BOOL_BTREE_FAM_OID or BOOL_HASH_FAM_OID. Thus extension AMs, or
extension opclasses such as contrib/btree_gin, are out in the cold.
The reason for hard-wiring the set of relevant opfamilies was mostly
to avoid a catalog lookup in a hot code path. We can improve matters
while not taking much of a performance hit by relying on the
hard-wired set when the opfamily OID is visibly built-in, and only
checking the catalogs when dealing with an extension opfamily.
While here, rename IsBooleanOpfamily to IsBuiltinBooleanOpfamily
to remind future users of that macro of its limitations. At some
point we might want to make indxpath.c's improved version of the
test globally accessible, but it's not presently needed elsewhere.
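A hedged sketch of the resulting test (simplified; the real function
lives in indxpath.c):

    static bool
    IsBooleanOpfamily(Oid opfamily)
    {
        if (IsBuiltinBooleanOpfamily(opfamily))
            return true;                /* cheap, no catalog access */
        if (opfamily >= FirstNormalObjectId)
            return op_in_opfamily(BooleanEqualOperator, opfamily);
        return false;
    }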
Zongliang Quan and Tom Lane
Discussion: https://postgr.es/m/f293b91d-1d46-d386-b6bb-4b06ff5c667b@yeah.net
This reverts the following and makes some associated cleanups:
commit f79b803dc: Common SQL/JSON clauses
commit f4fb45d15: SQL/JSON constructors
commit 5f0adec25: Make STRING an unreserved_keyword.
commit 33a377608: IS JSON predicate
commit 1a36bc9db: SQL/JSON query functions
commit 606948b05: SQL JSON functions
commit 49082c2cc: RETURNING clause for JSON() and JSON_SCALAR()
commit 4e34747c8: JSON_TABLE
commit fadb48b00: PLAN clauses for JSON_TABLE
commit 2ef6f11b0: Reduce running time of jsonb_sqljson test
commit 14d3f24fa: Further improve jsonb_sqljson parallel test
commit a6baa4bad: Documentation for SQL/JSON features
commit b46bcf7a4: Improve readability of SQL/JSON documentation.
commit 112fdb352: Fix finalization for json_objectagg and friends
commit fcdb35c32: Fix transformJsonBehavior
commit 4cd8717af: Improve a couple of sql/json error messages
commit f7a605f63: Small cleanups in SQL/JSON code
commit 9c3d25e17: Fix JSON_OBJECTAGG uniquefying bug
commit a79153b7a: Claim SQL standard compliance for SQL/JSON features
commit a1e7616d6: Rework SQL/JSON documentation
commit 8d9f9634e: Fix errors in copyfuncs/equalfuncs support for JSON node types.
commit 3c633f32b: Only allow returning string types or bytea from json_serialize
commit 67b26703b: expression eval: Fix EEOP_JSON_CONSTRUCTOR and EEOP_JSONEXPR size.
The release notes are also adjusted.
Backpatch to release 15.
Discussion: https://postgr.es/m/40d2c882-bcac-19a9-754d-4299e1d87ac7@postgresql.org
Before now, the cutoffs that VACUUM used to determine which XIDs/MXIDs
to freeze were determined at the start of each VACUUM by taking related
cutoffs that represent which XIDs/MXIDs VACUUM should treat as still
running, and subtracting an XID/MXID age based value controlled by GUCs
like vacuum_freeze_min_age. The FreezeLimit cutoff (XID freeze cutoff)
was derived by subtracting an XID age value from OldestXmin, while the
MultiXactCutoff cutoff (MXID freeze cutoff) was derived by subtracting
an MXID age value from OldestMxact. This approach didn't match the
approach used nearby to determine whether this VACUUM operation should
be an aggressive VACUUM or not.
VACUUM now uses the standard approach instead: it subtracts the same
age-based values from next XID/next MXID (rather than subtracting from
OldestXmin/OldestMxact). This approach is simpler and more uniform.
Most of the time it will have only a negligible impact on how and when
VACUUM freezes. It will occasionally make VACUUM more robust in the
event of problems caused by long running transaction. These are cases
where OldestXmin and OldestMxact are held back by so much that they
attain an age that is a significant fraction of the value of age-based
settings like vacuum_freeze_min_age.
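As a rough sketch of the change (variable names illustrative, wraparound
handling simplified):

    /* Old: FreezeLimit = OldestXmin - vacuum_freeze_min_age */

    /* New: derive the cutoff from the next XID, like the test that
     * decides whether this VACUUM should be aggressive. */
    TransactionId nextXID = ReadNextTransactionId();

    FreezeLimit = nextXID - vacuum_freeze_min_age;
    if (!TransactionIdIsNormal(FreezeLimit))
        FreezeLimit = FirstNormalTransactionId;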
There is no principled reason why freezing should be affected in any way
by the presence of a long-running transaction -- at least not before the
point that the OldestXmin and OldestMxact limits used by each VACUUM
operation attain an age that makes it unsafe to freeze some of the
XIDs/MXIDs whose age exceeds the value of the relevant age-based
settings. The new approach should at least make freezing degrade more
gracefully than before, even in the most extreme cases.
Author: Peter Geoghegan <pg@bowt.ie>
Reviewed-By: Nathan Bossart <nathandbossart@gmail.com>
Reviewed-By: Matthias van de Meent <boekewurm+postgres@gmail.com>
Discussion: https://postgr.es/m/CAH2-WzkOv5CEeyOO=c91XnT5WBR_0gii0Wn5UbZhJ=4TTykDYg@mail.gmail.com
Robert Haas reported that his older clang compiler didn't like the two
Asserts which were verifying that the given MemoryContextMethodID was <=
MEMORY_CONTEXT_METHODID_MASK when building with
-Wtautological-constant-out-of-range-compare. In my (David's) opinion,
the compiler is wrong to warn about that. Newer versions of clang don't
warn about the out of range enum value, so perhaps this was a bug that has
now been fixed. To keep older clang versions happy, let's just cast the
enum value to int to stop the compiler complaining.
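The workaround amounts to something like this (hedged; 'methodid' is an
illustrative variable name):

    /* cast to int so older clang doesn't warn about the enum's range */
    Assert((int) methodid <= MEMORY_CONTEXT_METHODID_MASK);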
The main reason for the Asserts mentioned above to exist is to inform
future developers who add new memory context types that they have run out of
bit space in MemoryChunk to store the MemoryContextMethodID. As pointed
out by Tom Lane, it seems wise to also add a comment to the header for
that enum to document the restriction on these enum values.
Additionally, also fix an incorrect usage of UINT64CONST() which was
introduced in c6e0fe1f2.
Author: Robert Haas, David Rowley
Discussion: https://postgr.es/m/CA+TgmoYGG2C7Vbw1cjkQRRBL3zOk8SmhrQnsJgzscX=N9AwPrw@mail.gmail.com
This reverts commit df0f4feef. It turns out that the failures on the
32-bit ARM and PPC animals were caused by a MAXALIGN problem in
slab.c. This was fixed by d5ee4db0e. The padding that was added in
df0f4feef would only do anything on machines where uint64 was not aligned
to 8 bytes. The 32-bit machines which were failing are not in that
category, so revert this commit.
Discussion: https://postgr.es/m/3209100.1661787561@sss.pgh.pa.us
Buildfarm animals skate, grison and mamba are Assert failing on the
pointer being given to repalloc not being MAXALIGNED. c6e0fe1f2a made
changes in that area.
All of these animals are 32-bit with a MAXIMUM_ALIGNOF of 8 and a
SIZEOF_VOID_P of 4. I suspect that the pointer is not properly aligned due
to the lack of padding in the MemoryChunk struct.
Here we add the same type of padding that was previously used in
AllocChunkData and GenerationChunk that c6e0fe1f2a neglected to add.
Discussion: https://postgr.es/m/CAA4eK1%2B1JyW5TiL%3DyV-3Uq1CrfnTyn0Xrk5uArt31Z%3D8rgPhXQ%40mail.gmail.com
NEON support is required on the AArch64 architecture for standard
implementations. Hardware designers for specialized markets can choose
not to support it, but that's true of floating point as well, which
we assume is supported. As with x86, some SIMD support is available
on 32-bit platforms, but those are not interesting from a performance
standpoint and would require an inconvenient runtime check.
Nathan Bossart
Reviewed by John Naylor, Andres Freund, Thomas Munro, and Tom Lane
Discussion: https://www.postgresql.org/message-id/flat/CAFBsxsEyR9JkfbPcDXBRYEfdfC__OkwVGdwEAgY4Rv0cvw35EA%40mail.gmail.com#aba7a64b11503494ffd8dd27067626a9
Whenever we palloc a chunk of memory, traditionally, we prefix the
returned pointer with a pointer to the memory context to which the chunk
belongs. This is required so that we're able to easily determine the
owning context when performing operations such as pfree() and repalloc().
For the AllocSet context, prior to this commit we additionally prefixed
the pointer to the owning context with the size of the chunk. This made
the header 16 bytes in size. This 16-byte overhead was required for all
AllocSet allocations regardless of the allocation size.
For the generation context, the problem was worse; in addition to the
pointer to the owning context and chunk size, we also stored a pointer to
the owning block so that we could track the number of freed chunks on a
block.
The slab allocator had a 16-byte chunk header.
The changes being made here reduce the chunk header size down to just 8
bytes for all 3 of our memory context types. For small to medium sized
allocations, this significantly increases the number of chunks that we can
fit on a given block which results in much more efficient use of memory.
Additionally, this commit completely changes the rule that pointers to
palloc'd memory must be directly prefixed by a pointer to the owning
memory context. Instead, we now insist that they're directly prefixed
by an 8-byte value whose least significant 3 bits indicate which type of
memory context the pointer belongs to. Using
those 3 bits as an index (known as MemoryContextMethodID) to a new array
which stores the methods for each memory context type, we're now able to
pass the pointer given to functions such as pfree() and repalloc() to the
function specific to that context implementation to allow them to devise
their own methods of finding the memory context which owns the given
allocated chunk of memory.
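A hedged sketch of that dispatch (simplified; the real code wraps this
in accessor macros):

    void
    pfree(void *pointer)
    {
        /* the 8-byte value directly preceding the allocation */
        uint64      header = ((uint64 *) pointer)[-1];
        MemoryContextMethodID id =
            (MemoryContextMethodID) (header & MEMORY_CONTEXT_METHODID_MASK);

        mcxt_methods[id].free_p(pointer);
    }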
We're able to reduce the chunk header down to just 8 bytes because of
the way we make use of the remaining 61 bits of the required
8-byte chunk header. Here we also implement a general-purpose MemoryChunk
struct which makes use of those 61 remaining bits to allow the storage of
a 30-bit value which the MemoryContext is free to use as it pleases, and
also the number of bytes which must be subtracted from the chunk to get a
reference to the block that the chunk is stored on (also 30 bits). The 1
additional remaining bit is to denote if the chunk is an "external" chunk
or not. External here means that the chunk header does not store the
30-bit value or the block offset. The MemoryContext can use these
external chunks at any time, but must use them if either of the two 30-bit
fields is not large enough for the value(s) that need to be stored in
them. When the chunk is marked as external, it is up to the MemoryContext
to devise its own means to determine the block offset.
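Conceptually the 8-byte header carves up as follows (a sketch matching
the description above; the real struct hides the layout behind accessor
macros, and the exact bit positions may differ):

    typedef struct MemoryChunk
    {
        uint64      hdrmask;    /* bits 0-2:   MemoryContextMethodID
                                 * bit  3:     "external" chunk flag
                                 * bits 4-33:  30-bit value for the context
                                 * bits 34-63: 30-bit offset to owning block */
    } MemoryChunk;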
Using 3 bits for the MemoryContextMethodID does mean we're limiting
ourselves to only having a maximum of 8 different memory context types.
We could reduce the bit space for the 30-bit value a little to make way
for more than 3 bits, but it seems like it might be better to do that only
if we ever need more than 8 context types. This would only be a problem
if some future memory context type which does not use MemoryChunk really
couldn't give up any of the 61 remaining bits in the chunk header.
With this MemoryChunk, each of our 3 memory context types can quickly
obtain a reference to the block any given chunk is located on. AllocSet
is able to find the context that owns the chunk by first obtaining a
reference to the block (subtracting the block offset stored in the
'hdrmask' field) and then referencing the block's 'aset'
field. The Generation context uses the same method, but GenerationBlock
did not have a field pointing back to the owning context, so one is added
by this commit.
In aset.c and generation.c, all allocations larger than allocChunkLimit
are stored on dedicated blocks. When there's just a single chunk on a
block like this, it's easy to find the block from the chunk: we just
subtract the size of the block header from the chunk pointer. The size of
these chunks is also known as we store the endptr on the block, so we can
just subtract the pointer to the allocated memory from that. Because we
can easily find the owning block and the size of the chunk for these
dedicated blocks, we just always use external chunks for allocation sizes
larger than allocChunkLimit. For generation.c, this sidesteps the problem
of non-external MemoryChunks being unable to represent chunk sizes >= 1GB.
This is less of a problem for aset.c as we store the free list index in
the MemoryChunk's spare 30-bit field (the value of which will never be
close to using all 30-bits). We can easily reverse engineer the chunk size
from this when needed. Storing this saves AllocSetFree() from having to
make a call to AllocSetFreeIndex() to determine which free list to put the
newly freed chunk on.
For the slab allocator, this commit adds a new restriction that slab
chunks cannot be >= 1GB in size. If there happened to be any users of
slab.c which used chunk sizes this large, they really should be using
AllocSet instead.
Here we also add a restriction that normal non-dedicated blocks cannot be
1GB or larger. It's now not possible to pass a 'maxBlockSize' >= 1GB
during the creation of an AllocSet or Generation context. Allocations can
still be larger than 1GB, it's just these will always be on dedicated
blocks (which do not have the 1GB restriction).
Author: Andres Freund, David Rowley
Discussion: https://postgr.es/m/CAApHDvpjauCRXcgcaL6+e3eqecEHoeRm9D-kcbuvBitgPnW=vw@mail.gmail.com
Per flame graph from Jelte Fennema, COPY FROM ... USING BINARY shows
input validation taking at least 5% of the profile, so it's worth trying
to be more efficient here. With this change, validation of pure ASCII is
nearly 40% faster on contemporary Intel hardware. To make this change
legible and easier to adapt to additional architectures, use helper
functions to abstract the platform details away.
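A hedged sketch of the kind of helper involved (portable fallback only;
the function name is illustrative):

    /* Return true if any byte in the 64-bit chunk has its high bit set,
     * i.e. the chunk is not pure ASCII. */
    static inline bool
    chunk_has_highbit_set(uint64 chunk)
    {
        return (chunk & UINT64CONST(0x8080808080808080)) != 0;
    }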
Reviewed by Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsG%3Dk8t%3DC457FXnoBXb%3D8iA4OaZkbFogFMachWif7mNnww%40mail.gmail.com
SUSv3 <netinet/in.h> defines struct sockaddr_in6, and all targeted Unix
systems have it. Windows has it in <ws2ipdef.h>. Remove the configure
probe, the macro and a small amount of dead code.
Also remove a mention of IPv6-less builds from the documentation, since
there aren't any.
This is similar to commits f5580882 and 077bf2f2 for Unix sockets. Even
though AF_INET6 is an "optional" component of SUSv3, there are no known
modern operating systems without it, and it seems even less likely to be
omitted from future systems than AF_UNIX.
Reviewed-by: Andres Freund <andres@anarazel.de>
Discussion: https://postgr.es/m/CA+hUKGKErNfhmvb_H0UprEmp4LPzGN06yR2_0tYikjzB-2ECMw@mail.gmail.com
The GRANT statement can now specify WITH INHERIT TRUE or WITH
INHERIT FALSE to control whether the member inherits the granted
role's permissions. For symmetry, you can now likewise write
WITH ADMIN TRUE or WITH ADMIN FALSE to turn ADMIN OPTION on or off.
If a GRANT does not specify WITH INHERIT, the behavior is based on
whether the member role is marked INHERIT or NOINHERIT. This means
that if all roles are marked INHERIT or NOINHERIT before any role
grants are performed, the behavior is identical to what we had before;
otherwise, it's different, because ALTER ROLE [NO]INHERIT now only
changes the default behavior of future grants, and has no effect on
existing ones.
Patch by me. Reviewed and tested by Nathan Bossart and Tushar Ahuja,
with design-level comments from various others.
Discussion: http://postgr.es/m/CA+Tgmoa5Sf4PiWrfxA=sGzDKg0Ojo3dADw=wAHOhR9dggV=RmQ@mail.gmail.com
This is preparatory work for a project to increase the number of bits
in a RelFileNumber from 32 to 56.
Along the way, introduce static inline accessor functions for a couple
of BufferTag fields.
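The accessor style is roughly as below (a sketch; the exact names and
the fields covered may differ):

    static inline ForkNumber
    BufTagGetForkNum(const BufferTag *tag)
    {
        return tag->forkNum;
    }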
Dilip Kumar, reviewed by me. The overall patch series has also had
review at various times from Andres Freund, Ashutosh Sharma, Hannu
Krosing, Vignesh C, Álvaro Herrera, and Tom Lane.
Discussion: http://postgr.es/m/CAFiTN-trubju5YbWAq-BSpZ90-Z6xCVBQE8BVqXqANOZAF1Znw@mail.gmail.com
This commit moves authn_id into a new global structure called
ClientConnectionInfo (mapping to a MyClientConnectionInfo for each
backend) which is intended to hold all the client information that
should be shared between the backend and any of its parallel workers,
access for extensions and triggers being the primary use case. There is
no need to push all the data of Port to the workers, and authn_id is
quite a generic concept so using a separate structure provides the best
balance (the name of the structure has been suggested by Robert Haas).
While on it, and per discussion as this would be useful for a potential
SYSTEM_USER that can be accessed through parallel workers, a second
field is added for the authentication method, copied directly from
Port.
ClientConnectionInfo is serialized and restored using a new parallel
key and a structure tracks the length of the authn_id, making the
addition of more fields straightforward.
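The shared structure looks roughly like this (sketch; see libpq-be.h
for the real definition):

    typedef struct ClientConnectionInfo
    {
        /* Authenticated identity of the client, or NULL if unknown */
        char       *authn_id;

        /* Authentication method used for this connection, from Port */
        UserAuth    auth_method;
    } ClientConnectionInfo;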
Author: Jacob Champion
Reviewed-by: Bertrand Drouvot, Stephen Frost, Robert Haas, Tom Lane,
Michael Paquier, Julien Rouhaud
Discussion: https://postgr.es/m/793d990837ae5c06a558d58d62de9378ab525d83.camel@vmware.com
I added this in commit 153f40067, out of paranoia about kernels
possibly rejecting very large listen backlog requests. However,
POSIX has said for decades that the kernel must silently reduce
any value it considers too large, and there's no evidence that
any current system doesn't obey that. Let's just drop this limit
and save some complication.
While we're here, compute the request as twice MaxConnections not
twice MaxBackends; the latter no longer means what it did in 2001.
Per discussion of a report from Kevin McKibbin.
Discussion: https://postgr.es/m/CADc_NKg2d+oZY9mg4DdQdoUcGzN2kOYXBu-3--RW_hEe0tUV=g@mail.gmail.com
Previously, membership of role A in role B could be recorded in the
catalog tables only once. This meant that a new grant of role A to
role B would overwrite the previous grant. For other object types, a
new grant of permission on an object - in this case role A - exists
alongside the existing grant provided that the grantor is different.
Either grant can be revoked independently of the other, and
permissions remain so long as at least one grant remains. Make role
grants work similarly.
Previously, when granting membership in a role, the superuser could
specify any role whatsoever as the grantor, but for other object types,
the grantor of record must be either the owner of the object, or a
role that currently has privileges to perform a similar GRANT.
Implement the same scheme for role grants, treating the bootstrap
superuser as the role owner since roles do not have owners. This means
that attempting to revoke a grant, or admin option on a grant, can now
fail if there are dependent privileges, and that CASCADE can be used
to revoke these. It also means that you can't grant ADMIN OPTION on
a role back to a user who granted it directly or indirectly to you,
similar to how you can't give WITH GRANT OPTION on a privilege back
to a role which granted it directly or indirectly to you.
Previously, only the superuser could specify GRANTED BY with a user
other than the current user. Relax that rule to allow the grantor
to be any role whose privileges the current user possesses. This
doesn't improve compatibility with what we do for other object types,
where support for GRANTED BY is entirely vestigial, but it makes this
feature more usable and seems to make sense to change at the same time
we're changing related behaviors.
Along the way, fix "ALTER GROUP group_name ADD USER user_name" to
require the same privileges as "GRANT group_name TO user_name".
Previously, CREATEROLE privileges were sufficient for either, but
only the former form was permissible with ADMIN OPTION on the role.
Now, either CREATEROLE or ADMIN OPTION on the role suffices for
either spelling.
Patch by me, reviewed by Stephen Frost.
Discussion: http://postgr.es/m/CA+TgmoaFr-RZeQ+WoQ5nKPv97oT9+aDgK_a5+qWHSgbDsMp1Vg@mail.gmail.com
Remove four probes for members of sockaddr_storage. Keep only the probe
for sockaddr's sa_len, which is enough for our two remaining places that
know about _len fields:
1. ifaddr.c needs to know if sockaddr has sa_len to understand the
result of ioctl(SIOCGIFCONF). Only AIX is still using the relevant code
today, but it seems like a good idea to keep it compilable on Linux.
2. ip.c was testing for presence of ss_len to decide whether to fill in
sun_len in our getaddrinfo_unix() function. It's just as good to test
for sa_len. If you have one, you have them all.
(The code in #2 isn't actually needed at all on several OSes I checked
since modern versions ignore sa_len on input to system calls. Proving
that's the case for all relevant OSes is left for another day, but that
wouldn't get rid of the last probe anyway if we still want it for #1.)
Discussion: https://postgr.es/m/CA%2BhUKGJJjF2AqdU_Aug5n2MAc1gr%3DGykNjVBZq%2Bd6Jrcp3Dyvg%40mail.gmail.com
Previously, "GRANT foo TO bar" or "GRANT foo TO bar GRANTED BY baz"
would record the OID of the grantor in pg_auth_members.grantor, but
that role could later be dropped without modifying or removing the
pg_auth_members record. That's not great, because we typically try
to avoid dangling references in catalog data.
Now, a role grant depends on the grantor, and the grantor can't be
dropped without removing the grant or changing the grantor. "DROP
OWNED BY" will remove the grant, just as it does for other kinds of
privileges. "REASSIGN OWNED BY" will not, again just like what we do
in other cases involving privileges.
pg_auth_members now has an OID column, because that is needed in order
for dependencies to work. It also now has an index on the grantor
column, because otherwise dropping a role would require a sequential
scan of the entire table to see whether the role's OID is in use as
a grantor. That probably wouldn't be too large a problem in practice,
but it seems better to have an index just in case.
A follow-on patch is planned with the goal of more thoroughly
rationalizing the behavior of role grants. This patch is just trying
to do enough to make sure that the data we store in the catalogs is at
some basic level valid.
Patch by me, reviewed by Stephen Frost
Discussion: http://postgr.es/m/CA+TgmoaFr-RZeQ+WoQ5nKPv97oT9+aDgK_a5+qWHSgbDsMp1Vg@mail.gmail.com
The present implementations of adjust_appendrel_attrs_multilevel and
its sibling adjust_child_relids_multilevel are very messy, because
they work by reconstructing the relids of the child's immediate
parent and then seeing if that's bms_equal to the relids of the
target parent. Aside from being quite inefficient, this will not
work with planned future changes to make joinrels' relid sets
contain outer-join relids in addition to baserels.
The whole thing can be solved at a stroke by adding explicit parent
and top_parent links to child RelOptInfos, and making these functions
work with RelOptInfo pointers instead of relids. Doing that is
simpler for most callers, too.
In my original version of this patch, I got rid of
RelOptInfo.top_parent_relids on the grounds that it was now redundant.
However, that adds a lot of code churn in places that otherwise would
not need changing, and arguably the extra indirection needed to fetch
top_parent->relids in those places costs something. So this version
leaves that field in place.
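In outline, the new links look like this (a hedged sketch, not the full
RelOptInfo definition):

    typedef struct Bitmapset Bitmapset;    /* forward declaration for sketch */

    typedef struct RelOptInfoSketch
    {
        Bitmapset  *relids;         /* set of base relids for this rel */

        /* set only for "other member" (child) rels: */
        struct RelOptInfoSketch *parent;        /* immediate parent rel */
        struct RelOptInfoSketch *top_parent;    /* topmost parent rel */
        Bitmapset  *top_parent_relids;  /* kept; equals top_parent->relids */
    } RelOptInfoSketch;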
Discussion: https://postgr.es/m/553080.1657481916@sss.pgh.pa.us