postgresql

mirror of https://git.postgresql.org/git/postgresql.git synced 2026-02-18 12:26:59 +08:00

Author	SHA1	Message	Date
Alexander Korotkov	014f9f34d2	Move pg_wal_replay_wait() to xlogfuncs.c This commit moves pg_wal_replay_wait() procedure to be a neighbor of WAL-related functions in xlogfuncs.c. The implementation of LSN waiting continues to reside in the same place. By proposal from Michael Paquier. Reported-by: Peter Eisentraut Discussion: https://postgr.es/m/18c0fa64-0475-415e-a1bd-665d922c5201%40eisentraut.org	2024-09-19 14:26:11 +03:00
Nathan Bossart	b52c4fc3c0	Add TOAST table to pg_index. This change allows pg_index rows to use out-of-line storage for the "indexprs" and "indpred" columns, which enables use-cases such as very large index expressions. This system catalog was previously not given a TOAST table due to a fear of circularity issues (see commit 96cdeae07f). Testing has not revealed any such problems, and it seems unlikely that the entries for system indexes could ever need out-of-line storage. In any case, it is still early in the v18 development cycle, so committing this now will hopefully increase the chances of finding any unexpected problems prior to release. Bumps catversion. Reported-by: Jonathan Katz Reviewed-by: Tom Lane Discussion: https://postgr.es/m/b611015f-b423-458c-aa2d-be0e655cc1b4%40postgresql.org	2024-09-18 14:42:57 -05:00
Michael Paquier	b14e9ce7d5	Extend PgStat_HashKey.objid from 4 to 8 bytes This opens the possibility to define keys for more types of statistics kinds in PgStat_HashKey, the first case being 8-byte query IDs for statistics like pg_stat_statements. This increases the size of PgStat_HashKey from 12 to 16 bytes, while PgStatShared_HashEntry, entry stored in the dshash for pgstats, keeps the same size due to alignment. xl_xact_stats_item, that tracks the stats items to drop in commit WAL records, is increased from 12 to 16 bytes. Note that individual chunks in commit WAL records should be multiples of sizeof(int), hence 8-byte object IDs are stored as two uint32, based on a suggestion from Heikki Linnakangas. While on it, the field of PgStat_HashKey is renamed from "objoid" to "objid", as for some stats kinds this field does not refer to OIDs but just IDs, like for replication slot stats. This commit bumps the following format variables: - PGSTAT_FILE_FORMAT_ID, as PgStat_HashKey is written to the stats file for non-serialized stats kinds in the dshash table. - XLOG_PAGE_MAGIC for the changes in xl_xact_stats_item. - Catalog version, for the SQL function pg_stat_have_stats(). Reviewed-by: Bertrand Drouvot Discussion: https://postgr.es/m/ZsvTS9EW79Up8I62@paquier.xyz	2024-09-18 12:44:15 +09:00
Thomas Munro	70d38e3d8a	Allow ReadStream to be consumed as raw block numbers. Commits 041b9680 and 6377e12a changed the interface of scan_analyze_next_block() to take a ReadStream instead of a BlockNumber and a BufferAccessStrategy, and to return a value to indicate when the stream has run out of blocks. This caused integration problems for at least one known extension that uses specially encoded BlockNumber values that map to different underlying storage, because acquire_sample_rows() sets up the stream so that read_stream_next_buffer() reads blocks from the main fork of the relation's SMgrRelation. Provide read_stream_next_block(), as a way for such an extension to access the stream of raw BlockNumbers directly and forward them to its own ReadBuffer() calls after decoding, as it could in earlier releases. The new function returns the BlockNumber and BufferAccessStrategy that were previously passed directly to scan_analyze_next_block(). Alternatively, an extension could wrap the stream of BlockNumbers in another ReadStream with a callback that performs any decoding required to arrive at real storage manager BlockNumber values, so that it could benefit from the I/O combining and concurrency provided by read_stream.c. Another class of table access method that does nothing in scan_analyze_next_block() because it is not block-oriented could use this function to control the number of block sampling loops. It could match the previous behavior with "return read_stream_next_block(stream, &bas) != InvalidBlockNumber". Ongoing work is expected to provide better ANALYZE support for table access methods that don't behave like heapam with respect to storage blocks, but that will be for future releases. Back-patch to 17. Reported-by: Mats Kindahl <mats@timescale.com> Reviewed-by: Mats Kindahl <mats@timescale.com> Discussion: https://postgr.es/m/CA%2B14425%2BCcm07ocG97Fp%2BFrD9xUXqmBKFvecp0p%2BgV2YYR258Q%40mail.gmail.com	2024-09-18 11:34:28 +12:00
Peter Eisentraut	89f908a6d0	Add temporal FOREIGN KEY contraints Add PERIOD clause to foreign key constraint definitions. This is supported for range and multirange types. Temporal foreign keys check for range containment instead of equality. This feature matches the behavior of the SQL standard temporal foreign keys, but it works on PostgreSQL's native ranges instead of SQL's "periods", which don't exist in PostgreSQL (yet). Reference actions ON {UPDATE,DELETE} {CASCADE,SET NULL,SET DEFAULT} are not supported yet. (previously committed as 34768ee3616, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	fc0438b4e8	Add temporal PRIMARY KEY and UNIQUE constraints Add WITHOUT OVERLAPS clause to PRIMARY KEY and UNIQUE constraints. These are backed by GiST indexes instead of B-tree indexes, since they are essentially exclusion constraints with = for the scalar parts of the key and && for the temporal part. (previously committed as 46a0cd4cefb, reverted by 46a0cd4cefb; the new part is this:) Because 'empty' && 'empty' is false, the temporal PK/UQ constraint allowed duplicates, which is confusing to users and breaks internal expectations. For instance, when GROUP BY checks functional dependencies on the PK, it allows selecting other columns from the table, but in the presence of duplicate keys you could get the value from any of their rows. So we need to forbid empties. This all means that at the moment we can only support ranges and multiranges for temporal PK/UQs, unlike the original patch (above). Documentation and tests for this are added. But this could conceivably be extended by introducing some more general support for the notion of "empty" for other types. Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:30 +02:00
Peter Eisentraut	7406ab623f	Add stratnum GiST support function This is support function 12 for the GiST AM and translates "well-known" RT*StrategyNumber values into whatever strategy number is used by the opclass (since no particular numbers are actually required). We will use this to support temporal PRIMARY KEY/UNIQUE/FOREIGN KEY/FOR PORTION OF functionality. This commit adds two implementations, one for internal GiST opclasses (just an identity function) and another for btree_gist opclasses. It updates btree_gist from 1.7 to 1.8, adding the support function for all its opclasses. (previously committed as 6db4598fcb8, reverted by 8aee330af55; this is essentially unchanged from those) Author: Paul A. Jungwirth <pj@illuminatedcomputing.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: jian he <jian.universality@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/CA+renyUApHgSZF9-nd-a0+OPGharLQLO=mDHcY4_qQ0+noCUVg@mail.gmail.com	2024-09-17 11:29:29 +02:00
Jeff Davis	b0c30612c5	Simplify checks for deterministic collations. Remove redundant checks for locale->collate_is_c now that we always have a valid pg_locale_t. Also, remove pg_locale_deterministic() wrapper, which is no longer useful after commit e9931bfb75. Just check the field directly, consistent with other fields in pg_locale_t. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-12 13:35:56 -07:00
Fujii Masao	4eada203a5	Add has_largeobject_privilege function. This function checks whether a user has specific privileges on a large object, identified by OID. The user can be provided by name, OID, or default to the current user. If the specified large object doesn't exist, the function returns NULL. It raises an error for a non-existent user name. This behavior is basically consistent with other privilege inquiry functions like has_table_privilege. Bump catalog version. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:51:26 +09:00
Fujii Masao	412229d197	Deduplicate code in LargeObjectExists and myLargeObjectExists. myLargeObjectExists() and LargeObjectExists() had nearly identical code, except for handling snapshots. This commit renames myLargeObjectExists() to LargeObjectExistsWithSnapshot() and refactors LargeObjectExists() to call it internally, reducing duplication. Author: Yugo Nagata Reviewed-by: Fujii Masao Discussion: https://postgr.es/m/20240702163444.ab586f6075e502eb84f11b1a@sranhm.sraoss.co.jp	2024-09-12 21:45:42 +09:00
Peter Eisentraut	23d0b48468	Remove hardcoded hash opclass function signature exceptions hashvalidate(), which validates the signatures of support functions for the hash AM, contained several hardcoded exceptions. For example, hash/date_ops support function 1 was hashint4(), which would ordinarily fail validation because the function argument is int4, not date. But this works internally because int4 and date are of the same size. There are several more exceptions like this that happen to work and were allowed historically but would now fail the function signature validation. This patch removes those exceptions by providing new support functions that have the proper declared signatures. They internally share most of the code with the "wrong" functions they replace, so the behavior is still the same. With the exceptions gone, hashvalidate() is now simplified and relies fully on check_amproc_signature(). hashvarlena() and hashvarlenaextended() are kept in pg_proc.dat because some extensions currently use them to build hash functions for their own types, and we need to keep exposing these functions as "LANGUAGE internal" functions for that to continue to work. Reviewed-by: Tom Lane <tgl@sss.pgh.pa.us> Discussion: https://www.postgresql.org/message-id/flat/29c3b746-69e7-482a-b37c-dbbf7e5b009b@eisentraut.org	2024-09-12 12:57:43 +02:00
Michael Paquier	00c76cf21c	Move logic related to WAL replay of Heap/Heap2 into its own file This brings more clarity to heapam.c, by cleanly separating all the logic related to WAL replay and the rest of Heap and Heap2, similarly to other RMGRs like hash, btree, etc. The header reorganization is also nice in heapam.c, cutting half of the headers required. Author: Li Yong Reviewed-by: Sutou Kouhei, Michael Paquier Discussion: https://postgr.es/m/EFE55E65-D7BD-4C6A-B630-91F43FD0771B@ebay.com	2024-09-12 13:32:05 +09:00
David Rowley	9fba1ed294	Adjust tuplestore stats API 1eff8279d added an API to tuplestore.c to allow callers to obtain storage telemetry data. That API wasn't quite good enough for callers that perform tuplestore_clear() as the telemetry functions only accounted for the current state of the tuplestore, not the maximums before tuplestore_clear() was called. There's a pending patch that would like to add tuplestore telemetry output to EXPLAIN ANALYZE for WindowAgg. That node type uses tuplestore_clear() before moving to the next window partition and we want to show the maximum space used, not the space used for the final partition. Reviewed-by: Tatsuo Ishii, Ashutosh Bapat Discussion: https://postgres/m/CAApHDvoY8cibGcicLV0fNh=9JVx9PANcWvhkdjBnDCc9Quqytg@mail.gmail.com	2024-09-12 16:02:01 +12:00
Peter Eisentraut	0785d1b8b2	common/jsonapi: support libpq as a client Based on a patch by Michael Paquier. For libpq, use PQExpBuffer instead of StringInfo. This requires us to track allocation failures so that we can return JSON_OUT_OF_MEMORY as needed rather than exit()ing. Author: Jacob Champion <jacob.champion@enterprisedb.com> Co-authored-by: Michael Paquier <michael@paquier.xyz> Co-authored-by: Daniel Gustafsson <daniel@yesql.se> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Discussion: https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com	2024-09-11 09:01:07 +02:00
Peter Eisentraut	56fead44dc	Add amgettreeheight index AM API routine The only current implementation is for btree where it calls _bt_getrootheight(). Other index types can now also use this to pass information to their amcostestimate routine. Previously, btree was hardcoded and other index types could not hook into the optimizer at this point. Author: Mark Dilger <mark.dilger@enterprisedb.com> Discussion: https://www.postgresql.org/message-id/flat/E72EAA49-354D-4C2E-8EB9-255197F55330@enterprisedb.com	2024-09-10 10:03:23 +02:00
Richard Guo	f5050f795a	Mark expressions nullable by grouping sets When generating window_pathkeys, distinct_pathkeys, or sort_pathkeys, we failed to realize that the grouping/ordering expressions might be nullable by grouping sets. As a result, we may incorrectly deem that the PathKeys are redundant by EquivalenceClass processing and thus remove them from the pathkeys list. That would lead to wrong results in some cases. To fix this issue, we mark the grouping expressions nullable by grouping sets if that is the case. If the grouping expression is a Var or PlaceHolderVar or constructed from those, we can just add the RT index of the RTE_GROUP RTE to the existing nullingrels field(s); otherwise we have to add a PlaceHolderVar to carry on the nullingrel bit. However, we have to manually remove this nullingrel bit from expressions in various cases where these expressions are logically below the grouping step, such as when we generate groupClause pathkeys for grouping sets, or when we generate PathTarget for initial input to grouping nodes. Furthermore, in set_upper_references, the targetlist and quals of an Agg node should have nullingrels that include the effects of the grouping step, ie they will have nullingrels equal to the input Vars/PHVs' nullingrels plus the nullingrel bit that references the grouping RTE. In order to perform exact nullingrels matches, we also need to manually remove this nullingrel bit. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:36:48 +09:00
Richard Guo	247dea89f7	Introduce an RTE for the grouping step If there are subqueries in the grouping expressions, each of these subqueries in the targetlist and HAVING clause is expanded into distinct SubPlan nodes. As a result, only one of these SubPlan nodes would be converted to reference to the grouping key column output by the Agg node; others would have to get evaluated afresh. This is not efficient, and with grouping sets this can cause wrong results issues in cases where they should go to NULL because they are from the wrong grouping set. Furthermore, during re-evaluation, these SubPlan nodes might use nulled column values from grouping sets, which is not correct. This issue is not limited to subqueries. For other types of expressions that are part of grouping items, if they are transformed into another form during preprocessing, they may fail to match lower target items. This can also lead to wrong results with grouping sets. To fix this issue, we introduce a new kind of RTE representing the output of the grouping step, with columns that are the Vars or expressions being grouped on. In the parser, we replace the grouping expressions in the targetlist and HAVING clause with Vars referencing this new RTE, so that the output of the parser directly expresses the semantic requirement that the grouping expressions be gotten from the grouping output rather than computed some other way. In the planner, we first preprocess all the columns of this new RTE and then replace any Vars in the targetlist and HAVING clause that reference this new RTE with the underlying grouping expressions, so that we will have only one instance of a SubPlan node for each subquery contained in the grouping expressions. Bump catversion because this changes the querytree produced by the parser. Thanks to Tom Lane for the idea to invent a new kind of RTE. Per reports from Geoff Winkless, Tobias Wendorff, Richard Guo from various threads. Author: Richard Guo Reviewed-by: Ashutosh Bapat, Sutou Kouhei Discussion: https://postgr.es/m/CAMbWs4_dp7e7oTwaiZeBX8+P1rXw4ThkZxh1QG81rhu9Z47VsQ@mail.gmail.com	2024-09-10 12:35:34 +09:00
Robert Haas	cdb6b0fdb0	Add PQfullProtocolVersion() to surface the precise protocol version. The existing function PQprotocolVersion() does not include the minor version of the protocol. In preparation for pending work that will bump that number for the first time, add a new function to provide it to clients that may care, using the (major * 10000 + minor) convention already used by PQserverVersion(). Jacob Champion based on earlier work by Jelte Fennema-Nio Discussion: http://postgr.es/m/CAOYmi+mM8+6Swt1k7XsLcichJv8xdhPnuNv7-02zJWsezuDL+g@mail.gmail.com	2024-09-09 11:54:55 -04:00
Michael Paquier	fc415edf8c	Add callbacks to control flush of fixed-numbered stats This commit adds two callbacks in pgstats to have a better control of the flush timing of pgstat_report_stat(), whose operation depends on the three PGSTAT_*_INTERVAL variables: - have_fixed_pending_cb(), to check if a stats kind has any pending data waiting for a flush. This is used as a fast path if there are no pending statistics to flush, and this check is done for fixed-numbered statistics only if there are no variable-numbered statistics to flush. A flush will need to happen if at least one callback reports any pending data. - flush_fixed_cb(), to do the actual flush. These callbacks are currently used by the SLRU, WAL and IO statistics, generalizing the concept for all stats kinds (builtin and custom). The SLRU and IO stats relied each on one global variable to determine whether a flush should happen; these are now local to pgstat_slru.c and pgstat_io.c, cleaning up a bit how the pending flush states are tracked in pgstat.c. pgstat_flush_io() and pgstat_flush_wal() are still required, but we do not need to check their return result anymore. Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi Discussion: https://postgr.es/m/ZtaVO0N-aTwiAk3w@paquier.xyz	2024-09-09 11:12:29 +09:00
Jeff Davis	51edc4ca54	Remove lc_ctype_is_c(). Instead always fetch the locale and look at the ctype_is_c field. hba.c relies on regexes working for the C locale without needing catalog access, which worked before due to a special case for C_COLLATION_OID in lc_ctype_is_c(). Move the special case to pg_set_regex_collation() now that lc_ctype_is_c() is gone. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-06 13:23:21 -07:00
Amit Langote	3422f5f93f	Update comment about ExprState.escontext The updated comment provides more helpful guidance by mentioning that escontext should be set when soft error handling is needed. Reported-by: Jian He <jian.universality@gmail.com> Discussion: https://postgr.es/m/CACJufxEo4sUjKCYtda0_qt9tazqqKPmF1cqhW9KBOUeJFqQd2g@mail.gmail.com Backpatch-through: 17	2024-09-06 10:13:53 +09:00
Michael Paquier	1b373aed20	Add callback for backend initialization in pgstats pgstat_initialize() is currently used by the WAL stats as a code path to take some custom actions when a backend starts. A callback is added to generalize the concept so as all stats kinds can do the same, for builtin and custom kinds, if set. Reviewed-by: Bertrand Drouvot, Kyotaro Horiguchi Discussion: https://postgr.es/m/ZtZr1K4PLdeWclXY@paquier.xyz	2024-09-05 16:05:21 +09:00
David Rowley	908a968612	Optimize WindowAgg's use of tuplestores When WindowAgg finished one partition of a PARTITION BY, it previously would call tuplestore_end() to purge all the stored tuples before again calling tuplestore_begin_heap() and carefully setting up all of the tuplestore read pointers exactly as required for the given frameOptions. Since the frameOptions don't change between partitions, this part does not make much sense. For queries that had very few rows per partition, the overhead of this was very large. It seems much better to create the tuplestore and the read pointers once and simply call tuplestore_clear() at the end of each partition. tuplestore_clear() moves all of the read pointers back to the start position and deletes all the previously stored tuples. A simple test query with 1 million partitions and 1 tuple per partition has been shown to run around 40% faster than without this change. The additional effort seems to have mostly been spent in malloc/free. Making this work required adding a new bool field to WindowAggState which had the unfortunate effect of being the 9th bool field in a group resulting in the struct being enlarged. Here we shuffle the fields around a little so that the two bool fields for runcondition relating stuff fit into existing padding. Also, move the "runcondition" field to be near those. This frees up enough space with the other bool fields so that the newly added one fits into the padding bytes. This was done to address a very small but apparent performance regression with queries containing a large number of rows per partition. Reviewed-by: Ashutosh Bapat <ashutosh.bapat.oss@gmail.com> Reviewed-by: Tatsuo Ishii <ishii@postgresql.org> Discussion: https://postgr.es/m/CAHoyFK9n-QCXKTUWT_xxtXninSMEv%2BgbJN66-y6prM3f4WkEHw%40mail.gmail.com	2024-09-05 16:18:30 +12:00
Jeff Davis	06421b0843	Remove lc_collate_is_c(). Instead just look up the collation and check collate_is_c field. Author: Andreas Karlsson Discussion: https://postgr.es/m/60929555-4709-40a7-b136-bcb44cff5a3c@proxel.se	2024-09-04 14:35:25 -07:00
Michael Paquier	b4db64270e	Apply more quoting to GUC names in messages This is a continuation of 17974ec25946. More quotes are applied to GUC names in error messages and hints, taking care of what seems to be all the remaining holes currently in the tree for the GUCs. Author: Peter Smith Discussion: https://postgr.es/m/CAHut+Pv-kSN8SkxSdoHano_wPubqcg5789ejhCDZAcLFceBR-w@mail.gmail.com	2024-09-04 13:50:44 +09:00
Amit Kapila	6c2b5edecc	Collect statistics about conflicts in logical replication. This commit adds columns in view pg_stat_subscription_stats to show the number of times a particular conflict type has occurred during the application of logical replication changes. The following columns are added: confl_insert_exists: Number of times a row insertion violated a NOT DEFERRABLE unique constraint. confl_update_origin_differs: Number of times an update was performed on a row that was previously modified by another origin. confl_update_exists: Number of times that the updated value of a row violates a NOT DEFERRABLE unique constraint. confl_update_missing: Number of times that the tuple to be updated is missing. confl_delete_origin_differs: Number of times a delete was performed on a row that was previously modified by another origin. confl_delete_missing: Number of times that the tuple to be deleted is missing. The update_origin_differs and delete_origin_differs conflicts can be detected only when track_commit_timestamp is enabled. Author: Hou Zhijie Reviewed-by: Shveta Malik, Peter Smith, Anit Kapila Discussion: https://postgr.es/m/OS0PR01MB57160A07BD575773045FC214948F2@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-09-04 08:55:21 +05:30
Noah Misch	c582b75851	Add block_range_read_stream_cb(), to deduplicate code. This replaces two functions for iterating over all blocks in a range. A pending patch will use this instead of adding a third. Nazir Bilal Yavuz Discussion: https://postgr.es/m/20240820184742.f2.nmisch@google.com	2024-09-03 10:46:20 -07:00
Peter Eisentraut	2b5f57977f	Add const qualifiers to XLogRegister*() functions Add const qualifiers to XLogRegisterData() and XLogRegisterBufData(). Several unconstify() calls can be removed. Reviewed-by: Aleksander Alekseev <aleksander@timescale.com> Discussion: https://www.postgresql.org/message-id/dd889784-9ce7-436a-b4f1-52e4a5e577bd@eisentraut.org	2024-09-03 08:06:03 +02:00
Michael Paquier	4236825197	Fix typos and grammar in code comments and docs Author: Alexander Lakhin Discussion: https://postgr.es/m/f7e514cf-2446-21f1-a5d2-8c089a6e2168@gmail.com	2024-09-03 14:49:04 +09:00
Michael Paquier	c7cd2d6ed0	Define PG_TBLSPC_DIR for path pg_tblspc/ in data folder Similarly to 2065ddf5e34c, this introduces a define for "pg_tblspc". This makes the style more consistent with the existing PG_STAT_TMP_DIR, for example. There is a difference with the other cases with the introduction of PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-09-03 09:11:54 +09:00
Daniel Gustafsson	a70e01d430	Remove support for OpenSSL older than 1.1.0 OpenSSL 1.0.2 has been EOL from the upstream OpenSSL project for some time, and is no longer the default OpenSSL version with any vendor which package PostgreSQL. By retiring support for OpenSSL 1.0.2 we can remove a lot of no longer required complexity for managing state within libcrypto which is now handled by OpenSSL. Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com> Reviewed-by: Peter Eisentraut <peter@eisentraut.org> Reviewed-by: Michael Paquier <michael@paquier.xyz> Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz Discussion: https://postgr.es/m/CA+hUKGKh7QrYzu=8yWEUJvXtMVm_CNWH1L_TLWCbZMwbi1XP2Q@mail.gmail.com	2024-09-02 13:51:48 +02:00
Peter Eisentraut	4d5111b3f1	More use of getpwuid_r() directly Remove src/port/user.c, call getpwuid_r() directly. This reduces some complexity and allows better control of the error behavior. For example, the old code would in some circumstances silently truncate the result string, or produce error message strings that the caller wouldn't use. src/port/user.c used to be called src/port/thread.c and contained various portability complications to support thread-safety. These are all obsolete, and all but the user-lookup functions have already been removed. This patch completes this by also removing the user-lookup functions. Also convert src/backend/libpq/auth.c to use getpwuid_r() for thread-safety. Originally, I tried to be overly correct by using sysconf(_SC_GETPW_R_SIZE_MAX) to get the buffer size for getpwuid_r(), but that doesn't work on FreeBSD. All the OS where I could find the source code internally use 1024 as the suggested buffer size, so I just ended up hardcoding that. The previous code used BUFSIZ, which is an unrelated constant from stdio.h, so its use seemed inappropriate. Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Discussion: https://www.postgresql.org/message-id/flat/5f293da9-ceb4-4937-8e52-82c25db8e4d3%40eisentraut.org	2024-09-02 09:04:30 +02:00
Michael Paquier	c39afc38cf	Define PG_LOGICAL_DIR for path pg_logical/ in data folder This is similar to 2065ddf5e34c, but this time for pg_logical/ itself and its contents, like the paths for snapshots, mappings or origin checkpoints. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-30 15:25:12 +09:00
Michael Paquier	2065ddf5e3	Define PG_REPLSLOT_DIR for path pg_replslot/ in data folder This commit replaces most of the hardcoded values of "pg_replslot" by a new PG_REPLSLOT_DIR #define. This makes the style more consistent with the existing PG_STAT_TMP_DIR, for example. More places will follow a similar change. Author: Bertrand Drouvot Reviewed-by: Ashutosh Bapat, Yugo Nagata, Michael Paquier Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal	2024-08-30 10:42:21 +09:00
Michael Paquier	a83a944e9f	Rename pg_sequence_read_tuple() to pg_get_sequence_data() This commit removes log_cnt from the tuple returned by the SQL function. This field is an internal counter that tracks when a WAL record should be generated for a sequence, and it is reset each time the sequence is restored or recovered. It is not necessary to rebuild the sequence DDL commands for pg_dump and pg_upgrade where this function is used. The field can still be queried with a scan of the "table" created under-the-hood for a sequence. Issue noticed while hacking on a feature that can rely on this new function rather than pg_sequence_last_value(), aimed at making sequence computation more easily pluggable. Bump catalog version. Reviewed-by: Nathan Bossart Discussion: https://postgr.es/m/Zsvka3r-y2ZoXAdH@paquier.xyz	2024-08-30 08:49:24 +09:00
Heikki Linnakangas	478846e768	Rename some shared memory initialization routines To make them follow the usual naming convention where FoobarShmemSize() calculates the amount of shared memory needed by Foobar subsystem, and FoobarShmemInit() performs the initialization. I didn't rename CreateLWLocks() and InitShmmeIndex(), because they are a little special. They need to be called before any of the other ShmemInit() functions, because they set up the shared memory bookkeeping itself. I also didn't rename InitProcGlobal(), because unlike other Shmeminit functions, it's not called by individual backends. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:21 +03:00
Heikki Linnakangas	fbce7dfc77	Refactor lock manager initialization to make it a bit less special Split the shared and local initialization to separate functions, and follow the common naming conventions. With this, we no longer create the LockMethodLocalHash hash table in the postmaster process, which was always pointless. Reviewed-by: Andreas Karlsson Discussion: https://www.postgresql.org/message-id/c09694ff-2453-47e5-b26c-32a16cd75ce6@iki.fi	2024-08-29 09:46:06 +03:00
Amit Kapila	640178c92e	Rename the conflict types for the origin differ cases. The conflict types 'update_differ' and 'delete_differ' indicate that a row to be modified was previously altered by another origin. Rename those to 'update_origin_differs' and 'delete_origin_differs' to clarify their meaning. Author: Hou Zhijie Reviewed-by: Shveta Malik, Peter Smith Discussion: https://postgr.es/m/CAA4eK1+HEKwG_UYt4Zvwh5o_HoCKCjEGesRjJX38xAH3OxuuYA@mail.gmail.com	2024-08-29 09:12:12 +05:30
Peter Eisentraut	6654bb9204	Add prefetching support on macOS macOS doesn't have posix_fadvise(), but fcntl() with the F_RDADVISE command does the same thing. Some related documentation has been generalized to not mention posix_advise() specifically anymore. Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/0827edec-1317-4917-a186-035eb1e3241d%40eisentraut.org	2024-08-28 07:28:27 +02:00
Alexander Korotkov	3890d90c15	Revert support for ALTER TABLE ... MERGE/SPLIT PARTITION(S) commands This commit reverts 1adf16b8fb, 87c21bb941, and subsequent fixes and improvements including df64c81ca9, c99ef1811a, 9dfcac8e15, 885742b9f8, 842c9b2705, fcf80c5d5f, 96c7381c4c, f4fc7cb54b, 60ae37a8bc, 259c96fa8f, 449cdcd486, 3ca43dbbb6, 2a679ae94e, 3a82c689fd, fbd4321fd5, d53a4286d7, c086896625, 4e5d6c4091, 04158e7fa3. The reason for reverting is security issues related to repeatable name lookups (CVE-2014-0062). Even though 04158e7fa3 solved part of the problem, there are still remaining issues, which aren't feasible to even carefully analyze before the RC deadline. Reported-by: Noah Misch, Robert Haas Discussion: https://postgr.es/m/20240808171351.a9.nmisch%40google.com Backpatch-through: 17	2024-08-24 18:48:48 +03:00
Peter Eisentraut	a2bbc58f74	thread-safety: gmtime_r(), localtime_r() Use gmtime_r() and localtime_r() instead of gmtime() and localtime(), for thread-safety. There are a few affected calls in libpq and ecpg's libpgtypes, which are probably effectively bugs, because those libraries already claim to be thread-safe. There is one affected call in the backend. Most of the backend otherwise uses the custom functions pg_gmtime() and pg_localtime(), which are implemented differently. While we're here, change the call in the backend to gmtime() instead of localtime(), since for that use time zone behavior is irrelevant, and this side-steps any questions about when time zones are initialized by localtime_r() vs localtime(). Portability: gmtime_r() and localtime_r() are in POSIX but are not available on Windows. Windows has functions gmtime_s() and localtime_s() that can fulfill the same purpose, so we add some small wrappers around them. (Note that these _s() functions are also different from the _s() functions in the bounds-checking extension of C11. We are not using those here.) On MinGW, you can get the POSIX-style *_r() functions by defining _POSIX_C_SOURCE appropriately before including <time.h>. This leads to a conflict at least in plpython because apparently _POSIX_C_SOURCE gets defined in some header there, and then our replacement definitions conflict with the system definitions. To avoid that sort of thing, we now always define _POSIX_C_SOURCE on MinGW and use the POSIX-style functions here. Reviewed-by: Stepan Neretin <sncfmgg@gmail.com> Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi> Reviewed-by: Thomas Munro <thomas.munro@gmail.com> Discussion: https://www.postgresql.org/message-id/flat/eba1dc75-298e-4c46-8869-48ba8aad7d70@eisentraut.org	2024-08-23 07:43:04 +02:00
Alexander Korotkov	04158e7fa3	Avoid repeated table name lookups in createPartitionTable() Currently, createPartitionTable() opens newly created table using its name. This approach is prone to privilege escalation attack, because we might end up opening another table than we just created. This commit address the issue above by opening newly created table by its OID. It appears to be tricky to get a relation OID out of ProcessUtility(). We have to extend TableLikeClause with new newRelationOid field, which is filled within ProcessUtility() to be further accessed by caller. Security: CVE-2014-0062 Reported-by: Noah Misch Discussion: https://postgr.es/m/20240808171351.a9.nmisch%40google.com Reviewed-by: Pavel Borisov, Dmitry Koval	2024-08-22 09:50:48 +03:00
Michael Paquier	490f869d92	Create syscache entries for pg_extension Two syscache identifiers are added for extension names and OIDs. Shared libraries of extensions might want to invalidate or update their own caches whenever a CREATE, ALTER or DROP EXTENSION command is run for their extension (in any backend). Right now this is non-trivial to do correctly and efficiently, but, if an extension catalog is part of a syscache, this could simply be done by registering an callback using CacheRegisterSyscacheCallback for the relevant syscache. Another case where this is useful is a loaded library where some of its code paths rely on some objects of the extension to exist; it can be simpler and more efficient to do an existence check directly on the extension through the syscache. Author: Jelte Fennema-Nio Reviewed-by: Alexander Korotkov, Pavel Stehule Discussion: https://postgr.es/m/CAGECzQTWm9sex719Hptbq4j56hBGUti7J9OWjeMobQ1ccRok9w@mail.gmail.com	2024-08-22 10:48:25 +09:00
Robert Haas	c01743aa48	Show number of disabled nodes in EXPLAIN ANALYZE output. Now that disable_cost is not included in the cost estimate, there's no visible sign in EXPLAIN output of which plan nodes are disabled. Fix that by propagating the number of disabled nodes from Path to Plan, and then showing it in the EXPLAIN output. There is some question about whether this is a desirable change. While I personally believe that it is, it seems best to make it a separate commit, in case we decide to back out just this part, or rework it. Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:14:35 -04:00
Robert Haas	e222534679	Treat number of disabled nodes in a path as a separate cost metric. Previously, when a path type was disabled by e.g. enable_seqscan=false, we either avoided generating that path type in the first place, or more commonly, we added a large constant, called disable_cost, to the estimated startup cost of that path. This latter approach can distort planning. For instance, an extremely expensive non-disabled path could seem to be worse than a disabled path, especially if the full cost of that path node need not be paid (e.g. due to a Limit). Or, as in the regression test whose expected output changes with this commit, the addition of disable_cost can make two paths that would normally be distinguishible in cost seem to have fuzzily the same cost. To fix that, we now count the number of disabled path nodes and consider that a high-order component of both the startup cost and the total cost. Hence, the path list is now sorted by disabled_nodes and then by total_cost, instead of just by the latter, and likewise for the partial path list. It is important that this number is a count and not simply a Boolean; else, as soon as we're unable to respect disabled path types in all portions of the path, we stop trying to avoid them where we can. Because the path list is now sorted by the number of disabled nodes, the join prechecks must compute the count of disabled nodes during the initial cost phase instead of postponing it to final cost time. Counts of disabled nodes do not cross subquery levels; at present, there is no reason for them to do so, since the we do not postpone path selection across subquery boundaries (see make_subplan). Reviewed by Andres Freund, Heikki Linnakangas, and David Rowley. Discussion: http://postgr.es/m/CA+TgmoZ_+MS+o6NeGK2xyBv-xM+w1AfFVuHE4f_aq6ekHv7YSQ@mail.gmail.com	2024-08-21 10:12:30 -04:00
Amit Kapila	3f28b2fcac	Don't advance origin during apply failure. We advance origin progress during abort on successful streaming and application of ROLLBACK in parallel streaming mode. But the origin shouldn't be advanced during an error or unsuccessful apply due to shutdown. Otherwise, it will result in a transaction loss as such a transaction won't be sent again by the server. Reported-by: Hou Zhijie Author: Hayato Kuroda and Shveta Malik Reviewed-by: Amit Kapila Backpatch-through: 16 Discussion: https://postgr.es/m/TYAPR01MB5692FAC23BE40C69DA8ED4AFF5B92@TYAPR01MB5692.jpnprd01.prod.outlook.com	2024-08-21 09:22:32 +05:30
Michael Paquier	15c1abd977	Remove _PG_fini() ab02d702ef08 has removed from the backend the code able to support the unloading of modules, because this has never worked. This removes the last references to _PG_fini(), that could be used as a callback for modules to manipulate the stack when unloading a library. The test module ldap_password_func had the idea to declare it, doing nothing. The function declaration in fmgr.h is gone. It was left around in 2022 to avoid breaking extension code, but at this stage there are also benefits in letting extension developers know that keeping the unloading code is pointless and this move leads to less maintenance. Reviewed-by: Tom Lane, Heikki Linnakangas Discussion: https://postgr.es/m/ZsQfi0AUJoMF6NSd@paquier.xyz	2024-08-21 07:24:03 +09:00
Amit Kapila	9758174e2e	Log the conflicts while applying changes in logical replication. This patch provides the additional logging information in the following conflict scenarios while applying changes: insert_exists: Inserting a row that violates a NOT DEFERRABLE unique constraint. update_differ: Updating a row that was previously modified by another origin. update_exists: The updated row value violates a NOT DEFERRABLE unique constraint. update_missing: The tuple to be updated is missing. delete_differ: Deleting a row that was previously modified by another origin. delete_missing: The tuple to be deleted is missing. For insert_exists and update_exists conflicts, the log can include the origin and commit timestamp details of the conflicting key with track_commit_timestamp enabled. update_differ and delete_differ conflicts can only be detected when track_commit_timestamp is enabled on the subscriber. We do not offer additional logging for exclusion constraint violations because these constraints can specify rules that are more complex than simple equality checks. Resolving such conflicts won't be straightforward. This area can be further enhanced if required. Author: Hou Zhijie Reviewed-by: Shveta Malik, Amit Kapila, Nisha Moond, Hayato Kuroda, Dilip Kumar Discussion: https://postgr.es/m/OS0PR01MB5716352552DFADB8E9AD1D8994C92@OS0PR01MB5716.jpnprd01.prod.outlook.com	2024-08-20 08:35:11 +05:30
David Rowley	adf97c1562	Speed up Hash Join by making ExprStates support hashing Here we add ExprState support for obtaining a 32-bit hash value from a list of expressions. This allows both faster hashing and also JIT compilation of these expressions. This is especially useful when hash joins have multiple join keys as the previous code called ExecEvalExpr on each hash join key individually and that was inefficient as tuple deformation would have only taken into account one key at a time, which could lead to walking the tuple once for each join key. With the new code, we'll determine the maximum attribute required and deform the tuple to that point only once. Some performance tests done with this change have shown up to a 20% performance increase of a query containing a Hash Join without JIT compilation and up to a 26% performance increase when JIT is enabled and optimization and inlining were performed by the JIT compiler. The performance increase with 1 join column was less with a 14% increase with and without JIT. This test was done using a fairly small hash table and a large number of hash probes. The increase will likely be less with large tables, especially ones larger than L3 cache as memory pressure is more likely to be the limiting factor there. This commit only addresses Hash Joins, but lays expression evaluation and JIT compilation infrastructure for other hashing needs such as Hash Aggregate. Author: David Rowley Reviewed-by: Alexey Dvoichenkov <alexey@hyperplane.net> Reviewed-by: Tels <nospam-pg-abuse@bloodgate.com> Discussion: https://postgr.es/m/CAApHDvoexAxgQFNQD_GRkr2O_eJUD1-wUGm%3Dm0L%2BGc%3DT%3DkEa4g%40mail.gmail.com	2024-08-20 13:38:22 +12:00
Nathan Bossart	9e9a2b7031	Remove dependence on -fwrapv semantics in a few places. This commit attempts to update a few places, such as the money, numeric, and timestamp types, to no longer rely on signed integer wrapping for correctness. This is intended to move us closer towards removing -fwrapv, which may enable some compiler optimizations. However, there is presently no plan to actually remove that compiler option in the near future. Besides using some of the existing overflow-aware routines in int.h, this commit introduces and makes use of some new ones. Specifically, it adds functions that accept a signed integer and return its absolute value as an unsigned integer with the same width (e.g., pg_abs_s64()). It also adds functions that accept an unsigned integer, store the result of negating that integer in a signed integer with the same width, and return whether the negation overflowed (e.g., pg_neg_u64_overflow()). Finally, this commit adds a couple of tests for timestamps near POSTGRES_EPOCH_JDATE. Author: Joseph Koshakow Reviewed-by: Tom Lane, Heikki Linnakangas, Jian He Discussion: https://postgr.es/m/CAAvxfHdBPOyEGS7s%2Bxf4iaW0-cgiq25jpYdWBqQqvLtLe_t6tw%40mail.gmail.com	2024-08-15 15:47:31 -05:00

1 2 3 4 5 ...

11705 Commits