Commit Graph

452 Commits

Author SHA1 Message Date
4e7f62bc38 Add support for Unicode case folding.
Expand case mapping tables to include entries for case folding, which
are parsed from CaseFolding.txt.

Discussion: https://postgr.es/m/a1886ddfcd8f60cb3e905c93009b646b4cfb74c5.camel%40j-davis.com
2025-01-23 09:06:50 -08:00
ea5ff5833c Be clearer about when jsonapi's need_escapes is needed
Most operations beyond pure json parsing need to set need_escapes to
true to get access to field names and string scalars. Document this
fact more explicitly.

Slightly tweaked patch from:

Author: Corey Huinker <corey.huinker@gmail.com>

Discussion: https://postgr.es/m/CADkLM=c49Vkfg2+A8ubSuEtaGEjuaKZXCA6SrXA8kdwHjx3uxQ@mail.gmail.com
2025-01-19 09:09:58 -05:00
286a365b9c Support Unicode full case mapping and conversion.
Generate tables from Unicode SpecialCasing.txt to support more
sophisticated case mapping behavior:

 * support case mappings to multiple codepoints, such as "ß"
   uppercasing to "SS"
 * support conditional case mappings, such as the "final sigma"
 * support titlecase variants, such as "dž" uppercasing to "DŽ" but
   titlecasing to "Dž"

Discussion: https://postgr.es/m/ddfd67928818f138f51635712529bc5e1d25e4e7.camel@j-davis.com
Discussion: https://postgr.es/m/27bb0e52-801d-4f73-a0a4-02cfdd4a9ada@eisentraut.org
Reviewed-by: Peter Eisentraut, Daniel Verite
2025-01-17 15:56:20 -08:00
72ceb21b02 Fix a compiler warning in initStringInfo().
Fix a compiler warning found by Cfbot. This was caused by commit
bb86e85e442.
2025-01-11 15:52:37 +09:00
a9dcbb4d5c Add new StringInfo APIs to allow callers to specify the buffer size.
Previously StringInfo APIs allocated buffers with fixed initial
allocation size of 1024 bytes. This may be too large and inappropriate
for some callers that can do with smaller memory buffers. To fix this,
introduce new APIs that allow callers to specify initial buffer size.

extern StringInfo makeStringInfoExt(int initsize);
extern void initStringInfoExt(StringInfo str, int initsize);

Existing APIs (makeStringInfo() and initStringInfo()) are changed to
call makeStringInfoExt and initStringInfoExt respectively (via inline
helper functions makeStringInfoInternal and initStringInfoInternal),
with the default buffer size of 1024.

Reviewed-by: Nathan Bossart, David Rowley, Michael Paquier, Gurjeet Singh
Discussion: https://postgr.es/m/20241225.123704.1194662271286702010.ishii%40postgresql.org
2025-01-11 08:23:46 +09:00
50e6eb731d Update copyright for 2025
Backpatch-through: 13
2025-01-01 11:21:55 -05:00
2571c1d5cc meson: Export all libcommon functions in Windows builds
This fixes "unresolved external symbol" errors with extensions that
use functions from libcommon. This was reported with pgvector.

Reported-by: Andrew Kane
Author: Vladlen Popolitov
Backpatch-through: 16, where Meson was introduced
Discussion: https://www.postgresql.org/message-id/CAOdR5yF0krWrxycA04rgUKCgKugRvGWzzGLAhDZ9bzNv8g0Lag@mail.gmail.com
2024-12-25 18:14:18 +02:00
7b2690a571 Fix outdated comment of scram_build_secret()
This routine documented that "iterations" would use a default value if
set to 0 by the caller.  However, the iteration should always be set by
the caller to a value strictly more than 0, as documented by an
assertion.

Oversight in b577743000cd, that has made the iteration count of SCRAM
configurable.

Author: Matheus Alcantara
Discussion: https://postgr.es/m/ac858943-4743-44cd-b4ad-08a0c10cbbc8@gmail.com
Backpatch-through: 16
2024-12-10 12:54:09 +09:00
5c32c21afe jsonapi: add lexer option to keep token ownership
Commit 0785d1b8b adds support for libpq as a JSON client, but
allocations for string tokens can still be leaked during parsing
failures. This is tricky to fix for the object_field semantic callbacks:
the field name must remain valid until the end of the object, but if a
parsing error is encountered partway through, object_field_end() won't
be invoked and the client won't get a chance to free the field name.

This patch adds a flag to switch the ownership of parsed tokens to the
lexer. When this is enabled, the client must make a copy of any tokens
it wants to persist past the callback lifetime, but the lexer will
handle necessary cleanup on failure.

Backend uses of the JSON parser don't need to use this flag, since the
parser's allocations will occur in a short lived memory context.

A -o option has been added to test_json_parser_incremental to exercise
the new setJsonLexContextOwnsTokens() API, and the test_json_parser TAP
tests make use of it. (The test program now cleans up allocated memory,
so that tests can be usefully run under leak sanitizers.)

Author: Jacob Champion

Discussion: https://postgr.es/m/CAOYmi+kb38EciwyBQOf9peApKGwraHqA7pgzBkvoUnw5BRfS1g@mail.gmail.com
2024-11-27 12:07:14 -05:00
e6c32d9fad Clean up newlines following left parentheses
Most came in during the 17 cycle, so backpatch there.  Some
(particularly reorderbuffer.h) are very old, but backpatching doesn't
seem useful.

Like commits c9d297751959, c4f113e8fef9.
2024-11-26 17:10:07 +01:00
ecb5af7798 Remove unused #include's from bin .c files
as determined by IWYU

Similar to commit dbbca2cf299, but for bin and some related files.

Discussion: https://www.postgresql.org/message-id/flat/0df1d5b1-8ca8-4f84-93be-121081bde049%40eisentraut.org
2024-11-06 11:11:52 +01:00
849110dd3e Optimize sifting down in binaryheap.
Presently, each iteration of the loop in sift_down() will perform
3 comparisons if both children are larger than the parent node (2
for comparing each child to the parent node, and a third to compare
the children to each other).  By first comparing the children to
each other and then comparing the larger child to the parent node,
we can accomplish the same thing with just 2 comparisons (while
also not affecting the number of comparisons in any other case).

Author: ChangAo Chen
Reviewed-by: Robert Haas
Discussion: https://postgr.es/m/tencent_0142D8DA90940B9930BCC08348BBD6D0BB07%40qq.com
2024-10-30 11:28:34 -05:00
2845cd1ca0 meson: Add missing dependency to unicode test programs
The test programs in src/common/unicode/ (case_test, category_test,
norm_test), don't build with meson if the nls option is enabled,
because a libintl dependency is missing.  Fix that.  (The makefiles
are ok.)

Reviewed-by: Aleksander Alekseev <aleksander@timescale.com>
Discussion: https://www.postgresql.org/message-id/flat/52db1d2b-4b96-473e-b323-a4b16a950fba%40eisentraut.org
2024-10-30 08:30:00 +01:00
11b7de4a78 Unify src/common/'s definitions of MaxAllocSize.
As threatened in the previous patch, define MaxAllocSize in
src/include/common/fe_memutils.h rather than having several
copies of it in different src/common/*.c files.  This also
provides an opportunity to document it better.

While this would probably be safe to back-patch, I'll refrain
(for now anyway).
2024-10-28 14:39:01 -04:00
bd28431672 Guard against enormously long input in pg_saslprep().
Coverity complained that pg_saslprep() could suffer integer overflow,
leading to under-allocation of the output buffer, if the input string
exceeds SIZE_MAX/4.  This hazard seems largely hypothetical, but it's
easy enough to defend against, so let's do so.

This patch creates a third place in src/common/ where we are locally
defining MaxAllocSize so that we can test against that in the same way
in backend and frontend compiles.  That seems like about two places
too many, so the next patch will move that into common/fe_memutils.h.
I'm hesitant to do that in back branches however.

Back-patch to v14.  The code looks similar in older branches, but
before commit 67a472d71 there was a separate test on the input string
length that prevented this hazard.

Per Coverity report.
2024-10-28 14:33:55 -04:00
4b652692e9 Fix memory leaks from incorrect strsep() uses
Commit 5d2e1cc117b introduced some strsep() uses, but it did the
memory management wrong in some cases.  We need to keep a separate
pointer to the allocate memory so that we can free it later, because
strsep() advances the pointer we pass to it, and it at the end it
will be NULL, so any free() calls won't do anything.

(This fixes two of the four places changed in commit 5d2e1cc117b.  The
other two don't have this problem.)

Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
2024-10-18 11:29:20 +02:00
41b023946d jsonapi: fully initialize dummy lexer
Valgrind reports that checks on lex->inc_state are undefined for the
"dummy lexer" used for incremental parsing, since it's only partially
initialized on the stack. This was introduced in 0785d1b8b2.
Zero-initialize the whole struct.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reported-by: Alexander Lakhin <exclusion@gmail.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+n9QWr4gsAADZc6qFQjFViXQYVk=gBy_EvxuqsgPJcb_g@mail.gmail.com
2024-10-17 08:23:46 +02:00
a7f2f6adc2 Whitespace fixup from generated unicode tables.
When running the 'update-unicode' build target, generate files that
conform to pgindent whitespace rules.
2024-10-16 12:21:13 -07:00
d94cf5ca7f File size in a backup manifest should use uint64, not size_t.
size_t is the size of an object in memory, not the size of a file on disk.

Thanks to Tom Lane for noting the error.

Discussion: http://postgr.es/m/1865585.1727803933@sss.pgh.pa.us
2024-10-02 09:59:04 -04:00
75240f65e7 jsonapi: fix memory leakage during OOM error recovery.
Coverity pointed out that inc_lex_level() would leak memory
(not to mention corrupt the pstack data structure) if some
but not all of its three REALLOC's failed.  To fix, store
successfully-updated pointers back into the pstack struct
immediately.

Oversight in 0785d1b8b, so no need for back-patch.
2024-09-23 12:30:51 -04:00
0785d1b8b2 common/jsonapi: support libpq as a client
Based on a patch by Michael Paquier.

For libpq, use PQExpBuffer instead of StringInfo. This requires us to
track allocation failures so that we can return JSON_OUT_OF_MEMORY as
needed rather than exit()ing.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Co-authored-by: Michael Paquier <michael@paquier.xyz>
Co-authored-by: Daniel Gustafsson <daniel@yesql.se>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Discussion: https://www.postgresql.org/message-id/flat/d1b467a78e0e36ed85a09adf979d04cf124a9d4b.camel@vmware.com
2024-09-11 09:01:07 +02:00
390b3cbbb2 Protect against small overread in SASLprep validation
In case of torn UTF8 in the input data we might end up going
past the end of the string since we don't account for length.
While validation won't be performed on a sequence with a NULL
byte it's better to avoid going past the end to beging with.
Fix by taking the length into consideration.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Daniel Gustafsson <daniel@yesql.se>
Discussion: https://postgr.es/m/CAOYmi+mTnmM172g=_+Yvc47hzzeAsYPy2C4UBY3HK9p-AXNV0g@mail.gmail.com
2024-09-10 11:02:28 +02:00
c7cd2d6ed0 Define PG_TBLSPC_DIR for path pg_tblspc/ in data folder
Similarly to 2065ddf5e34c, this introduces a define for "pg_tblspc".
This makes the style more consistent with the existing PG_STAT_TMP_DIR,
for example.

There is a difference with the other cases with the introduction of
PG_TBLSPC_DIR_SLASH, required in two places for recovery and backups.

Author: Bertrand Drouvot
Reviewed-by: Ashutosh Bapat, Álvaro Herrera, Yugo Nagata, Michael
Paquier
Discussion: https://postgr.es/m/ZryVvjqS9SnV1GPP@ip-10-97-1-34.eu-west-3.compute.internal
2024-09-03 09:11:54 +09:00
a70e01d430 Remove support for OpenSSL older than 1.1.0
OpenSSL 1.0.2 has been EOL from the upstream OpenSSL project for
some time, and is no longer the default OpenSSL version with any
vendor which package PostgreSQL. By retiring support for OpenSSL
1.0.2 we can remove a lot of no longer required complexity for
managing state within libcrypto which is now handled by OpenSSL.

Reviewed-by: Jacob Champion <jacob.champion@enterprisedb.com>
Reviewed-by: Peter Eisentraut <peter@eisentraut.org>
Reviewed-by: Michael Paquier <michael@paquier.xyz>
Discussion: https://postgr.es/m/ZG3JNursG69dz1lr@paquier.xyz
Discussion: https://postgr.es/m/CA+hUKGKh7QrYzu=8yWEUJvXtMVm_CNWH1L_TLWCbZMwbi1XP2Q@mail.gmail.com
2024-09-02 13:51:48 +02:00
edee0c621d Message style improvements 2024-08-29 14:43:34 +02:00
f1976df5ea Remove fe_memutils from libpgcommon_shlib
libpq must not use palloc/pfree. It's not allowed to exit on allocation
failure, and mixing the frontend pfree with malloc is architecturally
unsound.

Remove fe_memutils from the shlib build entirely, to keep devs from
accidentally depending on it in the future.

Author: Jacob Champion <jacob.champion@enterprisedb.com>
Discussion: https://www.postgresql.org/message-id/CAOYmi+=pg=W5L1h=3MEP_EB24jaBu2FyATrLXqQHGe7cpuvwyg@mail.gmail.com
2024-08-12 08:30:39 +02:00
94980c4567 Remove support for old realpath() API
The now preferred way to call realpath() is by passing NULL as the
second argument and get a malloc'ed result.  We still supported the
old way of providing our own buffer as a second argument, for some
platforms that didn't support the new way yet.  Those were only
Solaris less than version 11 and some older AIX versions (7.1 and
newer appear to support the new variant).  We don't support those
platforms versions anymore, so we can remove this extra code.

Reviewed-by: Heikki Linnakangas <hlinnaka@iki.fi>
Discussion: https://www.postgresql.org/message-id/flat/9e638b49-5c3f-470f-a392-2cbedb2f7855%40eisentraut.org
2024-08-12 08:04:35 +02:00
2676040df0 Make fallback MD5 implementation thread-safe on big-endian systems
Replace a static scratch buffer with a local variable, because a
static buffer makes the function not thread-safe. This function is
used in client-code in libpq, so it needs to be thread-safe. It was
until commit b67b57a966, which replaced the implementation with the
one from pgcrypto.

Backpatch to v14, where we switched to the new implementation.

Reviewed-by: Robert Haas, Michael Paquier
Discussion: https://www.postgresql.org/message-id/dfa2015d-ad21-4802-a4cc-3850fc5fff3f@iki.fi
2024-08-07 10:43:52 +03:00
fe8dd65bf2 Mark misc static global variables as const
Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
2024-08-06 23:04:48 +03:00
85829c973c Make nullSemAction const, add 'const' decorators to related functions
To make it more clear that these should never be modified.

Reviewed-by: Andres Freund
Discussion: https://www.postgresql.org/message-id/54c29fb0-edf2-48ea-9814-44e918bbd6e8@iki.fi
2024-08-06 23:04:22 +03:00
c27090bd60 Convert some extern variables to static, Windows code
Similar to 720b0eaae9b, discovered by MinGW.
2024-08-01 13:28:32 +02:00
5d2e1cc117 Replace some strtok() with strsep()
strtok() considers adjacent delimiters to be one delimiter, which is
arguably the wrong behavior in some cases.  Replace with strsep(),
which has the right behavior: Adjacent delimiters create an empty
token.

Affected by this are parsing of:

- Stored SCRAM secrets
  ("SCRAM-SHA-256$<iterations>:<salt>$<storedkey>:<serverkey>")

- ICU collation attributes
  ("und@colStrength=primary;colCaseLevel=yes") for ICU older than
  version 54

- PG_COLORS environment variable
  ("error=01;31:warning=01;35:note=01;36:locus=01")

- pg_regress command-line options with comma-separated list arguments
  (--dbname, --create-role) (currently only used pg_regress_ecpg)

Reviewed-by: Kyotaro Horiguchi <horikyota.ntt@gmail.com>
Reviewed-by: David Steele <david@pgmasters.net>
Discussion: https://www.postgresql.org/message-id/flat/79692bf9-17d3-41e6-b9c9-fc8c3944222a@eisentraut.org
2024-07-22 15:45:46 +02:00
519d710720 Typo fix
Reported-by: Junwang Zhao <zhjwpku@gmail.com>
Discussion: https://postgr.es/m/CAEG8a3KPi=LayiTwJ11ikF7bcqnZUrcj8NgX0V8nO1mQKZ9GfQ@mail.gmail.com
Backpatch-through: 17
2024-07-08 22:12:55 +09:00
1029bdec2d Improve enlargeStringInfo's ERROR message
Until now, when an enlargeStringInfo() call would cause the StringInfo to
exceed its maximum size, we reported an "out of memory" error.  This is
misleading as it's no such thing.

Here we remove the "out of memory" text and replace it with something
more relevant to better indicate that it's a program limitation that's
been reached.

Reported-by: Michael Banck
Reviewed-by: Daniel Gustafsson, Tom Lane
Discussion: https://postgr.es/m/18484-3e357ade5fe50e61@postgresql.org
2024-07-01 12:11:10 +12:00
02bbc3c83a parse_manifest: Use const char *
This adapts the manifest parsing code to take advantage of the
const-ified jsonapi.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
2024-06-21 07:53:30 +02:00
15cd9a3881 jsonapi: Use const char *
Apply const qualifiers to char * arguments and fields throughout the
jsonapi.  This allows the top-level APIs such as
pg_parse_json_incremental() to declare their input argument as const.
It also reduces the number of unconstify() calls.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
2024-06-21 07:53:30 +02:00
0b06bf9fa9 jsonapi: Use size_t
Use size_t instead of int for object sizes in the jsonapi.  This makes
the API better self-documenting.

Reviewed-by: Andrew Dunstan <andrew@dunslane.net>
Discussion: https://www.postgresql.org/message-id/flat/f732b014-f614-4600-a437-dba5a2c3738b%40eisentraut.org
2024-06-21 07:53:30 +02:00
da32f5c4bc jsonapi: Some message style fixes
Add missing punctuation, and un-gettext-mark an internal error.
2024-05-23 09:22:28 +02:00
cc70e170c0 Make all Perl warnings fatal, catch-up
Apply c5385929593 to new Perl files that had missed the note.
2024-05-15 10:10:19 +02:00
8cb653b245 Fix incorrect year in some copyright notices added this year
A few patches that were written <= 2023 and committed in 2024 still
contained 2023 copyright year.  Fix that.

Discussion: https://postgr.es/m/CAApHDvr5egyW3FmHbAg-Uq2p_Aizwco1Zjs6Vbq18KqN64-hRA@mail.gmail.com
2024-05-15 15:01:21 +12:00
da256a4a7f Pre-beta mechanical code beautification.
Run pgindent, pgperltidy, and reformat-dat-files.

The pgindent part of this is pretty small, consisting mainly of
fixing up self-inflicted formatting damage from patches that
hadn't bothered to add their new typedefs to typedefs.list.
In order to keep it from making anything worse, I manually added
a dozen or so typedefs that appeared in the existing typedefs.list
but not in the buildfarm's list.  Perhaps we should formalize that,
or better find a way to get those typedefs into the automatic list.

pgperltidy is as opinionated as always, and reformat-dat-files too.
2024-05-14 16:34:50 -04:00
3ddbac368c Add missing gettext triggers
Commit d6607016c7 moved all the jsonapi.c error messages into
token_error().  This needs to be added to the various nls.mk files
that use this.  Since that makes token_error() effectively a globally
known symbol, the name seems a bit too general, so rename to
json_token_error() for more clarity.
2024-05-14 12:57:22 +02:00
855517307d Fix overread in JSON parsing errors for incomplete byte sequences
json_lex_string() relies on pg_encoding_mblen_bounded() to point to the
end of a JSON string when generating an error message, and the input it
uses is not guaranteed to be null-terminated.

It was possible to walk off the end of the input buffer by a few bytes
when the last bytes consist of an incomplete multi-byte sequence, as
token_terminator would point to a location defined by
pg_encoding_mblen_bounded() rather than the end of the input.  This
commit switches token_terminator so as the error uses data up to the
end of the JSON input.

More work should be done so as this code could rely on an equivalent of
report_invalid_encoding() so as incorrect byte sequences can show in
error messages in a readable form.  This requires work for at least two
cases in the JSON parsing API: an incomplete token and an invalid escape
sequence.  A more complete solution may be too invasive for a backpatch,
so this is left as a future improvement, taking care of the overread
first.

A test is added on HEAD as test_json_parser makes this issue
straight-forward to check.

Note that pg_encoding_mblen_bounded() no longer has any callers.  This
will be removed on HEAD with a separate commit, as this is proving to
encourage unsafe coding.

Author: Jacob Champion
Discussion: https://postgr.es/m/CAOYmi+ncM7pwLS3AnKCSmoqqtpjvA8wmCdoBtKA3ZrB2hZG6zA@mail.gmail.com
Backpatch-through: 13
2024-05-09 12:45:37 +09:00
e00b4f79e7 Remove redundant JSON parser typedefs
JsonNonTerminal and JsonParserSem were added in commit 3311ea86ed

These names of these two enums are not actually used, so there is no
need for typedefs. Instead use plain enums to declare the constants.

Noticed by Alvaro Herera.
2024-04-27 07:02:57 -04:00
7e44ac3741 Update src/common/unicode/.gitignore
for new downloaded files and new build results.
2024-04-22 09:16:33 +02:00
950d4a2cb1 Fix typos and duplicate words
This fixes various typos, duplicated words, and tiny bits of whitespace
mainly in code comments but also in docs.

Author: Daniel Gustafsson <daniel@yesql.se>
Author: Heikki Linnakangas <hlinnaka@iki.fi>
Author: Alexander Lakhin <exclusion@gmail.com>
Author: David Rowley <dgrowleyml@gmail.com>
Author: Nazir Bilal Yavuz <byavuz81@gmail.com>
Discussion: https://postgr.es/m/3F577953-A29E-4722-98AD-2DA9EFF2CBB8@yesql.se
2024-04-18 21:28:07 +02:00
42fa4b6601 Assorted minor cleanups in the test_json_parser module
Per gripes from Michael Paquier

Discussion: https://postgr.es/m/ZhTQ6_w1vwOhqTQI@paquier.xyz

Along the way, also clean up a handful of typos in 3311ea86ed and
ea7b4e9a2a, found by Alexander Lakhin, and a couple of stylistic
snafus noted by Daniel Westermann and Daniel Gustafsson.
2024-04-12 10:32:30 -04:00
661ab4e185 Fix some memory leaks associated with parsing json and manifests
Coverity complained about not freeing some memory associated with
incrementally parsing backup manifests. To fix that, provide and use a new
shutdown function for the JsonManifestParseIncrementalState object, in
line with a suggestion from Tom Lane.

While analysing the problem, I noticed a buglet in freeing memory for
incremental json lexers. To fix that remove a bogus condition on
freeing the memory allocated for them.
2024-04-12 10:32:30 -04:00
810f64a015 Revert indexed and enlargable binary heap implementation.
This reverts commit b840508644 and bcb14f4abc. These commits were made
for commit 5bec1d6bc5 (Improve eviction algorithm in ReorderBuffer
using max-heap for many subtransactions). However, per discussion,
commit efb8acc0d0 replaced binary heap + index with pairing heap, and
made these commits unnecessary.

Reported-by: Jeff Davis
Discussion: https://postgr.es/m/12747c15811d94efcc5cda72d6b35c80d7bf3443.camel%40j-davis.com
2024-04-11 17:18:05 +09:00
c3e60f3d7e Silence some compiler warnings in commit 3311ea86ed
Per report from Nathan Bossart
2024-04-05 16:08:40 -04:00