Commit Graph

58 Commits

4603903d29 Allow json{b}_strip_nulls to remove null array elements
An additional parameter ("strip_in_arrays") is added to these functions.
It defaults to false. If true, then null array elements are removed as
well as null valued object fields. JSON that just consists of a single
null is not affected.
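
A minimal usage sketch (assuming the new parameter is passed as a second
boolean argument):

    SELECT jsonb_strip_nulls('{"a": null, "b": [1, null, 2]}', true);
    -- null object field "a" and the null array element are both removed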

Author: Florents Tselai <florents.tselai@gmail.com>

Discussion: https://postgr.es/m/4BCECCD5-4F40-4313-9E98-9E16BEB0B01D@gmail.com
2025-03-05 10:04:02 -05:00
842265631d Fix unique key checks in JSON object constructors
When building a JSON object, the code builds a hash table of keys, to
allow checking if the keys are unique. The uniqueness check and adding
the new key happens in json_unique_check_key(), but this assumes the
pointer to the key remains valid.

Unfortunately, two places passed pointers to keys in a buffer, while
also appending more data (additional key/value pairs) to the buffer.
With enough data the buffer is resized by enlargeStringInfo(), which
calls repalloc(), invalidating the earlier key pointers.

Due to this the uniqueness check may fail with both false negatives and
false positives, producing JSON objects with duplicate keys or failing
to produce a perfectly valid JSON object.

This affects multiple functions that enforce uniqueness of keys, all
introduced in PG16 with the new SQL/JSON:

- json_object_agg_unique / jsonb_object_agg_unique
- json_object / jsonb_objectagg

Existing regression tests did not detect the issue, simply because the
initial buffer size is 1024 and the objects were small enough not to
require the repalloc.

With a sufficiently large object, AddressSanitizer reported the access
to invalid memory immediately. So would valgrind, of course.

Fixed by copying the key into the hash table memory context, and adding
regression tests with enough data to repalloc the buffer. Backpatch to
16, where the functions were introduced.
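
For reference, a sketch of the kind of SQL/JSON constructor call whose
key-uniqueness check is affected (PG16 JSON_OBJECTAGG syntax):

    SELECT json_objectagg(k VALUE v WITH UNIQUE KEYS)
    FROM (VALUES ('k1', 1), ('k2', 2)) AS t(k, v);
    -- with enough (and long enough) keys to force the buffer repalloc,
    -- the duplicate-key detection could previously misfire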

Reported by Alexander Lakhin. Investigation and initial fix by Junwang
Zhao, with various improvements and tests by me.

Reported-by: Alexander Lakhin
Author: Junwang Zhao, Tomas Vondra
Backpatch-through: 16
Discussion: https://postgr.es/m/18598-3279ed972a2347c7@postgresql.org
Discussion: https://postgr.es/m/CAEG8a3JjH0ReJF2_O7-8LuEbO69BxPhYeXs95_x7+H9AMWF1gw@mail.gmail.com
2024-09-11 13:21:10 +02:00
ca6fde9225 Optimize JSON escaping using SIMD
Here we adjust escape_json_with_len() to make use of SIMD to allow
processing of up to 16-bytes at a time rather than processing a single
byte at a time.  This has been shown to speed up escaping of JSON
strings significantly.

Escaping is required for both JSON string properties and also the
property names themselves, so this should also help improve the speed of
the conversion from JSON into text for JSON objects that have property
names 16 or more bytes long.

Escaping JSON strings was often a significant bottleneck for longer
strings.  With these changes, some benchmarking has shown a query
performing nearly 4 times faster when escaping a JSON object with a 1MB
text property.  Tests with shorter text properties saw smaller but still
significant performance improvements.  For example, a test outputting 1024
JSON strings with a text property length ranging from 1 char to 1024 chars
became around 2 times faster.
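
A rough sketch of the kind of query that exercises this path (an
illustrative benchmark shape, not the exact one used above):

    SELECT row_to_json(t)
    FROM (SELECT repeat('text that needs "escaping"', 40000) AS big) t;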

Author: David Rowley
Reviewed-by: Melih Mutlu
Discussion: https://postgr.es/m/CAApHDvpLXwMZvbCKcdGfU9XQjGCDm7tFpRdTXuB9PVgpNUYfEQ@mail.gmail.com
2024-08-05 23:16:44 +12:00
b8da37b3ad Rework pg_input_error_message(), now renamed pg_input_error_info()
pg_input_error_info() is now a SQL function that returns a row with
more than just the error message generated for incorrect input to data
types whose input functions can handle soft failures. It returns more
of the contents of ErrorData, namely:
- The error message (same as before).
- The error detail, if set.
- The error hint, if set.
- SQL error code.
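
A usage sketch (assuming the output columns are message, detail, hint
and sql_error_code):

    SELECT * FROM pg_input_error_info('not-a-number', 'numeric');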

All the regression tests that relied on pg_input_error_message() are
updated to reflect the effects of the rename.

Per discussion with Tom Lane and Andrew Dunstan.

Author: Nathan Bossart
Discussion: https://postgr.es/m/139a68e1-bd1f-a9a7-b5fe-0be9845c6311@dunslane.net
2023-02-28 08:04:13 +09:00
c60c9badba Convert json_in and jsonb_in to report errors softly.
This requires a bit of further infrastructure-extension to allow
trapping errors reported by numeric_in and pg_unicode_to_server,
but otherwise it's pretty straightforward.

In the case of jsonb_in, we are only capturing errors reported
during the initial "parse" phase.  The value-construction phase
(JsonbValueToJsonb) can also throw errors if assorted implementation
limits are exceeded.  We should improve that, but it seems like a
separable project.
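
Soft error reporting is what lets validity checks avoid throwing; a
sketch using pg_input_is_valid(), added in the same development cycle:

    SELECT pg_input_is_valid('{"a": 1',  'json');   -- false, no error raised
    SELECT pg_input_is_valid('{"a": 1}', 'jsonb');  -- true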

Andrew Dunstan and Tom Lane

Discussion: https://postgr.es/m/3bac9841-fe07-713d-fa42-606c225567d6@dunslane.net
2022-12-11 11:28:15 -05:00
0a8de93a48 Speed up lexing of long JSON strings
Use optimized linear search when looking ahead for end quotes,
backslashes, and non-printable characters. This results in nearly 40%
faster JSON parsing on x86-64 when most values are long strings, and
all platforms should see some improvement.

Reviewed by Andres Freund and Nathan Bossart
Discussion: https://www.postgresql.org/message-id/CAFBsxsGhaR2KQ5eisaK%3D6Vm60t%3DaxhD8Ckj1qFoCH1pktZi%2B2w%40mail.gmail.com
Discussion: https://www.postgresql.org/message-id/CAFBsxsESLUyJ5spfOSyPrOvKUEYYNqsBosue9SV1j8ecgNXSKA%40mail.gmail.com
2022-09-02 09:36:22 +07:00
ffd3944ab9 Improve reporting for syntax errors in multi-line JSON data.
Point to the specific line where the error was detected; the
previous code tended to include several preceding lines as well.
Avoid re-scanning the entire input to recompute which line that
was.  Simplify the logic a bit.  Add test cases.
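
A sketch of the multi-line case this improves; the error context should
now point at the offending line only:

    SELECT '{
      "one": 1,
      oops,
      "two": 2
    }'::json;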

Simon Riggs and Hamid Akhtar, reviewed by Daniel Gustafsson and myself

Discussion: https://postgr.es/m/CANbhV-EPBnXm3MF_TTWBwwqgn1a1Ghmep9VHfqmNBQ8BT0f+_g@mail.gmail.com
2021-03-01 16:44:17 -05:00
bbd93667bd Remove unnecessary test dependency on the contents of pg_pltemplate.
Using pg_pltemplate as test data was probably not very forward-looking,
considering we've had many discussions around removing that catalog
altogether.  Use a nearby temp table instead, to make these two test
scripts more self-contained.  This is a better test case anyway, since
it exercises the scenario where the entries in the anyarray column
actually vary in type intra-query.
2019-08-21 10:43:23 -04:00
e136a0d8ca Restore json{b}_populate_record{set}'s ability to take type info from AS.
If the record argument is NULL and has no declared type more concrete
than RECORD, we can't extract useful information about the desired
rowtype from it.  In this case, see if we're in FROM with an AS clause,
and if so extract the needed rowtype info from AS.
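
The restored usage looks roughly like this: a NULL record argument with
the rowtype supplied by the AS clause.

    SELECT *
    FROM json_populate_record(NULL::record, '{"a": 1, "b": "x"}')
         AS t(a int, b text);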

It worked like this before v11, but commit 37a795a60 removed the
behavior, reasoning that it was undocumented, inefficient, and utterly
not self-consistent.  If you want to take type info from an AS clause,
you should be using the json_to_record() family of functions not the
json_populate_record() family.  Also, it was already the case that
the "populate" functions would fail for a null-valued RECORD input
(with an unfriendly "record type has not been registered" error)
when there wasn't an AS clause at hand, and it wasn't obvious that
that behavior wasn't OK when there was one.  However, it emerges
that some people were depending on this to work, and indeed the
rather off-point error message you got if you left off AS encouraged
slapping on AS without switching to the json_to_record() family.

Hence, put back the fallback behavior of looking for AS.  While at it,
improve the run-time error you get when there's no place to obtain type
info; we can do a lot better than "record type has not been registered".
(We can't, unfortunately, easily improve the parse-time error message
that leads people down this path in the first place.)

While at it, I refactored the code a bit to avoid duplicating the
same logic in several different places.

Per bug #15940 from Jaroslav Sivy.  Back-patch to v11 where the
current coding came in.  (The pre-v11 deficiencies in this area
aren't regressions, so we'll leave those branches alone.)

Patch by me, based on preliminary analysis by Dmitry Dolgov.

Discussion: https://postgr.es/m/15940-2ab76dc58ffb85b6@postgresql.org
2019-08-19 18:01:09 -04:00
6f34fcbbd5 Fix conversion of JSON strings to JSON output columns in json_to_record().
json_to_record(), when an output column is declared as type json or jsonb,
should emit the corresponding field of the input JSON object.  But it got
this slightly wrong when the field is just a string literal: it failed to
escape the contents of the string.  That typically resulted in syntax
errors if the string contained any double quotes or backslashes.
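
A sketch of the failing shape, a plain JSON string fed to an output
column declared as json:

    SELECT *
    FROM json_to_record('{"x": "has \"quotes\" inside"}') AS t(x json);
    -- x should come out as a correctly escaped JSON string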

jsonb_to_record() handles such cases correctly, but I added corresponding
test cases for it too, to prevent future backsliding.

Improve the documentation, as it provided only a very hand-wavy
description of the conversion rules used by these functions.

Per bug report from Robert Vollmert.  Back-patch to v10 where the
error was introduced (by commit cf35346e8).

Note that PG 9.4 - 9.6 also get this case wrong, but differently so:
they feed the de-escaped contents of the string literal to json[b]_in.
That behavior is less obviously wrong, so possibly it's being depended on
in the field, so I won't risk trying to make the older branches behave
like the newer ones.

Discussion: https://postgr.es/m/D6921B37-BD8E-4664-8D5F-DB3525765DCD@vllmrt.net
2019-06-11 13:33:22 -04:00
4e197bf195 Fix typo related to to_tsvector() in tests of json and jsonb
Author: Sho Kato
Reviewed-by: Kyotaro Horiguchi
Discussion: https://postgr.es/m/25C1C6B2E7BE044889E4FE8643A58BA963E1D03D@G01JPEXMBKW03
2019-03-15 16:20:11 +09:00
eba2ce1712 Fix another crash in json{b}_populate_recordset and json{b}_to_recordset.
populate_recordset_worker() failed to consider the possibility that the
supplied JSON data contains no rows, so that update_cached_tupdesc never
got called.  This led to a null-pointer dereference since commit 9a5e8ed28;
before that it led to a bogus "set-valued function called in context that
cannot accept a set" error.  Fix by forcing the update to happen.
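
The no-rows shape that triggered it, roughly (two_ints is an
illustrative composite type):

    CREATE TYPE two_ints AS (a int, b int);
    SELECT * FROM jsonb_populate_recordset(NULL::two_ints, '[]');
    -- zero rows, no crash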

Per bug #15514.  Back-patch to v11 as 9a5e8ed28 was.  (If we were excited
about the bogus error, we could perhaps go back further, but it'd take more
work to figure out how to fix it in older branches.  Given the lack of
field complaints about that aspect, I'm not excited.)

Discussion: https://postgr.es/m/15514-59d5b4c4065b178b@postgresql.org
2018-11-22 15:14:01 -05:00
4984784f83 Fix crash in json{b}_populate_recordset() and json{b}_to_recordset().
As of commit 37a795a60, populate_recordset_worker() tried to pass back
(as rsi.setDesc) a tupdesc that it also had cached in its fn_extra.
But the core executor would free the passed-back tupdesc, risking a
crash if the function were called again in the same query.  The safest
and least invasive way to fix that is to make an extra tupdesc copy
to pass back.

While at it, I failed to resist the temptation to get rid of unnecessary
get_fn_expr_argtype() calls here and in populate_record_worker().

Per report from Dmitry Dolgov; thanks to Michael Paquier and
Andrew Gierth for investigation and discussion.

Discussion: https://postgr.es/m/CA+q6zcWzN9ztCfR47ZwgTr1KLnuO6BAY6FurxXhovP4hxr+yOQ@mail.gmail.com
2018-07-13 14:16:54 -04:00
01bb85169a Make test of json(b)_to_tsvector language-independent
Missed in commit 1c1791e00065f6986f9d44a78ce7c28b2d1322dd.
2018-04-07 21:29:48 +03:00
1c1791e000 Add json(b)_to_tsvector function
Jsonb has a complex nature, so there is no best-for-everything way to convert
it to tsvector for full text search. The existing to_tsvector(json(b)) converts
only string values, but it is also possible to index keys, numeric and even
boolean values. To solve that, json(b)_to_tsvector takes a second required
argument containing a list of the desired types of json fields. For now the
second argument is a jsonb scalar or array, with the possibility of adding new
options in the future.

Bump catalog version

Author: Dmitry Dolgov with some editorization by me
Reviewed by: Teodor Sigaev
Discussion: https://www.postgresql.org/message-id/CA+q6zcXJQbS1b4kJ_HeAOoOc=unfnOrUEL=KGgE32QKDww7d8g@mail.gmail.com
2018-04-07 20:58:03 +03:00
b574228715 Add tests for json{b}_populate_recordset() crash case.
The problem reported as CVE-2017-15098 was already resolved in HEAD by
commit 37a795a60, but let's add the relevant test cases anyway.

Michael Paquier and Tom Lane, per a report from David Rowley.

Security: CVE-2017-15098
2017-11-06 10:29:37 -05:00
37a795a60b Support domains over composite types.
This is the last major omission in our domains feature: you can now
make a domain over anything that's not a pseudotype.
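
A minimal illustration (type and domain names are just examples):

    CREATE TYPE complex AS (r float8, i float8);
    CREATE DOMAIN positive_complex AS complex CHECK ((VALUE).r > 0);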

The major complication from an implementation standpoint is that places
that might be creating tuples of a domain type now need to be prepared
to apply domain_check().  It seems better that unprepared code fail
with an error like "<type> is not composite" than that it silently fail
to apply domain constraints.  Therefore, relevant infrastructure like
get_func_result_type() and lookup_rowtype_tupdesc() has been adjusted
to treat domain-over-composite as a distinct case that unprepared code
won't recognize, rather than just transparently treating it the same
as plain composite.  This isn't a 100% solution to the possibility of
overlooked domain checks, but it catches most places.

In passing, improve typcache.c's support for domains (it can now cache
the identity of a domain's base type), and rewrite the argument handling
logic in jsonfuncs.c's populate_record[set]_worker to reduce duplicative
per-call lookups.

I believe this is code-complete so far as the core and contrib code go.
The PLs need varying amounts of work, which will be tackled in followup
patches.

Discussion: https://postgr.es/m/4206.1499798337@sss.pgh.pa.us
2017-10-26 13:47:45 -04:00
18fc4ecf4a Process variadic arguments consistently in json functions
json_build_object and json_build_array and the jsonb equivalents did not
correctly process explicit VARIADIC arguments. They are modified to use
the new extract_variadic_args() utility function which abstracts away
the details of the call method.
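
The explicit-VARIADIC shape that was mishandled, roughly:

    SELECT json_build_array(VARIADIC ARRAY['a', 'b', 'c']);
    -- expected: ["a", "b", "c"]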

Michael Paquier, reviewed by Tom Lane and Dmitry Dolgov.

Backpatch to 9.5 for the jsonb fixes and 9.4 for the json fixes, as
that's where they originated.
2017-10-25 07:34:00 -04:00
cf35346e81 Make json_populate_record and friends operate recursively
With this change array fields are populated from json(b) arrays, and
composite fields are populated from json(b) objects.
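
Roughly, with an illustrative nested composite type:

    CREATE TYPE inner_t AS (x int, y int);
    CREATE TYPE outer_t AS (id int, pt inner_t, tags text[]);
    SELECT *
    FROM json_populate_record(NULL::outer_t,
        '{"id": 1, "pt": {"x": 2, "y": 3}, "tags": ["a", "b"]}');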

Along the way, some significant code refactoring is done to remove
redundancy in the way to populate_record[_set] and to_record[_set]
functions operate, and some significant efficiency gains are made by
caching tuple descriptors.

Nikita Glukhov, edited some by me.

Reviewed by Aleksander Alekseev and Tom Lane.
2017-04-06 22:22:13 -04:00
c281cd5fe1 Fix unstable regression test result.
Whoops, missed that same test was made for json as well as jsonb.
2017-03-31 20:29:30 -04:00
e306df7f9c Full Text Search support for json and jsonb
The new functions are ts_headline() and to_tsvector().
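
A usage sketch:

    SELECT to_tsvector('english', '{"a": "fat cats ate fat rats"}'::jsonb);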

Dmitry Dolgov, edited and documented by me.
2017-03-31 14:26:03 -04:00
502a3832cc Correctly handle array pseudotypes in to_json and to_jsonb
Columns with array pseudotypes have not been identified as arrays, so
they have been rendered as strings in the json and jsonb conversion
routines. This change allows them to be rendered as json arrays, making
it possible to deal correctly with the anyarray columns in pg_stats.
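
For instance, the pg_stats case mentioned above:

    SELECT to_json(histogram_bounds) FROM pg_stats LIMIT 1;
    -- now rendered as a JSON array rather than a quoted string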
2017-02-22 11:10:49 -05:00
a9d199f6d3 Fix json_to_record() bug with nested objects.
A thinko concerning nesting depth caused json_to_record() to produce bogus
output if a field of its input object contained a sub-object with a field
name matching one of the requested output column names.  Per bug #13996
from Johann Visagie.
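
The problematic shape, roughly:

    SELECT *
    FROM json_to_record('{"a": 1, "sub": {"a": 99}}') AS t(a int, sub json);
    -- "a" must be taken from the top level (1), not from the nested object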

I added a regression test case based on his example, plus parallel tests
for json_to_recordset, jsonb_to_record, jsonb_to_recordset.  The latter
three do not exhibit the same bug (which suggests that we may be missing
some opportunities to share code...) but testing seems like a good idea
in any case.

Back-patch to 9.4 where these functions were introduced.
2016-03-02 23:31:39 -05:00
94c745eb18 Fix two-argument jsonb_object when called with empty arrays
Some over-eager copy-and-pasting on my part resulted in a nonsense
result being returned in this case. I have adopted the same pattern for
handling this case as is used in the one argument form of the function,
i.e. we just skip over the code that adds values to the object.
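
The corner case in question:

    SELECT jsonb_object('{}'::text[], '{}'::text[]);
    -- expected: {}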

Diagnosis and patch from Michael Paquier, although not quite his
solution.

Fixes bug #13936.

Backpatch to 9.5 where jsonb_object was introduced.
2016-02-21 10:30:49 -05:00
d435542583 Fix incorrect translation of minus-infinity datetimes for json/jsonb.
Commit bda76c1c8cfb1d11751ba6be88f0242850481733 caused both plus and
minus infinity to be rendered as "infinity", which is not only wrong
but inconsistent with the pre-9.4 behavior of to_json().  Fix that by
duplicating the coding in date_out/timestamp_out/timestamptz_out more
closely.  Per bug #13687 from Stepan Perlov.  Back-patch to 9.4, like
the previous commit.
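
The corrected rendering, roughly:

    SELECT to_json('-infinity'::timestamptz);
    -- expected: "-infinity", not "infinity"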

In passing, also re-pgindent json.c, since it had gotten a bit messed up by
recent patches (and I was already annoyed by indentation-related problems
in back-patching this fix ...)
2015-10-20 11:07:04 -07:00
b6363772fd Factor out encoding specific tests for json
This lets us remove the large alternative results files for the main
json and jsonb tests, which makes modifying those tests simpler for
committers and patch submitters.

Backpatch to 9.4 for jsonb and 9.3 for json.
2015-10-07 22:18:27 -04:00
9e36c91b46 Fix insufficiently-portable regression test case.
Some of the buildfarm members are evidently miserly enough of stack space
to pass the originally-committed form of this test.  Increase the
requirement 10X to hopefully ensure that it fails as-expected everywhere.

Security: CVE-2015-5289
2015-10-05 12:19:14 -04:00
08fa47c485 Prevent stack overflow in json-related functions.
Sufficiently-deep recursion heretofore elicited a SIGSEGV.  If an
application constructs PostgreSQL json or jsonb values from arbitrary
user input, application users could have exploited this to terminate all
active database connections.  That applies to 9.3, where the json parser
adopted recursive descent, and later versions.  Only row_to_json() and
array_to_json() were at risk in 9.2, both in a non-security capacity.
Back-patch to 9.2, where the json type was introduced.
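
A sketch of the kind of pathological input involved; it should now fail
cleanly rather than crash the backend:

    SELECT (repeat('[', 100000) || repeat(']', 100000))::json;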

Oskari Saarenmaa, reviewed by Michael Paquier.

Security: CVE-2015-5289
2015-10-05 10:06:29 -04:00
d9a356ff2e Fix treatment of nulls in jsonb_agg and jsonb_object_agg
The wrong is_null flag was being passed to datum_to_json. Also, null
object key values are not permitted, and this was not being checked
for. Add regression tests covering these cases, and also add those tests
to the json set, even though it was doing the right thing.
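
Roughly the cases covered:

    SELECT jsonb_object_agg(k, v)
    FROM (VALUES ('a', 1), ('b', NULL)) AS t(k, v);  -- null values are fine
    -- a NULL key, by contrast, must be rejected with an error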

Fixes bug #13514, initially diagnosed by Tom Lane.
2015-07-24 09:40:46 -04:00
e02d44b8a7 Support JSON negative array subscripts everywhere
Previously, there was an inconsistency across json/jsonb operators that
operate on datums containing JSON arrays -- only some operators
supported negative array count-from-the-end subscripting.  Specifically,
only a new-to-9.5 jsonb deletion operator had support (the new "jsonb -
integer" operator).  This inconsistency seemed likely to be
counter-intuitive to users.  To fix, allow all places where the user can
supply an integer subscript to accept a negative subscript value,
including path-orientated operators and functions, as well as other
extraction operators.  This will need to be called out as an
incompatibility in the 9.5 release notes, since it's possible that users
are relying on certain established extraction operators changed here
yielding NULL in the event of a negative subscript.
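
For example:

    SELECT '["a", "b", "c"]'::jsonb -> -1;        -- "c"
    SELECT '["a", "b", "c"]'::jsonb #>> '{-2}';   -- b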

For the json type, this requires adding a way of cheaply getting the
total JSON array element count ahead of time when parsing arrays with a
negative subscript involved, necessitating an ad-hoc lex and parse.
This is followed by a "conversion" from a negative subscript to its
equivalent positive-wise value using the count.  From there on, it's as
if a positive-wise value was originally provided.

Note that there is still a minor inconsistency here across jsonb
deletion operators.  Unlike the aforementioned new "-" deletion operator
that accepts an integer on its right hand side, the new "#-" path
orientated deletion variant does not throw an error when it appears like
an array subscript (input that could be recognized as an integer
literal) is being used on an object, which is wrong-headed.  The reason
for not being stricter is that it could be the case that an object pair
happens to have a key value that looks like an integer; in general,
these two possibilities are impossible to differentiate with rhs path
text[] argument elements.  However, we still don't allow the "#-"
path-orientated deletion operator to perform array-style subscripting.
Rather, we just return the original left operand value in the event of a
negative subscript (which seems analogous to how the established
"jsonb/json #> text[]" path-orientated operator may yield NULL in the
event of an invalid subscript).

In passing, make SetArrayPath() stricter about not accepting cases where
there are trailing non-numeric garbage bytes rather than a clean NUL
byte.  This means, for example, that strings like "10e10" are now not
accepted as an array subscript of 10 by some new-to-9.5 path-orientated
jsonb operators (e.g. the new #- operator).  Finally, remove dead code
for jsonb subscript deletion; arguably, this should have been done in
commit b81c7b409.

Peter Geoghegan and Andrew Dunstan
2015-07-17 21:13:47 -04:00
bda76c1c8c Render infinite date/timestamps as 'infinity' for json/jsonb
Commit ab14a73a6c made these cases raise an error, and the behaviour
was later copied to jsonb. That matched the XML code (from which this
code was adopted), since the XSD types don't accept infinite values.
However, json dates and timestamps are just strings as far as json is
concerned, so there is no reason not to render these values as
'infinity'.

The json portion of this is backpatched to 9.4 where the behaviour was
introduced. The jsonb portion only affects the development branch.

Per gripe on pgsql-general.
2015-02-26 12:25:21 -05:00
451d280815 Fix jsonb Unicode escape processing, and in consequence disallow \u0000.
We've been trying to support \u0000 in JSON values since commit
78ed8e03c67d7333, and have introduced increasingly worse hacks to try to
make it work, such as commit 0ad1a816320a2b53.  However, it fundamentally
can't work in the way envisioned, because the stored representation looks
the same as for \\u0000 which is not the same thing at all.  It's also
entirely bogus to output \u0000 when de-escaped output is called for.

The right way to do this would be to store an actual 0x00 byte, and then
throw error only if asked to produce de-escaped textual output.  However,
getting to that point seems likely to take considerable work and may well
never be practical in the 9.4.x series.

To preserve our options for better behavior while getting rid of the nasty
side-effects of 0ad1a816320a2b53, revert that commit in toto and instead
throw error if \u0000 is used in a context where it needs to be de-escaped.
(These are the same contexts where non-ASCII Unicode escapes throw error
if the database encoding isn't UTF8, so this behavior is by no means
without precedent.)
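
In other words, roughly:

    SELECT '"\u0000"'::json;    -- accepted: json keeps the original text form
    SELECT '"\u0000"'::jsonb;   -- rejected: jsonb must de-escape at input time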

In passing, make both the \u0000 case and the non-ASCII Unicode case report
ERRCODE_UNTRANSLATABLE_CHARACTER / "unsupported Unicode escape sequence"
rather than claiming there's something wrong with the input syntax.

Back-patch to 9.4, where we have to do something because 0ad1a816320a2b53
broke things for many cases having nothing to do with \u0000.  9.3 also has
bogus behavior, but only for that specific escape value, so given the lack
of field complaints it seems better to leave 9.3 alone.
2015-01-30 14:44:56 -05:00
237a882443 Add json_strip_nulls and jsonb_strip_nulls functions.
The functions remove object fields, including in nested objects, that
have null as a value. In certain cases this can lead to considerably
smaller datums, with no loss of semantic information.
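
For example:

    SELECT json_strip_nulls('{"a": 1, "b": null, "c": {"d": null, "e": 2}}');
    -- the null-valued fields "b" and "d" are removed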

Andrew Dunstan, reviewed by Pavel Stehule.
2014-12-12 09:00:43 -05:00
c8a026e4f1 Revert 95d737ff to add 'ignore_nulls'
Per discussion, revert the commit which added 'ignore_nulls' to
row_to_json.  This capability would be better added as an independent
function rather than being bolted on to row_to_json.  Additionally,
the implementation didn't address complex JSON objects, and so was
incomplete anyway.

Pointed out by Tom and discussed with Andrew and Robert.
2014-09-29 13:32:22 -04:00
95d737ff45 Add 'ignore_nulls' option to row_to_json
Provide an option to skip NULL values in a row when generating a JSON
object from that row with row_to_json.  This can reduce the size of the
JSON object in cases where columns are NULL without really reducing the
information in the JSON object.

This also makes row_to_json into a single function with default values,
rather than having multiple functions.  In passing, change array_to_json
to also be a single function with default values (we don't add an
'ignore_nulls' option yet- it's not clear that there is a sensible
use-case there, and it hasn't been asked for in any case).

Pavel Stehule
2014-09-11 21:23:51 -04:00
41dd50e84d Fix corner-case behaviors in JSON/JSONB field extraction operators.
Cause the path extraction operators to return their lefthand input,
not NULL, if the path array has no elements.  This seems more consistent
since the case ought to correspond to applying the simple extraction
operator (->) zero times.

Cause other corner cases in field/element/path extraction to return NULL
rather than failing.  This behavior is arguably more useful than throwing
an error, since it allows an expression index using these operators to be
built even when not all values in the column are suitable for the
extraction being indexed.  Moreover, we already had multiple
inconsistencies between the path extraction operators and the simple
extraction operators, as well as inconsistencies between the JSON and
JSONB code paths.  Adopt a uniform rule of returning NULL rather than
throwing an error when the JSON input does not have a structure that
permits the request to be satisfied.
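
For example:

    SELECT '{"a": 1}'::jsonb #> '{}';        -- returns the input itself
    SELECT '{"a": 1}'::jsonb -> 'missing';   -- NULL
    SELECT '[1, 2]'::jsonb -> 5;             -- NULL rather than an error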

Back-patch to 9.4.  Update the release notes to list this as a behavior
change since 9.3.
2014-08-22 13:17:58 -04:00
fa069822f5 More regression test cases for json/jsonb extraction operators.
Cover some cases I omitted before, such as null and empty-string
elements in the path array.  This exposes another inconsistency:
json_extract_path complains about empty path elements but
jsonb_extract_path does not.
2014-08-20 19:05:05 -04:00
9bac66020d Fix core dump in jsonb #> operator, and add regression test cases.
jsonb's #> operator segfaulted (dereferencing a null pointer) if the RHS
was a zero-length array, as reported in bug #11207 from Justin Van Winkle.
json's #> operator returns NULL in such cases, so for the moment let's
make jsonb act likewise.

Also add a bunch of regression test queries memorializing the -> and #>
operators' behavior for this and other corner cases.

There is a good argument for changing some of these behaviors, as they
are not very consistent with each other, and throwing an error isn't
necessarily a desirable behavior for operators that are likely to be
used in indexes.  However, everybody can agree that a core dump is the
Wrong Thing, and we need test cases even if we decide to change their
expected output later.
2014-08-20 16:48:53 -04:00
4ebe3519e1 Allow empty string object keys in json_object().
This makes the behaviour consistent with the json parser, other
json-generating functions, and the JSON standards.
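
That is, roughly:

    SELECT json_object(ARRAY['', 'value']);
    -- an empty-string key is now accepted rather than raising an error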
2014-07-22 11:27:31 -04:00
a749a23d7a Remove use_json_as_text options from json_to_record/json_populate_record.
The "false" case was really quite useless since all it did was to throw
an error; a definition not helped in the least by making it the default.
Instead let's just have the "true" case, which emits nested objects and
arrays in JSON syntax.  We might later want to provide the ability to
emit sub-objects in Postgres record or array syntax, but we'd be best off
to drive that off a check of the target field datatype, not a separate
argument.

For the functions newly added in 9.4, we can just remove the flag arguments
outright.  We can't do that for json_populate_record[set], which already
existed in 9.3, but we can ignore the argument and always behave as if it
were "true".  It helps that the flag arguments were optional and not
documented in any useful fashion anyway.
2014-06-29 13:50:58 -04:00
57d8c1270e Fix handling of nested JSON objects in json_populate_recordset and friends.
populate_recordset_object_start() improperly created a new hash table
(overwriting the link to the existing one) if called at nest levels
greater than one.  This resulted in previous fields not appearing in
the final output, as reported by Matti Hameister in bug #10728.
In 9.4 the problem also affects json_to_recordset.

This perhaps missed detection earlier because the default behavior is to
throw an error for nested objects: you have to pass use_json_as_text = true
to see the problem.

In addition, fix query-lifespan leakage of the hashtable created by
json_populate_record().  This is pretty much the same problem recently
fixed in dblink: creating an intended-to-be-temporary context underneath
the executor's per-tuple context isn't enough to make it go away at the
end of the tuple cycle, because MemoryContextReset is not
MemoryContextResetAndDeleteChildren.

Michael Paquier and Tom Lane
2014-06-24 21:22:40 -07:00
0ad1a81632 Do not escape a unicode sequence when escaping JSON text.
Previously, any backslash in text being escaped for JSON was doubled so
that the result was still valid JSON. However, this led to some perverse
results in the case of Unicode sequences. These are now detected and the
initial backslash is no longer escaped. All other backslashes are
still escaped. No validity check is performed; all that is looked for is
\uXXXX where X is a hexadecimal digit.

This is a change from the 9.2 and 9.3 behaviour as noted in the Release
notes.

Per complaint from Teodor Sigaev.
2014-06-03 16:11:31 -04:00
f30015b6d7 Output timestamps in ISO 8601 format when rendering JSON.
Many JSON processors require timestamp strings in ISO 8601 format in
order to convert the strings. When converting a timestamp, with or
without timezone, to a JSON datum we therefore now use such a format
rather than the type's default text output, in functions such as
to_json().
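
For example:

    SELECT to_json(now());
    -- timestamp rendered in ISO 8601 form, independent of DateStyle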

This is a change in behaviour from 9.2 and 9.3, as noted in the release
notes.
2014-06-03 13:56:53 -04:00
d9134d0a35 Introduce jsonb, a structured format for storing json.
The new format accepts exactly the same data as the json type. However, it is
stored in a format that does not require reparsing the original text in order
to process it, making it much more suitable for indexing and other operations.
Insignificant whitespace is discarded, and the order of object keys is not
preserved. Neither are duplicate object keys kept - the later value for a given
key is the only one stored.
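
For instance:

    SELECT '{"b": 1, "a": 2, "a": 3}'::jsonb;
    -- the duplicate key "a" keeps only the later value; key order and
    -- insignificant whitespace from the input are not preserved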

The new type has all the functions and operators that the json type has,
with the exception of the json generation functions (to_json, json_agg etc.)
and with identical semantics. In addition, there are operator classes for
hash and btree indexing, and two classes for GIN indexing, that have no
equivalent in the json type.

This feature grew out of previous work by Oleg Bartunov and Teodor Sigaev, which
was intended to provide similar facilities to a nested hstore type, but which
in the end proved to have some significant compatibility issues.

Authors: Oleg Bartunov,  Teodor Sigaev, Peter Geoghegan and Andrew Dunstan.
Review: Andres Freund
2014-03-23 16:40:19 -04:00
5264d91541 Add json_array_elements_text function.
This was a notable omission from the json functions added in 9.3 and
there have been numerous complaints about its absence.
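
A usage sketch:

    SELECT * FROM json_array_elements_text('["x", "y", null]');
    -- three rows of text: x, y, NULL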

Laurence Rowe.
2014-01-29 15:39:01 -05:00
105639900b New json functions.
json_build_array() and json_build_object() allow for the construction of
arbitrarily complex json trees. json_object() turns a one- or two-
dimensional array, or two separate arrays, into a json object of
name/value pairs, similarly to the hstore() function.
json_object_agg() aggregates its two arguments into a single json object
as name/value pairs.
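
Usage sketches:

    SELECT json_build_object('name', 'x', 'tags', json_build_array(1, 2));
    SELECT json_object(ARRAY['a', 'b'], ARRAY['1', '2']);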

Catalog version bumped.

Andrew Dunstan, reviewed by Marko Tiikkaja.
2014-01-28 17:48:21 -05:00
001e114b8d Fix whitespace issues found by git diff --check, add gitattributes
Set per file type attributes in .gitattributes to fine-tune whitespace
checks.  With the associated cleanups, the tree is now clean for git
2013-11-10 14:48:29 -05:00
4d212bac17 json_typeof function.
Andrew Tipton.
2013-10-10 12:21:59 -04:00
78ed8e03c6 Fix unescaping of JSON Unicode escapes, especially for non-UTF8.
Per discussion  on -hackers. We treat Unicode escapes when unescaping
them similarly to the way we treat them in PostgreSQL string literals.
Escapes in the ASCII range are always accepted, no matter what the
database encoding. Escapes for higher code points are only processed in
UTF8 databases, and attempts to process them in other databases will
result in an error. \u0000 is never unescaped, since it would result in
an impermissible null byte.
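
Roughly:

    SELECT '{"a": "\u0041"}'::json ->> 'a';  -- 'A': ASCII escapes always work
    SELECT '{"a": "\u00e9"}'::json ->> 'a';  -- UTF8 databases only; errors elsewhere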
2013-06-12 13:35:24 -04:00
94e3311b97 Handle Unicode surrogate pairs correctly when processing JSON.
In 9.2, Unicode escape sequences are not analysed at all other than
to make sure that they are in the form \uXXXX. But in 9.3 many of the
new operators and functions try to turn JSON text values into text in
the server encoding, and this includes de-escaping Unicode escape
sequences. This processing had not taken into account the possibility
that this might contain a surrogate pair to designate a character
outside the BMP. That is now handled correctly.
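
For example, in a UTF8 database:

    SELECT '{"emoji": "\ud83d\ude00"}'::json ->> 'emoji';
    -- the surrogate pair is decoded to a single character outside the BMP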

This also enforces correct use of surrogate pairs, something that is not
done by the type's input routines. This fact is noted in the docs.
2013-06-08 09:12:48 -04:00