Commit Graph

30 Commits

Author SHA1 Message Date
fcec6caafa Support XMLTABLE query expression
XMLTABLE is defined by the SQL/XML standard as a feature that allows
turning XML-formatted data into relational form, so that it can be used
as a <table primary> in the FROM clause of a query.

This new construct provides significant simplicity and performance
benefit for XML data processing; what in a client-side custom
implementation was reported to take 20 minutes can be executed in 400ms
using XMLTABLE.  (The same functionality was said to take 10 seconds
using nested PostgreSQL XPath function calls, and 5 seconds using
XMLReader under PL/Python).

The implemented syntax deviates slightly from what the standard
requires.  First, the standard indicates that the PASSING clause is
optional and that multiple XML input documents may be given to it; we
make it mandatory and accept a single document only.  Second, we don't
currently support a default namespace to be specified.

This implementation relies on a new executor node based on a hardcoded
method table.  (Because the grammar is fixed, there is no extensibility
in the current approach; further constructs can be implemented on top of
this such as JSON_TABLE, but they require changes to core code.)

Author: Pavel Stehule, Álvaro Herrera
Extensively reviewed by: Craig Ringer
Discussion: https://postgr.es/m/CAFj8pRAgfzMD-LoSmnMGybD0WsEznLHWap8DO79+-GTRAPR4qA@mail.gmail.com
2017-03-08 12:40:26 -03:00
e2f1765ce0 Remove xmlparse(document '') test
This one test was behaving differently between the ubuntu fix for
CVE-2015-7499 and the base "expected" file.  It's not worth having
yet another version of the expected file for this test, so drop it.
Perhaps at some point when all distros have settled down to the
same behavior on this test, it can be restored.

Problem found by me on libxml2 (2.9.1+dfsg1-3ubuntu4.6).
Solution suggested by Tom Lane.
Backpatch to 9.5, where the test was added.
2015-12-14 11:37:26 -06:00
79af9a1d26 Fix namespace handling in xpath function
Previously, the xml value resulting from an xpath query would not have
namespace declarations if the namespace declarations were attached to
an ancestor element in the input xml value.  That means the output value
was not correct XML.  Fix that by running the result value through
xmlCopyNode(), which produces the correct namespace declarations.

Author: Ali Akbar <the.apaan@gmail.com>
2015-01-06 23:06:13 -05:00
57b1085df5 Allow empty content in xml type
The xml type previously rejected "content" that is empty or consists
only of spaces.  But the SQL/XML standard allows that, so change that.
The accepted values for XML "documents" are not changed.

Reviewed-by: Ali Akbar <the.apaan@gmail.com>
2014-09-09 11:34:52 -04:00
17351fce4e Prevent access to external files/URLs via XML entity references.
xml_parse() would attempt to fetch external files or URLs as needed to
resolve DTD and entity references in an XML value, thus allowing
unprivileged database users to attempt to fetch data with the privileges
of the database server.  While the external data wouldn't get returned
directly to the user, portions of it could be exposed in error messages
if the data didn't parse as valid XML; and in any case the mere ability
to check existence of a file might be useful to an attacker.

The ideal solution to this would still allow fetching of references that
are listed in the host system's XML catalogs, so that documents can be
validated according to installed DTDs.  However, doing that with the
available libxml2 APIs appears complex and error-prone, so we're not going
to risk it in a security patch that necessarily hasn't gotten wide review.
So this patch merely shuts off all access, causing any external fetch to
silently expand to an empty string.  A future patch may improve this.

In HEAD and 9.2, also suppress warnings about undefined entities, which
would otherwise occur as a result of not loading referenced DTDs.  Previous
branches don't show such warnings anyway, due to different error handling
arrangements.

Credit to Noah Misch for first reporting the problem, and for much work
towards a solution, though this simplistic approach was not his preference.
Also thanks to Daniel Veillard for consultation.

Security: CVE-2012-3489
2012-08-14 18:31:16 -04:00
0ce7676aa0 Make xpath() do something useful with XPath expressions that return scalars.
Previously, xpath() simply returned an empty array if the expression did
not yield a node set.  This is useless for expressions that return scalars,
such as one with name() at the top level.  Arrange to return the scalar
value as a single-element xml array, instead.  (String values will be
suitably escaped.)

This change will also cause xpath_exists() to return true, not false,
for such expressions.

Florian Pflug, reviewed by Radoslaw Smogura
2011-07-21 11:32:46 -04:00
aaf15e5c1c Ensure that xpath() escapes special characters in string values.
Without this it's possible for the output to not be legal XML, as
illustrated by the added regression test cases.

NB: this change will need to be called out as an incompatibility in the
9.2 release notes, since it's possible somebody was relying on the old
behavior, even though it's clearly wrong.

Florian Pflug, reviewed by Radoslaw Smogura
2011-07-20 18:44:35 -04:00
cacd42d62c Rewrite libxml error handling to be more robust.
libxml reports some errors (like invalid xmlns attributes) via the error
handler hook, but still returns a success indicator to the library caller.
This causes us to miss some errors that are important to report.  Since the
"generic" error handler hook doesn't know whether the message it's getting
is for an error, warning, or notice, stop using that and instead start
using the "structured" error handler hook, which gets enough information
to be useful.

While at it, arrange to save and restore the error handler hook setting in
each libxml-using function, rather than assuming we can set and forget the
hook.  This should improve the odds of working nicely with third-party
libraries that also use libxml.

In passing, volatile-ize some local variables that get modified within
PG_TRY blocks.  I noticed this while testing with an older gcc version
than I'd previously tried to compile xml.c with.

Florian Pflug and Tom Lane, with extensive review/testing by Noah Misch
2011-07-20 13:03:49 -04:00
a0b7b717a4 Add xml_is_well_formed, xml_is_well_formed_document, xml_is_well_formed_content
functions to the core XML code.  Per discussion, the former depends on
XMLOPTION while the others do not.  These supersede a version previously
offered by contrib/xml2.

Mike Fowler, reviewed by Pavel Stehule
2010-08-13 18:36:26 +00:00
4dfc457854 Add an xpath_exists() function. This is equivalent to XMLEXISTS except that
it offers support for namespace mapping.

Mike Fowler, reviewed by David Fetter
2010-08-08 19:15:27 +00:00
641459f269 Add xmlexists function
by Mike Fowler, reviewed by Peter Eisentraut
2010-08-05 04:21:54 +00:00
9b7304bc25 Fix xmlattribute escaping XML special characters twice (bug #4822).
Author: Itagaki Takahiro <itagaki.takahiro@oss.ntt.co.jp>
2009-06-09 22:00:57 +00:00
77d67a4a3b XMLATTRIBUTES() should send the attribute values through
map_sql_value_to_xml_value() instead of directly through the data type output
function.  This is per SQL standard, and consistent with XMLELEMENT().
2009-04-08 21:51:38 +00:00
8aad333f8f Fix crash of xmlconcat(NULL)
also backpatched to 8.3
2008-11-15 20:52:35 +00:00
8db43db01e Allow XML processing instructions starting with "xml" while prohibiting
those being exactly "xml".  Bug #3735 from Ben Leslie
2007-11-09 15:52:51 +00:00
3963574d13 XPath fixes:
- Function renamed to "xpath".
 - Function is now strict, per discussion.
 - Return empty array in case when XPath expression detects nothing
   (previously, NULL was returned in such case), per discussion.
 - (bugfix) Work with fragments with prologue: select xpath('/a',
   '<?xml version="1.0"?><a /><b />'); // now XML datum is always wrapped
   with dummy <x>...</x>, XML prologue simply goes away (if any).
 - Some cleanup.

Nikolay Samokhvalov

Some code cleanup and documentation work by myself.
2007-05-21 17:10:29 +00:00
ea3b212fee Commit newest version of xmlpath().
Nikolay Samokhvalov
2007-03-22 20:26:30 +00:00
e651bcf3f6 Add xmlpath() to evaluate XPath expressions, with namespaces support.
Nikolay Samokhvalov
2007-03-22 20:14:58 +00:00
eecbb33267 Add ORDER BY to a query on information_schema.views, to avoid possible
platform-specific result ordering.  Per buildfarm results.
2007-02-15 05:05:03 +00:00
ec020e1ceb Implement XMLSERIALIZE for real. Analogously, make the xml to text cast
observe the xmloption.

Reorganize the representation of the XML option in the parse tree and the
API to make it easier to manage and understand.

Add regression tests for parsing back XML expressions.
2007-02-03 14:06:56 +00:00
22bd156ff0 Various fixes in the logic of XML functions:
- Add new SQL command SET XML OPTION (also available via regular GUC) to
  control the DOCUMENT vs. CONTENT option in implicit parsing and
  serialization operations.

- Subtle corrections in the handling of the standalone property in
  xmlroot().

- Allow xmlroot() to work on content fragments.

- Subtle corrections in the handling of the version property in
  xmlconcat().

- Code refactoring for producing XML declarations.
2007-01-25 11:53:52 +00:00
b4c8d49036 Fix xmlconcat by properly merging the XML declarations. Add aggregate
function xmlagg.
2007-01-20 09:27:20 +00:00
4b48ad4fb2 Add support for converting binary values (i.e. bytea) into xml values,
with new GUC parameter "xmlbinary" that controls the output encoding, as
per SQL/XML standard.
2007-01-19 16:58:46 +00:00
2f8f76bcd5 Add support for xmlval IS DOCUMENT expression. 2007-01-14 13:11:54 +00:00
fc568b9d8f Allow for arbitrary data types as content in XMLELEMENT. The original
coercion to type xml was a mistake.  Escape values so they are valid
XML character data.
2007-01-12 16:29:24 +00:00
3a32ba2f3f Prevent duplicate attribute names in XMLELEMENT. 2007-01-08 23:41:57 +00:00
d807c7ef3f Some fine-tuning of xmlpi in corner cases:
- correct error codes
- do syntax checks in correct order
- strip leading spaces of argument
2007-01-07 22:49:56 +00:00
19749fb0cf Replace xmlroot with a properly functioning version that parses the value,
sets the items, and serializes the value back (rather than adding an
arbitrary number of XML preambles as before).

The libxml memory management via palloc had to be disabled because it
crashes when libxml tries to access memory that was helpfully freed
earlier by PostgreSQL.  This needs further thought.
2007-01-06 19:18:36 +00:00
d9e1c97feb Handle content and document options in xmlparse() correctly. 2006-12-28 03:17:38 +00:00
8c1de5fb00 Initial SQL/XML support: xml data type and initial set of functions. 2006-12-21 16:05:16 +00:00