Update bundled PCRE2-library to version 10.23

Some manual changes done to the library were lost with this update. They will be added in the next commit.
2017-05-29 15:31:42 +03:00
parent 7231563937
commit 36af74cb25
218 changed files with 49218 additions and 26130 deletions
--- a/pcre2/doc/html/pcre2build.html
+++ b/pcre2/doc/html/pcre2build.html
@ -18,23 +18,26 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
 <li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
 <li><a name="TOC5" href="#SEC5">UNICODE AND UTF SUPPORT</a>
-<li><a name="TOC6" href="#SEC6">JUST-IN-TIME COMPILER SUPPORT</a>
-<li><a name="TOC7" href="#SEC7">NEWLINE RECOGNITION</a>
-<li><a name="TOC8" href="#SEC8">WHAT \R MATCHES</a>
-<li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a>
-<li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a>
-<li><a name="TOC11" href="#SEC11">LIMITING PCRE2 RESOURCE USAGE</a>
-<li><a name="TOC12" href="#SEC12">CREATING CHARACTER TABLES AT BUILD TIME</a>
-<li><a name="TOC13" href="#SEC13">USING EBCDIC CODE</a>
-<li><a name="TOC14" href="#SEC14">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
-<li><a name="TOC15" href="#SEC15">PCRE2GREP BUFFER SIZE</a>
-<li><a name="TOC16" href="#SEC16">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
-<li><a name="TOC17" href="#SEC17">INCLUDING DEBUGGING CODE</a>
-<li><a name="TOC18" href="#SEC18">DEBUGGING WITH VALGRIND SUPPORT</a>
-<li><a name="TOC19" href="#SEC19">CODE COVERAGE REPORTING</a>
-<li><a name="TOC20" href="#SEC20">SEE ALSO</a>
-<li><a name="TOC21" href="#SEC21">AUTHOR</a>
-<li><a name="TOC22" href="#SEC22">REVISION</a>
+<li><a name="TOC6" href="#SEC6">DISABLING THE USE OF \C</a>
+<li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
+<li><a name="TOC8" href="#SEC8">NEWLINE RECOGNITION</a>
+<li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
+<li><a name="TOC10" href="#SEC10">HANDLING VERY LARGE PATTERNS</a>
+<li><a name="TOC11" href="#SEC11">AVOIDING EXCESSIVE STACK USAGE</a>
+<li><a name="TOC12" href="#SEC12">LIMITING PCRE2 RESOURCE USAGE</a>
+<li><a name="TOC13" href="#SEC13">CREATING CHARACTER TABLES AT BUILD TIME</a>
+<li><a name="TOC14" href="#SEC14">USING EBCDIC CODE</a>
+<li><a name="TOC15" href="#SEC15">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a>
+<li><a name="TOC16" href="#SEC16">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
+<li><a name="TOC17" href="#SEC17">PCRE2GREP BUFFER SIZE</a>
+<li><a name="TOC18" href="#SEC18">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
+<li><a name="TOC19" href="#SEC19">INCLUDING DEBUGGING CODE</a>
+<li><a name="TOC20" href="#SEC20">DEBUGGING WITH VALGRIND SUPPORT</a>
+<li><a name="TOC21" href="#SEC21">CODE COVERAGE REPORTING</a>
+<li><a name="TOC22" href="#SEC22">SUPPORT FOR FUZZERS</a>
+<li><a name="TOC23" href="#SEC23">SEE ALSO</a>
+<li><a name="TOC24" href="#SEC24">AUTHOR</a>
+<li><a name="TOC25" href="#SEC25">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">BUILDING PCRE2</a><br>
 <P>
@ -148,13 +151,19 @@ properties. The application can request that they do by setting the PCRE2_UCP
 option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also
 request this by starting with (*UCP).
 </P>
+<br><a name="SEC6" href="#TOC1">DISABLING THE USE OF \C</a><br>
 <P>
 The \C escape sequence, which matches a single code unit, even in a UTF mode,
 can cause unpredictable behaviour because it may leave the current matching
-point in the middle of a multi-code-unit character. It can be locked out by
-setting the PCRE2_NEVER_BACKSLASH_C option.
+point in the middle of a multi-code-unit character. The application can lock it
+out by setting the PCRE2_NEVER_BACKSLASH_C option when calling
+<b>pcre2_compile()</b>. There is also a build-time option
+<pre>
+  --enable-never-backslash-C
+</pre>
+(note the upper case C) which locks out the use of \C entirely.
 </P>
-<br><a name="SEC6" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
+<br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
 <P>
 Just-in-time compiler support is included in the build by specifying
 <pre>
@ -171,7 +180,7 @@ pcre2grep automatically makes use of it, unless you add
 </pre>
 to the "configure" command.
 </P>
-<br><a name="SEC7" href="#TOC1">NEWLINE RECOGNITION</a><br>
+<br><a name="SEC8" href="#TOC1">NEWLINE RECOGNITION</a><br>
 <P>
 By default, PCRE2 interprets the linefeed (LF) character as indicating the end
 of a line. This is the normal newline character on Unix-like systems. You can
@ -208,7 +217,7 @@ Whatever default line ending convention is selected when PCRE2 is built can be
 overridden by applications that use the library. At build time it is
 conventional to use the standard for your operating system.
 </P>
-<br><a name="SEC8" href="#TOC1">WHAT \R MATCHES</a><br>
+<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
 <P>
 By default, the sequence \R in a pattern matches any Unicode newline sequence,
 independently of what has been selected as the line ending sequence. If you
@ -220,7 +229,7 @@ the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
 selected when PCRE2 is built can be overridden by applications that use the
 called.
 </P>
-<br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
+<br><a name="SEC10" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
 <P>
 Within a compiled pattern, offset values are used to point from one part to
 another (for example, from an opening parenthesis to an alternation
@ -239,7 +248,7 @@ longer offsets slows down the operation of PCRE2 because it has to load
 additional data when handling them. For the 32-bit library the value is always
 4 and cannot be overridden; the value of --with-link-size is ignored.
 </P>
-<br><a name="SEC10" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
+<br><a name="SEC11" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
 <P>
 When matching with the <b>pcre2_match()</b> function, PCRE2 implements
 backtracking by making recursive calls to an internal function called
@ -261,7 +270,7 @@ custom memory management functions can be called instead. PCRE2 runs noticeably
 more slowly when built in this way. This option affects only the
 <b>pcre2_match()</b> function; it is not relevant for <b>pcre2_dfa_match()</b>.
 </P>
-<br><a name="SEC11" href="#TOC1">LIMITING PCRE2 RESOURCE USAGE</a><br>
+<br><a name="SEC12" href="#TOC1">LIMITING PCRE2 RESOURCE USAGE</a><br>
 <P>
 Internally, PCRE2 has a function called <b>match()</b>, which it calls
 repeatedly (sometimes recursively) when matching a pattern with the
@ -290,7 +299,7 @@ constraints. However, you can set a lower limit by adding, for example,
 </pre>
 to the <b>configure</b> command. This value can also be overridden at run time.
 </P>
-<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
+<br><a name="SEC13" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
 <P>
 PCRE2 uses fixed tables for processing characters whose code points are less
 than 256. By default, PCRE2 is built with a set of tables that are distributed
@ -307,7 +316,7 @@ compiling, because <b>dftables</b> is run on the local host. If you need to
 create alternative tables when cross compiling, you will have to do so "by
 hand".)
 </P>
-<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
+<br><a name="SEC14" href="#TOC1">USING EBCDIC CODE</a><br>
 <P>
 PCRE2 assumes by default that it will run in an environment where the character
 code is ASCII or Unicode, which is a superset of ASCII. This is the case for
@ -342,7 +351,16 @@ The options that select newline behaviour, such as --enable-newline-is-cr,
 and equivalent run-time options, refer to these character values in an EBCDIC
 environment.
 </P>
-<br><a name="SEC14" href="#TOC1">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
+<br><a name="SEC15" href="#TOC1">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a><br>
+<P>
+By default, on non-Windows systems, <b>pcre2grep</b> supports the use of
+callouts with string arguments within the patterns it is matching, in order to
+run external scripts. For details, see the
+<a href="pcre2grep.html"><b>pcre2grep</b></a>
+documentation. This support can be disabled by adding
+--disable-pcre2grep-callout to the <b>configure</b> command.
+</P>
+<br><a name="SEC16" href="#TOC1">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
 <P>
 By default, <b>pcre2grep</b> reads all files as plain text. You can build it so
 that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
@ -355,22 +373,25 @@ to the <b>configure</b> command. These options naturally require that the
 relevant libraries are installed on your system. Configuration will fail if
 they are not.
 </P>
-<br><a name="SEC15" href="#TOC1">PCRE2GREP BUFFER SIZE</a><br>
+<br><a name="SEC17" href="#TOC1">PCRE2GREP BUFFER SIZE</a><br>
 <P>
 <b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
-finds a match. The size of the buffer is controlled by a parameter whose
-default value is 20K. The buffer itself is three times this size, but because
-of the way it is used for holding "before" lines, the longest line that is
-guaranteed to be processable is the parameter size. You can change the default
-parameter value by adding, for example,
+finds a match. The starting size of the buffer is controlled by a parameter
+whose default value is 20K. The buffer itself is three times this size, but
+because of the way it is used for holding "before" lines, the longest line that
+is guaranteed to be processable is the parameter size. If a longer line is
+encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
+specified maximum size, whose default is 1M or the starting size, whichever is
+the larger. You can change the default parameter values by adding, for example,
 <pre>
-  --with-pcre2grep-bufsize=50K
+  --with-pcre2grep-bufsize=51200
+  --with-pcre2grep-max-bufsize=2097152
 </pre>
-to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override this
-value by using --buffer-size on the command line..
+to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override
+these values by using --buffer-size and --max-buffer-size on the command line.
 </P>
-<br><a name="SEC16" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
+<br><a name="SEC18" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
 <P>
 If you add one of
 <pre>
@ -404,7 +425,7 @@ automatically included, you may need to add something like
 </pre>
 immediately before the <b>configure</b> command.
 </P>
-<br><a name="SEC17" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
+<br><a name="SEC19" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
 <P>
 If you add
 <pre>
@ -413,7 +434,7 @@ If you add
 to the <b>configure</b> command, additional debugging code is included in the
 build. This feature is intended for use by the PCRE2 maintainers.
 </P>
-<br><a name="SEC18" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
+<br><a name="SEC20" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
 <P>
 If you add
 <pre>
@ -423,7 +444,7 @@ to the <b>configure</b> command, PCRE2 will use valgrind annotations to mark
 certain memory regions as unaddressable. This allows it to detect invalid
 memory accesses, and is mostly useful for debugging PCRE2 itself.
 </P>
-<br><a name="SEC19" href="#TOC1">CODE COVERAGE REPORTING</a><br>
+<br><a name="SEC21" href="#TOC1">CODE COVERAGE REPORTING</a><br>
 <P>
 If your C compiler is gcc, you can build a version of PCRE2 that can generate a
 code coverage report for its test suite. To enable this, you must install
@ -480,11 +501,32 @@ This cleans all coverage data including the generated coverage report. For more
 information about code coverage, see the <b>gcov</b> and <b>lcov</b>
 documentation.
 </P>
-<br><a name="SEC20" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC22" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
+<P>
+There is a special option for use by people who want to run fuzzing tests on
+PCRE2:
+<pre>
+  --enable-fuzz-support
+</pre>
+At present this applies only to the 8-bit library. If set, it causes an extra
+library called libpcre2-fuzzsupport.a to be built, but not installed. This
+contains a single function called LLVMFuzzerTestOneInput() whose arguments are
+a pointer to a string and the length of the string. When called, this function
+tries to compile the string as a pattern, and if that succeeds, to match it.
+This is done both with no options and with some random options bits that are
+generated from the string. Setting --enable-fuzz-support also causes a binary
+called <b>pcre2fuzzcheck</b> to be created. This is normally run under valgrind
+or used when PCRE2 is compiled with address sanitizing enabled. It calls the
+fuzzing function and outputs information about it is doing. The input strings
+are specified by arguments: if an argument starts with "=" the rest of it is a
+literal input string. Otherwise, it is assumed to be a file name, and the
+contents of the file are the test string.
+</P>
+<br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2-config</b>(3).
 </P>
-<br><a name="SEC21" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC24" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -493,11 +535,11 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC22" href="#TOC1">REVISION</a><br>
+<br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 April 2015
+Last updated: 01 November 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.