Update bundled PCRE2-library to version 10.23

Some manual changes made to the library were lost in this update. They will be re-applied in the next commit.
2017-05-29 15:31:42 +03:00
parent 7231563937
commit 36af74cb25
218 changed files with 49218 additions and 26130 deletions
--- a/pcre2/doc/html/NON-AUTOTOOLS-BUILD.txt
+++ b/pcre2/doc/html/NON-AUTOTOOLS-BUILD.txt
@ -97,6 +97,7 @@ can skip ahead to the CMake section.
       pcre2_context.c
       pcre2_dfa_match.c
       pcre2_error.c
+       pcre2_find_bracket.c
       pcre2_jit_compile.c
       pcre2_maketables.c
       pcre2_match.c
@ -173,7 +174,11 @@ can skip ahead to the CMake section.

 (11) If you want to use the pcre2grep command, compile and link
     src/pcre2grep.c; it uses only the basic 8-bit PCRE2 library (it does not
-     need the pcre2posix library).
+     need the pcre2posix library). If you have built the PCRE2 library with JIT
+     support by defining SUPPORT_JIT in src/config.h, you can also define
+     SUPPORT_PCRE2GREP_JIT, which causes pcre2grep to make use of JIT (unless
+     it is run with --no-jit). If you define SUPPORT_PCRE2GREP_JIT without
+     defining SUPPORT_JIT, pcre2grep does not try to make use of JIT.


 STACK SIZE IN WINDOWS ENVIRONMENTS
@ -388,4 +393,4 @@ and executable, is in EBCDIC and native z/OS file formats and this is the
 recommended download site.

 =============================
-Last Updated: 15 June 2015
+Last Updated: 13 October 2016
--- a/pcre2/doc/html/README.txt
+++ b/pcre2/doc/html/README.txt
@ -44,7 +44,7 @@ wrappers.

 The distribution does contain a set of C wrapper functions for the 8-bit
 library that are based on the POSIX regular expression API (see the pcre2posix
-man page). These can be found in a library called libpcre2posix. Note that this
+man page). These can be found in a library called libpcre2-posix. Note that this
 just provides a POSIX calling interface to PCRE2; the regular expressions
 themselves still follow Perl syntax and semantics. The POSIX API is restricted,
 and does not give full access to all of PCRE2's facilities.
@ -58,8 +58,8 @@ renamed or pointed at by a link.
 If you are using the POSIX interface to PCRE2 and there is already a POSIX
 regex library installed on your system, as well as worrying about the regex.h
 header file (as mentioned above), you must also take care when linking programs
-to ensure that they link with PCRE2's libpcre2posix library. Otherwise they may
-pick up the POSIX functions of the same name from the other library.
+to ensure that they link with PCRE2's libpcre2-posix library. Otherwise they
+may pick up the POSIX functions of the same name from the other library.

 One way of avoiding this confusion is to compile PCRE2 with the addition of
 -Dregcomp=PCRE2regcomp (and similarly for the other POSIX functions) to the
@ -168,15 +168,12 @@ library. They are also documented in the pcre2build man page.
  built. If you want only the 16-bit or 32-bit library, use --disable-pcre2-8
  to disable building the 8-bit library.

-. If you want to include support for just-in-time compiling, which can give
-  large performance improvements on certain platforms, add --enable-jit to the
-  "configure" command. This support is available only for certain hardware
+. If you want to include support for just-in-time (JIT) compiling, which can
+  give large performance improvements on certain platforms, add --enable-jit to
+  the "configure" command. This support is available only for certain hardware
  architectures. If you try to enable it on an unsupported architecture, there
  will be a compile time error.

-. When JIT support is enabled, pcre2grep automatically makes use of it, unless
-  you add --disable-pcre2grep-jit to the "configure" command.
-
 . If you do not want to make use of the support for UTF-8 Unicode character
  strings in the 8-bit library, UTF-16 Unicode character strings in the 16-bit
  library, or UTF-32 Unicode character strings in the 32-bit library, you can
@ -207,19 +204,19 @@ library. They are also documented in the pcre2build man page.
  --enable-newline-is-crlf, --enable-newline-is-anycrlf, or
  --enable-newline-is-any to the "configure" command, respectively.

-  If you specify --enable-newline-is-cr or --enable-newline-is-crlf, some of
-  the standard tests will fail, because the lines in the test files end with
-  LF. Even if the files are edited to change the line endings, there are likely
-  to be some failures. With --enable-newline-is-anycrlf or
-  --enable-newline-is-any, many tests should succeed, but there may be some
-  failures.
-
 . By default, the sequence \R in a pattern matches any Unicode line ending
  sequence. This is independent of the option specifying what PCRE2 considers
  to be the end of a line (see above). However, the caller of PCRE2 can
  restrict \R to match only CR, LF, or CRLF. You can make this the default by
  adding --enable-bsr-anycrlf to the "configure" command (bsr = "backslash R").

+. In a pattern, the escape sequence \C matches a single code unit, even in a
+  UTF mode. This can be dangerous because it breaks up multi-code-unit
+  characters. You can build PCRE2 with the use of \C permanently locked out by
+  adding --enable-never-backslash-C (note the upper case C) to the "configure"
+  command. When \C is allowed by the library, individual applications can lock
+  it out by calling pcre2_compile() with the PCRE2_NEVER_BACKSLASH_C option.
+
 . PCRE2 has a counter that limits the depth of nesting of parentheses in a
  pattern. This limits the amount of system stack that a pattern uses when it
  is compiled. The default is 250, but you can change it by setting, for
@ -249,13 +246,13 @@ library. They are also documented in the pcre2build man page.
  sizes in the pcre2stack man page.

 . In the 8-bit library, the default maximum compiled pattern size is around
-  64K. You can increase this by adding --with-link-size=3 to the "configure"
-  command. PCRE2 then uses three bytes instead of two for offsets to different
-  parts of the compiled pattern. In the 16-bit library, --with-link-size=3 is
-  the same as --with-link-size=4, which (in both libraries) uses four-byte
-  offsets. Increasing the internal link size reduces performance in the 8-bit
-  and 16-bit libraries. In the 32-bit library, the link size setting is
-  ignored, as 4-byte offsets are always used.
+  64K bytes. You can increase this by adding --with-link-size=3 to the
+  "configure" command. PCRE2 then uses three bytes instead of two for offsets
+  to different parts of the compiled pattern. In the 16-bit library,
+  --with-link-size=3 is the same as --with-link-size=4, which (in both
+  libraries) uses four-byte offsets. Increasing the internal link size reduces
+  performance in the 8-bit and 16-bit libraries. In the 32-bit library, the
+  link size setting is ignored, as 4-byte offsets are always used.

 . You can build PCRE2 so that its internal match() function that is called from
  pcre2_match() does not call itself recursively. Instead, it uses memory
@ -317,6 +314,14 @@ library. They are also documented in the pcre2build man page.
  running "make" to build PCRE2. There is more information about coverage
  reporting in the "pcre2build" documentation.

+. When JIT support is enabled, pcre2grep automatically makes use of it, unless
+  you add --disable-pcre2grep-jit to the "configure" command.
+
+. On non-Windows systems there is support for calling external scripts during
+  matching in the pcre2grep command via PCRE2's callout facility with string
+  arguments. This support can be disabled by adding --disable-pcre2grep-callout
+  to the "configure" command.
+
 . The pcre2grep program currently supports only 8-bit data files, and so
  requires the 8-bit PCRE2 library. It is possible to compile pcre2grep to use
  libz and/or libbz2, in order to read .gz and .bz2 files (respectively), by
@ -327,12 +332,23 @@ library. They are also documented in the pcre2build man page.

  Of course, the relevant libraries must be installed on your system.

-. The default size (in bytes) of the internal buffer used by pcre2grep can be
-  set by, for example:
+. The default starting size (in bytes) of the internal buffer used by pcre2grep
+  can be set by, for example:

  --with-pcre2grep-bufsize=51200

-  The value must be a plain integer. The default is 20480.
+  The value must be a plain integer. The default is 20480. The amount of memory
+  used by pcre2grep is actually three times this number, to allow for "before"
+  and "after" lines. If very long lines are encountered, the buffer is
+  automatically enlarged, up to a fixed maximum size.
+
+. The default maximum size of pcre2grep's internal buffer can be set by, for
+  example:
+
+  --with-pcre2grep-max-bufsize=2097152
+
+  The default is either 1048576 or the value of --with-pcre2grep-bufsize,
+  whichever is the larger.

 . It is possible to compile pcre2test so that it links with the libreadline
  or libedit libraries, by specifying, respectively,
@ -357,6 +373,22 @@ library. They are also documented in the pcre2build man page.
  tgetflag, or tgoto, this is the problem, and linking with the ncurses library
  should fix it.

+. There is a special option called --enable-fuzz-support for use by people who
+  want to run fuzzing tests on PCRE2. At present this applies only to the 8-bit
+  library. If set, it causes an extra library called libpcre2-fuzzsupport.a to
+  be built, but not installed. This contains a single function called
+  LLVMFuzzerTestOneInput() whose arguments are a pointer to a string and the
+  length of the string. When called, this function tries to compile the string
+  as a pattern, and if that succeeds, to match it. This is done both with no
+  options and with some random option bits that are generated from the string.
+  Setting --enable-fuzz-support also causes a binary called pcre2fuzzcheck to
+  be created. This is normally run under valgrind or used when PCRE2 is
+  compiled with address sanitizing enabled. It calls the fuzzing function and
+  outputs information about what it is doing. The input strings are specified by
+  arguments: if an argument starts with "=" the rest of it is a literal input
+  string. Otherwise, it is assumed to be a file name, and the contents of the
+  file are the test string.
+
 The "configure" script builds the following files for the basic C library:

 . Makefile             the makefile that builds the library
@ -531,7 +563,7 @@ script creates the .txt and HTML forms of the documentation from the man pages.


 Testing PCRE2
------------
+-------------

 To test the basic PCRE2 library on a Unix-like system, run the RunTest script.
 There is another script called RunGrepTest that tests the pcre2grep command.
@ -724,6 +756,7 @@ The distribution should contain the files listed below.
  src/pcre2_context.c      )
  src/pcre2_dfa_match.c    )
  src/pcre2_error.c        )
+  src/pcre2_find_bracket.c )
  src/pcre2_jit_compile.c  )
  src/pcre2_jit_match.c    ) sources for the functions in the library,
  src/pcre2_jit_misc.c     )   and some internal functions that they use
@ -744,6 +777,7 @@ The distribution should contain the files listed below.
  src/pcre2_xclass.c       )

  src/pcre2_printint.c     debugging function that is used by pcre2test,
+  src/pcre2_fuzzsupport.c  function for (optional) fuzzing support

  src/config.h.in          template for config.h, when built by "configure"
  src/pcre2.h.in           template for pcre2.h when built by "configure"
@ -801,7 +835,7 @@ The distribution should contain the files listed below.
  libpcre2-8.pc.in         template for libpcre2-8.pc for pkg-config
  libpcre2-16.pc.in        template for libpcre2-16.pc for pkg-config
  libpcre2-32.pc.in        template for libpcre2-32.pc for pkg-config
-  libpcre2posix.pc.in      template for libpcre2posix.pc for pkg-config
+  libpcre2-posix.pc.in     template for libpcre2-posix.pc for pkg-config
  ltmain.sh                file used to build a libtool script
  missing                  ) common stub for a few missing GNU programs while
                           )   installing, generated by automake
@ -832,4 +866,4 @@ The distribution should contain the files listed below.
 Philip Hazel
 Email local part: ph10
 Email domain: cam.ac.uk
-Last updated: 24 April 2015
+Last updated: 01 November 2016
--- a/pcre2/doc/html/index.html
+++ b/pcre2/doc/html/index.html
@ -91,6 +91,12 @@ in the library.
 <tr><td><a href="pcre2_callout_enumerate.html">pcre2_callout_enumerate</a></td>
    <td>&nbsp;&nbsp;Enumerate callouts in a compiled pattern</td></tr>

+<tr><td><a href="pcre2_code_copy.html">pcre2_code_copy</a></td>
+    <td>&nbsp;&nbsp;Copy a compiled pattern</td></tr>
+
+<tr><td><a href="pcre2_code_copy_with_tables.html">pcre2_code_copy_with_tables</a></td>
+    <td>&nbsp;&nbsp;Copy a compiled pattern and its character tables</td></tr>
+
 <tr><td><a href="pcre2_code_free.html">pcre2_code_free</a></td>
    <td>&nbsp;&nbsp;Free a compiled pattern</td></tr>

@ -210,9 +216,15 @@ in the library.
 <tr><td><a href="pcre2_set_match_limit.html">pcre2_set_match_limit</a></td>
    <td>&nbsp;&nbsp;Set the match limit</td></tr>

+<tr><td><a href="pcre2_set_max_pattern_length.html">pcre2_set_max_pattern_length</a></td>
+    <td>&nbsp;&nbsp;Set the maximum length of pattern</td></tr>
+
 <tr><td><a href="pcre2_set_newline.html">pcre2_set_newline</a></td>
    <td>&nbsp;&nbsp;Set the newline convention</td></tr>

+<tr><td><a href="pcre2_set_offset_limit.html">pcre2_set_offset_limit</a></td>
+    <td>&nbsp;&nbsp;Set the offset limit</td></tr>
+
 <tr><td><a href="pcre2_set_parens_nest_limit.html">pcre2_set_parens_nest_limit</a></td>
    <td>&nbsp;&nbsp;Set the parentheses nesting limit</td></tr>

--- a/pcre2/doc/html/pcre2.html
+++ b/pcre2/doc/html/pcre2.html
@ -126,8 +126,10 @@ running redundant checks.
 <P>
 The use of the \C escape sequence in a UTF-8 or UTF-16 pattern can lead to
 problems, because it may leave the current matching point in the middle of a
-multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used to
-lock out the use of \C, causing a compile-time error if it is encountered.
+multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used by an
+application to lock out the use of \C, causing a compile-time error if it is
+encountered. It is also possible to build PCRE2 with the use of \C permanently
+disabled.
 </P>
 <P>
 Another way that performance can be hit is by running a pattern that has a very
@ -187,7 +189,7 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
 </P>
 <br><a name="SEC5" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 13 April 2015
+Last updated: 16 October 2015
 <br>
 Copyright &copy; 1997-2015 University of Cambridge.
 <br>
--- a/pcre2/doc/html/pcre2_code_copy.html
+++ b/pcre2/doc/html/pcre2_code_copy.html
@ -0,0 +1,43 @@
+<html>
+<head>
+<title>pcre2_code_copy specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_code_copy man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>pcre2_code *pcre2_code_copy(const pcre2_code *<i>code</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function makes a copy of the memory used for a compiled pattern, excluding
+any memory used by the JIT compiler. Without a subsequent call to
+<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching. The
+pointer to the character tables is copied, not the tables themselves (see
+<b>pcre2_code_copy_with_tables()</b>). The yield of the function is NULL if
+<i>code</i> is NULL or if sufficient memory cannot be obtained.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page and a description of the POSIX API in the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
--- a/pcre2/doc/html/pcre2_code_copy_with_tables.html
+++ b/pcre2/doc/html/pcre2_code_copy_with_tables.html
@ -0,0 +1,44 @@
+<html>
+<head>
+<title>pcre2_code_copy_with_tables specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_code_copy_with_tables man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *<i>code</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function makes a copy of the memory used for a compiled pattern, excluding
+any memory used by the JIT compiler. Without a subsequent call to
+<b>pcre2_jit_compile()</b>, the copy can be used only for non-JIT matching.
+Unlike <b>pcre2_code_copy()</b>, a separate copy of the character tables is also
+made, with the new code pointing to it. This memory will be automatically freed
+when <b>pcre2_code_free()</b> is called. The yield of the function is NULL if
+<i>code</i> is NULL or if sufficient memory cannot be obtained.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page and a description of the POSIX API in the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
--- a/pcre2/doc/html/pcre2_code_free.html
+++ b/pcre2/doc/html/pcre2_code_free.html
@ -19,7 +19,7 @@ SYNOPSIS
 <b>#include &#60;pcre2.h&#62;</b>
 </P>
 <P>
-<b>pcre2_code_free(pcre2_code *<i>code</i>);</b>
+<b>void pcre2_code_free(pcre2_code *<i>code</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
--- a/pcre2/doc/html/pcre2_dfa_match.html
+++ b/pcre2/doc/html/pcre2_dfa_match.html
@ -45,8 +45,8 @@ is <b>pcre2_match()</b>.) The arguments for this function are:
  <i>wscount</i>      Number of elements in the vector
 </pre>
 For <b>pcre2_dfa_match()</b>, a match context is needed only if you want to set
-up a callout function. The <i>length</i> and <i>startoffset</i> values are code
-units, not characters. The options are:
+up a callout function or specify the recursion limit. The <i>length</i> and
+<i>startoffset</i> values are code units, not characters. The options are:
 <pre>
  PCRE2_ANCHORED          Match only at the first position
  PCRE2_NOTBOL            Subject is not the beginning of a line
--- a/pcre2/doc/html/pcre2_get_error_message.html
+++ b/pcre2/doc/html/pcre2_get_error_message.html
@ -35,7 +35,10 @@ errors are negative numbers. The arguments are:
  <i>bufflen</i>     the length of the buffer (code units)
 </pre>
 The function returns the length of the message, excluding the trailing zero, or
-a negative error code if the buffer is too small.
+the negative error code PCRE2_ERROR_NOMEMORY if the buffer is too small. In
+this case, the returned message is truncated (but still with a trailing zero).
+If <i>errorcode</i> does not contain a recognized error code number, the
+negative value PCRE2_ERROR_BADDATA is returned.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
--- a/pcre2/doc/html/pcre2_match_data_create.html
+++ b/pcre2/doc/html/pcre2_match_data_create.html
@ -19,7 +19,7 @@ SYNOPSIS
 <b>#include &#60;pcre2.h&#62;</b>
 </P>
 <P>
-<b>pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
+<b>pcre2_match_data *pcre2_match_data_create(uint32_t <i>ovecsize</i>,</b>
 <b>  pcre2_general_context *<i>gcontext</i>);</b>
 </P>
 <br><b>
--- a/pcre2/doc/html/pcre2_match_data_create_from_pattern.html
+++ b/pcre2/doc/html/pcre2_match_data_create_from_pattern.html
@ -19,8 +19,8 @@ SYNOPSIS
 <b>#include &#60;pcre2.h&#62;</b>
 </P>
 <P>
-<b>pcre2_match_data_create_from_pattern(const pcre2_code *<i>code</i>,</b>
-<b>  pcre2_general_context *<i>gcontext</i>);</b>
+<b>pcre2_match_data *pcre2_match_data_create_from_pattern(</b>
+<b>  const pcre2_code *<i>code</i>, pcre2_general_context *<i>gcontext</i>);</b>
 </P>
 <br><b>
 DESCRIPTION
--- a/pcre2/doc/html/pcre2_pattern_info.html
+++ b/pcre2/doc/html/pcre2_pattern_info.html
@ -42,19 +42,20 @@ request are as follows:
                               PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
  PCRE2_INFO_CAPTURECOUNT    Number of capturing subpatterns
  PCRE2_INFO_FIRSTBITMAP     Bitmap of first code units, or NULL
-  PCRE2_INFO_FIRSTCODEUNIT   First code unit when type is 1
  PCRE2_INFO_FIRSTCODETYPE   Type of start-of-match information
                               0 nothing set
                               1 first code unit is set
                               2 start of string or after newline
+  PCRE2_INFO_FIRSTCODEUNIT   First code unit when type is 1
+  PCRE2_INFO_HASBACKSLASHC   Return 1 if pattern contains \C
  PCRE2_INFO_HASCRORLF       Return 1 if explicit CR or LF matches
                               exist in the pattern
  PCRE2_INFO_JCHANGED        Return 1 if (?J) or (?-J) was used
  PCRE2_INFO_JITSIZE         Size of JIT compiled code, or 0
-  PCRE2_INFO_LASTCODEUNIT    Last code unit when type is 1
  PCRE2_INFO_LASTCODETYPE    Type of must-be-present information
                               0 nothing set
                               1 code unit is set
+  PCRE2_INFO_LASTCODEUNIT    Last code unit when type is 1
  PCRE2_INFO_MATCHEMPTY      1 if the pattern can match an
                               empty string, 0 otherwise
  PCRE2_INFO_MATCHLIMIT      Match limit if set,
@ -62,8 +63,8 @@ request are as follows:
  PCRE2_INFO_MAXLOOKBEHIND   Length (in characters) of the longest
                               lookbehind assertion
  PCRE2_INFO_MINLENGTH       Lower bound length of matching strings
-  PCRE2_INFO_NAMEENTRYSIZE   Size of name table entries
  PCRE2_INFO_NAMECOUNT       Number of named subpatterns
+  PCRE2_INFO_NAMEENTRYSIZE   Size of name table entries
  PCRE2_INFO_NAMETABLE       Pointer to name table
  PCRE2_CONFIG_NEWLINE       Code for the newline sequence:
                               PCRE2_NEWLINE_CR
--- a/pcre2/doc/html/pcre2_serialize_decode.html
+++ b/pcre2/doc/html/pcre2_serialize_decode.html
@ -20,7 +20,7 @@ SYNOPSIS
 </P>
 <P>
 <b>int32_t pcre2_serialize_decode(pcre2_code **<i>codes</i>,</b>
-<b>  int32_t <i>number_of_codes</i>, const uint32_t *<i>bytes</i>,</b>
+<b>  int32_t <i>number_of_codes</i>, const uint8_t *<i>bytes</i>,</b>
 <b>  pcre2_general_context *<i>gcontext</i>);</b>
 </P>
 <br><b>
--- a/pcre2/doc/html/pcre2_serialize_encode.html
+++ b/pcre2/doc/html/pcre2_serialize_encode.html
@ -19,8 +19,8 @@ SYNOPSIS
 <b>#include &#60;pcre2.h&#62;</b>
 </P>
 <P>
-<b>int32_t pcre2_serialize_encode(pcre2_code **<i>codes</i>,</b>
-<b>  int32_t <i>number_of_codes</i>, uint32_t **<i>serialized_bytes</i>,</b>
+<b>int32_t pcre2_serialize_encode(const pcre2_code **<i>codes</i>,</b>
+<b>  int32_t <i>number_of_codes</i>, uint8_t **<i>serialized_bytes</i>,</b>
 <b>  PCRE2_SIZE *<i>serialized_size</i>, pcre2_general_context *<i>gcontext</i>);</b>
 </P>
 <br><b>
--- a/pcre2/doc/html/pcre2_set_max_pattern_length.html
+++ b/pcre2/doc/html/pcre2_set_max_pattern_length.html
@ -0,0 +1,43 @@
+<html>
+<head>
+<title>pcre2_set_max_pattern_length specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_set_max_pattern_length man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>int pcre2_set_max_pattern_length(pcre2_compile_context *<i>ccontext</i>,</b>
+<b>  PCRE2_SIZE <i>value</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function sets, in a compile context, the maximum text length (in code
+units) of the pattern that can be compiled. The result is always zero. If a
+longer pattern is passed to <b>pcre2_compile()</b> there is an immediate error
+return. The default is effectively unlimited, being the largest value a
+PCRE2_SIZE variable can hold.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page and a description of the POSIX API in the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
--- a/pcre2/doc/html/pcre2_set_offset_limit.html
+++ b/pcre2/doc/html/pcre2_set_offset_limit.html
@ -0,0 +1,40 @@
+<html>
+<head>
+<title>pcre2_set_offset_limit specification</title>
+</head>
+<body bgcolor="#FFFFFF" text="#00005A" link="#0066FF" alink="#3399FF" vlink="#2222BB">
+<h1>pcre2_set_offset_limit man page</h1>
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
+<p>
+This page is part of the PCRE2 HTML documentation. It was generated
+automatically from the original man page. If there is any nonsense in it,
+please consult the man page, in case the conversion went wrong.
+<br>
+<br><b>
+SYNOPSIS
+</b><br>
+<P>
+<b>#include &#60;pcre2.h&#62;</b>
+</P>
+<P>
+<b>int pcre2_set_offset_limit(pcre2_match_context *<i>mcontext</i>,</b>
+<b>  PCRE2_SIZE <i>value</i>);</b>
+</P>
+<br><b>
+DESCRIPTION
+</b><br>
+<P>
+This function sets the offset limit field in a match context. The result is
+always zero.
+</P>
+<P>
+There is a complete description of the PCRE2 native API in the
+<a href="pcre2api.html"><b>pcre2api</b></a>
+page and a description of the POSIX API in the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+page.
+<p>
+Return to the <a href="index.html">PCRE2 index page</a>.
+</p>
--- a/pcre2/doc/html/pcre2_substitute.html
+++ b/pcre2/doc/html/pcre2_substitute.html
@ -59,20 +59,25 @@ units, not characters, as is the contents of the variable pointed at by
 <i>outlengthptr</i>, which is updated to the actual length of the new string.
 The options are:
 <pre>
-  PCRE2_ANCHORED          Match only at the first position
-  PCRE2_NOTBOL            Subject string is not the beginning of a line
-  PCRE2_NOTEOL            Subject string is not the end of a line
-  PCRE2_NOTEMPTY          An empty string is not a valid match
-  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject
-                           is not a valid match
-  PCRE2_NO_UTF_CHECK      Do not check the subject or replacement for
-                           UTF validity (only relevant if PCRE2_UTF
-                           was set at compile time)
-  PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
+  PCRE2_ANCHORED             Match only at the first position
+  PCRE2_NOTBOL               Subject is not the beginning of a line
+  PCRE2_NOTEOL               Subject is not the end of a line
+  PCRE2_NOTEMPTY             An empty string is not a valid match
+  PCRE2_NOTEMPTY_ATSTART     An empty string at the start of the
+                              subject is not a valid match
+  PCRE2_NO_UTF_CHECK         Do not check the subject or replacement
+                              for UTF validity (only relevant if
+                              PCRE2_UTF was set at compile time)
+  PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
+  PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
+  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
+  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
+  PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
 </pre>
 The function returns the number of substitutions, which may be zero if there
 were no matches. The result can be greater than one only when
-PCRE2_SUBSTITUTE_GLOBAL is set.
+PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
+is returned.
 </P>
 <P>
 There is a complete description of the PCRE2 native API in the
--- a/pcre2/doc/html/pcre2api.html
+++ b/pcre2/doc/html/pcre2api.html
--- a/pcre2/doc/html/pcre2build.html
+++ b/pcre2/doc/html/pcre2build.html
@ -18,23 +18,26 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC3" href="#SEC3">BUILDING 8-BIT, 16-BIT AND 32-BIT LIBRARIES</a>
 <li><a name="TOC4" href="#SEC4">BUILDING SHARED AND STATIC LIBRARIES</a>
 <li><a name="TOC5" href="#SEC5">UNICODE AND UTF SUPPORT</a>
-<li><a name="TOC6" href="#SEC6">JUST-IN-TIME COMPILER SUPPORT</a>
-<li><a name="TOC7" href="#SEC7">NEWLINE RECOGNITION</a>
-<li><a name="TOC8" href="#SEC8">WHAT \R MATCHES</a>
-<li><a name="TOC9" href="#SEC9">HANDLING VERY LARGE PATTERNS</a>
-<li><a name="TOC10" href="#SEC10">AVOIDING EXCESSIVE STACK USAGE</a>
-<li><a name="TOC11" href="#SEC11">LIMITING PCRE2 RESOURCE USAGE</a>
-<li><a name="TOC12" href="#SEC12">CREATING CHARACTER TABLES AT BUILD TIME</a>
-<li><a name="TOC13" href="#SEC13">USING EBCDIC CODE</a>
-<li><a name="TOC14" href="#SEC14">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
-<li><a name="TOC15" href="#SEC15">PCRE2GREP BUFFER SIZE</a>
-<li><a name="TOC16" href="#SEC16">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
-<li><a name="TOC17" href="#SEC17">INCLUDING DEBUGGING CODE</a>
-<li><a name="TOC18" href="#SEC18">DEBUGGING WITH VALGRIND SUPPORT</a>
-<li><a name="TOC19" href="#SEC19">CODE COVERAGE REPORTING</a>
-<li><a name="TOC20" href="#SEC20">SEE ALSO</a>
-<li><a name="TOC21" href="#SEC21">AUTHOR</a>
-<li><a name="TOC22" href="#SEC22">REVISION</a>
+<li><a name="TOC6" href="#SEC6">DISABLING THE USE OF \C</a>
+<li><a name="TOC7" href="#SEC7">JUST-IN-TIME COMPILER SUPPORT</a>
+<li><a name="TOC8" href="#SEC8">NEWLINE RECOGNITION</a>
+<li><a name="TOC9" href="#SEC9">WHAT \R MATCHES</a>
+<li><a name="TOC10" href="#SEC10">HANDLING VERY LARGE PATTERNS</a>
+<li><a name="TOC11" href="#SEC11">AVOIDING EXCESSIVE STACK USAGE</a>
+<li><a name="TOC12" href="#SEC12">LIMITING PCRE2 RESOURCE USAGE</a>
+<li><a name="TOC13" href="#SEC13">CREATING CHARACTER TABLES AT BUILD TIME</a>
+<li><a name="TOC14" href="#SEC14">USING EBCDIC CODE</a>
+<li><a name="TOC15" href="#SEC15">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a>
+<li><a name="TOC16" href="#SEC16">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a>
+<li><a name="TOC17" href="#SEC17">PCRE2GREP BUFFER SIZE</a>
+<li><a name="TOC18" href="#SEC18">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a>
+<li><a name="TOC19" href="#SEC19">INCLUDING DEBUGGING CODE</a>
+<li><a name="TOC20" href="#SEC20">DEBUGGING WITH VALGRIND SUPPORT</a>
+<li><a name="TOC21" href="#SEC21">CODE COVERAGE REPORTING</a>
+<li><a name="TOC22" href="#SEC22">SUPPORT FOR FUZZERS</a>
+<li><a name="TOC23" href="#SEC23">SEE ALSO</a>
+<li><a name="TOC24" href="#SEC24">AUTHOR</a>
+<li><a name="TOC25" href="#SEC25">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">BUILDING PCRE2</a><br>
 <P>
@ -148,13 +151,19 @@ properties. The application can request that they do by setting the PCRE2_UCP
 option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also
 request this by starting with (*UCP).
 </P>
+<br><a name="SEC6" href="#TOC1">DISABLING THE USE OF \C</a><br>
 <P>
 The \C escape sequence, which matches a single code unit, even in a UTF mode,
 can cause unpredictable behaviour because it may leave the current matching
-point in the middle of a multi-code-unit character. It can be locked out by
-setting the PCRE2_NEVER_BACKSLASH_C option.
+point in the middle of a multi-code-unit character. The application can lock it
+out by setting the PCRE2_NEVER_BACKSLASH_C option when calling
+<b>pcre2_compile()</b>. There is also a build-time option
+<pre>
+  --enable-never-backslash-C
+</pre>
+(note the upper case C) which locks out the use of \C entirely.
 </P>
-<br><a name="SEC6" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
+<br><a name="SEC7" href="#TOC1">JUST-IN-TIME COMPILER SUPPORT</a><br>
 <P>
 Just-in-time compiler support is included in the build by specifying
 <pre>
@ -171,7 +180,7 @@ pcre2grep automatically makes use of it, unless you add
 </pre>
 to the "configure" command.
 </P>
-<br><a name="SEC7" href="#TOC1">NEWLINE RECOGNITION</a><br>
+<br><a name="SEC8" href="#TOC1">NEWLINE RECOGNITION</a><br>
 <P>
 By default, PCRE2 interprets the linefeed (LF) character as indicating the end
 of a line. This is the normal newline character on Unix-like systems. You can
@ -208,7 +217,7 @@ Whatever default line ending convention is selected when PCRE2 is built can be
 overridden by applications that use the library. At build time it is
 conventional to use the standard for your operating system.
 </P>
-<br><a name="SEC8" href="#TOC1">WHAT \R MATCHES</a><br>
+<br><a name="SEC9" href="#TOC1">WHAT \R MATCHES</a><br>
 <P>
 By default, the sequence \R in a pattern matches any Unicode newline sequence,
 independently of what has been selected as the line ending sequence. If you
@ -220,7 +229,7 @@ the default is changed so that \R matches only CR, LF, or CRLF. Whatever is
 selected when PCRE2 is built can be overridden by applications that use the
 called.
 </P>
-<br><a name="SEC9" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
+<br><a name="SEC10" href="#TOC1">HANDLING VERY LARGE PATTERNS</a><br>
 <P>
 Within a compiled pattern, offset values are used to point from one part to
 another (for example, from an opening parenthesis to an alternation
@ -239,7 +248,7 @@ longer offsets slows down the operation of PCRE2 because it has to load
 additional data when handling them. For the 32-bit library the value is always
 4 and cannot be overridden; the value of --with-link-size is ignored.
 </P>
-<br><a name="SEC10" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
+<br><a name="SEC11" href="#TOC1">AVOIDING EXCESSIVE STACK USAGE</a><br>
 <P>
 When matching with the <b>pcre2_match()</b> function, PCRE2 implements
 backtracking by making recursive calls to an internal function called
@ -261,7 +270,7 @@ custom memory management functions can be called instead. PCRE2 runs noticeably
 more slowly when built in this way. This option affects only the
 <b>pcre2_match()</b> function; it is not relevant for <b>pcre2_dfa_match()</b>.
 </P>
-<br><a name="SEC11" href="#TOC1">LIMITING PCRE2 RESOURCE USAGE</a><br>
+<br><a name="SEC12" href="#TOC1">LIMITING PCRE2 RESOURCE USAGE</a><br>
 <P>
 Internally, PCRE2 has a function called <b>match()</b>, which it calls
 repeatedly (sometimes recursively) when matching a pattern with the
@ -290,7 +299,7 @@ constraints. However, you can set a lower limit by adding, for example,
 </pre>
 to the <b>configure</b> command. This value can also be overridden at run time.
 </P>
-<br><a name="SEC12" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
+<br><a name="SEC13" href="#TOC1">CREATING CHARACTER TABLES AT BUILD TIME</a><br>
 <P>
 PCRE2 uses fixed tables for processing characters whose code points are less
 than 256. By default, PCRE2 is built with a set of tables that are distributed
@ -307,7 +316,7 @@ compiling, because <b>dftables</b> is run on the local host. If you need to
 create alternative tables when cross compiling, you will have to do so "by
 hand".)
 </P>
-<br><a name="SEC13" href="#TOC1">USING EBCDIC CODE</a><br>
+<br><a name="SEC14" href="#TOC1">USING EBCDIC CODE</a><br>
 <P>
 PCRE2 assumes by default that it will run in an environment where the character
 code is ASCII or Unicode, which is a superset of ASCII. This is the case for
@ -342,7 +351,16 @@ The options that select newline behaviour, such as --enable-newline-is-cr,
 and equivalent run-time options, refer to these character values in an EBCDIC
 environment.
 </P>
-<br><a name="SEC14" href="#TOC1">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
+<br><a name="SEC15" href="#TOC1">PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS</a><br>
+<P>
+By default, on non-Windows systems, <b>pcre2grep</b> supports the use of
+callouts with string arguments within the patterns it is matching, in order to
+run external scripts. For details, see the
+<a href="pcre2grep.html"><b>pcre2grep</b></a>
+documentation. This support can be disabled by adding
+--disable-pcre2grep-callout to the <b>configure</b> command.
+</P>
+<br><a name="SEC16" href="#TOC1">PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT</a><br>
 <P>
 By default, <b>pcre2grep</b> reads all files as plain text. You can build it so
 that it recognizes files whose names end in <b>.gz</b> or <b>.bz2</b>, and reads
@ -355,22 +373,25 @@ to the <b>configure</b> command. These options naturally require that the
 relevant libraries are installed on your system. Configuration will fail if
 they are not.
 </P>
-<br><a name="SEC15" href="#TOC1">PCRE2GREP BUFFER SIZE</a><br>
+<br><a name="SEC17" href="#TOC1">PCRE2GREP BUFFER SIZE</a><br>
 <P>
 <b>pcre2grep</b> uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
-finds a match. The size of the buffer is controlled by a parameter whose
-default value is 20K. The buffer itself is three times this size, but because
-of the way it is used for holding "before" lines, the longest line that is
-guaranteed to be processable is the parameter size. You can change the default
-parameter value by adding, for example,
+finds a match. The starting size of the buffer is controlled by a parameter
+whose default value is 20K. The buffer itself is three times this size, but
+because of the way it is used for holding "before" lines, the longest line that
+is guaranteed to be processable is the parameter size. If a longer line is
+encountered, <b>pcre2grep</b> automatically expands the buffer, up to a
+specified maximum size, whose default is 1M or the starting size, whichever is
+the larger. You can change the default parameter values by adding, for example,
 <pre>
-  --with-pcre2grep-bufsize=50K
+  --with-pcre2grep-bufsize=51200
+  --with-pcre2grep-max-bufsize=2097152
 </pre>
-to the <b>configure</b> command. The caller of \fPpcre2grep\fP can override this
-value by using --buffer-size on the command line..
+to the <b>configure</b> command. The caller of <b>pcre2grep</b> can override
+these values by using --buffer-size and --max-buffer-size on the command line.
 </P>
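The sizing rules in the paragraph above can be sketched in C as follows. This is only an illustration of the arithmetic: the function names are hypothetical and are not part of pcre2grep, and the sketch assumes nothing about how the buffer is actually grown between the starting size and the maximum.

```c
#include <stddef.h>

/* Hypothetical helpers for illustration; pcre2grep does not export these. */

/* The default maximum is the configured maximum (1M by default) or the
starting size, whichever is the larger. */
size_t effective_max_bufsize(size_t start_size, size_t configured_max)
{
return (start_size > configured_max) ? start_size : configured_max;
}

/* The block of memory actually obtained is three times the buffer size
parameter; the longest line guaranteed to be processable is the parameter
itself. */
size_t allocated_bytes(size_t bufsize)
{
return 3 * bufsize;
}
```

With the values shown in the configure example, a starting size of 51200 (50K) gives a 153600-byte block, and a maximum of 2097152 corresponds to 2M.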
-<br><a name="SEC16" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
+<br><a name="SEC18" href="#TOC1">PCRE2TEST OPTION FOR LIBREADLINE SUPPORT</a><br>
 <P>
 If you add one of
 <pre>
@ -404,7 +425,7 @@ automatically included, you may need to add something like
 </pre>
 immediately before the <b>configure</b> command.
 </P>
-<br><a name="SEC17" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
+<br><a name="SEC19" href="#TOC1">INCLUDING DEBUGGING CODE</a><br>
 <P>
 If you add
 <pre>
@ -413,7 +434,7 @@ If you add
 to the <b>configure</b> command, additional debugging code is included in the
 build. This feature is intended for use by the PCRE2 maintainers.
 </P>
-<br><a name="SEC18" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
+<br><a name="SEC20" href="#TOC1">DEBUGGING WITH VALGRIND SUPPORT</a><br>
 <P>
 If you add
 <pre>
@ -423,7 +444,7 @@ to the <b>configure</b> command, PCRE2 will use valgrind annotations to mark
 certain memory regions as unaddressable. This allows it to detect invalid
 memory accesses, and is mostly useful for debugging PCRE2 itself.
 </P>
-<br><a name="SEC19" href="#TOC1">CODE COVERAGE REPORTING</a><br>
+<br><a name="SEC21" href="#TOC1">CODE COVERAGE REPORTING</a><br>
 <P>
 If your C compiler is gcc, you can build a version of PCRE2 that can generate a
 code coverage report for its test suite. To enable this, you must install
@ -480,11 +501,32 @@ This cleans all coverage data including the generated coverage report. For more
 information about code coverage, see the <b>gcov</b> and <b>lcov</b>
 documentation.
 </P>
-<br><a name="SEC20" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC22" href="#TOC1">SUPPORT FOR FUZZERS</a><br>
+<P>
+There is a special option for use by people who want to run fuzzing tests on
+PCRE2:
+<pre>
+  --enable-fuzz-support
+</pre>
+At present this applies only to the 8-bit library. If set, it causes an extra
+library called libpcre2-fuzzsupport.a to be built, but not installed. This
+contains a single function called LLVMFuzzerTestOneInput() whose arguments are
+a pointer to a string and the length of the string. When called, this function
+tries to compile the string as a pattern, and if that succeeds, to match it.
+This is done both with no options and with some random option bits that are
+generated from the string. Setting --enable-fuzz-support also causes a binary
+called <b>pcre2fuzzcheck</b> to be created. This is normally run under valgrind
+or used when PCRE2 is compiled with address sanitizing enabled. It calls the
+fuzzing function and outputs information about what it is doing. The input strings
+are specified by arguments: if an argument starts with "=" the rest of it is a
+literal input string. Otherwise, it is assumed to be a file name, and the
+contents of the file are the test string.
+</P>
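The argument convention described above for <b>pcre2fuzzcheck</b> might be sketched like this. The helper name is hypothetical and this is not the program's actual source; it only shows the rule that a leading "=" marks a literal input string and anything else is taken as a file name.

```c
#include <stddef.h>

/* Hypothetical helper, not part of PCRE2. Returns a pointer to the literal
input string if the argument starts with '=', or NULL if the argument should
be treated as the name of a file whose contents are the test string. */
const char *literal_fuzz_input(const char *arg)
{
return (arg != NULL && arg[0] == '=') ? arg + 1 : NULL;
}
```

A caller would pass the returned string (or the file contents) and its length to the fuzzing function.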
+<br><a name="SEC23" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2api</b>(3), <b>pcre2-config</b>(3).
 </P>
-<br><a name="SEC21" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC24" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -493,11 +535,11 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC22" href="#TOC1">REVISION</a><br>
+<br><a name="SEC25" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 24 April 2015
+Last updated: 01 November 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2callout.html
+++ b/pcre2/doc/html/pcre2callout.html
@ -57,11 +57,20 @@ two callout points:
 </pre>
 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
 automatically inserts callouts, all with number 255, before each item in the
-pattern. For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+pattern except for immediately before or after a callout item in the pattern.
+For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+<pre>
+  A(?C3)B
+</pre>
+it is processed as if it were
+<pre>
+  (?C255)A(?C3)B(?C255)
+</pre>
+Here is a more complicated example:
 <pre>
  A(\d{2}|--)
 </pre>
-it is processed as if it were
+With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
 <br>
 <br>
 (?C255)A(?C255)((?C255)\d{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
@ -107,10 +116,10 @@ with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string
  No match
 </pre>
 This indicates that when matching [bc] fails, there is no backtracking into a+
-and therefore the callouts that would be taken for the backtracks do not occur.
-You can disable the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS to
-<b>pcre2_compile()</b>, or starting the pattern with (*NO_AUTO_POSSESS). In this
-case, the output changes to this:
+(because it is being treated as a++) and therefore the callouts that would be
+taken for the backtracks do not occur. You can disable the auto-possessify
+feature by passing PCRE2_NO_AUTO_POSSESS to <b>pcre2_compile()</b>, or starting
+the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this:
 <pre>
  ---&#62;aaaa
   +0 ^        a+
@ -235,8 +244,8 @@ Fields for numerical callouts
 <P>
 For a numerical callout, <i>callout_string</i> is NULL, and <i>callout_number</i>
 contains the number of the callout, in the range 0-255. This is the number
-that follows (?C for manual callouts; it is 255 for automatically generated
-callouts.
+that follows (?C for callouts that are part of the pattern; it is 255 for
+automatically generated callouts.
 </P>
 <br><b>
 Fields for string callouts
@ -310,10 +319,15 @@ the next item to be matched.
 </P>
 <P>
 The <i>next_item_length</i> field contains the length of the next item to be
-matched in the pattern string. When the callout immediately precedes an
-alternation bar, a closing parenthesis, or the end of the pattern, the length
-is zero. When the callout precedes an opening parenthesis, the length is that
-of the entire subpattern.
+processed in the pattern string. When the callout is at the end of the pattern,
+the length is zero. When the callout precedes an opening parenthesis, the
+length includes meta characters that follow the parenthesis. For example, in a
+callout before an assertion such as (?=ab) the length is 3. For an
+alternation bar or a closing parenthesis, the length is one, unless a closing
+parenthesis is followed by a quantifier, in which case its length is included.
+(This changed in release 10.23. In earlier releases, before an opening
+parenthesis the length was that of the entire subpattern, and before an
+alternation bar or a closing parenthesis the length was zero.)
 </P>
 <P>
 The <i>pattern_position</i> and <i>next_item_length</i> fields are intended to
@ -399,9 +413,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC8" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 23 March 2015
+Last updated: 29 September 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2compat.html
+++ b/pcre2/doc/html/pcre2compat.html
@ -107,7 +107,7 @@ processed as anchored at the point where they are tested.
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
-same as PCRE2, but there are examples where it differs.
+same as PCRE2, but there are cases where it differs.
 </P>
 <P>
 11. Most backtracking verbs in assertions have their normal actions. They are
@ -123,7 +123,7 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
 13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
 names is not as general as Perl's. This is a consequence of the fact that PCRE2
 works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b)B),
+between numbers and names. In particular, a pattern such as (?|(?&#60;a&#62;A)|(?&#60;b&#62;B)),
 where the two capturing parentheses have the same number but different names,
 is not supported, and causes an error at compile time. If it were allowed, it
 would not be possible to distinguish which parentheses matched, because both
@ -131,10 +131,11 @@ names map to capturing subpattern number 1. To avoid this confusing situation,
 an error is given at compile time.
 </P>
 <P>
-14. Perl recognizes comments in some places that PCRE2 does not, for example,
-between the ( and ? at the start of a subpattern. If the /x modifier is set,
-Perl allows white space between ( and ? (though current Perls warn that this is
-deprecated) but PCRE2 never does, even if the PCRE2_EXTENDED option is set.
+14. Perl used to recognize comments in some places that PCRE2 does not, for
+example, between the ( and ? at the start of a subpattern. If the /x modifier
+is set, Perl allowed white space between ( and ? though the latest Perls give
+an error (for a while it was just deprecated). There may still be some cases
+where Perl behaves differently.
 </P>
 <P>
 15. Perl, when in warning mode, gives warnings for character classes such as
@ -161,42 +162,47 @@ each alternative branch of a lookbehind assertion can match a different length
 of string. Perl requires them all to have the same length.
 <br>
 <br>
-(b) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
+(b) From PCRE2 10.23, back references to groups of fixed length are supported
+in lookbehinds, provided that there is no possibility of referencing a
+non-unique number or name. Perl does not support backreferences in lookbehinds.
+<br>
+<br>
+(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
 meta-character matches only at the very end of the string.
 <br>
 <br>
-(c) A backslash followed by a letter with no special meaning is faulted. (Perl
+(d) A backslash followed by a letter with no special meaning is faulted. (Perl
 can be made to issue a warning.)
 <br>
 <br>
-(d) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
+(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
 inverted, that is, by default they are not greedy, but if followed by a
 question mark they are.
 <br>
 <br>
-(e) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
+(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
 only at the first matching position in the subject string.
 <br>
 <br>
-(f) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
+(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
 PCRE2_NO_AUTO_CAPTURE options have no Perl equivalents.
 <br>
 <br>
-(g) The \R escape sequence can be restricted to match only CR, LF, or CRLF
+(h) The \R escape sequence can be restricted to match only CR, LF, or CRLF
 by the PCRE2_BSR_ANYCRLF option.
 <br>
 <br>
-(h) The callout facility is PCRE2-specific.
+(i) The callout facility is PCRE2-specific.
 <br>
 <br>
-(i) The partial matching facility is PCRE2-specific.
+(j) The partial matching facility is PCRE2-specific.
 <br>
 <br>
-(j) The alternative matching function (<b>pcre2_dfa_match()</b> matches in a
+(k) The alternative matching function (<b>pcre2_dfa_match()</b>) matches in a
 different way and is not Perl-compatible.
 <br>
 <br>
-(k) PCRE2 recognizes some special sequences such as (*CR) at the start of
+(l) PCRE2 recognizes some special sequences such as (*CR) at the start of
 a pattern that set overall options that cannot be changed within the pattern.
 </P>
 <br><b>
@ -214,9 +220,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 15 March 2015
+Last updated: 18 October 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2demo.html
+++ b/pcre2/doc/html/pcre2demo.html
@ -20,28 +20,31 @@ please consult the man page, in case the conversion went wrong.
 *************************************************/

 /* This is a demonstration program to illustrate a straightforward way of
-calling the PCRE2 regular expression library from a C program. See the
+using the PCRE2 regular expression library from a C program. See the
 pcre2sample documentation for a short discussion ("man pcre2sample" if you have
 the PCRE2 man pages installed). PCRE2 is a revised API for the library, and is
 incompatible with the original PCRE API.

 There are actually three libraries, each supporting a different code unit
-width. This demonstration program uses the 8-bit library.
+width. This demonstration program uses the 8-bit library. The default is to
+process each code unit as a separate character, but if the pattern begins with
+"(*UTF)", both it and the subject are treated as UTF-8 strings, where
+characters may occupy multiple code units.

 In Unix-like environments, if PCRE2 is installed in your standard system
 libraries, you should be able to compile this program using this command:

-gcc -Wall pcre2demo.c -lpcre2-8 -o pcre2demo
+cc -Wall pcre2demo.c -lpcre2-8 -o pcre2demo

 If PCRE2 is not installed in a standard place, it is likely to be installed
 with support for the pkg-config mechanism. If you have pkg-config, you can
 compile this program using this command:

-gcc -Wall pcre2demo.c `pkg-config --cflags --libs libpcre2-8` -o pcre2demo
+cc -Wall pcre2demo.c `pkg-config --cflags --libs libpcre2-8` -o pcre2demo

-If you do not have pkg-config, you may have to use this:
+If you do not have pkg-config, you may have to use something like this:

-gcc -Wall pcre2demo.c -I/usr/local/include -L/usr/local/lib \
+cc -Wall pcre2demo.c -I/usr/local/include -L/usr/local/lib \
  -R/usr/local/lib -lpcre2-8 -o pcre2demo

 Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
@ -56,9 +59,14 @@ the following line. */

 /* #define PCRE2_STATIC */

-/* This macro must be defined before including pcre2.h. For a program that uses
-only one code unit width, it makes it possible to use generic function names
-such as pcre2_compile(). */
+/* The PCRE2_CODE_UNIT_WIDTH macro must be defined before including pcre2.h.
+For a program that uses only one code unit width, setting it to 8, 16, or 32
+makes it possible to use generic function names such as pcre2_compile(). Note
+that just changing 8 to 16 (for example) is not sufficient to convert this
+program to process 16-bit characters. Even in a fully 16-bit environment, where
+string-handling functions such as strcmp() and printf() work with 16-bit
+characters, the code for handling the table of named substrings will still need
+to be modified. */

 #define PCRE2_CODE_UNIT_WIDTH 8

@ -79,19 +87,19 @@ int main(int argc, char **argv)
 {
 pcre2_code *re;
 PCRE2_SPTR pattern;     /* PCRE2_SPTR is a pointer to unsigned code units of */
-PCRE2_SPTR subject;     /* the appropriate width (8, 16, or 32 bits). */
+PCRE2_SPTR subject;     /* the appropriate width (in this case, 8 bits). */
 PCRE2_SPTR name_table;

 int crlf_is_newline;
 int errornumber;
 int find_all;
 int i;
-int namecount;
-int name_entry_size;
 int rc;
 int utf8;

 uint32_t option_bits;
+uint32_t namecount;
+uint32_t name_entry_size;
 uint32_t newline;

 PCRE2_SIZE erroroffset;
@ -106,15 +114,19 @@ pcre2_match_data *match_data;
 * First, sort out the command line. There is only one possible option at  *
 * the moment, "-g" to request repeated matching to find all occurrences,  *
 * like Perl's /g option. We set the variable find_all to a non-zero value *
-* if the -g option is present. Apart from that, there must be exactly two *
-* arguments.                                                              *
+* if the -g option is present.                                            *
 **************************************************************************/

 find_all = 0;
 for (i = 1; i &lt; argc; i++)
  {
  if (strcmp(argv[i], "-g") == 0) find_all = 1;
-    else break;
+  else if (argv[i][0] == '-')
+    {
+    printf("Unrecognised option %s\n", argv[i]);
+    return 1;
+    }
+  else break;
  }

 /* After the options, we require exactly two arguments, which are the pattern,
@ -122,7 +134,7 @@ and the subject string. */

 if (argc - i != 2)
  {
-  printf("Two arguments required: a regex and a subject string\n");
+  printf("Exactly two arguments required: a regex and a subject string\n");
  return 1;
  }

@ -201,7 +213,7 @@ if (rc &lt; 0)
 stored. */

 ovector = pcre2_get_ovector_pointer(match_data);
-printf("\nMatch succeeded at offset %d\n", (int)ovector[0]);
+printf("Match succeeded at offset %d\n", (int)ovector[0]);


 /*************************************************************************
@ -242,7 +254,7 @@ we have to extract the count of named parentheses from the pattern. */
  PCRE2_INFO_NAMECOUNT, /* get the number of named substrings */
  &amp;namecount);          /* where to put the answer */

-if (namecount &lt;= 0) printf("No named substrings\n"); else
+if (namecount == 0) printf("No named substrings\n"); else
  {
  PCRE2_SPTR tabptr;
  printf("Named substrings\n");
@ -330,8 +342,8 @@ crlf_is_newline = newline == PCRE2_NEWLINE_ANY ||

 for (;;)
  {
-  uint32_t options = 0;                    /* Normally no options */
-  PCRE2_SIZE start_offset = ovector[1];  /* Start at end of previous match */
+  uint32_t options = 0;                   /* Normally no options */
+  PCRE2_SIZE start_offset = ovector[1];   /* Start at end of previous match */

  /* If the previous match was for an empty string, we are finished if we are
  at the end of the subject. Otherwise, arrange to run another match at the
@ -371,7 +383,7 @@ for (;;)
    {
    if (options == 0) break;                    /* All matches found */
    ovector[1] = start_offset + 1;              /* Advance one code unit */
-    if (crlf_is_newline &amp;&amp;                      /* If CRLF is newline &amp; */
+    if (crlf_is_newline &amp;&amp;                      /* If CRLF is a newline &amp; */
        start_offset &lt; subject_length - 1 &amp;&amp;    /* we are at CRLF, */
        subject[start_offset] == '\r' &amp;&amp;
        subject[start_offset + 1] == '\n')
@ -417,7 +429,7 @@ for (;;)
    printf("%2d: %.*s\n", i, (int)substring_length, (char *)substring_start);
    }

-  if (namecount &lt;= 0) printf("No named substrings\n"); else
+  if (namecount == 0) printf("No named substrings\n"); else
    {
    PCRE2_SPTR tabptr = name_table;
    printf("Named substrings\n");
--- a/pcre2/doc/html/pcre2grep.html
+++ b/pcre2/doc/html/pcre2grep.html
@ -22,11 +22,12 @@ please consult the man page, in case the conversion went wrong.
 <li><a name="TOC7" href="#SEC7">NEWLINES</a>
 <li><a name="TOC8" href="#SEC8">OPTIONS COMPATIBILITY</a>
 <li><a name="TOC9" href="#SEC9">OPTIONS WITH DATA</a>
-<li><a name="TOC10" href="#SEC10">MATCHING ERRORS</a>
-<li><a name="TOC11" href="#SEC11">DIAGNOSTICS</a>
-<li><a name="TOC12" href="#SEC12">SEE ALSO</a>
-<li><a name="TOC13" href="#SEC13">AUTHOR</a>
-<li><a name="TOC14" href="#SEC14">REVISION</a>
+<li><a name="TOC10" href="#SEC10">CALLING EXTERNAL SCRIPTS</a>
+<li><a name="TOC11" href="#SEC11">MATCHING ERRORS</a>
+<li><a name="TOC12" href="#SEC12">DIAGNOSTICS</a>
+<li><a name="TOC13" href="#SEC13">SEE ALSO</a>
+<li><a name="TOC14" href="#SEC14">AUTHOR</a>
+<li><a name="TOC15" href="#SEC15">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SYNOPSIS</a><br>
 <P>
@ -79,11 +80,19 @@ span line boundaries. What defines a line boundary is controlled by the
 </P>
 <P>
 The amount of memory used for buffering files that are being scanned is
-controlled by a parameter that can be set by the <b>--buffer-size</b> option.
-The default value for this parameter is specified when <b>pcre2grep</b> is
-built, with the default default being 20K. A block of memory three times this
-size is used (to allow for buffering "before" and "after" lines). An error
-occurs if a line overflows the buffer.
+controlled by parameters that can be set by the <b>--buffer-size</b> and
+<b>--max-buffer-size</b> options. The first of these sets the size of buffer
+that is obtained at the start of processing. If an input file contains very
+long lines, a larger buffer may be needed; this is handled by automatically
+extending the buffer, up to the limit specified by <b>--max-buffer-size</b>. The
+default values for these parameters are specified when <b>pcre2grep</b> is
+built, with the default defaults being 20K and 1M respectively. An error occurs
+if a line is too long and the buffer can no longer be expanded.
+</P>
+<P>
+The block of memory that is actually used is three times the "buffer size", to
+allow for buffering "before" and "after" lines. If the buffer size is too
+small, fewer than requested "before" and "after" lines may be output.
 </P>
 <P>
 Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
@ -154,12 +163,13 @@ processing of patterns and file names that start with hyphens.
 </P>
 <P>
 <b>-A</b> <i>number</i>, <b>--after-context=</b><i>number</i>
-Output <i>number</i> lines of context after each matching line. If file names
-and/or line numbers are being output, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is output between each
-group of lines, unless they are in fact contiguous in the input file. The value
-of <i>number</i> is expected to be relatively small. However, <b>pcre2grep</b>
-guarantees to have up to 8K of following text available for context output.
+Output up to <i>number</i> lines of context after each matching line. Fewer
+lines are output if the next match or the end of the file is reached, or if the
+processing buffer size has been set too small. If file names and/or line
+numbers are being output, a hyphen separator is used instead of a colon for the
+context lines. A line containing "--" is output between each group of lines,
+unless they are in fact contiguous in the input file. The value of <i>number</i>
+is expected to be relatively small. When <b>-c</b> is used, <b>-A</b> is ignored.
 </P>
 <P>
 <b>-a</b>, <b>--text</b>
@ -168,12 +178,14 @@ Treat binary files as text. This is equivalent to
 </P>
 <P>
 <b>-B</b> <i>number</i>, <b>--before-context=</b><i>number</i>
-Output <i>number</i> lines of context before each matching line. If file names
-and/or line numbers are being output, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is output between each
-group of lines, unless they are in fact contiguous in the input file. The value
-of <i>number</i> is expected to be relatively small. However, <b>pcre2grep</b>
-guarantees to have up to 8K of preceding text available for context output.
+Output up to <i>number</i> lines of context before each matching line. Fewer
+lines are output if the previous match or the start of the file is within
+<i>number</i> lines, or if the processing buffer size has been set too small. If
+file names and/or line numbers are being output, a hyphen separator is used
+instead of a colon for the context lines. A line containing "--" is output
+between each group of lines, unless they are in fact contiguous in the input
+file. The value of <i>number</i> is expected to be relatively small. When
+<b>-c</b> is used, <b>-B</b> is ignored.
 </P>
 <P>
 <b>--binary-files=</b><i>word</i>
@ -190,8 +202,9 @@ return code.
 </P>
 <P>
 <b>--buffer-size=</b><i>number</i>
-Set the parameter that controls how much memory is used for buffering files
-that are being scanned.
+Set the parameter that controls how much memory is obtained at the start of
+processing for buffering files that are being scanned. See also
+<b>--max-buffer-size</b> below.
 </P>
 <P>
 <b>-C</b> <i>number</i>, <b>--context=</b><i>number</i>
@ -201,14 +214,16 @@ This is equivalent to setting both <b>-A</b> and <b>-B</b> to the same value.
 <P>
 <b>-c</b>, <b>--count</b>
 Do not output lines from the files that are being scanned; instead output the
-number of matches (or non-matches if <b>-v</b> is used) that would otherwise
-have caused lines to be shown. By default, this count is the same as the number
-of suppressed lines, but if the <b>-M</b> (multiline) option is used (without
-<b>-v</b>), there may be more suppressed lines than the number of matches.
+number of lines that would have been shown, either because they matched, or, if
+<b>-v</b> is set, because they failed to match. By default, this count is
+exactly the same as the number of lines that would have been output, but if the
+<b>-M</b> (multiline) option is used (without <b>-v</b>), there may be more
+suppressed lines than the count (that is, the number of matches).
 <br>
 <br>
 If no lines are selected, the number zero is output. If several files are
-being scanned, a count is output for each of them. However, if the
+being scanned, a count is output for each of them and the <b>-t</b> option can
+be used to cause a total to be output at the end. However, if the
 <b>--files-with-matches</b> option is also used, only those files whose counts
 are greater than zero are listed. When <b>-c</b> is used, the <b>-A</b>,
 <b>-B</b>, and <b>-C</b> options are ignored.
@ -230,12 +245,23 @@ because <b>pcre2grep</b> has to search for all possible matches in a line, not
 just one, in order to colour them all.
 <br>
 <br>
-The colour that is used can be specified by setting the environment variable
-PCRE2GREP_COLOUR or PCRE2GREP_COLOR. The value of this variable should be a
-string of two numbers, separated by a semicolon. They are copied directly into
-the control string for setting colour on a terminal, so it is your
-responsibility to ensure that they make sense. If neither of the environment
-variables is set, the default is "1;31", which gives red.
+The colour that is used can be specified by setting one of the environment
+variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, PCREGREP_COLOUR, or
+PCREGREP_COLOR, which are checked in that order. If none of these are set,
+<b>pcre2grep</b> looks for GREP_COLORS or GREP_COLOR (in that order). The value
+of the variable should be a string of two numbers, separated by a semicolon,
+except in the case of GREP_COLORS, which must start with "ms=" or "mt="
+followed by two semicolon-separated colours, terminated by the end of the
+string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is
+ignored, and GREP_COLOR is checked.
+<br>
+<br>
+If the string obtained from one of the above variables contains any characters
+other than semicolon or digits, the setting is ignored and the default colour
+is used. The string is copied directly into the control string for setting
+colour on a terminal, so it is your responsibility to ensure that the values
+make sense. If no relevant environment variable is set, the default is "1;31",
+which gives red.
 </P>
 <P>
 <b>-D</b> <i>action</i>, <b>--devices=</b><i>action</i>
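The colour lookup described in the hunk above can be sketched as follows. This is a minimal illustration in Python, not the pcre2grep source: the function name pick_colour and the dictionary-based environment are hypothetical. Two points are assumptions: an empty value is treated as unusable, and an invalid value found in an earlier variable falls back to the default instead of moving on to the next variable.

```python
DEFAULT_COLOUR = "1;31"  # documented default, which gives red

# Checked in this order, as stated in the text above.
PCRE_VARS = ("PCRE2GREP_COLOUR", "PCRE2GREP_COLOR",
             "PCREGREP_COLOUR", "PCREGREP_COLOR")


def pick_colour(env):
    """Return the colour string to use for the given environment mapping."""
    value = None
    for name in PCRE_VARS:
        if name in env:
            value = env[name]
            break

    if value is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors[:3] in ("ms=", "mt="):
            # The colours end at the first colon or at the end of the string.
            value = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is absent or ignored, so GREP_COLOR is checked.
            value = env.get("GREP_COLOR")

    # Assumption: an empty string is treated like an invalid setting.
    if not value or any(ch not in "0123456789;" for ch in value):
        return DEFAULT_COLOUR
    return value
```

Calling pick_colour(dict(os.environ)) would apply the same rules to the real environment.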
@ -320,18 +346,18 @@ files; it does not apply to patterns specified by any of the <b>--include</b> or
 </P>
 <P>
 <b>-f</b> <i>filename</i>, <b>--file=</b><i>filename</i>
-Read patterns from the file, one per line, and match them against
-each line of input. What constitutes a newline when reading the file is the
-operating system's default. The <b>--newline</b> option has no effect on this
-option. Trailing white space is removed from each line, and blank lines are
-ignored. An empty file contains no patterns and therefore matches nothing. See
-also the comments about multiple patterns versus a single pattern with
-alternatives in the description of <b>-e</b> above.
+Read patterns from the file, one per line, and match them against each line of
+input. What constitutes a newline when reading the file is the operating
+system's default. The <b>--newline</b> option has no effect on this option.
+Trailing white space is removed from each line, and blank lines are ignored. An
+empty file contains no patterns and therefore matches nothing. See also the
+comments about multiple patterns versus a single pattern with alternatives in
+the description of <b>-e</b> above.
 <br>
 <br>
-If this option is given more than once, all the specified files are
-read. A data line is output if any of the patterns match it. A file name can
-be given as "-" to refer to the standard input. When <b>-f</b> is used, patterns
+If this option is given more than once, all the specified files are read. A
+data line is output if any of the patterns match it. A file name can be given
+as "-" to refer to the standard input. When <b>-f</b> is used, patterns
 specified on the command line using <b>-e</b> may also be present; they are
 tested before the file's patterns. However, no other pattern is taken from the
 command line; all arguments are treated as the names of paths to be searched.
@ -501,19 +527,27 @@ There are no short forms for these options. The default settings are specified
 when the PCRE2 library is compiled, with the default default being 10 million.
 </P>
 <P>
+<b>--max-buffer-size=</b><i>number</i>
+This limits the expansion of the processing buffer, whose initial size can be
+set by <b>--buffer-size</b>. The maximum buffer size is silently forced to be no
+smaller than the starting buffer size.
+</P>
+<P>
 <b>-M</b>, <b>--multiline</b>
-Allow patterns to match more than one line. When this option is given, patterns
-may usefully contain literal newline characters and internal occurrences of ^
-and $ characters. The output for a successful match may consist of more than
-one line. The first is the line in which the match started, and the last is the
-line in which the match ended. If the matched string ends with a newline
-sequence the output ends at the end of that line.
+Allow patterns to match more than one line. When this option is set, the PCRE2
+library is called in "multiline" mode. This allows a matched string to extend
+past the end of a line and continue on one or more subsequent lines. Patterns
+used with <b>-M</b> may usefully contain literal newline characters and internal
+occurrences of ^ and $ characters. The output for a successful match may
+consist of more than one line. The first line is the line in which the match
+started, and the last line is the line in which the match ended. If the matched
+string ends with a newline sequence, the output ends at the end of that line.
+If <b>-v</b> is set, none of the lines in a multi-line match are output. Once a
+match has been handled, scanning restarts at the beginning of the line after
+the one in which the match ended.
 <br>
 <br>
-When this option is set, the PCRE2 library is called in "multiline" mode.
-However, <b>pcre2grep</b> still processes the input line by line. The difference
-is that a matched string may extend past the end of a line and continue on
-one or more subsequent lines. The newline sequence must be matched as part of
+The newline sequence that separates multiple lines must be matched as part of
 the pattern. For example, to find the phrase "regular expression" in a file
 where "regular" might be at the end of a line and "expression" at the start of
 the next line, you could use this command:
@ -526,11 +560,8 @@ well as possibly handling a two-character newline sequence.
 <br>
 <br>
 There is a limit to the number of lines that can be matched, imposed by the way
-that <b>pcre2grep</b> buffers the input file as it scans it. However,
-<b>pcre2grep</b> ensures that at least 8K characters or the rest of the file
-(whichever is the shorter) are available for forward matching, and similarly
-the previous 8K characters (or all the previous characters, if fewer than 8K)
-are guaranteed to be available for lookbehind assertions. The <b>-M</b> option
+that <b>pcre2grep</b> buffers the input file as it scans it. With a sufficiently
+large processing buffer, this should not be a problem, but the <b>-M</b> option
 does not work when input is read line by line (see <b>--line-buffered</b>).
 </P>
 <P>
@ -578,12 +609,13 @@ It should never be needed in normal use.
 Show only the part of the line that matched a pattern instead of the whole
 line. In this mode, no context is shown. That is, the <b>-A</b>, <b>-B</b>, and
 <b>-C</b> options are ignored. If there is more than one match in a line, each
-of them is shown separately. If <b>-o</b> is combined with <b>-v</b> (invert the
-sense of the match to find non-matching lines), no output is generated, but the
-return code is set appropriately. If the matched portion of the line is empty,
-nothing is output unless the file name or line number are being printed, in
-which case they are shown on an otherwise empty line. This option is mutually
-exclusive with <b>--file-offsets</b> and <b>--line-offsets</b>.
+of them is shown separately, on a separate line of output. If <b>-o</b> is
+combined with <b>-v</b> (invert the sense of the match to find non-matching
+lines), no output is generated, but the return code is set appropriately. If
+the matched portion of the line is empty, nothing is output unless the file
+name or line number are being printed, in which case they are shown on an
+otherwise empty line. This option is mutually exclusive with
+<b>--file-offsets</b> and <b>--line-offsets</b>.
 </P>
 <P>
 <b>-o</b><i>number</i>, <b>--only-matching</b>=<i>number</i>
@ -597,10 +629,11 @@ capturing parentheses do not exist in the pattern, or were not set in the
 match, nothing is output unless the file name or line number are being output.
 <br>
 <br>
-If this option is given multiple times, multiple substrings are output, in the
-order the options are given. For example, -o3 -o1 -o3 causes the substrings
-matched by capturing parentheses 3 and 1 and then 3 again to be output. By
-default, there is no separator (but see the next option).
+If this option is given multiple times, multiple substrings are output for each
+match, in the order the options are given, and all on one line. For example,
+-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
+then 3 again to be output. By default, there is no separator (but see the next
+option).
 </P>
 <P>
 <b>--om-separator</b>=<i>text</i>
@ -631,6 +664,18 @@ quietly skipped. However, the return code is still 2, even if matches were
 found in other files.
 </P>
 <P>
+<b>-t</b>, <b>--total-count</b>
+This option is useful when scanning more than one file. If used on its own,
+<b>-t</b> suppresses all output except for a grand total number of matching
+lines (or non-matching lines if <b>-v</b> is used) in all the files. If <b>-t</b>
+is used with <b>-c</b>, a grand total is output except when the previous output
+is just one line. In other words, it is not output when just one file's count
+is listed. If file names are being output, the grand total is preceded by
+"TOTAL:". Otherwise, it appears as just another number. The <b>-t</b> option is
+ignored when used with <b>-L</b> (list files without matches), because the grand
+total would always be zero.
+</P>
+<P>
 <b>-u</b>, <b>--utf-8</b>
 Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
 with UTF-8 support. All patterns (including those for any <b>--exclude</b> and
@ -658,11 +703,12 @@ specified by any of the <b>--include</b> or <b>--exclude</b> options.
 <P>
 <b>-x</b>, <b>--line-regex</b>, <b>--line-regexp</b>
 Force the patterns to be anchored (each must start matching at the beginning of
-a line) and in addition, require them to match entire lines. This is equivalent
-to having ^ and $ characters at the start and end of each alternative top-level
-branch in every pattern. This option applies only to the patterns that are
-matched against the contents of files; it does not apply to patterns specified
-by any of the <b>--include</b> or <b>--exclude</b> options.
+a line) and in addition, require them to match entire lines. In multiline mode
+the match may be more than one line. This is equivalent to having \A and \Z
+assertions at the start and end of each alternative top-level branch in every
+pattern. This option applies only to the patterns that are matched against the
+contents of files; it does not apply to patterns specified by any of the
+<b>--include</b> or <b>--exclude</b> options.
 </P>
 <br><a name="SEC6" href="#TOC1">ENVIRONMENT VARIABLES</a><br>
 <P>
@ -735,7 +781,57 @@ The exceptions to the above are the <b>--colour</b> (or <b>--color</b>) and
 options does have data, it must be given in the first form, using an equals
 character. Otherwise <b>pcre2grep</b> will assume that it has no data.
 </P>
-<br><a name="SEC10" href="#TOC1">MATCHING ERRORS</a><br>
+<br><a name="SEC10" href="#TOC1">CALLING EXTERNAL SCRIPTS</a><br>
+<P>
+<b>pcre2grep</b> has, by default, support for calling external programs or
+scripts during matching by making use of PCRE2's callout facility. However,
+this support can be disabled when <b>pcre2grep</b> is built. You can find out
+whether your binary has support for callouts by running it with the <b>--help</b>
+option. If the support is not enabled, all callouts in patterns are ignored by
+<b>pcre2grep</b>.
+</P>
+<P>
+A callout in a PCRE2 pattern is of the form (?C&#60;arg&#62;) where the argument is
+either a number or a quoted string (see the
+<a href="pcre2callout.html"><b>pcre2callout</b></a>
+documentation for details). Numbered callouts are ignored by <b>pcre2grep</b>.
+String arguments are parsed as a list of substrings separated by pipe (vertical
+bar) characters. The first substring must be an executable name, with the
+following substrings specifying arguments:
+<pre>
+  executable_name|arg1|arg2|...
+</pre>
+Any substring (including the executable name) may contain escape sequences
+started by a dollar character: $&#60;digits&#62; or ${&#60;digits&#62;} is replaced by the
+captured substring of the given decimal number, which must be greater than
+zero. If the number is greater than the number of capturing substrings, or if
+the capture is unset, the replacement is empty.
+</P>
+<P>
+Any other character is substituted by itself. In particular, $$ is replaced by
+a single dollar and $| is replaced by a pipe character. Here is an example:
+<pre>
+  echo -e "abcde\n12345" | pcre2grep \
+    '(?x)(.)(..(.))
+    (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
+
+  Output:
+
+    Arg1: [a] [bcd] [d] Arg2: |a| ()
+    abcde
+    Arg1: [1] [234] [4] Arg2: |1| ()
+    12345
+</pre>
+The parameters for the <b>execv()</b> system call that is used to run the
+program or script are zero-terminated strings. This means that binary zero
+characters in the callout argument will cause premature termination of their
+substrings, and therefore should not be present. Any syntax errors in the
+string (for example, a dollar not followed by another character) cause the
+callout to be ignored. If running the program fails for any reason (including
+the non-existence of the executable), a local matching failure occurs and the
+matcher backtracks in the normal way.
+</P>
+<br><a name="SEC11" href="#TOC1">MATCHING ERRORS</a><br>
 <P>
 It is possible to supply a regular expression that takes a very long time to
 fail to match certain lines. Such patterns normally involve nested indefinite
@ -751,7 +847,7 @@ overall resource limit; there is a second option called <b>--recursion-limit</b>
 that sets a limit on the amount of memory (usually stack) that is used (see the
 discussion of these options above).
 </P>
-<br><a name="SEC11" href="#TOC1">DIAGNOSTICS</a><br>
+<br><a name="SEC12" href="#TOC1">DIAGNOSTICS</a><br>
 <P>
 Exit status is 0 if any matches were found, 1 if no matches were found, and 2
 for syntax errors, overlong lines, non-existent or inaccessible files (even if
@ -759,11 +855,11 @@ matches were found in other files) or too many matching errors. Using the
 <b>-s</b> option to suppress error messages about inaccessible files does not
 affect the return code.
 </P>
-<br><a name="SEC12" href="#TOC1">SEE ALSO</a><br>
+<br><a name="SEC13" href="#TOC1">SEE ALSO</a><br>
 <P>
-<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3).
+<b>pcre2pattern</b>(3), <b>pcre2syntax</b>(3), <b>pcre2callout</b>(3).
 </P>
-<br><a name="SEC13" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC14" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -772,11 +868,11 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC14" href="#TOC1">REVISION</a><br>
+<br><a name="SEC15" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 03 January 2015
+Last updated: 31 December 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
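The dollar substitution rules in the CALLING EXTERNAL SCRIPTS section added above can be sketched as a small parser. This is an illustrative Python sketch, not the pcre2grep implementation: the name expand_callout and the list-of-captures interface are hypothetical. The text does not say how ${} without digits or a capture number of zero are handled, so treating both as syntax errors is an assumption.

```python
DIGITS = "0123456789"


def expand_callout(arg, captures):
    """Split a callout string on pipes and expand $n / ${n} references.

    captures[0] holds capture group 1; None marks an unset group.
    Returns the argument list, or None if the string has a syntax error
    (in which case the text says the callout is ignored).
    """
    argv, current = [], []
    i, n = 0, len(arg)
    while i < n:
        ch = arg[i]
        if ch == "|":                      # unescaped pipe ends a substring
            argv.append("".join(current))
            current = []
            i += 1
        elif ch == "$":
            i += 1
            if i >= n:                     # dollar not followed by anything
                return None
            braced = arg[i] == "{"
            if braced:
                i += 1
            start = i
            while i < n and arg[i] in DIGITS:
                i += 1
            digits = arg[start:i]
            if braced:
                # Assumption: "${" must hold digits and be closed by "}".
                if not digits or i >= n or arg[i] != "}":
                    return None
                i += 1
            if digits:
                number = int(digits)
                if number == 0:            # assumption: zero is an error
                    return None
                if number <= len(captures) and captures[number - 1] is not None:
                    current.append(captures[number - 1])
                # Too-large or unset captures expand to nothing.
            else:
                current.append(arg[i])     # "$$" gives "$", "$|" gives "|"
                i += 1
        else:
            current.append(ch)
            i += 1
    argv.append("".join(current))
    return argv
```

With the callout string from the documented example and the captures for the line "abcde" (group 4 still unset when the callout runs), the sketch yields the executable name followed by "Arg1: [a] [bcd] [d]" and "Arg2: |a| ()", which agrees with the output shown in the section.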
--- a/pcre2/doc/html/pcre2jit.html
+++ b/pcre2/doc/html/pcre2jit.html
@ -86,6 +86,13 @@ results. The returned value from <b>pcre2_jit_compile()</b> is zero on success,
 or a negative error code.
 </P>
 <P>
+There is a limit to the size of pattern that JIT supports, imposed by the size
+of machine stack that it uses. The exact rules are not documented because they
+may change at any time, in particular, when new optimizations are introduced.
+If a pattern is too big, a call to <b>pcre2_jit_compile()</b> returns
+PCRE2_ERROR_NOMEMORY.
+</P>
+<P>
 PCRE2_JIT_COMPLETE requests the JIT compiler to generate code for complete
 matches. If you want to run partial matches using the PCRE2_PARTIAL_HARD or
 PCRE2_PARTIAL_SOFT options of <b>pcre2_match()</b>, you should set one or both
@ -145,6 +152,10 @@ PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
 PCRE2_ANCHORED option is not supported at match time.
 </P>
 <P>
+If the PCRE2_NO_JIT option is passed to <b>pcre2_match()</b> it disables the
+use of JIT, forcing matching by the interpreter code.
+</P>
+<P>
 The only unsupported pattern items are \C (match a single data unit) when
 running in a UTF mode, and a callout immediately before an assertion condition
 in a conditional group.
@ -224,8 +235,14 @@ whether a match operation was executed by JIT or by the interpreter.
 </P>
 <P>
 You may safely use the same JIT stack for more than one pattern (either by
-assigning directly or by callback), as long as the patterns are all matched
-sequentially in the same thread. In a multithread application, if you do not
+assigning directly or by callback), as long as the patterns are matched
+sequentially in the same thread. Currently, the only way to set up
+non-sequential matches in one thread is to use callouts: if a callout function
+starts another match, that match must use a different JIT stack to the one used
+for currently suspended match(es).
+</P>
+<P>
+In a multithread application, if you do not
 specify a JIT stack, or if you assign or pass back NULL from a callback, that
 is thread-safe, because each thread has its own machine stack. However, if you
 assign or pass back a non-NULL JIT stack, this must be a different stack for
@ -390,7 +407,7 @@ The fast path function is called <b>pcre2_jit_match()</b>, and it takes exactly
 the same arguments as <b>pcre2_match()</b>. The return values are also the same,
 plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
 requested that was not compiled. Unsupported option bits (for example,
-PCRE2_ANCHORED) are ignored.
+PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
 </P>
 <P>
 When you call <b>pcre2_match()</b>, as well as testing for invalid options, a
@ -419,9 +436,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC13" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 27 November 2014
+Last updated: 05 June 2016
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2limits.html
+++ b/pcre2/doc/html/pcre2limits.html
@ -32,6 +32,11 @@ However, the speed of execution is slower. In the 32-bit library, the internal
 linkage size is always 4.
 </P>
 <P>
+The maximum length of a source pattern string is essentially unlimited; it is
+the largest number a PCRE2_SIZE variable can hold. However, the program that
+calls <b>pcre2_compile()</b> can specify a smaller limit.
+</P>
+<P>
 The maximum length (in code units) of a subject string is one less than the
 largest number a PCRE2_SIZE variable can hold. PCRE2_SIZE is an unsigned
 integer type, usually defined as size_t. Its maximum value (that is
@ -50,17 +55,16 @@ documentation.
 All values in repeating quantifiers must be less than 65536.
 </P>
 <P>
+The maximum length of a lookbehind assertion is 65535 characters.
+</P>
+<P>
 There is no limit to the number of parenthesized subpatterns, but there can be
 no more than 65535 capturing subpatterns. There is, however, a limit to the
 depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
-order to limit the amount of system stack used at compile time. The limit can
-be specified when PCRE2 is built; the default is 250.
-</P>
-<P>
-There is a limit to the number of forward references to subsequent subpatterns
-of around 200,000. Repeated forward references with fixed upper limits, for
-example, (?2){0,100} when subpattern number 2 is to the right, are included in
-the count. There is no limit to the number of backward references.
+order to limit the amount of system stack used at compile time. The default
+limit can be specified when PCRE2 is built; the default default is 250. An
+application can change this limit by calling pcre2_set_parens_nest_limit() to
+set the limit in a compile context.
 </P>
 <P>
 The maximum length of name for a named subpattern is 32 code units, and the
@ -68,7 +72,12 @@ maximum number of named subpatterns is 10000.
 </P>
 <P>
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
-is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
+is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
+32-bit libraries.
+</P>
+<P>
+The maximum length of a string argument to a callout is the largest number a
+32-bit unsigned integer can hold.
 </P>
 <br><b>
 AUTHOR
@ -85,9 +94,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 25 November 2014
+Last updated: 26 October 2016
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2pattern.html
+++ b/pcre2/doc/html/pcre2pattern.html
@ -190,6 +190,12 @@ be less than the value set (or defaulted) by the caller of <b>pcre2_match()</b>
 for it to have any effect. In other words, the pattern writer can lower the
 limits set by the programmer, but not raise them. If there is more than one
 setting of one of these limits, the lower value is used.
+</P>
+<P>
+The match limit is used (but in a different way) when JIT is being used, but it
+is not relevant, and is ignored, when matching with <b>pcre2_dfa_match()</b>.
+However, the recursion limit is relevant for DFA matching, which does use some
+function recursion, in particular, for recursions within the pattern.
 <a name="newlines"></a></P>
 <br><b>
 Newline conventions
@ -379,32 +385,31 @@ case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \cA to \cZ become hex 01 to hex 1A (A is 41, Z is 5A),
 but \c{ becomes hex 3B ({ is 7B), and \c; becomes hex 7B (; is 3B). If the
 code unit following \c has a value less than 32 or greater than 126, a
-compile-time error occurs. This locks out non-printable ASCII characters in all
-modes.
+compile-time error occurs.
 </P>
 <P>
 When PCRE2 is compiled in EBCDIC mode, \a, \e, \f, \n, \r, and \t
 generate the appropriate EBCDIC code values. The \c escape is processed
 as specified for Perl in the <b>perlebcdic</b> document. The only characters
 that are allowed after \c are A-Z, a-z, or one of @, [, \, ], ^, _, or ?. Any
-other character provokes a compile-time error. The sequence \@ encodes
-character code 0; the letters (in either case) encode characters 1-26 (hex 01
-to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
-\? becomes either 255 (hex FF) or 95 (hex 5F).
+other character provokes a compile-time error. The sequence \c@ encodes
+character code 0; after \c the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \c? becomes either 255 (hex FF) or 95 (hex 5F).
 </P>
 <P>
-Thus, apart from \?, these escapes generate the same character code values as
+Thus, apart from \c?, these escapes generate the same character code values as
 they do in an ASCII environment, though the meanings of the values mostly
-differ. For example, \G always generates code value 7, which is BEL in ASCII
+differ. For example, \cG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 </P>
 <P>
-The sequence \? generates DEL (127, hex 7F) in an ASCII environment, but
+The sequence \c? generates DEL (127, hex 7F) in an ASCII environment, but
 because 127 is not a control character in EBCDIC, Perl makes it generate the
 APC character. Unfortunately, there are several variants of EBCDIC. In most of
 them the APC character has the value 255 (hex FF), but in the one Perl calls
 POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
-values, PCRE2 makes \? generate 95; otherwise it generates 255.
+values, PCRE2 makes \c? generate 95; otherwise it generates 255.
 </P>
 <P>
 After \0 up to two further octal digits are read. If there are fewer than two
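The ASCII-mode \c rule quoted in the hunk above (convert a lower case letter to upper case, then invert bit 6, hex 40) can be checked with a few lines of Python. This is only a sketch of the arithmetic; the function name control_escape is hypothetical, and the EBCDIC behaviour is not modelled.

```python
def control_escape(ch):
    """Return the code value produced by \\c followed by ch (ASCII mode)."""
    code = ord(ch)
    if code < 32 or code > 126:
        # The text says this case is a compile-time error.
        raise ValueError("character after \\c must be printable ASCII")
    if "a" <= ch <= "z":
        code = ord(ch.upper())   # lower case letters are upper-cased first
    return code ^ 0x40           # invert bit 6
```

For example, control_escape("A") gives 0x01, control_escape("{") gives 0x3B, and control_escape("?") gives 0x7F, matching the values stated in the text.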
@ -526,9 +531,9 @@ by code point, as described in the previous section.
 Absolute and relative back references
 </b><br>
 <P>
-The sequence \g followed by an unsigned or a negative number, optionally
-enclosed in braces, is an absolute or relative back reference. A named back
-reference can be coded as \g{name}. Back references are discussed
+The sequence \g followed by a signed or unsigned number, optionally enclosed
+in braces, is an absolute or relative back reference. A named back reference
+can be coded as \g{name}. Back references are discussed
 <a href="#backreferences">later,</a>
 following the discussion of
 <a href="#subpattern">parenthesized subpatterns.</a>
@ -669,8 +674,8 @@ This is an example of an "atomic group", details of which are given
 This particular group matches either the two-character sequence CR followed by
 LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
 U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
-line, U+0085). The two-character sequence is treated as a single unit that
-cannot be split.
+line, U+0085). Because this is an atomic group, the two-character sequence is
+treated as a single unit that cannot be split.
 </P>
 <P>
 In other modes, two additional characters whose codepoints are greater than 255
@ -736,6 +741,8 @@ Those that are not part of an identified script are lumped together as
 "Common". The current list of scripts is:
 </P>
 <P>
+Ahom,
+Anatolian_Hieroglyphs,
 Arabic,
 Armenian,
 Avestan,
@ -776,6 +783,7 @@ Gurmukhi,
 Han,
 Hangul,
 Hanunoo,
+Hatran,
 Hebrew,
 Hiragana,
 Imperial_Aramaic,
@ -812,12 +820,14 @@ Miao,
 Modi,
 Mongolian,
 Mro,
+Multani,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
 Nko,
 Ogham,
 Ol_Chiki,
+Old_Hungarian,
 Old_Italic,
 Old_North_Arabian,
 Old_Permic,
@ -839,6 +849,7 @@ Saurashtra,
 Sharada,
 Shavian,
 Siddham,
+SignWriting,
 Sinhala,
 Sora_Sompeng,
 Sundanese,
@ -1180,6 +1191,16 @@ when the <i>startoffset</i> argument of <b>pcre2_match()</b> is non-zero. The
 PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is set.
 </P>
 <P>
+When the newline convention (see
+<a href="#newlines">"Newline conventions"</a>
+below) recognizes the two-character sequence CRLF as a newline, this is
+preferred, even if the single characters CR and LF are also recognized as
+newlines. For example, if the newline convention is "any", a multiline mode
+circumflex matches before "xyz" in the string "abc\r\nxyz" rather than after
+CR, even though CR on its own is a valid newline. (It also matches at the very
+start of the string, of course.)
+</P>
+<P>
 Note that the sequences \A, \Z, and \z can be used to match the start and
 end of the subject in both modes, and if all branches of a pattern start with
 \A it is always anchored, whether or not PCRE2_MULTILINE is set.
@ -1230,20 +1251,32 @@ with \C in UTF-8 or UTF-16 mode means that the rest of the string may start
 with a malformed UTF character. This has undefined results, because PCRE2
 assumes that it is matching character by character in a valid UTF string (by
 default it checks the subject string's validity at the start of processing
-unless the PCRE2_NO_UTF_CHECK option is used). An application can lock out the
-use of \C by setting the PCRE2_NEVER_BACKSLASH_C option.
+unless the PCRE2_NO_UTF_CHECK option is used).
+</P>
+<P>
+An application can lock out the use of \C by setting the
+PCRE2_NEVER_BACKSLASH_C option when compiling a pattern. It is also possible to
+build PCRE2 with the use of \C permanently disabled.
 </P>
 <P>
 PCRE2 does not allow \C to appear in lookbehind assertions
 <a href="#lookbehind">(described below)</a>
-in a UTF mode, because this would make it impossible to calculate the length of
-the lookbehind.
+in UTF-8 or UTF-16 modes, because this would make it impossible to calculate
+the length of the lookbehind. Neither the alternative matching function
+<b>pcre2_dfa_match()</b> nor the JIT optimizer supports \C in these UTF modes.
+The former gives a match-time error; the latter fails to optimize and so the
+match is always run using the interpreter.
+</P>
+<P>
+In the 32-bit library, however, \C is always supported (when not explicitly
+locked out) because it always matches a single code unit, whether or not UTF-32
+is specified.
 </P>
 <P>
 In general, the \C escape sequence is best avoided. However, one way of using
-it that avoids the problem of malformed UTF characters is to use a lookahead to
-check the length of the next character, as in this pattern, which could be used
-with a UTF-8 string (ignore white space and line breaks):
+it that avoids the problem of malformed UTF-8 or UTF-16 characters is to use a
+lookahead to check the length of the next character, as in this pattern, which
+could be used with a UTF-8 string (ignore white space and line breaks):
 <pre>
  (?| (?=[\x00-\x7f])(\C) |
      (?=[\x80-\x{7ff}])(\C)(\C) |
@ -1298,42 +1331,6 @@ whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A
 class such as [^a] always matches one of these characters.
 </P>
 <P>
-The minus (hyphen) character can be used to specify a range of characters in a
-character class. For example, [d-m] matches any letter between d and m,
-inclusive. If a minus character is required in a class, it must be escaped with
-a backslash or appear in a position where it cannot be interpreted as
-indicating a range, typically as the first or last character in the class, or
-immediately after a range. For example, [b-d-z] matches letters in the range b
-to d, a hyphen character, or z.
-</P>
-<P>
-It is not possible to have the literal character "]" as the end character of a
-range. A pattern such as [W-]46] is interpreted as a class of two characters
-("W" and "-") followed by a literal string "46]", so it would match "W46]" or
-"-46]". However, if the "]" is escaped with a backslash it is interpreted as
-the end of range, so [W-\]46] is interpreted as a class containing a range
-followed by two other characters. The octal or hexadecimal representation of
-"]" can also be used to end a range.
-</P>
-<P>
-An error is generated if a POSIX character class (see below) or an escape
-sequence other than one that defines a single character appears at a point
-where a range ending character is expected. For example, [z-\xff] is valid,
-but [A-\d] and [A-[:digit:]] are not.
-</P>
-<P>
-Ranges operate in the collating sequence of character values. They can also be
-used for characters specified numerically, for example [\000-\037]. Ranges
-can include any characters that are valid for the current mode.
-</P>
-<P>
-If a range that includes letters is used when caseless matching is set, it
-matches the letters in either case. For example, [W-c] is equivalent to
-[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character
-tables for a French locale are in use, [\xc8-\xcb] matches accented E
-characters in both cases.
-</P>
-<P>
 The character escape sequences \d, \D, \h, \H, \p, \P, \s, \S, \v,
 \V, \w, and \W may appear in a character class, and add the characters that
 they match to the class. For example, [\dABCDEF] matches any hexadecimal
@ -1347,6 +1344,52 @@ are not special inside a character class. Like any other unrecognized escape
 sequences, they cause an error.
 </P>
 <P>
+The minus (hyphen) character can be used to specify a range of characters in a
+character class. For example, [d-m] matches any letter between d and m,
+inclusive. If a minus character is required in a class, it must be escaped with
+a backslash or appear in a position where it cannot be interpreted as
+indicating a range, typically as the first or last character in the class,
+or immediately after a range. For example, [b-d-z] matches letters in the range
+b to d, a hyphen character, or z.
+</P>
+<P>
+Perl treats a hyphen as a literal if it appears before or after a POSIX class
+(see below) or a character type escape such as \d, but gives a warning in
+its warning mode, as this is most likely a user error. As PCRE2 has no facility
+for warning, an error is given in these cases.
+</P>
+<P>
+It is not possible to have the literal character "]" as the end character of a
+range. A pattern such as [W-]46] is interpreted as a class of two characters
+("W" and "-") followed by a literal string "46]", so it would match "W46]" or
+"-46]". However, if the "]" is escaped with a backslash it is interpreted as
+the end of range, so [W-\]46] is interpreted as a class containing a range
+followed by two other characters. The octal or hexadecimal representation of
+"]" can also be used to end a range.
+</P>
+<P>
+Ranges normally include all code points between the start and end characters,
+inclusive. They can also be used for code points specified numerically, for
+example [\000-\037]. Ranges can include any characters that are valid for the
+current mode.
+</P>
+<P>
+There is a special case in EBCDIC environments for ranges whose end points are
+both specified as literal letters in the same case. For compatibility with
+Perl, EBCDIC code points within the range that are not letters are omitted. For
+example, [h-k] matches only four characters, even though the codes for h and k
+are 0x88 and 0x92, a range of 11 code points. However, if the range is
+specified numerically, for example, [\x88-\x92] or [h-\x92], all code points
+are included.
+</P>
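The arithmetic in the EBCDIC paragraph above can be checked with a short sketch. It assumes that Python's built-in `cp037` codec is a reasonable stand-in for the EBCDIC code page the text has in mind; the helper name `ebcdic_range` is hypothetical and is not part of PCRE2.

```python
import string

# Assumption: the "cp037" codec stands in for the EBCDIC code page in question.
def ebcdic_range(first, last, codec="cp037"):
    """Return the end points, all code points between them, and the letters among them."""
    lo = first.encode(codec)[0]
    hi = last.encode(codec)[0]
    code_points = list(range(lo, hi + 1))
    # Keep only code points that decode to a lower case ASCII letter.
    letters = [
        bytes([cp]).decode(codec)
        for cp in code_points
        if bytes([cp]).decode(codec) in string.ascii_lowercase
    ]
    return lo, hi, code_points, letters

lo, hi, code_points, letters = ebcdic_range("h", "k")
print(hex(lo), hex(hi), len(code_points), letters)
```

Under that assumption, the literal range [h-k] corresponds to the `letters` list, while a numerically specified range such as [\x88-\x92] corresponds to the full `code_points` list.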
+<P>
+If a range that includes letters is used when caseless matching is set, it
+matches the letters in either case. For example, [W-c] is equivalent to
+[][\\^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character
+tables for a French locale are in use, [\xc8-\xcb] matches accented E
+characters in both cases.
+</P>
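The equivalence claimed for [W-c] can be explored with a rough sketch. It uses Python's `re` module rather than PCRE2, on the assumption that the two engines agree for this ASCII-only class; the helper name `matching_ascii` is hypothetical.

```python
import re

def matching_ascii(pattern, flags=0):
    """Return the set of ASCII characters that the one-character pattern matches."""
    rx = re.compile(pattern, flags)
    return {chr(c) for c in range(128) if rx.fullmatch(chr(c))}

# The range class and the explicit class from the text, both matched caselessly.
range_class = matching_ascii(r"[W-c]", re.IGNORECASE)
explicit_class = matching_ascii(r"[][\\^_`wxyzabc]", re.IGNORECASE)

print(sorted(range_class))
print(range_class == explicit_class)
```

The check is restricted to code points below 128 on purpose, because caseless matching of non-ASCII characters depends on the engine and on the character tables in use.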
+<P>
 A circumflex can conveniently be used with the upper case character types to
 specify a more restricted set of characters than the matching lower case type.
 For example, the class [^\W_] matches any letter or digit, but not underscore,
@ -1514,13 +1557,8 @@ respectively.
 <P>
 When one of these option changes occurs at top level (that is, not inside
 subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE2
-extracts it into the global options (and it will therefore show up in data
-extracted by the <b>pcre2_pattern_info()</b> function).
-</P>
-<P>
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
 <pre>
  (a(?i)b)c
 </pre>
@ -1649,6 +1687,10 @@ first one in the pattern with the given number. The following pattern matches
 <pre>
  /(?|(abc)|(def))(?1)/
 </pre>
+A relative reference such as (?-1) is no different: it is just a convenient way
+of computing an absolute group number.
+</P>
+<P>
 If a
 <a href="#conditions">condition test</a>
 for a subpattern's having matched refers to a non-unique number, the test is
@ -2051,9 +2093,9 @@ subpattern is possible using named parentheses (see below).
 </P>
 <P>
 Another way of avoiding the ambiguity inherent in the use of digits following a
-backslash is to use the \g escape sequence. This escape must be followed by an
-unsigned number or a negative number, optionally enclosed in braces. These
-examples are all identical:
+backslash is to use the \g escape sequence. This escape must be followed by a
+signed or unsigned number, optionally enclosed in braces. These examples are
+all identical:
 <pre>
  (ring), \1
  (ring), \g1
@ -2061,8 +2103,7 @@ examples are all identical:
 </pre>
 An unsigned number specifies an absolute reference without the ambiguity that
 is present in the older syntax. It is also useful when literal digits follow
-the reference. A negative number is a relative reference. Consider this
-example:
+the reference. A signed number is a relative reference. Consider this example:
 <pre>
  (abc(def)ghi)\g{-1}
 </pre>
@ -2073,6 +2114,11 @@ can be helpful in long patterns, and also in patterns that are created by
 joining together fragments that contain references within themselves.
 </P>
 <P>
+The sequence \g{+1} is a reference to the next capturing subpattern. This kind
+of forward reference can be useful in patterns that repeat. Perl does not
+support the use of + in this way.
+</P>
+<P>
 A back reference matches whatever actually matched the capturing subpattern in
 the current subject string, rather than anything matching the subpattern
 itself (see
@ -2172,6 +2218,14 @@ capturing is carried out only for positive assertions. (Perl sometimes, but not
 always, does do capturing in negative assertions.)
 </P>
 <P>
+WARNING: If a positive assertion containing one or more capturing subpatterns
+succeeds, but failure to match later in the pattern causes backtracking over
+this assertion, the captures within the assertion are reset only if no higher
+numbered captures are already set. This is, unfortunately, a fundamental
+limitation of the current implementation; it may get removed in a future
+reworking.
+</P>
+<P>
 For compatibility with Perl, most assertion subpatterns may be repeated; though
 it makes no sense to assert the same thing several times, the side effect of
 capturing parentheses may occasionally be useful. However, an assertion that
@ -2268,18 +2322,31 @@ match. If there are insufficient characters before the current position, the
 assertion fails.
 </P>
 <P>
-In a UTF mode, PCRE2 does not allow the \C escape (which matches a single code
-unit even in a UTF mode) to appear in lookbehind assertions, because it makes
-it impossible to calculate the length of the lookbehind. The \X and \R
-escapes, which can match different numbers of code units, are also not
-permitted.
+In UTF-8 and UTF-16 modes, PCRE2 does not allow the \C escape (which matches a
+single code unit even in a UTF mode) to appear in lookbehind assertions,
+because it makes it impossible to calculate the length of the lookbehind. The
+\X and \R escapes, which can match different numbers of code units, are never
+permitted in lookbehinds.
 </P>
 <P>
 <a href="#subpatternsassubroutines">"Subroutine"</a>
 calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
-as the subpattern matches a fixed-length string.
-<a href="#recursion">Recursion,</a>
-however, is not supported.
+as the subpattern matches a fixed-length string. However,
+<a href="#recursion">recursion,</a>
+that is, a "subroutine" call into a group that is already active,
+is not supported.
+</P>
+<P>
+Perl does not support back references in lookbehinds. PCRE2 does support them,
+but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
+must not be set, there must be no use of (?| in the pattern (it creates
+duplicate subpattern numbers), and if the back reference is by name, the name
+must be unique. Of course, the referenced subpattern must itself be of fixed
+length. The following pattern matches words containing at least two characters
+that begin and end with the same character:
+<pre>
+   \b(\w)\w++(?&#60;=\1)
+</PRE>
 </P>
 <P>
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
@ -2417,7 +2484,9 @@ Checking for a used subpattern by name
 <P>
 Perl uses the syntax (?(&#60;name&#62;)...) or (?('name')...) to test for a used
 subpattern by name. For compatibility with earlier versions of PCRE1, which had
-this facility before Perl, the syntax (?(name)...) is also recognized.
+this facility before Perl, the syntax (?(name)...) is also recognized. Note,
+however, that undelimited names consisting of the letter R followed by digits
+are ambiguous (see the following section).
 </P>
 <P>
 Rewriting the above example to use a named subpattern gives this:
@ -2432,30 +2501,52 @@ matched.
 Checking for pattern recursion
 </b><br>
 <P>
-If the condition is the string (R), and there is no subpattern with the name R,
-the condition is true if a recursive call to the whole pattern or any
-subpattern has been made. If digits or a name preceded by ampersand follow the
-letter R, for example:
+"Recursion" in this sense refers to any subroutine-like call from one part of
+the pattern to another, whether or not it is actually recursive. See the
+sections entitled
+<a href="#recursion">"Recursive patterns"</a>
+and
+<a href="#subpatternsassubroutines">"Subpatterns as subroutines"</a>
+below for details of recursion and subpattern calls.
+</P>
+<P>
+If a condition is the string (R), and there is no subpattern with the name R,
+the condition is true if matching is currently in a recursion or subroutine
+call to the whole pattern or any subpattern. If digits follow the letter R, and
+there is no subpattern with that name, the condition is true if the most recent
+call is into a subpattern with the given number, which must exist somewhere in
+the overall pattern. This is a contrived example that is equivalent to a+b:
 <pre>
-  (?(R3)...) or (?(R&name)...)
+  ((?(R1)a+|(?1)b))
 </pre>
-the condition is true if the most recent recursion is into a subpattern whose
-number or name is given. This condition does not check the entire recursion
-stack. If the name used in a condition of this kind is a duplicate, the test is
-applied to all subpatterns of the same name, and is true if any one of them is
-the most recent recursion.
+However, in both cases, if there is a subpattern with a matching name, the
+condition tests for its being set, as described in the section above, instead
+of testing for recursion. For example, creating a group with the name R1 by
+adding (?&#60;R1&#62;) to the above pattern completely changes its meaning.
+</P>
+<P>
+If a name preceded by ampersand follows the letter R, for example:
+<pre>
+  (?(R&name)...)
+</pre>
+the condition is true if the most recent recursion is into a subpattern of that
+name (which must exist within the pattern).
+</P>
+<P>
+This condition does not check the entire recursion stack. It tests only the
+current level. If the name used in a condition of this kind is a duplicate, the
+test is applied to all subpatterns of the same name, and is true if any one of
+them is the most recent recursion.
 </P>
 <P>
 At "top level", all these recursion test conditions are false.
-<a href="#recursion">The syntax for recursive patterns</a>
-is described below.
 <a name="subdefine"></a></P>
 <br><b>
 Defining subpatterns for use by reference only
 </b><br>
 <P>
-If the condition is the string (DEFINE), and there is no subpattern with the
-name DEFINE, the condition is always false. In this case, there may be only one
+If the condition is the string (DEFINE), the condition is always false, even if
+there is a group with the name DEFINE. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
 subroutines that can be referenced from elsewhere. (The use of
@ -2489,7 +2580,8 @@ For example:
  (?(VERSION&#62;=10.4)yes|no)
 </pre>
 This pattern matches "yes" if the PCRE2 version is greater or equal to 10.4, or
-"no" otherwise.
+"no" otherwise. The fractional part of the version number may not contain more
+than two digits.
 </P>
 <br><b>
 Assertion conditions
@ -2602,6 +2694,21 @@ parentheses preceding the recursion. In other words, a negative number counts
 capturing parentheses leftwards from the point at which it is encountered.
 </P>
 <P>
+Be aware, however, that if
+<a href="#dupsubpatternnumber">duplicate subpattern numbers</a>
+are in use, relative references refer to the earliest subpattern with the
+appropriate number. Consider, for example:
+<pre>
+  (?|(a)|(b)) (c) (?-2)
+</pre>
+The first two capturing groups (a) and (b) are both numbered 1, and group (c)
+is number 2. When the reference (?-2) is encountered, the second most recently
+opened parentheses has the number 1, but it is the first such group (the (a)
+group) to which the recursion refers. This would be the same if an absolute
+reference (?1) was used. In other words, relative references are just a
+shorthand for computing a group number.
+</P>
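The statement that a relative reference is just shorthand for a group number can be written out as simple arithmetic. This is a sketch of the counting rule only, not PCRE2's implementation; the function name `resolve_relative` is hypothetical, and it assumes that counting starts from the highest group number assigned before the reference.

```python
def resolve_relative(highest_open_group, offset):
    """Turn a relative reference such as (?-2) or (?+1) into an absolute group number.

    highest_open_group: highest capture group number assigned before the reference.
    offset: the signed number in the reference; it must not be zero.
    """
    if offset == 0:
        raise ValueError("a relative reference needs a non-zero signed number")
    if offset < 0:
        # -1 is the most recently opened group, -2 the one before it, and so on.
        number = highest_open_group + offset + 1
    else:
        # +1 is the next group to be opened after the reference.
        number = highest_open_group + offset
    if number < 1:
        raise ValueError("reference points before the first group")
    return number

# (?|(a)|(b)) (c) (?-2): groups 1 and 2 exist, so (?-2) resolves to group 1.
print(resolve_relative(2, -2))
```

Once the number is computed, the reference behaves like the absolute form, which is why (?-2) in the example reaches the (a) group in the same way that (?1) would.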
+<P>
 It is also possible to refer to subsequently opened parentheses, by writing
 references such as (?+2). However, these cannot be recursive because the
 reference is not inside the parentheses that are referenced. They are always
@ -2899,14 +3006,36 @@ remarks apply to the PCRE2 features described in this section.
 </P>
 <P>
 The new verbs make use of what was previously invalid syntax: an opening
-parenthesis followed by an asterisk. They are generally of the form
-(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving
-differently depending on whether or not a name is present. A name is any
-sequence of characters that does not include a closing parenthesis. The maximum
-length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit
-libraries. If the name is empty, that is, if the closing parenthesis
-immediately follows the colon, the effect is as if the colon were not there.
-Any number of these verbs may occur in a pattern.
+parenthesis followed by an asterisk. They are generally of the form (*VERB) or
+(*VERB:NAME). Some verbs take either form, possibly behaving differently
+depending on whether or not a name is present.
+</P>
+<P>
+By default, for compatibility with Perl, a name is any sequence of characters
+that does not include a closing parenthesis. The name is not processed in
+any way, and it is not possible to include a closing parenthesis in the name.
+This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
+is no longer Perl-compatible.
+</P>
+<P>
+When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
+and only an unescaped closing parenthesis terminates the name. However, the
+only backslash items that are permitted are \Q, \E, and sequences such as
+\x{100} that define character code points. Character type escapes such as \d
+are faulted.
+</P>
+<P>
+A closing parenthesis can be included in a name either as \) or between \Q
+and \E. In addition to backslash processing, if the PCRE2_EXTENDED option is
+also set, unescaped whitespace in verb names is skipped, and #-comments are
+recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not
+affect verb names unless PCRE2_ALT_VERBNAMES is also set.
+</P>
+<P>
+The maximum length of a name is 255 in the 8-bit library and 65535 in the
+16-bit and 32-bit libraries. If the name is empty, that is, if the closing
+parenthesis immediately follows the colon, the effect is as if the colon were
+not there. Any number of these verbs may occur in a pattern.
 </P>
 <P>
 Since these verbs are specifically related to backtracking, most of them can be
@ -3323,9 +3452,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC30" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 13 June 2015
+Last updated: 27 December 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2perform.html
+++ b/pcre2/doc/html/pcre2perform.html
@ -12,17 +12,21 @@ This page is part of the PCRE2 HTML documentation. It was generated
 automatically from the original man page. If there is any nonsense in it,
 please consult the man page, in case the conversion went wrong.
 <br>
-<br><b>
-PCRE2 PERFORMANCE
-</b><br>
+<ul>
+<li><a name="TOC1" href="#SEC1">PCRE2 PERFORMANCE</a>
+<li><a name="TOC2" href="#SEC2">COMPILED PATTERN MEMORY USAGE</a>
+<li><a name="TOC3" href="#SEC3">STACK USAGE AT RUN TIME</a>
+<li><a name="TOC4" href="#SEC4">PROCESSING TIME</a>
+<li><a name="TOC5" href="#SEC5">AUTHOR</a>
+<li><a name="TOC6" href="#SEC6">REVISION</a>
+</ul>
+<br><a name="SEC1" href="#TOC1">PCRE2 PERFORMANCE</a><br>
 <P>
 Two aspects of performance are discussed below: memory usage and processing
 time. The way you express your pattern as a regular expression can affect both
 of them.
 </P>
-<br><b>
-COMPILED PATTERN MEMORY USAGE
-</b><br>
+<br><a name="SEC2" href="#TOC1">COMPILED PATTERN MEMORY USAGE</a><br>
 <P>
 Patterns are compiled by PCRE2 into a reasonably efficient interpretive code,
 so that most simple patterns do not use much memory. However, there is one case
@ -75,9 +79,7 @@ pattern. Nevertheless, if the atomic grouping is not a problem and the loss of
 speed is acceptable, this kind of rewriting will allow you to process patterns
 that PCRE2 cannot otherwise handle.
 </P>
-<br><b>
-STACK USAGE AT RUN TIME
-</b><br>
+<br><a name="SEC3" href="#TOC1">STACK USAGE AT RUN TIME</a><br>
 <P>
 When <b>pcre2_match()</b> is used for matching, certain kinds of pattern can
 cause it to use large amounts of the process stack. In some environments the
@ -86,9 +88,7 @@ SIGSEGV. Rewriting your pattern can often help. The
 <a href="pcre2stack.html"><b>pcre2stack</b></a>
 documentation discusses this issue in detail.
 </P>
-<br><b>
-PROCESSING TIME
-</b><br>
+<br><a name="SEC4" href="#TOC1">PROCESSING TIME</a><br>
 <P>
 Certain items in regular expression patterns are processed more efficiently
 than others. It is more efficient to use a character class like [aeiou] than a
@ -177,9 +177,7 @@ appreciable time with strings longer than about 20 characters.
 In many cases, the solution to this kind of performance issue is to use an
 atomic group or a possessive quantifier.
 </P>
-<br><b>
-AUTHOR
-</b><br>
+<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -188,9 +186,7 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><b>
-REVISION
-</b><br>
+<br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
 Last updated: 02 January 2015
 <br>
--- a/pcre2/doc/html/pcre2posix.html
+++ b/pcre2/doc/html/pcre2posix.html
@ -48,7 +48,7 @@ This set of functions provides a POSIX-style API for the PCRE2 regular
 expression 8-bit library. See the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation for a description of PCRE2's native API, which contains much
-additional functionality. There is no POSIX-style wrapper for PCRE2's 16-bit
+additional functionality. There are no POSIX-style wrappers for PCRE2's 16-bit
 and 32-bit libraries.
 </P>
 <P>
@ -67,9 +67,9 @@ POSIX interface often use it, this makes it easier to slot in PCRE2 as a
 replacement library. Other POSIX options are not even defined.
 </P>
 <P>
-There are also some other options that are not defined by POSIX. These have
-been added at the request of users who want to make use of certain
-PCRE2-specific features via the POSIX calling interface.
+There are also some options that are not defined by POSIX. These have been
+added at the request of users who want to make use of certain PCRE2-specific
+features via the POSIX calling interface.
 </P>
 <P>
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
@ -119,11 +119,11 @@ defined POSIX behaviour for REG_NEWLINE (see the following section).
 <pre>
  REG_NOSUB
 </pre>
-The PCRE2_NO_AUTO_CAPTURE option is set when the regular expression is passed
-for compilation to the native function. In addition, when a pattern that is
-compiled with this flag is passed to <b>regexec()</b> for matching, the
-<i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no captured strings
-are returned.
+When a pattern that is compiled with this flag is passed to <b>regexec()</b> for
+matching, the <i>nmatch</i> and <i>pmatch</i> arguments are ignored, and no
+captured strings are returned. Versions of the PCRE2 library prior to 10.22 used
+to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
+because it disables the use of back references.
 <pre>
  REG_UCP
 </pre>
@ -170,7 +170,7 @@ use the contents of the <i>preg</i> structure. If, for example, you pass it to
 This area is not simple, because POSIX and Perl take different views of things.
 It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was
 never intended to be a POSIX engine. The following table lists the different
-possibilities for matching newline characters in PCRE2:
+possibilities for matching newline characters in Perl and PCRE2:
 <pre>
                          Default   Change with

@ -180,7 +180,7 @@ possibilities for matching newline characters in PCRE2:
  $ matches \n in middle     no     PCRE2_MULTILINE
  ^ matches \n in middle     no     PCRE2_MULTILINE
 </pre>
-This is the equivalent table for POSIX:
+This is the equivalent table for a POSIX-compatible pattern matcher:
 <pre>
                          Default   Change with

@ -190,14 +190,18 @@ This is the equivalent table for POSIX:
  $ matches \n in middle     no     REG_NEWLINE
  ^ matches \n in middle     no     REG_NEWLINE
 </pre>
-PCRE2's behaviour is the same as Perl's, except that there is no equivalent for
-PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there is no way to stop
-newline from matching [^a].
+This behaviour is not what happens when PCRE2 is called via its POSIX
+API. By default, PCRE2's behaviour is the same as Perl's, except that there is
+no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there
+is no way to stop newline from matching [^a].
 </P>
 <P>
-The default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
-PCRE2_DOLLAR_ENDONLY, but there is no way to make PCRE2 behave exactly as for
-the REG_NEWLINE action.
+Default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
+PCRE2_DOLLAR_ENDONLY when calling <b>pcre2_compile()</b> directly, but there is
+no way to make PCRE2 behave exactly as for the REG_NEWLINE action. When using
+the POSIX API, passing REG_NEWLINE to PCRE2's <b>regcomp()</b> function
+causes PCRE2_MULTILINE to be passed to <b>pcre2_compile()</b>, and REG_DOTALL
+passes PCRE2_DOTALL. There is no way to pass PCRE2_DOLLAR_ENDONLY.
 </P>
 <br><a name="SEC5" href="#TOC1">MATCHING A PATTERN</a><br>
 <P>
@ -231,19 +235,21 @@ to have a terminating NUL located at <i>string</i> + <i>pmatch[0].rm_eo</i>
 IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
 intended to be portable to other systems. Note that a non-zero <i>rm_so</i> does
 not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
-how it is matched.
+how it is matched. Setting REG_STARTEND and passing <i>pmatch</i> as NULL are
+mutually exclusive; if both are done, the error REG_INVARG is returned.
 </P>
 <P>
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 strings is returned. The <i>nmatch</i> and <i>pmatch</i> arguments of
-<b>regexec()</b> are ignored.
+<b>regexec()</b> are ignored (except possibly as input for REG_STARTEND).
 </P>
 <P>
-If the value of <i>nmatch</i> is zero, or if the value <i>pmatch</i> is NULL,
-no data about any matched strings is returned.
+The value of <i>nmatch</i> may be zero, and the value of <i>pmatch</i> may be NULL
+(unless REG_STARTEND is set); in both these cases no data about any matched
+strings is returned.
 </P>
 <P>
-Otherwise,the portion of the string that was matched, and also any captured
+Otherwise, the portion of the string that was matched, and also any captured
 substrings, are returned via the <i>pmatch</i> argument, which points to an
 array of <i>nmatch</i> structures of type <i>regmatch_t</i>, containing the
 members <i>rm_so</i> and <i>rm_eo</i>. These contain the byte offset to the first
@ -262,9 +268,11 @@ header file, of which REG_NOMATCH is the "expected" failure code.
 The <b>regerror()</b> function maps a non-zero errorcode from either
 <b>regcomp()</b> or <b>regexec()</b> to a printable message. If <i>preg</i> is not
 NULL, the error should have arisen from the use of that structure. A message
-terminated by a binary zero is placed in <i>errbuf</i>. The length of the
-message, including the zero, is limited to <i>errbuf_size</i>. The yield of the
-function is the size of buffer needed to hold the whole message.
+terminated by a binary zero is placed in <i>errbuf</i>. If the buffer is too
+short, only the first <i>errbuf_size</i> - 1 characters of the error message are
+used. The yield of the function is the size of buffer needed to hold the whole
+message, including the terminating zero. This value is greater than
+<i>errbuf_size</i> if the message was truncated.
 </P>
 <br><a name="SEC7" href="#TOC1">MEMORY USAGE</a><br>
 <P>
@ -283,9 +291,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC9" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 October 2014
+Last updated: 31 January 2016
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2sample.html
+++ b/pcre2/doc/html/pcre2sample.html
@ -24,12 +24,11 @@ documentation. If you do not have a copy of the PCRE2 distribution, you can
 save this listing to re-create the contents of <i>pcre2demo.c</i>.
 </P>
 <P>
-The demonstration program, which uses the PCRE2 8-bit library, compiles the
-regular expression that is its first argument, and matches it against the
-subject string in its second argument. No PCRE2 options are set, and default
-character tables are used. If matching succeeds, the program outputs the
-portion of the subject that matched, together with the contents of any captured
-substrings.
+The demonstration program compiles the regular expression that is its
+first argument, and matches it against the subject string in its second
+argument. No PCRE2 options are set, and default character tables are used. If
+matching succeeds, the program outputs the portion of the subject that matched,
+together with the contents of any captured substrings.
 </P>
 <P>
 If the -g option is given on the command line, the program then goes on to
@ -38,34 +37,39 @@ string. The logic is a little bit tricky because of the possibility of matching
 an empty string. Comments in the code explain what is going on.
 </P>
 <P>
+The code in <b>pcre2demo.c</b> is an 8-bit program that uses the PCRE2 8-bit
+library. It handles strings and characters that are stored in 8-bit code units.
+By default, one character corresponds to one code unit, but if the pattern
+starts with "(*UTF)", both it and the subject are treated as UTF-8 strings,
+where characters may occupy multiple code units.
+</P>
+<P>
 If PCRE2 is installed in the standard include and library directories for your
 operating system, you should be able to compile the demonstration program using
-this command:
+a command like this:
 <pre>
-  gcc -o pcre2demo pcre2demo.c -lpcre2-8
+  cc -o pcre2demo pcre2demo.c -lpcre2-8
 </pre>
 If PCRE2 is installed elsewhere, you may need to add additional options to the
 command line. For example, on a Unix-like system that has PCRE2 installed in
 <i>/usr/local</i>, you can compile the demonstration program using a command
 like this:
 <pre>
-  gcc -o pcre2demo -I/usr/local/include pcre2demo.c -L/usr/local/lib -lpcre2-8
-
-</PRE>
-</P>
-<P>
-Once you have compiled and linked the demonstration program, you can run simple
-tests like this:
+  cc -o pcre2demo -I/usr/local/include pcre2demo.c -L/usr/local/lib -lpcre2-8
+</pre>
+Once you have built the demonstration program, you can run simple tests like
+this:
 <pre>
  ./pcre2demo 'cat|dog' 'the cat sat on the mat'
  ./pcre2demo -g 'cat|dog' 'the dog sat on the cat'
 </pre>
 Note that there is a much more comprehensive test program, called
 <a href="pcre2test.html"><b>pcre2test</b>,</a>
-which supports many more facilities for testing regular expressions using the
-PCRE2 libraries. The
+which supports many more facilities for testing regular expressions using all
+three PCRE2 libraries (8-bit, 16-bit, and 32-bit, though not all three need be
+installed). The
 <a href="pcre2demo.html"><b>pcre2demo</b></a>
-program is provided as a simple coding example.
+program is provided as a relatively simple coding example.
 </P>
 <P>
 If you try to run
@ -73,7 +77,7 @@ If you try to run
 when PCRE2 is not installed in the standard library directory, you may get an
 error like this on some operating systems (e.g. Solaris):
 <pre>
-  ld.so.1: a.out: fatal: libpcre2.so.0: open failed: No such file or directory
+  ld.so.1: pcre2demo: fatal: libpcre2-8.so.0: open failed: No such file or directory
 </pre>
 This is caused by the way shared library support works on those systems. You
 need to add
@ -97,9 +101,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 20 October 2014
+Last updated: 02 February 2016
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2serialize.html
+++ b/pcre2/doc/html/pcre2serialize.html
@ -14,10 +14,11 @@ please consult the man page, in case the conversion went wrong.
 <br>
 <ul>
 <li><a name="TOC1" href="#SEC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a>
-<li><a name="TOC2" href="#SEC2">SAVING COMPILED PATTERNS</a>
-<li><a name="TOC3" href="#SEC3">RE-USING PRECOMPILED PATTERNS</a>
-<li><a name="TOC4" href="#SEC4">AUTHOR</a>
-<li><a name="TOC5" href="#SEC5">REVISION</a>
+<li><a name="TOC2" href="#SEC2">SECURITY CONCERNS</a>
+<li><a name="TOC3" href="#SEC3">SAVING COMPILED PATTERNS</a>
+<li><a name="TOC4" href="#SEC4">RE-USING PRECOMPILED PATTERNS</a>
+<li><a name="TOC5" href="#SEC5">AUTHOR</a>
+<li><a name="TOC6" href="#SEC6">REVISION</a>
 </ul>
 <br><a name="SEC1" href="#TOC1">SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS</a><br>
 <P>
@ -41,14 +42,22 @@ If you are running an application that uses a large number of regular
 expression patterns, it may be useful to store them in a precompiled form
 instead of having to compile them every time the application is run. However,
 if you are using the just-in-time optimization feature, it is not possible to
-save and reload the JIT data, because it is position-dependent. In addition,
-the host on which the patterns are reloaded must be running the same version of
-PCRE2, with the same code unit width, and must also have the same endianness,
-pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
-system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
-can they be reloaded using the 8-bit library.
+save and reload the JIT data, because it is position-dependent. The host on
+which the patterns are reloaded must be running the same version of PCRE2, with
+the same code unit width, and must also have the same endianness, pointer width
+and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
+PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
+reloaded using the 8-bit library.
 </P>
-<br><a name="SEC2" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
+<br><a name="SEC2" href="#TOC1">SECURITY CONCERNS</a><br>
+<P>
+The facility for saving and restoring compiled patterns is intended for use
+within individual applications. As such, the data supplied to
+<b>pcre2_serialize_decode()</b> is expected to be trusted data, not data from
+arbitrary external sources. There is only some simple consistency checking, not
+complete validation of what is being re-loaded.
+</P>
+<br><a name="SEC3" href="#TOC1">SAVING COMPILED PATTERNS</a><br>
 <P>
 Before compiled patterns can be saved they must be serialized, that is,
 converted to a stream of bytes. A single byte stream may contain any number of
@ -110,7 +119,7 @@ still be used for matching. Their memory must eventually be freed in the usual
 way by calling <b>pcre2_code_free()</b>. When you have finished with the byte
 stream, it too must be freed by calling <b>pcre2_serialize_free()</b>.
 </P>
-<br><a name="SEC3" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
+<br><a name="SEC4" href="#TOC1">RE-USING PRECOMPILED PATTERNS</a><br>
 <P>
 In order to re-use a set of saved patterns you must first make the serialized
 byte stream available in main memory (for example, by reading from a file). The
@ -142,21 +151,27 @@ is filled with those that fit, and the remainder are ignored. The yield of the
 function is the number of decoded patterns, or one of the following negative
 error codes:
 <pre>
-  PCRE2_ERROR_BADDATA   second argument is zero or less
-  PCRE2_ERROR_BADMAGIC  mismatch of id bytes in the data
-  PCRE2_ERROR_BADMODE   mismatch of variable unit size or PCRE2 version
-  PCRE2_ERROR_MEMORY    memory allocation failed
-  PCRE2_ERROR_NULL      first or third argument is NULL
+  PCRE2_ERROR_BADDATA    second argument is zero or less
+  PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
+  PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
+  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
+  PCRE2_ERROR_MEMORY     memory allocation failed
+  PCRE2_ERROR_NULL       first or third argument is NULL
 </pre>
 PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
 on a system with different endianness.
 </P>
 <P>
 Decoded patterns can be used for matching in the usual way, and must be freed
-by calling <b>pcre2_code_free()</b> as normal. A single copy of the character
-tables is used by all the decoded patterns. A reference count is used to
+by calling <b>pcre2_code_free()</b>. However, be aware that there is a potential
+race issue if you are using multiple patterns that were decoded from a single
+byte stream in a multithreaded application. A single copy of the character
+tables is used by all the decoded patterns and a reference count is used to
 arrange for its memory to be automatically freed when the last pattern is
-freed.
+freed, but there is no locking on this reference count. Therefore, if you want
+to call <b>pcre2_code_free()</b> for these patterns in different threads, you
+must arrange your own locking, and ensure that <b>pcre2_code_free()</b> cannot
+be called by two threads at the same time.
 </P>
 <P>
 If a pattern was processed by <b>pcre2_jit_compile()</b> before being
@ -164,7 +179,7 @@ serialized, the JIT data is discarded and so is no longer available after a
 save/restore cycle. You can, however, process a restored pattern with
 <b>pcre2_jit_compile()</b> if you wish.
 </P>
-<br><a name="SEC4" href="#TOC1">AUTHOR</a><br>
+<br><a name="SEC5" href="#TOC1">AUTHOR</a><br>
 <P>
 Philip Hazel
 <br>
@ -173,11 +188,11 @@ University Computing Service
 Cambridge, England.
 <br>
 </P>
-<br><a name="SEC5" href="#TOC1">REVISION</a><br>
+<br><a name="SEC6" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 January 2015
+Last updated: 24 May 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2stack.html
+++ b/pcre2/doc/html/pcre2stack.html
@ -57,12 +57,13 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
 Normally, these are never very deep, and the limit on the complexity of
 <b>pcre2_dfa_match()</b> is controlled by the amount of workspace it is given.
 However, it is possible to write patterns with runaway infinite recursions;
-such patterns will cause <b>pcre2_dfa_match()</b> to run out of stack. At
-present, there is no protection against this.
+such patterns will cause <b>pcre2_dfa_match()</b> to run out of stack unless a
+limit is applied (see below).
 </P>
 <P>
-The comments that follow do NOT apply to <b>pcre2_dfa_match()</b>; they are
-relevant only for <b>pcre2_match()</b> without the JIT optimization.
+The comments in the next three sections do not apply to
+<b>pcre2_dfa_match()</b>; they are relevant only for <b>pcre2_match()</b> without
+the JIT optimization.
 </P>
 <br><b>
 Reducing <b>pcre2_match()</b>'s stack usage
@ -115,7 +116,7 @@ entitled
 in the
 <a href="pcre2api.html"><b>pcre2api</b></a>
 documentation. Since the block sizes are always the same, it may be possible to
-implement customized a memory handler that is more efficient than the standard
+implement a customized memory handler that is more efficient than the standard
 function. The memory blocks obtained for this purpose are retained and re-used
 if possible while <b>pcre2_match()</b> is running. They are all freed just
 before it exits.
@ -151,6 +152,15 @@ pattern to match. This is done by calling <b>pcre2_match()</b> repeatedly with
 different limits.
 </P>
 <br><b>
+Limiting <b>pcre2_dfa_match()</b>'s stack usage
+</b><br>
+<P>
+The recursion limit, as described above for <b>pcre2_match()</b>, also applies
+to <b>pcre2_dfa_match()</b>, whose use of recursive function calls for
+recursions in the pattern can lead to runaway stack usage. The non-recursive
+match limit is not relevant for DFA matching, and is ignored.
+</P>
+<br><b>
 Changing stack size in Unix-like systems
 </b><br>
 <P>
@ -198,9 +208,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 21 November 2014
+Last updated: 23 December 2016
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2syntax.html
+++ b/pcre2/doc/html/pcre2syntax.html
@ -111,9 +111,10 @@ it matches a literal "u".
  \W         a "non-word" character
  \X         a Unicode extended grapheme cluster
 </pre>
-The application can lock out the use of \C by setting the
-PCRE2_NEVER_BACKSLASH_C option. It is dangerous because it may leave the
-current matching point in the middle of a UTF-8 or UTF-16 character.
+\C is dangerous because it may leave the current matching point in the middle
+of a UTF-8 or UTF-16 character. The application can lock out the use of \C by
+setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2
+with the use of \C permanently disabled.
 </P>
 <P>
 By default, \d, \s, and \w match only ASCII characters, even in UTF-8 mode
@ -187,6 +188,8 @@ at release 5.18.
 </P>
 <br><a name="SEC7" href="#TOC1">SCRIPT NAMES FOR \p AND \P</a><br>
 <P>
+Ahom,
+Anatolian_Hieroglyphs,
 Arabic,
 Armenian,
 Avestan,
@ -227,6 +230,7 @@ Gurmukhi,
 Han,
 Hangul,
 Hanunoo,
+Hatran,
 Hebrew,
 Hiragana,
 Imperial_Aramaic,
@ -263,12 +267,14 @@ Miao,
 Modi,
 Mongolian,
 Mro,
+Multani,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
 Nko,
 Ogham,
 Ol_Chiki,
+Old_Hungarian,
 Old_Italic,
 Old_North_Arabian,
 Old_Permic,
@ -290,6 +296,7 @@ Saurashtra,
 Sharada,
 Shavian,
 Siddham,
+SignWriting,
 Sinhala,
 Sora_Sompeng,
 Sundanese,
@ -444,9 +451,10 @@ appear.
  (*UCP)          set PCRE2_UCP (use Unicode properties for \d etc)
 </pre>
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_match(), not increase them. The application
-can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
-PCRE2_NEVER_UCP options, respectively, at compile time.
+limits set by the caller of <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>, not
+increase them. The application can lock out the use of (*UTF) and (*UCP) by
+setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
+compile time.
 </P>
 <br><a name="SEC17" href="#TOC1">NEWLINE CONVENTION</a><br>
 <P>
@ -485,6 +493,9 @@ Each top-level branch of a look behind must be of a fixed length.
  \n              reference by number (can be ambiguous)
  \gn             reference by number
  \g{n}           reference by number
+  \g+n            relative reference by number (PCRE2 extension)
+  \g-n            relative reference by number
+  \g{+n}          relative reference by number (PCRE2 extension)
  \g{-n}          relative reference by number
  \k&#60;name&#62;        reference by name (Perl)
  \k'name'        reference by name (Perl)
@ -523,14 +534,17 @@ Each top-level branch of a look behind must be of a fixed length.
  (?(-n)              relative reference condition
  (?(&#60;name&#62;)          named reference condition (Perl)
  (?('name')          named reference condition (Perl)
-  (?(name)            named reference condition (PCRE2)
+  (?(name)            named reference condition (PCRE2, deprecated)
  (?(R)               overall recursion condition
-  (?(Rn)              specific group recursion condition
-  (?(R&name)          specific recursion condition
+  (?(Rn)              specific numbered group recursion condition
+  (?(R&name)          specific named group recursion condition
  (?(DEFINE)          define subpattern for reference
  (?(VERSION[&#62;]=n.m)  test PCRE2 version
  (?(assert)          assertion condition
-</PRE>
+</pre>
+Note the ambiguity of (?(R) and (?(Rn) which might be named reference
+conditions or recursion tests. Such a condition is interpreted as a reference
+condition if the relevant named group exists.
 </P>
 <br><a name="SEC23" href="#TOC1">BACKTRACKING CONTROL</a><br>
 <P>
@ -582,9 +596,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC27" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 13 June 2015
+Last updated: 23 December 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2test.html
+++ b/pcre2/doc/html/pcre2test.html
@ -61,7 +61,7 @@ subject is processed, and what output is produced.
 <P>
 As the original fairly simple PCRE library evolved, it acquired many different
 features, and as a result, the original <b>pcretest</b> program ended up with a
-lot of options in a messy, arcane syntax, for testing all the features. The
+lot of options in a messy, arcane syntax for testing all the features. The
 move to the new PCRE2 API provided an opportunity to re-implement the test
 program as <b>pcre2test</b>, with a cleaner modifier syntax. Nevertheless, there
 are still many obscure modifiers, some of which are specifically designed for
@ -77,31 +77,61 @@ strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or
 all three of these libraries may be simultaneously installed. The
 <b>pcre2test</b> program can be used to test all the libraries. However, its own
 input and output are always in 8-bit format. When testing the 16-bit or 32-bit
-libraries, patterns and subject strings are converted to 16- or 32-bit format
-before being passed to the library functions. Results are converted back to
-8-bit code units for output.
+libraries, patterns and subject strings are converted to 16-bit or 32-bit
+format before being passed to the library functions. Results are converted back
+to 8-bit code units for output.
 </P>
 <P>
 In the rest of this document, the names of library functions and structures
 are given in generic form, for example, <b>pcre2_compile()</b>. The actual
 names used in the libraries have a suffix _8, _16, or _32, as appropriate.
-</P>
+<a name="inputencoding"></a></P>
 <br><a name="SEC3" href="#TOC1">INPUT ENCODING</a><br>
 <P>
 Input to <b>pcre2test</b> is processed line by line, either by calling the C
-library's <b>fgets()</b> function, or via the <b>libreadline</b> library (see
-below). The input is processed using using C's string functions, so must not
-contain binary zeroes, even though in Unix-like environments, <b>fgets()</b>
-treats any bytes other than newline as data characters. In some Windows
-environments character 26 (hex 1A) causes an immediate end of file, and no
-further data is read.
+library's <b>fgets()</b> function, or via the <b>libreadline</b> library. In some
+Windows environments character 26 (hex 1A) causes an immediate end of file, and
+no further data is read, so this character should be avoided unless you really
+want that action.
 </P>
 <P>
-For maximum portability, therefore, it is safest to avoid non-printing
-characters in <b>pcre2test</b> input files. There is a facility for specifying a
-pattern's characters as hexadecimal pairs, thus making it possible to include
-binary zeroes in a pattern for testing purposes. Subject lines are processed
-for backslash escapes, which makes it possible to include any data value.
+The input is processed using C's string functions, so must not
+contain binary zeroes, even though in Unix-like environments, <b>fgets()</b>
+treats any bytes other than newline as data characters. An error is generated
+if a binary zero is encountered. Subject lines are processed for backslash
+escapes, which makes it possible to include any data value in strings that are
+passed to the library for matching. For patterns, there is a facility for
+specifying some or all of the 8-bit input characters as hexadecimal pairs,
+which makes it possible to include binary zeros.
+</P>
+<br><b>
+Input for the 16-bit and 32-bit libraries
+</b><br>
+<P>
+When testing the 16-bit or 32-bit libraries, there is a need to be able to
+generate character code points greater than 255 in the strings that are passed
+to the library. For subject lines, backslash escapes can be used. In addition,
+when the <b>utf</b> modifier (see
+<a href="#optionmodifiers">"Setting compilation options"</a>
+below) is set, the pattern and any following subject lines are interpreted as
+UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
+</P>
+<P>
+For non-UTF testing of wide characters, the <b>utf8_input</b> modifier can be
+used. This is mutually exclusive with <b>utf</b>, and is allowed only in 16-bit
+or 32-bit mode. It causes the pattern and following subject lines to be treated
+as UTF-8 according to the original definition (RFC 2279), which allows for
+character values up to 0x7fffffff. Each character is placed in one 16-bit or
+32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error
+to occur).
+</P>
+<P>
+UTF-8 is not capable of encoding values greater than 0x7fffffff, but such
+values can be handled by the 32-bit library. When testing this library in
+non-UTF mode with <b>utf8_input</b> set, if any character is preceded by the
+byte 0xff (which is an illegal byte in UTF-8) 0x80000000 is added to the
+character's value. This is the only way of passing such code points in a
+pattern string. For subject strings, using an escape sequence is preferable.
 </P>
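The <b>utf8_input</b> rules above can be sketched in a few lines of Python. This is only an illustration of the decoding described in the text, not the code that pcre2test uses; the function name decode_utf8_input and its code_unit_bits parameter are hypothetical, and the sketch makes no attempt to reject overlong encodings.

```python
def decode_utf8_input(data, code_unit_bits=32):
    """Decode original-definition (RFC 2279) UTF-8 into a list of integers.

    Sequences of 1 to 6 bytes are accepted, giving values up to 0x7fffffff.
    A 0xff byte in front of a character adds 0x80000000 to its value.
    Overlong encodings are not rejected in this sketch.
    """
    # (exclusive upper bound of lead byte, sequence length, lead payload mask)
    # A length of 0 marks the continuation-byte range, invalid as a lead.
    lead_table = [(0x80, 1, 0x7F), (0xC0, 0, 0x00), (0xE0, 2, 0x1F),
                  (0xF0, 3, 0x0F), (0xF8, 4, 0x07), (0xFC, 5, 0x03),
                  (0xFE, 6, 0x01)]
    values = []
    i = 0
    while i < len(data):
        extra = 0
        if data[i] == 0xFF:
            # 0xff is illegal in UTF-8, so it can act as a marker byte.
            extra = 0x80000000
            i += 1
            if i == len(data):
                raise ValueError("0xff is not followed by a character")
        lead = data[i]
        for limit, length, mask in lead_table:
            if lead < limit:
                break
        else:
            raise ValueError("invalid lead byte 0x%02x" % lead)
        if length == 0:
            raise ValueError("unexpected continuation byte 0x%02x" % lead)
        tail = data[i + 1:i + length]
        if len(tail) != length - 1 or any((b & 0xC0) != 0x80 for b in tail):
            raise ValueError("truncated or malformed sequence")
        value = lead & mask
        for b in tail:
            value = (value << 6) | (b & 0x3F)
        value += extra
        if code_unit_bits == 16 and value > 0xFFFF:
            # Each character must fit in a single 16-bit code unit.
            raise ValueError("value does not fit in one 16-bit code unit")
        values.append(value)
        i += length
    return values
```

Under these assumptions, the six bytes FD BF BF BF BF BF decode to 0x7fffffff, and the two bytes FF 41 decode to 0x80000041.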
 <br><a name="SEC4" href="#TOC1">COMMAND LINE OPTIONS</a><br>
 <P>
@ -123,8 +153,13 @@ the 32-bit library has been built, this is the default. If the 32-bit library
 has not been built, this option causes an error.
 </P>
 <P>
+<b>-ac</b>
+Behave as if each pattern has the <b>auto_callout</b> modifier, that is, insert
+automatic callouts into every pattern that is compiled.
+</P>
+<P>
 <b>-b</b>
-Behave as if each pattern has the <b>/fullbincode</b> modifier; the full
+Behave as if each pattern has the <b>fullbincode</b> modifier; the full
 internal binary form of the pattern is output after compilation.
 </P>
 <P>
@ -155,12 +190,13 @@ following options output the value and set the exit code as indicated:
 The following options output 1 for true or 0 for false, and set the exit code
 to the same value:
 <pre>
-  ebcdic     compiled for an EBCDIC environment
-  jit        just-in-time support is available
-  pcre2-16   the 16-bit library was built
-  pcre2-32   the 32-bit library was built
-  pcre2-8    the 8-bit library was built
-  unicode    Unicode support is available
+  backslash-C  \C is supported (not locked out)
+  ebcdic       compiled for an EBCDIC environment
+  jit          just-in-time support is available
+  pcre2-16     the 16-bit library was built
+  pcre2-32     the 32-bit library was built
+  pcre2-8      the 8-bit library was built
+  unicode      Unicode support is available
 </pre>
 If an unknown option is given, an error message is output; the exit code is 0.
 </P>
@ -177,12 +213,19 @@ using the <b>pcre2_dfa_match()</b> function instead of the default
 <b>pcre2_match()</b>.
 </P>
 <P>
+<b>-error</b> <i>number[,number,...]</i>
+Call <b>pcre2_get_error_message()</b> for each of the error numbers in the
+comma-separated list, display the resulting messages on the standard output,
+then exit with zero exit code. The numbers may be positive or negative. This is
+a convenience facility for PCRE2 maintainers.
+</P>
+<P>
 <b>-help</b>
 Output a brief summary of these options and then exit.
 </P>
 <P>
 <b>-i</b>
-Behave as if each pattern has the <b>/info</b> modifier; information about the
+Behave as if each pattern has the <b>info</b> modifier; information about the
 compiled pattern is given after compilation.
 </P>
 <P>
@ -265,9 +308,9 @@ Each subject line is matched separately and independently. If you want to do
 multi-line matches, you have to use the \n escape sequence (or \r or \r\n,
 etc., depending on the newline setting) in a single line of input to encode the
 newline sequences. There is no limit on the length of subject lines; the input
-buffer is automatically extended if it is too small. There is a replication
-feature that makes it possible to generate long subject lines without having to
-supply them explicitly.
+buffer is automatically extended if it is too small. There are replication
+features that make it possible to generate long repetitive pattern or subject
+lines without having to supply them explicitly.
 </P>
 <P>
 An empty line or the end of the file signals the end of the subject lines for a
@ -304,6 +347,36 @@ output.
 This command is used to load a set of precompiled patterns from a file, as
 described in the section entitled "Saving and restoring compiled patterns"
 <a href="#saverestore">below.</a>
+<pre>
+  #newline_default [&#60;newline-list&#62;]
+</pre>
+When PCRE2 is built, a default newline convention can be specified. This
+determines which characters and/or character pairs are recognized as indicating
+a newline in a pattern or subject string. The default can be overridden when a
+pattern is compiled. The standard test files contain tests of various newline
+conventions, but the majority of the tests expect a single linefeed to be
+recognized as a newline by default. Without special action the tests would fail
+when PCRE2 is compiled with either CR or CRLF as the default newline.
+</P>
+<P>
+The #newline_default command specifies a list of newline types that are
+acceptable as the default. The types must be one of CR, LF, CRLF, ANYCRLF, or
+ANY (in upper or lower case), for example:
+<pre>
+  #newline_default LF Any anyCRLF
+</pre>
+If the default newline is in the list, this command has no effect. Otherwise,
+except when testing the POSIX API, a <b>newline</b> modifier that specifies the
+first newline convention in the list (LF in the above example) is added to any
+pattern that does not already have a <b>newline</b> modifier. If the newline
+list is empty, the feature is turned off. This command is present in a number
+of the standard test input files.
+</P>
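The decision made by #newline_default, as described above, can be modelled roughly as follows. This is a sketch of the documented behaviour rather than pcre2test's own logic; the function name and its parameters are hypothetical, and pattern modifiers are assumed to be plain strings such as "newline=cr".

```python
VALID_NEWLINE_TYPES = ("CR", "LF", "CRLF", "ANYCRLF", "ANY")


def newline_modifier_to_add(build_default, acceptable, pattern_modifiers,
                            posix=False):
    """Return the newline type to add to a pattern, or None for no change.

    build_default     -- newline convention PCRE2 was built with, e.g. "CRLF"
    acceptable        -- list given on the #newline_default line
    pattern_modifiers -- modifiers already present on the pattern
    posix             -- True when the POSIX API is being tested
    """
    types = [t.upper() for t in acceptable]   # upper or lower case is allowed
    for t in types:
        if t not in VALID_NEWLINE_TYPES:
            raise ValueError("unknown newline type: " + t)
    if not types:
        return None            # an empty list turns the feature off
    if build_default.upper() in types:
        return None            # the built-in default is acceptable
    if posix:
        return None            # the default cannot be overridden for POSIX
    if any(m.startswith("newline=") for m in pattern_modifiers):
        return None            # the pattern already chooses a convention
    return types[0]            # first type in the list becomes the default
```

With a library built for CRLF and the list "LF Any anyCRLF", a pattern without its own <b>newline</b> modifier would be given LF.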
+<P>
+When the POSIX API is being tested there is no way to override the default
+newline convention, though it is possible to set the newline convention from
+within the pattern. A warning is given if the <b>posix</b> modifier is used when
+<b>#newline_default</b> would set a default for the non-POSIX API.
 <pre>
  #pattern &#60;modifier-list&#62;
 </pre>
@ -321,9 +394,10 @@ test files that are also processed by <b>perltest.sh</b>. The <b>#perltest</b>
 command helps detect tests that are accidentally put in the wrong file.
 <pre>
  #pop [&#60;modifiers&#62;]
+  #popcopy [&#60;modifiers&#62;]
 </pre>
-This command is used to manipulate the stack of compiled patterns, as described
-in the section entitled "Saving and restoring compiled patterns"
+These commands are used to manipulate the stack of compiled patterns, as
+described in the section entitled "Saving and restoring compiled patterns"
 <a href="#saverestore">below.</a>
 <pre>
  #save &#60;filename&#62;
@ -340,12 +414,13 @@ subject lines. Modifiers on a subject line can change these settings.
 <br><a name="SEC7" href="#TOC1">MODIFIER SYNTAX</a><br>
 <P>
 Modifier lists are used with both pattern and subject lines. Items in a list
-are separated by commas and optional white space. Some modifiers may be given
-for both patterns and subject lines, whereas others are valid for one or the
-other only. Each modifier has a long name, for example "anchored", and some of
-them must be followed by an equals sign and a value, for example, "offset=12".
-Modifiers that do not take values may be preceded by a minus sign to turn off a
-previous setting.
+are separated by commas followed by optional white space. Trailing whitespace
+in a modifier list is ignored. Some modifiers may be given for both patterns
+and subject lines, whereas others are valid only for one or the other. Each
+modifier has a long name, for example "anchored", and some of them must be
+followed by an equals sign and a value, for example, "offset=12". Values cannot
+contain comma characters, but may contain spaces. Modifiers that do not take
+values may be preceded by a minus sign to turn off a previous setting.
 </P>
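As an illustration of the syntax rules above, here is a minimal Python sketch that splits a modifier list into its items. It covers long names only and is not the parser used by pcre2test; the function name and the (name, value, enabled) tuple layout are assumptions made for this example.

```python
def parse_modifier_list(text):
    """Split a modifier list into (name, value, enabled) tuples.

    value is None for modifiers that take no value; enabled is False when
    the modifier was preceded by a minus sign.
    """
    text = text.rstrip()               # trailing whitespace is ignored
    if not text:
        return []
    result = []
    for item in text.split(","):       # values cannot contain commas
        item = item.lstrip()           # optional white space after a comma
        if not item:
            raise ValueError("empty modifier")
        name, equals, value = item.partition("=")
        if equals:
            if name.startswith("-"):
                raise ValueError("only valueless modifiers can be negated")
            result.append((name, value, True))   # value may contain spaces
        elif name.startswith("-"):
            result.append((name[1:], None, False))
        else:
            result.append((name, None, True))
    return result
```

For example, "anchored, offset=12,-caseless" yields three items: anchored (on), offset with the value "12", and caseless (turned off).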
 <P>
 A few of the more common modifiers can also be specified as single letters, for
@ -454,6 +529,12 @@ the start of a modifier list. For example:
 <pre>
  abc\=notbol,notempty
 </pre>
+If the subject string is empty and \= is followed by whitespace, the line is
+treated as a comment line, and is not used for matching. For example:
+<pre>
+  \= This is a comment.
+  abc\= This is an invalid modifier list.
+</pre>
 A backslash followed by any other non-alphanumeric character just escapes that
 character. A backslash followed by anything else causes an error. However, if
 the very last character in the line is a backslash (and there is no modifier
@ -462,10 +543,10 @@ a real empty line terminates the data input.
 </P>
 <br><a name="SEC10" href="#TOC1">PATTERN MODIFIERS</a><br>
 <P>
-There are three types of modifier that can appear in pattern lines, two of
-which may also be used in a <b>#pattern</b> command. A pattern's modifier list
-can add to or override default modifiers that were set by a previous
-<b>#pattern</b> command.
+There are several types of modifier that can appear in pattern lines. Except
+where noted below, they may also be used in <b>#pattern</b> commands. A
+pattern's modifier list can add to or override default modifiers that were set
+by a previous <b>#pattern</b> command.
 <a name="optionmodifiers"></a></P>
 <br><b>
 Setting compilation options
@ -473,12 +554,13 @@ Setting compilation options
 <P>
 The following modifiers set options for <b>pcre2_compile()</b>. The most common
 ones have single-letter abbreviations. See
-<a href="pcreapi.html"><b>pcreapi</b></a>
+<a href="pcre2api.html"><b>pcre2api</b></a>
 for a description of their effects.
 <pre>
      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
      alt_bsux                  set PCRE2_ALT_BSUX
      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
+      alt_verbnames             set PCRE2_ALT_VERBNAMES
      anchored                  set PCRE2_ANCHORED
      auto_callout              set PCRE2_AUTO_CALLOUT
  /i  caseless                  set PCRE2_CASELESS
@ -499,12 +581,15 @@ for a description of their effects.
      no_utf_check              set PCRE2_NO_UTF_CHECK
      ucp                       set PCRE2_UCP
      ungreedy                  set PCRE2_UNGREEDY
+      use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
      utf                       set PCRE2_UTF
 </pre>
 As well as turning on the PCRE2_UTF option, the <b>utf</b> modifier causes all
 non-printing characters in output strings to be printed using the \x{hh...}
 notation. Otherwise, those less than 0x100 are output in hex without the curly
-brackets.
+brackets. Setting <b>utf</b> in 16-bit or 32-bit mode also causes pattern and
+subject strings to be translated to UTF-16 or UTF-32, respectively, before
+being passed to library functions.
 <a name="controlmodifiers"></a></P>
 <br><b>
 Setting compilation controls
@ -519,18 +604,24 @@ about the pattern:
      debug                     same as info,fullbincode
      fullbincode               show binary code with lengths
  /I  info                      show info about compiled pattern
-      hex                       pattern is coded in hexadecimal
+      hex                       unquoted characters are hexadecimal
      jit[=&#60;number&#62;]            use JIT
      jitfast                   use JIT fast path
      jitverify                 verify JIT use
      locale=&#60;name&#62;             use this locale
+      max_pattern_length=&#60;n&#62;    set the maximum pattern length
      memory                    show memory used
      newline=&#60;type&#62;            set newline type
+      null_context              compile with a NULL context
      parens_nest_limit=&#60;n&#62;     set maximum parentheses depth
      posix                     use the POSIX API
+      posix_nosub               use the POSIX API with REG_NOSUB
      push                      push compiled pattern onto the stack
+      pushcopy                  push a copy onto the stack
      stackguard=&#60;number&#62;       test the stackguard feature
      tables=[0|1|2]            select internal tables
+      use_length                do not zero-terminate the pattern
+      utf8_input                treat input as UTF-8
 </pre>
 The effects of these modifiers are described in the following sections.
 </P>
@ -604,40 +695,145 @@ is requested. For each callout, either its number or string is given, followed
 by the item that follows it in the pattern.
 </P>
 <br><b>
-Specifying a pattern in hex
+Passing a NULL context
 </b><br>
 <P>
-The <b>hex</b> modifier specifies that the characters of the pattern are to be
-interpreted as pairs of hexadecimal digits. White space is permitted between
-pairs. For example:
+Normally, <b>pcre2test</b> passes a context block to <b>pcre2_compile()</b>. If
+the <b>null_context</b> modifier is set, however, NULL is passed. This is for
+testing that <b>pcre2_compile()</b> behaves correctly in this case (it uses
+default values).
+</P>
+<br><b>
+Specifying the pattern's length
+</b><br>
+<P>
+By default, patterns are passed to the compiling functions as zero-terminated
+strings. When using the POSIX wrapper API, there is no other option. However,
+when using PCRE2's native API, patterns can be passed by length instead of
+being zero-terminated. The <b>use_length</b> modifier causes this to happen.
+Using a length happens automatically (whether or not <b>use_length</b> is set)
+when <b>hex</b> is set, because patterns specified in hexadecimal may contain
+binary zeros.
+</P>
+<br><b>
+Specifying pattern characters in hexadecimal
+</b><br>
+<P>
+The <b>hex</b> modifier specifies that the characters of the pattern, except for
+substrings enclosed in single or double quotes, are to be interpreted as pairs
+of hexadecimal digits. This feature is provided as a way of creating patterns
+that contain binary zeros and other non-printing characters. White space is
+permitted between pairs of digits. For example, this pattern contains three
+characters:
 <pre>
  /ab 32 59/hex
 </pre>
-This feature is provided as a way of creating patterns that contain binary zero
-and other non-printing characters. By default, <b>pcre2test</b> passes patterns
-as zero-terminated strings to <b>pcre2_compile()</b>, giving the length as
-PCRE2_ZERO_TERMINATED. However, for patterns specified in hexadecimal, the
-actual length of the pattern is passed.
+Parts of such a pattern are taken literally if quoted. This pattern contains
+nine characters, only two of which are specified in hexadecimal:
+<pre>
+  /ab "literal" 32/hex
+</pre>
+Either single or double quotes may be used. There is no way of including
+the delimiter within a substring. The <b>hex</b> and <b>expand</b> modifiers are
+mutually exclusive.
+</P>
+<P>
+The POSIX API cannot be used with patterns specified in hexadecimal because
+they may contain binary zeros, which conflicts with <b>regcomp()</b>'s
+requirement for a zero-terminated string. Such patterns are always passed to
+<b>pcre2_compile()</b> as a string with a length, not as zero-terminated.
+</P>
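The way a <b>hex</b> pattern is turned into characters can be sketched as below. This is a simplified model of the description above, assuming 8-bit characters; the function name is hypothetical and the real pcre2test code may treat edge cases differently.

```python
import string


def hex_pattern_to_bytes(text):
    """Convert the body of a pattern with the hex modifier into bytes.

    Unquoted text is read as pairs of hexadecimal digits, with white space
    allowed between pairs. Text inside single or double quotes is literal.
    """
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1                                  # skip white space
        elif ch in "'\"":
            end = text.find(ch, i + 1)              # matching closing quote
            if end < 0:
                raise ValueError("missing closing quote")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
        else:
            pair = text[i:i + 2]
            if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
                raise ValueError("expected two hexadecimal digits")
            out.append(int(pair, 16))
            i += 2
    return bytes(out)
```

Applied to the two examples above, "ab 32 59" gives three characters and 'ab "literal" 32' gives nine. Because the result may contain binary zeros, it has to be passed with an explicit length.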
+<br><b>
+Specifying wide characters in 16-bit and 32-bit modes
+</b><br>
+<P>
+In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
+translated to UTF-16 or UTF-32 when the <b>utf</b> modifier is set. For testing
+the 16-bit and 32-bit libraries in non-UTF mode, the <b>utf8_input</b> modifier
+can be used. It is mutually exclusive with <b>utf</b>. Input lines are
+interpreted as UTF-8 as a means of specifying wide characters. More details are
+given in
+<a href="#inputencoding">"Input encoding"</a>
+above.
+</P>
+<br><b>
+Generating long repetitive patterns
+</b><br>
+<P>
+Some tests use long patterns that are very repetitive. Instead of creating a
+very long input line for such a pattern, you can use a special repetition
+feature, similar to the one described for subject lines above. If the
+<b>expand</b> modifier is present on a pattern, parts of the pattern that have
+the form
+<pre>
+  \[&#60;characters&#62;]{&#60;count&#62;}
+</pre>
+are expanded before the pattern is passed to <b>pcre2_compile()</b>. For
+example, \[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
+cannot be nested. An initial "\[" sequence is recognized only if "]{" followed
+by decimal digits and "}" is found later in the pattern. If not, the characters
+remain in the pattern unaltered. The <b>expand</b> and <b>hex</b> modifiers are
+mutually exclusive.
+</P>
+<P>
+If part of an expanded pattern looks like an expansion, but is really part of
+the actual pattern, unwanted expansion can be avoided by giving two values in
+the quantifier. For example, \[AB]{6000,6000} is not recognized as an
+expansion item.
+</P>
+<P>
+If the <b>info</b> modifier is set on an expanded pattern, the result of the
+expansion is included in the information that is output.
 </P>
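The expansion performed by the <b>expand</b> modifier can be approximated with a single regular expression substitution. The sketch below is an assumption-laden model of the rules in this section, not pcre2test's implementation; the names expand_pattern and _EXPANSION are invented for the example.

```python
import re

# "\[" followed by the characters to repeat, then "]{", decimal digits, "}".
# The non-greedy group picks the nearest "]{<digits>}" after the opening.
_EXPANSION = re.compile(r"\\\[(.*?)\]\{(\d+)\}", re.DOTALL)


def expand_pattern(pattern):
    """Replace each \\[<characters>]{<count>} item by its repetition."""
    return _EXPANSION.sub(lambda m: m.group(1) * int(m.group(2)), pattern)
```

A quantifier with two values, such as \[AB]{3,3}, does not match the expression and is therefore left unaltered, which mirrors the escape mechanism described above.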
 <br><b>
 JIT compilation
 </b><br>
 <P>
-The <b>/jit</b> modifier may optionally be followed by an equals sign and a
-number in the range 0 to 7:
+Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
+speed up pattern matching. See the
+<a href="pcre2jit.html"><b>pcre2jit</b></a>
+documentation for details. JIT compiling happens, optionally, after a pattern
+has been successfully compiled into an internal form. The JIT compiler converts
+this to optimized machine code. It needs to know whether the match-time options
+PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because
+different code is generated for the different cases. See the <b>partial</b>
+modifier in "Subject Modifiers"
+<a href="#subjectmodifiers">below</a>
+for details of how these options are specified for each match attempt.
+</P>
+<P>
+JIT compilation is requested by the <b>jit</b> pattern modifier, which may
+optionally be followed by an equals sign and a number in the range 0 to 7.
+The three bits that make up the number specify which of the three JIT operating
+modes are to be compiled:
+<pre>
+  1  compile JIT code for non-partial matching
+  2  compile JIT code for soft partial matching
+  4  compile JIT code for hard partial matching
+</pre>
+The possible values for the <b>jit</b> modifier are therefore:
 <pre>
  0  disable JIT
-  1  use JIT for normal match only
-  2  use JIT for soft partial match only
-  3  use JIT for normal match and soft partial match
-  4  use JIT for hard partial match only
-  6  use JIT for soft and hard partial match
+  1  normal matching only
+  2  soft partial matching only
+  3  normal and soft partial matching
+  4  hard partial matching only
+  6  soft and hard partial matching only
  7  all three modes
 </pre>
-If no number is given, 7 is assumed. If JIT compilation is successful, the
-compiled JIT code will automatically be used when <b>pcre2_match()</b> is run
-for the appropriate type of match, except when incompatible run-time options
-are specified. For more details, see the
+If no number is given, 7 is assumed. The phrase "partial matching" means a call
+to <b>pcre2_match()</b> with either the PCRE2_PARTIAL_SOFT or the
+PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
+match; the options enable the possibility of a partial match, but do not
+require it. Note also that if you request JIT compilation only for partial
+matching (for example, /jit=2) but do not set the <b>partial</b> modifier on a
+subject line, that match will not use JIT code because none was compiled for
+non-partial matching.
+</P>
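The bit encoding described above can be sketched in a few lines of Python. This is only an illustrative model: the constant and function names are hypothetical, not PCRE2 identifiers, and it ignores the "incompatible run-time options" case.

```python
# Bit values taken from the table above; the names are illustrative only.
JIT_NORMAL = 1
JIT_SOFT = 2
JIT_HARD = 4


def jit_modes(value=7):
    """Return the set of matching modes compiled for jit=<value> (7 if omitted)."""
    if not 0 <= value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    names = {JIT_NORMAL: "normal", JIT_SOFT: "soft partial", JIT_HARD: "hard partial"}
    return {name for bit, name in names.items() if value & bit}


def uses_jit(value, partial=None):
    """Say whether a match would find JIT code for its mode.

    partial is None, "soft", or "hard", mirroring the subject's partial modifier.
    """
    needed = {None: JIT_NORMAL, "soft": JIT_SOFT, "hard": JIT_HARD}[partial]
    return bool(value & needed)
```

For instance, uses_jit(2) is false because /jit=2 compiles code only for soft partial matching, which is the situation the last sentence of the paragraph warns about.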
+<P>
+If JIT compilation is successful, the compiled JIT code will automatically be
+used when an appropriate type of match is run, except when incompatible
+run-time options are specified. For more details, see the
 <a href="pcre2jit.html"><b>pcre2jit</b></a>
 documentation. See also the <b>jitstack</b> modifier below for a way of
 setting the size of the JIT stack.
@ -661,14 +857,14 @@ code was actually used in the match.
 Setting a locale
 </b><br>
 <P>
-The <b>/locale</b> modifier must specify the name of a locale, for example:
+The <b>locale</b> modifier must specify the name of a locale, for example:
 <pre>
  /pattern/locale=fr_FR
 </pre>
 The given locale is set, <b>pcre2_maketables()</b> is called to build a set of
 character tables for the locale, and this is then passed to
 <b>pcre2_compile()</b> when compiling the regular expression. The same tables
-are used when matching the following subject lines. The <b>/locale</b> modifier
+are used when matching the following subject lines. The <b>locale</b> modifier
 applies only to the pattern on which it appears, but can be given in a
 <b>#pattern</b> command if a default is needed. Setting a locale and alternate
 character tables are mutually exclusive.
@ -677,7 +873,7 @@ character tables are mutually exclusive.
 Showing pattern memory
 </b><br>
 <P>
-The <b>/memory</b> modifier causes the size in bytes of the memory used to hold
+The <b>memory</b> modifier causes the size in bytes of the memory used to hold
 the compiled pattern to be output. This does not include the size of the
 <b>pcre2_code</b> block; it is just the actual compiled data. If the pattern is
 subsequently passed to the JIT compiler, the size of the JIT compiled code is
@ -700,30 +896,53 @@ sets its own default of 220, which is required for running the standard test
 suite.
 </P>
 <br><b>
+Limiting the pattern length
+</b><br>
+<P>
+The <b>max_pattern_length</b> modifier sets a limit, in code units, to the
+length of pattern that <b>pcre2_compile()</b> will accept. Breaching the limit
+causes a compilation error. The default is the largest number a PCRE2_SIZE
+variable can hold (essentially unlimited).
+</P>
+<br><b>
 Using the POSIX wrapper API
 </b><br>
 <P>
-The <b>/posix</b> modifier causes <b>pcre2test</b> to call PCRE2 via the POSIX
-wrapper API rather than its native API. This supports only the 8-bit library.
-When the POSIX API is being used, the following pattern modifiers set options
-for the <b>regcomp()</b> function:
+The <b>posix</b> and <b>posix_nosub</b> modifiers cause <b>pcre2test</b> to call
+PCRE2 via the POSIX wrapper API rather than its native API. When
+<b>posix_nosub</b> is used, the POSIX option REG_NOSUB is passed to
+<b>regcomp()</b>. The POSIX wrapper supports only the 8-bit library. Note that
+it does not imply POSIX matching semantics; for more detail see the
+<a href="pcre2posix.html"><b>pcre2posix</b></a>
+documentation. The following pattern modifiers set options for the
+<b>regcomp()</b> function:
 <pre>
  caseless           REG_ICASE
  multiline          REG_NEWLINE
-  no_auto_capture    REG_NOSUB
  dotall             REG_DOTALL     )
  ungreedy           REG_UNGREEDY   ) These options are not part of
  ucp                REG_UCP        )   the POSIX standard
  utf                REG_UTF8       )
 </pre>
+The <b>regerror_buffsize</b> modifier specifies a size for the error buffer that
+is passed to <b>regerror()</b> in the event of a compilation error. For example:
+<pre>
+  /abc/posix,regerror_buffsize=20
+</pre>
+This provides a means of testing the behaviour of <b>regerror()</b> when the
+buffer is too small for the error message. If this modifier has not been set, a
+large buffer is used.
+</P>
+<P>
 The <b>aftertext</b> and <b>allaftertext</b> subject modifiers work as described
-below. All other modifiers cause an error.
+below. All other modifiers are either ignored, with a warning message, or cause
+an error.
 </P>
 <br><b>
 Testing the stack guard feature
 </b><br>
 <P>
-The <b>/stackguard</b> modifier is used to test the use of
+The <b>stackguard</b> modifier is used to test the use of
 <b>pcre2_set_compile_recursion_guard()</b>, a function that is provided to
 enable stack availability to be checked during compilation (see the
 <a href="pcre2api.html"><b>pcre2api</b></a>
@ -738,7 +957,7 @@ be aborted.
 Using alternative character tables
 </b><br>
 <P>
-The value specified for the <b>/tables</b> modifier must be one of the digits 0,
+The value specified for the <b>tables</b> modifier must be one of the digits 0,
 1, or 2. It causes a specific set of built-in character tables to be passed to
 <b>pcre2_compile()</b>. This is used in the PCRE2 tests to check behaviour with
 different character tables. The digit specifies the tables as follows:
@ -758,17 +977,22 @@ Setting certain match controls
 <P>
 The following modifiers are really subject modifiers, and are described below.
 However, they may be included in a pattern's modifier list, in which case they
-are applied to every subject line that is processed with that pattern. They do
-not affect the compilation process.
+are applied to every subject line that is processed with that pattern. They may
+not appear in <b>#pattern</b> commands. These modifiers do not affect the
+compilation process.
 <pre>
-      aftertext           show text after match
-      allaftertext        show text after captures
-      allcaptures         show all captures
-      allusedtext         show all consulted text
-  /g  global              global matching
-      mark                show mark values
-      replace=&#60;string&#62;    specify a replacement string
-      startchar           show starting character when relevant
+      aftertext                  show text after match
+      allaftertext               show text after captures
+      allcaptures                show all captures
+      allusedtext                show all consulted text
+  /g  global                     global matching
+      mark                       show mark values
+      replace=&#60;string&#62;           specify a replacement string
+      startchar                  show starting character when relevant
+      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
 </pre>
 These modifiers may not appear in a <b>#pattern</b> command. If you want them as
 defaults, set them in a <b>#subject</b> command.
@ -782,13 +1006,17 @@ pushed onto a stack of compiled patterns, and <b>pcre2test</b> expects the next
 line to contain a new pattern (or a command) instead of a subject line. This
 facility is used when saving compiled patterns to a file, as described in the
 section entitled "Saving and restoring compiled patterns"
-<a href="#saverestore">below.</a>
-The <b>push</b> modifier is incompatible with compilation modifiers such as
-<b>global</b> that act at match time. Any that are specified are ignored, with a
-warning message, except for <b>replace</b>, which causes an error. Note that,
-<b>jitverify</b>, which is allowed, does not carry through to any subsequent
-matching that uses this pattern.
-</P>
+<a href="#saverestore">below.</a> If <b>pushcopy</b> is used instead of <b>push</b>, a copy of the compiled
+pattern is stacked, leaving the original as current, ready to match the
+following input lines. This provides a way of testing the
+<b>pcre2_code_copy()</b> function.
+The <b>push</b> and <b>pushcopy</b> modifiers are incompatible with compilation
+modifiers such as <b>global</b> that act at match time. Any that are specified
+are ignored (for the stacked copy), with a warning message, except for
+<b>replace</b>, which causes an error. Note that <b>jitverify</b>, which is
+allowed, does not carry through to any subsequent matching that uses a stacked
+pattern.
+<a name="subjectmodifiers"></a></P>
 <br><a name="SEC11" href="#TOC1">SUBJECT MODIFIERS</a><br>
 <P>
 The modifiers that can appear in subject lines and the <b>#subject</b>
@ -806,6 +1034,7 @@ for a description of their effects.
      anchored                  set PCRE2_ANCHORED
      dfa_restart               set PCRE2_DFA_RESTART
      dfa_shortest              set PCRE2_DFA_SHORTEST
+      no_jit                    set PCRE2_NO_JIT
      no_utf_check              set PCRE2_NO_UTF_CHECK
      notbol                    set PCRE2_NOTBOL
      notempty                  set PCRE2_NOTEMPTY
@ -818,11 +1047,11 @@ The partial matching modifiers are provided with abbreviations because they
 appear frequently in tests.
 </P>
 <P>
-If the <b>/posix</b> modifier was present on the pattern, causing the POSIX
+If the <b>posix</b> modifier was present on the pattern, causing the POSIX
 wrapper API to be used, the only option-setting modifiers that have any effect
 are <b>notbol</b>, <b>notempty</b>, and <b>noteol</b>, causing REG_NOTBOL,
 REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to <b>regexec()</b>.
-Any other modifiers cause an error.
+The other modifiers are ignored, with a warning message.
 </P>
 <br><b>
 Setting match controls
@ -833,33 +1062,44 @@ information. Some of them may also be specified on a pattern line (see above),
 in which case they apply to every subject line that is matched against that
 pattern.
 <pre>
-      aftertext                 show text after match
-      allaftertext              show text after captures
-      allcaptures               show all captures
-      allusedtext               show all consulted text (non-JIT only)
-      altglobal                 alternative global matching
-      callout_capture           show captures at callout time
-      callout_data=&#60;n&#62;          set a value to pass via callouts
-      callout_fail=&#60;n&#62;[:&#60;m&#62;]    control callout failure
-      callout_none              do not supply a callout function
-      copy=&#60;number or name&#62;     copy captured substring
-      dfa                       use <b>pcre2_dfa_match()</b>
-      find_limits               find match and recursion limits
-      get=&#60;number or name&#62;      extract captured substring
-      getall                    extract all captured substrings
-  /g  global                    global matching
-      jitstack=&#60;n&#62;              set size of JIT stack
-      mark                      show mark values
-      match_limit=&#62;n&#62;           set a match limit
-      memory                    show memory usage
-      offset=&#60;n&#62;                set starting offset
-      ovector=&#60;n&#62;               set size of output vector
-      recursion_limit=&#60;n&#62;       set a recursion limit
-      replace=&#60;string&#62;          specify a replacement string
-      startchar                 show startchar when relevant
-      zero_terminate            pass the subject as zero-terminated
+      aftertext                  show text after match
+      allaftertext               show text after captures
+      allcaptures                show all captures
+      allusedtext                show all consulted text (non-JIT only)
+      altglobal                  alternative global matching
+      callout_capture            show captures at callout time
+      callout_data=&#60;n&#62;           set a value to pass via callouts
+      callout_error=&#60;n&#62;[:&#60;m&#62;]    control callout error
+      callout_fail=&#60;n&#62;[:&#60;m&#62;]     control callout failure
+      callout_none               do not supply a callout function
+      copy=&#60;number or name&#62;      copy captured substring
+      dfa                        use <b>pcre2_dfa_match()</b>
+      find_limits                find match and recursion limits
+      get=&#60;number or name&#62;       extract captured substring
+      getall                     extract all captured substrings
+  /g  global                     global matching
+      jitstack=&#60;n&#62;               set size of JIT stack
+      mark                       show mark values
+      match_limit=&#60;n&#62;            set a match limit
+      memory                     show memory usage
+      null_context               match with a NULL context
+      offset=&#60;n&#62;                 set starting offset
+      offset_limit=&#60;n&#62;           set offset limit
+      ovector=&#60;n&#62;                set size of output vector
+      recursion_limit=&#60;n&#62;        set a recursion limit
+      replace=&#60;string&#62;           specify a replacement string
+      startchar                  show startchar when relevant
+      startoffset=&#60;n&#62;            same as offset=&#60;n&#62;
+      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+      zero_terminate             pass the subject as zero-terminated
 </pre>
-The effects of these modifiers are described in the following sections.
+The effects of these modifiers are described in the following sections. When
+matching via the POSIX wrapper API, the <b>aftertext</b>, <b>allaftertext</b>,
+and <b>ovector</b> subject modifiers work as described below. All other
+modifiers are either ignored, with a warning message, or cause an error.
 </P>
 <br><b>
 Showing more text
@ -916,7 +1156,8 @@ The <b>allcaptures</b> modifier requests that the values of all potential
 captured parentheses be output after a match. By default, only those up to the
 highest one actually used in the match are output (corresponding to the return
 code from <b>pcre2_match()</b>). Groups that did not take part in the match
-are output as "&#60;unset&#62;".
+are output as "&#60;unset&#62;". This modifier is not relevant for DFA matching (which
+does no capturing); it is ignored, with a warning message, if present.
 </P>
 <br><b>
 Testing callouts
@ -924,15 +1165,22 @@ Testing callouts
 <P>
 A callout function is supplied when <b>pcre2test</b> calls the library matching
 functions, unless <b>callout_none</b> is specified. If <b>callout_capture</b> is
-set, the current captured groups are output when a callout occurs.
+set, the current captured groups are output when a callout occurs. The default
+return from the callout function is zero, which allows matching to continue.
 </P>
 <P>
 The <b>callout_fail</b> modifier can be given one or two numbers. If there is
-only one number, 1 is returned instead of 0 when a callout of that number is
-reached. If two numbers are given, 1 is returned when callout &#60;n&#62; is reached
-for the &#60;m&#62;th time. Note that callouts with string arguments are always given
-the number zero. See "Callouts" below for a description of the output when a
-callout it taken.
+only one number, 1 is returned instead of 0 (causing matching to backtrack)
+when a callout of that number is reached. If two numbers (&#60;n&#62;:&#60;m&#62;) are given, 1
+is returned when callout &#60;n&#62; is reached and there have been at least &#60;m&#62;
+callouts. The <b>callout_error</b> modifier is similar, except that
+PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
+aborted. If both these modifiers are set for the same callout number,
+<b>callout_error</b> takes precedence.
+</P>
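The return-value rules for the test callout function can be modelled as follows. This is a hedged sketch, not the pcre2test source: the function name is hypothetical, the error code is represented by its name rather than its numeric value, and the count is assumed to cover all callouts so far (which is how the text above reads).

```python
def callout_return(number, count, fail=None, error=None):
    """Model the value returned from the test callout function.

    number: the callout number that has just been reached
    count:  how many callouts have happened so far, including this one
    fail, error: (n, m) tuples from callout_fail / callout_error, or None;
                 use m = 0 when only one number was given
    """
    def triggered(setting):
        return setting is not None and number == setting[0] and count >= setting[1]

    if triggered(error):
        # callout_error takes precedence and aborts the whole match.
        return "PCRE2_ERROR_CALLOUT"
    if triggered(fail):
        # A return of 1 makes the matcher backtrack.
        return 1
    # Default: zero, so matching continues.
    return 0
```

With fail=(1, 3), the first two callouts return 0 even if they are callout number 1; only from the third callout onwards does reaching callout 1 return 1.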
+<P>
+Note that callouts with string arguments are always given the number zero. See
+"Callouts" below for a description of the output when a callout is taken.
 </P>
 <P>
 The <b>callout_data</b> modifier can be given an unsigned or a negative number.
@ -945,7 +1193,7 @@ Finding all matches in a string
 </b><br>
 <P>
 Searching for all possible matches within a subject can be requested by the
-<b>global</b> or <b>/altglobal</b> modifier. After finding a match, the matching
+<b>global</b> or <b>altglobal</b> modifier. After finding a match, the matching
 function is called again to search the remainder of the subject. The difference
 between <b>global</b> and <b>altglobal</b> is that the former uses the
 <i>start_offset</i> argument to <b>pcre2_match()</b> or <b>pcre2_dfa_match()</b>
@ -996,19 +1244,34 @@ Testing the substitution function
 </b><br>
 <P>
 If the <b>replace</b> modifier is set, the <b>pcre2_substitute()</b> function is
-called instead of one of the matching functions. Unlike subject strings,
-<b>pcre2test</b> does not process replacement strings for escape sequences. In
-UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
-If so, it is correctly converted to a UTF string of the appropriate code unit
-width. If it is not a valid UTF-8 string, the individual code units are copied
-directly. This provides a means of passing an invalid UTF-8 string for testing
-purposes.
+called instead of one of the matching functions. Note that replacement strings
+cannot contain commas, because a comma signifies the end of a modifier. This is
+not thought to be an issue in a test program.
 </P>
 <P>
-If the <b>global</b> modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
-<b>pcre2_substitute()</b>. After a successful substitution, the modified string
-is output, preceded by the number of replacements. This may be zero if there
-were no matches. Here is a simple example of a substitution test:
+Unlike subject strings, <b>pcre2test</b> does not process replacement strings
+for escape sequences. In UTF mode, a replacement string is checked to see if it
+is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
+the appropriate code unit width. If it is not a valid UTF-8 string, the
+individual code units are copied directly. This provides a means of passing an
+invalid UTF-8 string for testing purposes.
+</P>
+<P>
+The following modifiers set options (in addition to the normal match options)
+for <b>pcre2_substitute()</b>:
+<pre>
+  global                      PCRE2_SUBSTITUTE_GLOBAL
+  substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
+  substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+  substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+  substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
+
+</PRE>
+</P>
+<P>
+After a successful substitution, the modified string is output, preceded by the
+number of replacements. This may be zero if there were no matches. Here is a
+simple example of a substitution test:
 <pre>
  /abc/replace=xxx
      =abc=abc=
@ -1016,12 +1279,12 @@ were no matches. Here is a simple example of a substitution test:
      =abc=abc=\=global
   2: =xxx=xxx=
 </pre>
-Subject and replacement strings should be kept relatively short for
-substitution tests, as fixed-size buffers are used. To make it easy to test for
-buffer overflow, if the replacement string starts with a number in square
-brackets, that number is passed to <b>pcre2_substitute()</b> as the size of the
-output buffer, with the replacement string starting at the next character. Here
-is an example that tests the edge case:
+Subject and replacement strings should be kept relatively short (fewer than 256
+characters) for substitution tests, as fixed-size buffers are used. To make it
+easy to test for buffer overflow, if the replacement string starts with a
+number in square brackets, that number is passed to <b>pcre2_substitute()</b> as
+the size of the output buffer, with the replacement string starting at the next
+character. Here is an example that tests the edge case:
 <pre>
  /abc/
      123abc123\=replace=[10]XYZ
@ -1029,6 +1292,19 @@ is an example that tests the edge case:
      123abc123\=replace=[9]XYZ
  Failed: error -47: no more memory
 </pre>
+The default action of <b>pcre2_substitute()</b> is to return
+PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
+<b>substitute_overflow_length</b> modifier), <b>pcre2_substitute()</b> continues
+to go through the motions of matching and substituting, in order to compute the
+size of buffer that is required. When this happens, <b>pcre2test</b> shows the
+required buffer length (which includes space for the trailing zero) as part of
+the error message. For example:
+<pre>
+  /abc/substitute_overflow_length
+      123abc123\=replace=[9]XYZ
+  Failed: error -47: no more memory: 10 code units are needed
+</pre>
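The arithmetic behind the two buffer-size examples can be checked with a toy model. The sketch below uses Python's own re module rather than PCRE2, assumes ASCII text so that one character is one code unit, and uses a hypothetical function name and return convention.

```python
import re


def substitute(pattern, subject, replacement, buffer_size, overflow_length=False):
    """Toy model of the output-buffer check for a single (non-global) substitution.

    Sizes are in code units and include the trailing zero.
    Returns (count, result) on success, or ("no more memory", needed) on overflow,
    where needed is None unless overflow_length is set.
    """
    result, count = re.subn(pattern, replacement, subject, count=1)
    needed = len(result) + 1  # +1 for the trailing zero
    if needed <= buffer_size:
        return count, result
    return "no more memory", (needed if overflow_length else None)
```

Replacing "abc" in "123abc123" with "XYZ" gives a nine-character result, so ten code units are needed: a buffer of 10 succeeds, a buffer of 9 fails, and with the overflow-length behaviour the failure reports 10.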
 A replacement string is ignored with POSIX and DFA matching. Specifying partial
 matching provokes an error return ("bad option value") from
 <b>pcre2_substitute()</b>.
@ -1100,6 +1376,16 @@ The <b>offset</b> modifier sets an offset in the subject string at which
 matching starts. Its value is a number of code units, not characters.
 </P>
 <br><b>
+Setting an offset limit
+</b><br>
+<P>
+The <b>offset_limit</b> modifier sets a limit for unanchored matches. If a match
+cannot be found starting at or before this offset in the subject, a "no match"
+return is given. The data value is a number of code units, not characters. When
+this modifier is used, the <b>use_offset_limit</b> modifier must have been set
+for the pattern; if not, an error is generated.
+</P>
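The effect of an offset limit can be illustrated with Python's re module. This is only a model of the rule stated above, under the assumption that offsets are character positions in a Python string (PCRE2 counts code units); the function name is hypothetical.

```python
import re


def search_with_offset_limit(pattern, subject, offset=0, offset_limit=None):
    """Return a match only if it starts at or before offset_limit."""
    match = re.compile(pattern).search(subject, offset)
    if match is None:
        return None
    # search() returns the leftmost match, so if that one starts beyond the
    # limit, no match can start at or before it.
    if offset_limit is not None and match.start() > offset_limit:
        return None
    return match
```

Searching "xyzabc" for "abc" succeeds with a limit of 3, because the match starts at offset 3, and gives "no match" with a limit of 2.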
+<br><b>
 Setting the size of the output vector
 </b><br>
 <P>
@ -1131,6 +1417,17 @@ this modifier has no effect, as there is no facility for passing a length.)
 When testing <b>pcre2_substitute()</b>, this modifier also has the effect of
 passing the replacement string as zero-terminated.
 </P>
+<br><b>
+Passing a NULL context
+</b><br>
+<P>
+Normally, <b>pcre2test</b> passes a context block to <b>pcre2_match()</b>,
+<b>pcre2_dfa_match()</b> or <b>pcre2_jit_match()</b>. If the <b>null_context</b>
+modifier is set, however, NULL is passed. This is for testing that the matching
+functions behave correctly in this case (they use default values). This
+modifier cannot be used with the <b>find_limits</b> modifier or when testing the
+substitution function.
+</P>
 <br><a name="SEC12" href="#TOC1">THE ALTERNATIVE MATCHING FUNCTION</a><br>
 <P>
 By default, <b>pcre2test</b> uses the standard PCRE2 matching function,
@ -1196,7 +1493,7 @@ unset substring is shown as "&#60;unset&#62;", as for the second data line.
 If the strings contain any non-printing characters, they are output as \xhh
 escapes if the value is less than 256 and UTF mode is not set. Otherwise they
 are output as \x{hh...} escapes. See below for the definition of non-printing
-characters. If the <b>/aftertext</b> modifier is set, the output for substring
+characters. If the <b>aftertext</b> modifier is set, the output for substring
 0 is followed by the rest of the subject string, identified by "0+" like
 this:
 <pre>
@ -1321,7 +1618,9 @@ item to be tested. For example:
 This output indicates that callout number 0 occurred for a match attempt
 starting at the fourth character of the subject string, when the pointer was at
 the seventh character, and when the next pattern item was \d. Just
-one circumflex is output if the start and current positions are the same.
+one circumflex is output if the start and current positions are the same, or if
+the current position precedes the start position, which can happen if the
+callout is in a lookbehind assertion.
 </P>
 <P>
 Callouts numbered 255 are assumed to be automatic callouts, inserted as a
@ -1387,7 +1686,7 @@ therefore shown as hex escapes.
 <P>
 When <b>pcre2test</b> is outputting text that is a matched part of a subject
 string, it behaves in the same way, unless a different locale has been set for
-the pattern (using the <b>/locale</b> modifier). In this case, the
+the pattern (using the <b>locale</b> modifier). In this case, the
 <b>isprint()</b> function is used to distinguish printing and non-printing
 characters.
 <a name="saverestore"></a></P>
@ -1413,11 +1712,16 @@ can be used to test these functions.
 <P>
 When a pattern with <b>push</b> modifier is successfully compiled, it is pushed
 onto a stack of compiled patterns, and <b>pcre2test</b> expects the next line to
-contain a new pattern (or command) instead of a subject line. By this means, a
-number of patterns can be compiled and retained. The <b>push</b> modifier is
-incompatible with <b>posix</b>, and control modifiers that act at match time are
-ignored (with a message). The <b>jitverify</b> modifier applies only at compile
-time. The command
+contain a new pattern (or command) instead of a subject line. By contrast,
+the <b>pushcopy</b> modifier causes a copy of the compiled pattern to be
+stacked, leaving the original available for immediate matching. By using
+<b>push</b> and/or <b>pushcopy</b>, a number of patterns can be compiled and
+retained. These modifiers are incompatible with <b>posix</b>, and control
+modifiers that act at match time are ignored (with a message) for the stacked
+patterns. The <b>jitverify</b> modifier applies only at compile time.
+</P>
+<P>
+The command
 <pre>
  #save &#60;filename&#62;
 </pre>
@ -1434,7 +1738,8 @@ usual by an empty line or end of file. This command may be followed by a
 modifier list containing only
 <a href="#controlmodifiers">control modifiers</a>
 that act after a pattern has been compiled. In particular, <b>hex</b>,
-<b>posix</b>, and <b>push</b> are not allowed, nor are any
+<b>posix</b>, <b>posix_nosub</b>, <b>push</b>, and <b>pushcopy</b> are not allowed,
+nor are any
 <a href="#optionmodifiers">option-setting modifiers.</a>
 The JIT modifiers are, however, permitted. Here is an example that saves and
 reloads two patterns.
@ -1452,6 +1757,11 @@ reloads two patterns.
 If <b>jitverify</b> is used with #pop, it does not automatically imply
 <b>jit</b>, which is different behaviour from when it is used on a pattern.
 </P>
+<P>
+The #popcopy command is analogous to the <b>pushcopy</b> modifier in that it
+makes current a copy of the topmost stack pattern, leaving the original still
+on the stack.
+</P>
 <br><a name="SEC19" href="#TOC1">SEE ALSO</a><br>
 <P>
 <b>pcre2</b>(3), <b>pcre2api</b>(3), <b>pcre2callout</b>(3),
@ -1469,9 +1779,9 @@ Cambridge, England.
 </P>
 <br><a name="SEC21" href="#TOC1">REVISION</a><br>
 <P>
-Last updated: 20 May 2015
+Last updated: 28 December 2016
 <br>
-Copyright &copy; 1997-2015 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/html/pcre2unicode.html
+++ b/pcre2/doc/html/pcre2unicode.html
@ -67,15 +67,20 @@ In UTF modes, the dot metacharacter matches one UTF character instead of a
 single code unit.
 </P>
 <P>
-The escape sequence \C can be used to match a single code unit, in a UTF mode,
+The escape sequence \C can be used to match a single code unit in a UTF mode,
 but its use can lead to some strange effects because it breaks up multi-unit
 characters (see the description of \C in the
 <a href="pcre2pattern.html"><b>pcre2pattern</b></a>
-documentation). The use of \C is not supported in the alternative matching
-function <b>pcre2_dfa_match()</b>, nor is it supported in UTF mode by the JIT
-optimization. If JIT optimization is requested for a UTF pattern that contains
-\C, it will not succeed, and so the matching will be carried out by the normal
-interpretive function.
+documentation).
+</P>
+<P>
+The use of \C is not supported by the alternative matching function
+<b>pcre2_dfa_match()</b> when in UTF-8 or UTF-16 mode, that is, when a character
+may consist of more than one code unit. The use of \C in these modes provokes
+a match-time error. Also, the JIT optimization does not support \C in these
+modes. If JIT optimization is requested for a UTF-8 or UTF-16 pattern that
+contains \C, it will not succeed, and so when <b>pcre2_match()</b> is called,
+the matching will be carried out by the normal interpretive function.
 </P>
 <P>
 The character escapes \b, \B, \d, \D, \s, \S, \w, and \W correctly test
@ -126,11 +131,22 @@ as a byte-order mark (BOM). The PCRE2 functions do not handle this, expecting
 strings to be in host byte order.
 </P>
 <P>
-The entire string is checked before any other processing takes place. In
-addition to checking the format of the string, there is a check to ensure that
-all code points lie in the range U+0 to U+10FFFF, excluding the surrogate area.
-The so-called "non-character" code points are not excluded because Unicode
-corrigendum #9 makes it clear that they should not be.
+A UTF string is checked before any other processing takes place. In the case of
+<b>pcre2_match()</b> and <b>pcre2_dfa_match()</b> calls with a non-zero starting
+offset, the check is applied only to that part of the subject that could be
+inspected during matching, and there is a check that the starting offset points
+to the first code unit of a character or to the end of the subject. If there
+are no lookbehind assertions in the pattern, the check starts at the starting
+offset. Otherwise, it starts at the length of the longest lookbehind before the
+starting offset, or at the start of the subject if there are not that many
+characters before the starting offset. Note that the sequences \b and \B are
+one-character lookbehinds.
+</P>
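The rule for where the validity check begins can be sketched for UTF-8 as follows. This is an illustrative model, not the library's code: the function name is hypothetical, and it assumes the starting offset already points at the first code unit of a character.

```python
def utf8_check_start(subject: bytes, start_offset: int, max_lookbehind: int) -> int:
    """Return the byte offset at which the UTF-8 check would begin.

    max_lookbehind is the length, in characters, of the longest lookbehind in
    the pattern (0 if there is none; 1 if only \\b or \\B is present).
    """
    pos = start_offset
    for _ in range(max_lookbehind):
        if pos == 0:
            break  # not that many characters before the starting offset
        pos -= 1
        # Skip back over continuation bytes (10xxxxxx) to the lead byte.
        while pos > 0 and (subject[pos] & 0xC0) == 0x80:
            pos -= 1
    return pos
```

For the subject "a", "é", "€", "b" encoded as UTF-8 (seven bytes, with "b" at offset 6), a starting offset of 6 with a one-character lookbehind moves the check back to offset 3, the first byte of "€".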
+<P>
+In addition to checking the format of the string, there is a check to ensure
+that all code points lie in the range U+0 to U+10FFFF, excluding the surrogate
+area. The so-called "non-character" code points are not excluded because
+Unicode corrigendum #9 makes it clear that they should not be.
 </P>
 <P>
 Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16,
@ -232,9 +248,9 @@ Errors in UTF-16 strings
 <P>
 The following negative error codes are given for invalid UTF-16 strings:
 <pre>
-  PCRE_UTF16_ERR1  Missing low surrogate at end of string
-  PCRE_UTF16_ERR2  Invalid low surrogate follows high surrogate
-  PCRE_UTF16_ERR3  Isolated low surrogate
+  PCRE2_ERROR_UTF16_ERR1  Missing low surrogate at end of string
+  PCRE2_ERROR_UTF16_ERR2  Invalid low surrogate follows high surrogate
+  PCRE2_ERROR_UTF16_ERR3  Isolated low surrogate

 <a name="utf32strings"></a></PRE>
 </P>
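The three UTF-16 conditions can be expressed as a short validity scan. This is a sketch of the checks as described in the table, not PCRE2's implementation; the function name is hypothetical and the error codes are returned by name rather than by numeric value.

```python
def check_utf16(units):
    """Return (error_name, offset) for the first problem, or None if valid.

    units is a sequence of 16-bit code unit values in host byte order.
    """
    i = 0
    while i < len(units):
        unit = units[i]
        if 0xD800 <= unit <= 0xDBFF:  # high surrogate
            if i + 1 >= len(units):
                return ("PCRE2_ERROR_UTF16_ERR1", i)  # missing low surrogate at end
            if not 0xDC00 <= units[i + 1] <= 0xDFFF:
                return ("PCRE2_ERROR_UTF16_ERR2", i)  # invalid low surrogate follows
            i += 2
        elif 0xDC00 <= unit <= 0xDFFF:
            return ("PCRE2_ERROR_UTF16_ERR3", i)  # isolated low surrogate
        else:
            i += 1
    return None
```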
@ -244,8 +260,8 @@ Errors in UTF-32 strings
 <P>
 The following negative error codes are given for invalid UTF-32 strings:
 <pre>
-  PCRE_UTF32_ERR1  Surrogate character (range from 0xd800 to 0xdfff)
-  PCRE_UTF32_ERR2  Code point is greater than 0x10ffff
+  PCRE2_ERROR_UTF32_ERR1  Surrogate character (0xd800 to 0xdfff)
+  PCRE2_ERROR_UTF32_ERR2  Code point is greater than 0x10ffff

 </PRE>
 </P>
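The UTF-32 case needs no pairing logic, since every code unit is a whole code point. A minimal sketch of the two checks, with a hypothetical function name and the error codes returned by name:

```python
def check_utf32(units):
    """Return (error_name, offset) for the first invalid code point, or None."""
    for i, unit in enumerate(units):
        if 0xD800 <= unit <= 0xDFFF:
            return ("PCRE2_ERROR_UTF32_ERR1", i)  # surrogate character
        if unit > 0x10FFFF:
            return ("PCRE2_ERROR_UTF32_ERR2", i)  # beyond the Unicode range
    return None
```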
@ -264,9 +280,9 @@ Cambridge, England.
 REVISION
 </b><br>
 <P>
-Last updated: 23 November 2014
+Last updated: 03 July 2016
 <br>
-Copyright &copy; 1997-2014 University of Cambridge.
+Copyright &copy; 1997-2016 University of Cambridge.
 <br>
 <p>
 Return to the <a href="index.html">PCRE2 index page</a>.
--- a/pcre2/doc/index.html.src
+++ b/pcre2/doc/index.html.src
@ -91,6 +91,12 @@ in the library.
 <tr><td><a href="pcre2_callout_enumerate.html">pcre2_callout_enumerate</a></td>
    <td>&nbsp;&nbsp;Enumerate callouts in a compiled pattern</td></tr>

+<tr><td><a href="pcre2_code_copy.html">pcre2_code_copy</a></td>
+    <td>&nbsp;&nbsp;Copy a compiled pattern</td></tr>
+
+<tr><td><a href="pcre2_code_copy_with_tables.html">pcre2_code_copy_with_tables</a></td>
+    <td>&nbsp;&nbsp;Copy a compiled pattern and its character tables</td></tr>
+
 <tr><td><a href="pcre2_code_free.html">pcre2_code_free</a></td>
    <td>&nbsp;&nbsp;Free a compiled pattern</td></tr>

@ -210,9 +216,15 @@ in the library.
 <tr><td><a href="pcre2_set_match_limit.html">pcre2_set_match_limit</a></td>
    <td>&nbsp;&nbsp;Set the match limit</td></tr>

+<tr><td><a href="pcre2_set_max_pattern_length.html">pcre2_set_max_pattern_length</a></td>
+    <td>&nbsp;&nbsp;Set the maximum length of pattern</td></tr>
+
 <tr><td><a href="pcre2_set_newline.html">pcre2_set_newline</a></td>
    <td>&nbsp;&nbsp;Set the newline convention</td></tr>

+<tr><td><a href="pcre2_set_offset_limit.html">pcre2_set_offset_limit</a></td>
+    <td>&nbsp;&nbsp;Set the offset limit</td></tr>
+
 <tr><td><a href="pcre2_set_parens_nest_limit.html">pcre2_set_parens_nest_limit</a></td>
    <td>&nbsp;&nbsp;Set the parentheses nesting limit</td></tr>

--- a/pcre2/doc/pcre2.3
+++ b/pcre2/doc/pcre2.3
@ -1,4 +1,4 @@
-.TH PCRE2 3 "13 April 2015" "PCRE2 10.20"
+.TH PCRE2 3 "16 October 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH INTRODUCTION
@ -118,8 +118,10 @@ running redundant checks.
 .P
 The use of the \eC escape sequence in a UTF-8 or UTF-16 pattern can lead to
 problems, because it may leave the current matching point in the middle of a
-multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used to
-lock out the use of \eC, causing a compile-time error if it is encountered.
+multi-code-unit character. The PCRE2_NEVER_BACKSLASH_C option can be used by an
+application to lock out the use of \eC, causing a compile-time error if it is
+encountered. It is also possible to build PCRE2 with the use of \eC permanently
+disabled.
 .P
 Another way that performance can be hit is by running a pattern that has a very
 large search tree against a string that will never match. Nested unlimited
@ -187,6 +189,6 @@ use my two initials, followed by the two digits 10, at the domain cam.ac.uk.
 .rs
 .sp
 .nf
-Last updated: 13 April 2015
+Last updated: 16 October 2015
 Copyright (c) 1997-2015 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2.txt
+++ b/pcre2/doc/pcre2.txt
--- a/pcre2/doc/pcre2_code_copy.3
+++ b/pcre2/doc/pcre2_code_copy.3
@ -0,0 +1,31 @@
+.TH PCRE2_CODE_COPY 3 "22 November 2016" "PCRE2 10.23"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre2.h>
+.PP
+.nf
+.B pcre2_code *pcre2_code_copy(const pcre2_code *\fIcode\fP);
+.fi
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function makes a copy of the memory used for a compiled pattern, excluding
+any memory used by the JIT compiler. Without a subsequent call to
+\fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching. The
+pointer to the character tables is copied, not the tables themselves (see
+\fBpcre2_code_copy_with_tables()\fP). The yield of the function is NULL if
+\fIcode\fP is NULL or if sufficient memory cannot be obtained.
+.P
+There is a complete description of the PCRE2 native API in the
+.\" HREF
+\fBpcre2api\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcre2posix\fP
+.\"
+page.
--- a/pcre2/doc/pcre2_code_copy_with_tables.3
+++ b/pcre2/doc/pcre2_code_copy_with_tables.3
@ -0,0 +1,32 @@
+.TH PCRE2_CODE_COPY_WITH_TABLES 3 "22 November 2016" "PCRE2 10.23"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre2.h>
+.PP
+.nf
+.B pcre2_code *pcre2_code_copy_with_tables(const pcre2_code *\fIcode\fP);
+.fi
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function makes a copy of the memory used for a compiled pattern, excluding
+any memory used by the JIT compiler. Without a subsequent call to
+\fBpcre2_jit_compile()\fP, the copy can be used only for non-JIT matching.
+Unlike \fBpcre2_code_copy()\fP, a separate copy of the character tables is also
+made, with the new code pointing to it. This memory will be automatically freed
+when \fBpcre2_code_free()\fP is called. The yield of the function is NULL if
+\fIcode\fP is NULL or if sufficient memory cannot be obtained.
+.P
+There is a complete description of the PCRE2 native API in the
+.\" HREF
+\fBpcre2api\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcre2posix\fP
+.\"
+page.
--- a/pcre2/doc/pcre2_code_free.3
+++ b/pcre2/doc/pcre2_code_free.3
@ -1,4 +1,4 @@
-.TH PCRE2_CODE_FREE 3 "21 October 2014" "PCRE2 10.00"
+.TH PCRE2_CODE_FREE 3 "29 July 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -7,7 +7,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .B #include <pcre2.h>
 .PP
 .nf
-.B pcre2_code_free(pcre2_code *\fIcode\fP);
+.B void pcre2_code_free(pcre2_code *\fIcode\fP);
 .fi
 .
 .SH DESCRIPTION
--- a/pcre2/doc/pcre2_dfa_match.3
+++ b/pcre2/doc/pcre2_dfa_match.3
@ -1,4 +1,4 @@
-.TH PCRE2_DFA_MATCH 3 "12 May 2013" "PCRE2 10.00"
+.TH PCRE2_DFA_MATCH 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -33,8 +33,8 @@ is \fBpcre2_match()\fP.) The arguments for this function are:
  \fIwscount\fP      Number of elements in the vector
 .sp
 For \fBpcre2_dfa_match()\fP, a match context is needed only if you want to set
-up a callout function. The \fIlength\fP and \fIstartoffset\fP values are code
-units, not characters. The options are:
+up a callout function or specify the recursion limit. The \fIlength\fP and
+\fIstartoffset\fP values are code units, not characters. The options are:
 .sp
  PCRE2_ANCHORED          Match only at the first position
  PCRE2_NOTBOL            Subject is not the beginning of a line
--- a/pcre2/doc/pcre2_get_error_message.3
+++ b/pcre2/doc/pcre2_get_error_message.3
@ -1,4 +1,4 @@
-.TH PCRE2_GET_ERROR_MESSAGE 3 "21 October 2014" "PCRE2 10.00"
+.TH PCRE2_GET_ERROR_MESSAGE 3 "17 June 2016" "PCRE2 10.22"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -23,7 +23,10 @@ errors are negative numbers. The arguments are:
  \fIbufflen\fP     the length of the buffer (code units)
 .sp
 The function returns the length of the message, excluding the trailing zero, or
-a negative error code if the buffer is too small.
+the negative error code PCRE2_ERROR_NOMEMORY if the buffer is too small. In
+this case, the returned message is truncated (but still with a trailing zero).
+If \fIerrorcode\fP does not contain a recognized error code number, the
+negative value PCRE2_ERROR_BADDATA is returned.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF
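The return-value contract described in the hunk above (message length on success, a negative code plus a truncated but still zero-terminated message when the buffer is too small) can be sketched without the library. Everything named here is hypothetical: `copy_message()` and the `SKETCH_ERROR_*` values stand in for pcre2_get_error_message() and its PCRE2_ERROR_* codes and are not part of PCRE2. The sketch also assumes a non-empty buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder codes for this sketch only; not PCRE2's actual values. */
#define SKETCH_ERROR_NOMEMORY (-1)
#define SKETCH_ERROR_BADDATA  (-2)

/* Copy message into buffer, whose capacity is bufflen code units.
   Returns the message length excluding the trailing zero, or
   SKETCH_ERROR_NOMEMORY after storing a truncated, zero-terminated copy.
   A NULL message models an unrecognized error code. */
int copy_message(const char *message, char *buffer, size_t bufflen)
{
  assert(buffer != NULL && bufflen > 0);
  if (message == NULL) return SKETCH_ERROR_BADDATA;

  size_t len = strlen(message);
  if (len >= bufflen) {
    memcpy(buffer, message, bufflen - 1);  /* keep room for the zero */
    buffer[bufflen - 1] = '\0';
    return SKETCH_ERROR_NOMEMORY;
  }
  memcpy(buffer, message, len + 1);        /* includes the trailing zero */
  return (int)len;
}
```

A caller following this contract can still print the buffer after the too-small result, because the truncated text is always terminated.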
--- a/pcre2/doc/pcre2_match_data_create.3
+++ b/pcre2/doc/pcre2_match_data_create.3
@ -1,4 +1,4 @@
-.TH PCRE2_MATCH_DATA_CREATE 3 "22 October 2014" "PCRE2 10.00"
+.TH PCRE2_MATCH_DATA_CREATE 3 "29 July 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -7,7 +7,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .B #include <pcre2.h>
 .PP
 .nf
-.B pcre2_match_data_create(uint32_t \fIovecsize\fP,
+.B pcre2_match_data *pcre2_match_data_create(uint32_t \fIovecsize\fP,
 .B "  pcre2_general_context *\fIgcontext\fP);"
 .fi
 .
--- a/pcre2/doc/pcre2_match_data_create_from_pattern.3
+++ b/pcre2/doc/pcre2_match_data_create_from_pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "24 October 2014" "PCRE2 10.00"
+.TH PCRE2_MATCH_DATA_CREATE_FROM_PATTERN 3 "29 July 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -7,8 +7,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .B #include <pcre2.h>
 .PP
 .nf
-.B pcre2_match_data_create_from_pattern(const pcre2_code *\fIcode\fP,
-.B "  pcre2_general_context *\fIgcontext\fP);"
+.B pcre2_match_data *pcre2_match_data_create_from_pattern(
+.B "  const pcre2_code *\fIcode\fP, pcre2_general_context *\fIgcontext\fP);"
 .fi
 .
 .SH DESCRIPTION
--- a/pcre2/doc/pcre2_pattern_info.3
+++ b/pcre2/doc/pcre2_pattern_info.3
@ -1,4 +1,4 @@
-.TH PCRE2_PATTERN_INFO 3 "01 December 2014" "PCRE2 10.00"
+.TH PCRE2_PATTERN_INFO 3 "21 November 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -30,19 +30,20 @@ request are as follows:
                               PCRE2_BSR_ANYCRLF: CR, LF, or CRLF only
  PCRE2_INFO_CAPTURECOUNT    Number of capturing subpatterns
  PCRE2_INFO_FIRSTBITMAP     Bitmap of first code units, or NULL
-  PCRE2_INFO_FIRSTCODEUNIT   First code unit when type is 1
  PCRE2_INFO_FIRSTCODETYPE   Type of start-of-match information
                               0 nothing set
                               1 first code unit is set
                               2 start of string or after newline
+  PCRE2_INFO_FIRSTCODEUNIT   First code unit when type is 1
+  PCRE2_INFO_HASBACKSLASHC   Return 1 if pattern contains \eC
  PCRE2_INFO_HASCRORLF       Return 1 if explicit CR or LF matches
                               exist in the pattern
  PCRE2_INFO_JCHANGED        Return 1 if (?J) or (?-J) was used
  PCRE2_INFO_JITSIZE         Size of JIT compiled code, or 0
-  PCRE2_INFO_LASTCODEUNIT    Last code unit when type is 1
  PCRE2_INFO_LASTCODETYPE    Type of must-be-present information
                               0 nothing set
                               1 code unit is set
+  PCRE2_INFO_LASTCODEUNIT    Last code unit when type is 1
  PCRE2_INFO_MATCHEMPTY      1 if the pattern can match an
                               empty string, 0 otherwise
  PCRE2_INFO_MATCHLIMIT      Match limit if set,
@ -50,8 +51,8 @@ request are as follows:
  PCRE2_INFO_MAXLOOKBEHIND   Length (in characters) of the longest
                               lookbehind assertion
  PCRE2_INFO_MINLENGTH       Lower bound length of matching strings
-  PCRE2_INFO_NAMEENTRYSIZE   Size of name table entries
  PCRE2_INFO_NAMECOUNT       Number of named subpatterns
+  PCRE2_INFO_NAMEENTRYSIZE   Size of name table entries
  PCRE2_INFO_NAMETABLE       Pointer to name table
  PCRE2_INFO_NEWLINE         Code for the newline sequence:
                               PCRE2_NEWLINE_CR
--- a/pcre2/doc/pcre2_serialize_decode.3
+++ b/pcre2/doc/pcre2_serialize_decode.3
@ -1,4 +1,4 @@
-.TH PCRE2_SERIALIZE_DECODE 3 "19 January 2015" "PCRE2 10.10"
+.TH PCRE2_SERIALIZE_DECODE 3 "02 September 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -8,7 +8,7 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .PP
 .nf
 .B int32_t pcre2_serialize_decode(pcre2_code **\fIcodes\fP,
-.B "  int32_t \fInumber_of_codes\fP, const uint32_t *\fIbytes\fP,"
+.B "  int32_t \fInumber_of_codes\fP, const uint8_t *\fIbytes\fP,"
 .B "  pcre2_general_context *\fIgcontext\fP);"
 .fi
 .
--- a/pcre2/doc/pcre2_serialize_encode.3
+++ b/pcre2/doc/pcre2_serialize_encode.3
@ -1,4 +1,4 @@
-.TH PCRE2_SERIALIZE_ENCODE 3 "19 January 2015" "PCRE2 10.10"
+.TH PCRE2_SERIALIZE_ENCODE 3 "02 September 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -7,8 +7,8 @@ PCRE2 - Perl-compatible regular expressions (revised API)
 .B #include <pcre2.h>
 .PP
 .nf
-.B int32_t pcre2_serialize_encode(pcre2_code **\fIcodes\fP,
-.B "  int32_t \fInumber_of_codes\fP, uint32_t **\fIserialized_bytes\fP,"
+.B int32_t pcre2_serialize_encode(const pcre2_code **\fIcodes\fP,
+.B "  int32_t \fInumber_of_codes\fP, uint8_t **\fIserialized_bytes\fP,"
 .B "  PCRE2_SIZE *\fIserialized_size\fP, pcre2_general_context *\fIgcontext\fP);"
 .fi
 .
--- a/pcre2/doc/pcre2_set_max_pattern_length.3
+++ b/pcre2/doc/pcre2_set_max_pattern_length.3
@ -0,0 +1,31 @@
+.TH PCRE2_SET_MAX_PATTERN_LENGTH 3 "05 October 2016" "PCRE2 10.23"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre2.h>
+.PP
+.nf
+.B int pcre2_set_max_pattern_length(pcre2_compile_context *\fIccontext\fP,
+.B "  PCRE2_SIZE \fIvalue\fP);"
+.fi
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function sets, in a compile context, the maximum text length (in code
+units) of the pattern that can be compiled. The result is always zero. If a
+longer pattern is passed to \fBpcre2_compile()\fP there is an immediate error
+return. The default is effectively unlimited, being the largest value a
+PCRE2_SIZE variable can hold.
+.P
+There is a complete description of the PCRE2 native API in the
+.\" HREF
+\fBpcre2api\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcre2posix\fP
+.\"
+page.
--- a/pcre2/doc/pcre2_set_offset_limit.3
+++ b/pcre2/doc/pcre2_set_offset_limit.3
@ -0,0 +1,28 @@
+.TH PCRE2_SET_OFFSET_LIMIT 3 "22 September 2015" "PCRE2 10.21"
+.SH NAME
+PCRE2 - Perl-compatible regular expressions (revised API)
+.SH SYNOPSIS
+.rs
+.sp
+.B #include <pcre2.h>
+.PP
+.nf
+.B int pcre2_set_offset_limit(pcre2_match_context *\fImcontext\fP,
+.B "  PCRE2_SIZE \fIvalue\fP);"
+.fi
+.
+.SH DESCRIPTION
+.rs
+.sp
+This function sets the offset limit field in a match context. The result is
+always zero.
+.P
+There is a complete description of the PCRE2 native API in the
+.\" HREF
+\fBpcre2api\fP
+.\"
+page and a description of the POSIX API in the
+.\" HREF
+\fBpcre2posix\fP
+.\"
+page.
--- a/pcre2/doc/pcre2_substitute.3
+++ b/pcre2/doc/pcre2_substitute.3
@ -1,4 +1,4 @@
-.TH PCRE2_SUBSTITUTE 3 "11 November 2014" "PCRE2 10.00"
+.TH PCRE2_SUBSTITUTE 3 "12 December 2015" "PCRE2 10.21"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -47,20 +47,25 @@ units, not characters, as is the contents of the variable pointed at by
 \fIoutlengthptr\fP, which is updated to the actual length of the new string.
 The options are:
 .sp
-  PCRE2_ANCHORED          Match only at the first position
-  PCRE2_NOTBOL            Subject string is not the beginning of a line
-  PCRE2_NOTEOL            Subject string is not the end of a line
-  PCRE2_NOTEMPTY          An empty string is not a valid match
-  PCRE2_NOTEMPTY_ATSTART  An empty string at the start of the subject
-                           is not a valid match
-  PCRE2_NO_UTF_CHECK      Do not check the subject or replacement for
-                           UTF validity (only relevant if PCRE2_UTF
-                           was set at compile time)
-  PCRE2_SUBSTITUTE_GLOBAL Replace all occurrences in the subject
+  PCRE2_ANCHORED             Match only at the first position
+  PCRE2_NOTBOL               Subject is not the beginning of a line
+  PCRE2_NOTEOL               Subject is not the end of a line
+  PCRE2_NOTEMPTY             An empty string is not a valid match
+  PCRE2_NOTEMPTY_ATSTART     An empty string at the start of the
+                              subject is not a valid match
+  PCRE2_NO_UTF_CHECK         Do not check the subject or replacement
+                              for UTF validity (only relevant if
+                              PCRE2_UTF was set at compile time)
+  PCRE2_SUBSTITUTE_EXTENDED  Do extended replacement processing
+  PCRE2_SUBSTITUTE_GLOBAL    Replace all occurrences in the subject
+  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH  If overflow, compute needed length
+  PCRE2_SUBSTITUTE_UNKNOWN_UNSET  Treat unknown group as unset
+  PCRE2_SUBSTITUTE_UNSET_EMPTY  Simple unset insert = empty string
 .sp
 The function returns the number of substitutions, which may be zero if there
 were no matches. The result can be greater than one only when
-PCRE2_SUBSTITUTE_GLOBAL is set.
+PCRE2_SUBSTITUTE_GLOBAL is set. In the event of an error, a negative error code
+is returned.
 .P
 There is a complete description of the PCRE2 native API in the
 .\" HREF
--- a/pcre2/doc/pcre2api.3
+++ b/pcre2/doc/pcre2api.3
--- a/pcre2/doc/pcre2build.3
+++ b/pcre2/doc/pcre2build.3
@ -1,4 +1,4 @@
-.TH PCRE2BUILD 3 "23 April 2015" "PCRE2 10.20"
+.TH PCRE2BUILD 3 "01 November 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .
@ -132,11 +132,20 @@ Pattern escapes such as \ed and \ew do not by default make use of Unicode
 properties. The application can request that they do by setting the PCRE2_UCP
 option. Unless the application has set PCRE2_NEVER_UCP, a pattern may also
 request this by starting with (*UCP).
-.P
+.
+.
+.SH "DISABLING THE USE OF \eC"
+.rs
+.sp
 The \eC escape sequence, which matches a single code unit, even in a UTF mode,
 can cause unpredictable behaviour because it may leave the current matching
-point in the middle of a multi-code-unit character. It can be locked out by
-setting the PCRE2_NEVER_BACKSLASH_C option.
+point in the middle of a multi-code-unit character. The application can lock it
+out by setting the PCRE2_NEVER_BACKSLASH_C option when calling
+\fBpcre2_compile()\fP. There is also a build-time option
+.sp
+  --enable-never-backslash-C
+.sp
+(note the upper case C) which locks out the use of \eC entirely.
 .
 .
 .SH "JUST-IN-TIME COMPILER SUPPORT"
@ -343,6 +352,19 @@ and equivalent run-time options, refer to these character values in an EBCDIC
 environment.
 .
 .
+.SH "PCRE2GREP SUPPORT FOR EXTERNAL SCRIPTS"
+.rs
+.sp
+By default, on non-Windows systems, \fBpcre2grep\fP supports the use of
+callouts with string arguments within the patterns it is matching, in order to
+run external scripts. For details, see the
+.\" HREF
+\fBpcre2grep\fP
+.\"
+documentation. This support can be disabled by adding
+--disable-pcre2grep-callout to the \fBconfigure\fP command.
+.
+.
 .SH "PCRE2GREP OPTIONS FOR COMPRESSED FILE SUPPORT"
 .rs
 .sp
@ -363,16 +385,19 @@ they are not.
 .sp
 \fBpcre2grep\fP uses an internal buffer to hold a "window" on the file it is
 scanning, in order to be able to output "before" and "after" lines when it
-finds a match. The size of the buffer is controlled by a parameter whose
-default value is 20K. The buffer itself is three times this size, but because
-of the way it is used for holding "before" lines, the longest line that is
-guaranteed to be processable is the parameter size. You can change the default
-parameter value by adding, for example,
+finds a match. The starting size of the buffer is controlled by a parameter
+whose default value is 20K. The buffer itself is three times this size, but
+because of the way it is used for holding "before" lines, the longest line that
+is guaranteed to be processable is the parameter size. If a longer line is
+encountered, \fBpcre2grep\fP automatically expands the buffer, up to a
+specified maximum size, whose default is 1M or the starting size, whichever is
+the larger. You can change the default parameter values by adding, for example,
 .sp
-  --with-pcre2grep-bufsize=50K
+  --with-pcre2grep-bufsize=51200
+  --with-pcre2grep-max-bufsize=2097152
 .sp
-to the \fBconfigure\fP command. The caller of \fPpcre2grep\fP can override this
-value by using --buffer-size on the command line..
+to the \fBconfigure\fP command. The caller of \fBpcre2grep\fP can override
+these values by using --buffer-size and --max-buffer-size on the command line.
 .
 .
 .SH "PCRE2TEST OPTION FOR LIBREADLINE SUPPORT"
@ -490,6 +515,28 @@ information about code coverage, see the \fBgcov\fP and \fBlcov\fP
 documentation.
 .
 .
+.SH "SUPPORT FOR FUZZERS"
+.rs
+.sp
+There is a special option for use by people who want to run fuzzing tests on
+PCRE2:
+.sp
+  --enable-fuzz-support
+.sp
+At present this applies only to the 8-bit library. If set, it causes an extra
+library called libpcre2-fuzzsupport.a to be built, but not installed. This
+contains a single function called LLVMFuzzerTestOneInput() whose arguments are
+a pointer to a string and the length of the string. When called, this function
+tries to compile the string as a pattern, and if that succeeds, to match it.
+This is done both with no options and with some random option bits that are
+generated from the string. Setting --enable-fuzz-support also causes a binary
+called \fBpcre2fuzzcheck\fP to be created. This is normally run under valgrind
+or used when PCRE2 is compiled with address sanitizing enabled. It calls the
+fuzzing function and outputs information about what it is doing. The input strings
+are specified by arguments: if an argument starts with "=" the rest of it is a
+literal input string. Otherwise, it is assumed to be a file name, and the
+contents of the file are the test string.
+.
 .SH "SEE ALSO"
 .rs
 .sp
@ -510,6 +557,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 24 April 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 01 November 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
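The default sizing rules given for the pcre2grep buffer in the hunk above reduce to simple arithmetic. The sketch below only restates those rules; `default_max_bufsize()` and `allocated_bytes()` are hypothetical helper names, not functions from pcre2grep, and the sketch makes no claim about how the buffer is actually grown.

```c
#include <stddef.h>

#define STARTING_BUFSIZE_DEFAULT ((size_t)20 * 1024)    /* 20K parameter */
#define ONE_MEGABYTE             ((size_t)1024 * 1024)  /* 1M */

/* The default maximum is 1M or the starting size, whichever is larger. */
size_t default_max_bufsize(size_t starting_size)
{
  return starting_size > ONE_MEGABYTE ? starting_size : ONE_MEGABYTE;
}

/* The buffer itself is three times the parameter size. */
size_t allocated_bytes(size_t parameter_size)
{
  return 3 * parameter_size;
}
```

With the configure values shown in the hunk, a starting size of 51200 gives a 153600-byte buffer, and the longest line guaranteed to be processable before any expansion is still 51200 code units.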
--- a/pcre2/doc/pcre2callout.3
+++ b/pcre2/doc/pcre2callout.3
@ -1,4 +1,4 @@
-.TH PCRE2CALLOUT 3 "23 March 2015" "PCRE2 10.20"
+.TH PCRE2CALLOUT 3 "29 September 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH SYNOPSIS
@ -40,11 +40,20 @@ two callout points:
 .sp
 If the PCRE2_AUTO_CALLOUT option bit is set when a pattern is compiled, PCRE2
 automatically inserts callouts, all with number 255, before each item in the
-pattern. For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+pattern except for immediately before or after a callout item in the pattern.
+For example, if PCRE2_AUTO_CALLOUT is used with the pattern
+.sp
+  A(?C3)B
+.sp
+it is processed as if it were
+.sp
+  (?C255)A(?C3)B(?C255)
+.sp
+Here is a more complicated example:
 .sp
  A(\ed{2}|--)
 .sp
-it is processed as if it were
+With PCRE2_AUTO_CALLOUT, this pattern is processed as if it were
 .sp
 (?C255)A(?C255)((?C255)\ed{2}(?C255)|(?C255)-(?C255)-(?C255))(?C255)
 .sp
@ -91,10 +100,10 @@ with PCRE2_ANCHORED and PCRE2_AUTO_CALLOUT and then applied to the string
  No match
 .sp
 This indicates that when matching [bc] fails, there is no backtracking into a+
-and therefore the callouts that would be taken for the backtracks do not occur.
-You can disable the auto-possessify feature by passing PCRE2_NO_AUTO_POSSESS to
-\fBpcre2_compile()\fP, or starting the pattern with (*NO_AUTO_POSSESS). In this
-case, the output changes to this:
+(because it is being treated as a++) and therefore the callouts that would be
+taken for the backtracks do not occur. You can disable the auto-possessify
+feature by passing PCRE2_NO_AUTO_POSSESS to \fBpcre2_compile()\fP, or starting
+the pattern with (*NO_AUTO_POSSESS). In this case, the output changes to this:
 .sp
  --->aaaa
   +0 ^        a+
@ -220,8 +229,8 @@ but the intention is never to remove any of the existing fields.
 .sp
 For a numerical callout, \fIcallout_string\fP is NULL, and \fIcallout_number\fP
 contains the number of the callout, in the range 0-255. This is the number
-that follows (?C for manual callouts; it is 255 for automatically generated
-callouts.
+that follows (?C for callouts that are part of the pattern; it is 255 for
+automatically generated callouts.
 .
 .
 .SS "Fields for string callouts"
@ -286,10 +295,15 @@ The \fIpattern_position\fP field contains the offset in the pattern string to
 the next item to be matched.
 .P
 The \fInext_item_length\fP field contains the length of the next item to be
-matched in the pattern string. When the callout immediately precedes an
-alternation bar, a closing parenthesis, or the end of the pattern, the length
-is zero. When the callout precedes an opening parenthesis, the length is that
-of the entire subpattern.
+processed in the pattern string. When the callout is at the end of the pattern,
+the length is zero. When the callout precedes an opening parenthesis, the
+length includes meta characters that follow the parenthesis. For example, in a
+callout before an assertion such as (?=ab) the length is 3. For an
+alternation bar or a closing parenthesis, the length is one, unless a closing
+parenthesis is followed by a quantifier, in which case its length is included.
+(This changed in release 10.23. In earlier releases, before an opening
+parenthesis the length was that of the entire subpattern, and before an
+alternation bar or a closing parenthesis the length was zero.)
 .P
 The \fIpattern_position\fP and \fInext_item_length\fP fields are intended to
 help in distinguishing between different automatic callouts, which all have the
@ -382,6 +396,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 March 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 29 September 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2compat.3
+++ b/pcre2/doc/pcre2compat.3
@ -1,4 +1,4 @@
-.TH PCRE2COMPAT 3 "15 March 2015" "PCRE2 10.20"
+.TH PCRE2COMPAT 3 "18 October 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "DIFFERENCES BETWEEN PCRE2 AND PERL"
@ -96,7 +96,7 @@ processed as anchored at the point where they are tested.
 one that is backtracked onto acts. For example, in the pattern
 A(*COMMIT)B(*PRUNE)C a failure in B triggers (*COMMIT), but a failure in C
 triggers (*PRUNE). Perl's behaviour is more complex; in many cases it is the
-same as PCRE2, but there are examples where it differs.
+same as PCRE2, but there are cases where it differs.
 .P
 11. Most backtracking verbs in assertions have their normal actions. They are
 not confined to the assertion.
@ -109,17 +109,18 @@ the pattern /^(a(b)?)+$/ in Perl leaves $2 unset, but in PCRE2 it is set to
 13. PCRE2's handling of duplicate subpattern numbers and duplicate subpattern
 names is not as general as Perl's. This is a consequence of the fact that PCRE2
 works internally just with numbers, using an external table to translate
-between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b)B),
+between numbers and names. In particular, a pattern such as (?|(?<a>A)|(?<b>B)),
 where the two capturing parentheses have the same number but different names,
 is not supported, and causes an error at compile time. If it were allowed, it
 would not be possible to distinguish which parentheses matched, because both
 names map to capturing subpattern number 1. To avoid this confusing situation,
 an error is given at compile time.
 .P
-14. Perl recognizes comments in some places that PCRE2 does not, for example,
-between the ( and ? at the start of a subpattern. If the /x modifier is set,
-Perl allows white space between ( and ? (though current Perls warn that this is
-deprecated) but PCRE2 never does, even if the PCRE2_EXTENDED option is set.
+14. Perl used to recognize comments in some places that PCRE2 does not, for
+example, between the ( and ? at the start of a subpattern. If the /x modifier
+is set, Perl allowed white space between ( and ? though the latest Perls give
+an error (for a while it was just deprecated). There may still be some cases
+where Perl behaves differently.
 .P
 15. Perl, when in warning mode, gives warnings for character classes such as
 [A-\ed] or [a-[:digit:]]. It then treats the hyphens as literals. PCRE2 has no
@ -141,33 +142,37 @@ list is with respect to Perl 5.10:
 each alternative branch of a lookbehind assertion can match a different length
 of string. Perl requires them all to have the same length.
 .sp
-(b) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
+(b) From PCRE2 10.23, back references to groups of fixed length are supported
+in lookbehinds, provided that there is no possibility of referencing a
+non-unique number or name. Perl does not support backreferences in lookbehinds.
+.sp
+(c) If PCRE2_DOLLAR_ENDONLY is set and PCRE2_MULTILINE is not set, the $
 meta-character matches only at the very end of the string.
 .sp
-(c) A backslash followed by a letter with no special meaning is faulted. (Perl
+(d) A backslash followed by a letter with no special meaning is faulted. (Perl
 can be made to issue a warning.)
 .sp
-(d) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
+(e) If PCRE2_UNGREEDY is set, the greediness of the repetition quantifiers is
 inverted, that is, by default they are not greedy, but if followed by a
 question mark they are.
 .sp
-(e) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
+(f) PCRE2_ANCHORED can be used at matching time to force a pattern to be tried
 only at the first matching position in the subject string.
 .sp
-(f) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
+(g) The PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART, and
 PCRE2_NO_AUTO_CAPTURE options have no Perl equivalents.
 .sp
-(g) The \eR escape sequence can be restricted to match only CR, LF, or CRLF
+(h) The \eR escape sequence can be restricted to match only CR, LF, or CRLF
 by the PCRE2_BSR_ANYCRLF option.
 .sp
-(h) The callout facility is PCRE2-specific.
+(i) The callout facility is PCRE2-specific.
 .sp
-(i) The partial matching facility is PCRE2-specific.
+(j) The partial matching facility is PCRE2-specific.
 .sp
-(j) The alternative matching function (\fBpcre2_dfa_match()\fP matches in a
+(k) The alternative matching function (\fBpcre2_dfa_match()\fP) matches in a
 different way and is not Perl-compatible.
 .sp
-(k) PCRE2 recognizes some special sequences such as (*CR) at the start of
+(l) PCRE2 recognizes some special sequences such as (*CR) at the start of
 a pattern that set overall options that cannot be changed within the pattern.
 .
 .
@ -185,6 +190,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 15 March 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 18 October 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2demo.3
+++ b/pcre2/doc/pcre2demo.3
@ -20,28 +20,31 @@
 *************************************************/

 /* This is a demonstration program to illustrate a straightforward way of
-calling the PCRE2 regular expression library from a C program. See the
+using the PCRE2 regular expression library from a C program. See the
 pcre2sample documentation for a short discussion ("man pcre2sample" if you have
 the PCRE2 man pages installed). PCRE2 is a revised API for the library, and is
 incompatible with the original PCRE API.

 There are actually three libraries, each supporting a different code unit
-width. This demonstration program uses the 8-bit library.
+width. This demonstration program uses the 8-bit library. The default is to
+process each code unit as a separate character, but if the pattern begins with
+"(*UTF)", both it and the subject are treated as UTF-8 strings, where
+characters may occupy multiple code units.

 In Unix-like environments, if PCRE2 is installed in your standard system
 libraries, you should be able to compile this program using this command:

-gcc -Wall pcre2demo.c -lpcre2-8 -o pcre2demo
+cc -Wall pcre2demo.c -lpcre2-8 -o pcre2demo

 If PCRE2 is not installed in a standard place, it is likely to be installed
 with support for the pkg-config mechanism. If you have pkg-config, you can
 compile this program using this command:

-gcc -Wall pcre2demo.c `pkg-config --cflags --libs libpcre2-8` -o pcre2demo
+cc -Wall pcre2demo.c `pkg-config --cflags --libs libpcre2-8` -o pcre2demo

-If you do not have pkg-config, you may have to use this:
+If you do not have pkg-config, you may have to use something like this:

-gcc -Wall pcre2demo.c -I/usr/local/include -L/usr/local/lib \e
+cc -Wall pcre2demo.c -I/usr/local/include -L/usr/local/lib \e
  -R/usr/local/lib -lpcre2-8 -o pcre2demo

 Replace "/usr/local/include" and "/usr/local/lib" with wherever the include and
@ -56,9 +59,14 @@ the following line. */

 /* #define PCRE2_STATIC */

-/* This macro must be defined before including pcre2.h. For a program that uses
-only one code unit width, it makes it possible to use generic function names
-such as pcre2_compile(). */
+/* The PCRE2_CODE_UNIT_WIDTH macro must be defined before including pcre2.h.
+For a program that uses only one code unit width, setting it to 8, 16, or 32
+makes it possible to use generic function names such as pcre2_compile(). Note
+that just changing 8 to 16 (for example) is not sufficient to convert this
+program to process 16-bit characters. Even in a fully 16-bit environment, where
+string-handling functions such as strcmp() and printf() work with 16-bit
+characters, the code for handling the table of named substrings will still need
+to be modified. */

 #define PCRE2_CODE_UNIT_WIDTH 8

@ -79,19 +87,19 @@ int main(int argc, char **argv)
 {
 pcre2_code *re;
 PCRE2_SPTR pattern;     /* PCRE2_SPTR is a pointer to unsigned code units of */
-PCRE2_SPTR subject;     /* the appropriate width (8, 16, or 32 bits). */
+PCRE2_SPTR subject;     /* the appropriate width (in this case, 8 bits). */
 PCRE2_SPTR name_table;

 int crlf_is_newline;
 int errornumber;
 int find_all;
 int i;
-int namecount;
-int name_entry_size;
 int rc;
 int utf8;

 uint32_t option_bits;
+uint32_t namecount;
+uint32_t name_entry_size;
 uint32_t newline;

 PCRE2_SIZE erroroffset;
@ -106,15 +114,19 @@ pcre2_match_data *match_data;
 * First, sort out the command line. There is only one possible option at  *
 * the moment, "-g" to request repeated matching to find all occurrences,  *
 * like Perl's /g option. We set the variable find_all to a non-zero value *
-* if the -g option is present. Apart from that, there must be exactly two *
-* arguments.                                                              *
+* if the -g option is present.                                            *
 **************************************************************************/

 find_all = 0;
 for (i = 1; i < argc; i++)
  {
  if (strcmp(argv[i], "-g") == 0) find_all = 1;
-    else break;
+  else if (argv[i][0] == '-')
+    {
+    printf("Unrecognised option %s\en", argv[i]);
+    return 1;
+    }
+  else break;
  }

 /* After the options, we require exactly two arguments, which are the pattern,
@ -122,7 +134,7 @@ and the subject string. */

 if (argc - i != 2)
  {
-  printf("Two arguments required: a regex and a subject string\en");
+  printf("Exactly two arguments required: a regex and a subject string\en");
  return 1;
  }

@ -201,7 +213,7 @@ if (rc < 0)
 stored. */

 ovector = pcre2_get_ovector_pointer(match_data);
-printf("\enMatch succeeded at offset %d\en", (int)ovector[0]);
+printf("Match succeeded at offset %d\en", (int)ovector[0]);


 /*************************************************************************
@ -242,7 +254,7 @@ we have to extract the count of named parentheses from the pattern. */
  PCRE2_INFO_NAMECOUNT, /* get the number of named substrings */
  &namecount);          /* where to put the answer */

-if (namecount <= 0) printf("No named substrings\en"); else
+if (namecount == 0) printf("No named substrings\en"); else
  {
  PCRE2_SPTR tabptr;
  printf("Named substrings\en");
@ -330,8 +342,8 @@ crlf_is_newline = newline == PCRE2_NEWLINE_ANY ||

 for (;;)
  {
-  uint32_t options = 0;                    /* Normally no options */
-  PCRE2_SIZE start_offset = ovector[1];  /* Start at end of previous match */
+  uint32_t options = 0;                   /* Normally no options */
+  PCRE2_SIZE start_offset = ovector[1];   /* Start at end of previous match */

  /* If the previous match was for an empty string, we are finished if we are
  at the end of the subject. Otherwise, arrange to run another match at the
@ -371,7 +383,7 @@ for (;;)
    {
    if (options == 0) break;                    /* All matches found */
    ovector[1] = start_offset + 1;              /* Advance one code unit */
-    if (crlf_is_newline &&                      /* If CRLF is newline & */
+    if (crlf_is_newline &&                      /* If CRLF is a newline & */
        start_offset < subject_length - 1 &&    /* we are at CRLF, */
        subject[start_offset] == '\er' &&
        subject[start_offset + 1] == '\en')
@ -417,7 +429,7 @@ for (;;)
    printf("%2d: %.*s\en", i, (int)substring_length, (char *)substring_start);
    }

-  if (namecount <= 0) printf("No named substrings\en"); else
+  if (namecount == 0) printf("No named substrings\en"); else
    {
    PCRE2_SPTR tabptr = name_table;
    printf("Named substrings\en");
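The empty-match handling in the demo's "-g" loop above (advance one code unit, or two when CRLF is a newline and the position is at CRLF) can be sketched as a standalone function. This is only an illustration: advance_after_empty_match() is a hypothetical name, not part of PCRE2 or of pcre2demo, and the UTF-8 branch is an assumption about the part of the loop that is not shown in these hunks. The bounds check is written as start_offset + 1 < subject_length so that it cannot underflow when the subject is empty.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of PCRE2). After a retry at start_offset has
   failed following an empty-string match, return the offset at which the next
   ordinary match attempt should start. */
size_t advance_after_empty_match(const unsigned char *subject,
                                 size_t subject_length,
                                 size_t start_offset,
                                 int crlf_is_newline,
                                 int utf8)
{
  size_t next;

  assert(subject != NULL && start_offset < subject_length);

  next = start_offset + 1;                 /* Advance one code unit */

  if (crlf_is_newline &&                   /* If CRLF is a newline and */
      start_offset + 1 < subject_length && /* we are at CRLF, */
      subject[start_offset] == '\r' &&
      subject[start_offset + 1] == '\n')
    next += 1;                             /* advance by one more. */
  else if (utf8)                           /* Assumed UTF-8 handling: */
    {
    /* Skip continuation bytes (10xxxxxx) so that we land on the first code
       unit of the next character. */
    while (next < subject_length && (subject[next] & 0xc0) == 0x80)
      next++;
    }

  return next;
}
```

In a loop like the demo's, the returned value would be stored in ovector[1] before continuing, so that the next iteration starts matching from there.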
--- a/pcre2/doc/pcre2grep.1
+++ b/pcre2/doc/pcre2grep.1
@ -1,4 +1,4 @@
-.TH PCRE2GREP 1 "03 January 2015" "PCRE2 10.00"
+.TH PCRE2GREP 1 "31 December 2016" "PCRE2 10.23"
 .SH NAME
 pcre2grep - a grep with Perl-compatible regular expressions.
 .SH SYNOPSIS
@ -52,11 +52,18 @@ span line boundaries. What defines a line boundary is controlled by the
 \fB-N\fP (\fB--newline\fP) option.
 .P
 The amount of memory used for buffering files that are being scanned is
-controlled by a parameter that can be set by the \fB--buffer-size\fP option.
-The default value for this parameter is specified when \fBpcre2grep\fP is
-built, with the default default being 20K. A block of memory three times this
-size is used (to allow for buffering "before" and "after" lines). An error
-occurs if a line overflows the buffer.
+controlled by parameters that can be set by the \fB--buffer-size\fP and
+\fB--max-buffer-size\fP options. The first of these sets the size of buffer
+that is obtained at the start of processing. If an input file contains very
+long lines, a larger buffer may be needed; this is handled by automatically
+extending the buffer, up to the limit specified by \fB--max-buffer-size\fP. The
+default values for these parameters are specified when \fBpcre2grep\fP is
+built, with the default defaults being 20K and 1M respectively. An error occurs
+if a line is too long and the buffer can no longer be expanded.
+.P
+The block of memory that is actually used is three times the "buffer size", to
+allow for buffering "before" and "after" lines. If the buffer size is too
+small, fewer than requested "before" and "after" lines may be output.
 .P
 Patterns can be no longer than 8K or BUFSIZ bytes, whichever is the greater.
 BUFSIZ is defined in \fB<stdio.h>\fP. When there is more than one pattern
@ -126,24 +133,27 @@ command line starts with a hyphen but is not an option. This allows for the
 processing of patterns and file names that start with hyphens.
 .TP
 \fB-A\fP \fInumber\fP, \fB--after-context=\fP\fInumber\fP
-Output \fInumber\fP lines of context after each matching line. If file names
-and/or line numbers are being output, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is output between each
-group of lines, unless they are in fact contiguous in the input file. The value
-of \fInumber\fP is expected to be relatively small. However, \fBpcre2grep\fP
-guarantees to have up to 8K of following text available for context output.
+Output up to \fInumber\fP lines of context after each matching line. Fewer
+lines are output if the next match or the end of the file is reached, or if the
+processing buffer size has been set too small. If file names and/or line
+numbers are being output, a hyphen separator is used instead of a colon for the
+context lines. A line containing "--" is output between each group of lines,
+unless they are in fact contiguous in the input file. The value of \fInumber\fP
+is expected to be relatively small. When \fB-c\fP is used, \fB-A\fP is ignored.
 .TP
 \fB-a\fP, \fB--text\fP
 Treat binary files as text. This is equivalent to
 \fB--binary-files\fP=\fItext\fP.
 .TP
 \fB-B\fP \fInumber\fP, \fB--before-context=\fP\fInumber\fP
-Output \fInumber\fP lines of context before each matching line. If file names
-and/or line numbers are being output, a hyphen separator is used instead of a
-colon for the context lines. A line containing "--" is output between each
-group of lines, unless they are in fact contiguous in the input file. The value
-of \fInumber\fP is expected to be relatively small. However, \fBpcre2grep\fP
-guarantees to have up to 8K of preceding text available for context output.
+Output up to \fInumber\fP lines of context before each matching line. Fewer
+lines are output if the previous match or the start of the file is within
+\fInumber\fP lines, or if the processing buffer size has been set too small. If
+file names and/or line numbers are being output, a hyphen separator is used
+instead of a colon for the context lines. A line containing "--" is output
+between each group of lines, unless they are in fact contiguous in the input
+file. The value of \fInumber\fP is expected to be relatively small. When
+\fB-c\fP is used, \fB-B\fP is ignored.
 .TP
 \fB--binary-files=\fP\fIword\fP
 Specify how binary files are to be processed. If the word is "binary" (the
@ -158,8 +168,9 @@ be of interest and are skipped without causing any output or affecting the
 return code.
 .TP
 \fB--buffer-size=\fP\fInumber\fP
-Set the parameter that controls how much memory is used for buffering files
-that are being scanned.
+Set the parameter that controls how much memory is obtained at the start of
+processing for buffering files that are being scanned. See also
+\fB--max-buffer-size\fP below.
 .TP
 \fB-C\fP \fInumber\fP, \fB--context=\fP\fInumber\fP
 Output \fInumber\fP lines of context both before and after each matching line.
@ -167,13 +178,15 @@ This is equivalent to setting both \fB-A\fP and \fB-B\fP to the same value.
 .TP
 \fB-c\fP, \fB--count\fP
 Do not output lines from the files that are being scanned; instead output the
-number of matches (or non-matches if \fB-v\fP is used) that would otherwise
-have caused lines to be shown. By default, this count is the same as the number
-of suppressed lines, but if the \fB-M\fP (multiline) option is used (without
-\fB-v\fP), there may be more suppressed lines than the number of matches.
+number of lines that would have been shown, either because they matched, or, if
+\fB-v\fP is set, because they failed to match. By default, this count is
+exactly the same as the number of lines that would have been output, but if the
+\fB-M\fP (multiline) option is used (without \fB-v\fP), there may be more
+suppressed lines than the count (that is, the number of matches).
 .sp
 If no lines are selected, the number zero is output. If several files are
-being scanned, a count is output for each of them. However, if the
+being scanned, a count is output for each of them and the \fB-t\fP option can
+be used to cause a total to be output at the end. However, if the
 \fB--files-with-matches\fP option is also used, only those files whose counts
 are greater than zero are listed. When \fB-c\fP is used, the \fB-A\fP,
 \fB-B\fP, and \fB-C\fP options are ignored.
@ -192,12 +205,22 @@ connected to a terminal. More resources are used when colouring is enabled,
 because \fBpcre2grep\fP has to search for all possible matches in a line, not
 just one, in order to colour them all.
 .sp
-The colour that is used can be specified by setting the environment variable
-PCRE2GREP_COLOUR or PCRE2GREP_COLOR. The value of this variable should be a
-string of two numbers, separated by a semicolon. They are copied directly into
-the control string for setting colour on a terminal, so it is your
-responsibility to ensure that they make sense. If neither of the environment
-variables is set, the default is "1;31", which gives red.
+The colour that is used can be specified by setting one of the environment
+variables PCRE2GREP_COLOUR, PCRE2GREP_COLOR, PCREGREP_COLOUR, or
+PCREGREP_COLOR, which are checked in that order. If none of these are set,
+\fBpcre2grep\fP looks for GREP_COLORS or GREP_COLOR (in that order). The value
+of the variable should be a string of two numbers, separated by a semicolon,
+except in the case of GREP_COLORS, which must start with "ms=" or "mt="
+followed by two semicolon-separated colours, terminated by the end of the
+string or by a colon. If GREP_COLORS does not start with "ms=" or "mt=" it is
+ignored, and GREP_COLOR is checked.
+.sp
+If the string obtained from one of the above variables contains any characters
+other than semicolon or digits, the setting is ignored and the default colour
+is used. The string is copied directly into the control string for setting
+colour on a terminal, so it is your responsibility to ensure that the values
+make sense. If no relevant environment variable is set, the default is "1;31",
+which gives red.
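
The rules above for colour variables can be sketched in C as follows. This is a rough illustration under stated assumptions, not pcre2grep's actual code: usable_colour() is a hypothetical name, the caller is assumed to try the variables in the documented order and to fall back to "1;31" when every candidate is rejected, and treating an empty value as unusable is an assumption that the text does not spell out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not pcre2grep's code). Copies the colour part of
   "value" into "buf" and returns buf, or returns NULL if the value should be
   ignored. Set from_grep_colors to non-zero for a GREP_COLORS value, which
   must start with "ms=" or "mt=" and ends at a colon or the end of the
   string. */
const char *usable_colour(const char *value, int from_grep_colors,
                          char *buf, size_t buf_size)
{
  size_t length, i;

  assert(buf != NULL);
  if (value == NULL) return NULL;          /* Variable is not set */

  if (from_grep_colors)
    {
    if (strncmp(value, "ms=", 3) != 0 && strncmp(value, "mt=", 3) != 0)
      return NULL;                         /* Wrong prefix: ignore */
    value += 3;
    length = strcspn(value, ":");          /* Stop at colon or end */
    }
  else length = strlen(value);

  /* Assumption: an empty or over-long value is treated as unusable. */
  if (length == 0 || length >= buf_size) return NULL;

  /* Only digits and semicolons are acceptable. */
  for (i = 0; i < length; i++)
    if (value[i] != ';' && (value[i] < '0' || value[i] > '9')) return NULL;

  memcpy(buf, value, length);
  buf[length] = '\0';
  return buf;
}
```

A caller would pass each variable's value in turn (for example, the result of getenv("PCRE2GREP_COLOUR")) and stop at the first one that is set, using the default colour if the function returns NULL for it.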
 .TP
 \fB-D\fP \fIaction\fP, \fB--devices=\fP\fIaction\fP
 If an input path is not a regular file or a directory, "action" specifies how
@ -273,17 +296,17 @@ files; it does not apply to patterns specified by any of the \fB--include\fP or
 \fB--exclude\fP options.
 .TP
 \fB-f\fP \fIfilename\fP, \fB--file=\fP\fIfilename\fP
-Read patterns from the file, one per line, and match them against
-each line of input. What constitutes a newline when reading the file is the
-operating system's default. The \fB--newline\fP option has no effect on this
-option. Trailing white space is removed from each line, and blank lines are
-ignored. An empty file contains no patterns and therefore matches nothing. See
-also the comments about multiple patterns versus a single pattern with
-alternatives in the description of \fB-e\fP above.
+Read patterns from the file, one per line, and match them against each line of
+input. What constitutes a newline when reading the file is the operating
+system's default. The \fB--newline\fP option has no effect on this option.
+Trailing white space is removed from each line, and blank lines are ignored. An
+empty file contains no patterns and therefore matches nothing. See also the
+comments about multiple patterns versus a single pattern with alternatives in
+the description of \fB-e\fP above.
 .sp
-If this option is given more than once, all the specified files are
-read. A data line is output if any of the patterns match it. A file name can
-be given as "-" to refer to the standard input. When \fB-f\fP is used, patterns
+If this option is given more than once, all the specified files are read. A
+data line is output if any of the patterns match it. A file name can be given
+as "-" to refer to the standard input. When \fB-f\fP is used, patterns
 specified on the command line using \fB-e\fP may also be present; they are
 tested before the file's patterns. However, no other pattern is taken from the
 command line; all arguments are treated as the names of paths to be searched.
@ -432,18 +455,25 @@ of use only if it is set smaller than \fB--match-limit\fP.
 There are no short forms for these options. The default settings are specified
 when the PCRE2 library is compiled, with the default default being 10 million.
 .TP
+\fB--max-buffer-size=\fP\fInumber\fP
+This limits the expansion of the processing buffer, whose initial size can be
+set by \fB--buffer-size\fP. The maximum buffer size is silently forced to be no
+smaller than the starting buffer size.
+.TP
 \fB-M\fP, \fB--multiline\fP
-Allow patterns to match more than one line. When this option is given, patterns
-may usefully contain literal newline characters and internal occurrences of ^
-and $ characters. The output for a successful match may consist of more than
-one line. The first is the line in which the match started, and the last is the
-line in which the match ended. If the matched string ends with a newline
-sequence the output ends at the end of that line.
+Allow patterns to match more than one line. When this option is set, the PCRE2
+library is called in "multiline" mode. This allows a matched string to extend
+past the end of a line and continue on one or more subsequent lines. Patterns
+used with \fB-M\fP may usefully contain literal newline characters and internal
+occurrences of ^ and $ characters. The output for a successful match may
+consist of more than one line. The first line is the line in which the match
+started, and the last line is the line in which the match ended. If the matched
+string ends with a newline sequence, the output ends at the end of that line.
+If \fB-v\fP is set, none of the lines in a multi-line match are output. Once a
+match has been handled, scanning restarts at the beginning of the line after
+the one in which the match ended.
 .sp
-When this option is set, the PCRE2 library is called in "multiline" mode.
-However, \fBpcre2grep\fP still processes the input line by line. The difference
-is that a matched string may extend past the end of a line and continue on
-one or more subsequent lines. The newline sequence must be matched as part of
+The newline sequence that separates multiple lines must be matched as part of
 the pattern. For example, to find the phrase "regular expression" in a file
 where "regular" might be at the end of a line and "expression" at the start of
 the next line, you could use this command:
@ -455,11 +485,8 @@ and is followed by + so as to match trailing white space on the first line as
 well as possibly handling a two-character newline sequence.
 .sp
 There is a limit to the number of lines that can be matched, imposed by the way
-that \fBpcre2grep\fP buffers the input file as it scans it. However,
-\fBpcre2grep\fP ensures that at least 8K characters or the rest of the file
-(whichever is the shorter) are available for forward matching, and similarly
-the previous 8K characters (or all the previous characters, if fewer than 8K)
-are guaranteed to be available for lookbehind assertions. The \fB-M\fP option
+that \fBpcre2grep\fP buffers the input file as it scans it. With a sufficiently
+large processing buffer, this should not be a problem, but the \fB-M\fP option
 does not work when input is read line by line (see \fB--line-buffered\fP).
 .TP
 \fB-N\fP \fInewline-type\fP, \fB--newline\fP=\fInewline-type\fP
@ -502,12 +529,13 @@ It should never be needed in normal use.
 Show only the part of the line that matched a pattern instead of the whole
 line. In this mode, no context is shown. That is, the \fB-A\fP, \fB-B\fP, and
 \fB-C\fP options are ignored. If there is more than one match in a line, each
-of them is shown separately. If \fB-o\fP is combined with \fB-v\fP (invert the
-sense of the match to find non-matching lines), no output is generated, but the
-return code is set appropriately. If the matched portion of the line is empty,
-nothing is output unless the file name or line number are being printed, in
-which case they are shown on an otherwise empty line. This option is mutually
-exclusive with \fB--file-offsets\fP and \fB--line-offsets\fP.
+of them is shown separately, on a separate line of output. If \fB-o\fP is
+combined with \fB-v\fP (invert the sense of the match to find non-matching
+lines), no output is generated, but the return code is set appropriately. If
+the matched portion of the line is empty, nothing is output unless the file
+name or line number are being printed, in which case they are shown on an
+otherwise empty line. This option is mutually exclusive with
+\fB--file-offsets\fP and \fB--line-offsets\fP.
 .TP
 \fB-o\fP\fInumber\fP, \fB--only-matching\fP=\fInumber\fP
 Show only the part of the line that matched the capturing parentheses of the
@ -519,10 +547,11 @@ for the non-argument case above also apply to this case. If the specified
 capturing parentheses do not exist in the pattern, or were not set in the
 match, nothing is output unless the file name or line number are being output.
 .sp
-If this option is given multiple times, multiple substrings are output, in the
-order the options are given. For example, -o3 -o1 -o3 causes the substrings
-matched by capturing parentheses 3 and 1 and then 3 again to be output. By
-default, there is no separator (but see the next option).
+If this option is given multiple times, multiple substrings are output for each
+match, in the order the options are given, and all on one line. For example,
+-o3 -o1 -o3 causes the substrings matched by capturing parentheses 3 and 1 and
+then 3 again to be output. By default, there is no separator (but see the next
+option).
 .TP
 \fB--om-separator\fP=\fItext\fP
 Specify a separating string for multiple occurrences of \fB-o\fP. The default
@ -547,6 +576,17 @@ Suppress error messages about non-existent or unreadable files. Such files are
 quietly skipped. However, the return code is still 2, even if matches were
 found in other files.
 .TP
+\fB-t\fP, \fB--total-count\fP
+This option is useful when scanning more than one file. If used on its own,
+\fB-t\fP suppresses all output except for a grand total number of matching
+lines (or non-matching lines if \fB-v\fP is used) in all the files. If \fB-t\fP
+is used with \fB-c\fP, a grand total is output except when the previous output
+is just one line. In other words, it is not output when just one file's count
+is listed. If file names are being output, the grand total is preceded by
+"TOTAL:". Otherwise, it appears as just another number. The \fB-t\fP option is
+ignored when used with \fB-L\fP (list files without matches), because the grand
+total would always be zero.
+.TP
 \fB-u\fP, \fB--utf-8\fP
 Operate in UTF-8 mode. This option is available only if PCRE2 has been compiled
 with UTF-8 support. All patterns (including those for any \fB--exclude\fP and
@ -570,11 +610,12 @@ specified by any of the \fB--include\fP or \fB--exclude\fP options.
 .TP
 \fB-x\fP, \fB--line-regex\fP, \fB--line-regexp\fP
 Force the patterns to be anchored (each must start matching at the beginning of
-a line) and in addition, require them to match entire lines. This is equivalent
-to having ^ and $ characters at the start and end of each alternative top-level
-branch in every pattern. This option applies only to the patterns that are
-matched against the contents of files; it does not apply to patterns specified
-by any of the \fB--include\fP or \fB--exclude\fP options.
+a line) and in addition, require them to match entire lines. In multiline mode
+the match may be more than one line. This is equivalent to having \eA and \eZ
+characters at the start and end of each alternative top-level branch in every
+pattern. This option applies only to the patterns that are matched against the
+contents of files; it does not apply to patterns specified by any of the
+\fB--include\fP or \fB--exclude\fP options.
 .
 .
 .SH "ENVIRONMENT VARIABLES"
@ -653,6 +694,58 @@ options does have data, it must be given in the first form, using an equals
 character. Otherwise \fBpcre2grep\fP will assume that it has no data.
 .
 .
+.SH "CALLING EXTERNAL SCRIPTS"
+.rs
+.sp
+\fBpcre2grep\fP has, by default, support for calling external programs or
+scripts during matching by making use of PCRE2's callout facility. However,
+this support can be disabled when \fBpcre2grep\fP is built. You can find out
+whether your binary has support for callouts by running it with the \fB--help\fP
+option. If the support is not enabled, all callouts in patterns are ignored by
+\fBpcre2grep\fP.
+.P
+A callout in a PCRE2 pattern is of the form (?C<arg>) where the argument is
+either a number or a quoted string (see the
+.\" HREF
+\fBpcre2callout\fP
+.\"
+documentation for details). Numbered callouts are ignored by \fBpcre2grep\fP.
+String arguments are parsed as a list of substrings separated by pipe (vertical
+bar) characters. The first substring must be an executable name, with the
+following substrings specifying arguments:
+.sp
+  executable_name|arg1|arg2|...
+.sp
+Any substring (including the executable name) may contain escape sequences
+started by a dollar character: $<digits> or ${<digits>} is replaced by the
+captured substring of the given decimal number, which must be greater than
+zero. If the number is greater than the number of capturing substrings, or if
+the capture is unset, the replacement is empty.
+.P
+Any other character is substituted by itself. In particular, $$ is replaced by
+a single dollar and $| is replaced by a pipe character. Here is an example:
+.sp
+  echo -e "abcde\en12345" | pcre2grep \e
+    '(?x)(.)(..(.))
+    (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
+.sp
+  Output:
+.sp
+    Arg1: [a] [bcd] [d] Arg2: |a| ()
+    abcde
+    Arg1: [1] [234] [4] Arg2: |1| ()
+    12345
+.sp
+The parameters for the \fBexecv()\fP system call that is used to run the
+program or script are zero-terminated strings. This means that binary zero
+characters in the callout argument will cause premature termination of their
+substrings, and therefore should not be present. Any syntax errors in the
+string (for example, a dollar not followed by another character) cause the
+callout to be ignored. If running the program fails for any reason (including
+the non-existence of the executable), a local matching failure occurs and the
+matcher backtracks in the normal way.
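
The callout string syntax described above (pipe-separated substrings with dollar escapes) can be sketched in C as follows. This is a minimal illustration, not pcre2grep's actual implementation: expand_callout_string() is a hypothetical name, and treating "$0" or a malformed "${...}" as a syntax error is an assumption. The result is a sequence of zero-terminated strings of the kind that could be used to build an argument vector.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not pcre2grep's code). Expands "spec" into "out" as
   consecutive zero-terminated strings: the executable name, then each
   argument. captures[n-1] is the text of group n, or NULL if it is unset.
   Returns the number of strings written, or -1 for a syntax error or if
   "out" is too small. */
int expand_callout_string(const char *spec, const char *const *captures,
                          size_t capture_count, char *out, size_t out_size)
{
  const char *p = spec;
  size_t used = 0;
  int count = 1;

  assert(spec != NULL && out != NULL);

  while (*p != '\0')
    {
    if (*p == '|')                         /* Unescaped pipe: next string */
      {
      if (used >= out_size) return -1;
      out[used++] = '\0';
      count++;
      p++;
      }
    else if (*p != '$')                    /* Ordinary character */
      {
      if (used >= out_size) return -1;
      out[used++] = *p++;
      }
    else                                   /* Dollar escape */
      {
      int braced;
      p++;
      if (*p == '\0') return -1;           /* Dollar with nothing after it */
      braced = (*p == '{');
      if (braced || isdigit((unsigned char)*p))
        {
        size_t n = 0;
        if (braced) p++;
        if (!isdigit((unsigned char)*p)) return -1;
        while (isdigit((unsigned char)*p)) n = n * 10 + (size_t)(*p++ - '0');
        if (braced && *p++ != '}') return -1;
        if (n == 0) return -1;             /* Number must be greater than 0 */
        if (n <= capture_count && captures[n - 1] != NULL)
          {
          size_t length = strlen(captures[n - 1]);
          if (length > out_size - used) return -1;
          memcpy(out + used, captures[n - 1], length);
          used += length;
          }                                /* Otherwise: empty replacement */
        }
      else                                 /* $$, $| and so on: literal */
        {
        if (used >= out_size) return -1;
        out[used++] = *p++;
        }
      }
    }

  if (used >= out_size) return -1;
  out[used] = '\0';                        /* Terminate the last string */
  return count;
}
```

With the callout string from the example above and the captures "a", "bcd", "d" and an empty fourth group, this yields three strings: "/bin/echo", "Arg1: [a] [bcd] [d]" and "Arg2: |a| ()".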
+.
+.
 .SH "MATCHING ERRORS"
 .rs
 .sp
@ -683,7 +776,7 @@ affect the return code.
 .SH "SEE ALSO"
 .rs
 .sp
-\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3).
+\fBpcre2pattern\fP(3), \fBpcre2syntax\fP(3), \fBpcre2callout\fP(3).
 .
 .
 .SH AUTHOR
@ -700,6 +793,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 03 January 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 31 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2grep.txt
+++ b/pcre2/doc/pcre2grep.txt
@ -51,103 +51,115 @@ DESCRIPTION
       boundary is controlled by the -N (--newline) option.

       The amount of memory used for buffering files that are being scanned is
-       controlled  by a parameter that can be set by the --buffer-size option.
-       The default value for this parameter is  specified  when  pcre2grep  is
-       built,  with  the  default  default  being 20K. A block of memory three
-       times this size is used (to allow for buffering  "before"  and  "after"
-       lines). An error occurs if a line overflows the buffer.
+       controlled  by  parameters  that  can  be  set by the --buffer-size and
+       --max-buffer-size options. The first of these sets the size  of  buffer
+       that  is obtained at the start of processing. If an input file contains
+       very long lines, a larger buffer may be  needed;  this  is  handled  by
+       automatically extending the buffer, up to the limit specified by --max-
+       buffer-size. The default values for these parameters are specified when
+       pcre2grep  is built, with the default defaults being 20K and 1M respec-
+       tively. An error occurs if a line is too long and  the  buffer  can  no
+       longer be expanded.

-       Patterns  can  be  no  longer than 8K or BUFSIZ bytes, whichever is the
-       greater.  BUFSIZ is defined in <stdio.h>. When there is more  than  one
+       The  block  of  memory that is actually used is three times the "buffer
+       size", to allow for buffering "before" and "after" lines. If the buffer
+       size  is too small, fewer than requested "before" and "after" lines may
+       be output.
+
+       Patterns can be no longer than 8K or BUFSIZ  bytes,  whichever  is  the
+       greater.   BUFSIZ  is defined in <stdio.h>. When there is more than one
       pattern (specified by the use of -e and/or -f), each pattern is applied
-       to each line in the order in which they are defined,  except  that  all
+       to  each  line  in the order in which they are defined, except that all
       the -e patterns are tried before the -f patterns.

-       By  default, as soon as one pattern matches a line, no further patterns
+       By default, as soon as one pattern matches a line, no further  patterns
       are considered. However, if --colour (or --color) is used to colour the
-       matching  substrings, or if --only-matching, --file-offsets, or --line-
-       offsets is used to output only  the  part  of  the  line  that  matched
+       matching substrings, or if --only-matching, --file-offsets, or  --line-
+       offsets  is  used  to  output  only  the  part of the line that matched
       (either shown literally, or as an offset), scanning resumes immediately
-       following the match, so that further matches on the same  line  can  be
-       found.  If  there  are  multiple  patterns,  they  are all tried on the
-       remainder of the line, but patterns that follow the  one  that  matched
+       following  the  match,  so that further matches on the same line can be
+       found. If there are multiple  patterns,  they  are  all  tried  on  the
+       remainder  of  the  line, but patterns that follow the one that matched
       are not tried on the earlier part of the line.

-       This  behaviour  means  that  the  order in which multiple patterns are
-       specified can affect the output when one of the above options is  used.
-       This  is no longer the same behaviour as GNU grep, which now manages to
-       display earlier matches for later patterns (as  long  as  there  is  no
+       This behaviour means that the order  in  which  multiple  patterns  are
+       specified  can affect the output when one of the above options is used.
+       This is no longer the same behaviour as GNU grep, which now manages  to
+       display  earlier  matches  for  later  patterns (as long as there is no
       overlap).

-       Patterns  that can match an empty string are accepted, but empty string
+       Patterns that can match an empty string are accepted, but empty  string
       matches   are   never   recognized.   An   example   is   the   pattern
-       "(super)?(man)?",  in  which  all components are optional. This pattern
-       finds all occurrences of both "super" and  "man";  the  output  differs
-       from  matching  with  "super|man" when only the matching substrings are
+       "(super)?(man)?", in which all components are  optional.  This  pattern
+       finds  all  occurrences  of  both "super" and "man"; the output differs
+       from matching with "super|man" when only the  matching  substrings  are
       being shown.

-       If the LC_ALL or LC_CTYPE environment variable is set,  pcre2grep  uses
+       If  the  LC_ALL or LC_CTYPE environment variable is set, pcre2grep uses
       the value to set a locale when calling the PCRE2 library.  The --locale
       option can be used to override this.


 SUPPORT FOR COMPRESSED FILES

-       It is possible to compile pcre2grep so that it uses libz or  libbz2  to
-       read  files  whose names end in .gz or .bz2, respectively. You can find
+       It  is  possible to compile pcre2grep so that it uses libz or libbz2 to
+       read files whose names end in .gz or .bz2, respectively. You  can  find
       out whether your binary has support for one or both of these file types
       by running it with the --help option. If the appropriate support is not
-       present, files are treated as plain text. The standard input is  always
+       present,  files are treated as plain text. The standard input is always
       so treated.


 BINARY FILES

-       By  default,  a  file that contains a binary zero byte within the first
-       1024 bytes is identified as a binary file, and is processed  specially.
-       (GNU  grep  also  identifies  binary  files  in  this  manner.) See the
-       --binary-files option for a means of changing the way binary files  are
+       By default, a file that contains a binary zero byte  within  the  first
+       1024  bytes is identified as a binary file, and is processed specially.
+       (GNU grep also  identifies  binary  files  in  this  manner.)  See  the
+       --binary-files  option for a means of changing the way binary files are
       handled.


 OPTIONS

-       The  order  in  which some of the options appear can affect the output.
-       For example, both the -h and -l options affect  the  printing  of  file
-       names.  Whichever  comes later in the command line will be the one that
-       takes effect. Similarly, except where noted  below,  if  an  option  is
-       given  twice,  the  later setting is used. Numerical values for options
-       may be followed by K  or  M,  to  signify  multiplication  by  1024  or
+       The order in which some of the options appear can  affect  the  output.
+       For  example,  both  the  -h and -l options affect the printing of file
+       names. Whichever comes later in the command line will be the  one  that
+       takes  effect.  Similarly,  except  where  noted below, if an option is
+       given twice, the later setting is used. Numerical  values  for  options
+       may  be  followed  by  K  or  M,  to  signify multiplication by 1024 or
       1024*1024 respectively.
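
        The suffix rule above can be sketched as follows. This is only an
        illustration in Python, not pcre2grep's own code; the function name
        parse_numeric_option is hypothetical, and only the upper-case K and M
        suffixes named above are handled.

```python
def parse_numeric_option(text: str) -> int:
    """Parse a numeric option value with an optional K or M suffix.

    Hypothetical helper: K multiplies by 1024 and M by 1024*1024,
    as described in the text above.
    """
    multipliers = {"K": 1024, "M": 1024 * 1024}
    if text and text[-1] in multipliers:
        # Strip the suffix and scale the remaining decimal number.
        return int(text[:-1]) * multipliers[text[-1]]
    return int(text)


if __name__ == "__main__":
    for value in ("100", "20K", "2M"):
        print(value, "->", parse_numeric_option(value))
```

        Under these assumptions, a value such as 20K given to --buffer-size
        stands for 20*1024 bytes.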

       --        This terminates the list of options. It is useful if the next
-                 item on the command line starts with a hyphen but is  not  an
-                 option.  This  allows for the processing of patterns and file
+                 item  on  the command line starts with a hyphen but is not an
+                 option. This allows for the processing of patterns  and  file
                 names that start with hyphens.

       -A number, --after-context=number
-                 Output number lines of context after each matching  line.  If
-                 file  names  and/or  line  numbers are being output, a hyphen
-                 separator is used instead of a colon for the context lines. A
-                 line  containing  "--" is output between each group of lines,
-                 unless they are in fact contiguous in  the  input  file.  The
-                 value  of number is expected to be relatively small. However,
-                 pcre2grep guarantees to have  up  to  8K  of  following  text
-                 available for context output.
+                 Output  up  to  number  lines  of context after each matching
+                 line. Fewer lines are output if the next match or the end  of
+                 the  file  is  reached,  or if the processing buffer size has
+                 been set too small. If file names  and/or  line  numbers  are
+                 being  output,  a hyphen separator is used instead of a colon
+                 for the context lines.  A  line  containing  "--"  is  output
+                 between each group of lines, unless they are in fact contigu-
+                 ous in the input file. The value of number is expected to  be
+                 relatively small. When -c is used, -A is ignored.

       -a, --text
                 Treat  binary  files as text. This is equivalent to --binary-
                 files=text.

       -B number, --before-context=number
-                 Output number lines of context before each matching line.  If
-                 file  names  and/or  line  numbers are being output, a hyphen
-                 separator is used instead of a colon for the context lines. A
-                 line  containing  "--" is output between each group of lines,
-                 unless they are in fact contiguous in  the  input  file.  The
-                 value  of number is expected to be relatively small. However,
-                 pcre2grep guarantees to have  up  to  8K  of  preceding  text
-                 available for context output.
+                 Output up to number lines of  context  before  each  matching
+                 line.  Fewer  lines  are  output if the previous match or the
+                 start of the file is within number lines, or if the  process-
+                 ing  buffer size has been set too small. If file names and/or
+                 line numbers are being output, a  hyphen  separator  is  used
+                 instead  of  a colon for the context lines. A line containing
+                 "--" is output between each group of lines, unless  they  are
+                 in  fact contiguous in the input file. The value of number is
+                 expected to be relatively small.  When  -c  is  used,  -B  is
+                 ignored.

       --binary-files=word
                 Specify  how binary files are to be processed. If the word is
@ -164,54 +176,68 @@ OPTIONS
                 any output or affecting the return code.

       --buffer-size=number
-                 Set the parameter that controls how much memory is  used  for
-                 buffering files that are being scanned.
+                 Set the parameter that controls how much memory  is  obtained
+                 at the start of processing for buffering files that are being
+                 scanned. See also --max-buffer-size below.

       -C number, --context=number
-                 Output  number  lines  of  context both before and after each
-                 matching line.  This is equivalent to setting both -A and  -B
+                 Output number lines of context both  before  and  after  each
+                 matching  line.  This is equivalent to setting both -A and -B
                 to the same value.

       -c, --count
-                 Do  not  output  lines from the files that are being scanned;
-                 instead output the number of matches (or non-matches if -v is
-                 used)  that would otherwise have caused lines to be shown. By
-                 default, this count is the same as the number  of  suppressed
-                 lines, but if the -M (multiline) option is used (without -v),
-                 there may  be  more  suppressed  lines  than  the  number  of
-                 matches.
+                 Do not output lines from the files that  are  being  scanned;
+                 instead  output  the  number  of  lines  that would have been
+                 shown, either because they matched, or, if -v is set, because
+                 they  failed  to match. By default, this count is exactly the
+                 same as the number of lines that would have been output,  but
+                 if  the -M (multiline) option is used (without -v), there may
+                 be more suppressed lines than the count (that is, the  number
+                 of matches).

                  If  no  lines  are selected, the number zero is output. If
                  several files are being scanned, a count is output for each
-                 of  them. However, if the --files-with-matches option is also
-                 used, only those files whose counts are greater than zero are
-                 listed.  When  -c  is  used,  the  -A, -B, and -C options are
-                 ignored.
+                 of  them and the -t option can be used to cause a total to be
+                 output at  the  end.  However,  if  the  --files-with-matches
+                 option  is  also  used,  only  those  files  whose counts are
+                 greater than zero are listed. When -c is used,  the  -A,  -B,
+                 and -C options are ignored.

       --colour, --color
                 If this option is given without any data, it is equivalent to
-                 "--colour=auto".   If  data  is required, it must be given in
+                 "--colour=auto".  If data is required, it must  be  given  in
                 the same shell item, separated by an equals sign.

       --colour=value, --color=value
                 This option specifies under what circumstances the parts of a
                 line that matched a pattern should be coloured in the output.
-                 By default, the output is not coloured. The value  (which  is
-                 optional,  see above) may be "never", "always", or "auto". In
-                 the latter case, colouring happens only if the standard  out-
-                 put  is connected to a terminal. More resources are used when
+                 By  default,  the output is not coloured. The value (which is
+                 optional, see above) may be "never", "always", or "auto".  In
+                 the  latter case, colouring happens only if the standard out-
+                 put is connected to a terminal. More resources are used  when
                 colouring is enabled, because pcre2grep has to search for all
-                 possible  matches in a line, not just one, in order to colour
+                 possible matches in a line, not just one, in order to  colour
                 them all.

-                 The colour that is used can be specified by setting the envi-
-                 ronment  variable  PCRE2GREP_COLOUR  or  PCRE2GREP_COLOR. The
-                 value of this variable should be a  string  of  two  numbers,
-                 separated  by  a semicolon. They are copied directly into the
-                 control string for setting colour on a  terminal,  so  it  is
-                 your  responsibility  to ensure that they make sense. If nei-
-                 ther of the environment variables  is  set,  the  default  is
-                 "1;31", which gives red.
+                 The  colour  that  is used can be specified by setting one of
+                 the environment variables PCRE2GREP_COLOUR,  PCRE2GREP_COLOR,
+                 PCREGREP_COLOUR, or PCREGREP_COLOR, which are checked in that
+                 order.  If  none  of  these  are  set,  pcre2grep  looks  for
+                 GREP_COLORS  or  GREP_COLOR (in that order). The value of the
+                 variable should be a string of two numbers,  separated  by  a
+                 semicolon,  except  in  the  case  of GREP_COLORS, which must
+                 start with "ms=" or "mt=" followed by two semicolon-separated
+                 colours,  terminated  by the end of the string or by a colon.
+                 If GREP_COLORS does not start  with  "ms="  or  "mt="  it  is
+                 ignored, and GREP_COLOR is checked.
+
+                 If  the  string obtained from one of the above variables con-
+                 tains any characters other than semicolon or digits, the set-
+                 ting is ignored and the default colour is used. The string is
+                 copied directly into the control string for setting colour on
+                 a  terminal,  so it is your responsibility to ensure that the
+                 values make sense. If no  relevant  environment  variable  is
+                 set, the default is "1;31", which gives red.
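
                  The lookup order described above can be sketched in Python.
                  This is an illustration of the documented rules only, not
                  the pcre2grep source; the name choose_colour is
                  hypothetical, and treating an empty value like an invalid
                  one is an assumption.

```python
DEFAULT_COLOUR = "1;31"  # documented default, which gives red

# Variables that hold the colour string directly, in the documented order.
DIRECT_VARS = (
    "PCRE2GREP_COLOUR",
    "PCRE2GREP_COLOR",
    "PCREGREP_COLOUR",
    "PCREGREP_COLOR",
)


def choose_colour(env):
    """Return the colour string selected from a dict of environment variables."""
    value = None
    for name in DIRECT_VARS:
        if name in env:
            value = env[name]
            break
    if value is None:
        grep_colors = env.get("GREP_COLORS")
        if grep_colors is not None and grep_colors.startswith(("ms=", "mt=")):
            # Take the text after "ms=" or "mt=", up to a colon or end of string.
            value = grep_colors[3:].split(":", 1)[0]
        else:
            # GREP_COLORS is unset or has the wrong prefix: fall back to GREP_COLOR.
            value = env.get("GREP_COLOR")
    # Anything other than digits and semicolons means the setting is ignored.
    # (Assumption: an empty string is also treated as unusable.)
    if not value or any(ch not in "0123456789;" for ch in value):
        return DEFAULT_COLOUR
    return value


if __name__ == "__main__":
    print(choose_colour({"GREP_COLORS": "ms=01;32:mc=01;31"}))
```

                  Passing a plain dict rather than reading os.environ keeps
                  the sketch easy to try with different settings.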

       -D action, --devices=action
                 If  an  input  path  is  not  a  regular file or a directory,
@ -299,12 +325,12 @@ OPTIONS
                 Read patterns from the file, one per  line,  and  match  them
                 against  each  line of input. What constitutes a newline when
                 reading the file  is  the  operating  system's  default.  The
-                 --newline option has no effect on this option. Trailing white
-                 space is removed from each line, and blank lines are ignored.
-                 An  empty  file  contains  no  patterns and therefore matches
-                 nothing. See also the comments about multiple patterns versus
-                 a  single  pattern with alternatives in the description of -e
-                 above.
+                 --newline  option  has  no  effect  on this option.  Trailing
+                 white space is removed from each line, and  blank  lines  are
+                 ignored.  An  empty  file  contains no patterns and therefore
+                 matches nothing. See also the comments  about  multiple  pat-
+                 terns  versus  a  single  pattern  with  alternatives  in the
+                 description of -e above.

                 If this option is given more than  once,  all  the  specified
                 files  are read. A data line is output if any of the patterns
@ -482,96 +508,101 @@ OPTIONS
                 tings are specified when the PCRE2 library is compiled,  with
                 the default default being 10 million.

-       -M, --multiline
-                 Allow  patterns to match more than one line. When this option
-                 is given, patterns may usefully contain literal newline char-
-                 acters  and  internal  occurrences of ^ and $ characters. The
-                 output for a successful match may consist of  more  than  one
-                 line.  The  first is the line in which the match started, and
-                 the last is the line in which the match ended. If the matched
-                 string  ends  with  a newline sequence the output ends at the
-                 end of that line.
+       --max-buffer-size=number
+                 This  limits  the  expansion  of the processing buffer, whose
+                 initial size can be set by --buffer-size. The maximum  buffer
+                 size  is  silently  forced to be no smaller than the starting
+                 buffer size.

-                 When this option is set, the PCRE2 library is called in "mul-
-                 tiline"  mode.   However, pcre2grep still processes the input
-                 line by line. The difference is that  a  matched  string  may
-                 extend  past  the  end  of a line and continue on one or more
-                 subsequent lines. The newline sequence  must  be  matched  as
-                 part of the pattern. For example, to find the phrase "regular
-                 expression" in a file where "regular" might be at the end  of
-                 a  line  and  "expression" at the start of the next line, you
-                 could use this command:
+       -M, --multiline
+                 Allow patterns to match more than one line. When this  option
+                 is set, the PCRE2 library is called in "multiline" mode. This
+                 allows a matched string to extend past the end of a line  and
+                 continue  on one or more subsequent lines. Patterns used with
+                 -M may usefully contain literal newline characters and inter-
+                 nal  occurrences of ^ and $ characters. The output for a suc-
+                 cessful match may consist of more than one  line.  The  first
+                 line  is  the  line  in which the match started, and the last
+                 line is the line in which the match  ended.  If  the  matched
+                 string  ends  with a newline sequence, the output ends at the
+                 end of that line.  If -v is set,  none  of  the  lines  in  a
+                 multi-line  match  are output. Once a match has been handled,
+                 scanning restarts at the beginning of the line after the  one
+                 in which the match ended.
+
+                 The  newline  sequence  that separates multiple lines must be
+                 matched as part of the pattern.  For  example,  to  find  the
+                 phrase  "regular  expression" in a file where "regular" might
+                 be at the end of a line and "expression" at the start of  the
+                 next line, you could use this command:

                   pcre2grep -M 'regular\s+expression' <file>

-                 The \s escape sequence matches  any  white  space  character,
-                 including  newlines,  and  is  followed  by  + so as to match
-                 trailing white space on the first line as  well  as  possibly
+                 The  \s  escape  sequence  matches any white space character,
+                 including newlines, and is followed  by  +  so  as  to  match
+                 trailing  white  space  on the first line as well as possibly
                 handling a two-character newline sequence.

-                 There  is a limit to the number of lines that can be matched,
-                 imposed by the way that pcre2grep buffers the input  file  as
-                 it  scans  it.  However,  pcre2grep  ensures that at least 8K
-                 characters or the rest of the file (whichever is the shorter)
-                 are  available for forward matching, and similarly the previ-
-                 ous 8K characters (or all the previous characters,  if  fewer
-                 than 8K) are guaranteed to be available for lookbehind asser-
-                 tions. The -M option does not work when input is read line by
-                 line (see --line-buffered.)
+                 There is a limit to the number of lines that can be  matched,
+                 imposed  by  the way that pcre2grep buffers the input file as
+                 it scans it. With a  sufficiently  large  processing  buffer,
+                 this should not be a problem, but the -M option does not work
+                 when input is read line by line (see --line-buffered).

       -N newline-type, --newline=newline-type
-                 The  PCRE2  library  supports  five different conventions for
-                 indicating the ends of lines. They are  the  single-character
-                 sequences  CR  (carriage  return) and LF (linefeed), the two-
-                 character sequence CRLF, an "anycrlf" convention, which  rec-
-                 ognizes  any  of the preceding three types, and an "any" con-
+                 The PCRE2 library supports  five  different  conventions  for
+                 indicating  the  ends of lines. They are the single-character
+                 sequences CR (carriage return) and LF  (linefeed),  the  two-
+                 character  sequence CRLF, an "anycrlf" convention, which rec-
+                 ognizes any of the preceding three types, and an  "any"  con-
                 vention, in which any Unicode line ending sequence is assumed
-                 to  end a line. The Unicode sequences are the three just men-
-                 tioned, plus  VT  (vertical  tab,  U+000B),  FF  (form  feed,
-                 U+000C),   NEL  (next  line,  U+0085),  LS  (line  separator,
+                 to end a line. The Unicode sequences are the three just  men-
+                 tioned,  plus  VT  (vertical  tab,  U+000B),  FF  (form feed,
+                 U+000C),  NEL  (next  line,  U+0085),  LS  (line   separator,
                 U+2028), and PS (paragraph separator, U+2029).

-                 When the  PCRE2  library  is  built,  a  default  line-ending
-                 sequence   is  specified.   This  is  normally  the  standard
+                 When  the  PCRE2  library  is  built,  a  default line-ending
+                 sequence  is  specified.   This  is  normally  the   standard
                 sequence for the operating system. Unless otherwise specified
-                 by  this  option,  pcre2grep uses the library's default.  The
+                 by this option, pcre2grep uses the  library's  default.   The
                 possible values for this option are CR, LF, CRLF, ANYCRLF, or
-                 ANY.  This  makes  it possible to use pcre2grep to scan files
+                 ANY. This makes it possible to use pcre2grep  to  scan  files
                 that have come from other environments without having to mod-
-                 ify  their  line  endings.  If the data that is being scanned
-                 does not agree  with  the  convention  set  by  this  option,
-                 pcre2grep  may  behave in strange ways. Note that this option
-                 does not apply to files specified by the -f,  --exclude-from,
-                 or  --include-from  options,  which  are  expected to use the
+                 ify their line endings. If the data  that  is  being  scanned
+                 does  not  agree  with  the  convention  set  by this option,
+                 pcre2grep may behave in strange ways. Note that  this  option
+                 does  not apply to files specified by the -f, --exclude-from,
+                 or --include-from options, which  are  expected  to  use  the
                 operating system's standard newline sequence.

       -n, --line-number
                 Precede each output line by its line number in the file, fol-
-                 lowed  by  a colon for matching lines or a hyphen for context
+                 lowed by a colon for matching lines or a hyphen  for  context
                 lines. If the file name is also being output, it precedes the
-                 line  number.  When  the  -M option causes a pattern to match
-                 more than one line, only the first is preceded  by  its  line
+                 line number. When the -M option causes  a  pattern  to  match
+                 more  than  one  line, only the first is preceded by its line
                 number. This option is forced if --line-offsets is used.

-       --no-jit  If  the  PCRE2 library is built with support for just-in-time
+       --no-jit  If the PCRE2 library is built with support  for  just-in-time
                 compiling (which speeds up matching), pcre2grep automatically
                 makes use of this, unless it was explicitly disabled at build
-                 time. This option can be used to disable the use  of  JIT  at
-                 run  time. It is provided for testing and working round prob-
+                 time.  This  option  can be used to disable the use of JIT at
+                 run time. It is provided for testing and working round  prob-
                 lems.  It should never be needed in normal use.

       -o, --only-matching
                 Show only the part of the line that matched a pattern instead
-                 of  the  whole  line. In this mode, no context is shown. That
-                 is, the -A, -B, and -C options are ignored. If there is  more
-                 than  one  match in a line, each of them is shown separately.
-                 If -o is combined with -v (invert the sense of the  match  to
-                 find  non-matching  lines),  no  output is generated, but the
-                 return code is set appropriately. If the matched  portion  of
-                 the  line is empty, nothing is output unless the file name or
-                 line number are being printed, in which case they  are  shown
-                 on an otherwise empty line. This option is mutually exclusive
-                 with --file-offsets and --line-offsets.
+                 of the whole line. In this mode, no context  is  shown.  That
+                 is,  the -A, -B, and -C options are ignored. If there is more
+                 than one match in a line, each of them is  shown  separately,
+                 on  a  separate  line  of  output.  If -o is combined with -v
+                 (invert the sense of the match to find  non-matching  lines),
+                 no  output is generated, but the return code is set appropri-
+                 ately. If the matched portion of the line is  empty,  nothing
+                 is  output  unless  the  file name or line number is being
+                 printed, in which case they are shown on an  otherwise  empty
+                 line.  This  option is mutually exclusive with --file-offsets
+                 and --line-offsets.

       -onumber, --only-matching=number
                 Show only the part of the line  that  matched  the  capturing
@ -587,65 +618,80 @@ OPTIONS
                 put.

                 If this option is given multiple times,  multiple  substrings
-                 are  output, in the order the options are given. For example,
-                 -o3 -o1 -o3 causes the substrings matched by capturing paren-
-                 theses  3  and  1  and then 3 again to be output. By default,
-                 there is no separator (but see the next option).
+                 are  output  for  each  match,  in  the order the options are
+                 given, and all on one line. For example, -o3 -o1  -o3  causes
+                 the  substrings  matched by capturing parentheses 3 and 1 and
+                 then 3 again to be output. By default, there is no  separator
+                 (but see the next option).

       --om-separator=text
-                 Specify a separating string for multiple occurrences  of  -o.
-                 The  default is an empty string. Separating strings are never
+                 Specify  a  separating string for multiple occurrences of -o.
+                 The default is an empty string. Separating strings are  never
                 coloured.
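
                  The effect of repeating -o together with --om-separator can
                  be sketched in Python. Python's re module stands in for
                  PCRE2 here, the function name only_matching is
                  hypothetical, and the sketch assumes every requested group
                  number exists in the pattern.

```python
import re


def only_matching(pattern, line, groups, separator=""):
    """Return one output line per match, built from the requested groups.

    groups lists capture numbers in the order the -o options were given;
    an unset group contributes an empty string (an assumption of this sketch).
    """
    return [
        separator.join(match.group(number) or "" for number in groups)
        for match in re.finditer(pattern, line)
    ]


if __name__ == "__main__":
    # Corresponds roughly to: -o3 -o1 -o3 --om-separator=,
    for out in only_matching(r"(\d+)-(\w+)-(\w+)", "10-ab-cd 20-ef-gh", [3, 1, 3], ","):
        print(out)
```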

       -q, --quiet
                 Work quietly, that is, display nothing except error messages.
-                 The  exit  status  indicates  whether or not any matches were
+                 The exit status indicates whether or  not  any  matches  were
                 found.

       -r, --recursive
-                 If any given path is a directory, recursively scan the  files
-                 it  contains, taking note of any --include and --exclude set-
-                 tings. By default, a directory is read as a normal  file;  in
-                 some  operating  systems this gives an immediate end-of-file.
-                 This option is a shorthand  for  setting  the  -d  option  to
+                 If  any given path is a directory, recursively scan the files
+                 it contains, taking note of any --include and --exclude  set-
+                 tings.  By  default, a directory is read as a normal file; in
+                 some operating systems this gives an  immediate  end-of-file.
+                 This  option  is  a  shorthand  for  setting the -d option to
                 "recurse".

       --recursion-limit=number
                 See --match-limit above.

       -s, --no-messages
-                 Suppress  error  messages  about  non-existent  or unreadable
-                 files. Such files are quietly skipped.  However,  the  return
+                 Suppress error  messages  about  non-existent  or  unreadable
+                 files.  Such  files  are quietly skipped. However, the return
                 code is still 2, even if matches were found in other files.

+       -t, --total-count
+                 This option is useful when scanning more than  one  file.  If
+                 used  on its own, -t suppresses all output except for a grand
+                 total number of matching lines (or non-matching lines  if  -v
+                 is  used)  in  all  the files. If -t is used with -c, a grand
+                 total is output except when the previous output is  just  one
+                 line.  In  other words, it is not output when just one file's
+                 count is listed. If file names are being  output,  the  grand
+                 total  is preceded by "TOTAL:". Otherwise, it appears as just
+                 another number. The -t option is ignored when  used  with  -L
+                 (list  files  without matches), because the grand total would
+                 always be zero.
+
       -u, --utf-8
                 Operate in UTF-8 mode. This option is available only if PCRE2
                 has been compiled with UTF-8 support. All patterns (including
-                 those  for  any --exclude and --include options) and all sub-
-                 ject lines that are scanned must be valid  strings  of  UTF-8
+                 those for any --exclude and --include options) and  all  sub-
+                 ject  lines  that  are scanned must be valid strings of UTF-8
                 characters.

       -V, --version
-                 Write  the version numbers of pcre2grep and the PCRE2 library
-                 to the standard output and then exit. Anything  else  on  the
+                 Write the version numbers of pcre2grep and the PCRE2  library
+                 to  the  standard  output and then exit. Anything else on the
                 command line is ignored.

       -v, --invert-match
-                 Invert  the  sense  of  the match, so that lines which do not
+                 Invert the sense of the match, so that  lines  which  do  not
                 match any of the patterns are the ones that are found.

       -w, --word-regex, --word-regexp
                 Force the patterns to match only whole words. This is equiva-
-                 lent  to  having \b at the start and end of the pattern. This
-                 option applies only to the patterns that are matched  against
-                 the  contents  of files; it does not apply to patterns speci-
+                 lent to having \b at the start and end of the  pattern.  This
+                 option  applies only to the patterns that are matched against
+                 the contents of files; it does not apply to  patterns  speci-
                 fied by any of the --include or --exclude options.

       -x, --line-regex, --line-regexp
-                 Force the patterns to be anchored (each must  start  matching
-                 at  the beginning of a line) and in addition, require them to
-                 match entire lines. This is equivalent  to  having  ^  and  $
-                 characters at the start and end of each alternative top-level
+                 Force  the  patterns to be anchored (each must start matching
+                 at the beginning of a line) and in addition, require them  to
+                 match  entire  lines. In multiline mode the match may be more
+                 than one line. This is equivalent to having \A and \Z charac-
+                 ters  at  the  start  and  end  of each alternative top-level
                 branch in every pattern. This option applies only to the pat-
                 terns that are matched against the contents of files; it does
                 not apply to patterns specified by any of  the  --include  or
@ -725,35 +771,86 @@ OPTIONS WITH DATA
       equals character. Otherwise pcre2grep will assume that it has no data.


+CALLING EXTERNAL SCRIPTS
+
+       pcre2grep has, by default, support for  calling  external  programs  or
+       scripts during matching by making use of PCRE2's callout facility. How-
+       ever, this support can be disabled when pcre2grep  is  built.  You  can
+       find  out  whether  your  binary has support for callouts by running it
+       with the --help option. If the support is not enabled, all callouts  in
+       patterns are ignored by pcre2grep.
+
+       A  callout  in a PCRE2 pattern is of the form (?C<arg>) where the argu-
+       ment is either a number or a quoted string (see the pcre2callout  docu-
+       mentation  for  details).  Numbered  callouts are ignored by pcre2grep.
+       String arguments are parsed as a list of substrings separated  by  pipe
+       (vertical  bar)  characters.  The first substring must be an executable
+       name, with the following substrings specifying arguments:
+
+         executable_name|arg1|arg2|...
+
+       Any substring  (including  the  executable  name)  may  contain  escape
+       sequences  started  by  a dollar character: $<digits> or ${<digits>} is
+       replaced by the captured substring of the given decimal  number,  which
+       must  be greater than zero. If the number is greater than the number of
+       capturing substrings, or if the capture is unset,  the  replacement  is
+       empty.
+
+       Any  other  character  is  substituted  by itself. In particular, $$ is
+       replaced by a single dollar and $| is replaced  by  a  pipe  character.
+       Here is an example:
+
+         echo -e "abcde\n12345" | pcre2grep \
+           '(?x)(.)(..(.))
+           (?C"/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)")()' -
+
+         Output:
+
+           Arg1: [a] [bcd] [d] Arg2: |a| ()
+           abcde
+           Arg1: [1] [234] [4] Arg2: |1| ()
+           12345
+
+       The parameters for the execv() system call that is used to run the pro-
+       gram or script are zero-terminated strings. This means that binary zero
+       characters  in the callout argument will cause premature termination of
+       their substrings, and therefore  should  not  be  present.  Any  syntax
+       errors  in  the  string  (for example, a dollar not followed by another
+       character) cause the callout to be  ignored.  If  running  the  program
+       fails for any reason (including the non-existence of the executable), a
+       local matching failure occurs and the matcher backtracks in the  normal
+       way.
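The parsing rules above can be sketched in a few lines of Python. This is only an illustration of the documented behaviour, not pcre2grep's actual code: the function name parse_callout, the list-based representation of captures, and the choice to return None for a syntax error (including a group number of zero) are assumptions made for the example.

```python
def parse_callout(text, captures):
    """Split a callout string on pipes and expand $ escapes.

    captures[0] holds group 1, captures[1] group 2, and so on; an unset
    group is None. Returns the argument list, or None on a syntax error
    (hypothetical convention standing in for "the callout is ignored").
    """
    args, cur, i = [], [], 0
    while i < len(text):
        ch = text[i]
        if ch == "|":                      # unescaped pipe ends a substring
            args.append("".join(cur))
            cur = []
            i += 1
        elif ch != "$":                    # ordinary character
            cur.append(ch)
            i += 1
        else:
            i += 1
            if i >= len(text):
                return None                # dollar not followed by anything
            braced = text[i] == "{"
            if braced:
                i += 1
            start = i
            while i < len(text) and text[i] in "0123456789":
                i += 1
            digits = text[start:i]
            if braced:
                if not digits or text[i:i + 1] != "}":
                    return None            # malformed ${...}
                i += 1
            elif not digits:
                cur.append(text[i])        # $$ gives $, $| gives |
                i += 1
                continue
            n = int(digits)
            if n == 0:
                return None                # number must be greater than zero
            value = captures[n - 1] if n <= len(captures) else None
            cur.append(value or "")        # unset or out of range: empty
    args.append("".join(cur))
    return args


# The example from the text, for the subject line "abcde":
print(parse_callout("/bin/echo|Arg1: [$1] [$2] [$3]|Arg2: $|${1}$| ($4)",
                    ["a", "bcd", "d"]))
```

The first element of the returned list is the executable name and the rest are its arguments, which matches the layout that an execv()-style call expects.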
+
+
 MATCHING ERRORS

-       It is possible to supply a regular expression that takes  a  very  long
-       time  to  fail  to  match certain lines. Such patterns normally involve
-       nested indefinite repeats, for example: (a+)*\d when matched against  a
-       line  of  a's  with  no  final digit. The PCRE2 matching function has a
-       resource limit that causes it to abort in these circumstances. If  this
-       happens,  pcre2grep  outputs  an error message and the line that caused
-       the problem to the standard error stream. If there  are  more  than  20
+       It  is  possible  to supply a regular expression that takes a very long
+       time to fail to match certain lines.  Such  patterns  normally  involve
+       nested  indefinite repeats, for example: (a+)*\d when matched against a
+       line of a's with no final digit. The  PCRE2  matching  function  has  a
+       resource  limit that causes it to abort in these circumstances. If this
+       happens, pcre2grep outputs an error message and the  line  that  caused
+       the  problem  to  the  standard error stream. If there are more than 20
       such errors, pcre2grep gives up.

-       The  --match-limit  option  of pcre2grep can be used to set the overall
-       resource limit; there is a second option called --recursion-limit  that
-       sets  a limit on the amount of memory (usually stack) that is used (see
+       The --match-limit option of pcre2grep can be used to  set  the  overall
+       resource  limit; there is a second option called --recursion-limit that
+       sets a limit on the amount of memory (usually stack) that is used  (see
       the discussion of these options above).


 DIAGNOSTICS

       Exit status is 0 if any matches were found, 1 if no matches were found,
-       and  2  for syntax errors, overlong lines, non-existent or inaccessible
-       files (even if matches were found in other files) or too many  matching
+       and 2 for syntax errors, overlong lines, non-existent  or  inaccessible
+       files  (even if matches were found in other files) or too many matching
       errors. Using the -s option to suppress error messages about inaccessi-
       ble files does not affect the return code.


 SEE ALSO

-       pcre2pattern(3), pcre2syntax(3).
+       pcre2pattern(3), pcre2syntax(3), pcre2callout(3).


 AUTHOR
@ -765,5 +862,5 @@ AUTHOR

 REVISION

-       Last updated: 03 January 2015
-       Copyright (c) 1997-2015 University of Cambridge.
+       Last updated: 31 December 2016
+       Copyright (c) 1997-2016 University of Cambridge.
--- a/pcre2/doc/pcre2jit.3
+++ b/pcre2/doc/pcre2jit.3
@ -1,4 +1,4 @@
-.TH PCRE2JIT 3 "27 November 2014" "PCRE2 10.00"
+.TH PCRE2JIT 3 "05 June 2016" "PCRE2 10.22"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 JUST-IN-TIME COMPILER SUPPORT"
@ -61,6 +61,12 @@ much faster than the normal interpretive code, but yields exactly the same
 results. The returned value from \fBpcre2_jit_compile()\fP is zero on success,
 or a negative error code.
 .P
+There is a limit to the size of pattern that JIT supports, imposed by the size
+of machine stack that it uses. The exact rules are not documented because they
+may change at any time, in particular, when new optimizations are introduced.
+If a pattern is too big, a call to \fBpcre2_jit_compile()\fP returns
+PCRE2_ERROR_NOMEMORY.
+.P
 PCRE2_JIT_COMPLETE requests the JIT compiler to generate code for complete
 matches. If you want to run partial matches using the PCRE2_PARTIAL_HARD or
 PCRE2_PARTIAL_SOFT options of \fBpcre2_match()\fP, you should set one or both
@ -122,6 +128,9 @@ PCRE2_NOTBOL, PCRE2_NOTEOL, PCRE2_NOTEMPTY, PCRE2_NOTEMPTY_ATSTART,
 PCRE2_NO_UTF_CHECK, PCRE2_PARTIAL_HARD, and PCRE2_PARTIAL_SOFT. The
 PCRE2_ANCHORED option is not supported at match time.
 .P
+If the PCRE2_NO_JIT option is passed to \fBpcre2_match()\fP it disables the
+use of JIT, forcing matching by the interpreter code.
+.P
 The only unsupported pattern items are \eC (match a single data unit) when
 running in a UTF mode, and a callout immediately before an assertion condition
 in a conditional group.
@ -207,8 +216,13 @@ for JIT matching. A callback function can therefore be used to determine
 whether a match operation was executed by JIT or by the interpreter.
 .P
 You may safely use the same JIT stack for more than one pattern (either by
-assigning directly or by callback), as long as the patterns are all matched
-sequentially in the same thread. In a multithread application, if you do not
+assigning directly or by callback), as long as the patterns are matched
+sequentially in the same thread. Currently, the only way to set up
+non-sequential matches in one thread is to use callouts: if a callout function
+starts another match, that match must use a different JIT stack to the one used
+for the currently suspended match(es).
+.P
+In a multithread application, if you do not
 specify a JIT stack, or if you assign or pass back NULL from a callback, that
 is thread-safe, because each thread has its own machine stack. However, if you
 assign or pass back a non-NULL JIT stack, this must be a different stack for
@ -366,7 +380,7 @@ The fast path function is called \fBpcre2_jit_match()\fP, and it takes exactly
 the same arguments as \fBpcre2_match()\fP. The return values are also the same,
 plus PCRE2_ERROR_JIT_BADOPTION if a matching mode (partial or complete) is
 requested that was not compiled. Unsupported option bits (for example,
-PCRE2_ANCHORED) are ignored.
+PCRE2_ANCHORED) are ignored, as is the PCRE2_NO_JIT option.
 .P
 When you call \fBpcre2_match()\fP, as well as testing for invalid options, a
 number of other sanity checks are performed on the arguments. For example, if
@ -399,6 +413,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 27 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 05 June 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2limits.3
+++ b/pcre2/doc/pcre2limits.3
@ -1,4 +1,4 @@
-.TH PCRE2LIMITS 3 "25 November 2014" "PCRE2 10.00"
+.TH PCRE2LIMITS 3 "26 October 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "SIZE AND OTHER LIMITATIONS"
@ -20,6 +20,10 @@ documentation for details. In these cases the limit is substantially larger.
 However, the speed of execution is slower. In the 32-bit library, the internal
 linkage size is always 4.
 .P
+The maximum length of a source pattern string is essentially unlimited; it is
+the largest number a PCRE2_SIZE variable can hold. However, the program that
+calls \fBpcre2_compile()\fP can specify a smaller limit.
+.P
 The maximum length (in code units) of a subject string is one less than the
 largest number a PCRE2_SIZE variable can hold. PCRE2_SIZE is an unsigned
 integer type, usually defined as size_t. Its maximum value (that is
@ -37,22 +41,25 @@ documentation.
 .P
 All values in repeating quantifiers must be less than 65536.
 .P
+The maximum length of a lookbehind assertion is 65535 characters.
+.P
 There is no limit to the number of parenthesized subpatterns, but there can be
 no more than 65535 capturing subpatterns. There is, however, a limit to the
 depth of nesting of parenthesized subpatterns of all kinds. This is imposed in
-order to limit the amount of system stack used at compile time. The limit can
-be specified when PCRE2 is built; the default is 250.
-.P
-There is a limit to the number of forward references to subsequent subpatterns
-of around 200,000. Repeated forward references with fixed upper limits, for
-example, (?2){0,100} when subpattern number 2 is to the right, are included in
-the count. There is no limit to the number of backward references.
+order to limit the amount of system stack used at compile time. The default
+limit can be specified when PCRE2 is built; the built-in default is 250. An
+application can change this limit by calling pcre2_set_parens_nest_limit() to
+set the limit in a compile context.
 .P
 The maximum length of name for a named subpattern is 32 code units, and the
 maximum number of named subpatterns is 10000.
 .P
 The maximum length of a name in a (*MARK), (*PRUNE), (*SKIP), or (*THEN) verb
-is 255 for the 8-bit library and 65535 for the 16-bit and 32-bit libraries.
+is 255 code units for the 8-bit library and 65535 code units for the 16-bit and
+32-bit libraries.
+.P
+The maximum length of a string argument to a callout is the largest number a
+32-bit unsigned integer can hold.
 .
 .
 .SH AUTHOR
@ -69,6 +76,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 25 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 26 October 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2pattern.3
+++ b/pcre2/doc/pcre2pattern.3
@ -1,4 +1,4 @@
-.TH PCRE2PATTERN 3 "13 June 2015" "PCRE2 10.20"
+.TH PCRE2PATTERN 3 "27 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION DETAILS"
@ -158,6 +158,11 @@ be less than the value set (or defaulted) by the caller of \fBpcre2_match()\fP
 for it to have any effect. In other words, the pattern writer can lower the
 limits set by the programmer, but not raise them. If there is more than one
 setting of one of these limits, the lower value is used.
+.P
+The match limit is used (but in a different way) when JIT is being used, but it
+is not relevant, and is ignored, when matching with \fBpcre2_dfa_match()\fP.
+However, the recursion limit is relevant for DFA matching, which does use some
+function recursion, in particular, for recursions within the pattern.
 .
 .
 .\" HTML <a name="newlines"></a>
@ -359,29 +364,28 @@ case letter, it is converted to upper case. Then bit 6 of the character (hex
 40) is inverted. Thus \ecA to \ecZ become hex 01 to hex 1A (A is 41, Z is 5A),
 but \ec{ becomes hex 3B ({ is 7B), and \ec; becomes hex 7B (; is 3B). If the
 code unit following \ec has a value less than 32 or greater than 126, a
-compile-time error occurs. This locks out non-printable ASCII characters in all
-modes.
+compile-time error occurs.
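The ASCII-mode rule for \ec described above amounts to a case fold followed by flipping one bit. The following Python sketch illustrates the arithmetic only; the function name control_escape and the use of ValueError to stand in for the compile-time error are assumptions of the example, and the EBCDIC rules are not covered.

```python
def control_escape(ch):
    """Return the code value produced by \\c followed by ch (ASCII mode)."""
    code = ord(ch)
    if code < 32 or code > 126:
        # Stands in for the compile-time error described in the text.
        raise ValueError("character after \\c must be printable ASCII")
    if "a" <= ch <= "z":
        code = ord(ch.upper())     # lower case letters are converted first
    return code ^ 0x40             # invert bit 6 (hex 40)


for ch in "AZ{;?":
    print(ch, hex(control_escape(ch)))
```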
 .P
 When PCRE2 is compiled in EBCDIC mode, \ea, \ee, \ef, \en, \er, and \et
 generate the appropriate EBCDIC code values. The \ec escape is processed
 as specified for Perl in the \fBperlebcdic\fP document. The only characters
 that are allowed after \ec are A-Z, a-z, or one of @, [, \e, ], ^, _, or ?. Any
-other character provokes a compile-time error. The sequence \e@ encodes
-character code 0; the letters (in either case) encode characters 1-26 (hex 01
-to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex 1F), and
-\e? becomes either 255 (hex FF) or 95 (hex 5F).
+other character provokes a compile-time error. The sequence \ec@ encodes
+character code 0; after \ec the letters (in either case) encode characters 1-26
+(hex 01 to hex 1A); [, \e, ], ^, and _ encode characters 27-31 (hex 1B to hex
+1F), and \ec? becomes either 255 (hex FF) or 95 (hex 5F).
 .P
-Thus, apart from \e?, these escapes generate the same character code values as
+Thus, apart from \ec?, these escapes generate the same character code values as
 they do in an ASCII environment, though the meanings of the values mostly
-differ. For example, \eG always generates code value 7, which is BEL in ASCII
+differ. For example, \ecG always generates code value 7, which is BEL in ASCII
 but DEL in EBCDIC.
 .P
-The sequence \e? generates DEL (127, hex 7F) in an ASCII environment, but
+The sequence \ec? generates DEL (127, hex 7F) in an ASCII environment, but
 because 127 is not a control character in EBCDIC, Perl makes it generate the
 APC character. Unfortunately, there are several variants of EBCDIC. In most of
 them the APC character has the value 255 (hex FF), but in the one Perl calls
 POSIX-BC its value is 95 (hex 5F). If certain other characters have POSIX-BC
-values, PCRE2 makes \e? generate 95; otherwise it generates 255.
+values, PCRE2 makes \ec? generate 95; otherwise it generates 255.
 .P
 After \e0 up to two further octal digits are read. If there are fewer than two
 digits, just those that are present are used. Thus the sequence \e0\ex\e015
@ -508,9 +512,9 @@ by code point, as described in the previous section.
 .SS "Absolute and relative back references"
 .rs
 .sp
-The sequence \eg followed by an unsigned or a negative number, optionally
-enclosed in braces, is an absolute or relative back reference. A named back
-reference can be coded as \eg{name}. Back references are discussed
+The sequence \eg followed by a signed or unsigned number, optionally enclosed
+in braces, is an absolute or relative back reference. A named back reference
+can be coded as \eg{name}. Back references are discussed
 .\" HTML <a href="#backreferences">
 .\" </a>
 later,
@ -671,8 +675,8 @@ below.
 This particular group matches either the two-character sequence CR followed by
 LF, or one of the single characters LF (linefeed, U+000A), VT (vertical tab,
 U+000B), FF (form feed, U+000C), CR (carriage return, U+000D), or NEL (next
-line, U+0085). The two-character sequence is treated as a single unit that
-cannot be split.
+line, U+0085). Because this is an atomic group, the two-character sequence is
+treated as a single unit that cannot be split.
 .P
 In other modes, two additional characters whose codepoints are greater than 255
 are added: LS (line separator, U+2028) and PS (paragraph separator, U+2029).
@ -738,6 +742,8 @@ example:
 Those that are not part of an identified script are lumped together as
 "Common". The current list of scripts is:
 .P
+Ahom,
+Anatolian_Hieroglyphs,
 Arabic,
 Armenian,
 Avestan,
@ -778,6 +784,7 @@ Gurmukhi,
 Han,
 Hangul,
 Hanunoo,
+Hatran,
 Hebrew,
 Hiragana,
 Imperial_Aramaic,
@ -814,12 +821,14 @@ Miao,
 Modi,
 Mongolian,
 Mro,
+Multani,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
 Nko,
 Ogham,
 Ol_Chiki,
+Old_Hungarian,
 Old_Italic,
 Old_North_Arabian,
 Old_Permic,
@ -841,6 +850,7 @@ Saurashtra,
 Sharada,
 Shavian,
 Siddham,
+SignWriting,
 Sinhala,
 Sora_Sompeng,
 Sundanese,
@ -1177,6 +1187,18 @@ patterns that are anchored in single line mode because all branches start with
 when the \fIstartoffset\fP argument of \fBpcre2_match()\fP is non-zero. The
 PCRE2_DOLLAR_ENDONLY option is ignored if PCRE2_MULTILINE is set.
 .P
+When the newline convention (see
+.\" HTML <a href="#newlines">
+.\" </a>
+"Newline conventions"
+.\"
+below) recognizes the two-character sequence CRLF as a newline, this is
+preferred, even if the single characters CR and LF are also recognized as
+newlines. For example, if the newline convention is "any", a multiline mode
+circumflex matches before "xyz" in the string "abc\er\enxyz" rather than after
+CR, even though CR on its own is a valid newline. (It also matches at the very
+start of the string, of course.)
+.P
 Note that the sequences \eA, \eZ, and \ez can be used to match the start and
 end of the subject in both modes, and if all branches of a pattern start with
 \eA it is always anchored, whether or not PCRE2_MULTILINE is set.
@ -1227,21 +1249,31 @@ with \eC in UTF-8 or UTF-16 mode means that the rest of the string may start
 with a malformed UTF character. This has undefined results, because PCRE2
 assumes that it is matching character by character in a valid UTF string (by
 default it checks the subject string's validity at the start of processing
-unless the PCRE2_NO_UTF_CHECK option is used). An application can lock out the
-use of \eC by setting the PCRE2_NEVER_BACKSLASH_C option.
+unless the PCRE2_NO_UTF_CHECK option is used).
+.P
+An application can lock out the use of \eC by setting the
+PCRE2_NEVER_BACKSLASH_C option when compiling a pattern. It is also possible to
+build PCRE2 with the use of \eC permanently disabled.
 .P
 PCRE2 does not allow \eC to appear in lookbehind assertions
 .\" HTML <a href="#lookbehind">
 .\" </a>
 (described below)
 .\"
-in a UTF mode, because this would make it impossible to calculate the length of
-the lookbehind.
+in UTF-8 or UTF-16 modes, because this would make it impossible to calculate
+the length of the lookbehind. Neither the alternative matching function
+\fBpcre2_dfa_match()\fP nor the JIT optimizer supports \eC in these UTF modes.
+The former gives a match-time error; the latter fails to optimize and so the
+match is always run using the interpreter.
+.P
+In the 32-bit library, however, \eC is always supported (when not explicitly
+locked out) because it always matches a single code unit, whether or not UTF-32
+is specified.
 .P
 In general, the \eC escape sequence is best avoided. However, one way of using
-it that avoids the problem of malformed UTF characters is to use a lookahead to
-check the length of the next character, as in this pattern, which could be used
-with a UTF-8 string (ignore white space and line breaks):
+it that avoids the problem of malformed UTF-8 or UTF-16 characters is to use a
+lookahead to check the length of the next character, as in this pattern, which
+could be used with a UTF-8 string (ignore white space and line breaks):
 .sp
  (?| (?=[\ex00-\ex7f])(\eC) |
      (?=[\ex80-\ex{7ff}])(\eC)(\eC) |
@ -1297,37 +1329,6 @@ when matching character classes, whatever line-ending sequence is in use, and
 whatever setting of the PCRE2_DOTALL and PCRE2_MULTILINE options is used. A
 class such as [^a] always matches one of these characters.
 .P
-The minus (hyphen) character can be used to specify a range of characters in a
-character class. For example, [d-m] matches any letter between d and m,
-inclusive. If a minus character is required in a class, it must be escaped with
-a backslash or appear in a position where it cannot be interpreted as
-indicating a range, typically as the first or last character in the class, or
-immediately after a range. For example, [b-d-z] matches letters in the range b
-to d, a hyphen character, or z.
-.P
-It is not possible to have the literal character "]" as the end character of a
-range. A pattern such as [W-]46] is interpreted as a class of two characters
-("W" and "-") followed by a literal string "46]", so it would match "W46]" or
-"-46]". However, if the "]" is escaped with a backslash it is interpreted as
-the end of range, so [W-\e]46] is interpreted as a class containing a range
-followed by two other characters. The octal or hexadecimal representation of
-"]" can also be used to end a range.
-.P
-An error is generated if a POSIX character class (see below) or an escape
-sequence other than one that defines a single character appears at a point
-where a range ending character is expected. For example, [z-\exff] is valid,
-but [A-\ed] and [A-[:digit:]] are not.
-.P
-Ranges operate in the collating sequence of character values. They can also be
-used for characters specified numerically, for example [\e000-\e037]. Ranges
-can include any characters that are valid for the current mode.
-.P
-If a range that includes letters is used when caseless matching is set, it
-matches the letters in either case. For example, [W-c] is equivalent to
-[][\e\e^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character
-tables for a French locale are in use, [\exc8-\excb] matches accented E
-characters in both cases.
-.P
 The character escape sequences \ed, \eD, \eh, \eH, \ep, \eP, \es, \eS, \ev,
 \eV, \ew, and \eW may appear in a character class, and add the characters that
 they match to the class. For example, [\edABCDEF] matches any hexadecimal
@ -1343,6 +1344,46 @@ class; it matches the backspace character. The sequences \eB, \eN, \eR, and \eX
 are not special inside a character class. Like any other unrecognized escape
 sequences, they cause an error.
 .P
+The minus (hyphen) character can be used to specify a range of characters in a
+character class. For example, [d-m] matches any letter between d and m,
+inclusive. If a minus character is required in a class, it must be escaped with
+a backslash or appear in a position where it cannot be interpreted as
+indicating a range, typically as the first or last character in the class,
+or immediately after a range. For example, [b-d-z] matches letters in the range
+b to d, a hyphen character, or z.
+.P
+Perl treats a hyphen as a literal if it appears before or after a POSIX class
+(see below) or a character type escape such as \ed, but gives a warning in
+its warning mode, as this is most likely a user error. As PCRE2 has no facility
+for warning, an error is given in these cases.
+.P
+It is not possible to have the literal character "]" as the end character of a
+range. A pattern such as [W-]46] is interpreted as a class of two characters
+("W" and "-") followed by a literal string "46]", so it would match "W46]" or
+"-46]". However, if the "]" is escaped with a backslash it is interpreted as
+the end of range, so [W-\e]46] is interpreted as a class containing a range
+followed by two other characters. The octal or hexadecimal representation of
+"]" can also be used to end a range.
+.P
+Ranges normally include all code points between the start and end characters,
+inclusive. They can also be used for code points specified numerically, for
+example [\e000-\e037]. Ranges can include any characters that are valid for the
+current mode.
+.P
+There is a special case in EBCDIC environments for ranges whose end points are
+both specified as literal letters in the same case. For compatibility with
+Perl, EBCDIC code points within the range that are not letters are omitted. For
+example, [h-k] matches only four characters, even though the codes for h and k
+are 0x88 and 0x92, a range of 11 code points. However, if the range is
+specified numerically, for example, [\ex88-\ex92] or [h-\ex92], all code points
+are included.
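The [h-k] figures quoted above can be checked with Python's cp037 codec. This is a sketch under the assumption that cp037 is representative of the EBCDIC variant in question (it does place h at 0x88 and k at 0x92); the function name ebcdic_letter_range is hypothetical and the code models only the letter-filtering rule, not PCRE2's implementation.

```python
def ebcdic_letter_range(start, end, codec="cp037"):
    """Return (low code, high code, letters) for a literal letter range."""
    lo = start.encode(codec)[0]
    hi = end.encode(codec)[0]
    letters = []
    for code in range(lo, hi + 1):
        ch = bytes([code]).decode(codec)
        # Keep only unaccented Latin letters; other code points are omitted.
        if "a" <= ch <= "z" or "A" <= ch <= "Z":
            letters.append(ch)
    return lo, hi, letters


lo, hi, letters = ebcdic_letter_range("h", "k")
print(hex(lo), hex(hi), hi - lo + 1, letters)
```

A range given numerically, such as [\ex88-\ex92], would instead keep every code point from lo to hi.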
+.P
+If a range that includes letters is used when caseless matching is set, it
+matches the letters in either case. For example, [W-c] is equivalent to
+[][\e\e^_`wxyzabc], matched caselessly, and in a non-UTF mode, if character
+tables for a French locale are in use, [\exc8-\excb] matches accented E
+characters in both cases.
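The [W-c] equivalence above can be verified by enumerating the range in ASCII and folding case. This Python sketch is only a check of the arithmetic; the helper name caseless_members is an assumption of the example.

```python
def caseless_members(start, end):
    """Lower-cased set of characters covered by the range start-end."""
    return {chr(code).lower() for code in range(ord(start), ord(end) + 1)}


# The class written out in the text: ] [ \ ^ _ ` w x y z a b c
expected = set("][\\^_`wxyzabc")
print(caseless_members("W", "c") == expected)
```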
+.P
 A circumflex can conveniently be used with the upper case character types to
 specify a more restricted set of characters than the matching lower case type.
 For example, the class [^\eW_] matches any letter or digit, but not underscore,
@ -1514,12 +1555,8 @@ respectively.
 .P
 When one of these option changes occurs at top level (that is, not inside
 subpattern parentheses), the change applies to the remainder of the pattern
-that follows. If the change is placed right at the start of a pattern, PCRE2
-extracts it into the global options (and it will therefore show up in data
-extracted by the \fBpcre2_pattern_info()\fP function).
-.P
-An option change within a subpattern (see below for a description of
-subpatterns) affects only that part of the subpattern that follows it, so
+that follows. An option change within a subpattern (see below for a description
+of subpatterns) affects only that part of the subpattern that follows it, so
 .sp
  (a(?i)b)c
 .sp
@ -1650,6 +1687,9 @@ first one in the pattern with the given number. The following pattern matches
 .sp
  /(?|(abc)|(def))(?1)/
 .sp
+A relative reference such as (?-1) is no different: it is just a convenient way
+of computing an absolute group number.
+.P
 If a
 .\" HTML <a href="#conditions">
 .\" </a>
@ -2056,9 +2096,9 @@ no such problem when named parentheses are used. A back reference to any
 subpattern is possible using named parentheses (see below).
 .P
 Another way of avoiding the ambiguity inherent in the use of digits following a
-backslash is to use the \eg escape sequence. This escape must be followed by an
-unsigned number or a negative number, optionally enclosed in braces. These
-examples are all identical:
+backslash is to use the \eg escape sequence. This escape must be followed by a
+signed or unsigned number, optionally enclosed in braces. These examples are
+all identical:
 .sp
  (ring), \e1
  (ring), \eg1
@ -2066,8 +2106,7 @@ examples are all identical:
 .sp
 An unsigned number specifies an absolute reference without the ambiguity that
 is present in the older syntax. It is also useful when literal digits follow
-the reference. A negative number is a relative reference. Consider this
-example:
+the reference. A signed number is a relative reference. Consider this example:
 .sp
  (abc(def)ghi)\eg{-1}
 .sp
@ -2077,6 +2116,10 @@ Similarly, \eg{-2} would be equivalent to \e1. The use of relative references
 can be helpful in long patterns, and also in patterns that are created by
 joining together fragments that contain references within themselves.
 .P
+The sequence \eg{+1} is a reference to the next capturing subpattern. This kind
+of forward reference can be useful in patterns that repeat. Perl does not
+support the use of + in this way.
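Converting a relative reference to an absolute group number is simple arithmetic on the count of capturing groups opened to the left of the reference. The Python sketch below illustrates this; the function name absolute_group and its parameters are hypothetical, and the caller is assumed to supply the count of opened groups.

```python
def absolute_group(opened_so_far, relative):
    """Map a relative reference such as -1 or +1 to an absolute number.

    opened_so_far is the number of capturing groups whose opening
    parenthesis appears before the reference.
    """
    if relative < 0:
        number = opened_so_far + relative + 1   # count leftwards
    elif relative > 0:
        number = opened_so_far + relative       # count rightwards
    else:
        raise ValueError("a relative reference cannot be zero")
    if number < 1:
        raise ValueError("reference to a non-existent group")
    return number


# In (abc(def)ghi)\g{-1} two groups are open before the reference.
print(absolute_group(2, -1), absolute_group(2, -2), absolute_group(2, +1))
```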
+.P
 A back reference matches whatever actually matched the capturing subpattern in
 the current subject string, rather than anything matching the subpattern
 itself (see
@ -2184,6 +2227,13 @@ numbering the capturing subpatterns in the whole pattern. However, substring
 capturing is carried out only for positive assertions. (Perl sometimes, but not
 always, does do capturing in negative assertions.)
 .P
+WARNING: If a positive assertion containing one or more capturing subpatterns
+succeeds, but failure to match later in the pattern causes backtracking over
+this assertion, the captures within the assertion are reset only if no higher
+numbered captures are already set. This is, unfortunately, a fundamental
+limitation of the current implementation; it may get removed in a future
+reworking.
+.P
 For compatibility with Perl, most assertion subpatterns may be repeated; though
 it makes no sense to assert the same thing several times, the side effect of
 capturing parentheses may occasionally be useful. However, an assertion that
@ -2281,23 +2331,34 @@ temporarily move the current position back by the fixed length and then try to
 match. If there are insufficient characters before the current position, the
 assertion fails.
 .P
-In a UTF mode, PCRE2 does not allow the \eC escape (which matches a single code
-unit even in a UTF mode) to appear in lookbehind assertions, because it makes
-it impossible to calculate the length of the lookbehind. The \eX and \eR
-escapes, which can match different numbers of code units, are also not
-permitted.
+In UTF-8 and UTF-16 modes, PCRE2 does not allow the \eC escape (which matches a
+single code unit even in a UTF mode) to appear in lookbehind assertions,
+because it makes it impossible to calculate the length of the lookbehind. The
+\eX and \eR escapes, which can match different numbers of code units, are never
+permitted in lookbehinds.
 .P
 .\" HTML <a href="#subpatternsassubroutines">
 .\" </a>
 "Subroutine"
 .\"
 calls (see below) such as (?2) or (?&X) are permitted in lookbehinds, as long
-as the subpattern matches a fixed-length string.
+as the subpattern matches a fixed-length string. However,
 .\" HTML <a href="#recursion">
 .\" </a>
-Recursion,
+recursion,
 .\"
-however, is not supported.
+that is, a "subroutine" call into a group that is already active,
+is not supported.
+.P
+Perl does not support back references in lookbehinds. PCRE2 does support them,
+but only if certain conditions are met. The PCRE2_MATCH_UNSET_BACKREF option
+must not be set, there must be no use of (?| in the pattern (it creates
+duplicate subpattern numbers), and if the back reference is by name, the name
+must be unique. Of course, the referenced subpattern must itself be of fixed
+length. The following pattern matches words containing at least two characters
+that begin and end with the same character:
+.sp
+   \eb(\ew)\ew++(?<=\e1)
 .P
 Possessive quantifiers can be used in conjunction with lookbehind assertions to
 specify efficient matching of fixed-length strings at the end of subject
@ -2436,7 +2497,9 @@ This makes the fragment independent of the parentheses in the larger pattern.
 .sp
 Perl uses the syntax (?(<name>)...) or (?('name')...) to test for a used
 subpattern by name. For compatibility with earlier versions of PCRE1, which had
-this facility before Perl, the syntax (?(name)...) is also recognized.
+this facility before Perl, the syntax (?(name)...) is also recognized. Note,
+however, that undelimited names consisting of the letter R followed by digits
+are ambiguous (see the following section).
 .P
 Rewriting the above example to use a named subpattern gives this:
 .sp
@ -2450,33 +2513,55 @@ matched.
 .SS "Checking for pattern recursion"
 .rs
 .sp
-If the condition is the string (R), and there is no subpattern with the name R,
-the condition is true if a recursive call to the whole pattern or any
-subpattern has been made. If digits or a name preceded by ampersand follow the
-letter R, for example:
-.sp
-  (?(R3)...) or (?(R&name)...)
-.sp
-the condition is true if the most recent recursion is into a subpattern whose
-number or name is given. This condition does not check the entire recursion
-stack. If the name used in a condition of this kind is a duplicate, the test is
-applied to all subpatterns of the same name, and is true if any one of them is
-the most recent recursion.
-.P
-At "top level", all these recursion test conditions are false.
+"Recursion" in this sense refers to any subroutine-like call from one part of
+the pattern to another, whether or not it is actually recursive. See the
+sections entitled
 .\" HTML <a href="#recursion">
 .\" </a>
-The syntax for recursive patterns
+"Recursive patterns"
 .\"
-is described below.
+and
+.\" HTML <a href="#subpatternsassubroutines">
+.\" </a>
+"Subpatterns as subroutines"
+.\"
+below for details of recursion and subpattern calls.
+.P
+If a condition is the string (R), and there is no subpattern with the name R,
+the condition is true if matching is currently in a recursion or subroutine
+call to the whole pattern or any subpattern. If digits follow the letter R, and
+there is no subpattern with that name, the condition is true if the most recent
+call is into a subpattern with the given number, which must exist somewhere in
+the overall pattern. This is a contrived example that is equivalent to a+b:
+.sp
+  ((?(R1)a+|(?1)b))
+.sp
+However, in both cases, if there is a subpattern with a matching name, the
+condition tests for its being set, as described in the section above, instead
+of testing for recursion. For example, creating a group with the name R1 by
+adding (?<R1>) to the above pattern completely changes its meaning.
+.P
+If a name preceded by ampersand follows the letter R, for example:
+.sp
+  (?(R&name)...)
+.sp
+the condition is true if the most recent recursion is into a subpattern of that
+name (which must exist within the pattern).
+.P
+This condition does not check the entire recursion stack. It tests only the
+current level. If the name used in a condition of this kind is a duplicate, the
+test is applied to all subpatterns of the same name, and is true if any one of
+them is the most recent recursion.
+.P
+At "top level", all these recursion test conditions are false.
 .
 .
 .\" HTML <a name="subdefine"></a>
 .SS "Defining subpatterns for use by reference only"
 .rs
 .sp
-If the condition is the string (DEFINE), and there is no subpattern with the
-name DEFINE, the condition is always false. In this case, there may be only one
+If the condition is the string (DEFINE), the condition is always false, even if
+there is a group with the name DEFINE. In this case, there may be only one
 alternative in the subpattern. It is always skipped if control reaches this
 point in the pattern; the idea of DEFINE is that it can be used to define
 subroutines that can be referenced from elsewhere. (The use of
@ -2513,7 +2598,8 @@ For example:
  (?(VERSION>=10.4)yes|no)
 .sp
 This pattern matches "yes" if the PCRE2 version is greater than or equal to 10.4, or
-"no" otherwise.
+"no" otherwise. The fractional part of the version number may not contain more
+than two digits.
 .
 .
 .SS "Assertion conditions"
@ -2630,6 +2716,23 @@ pattern above you can write (?-2) to refer to the second most recently opened
 parentheses preceding the recursion. In other words, a negative number counts
 capturing parentheses leftwards from the point at which it is encountered.
 .P
+Be aware, however, that if
+.\" HTML <a href="#dupsubpatternnumber">
+.\" </a>
+duplicate subpattern numbers
+.\"
+are in use, relative references refer to the earliest subpattern with the
+appropriate number. Consider, for example:
+.sp
+  (?|(a)|(b)) (c) (?-2)
+.sp
+The first two capturing groups (a) and (b) are both numbered 1, and group (c)
+is number 2. When the reference (?-2) is encountered, the second most recently
+opened group has the number 1, but it is the first such group (the (a)
+group) to which the recursion refers. This would be the same if an absolute
+reference (?1) were used. In other words, relative references are just a
+shorthand for computing a group number.
+.P
 It is also possible to refer to subsequently opened parentheses, by writing
 references such as (?+2). However, these cannot be recursive because the
 reference is not inside the parentheses that are referenced. They are always
@ -2929,14 +3032,32 @@ in production code should be noted to avoid problems during upgrades." The same
 remarks apply to the PCRE2 features described in this section.
 .P
 The new verbs make use of what was previously invalid syntax: an opening
-parenthesis followed by an asterisk. They are generally of the form
-(*VERB) or (*VERB:NAME). Some may take either form, possibly behaving
-differently depending on whether or not a name is present. A name is any
-sequence of characters that does not include a closing parenthesis. The maximum
-length of name is 255 in the 8-bit library and 65535 in the 16-bit and 32-bit
-libraries. If the name is empty, that is, if the closing parenthesis
-immediately follows the colon, the effect is as if the colon were not there.
-Any number of these verbs may occur in a pattern.
+parenthesis followed by an asterisk. They are generally of the form (*VERB) or
+(*VERB:NAME). Some verbs take either form, possibly behaving differently
+depending on whether or not a name is present.
+.P
+By default, for compatibility with Perl, a name is any sequence of characters
+that does not include a closing parenthesis. The name is not processed in
+any way, and it is not possible to include a closing parenthesis in the name.
+This can be changed by setting the PCRE2_ALT_VERBNAMES option, but the result
+is no longer Perl-compatible.
+.P
+When PCRE2_ALT_VERBNAMES is set, backslash processing is applied to verb names
+and only an unescaped closing parenthesis terminates the name. However, the
+only backslash items that are permitted are \eQ, \eE, and sequences such as
+\ex{100} that define character code points. Character type escapes such as \ed
+are faulted.
+.P
+A closing parenthesis can be included in a name either as \e) or between \eQ
+and \eE. In addition to backslash processing, if the PCRE2_EXTENDED option is
+also set, unescaped whitespace in verb names is skipped, and #-comments are
+recognized, exactly as in the rest of the pattern. PCRE2_EXTENDED does not
+affect verb names unless PCRE2_ALT_VERBNAMES is also set.
+.P
+The maximum length of a name is 255 in the 8-bit library and 65535 in the
+16-bit and 32-bit libraries. If the name is empty, that is, if the closing
+parenthesis immediately follows the colon, the effect is as if the colon were
+not there. Any number of these verbs may occur in a pattern.
 .P
 Since these verbs are specifically related to backtracking, most of them can be
 used only when the pattern is to be matched using the traditional matching
@ -3361,6 +3482,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 13 June 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 27 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
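The relative-reference paragraph added to pcre2pattern.3 above says that relative references are just a shorthand for computing a group number. The arithmetic can be sketched in C as follows. This is an illustration only, not code from the PCRE2 sources: the function name resolve_relative and its parameters are hypothetical, and it assumes the caller already knows the highest group number opened before the reference.

```c
#include <assert.h>

/* Turn a relative group reference into an absolute group number.
 * highest_so_far: highest capture group number opened before the reference
 *                 (hypothetical input; with duplicate numbers it is the
 *                 number, not the count of parentheses).
 * rel:            signed offset, for example -2 for (?-2) or +1 for (?+1).
 * Returns 0 when the reference cannot name a group. For positive offsets
 * the caller must still check that the group exists later in the pattern. */
int resolve_relative(int highest_so_far, int rel)
{
    assert(highest_so_far >= 0);
    if (rel < 0) {
        int n = highest_so_far + rel + 1;   /* count leftwards */
        return n >= 1 ? n : 0;
    }
    if (rel > 0)
        return highest_so_far + rel;        /* count rightwards */
    return 0;                               /* an offset of zero is not a reference */
}
```

For the pattern (?|(a)|(b)) (c) (?-2), the highest number opened before the reference is 2, so resolve_relative(2, -2) yields 1, the same group that (?1) names.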
--- a/pcre2/doc/pcre2posix.3
+++ b/pcre2/doc/pcre2posix.3
@ -1,4 +1,4 @@
-.TH PCRE2POSIX 3 "20 October 2014" "PCRE2 10.00"
+.TH PCRE2POSIX 3 "31 January 2016" "PCRE2 10.22"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "SYNOPSIS"
@ -28,7 +28,7 @@ expression 8-bit library. See the
 \fBpcre2api\fP
 .\"
 documentation for a description of PCRE2's native API, which contains much
-additional functionality. There is no POSIX-style wrapper for PCRE2's 16-bit
+additional functionality. There are no POSIX-style wrappers for PCRE2's 16-bit
 and 32-bit libraries.
 .P
 The functions described here are just wrapper functions that ultimately call
@ -44,9 +44,9 @@ value zero. This has no effect, but since programs that are written to the
 POSIX interface often use it, this makes it easier to slot in PCRE2 as a
 replacement library. Other POSIX options are not even defined.
 .P
-There are also some other options that are not defined by POSIX. These have
-been added at the request of users who want to make use of certain
-PCRE2-specific features via the POSIX calling interface.
+There are also some options that are not defined by POSIX. These have been
+added at the request of users who want to make use of certain PCRE2-specific
+features via the POSIX calling interface.
 .P
 When PCRE2 is called via these functions, it is only the API that is POSIX-like
 in style. The syntax and semantics of the regular expressions themselves are
@ -95,11 +95,11 @@ defined POSIX behaviour for REG_NEWLINE (see the following section).
 .sp
  REG_NOSUB
 .sp
-The PCRE2_NO_AUTO_CAPTURE option is set when the regular expression is passed
-for compilation to the native function. In addition, when a pattern that is
-compiled with this flag is passed to \fBregexec()\fP for matching, the
-\fInmatch\fP and \fIpmatch\fP arguments are ignored, and no captured strings
-are returned.
+When a pattern that is compiled with this flag is passed to \fBregexec()\fP for
+matching, the \fInmatch\fP and \fIpmatch\fP arguments are ignored, and no
+captured strings are returned. Versions of the PCRE2 library prior to 10.22 used
+to set the PCRE2_NO_AUTO_CAPTURE compile option, but this no longer happens
+because it disables the use of back references.
 .sp
  REG_UCP
 .sp
@ -145,7 +145,7 @@ use the contents of the \fIpreg\fP structure. If, for example, you pass it to
 This area is not simple, because POSIX and Perl take different views of things.
 It is not possible to get PCRE2 to obey POSIX semantics, but then PCRE2 was
 never intended to be a POSIX engine. The following table lists the different
-possibilities for matching newline characters in PCRE2:
+possibilities for matching newline characters in Perl and PCRE2:
 .sp
                          Default   Change with
 .sp
@ -155,7 +155,7 @@ possibilities for matching newline characters in PCRE2:
  $ matches \en in middle     no     PCRE2_MULTILINE
  ^ matches \en in middle     no     PCRE2_MULTILINE
 .sp
-This is the equivalent table for POSIX:
+This is the equivalent table for a POSIX-compatible pattern matcher:
 .sp
                          Default   Change with
 .sp
@ -165,13 +165,17 @@ This is the equivalent table for POSIX:
  $ matches \en in middle     no     REG_NEWLINE
  ^ matches \en in middle     no     REG_NEWLINE
 .sp
-PCRE2's behaviour is the same as Perl's, except that there is no equivalent for
-PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there is no way to stop
-newline from matching [^a].
+This behaviour is not what happens when PCRE2 is called via its POSIX
+API. By default, PCRE2's behaviour is the same as Perl's, except that there is
+no equivalent for PCRE2_DOLLAR_ENDONLY in Perl. In both PCRE2 and Perl, there
+is no way to stop newline from matching [^a].
 .P
-The default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
-PCRE2_DOLLAR_ENDONLY, but there is no way to make PCRE2 behave exactly as for
-the REG_NEWLINE action.
+Default POSIX newline handling can be obtained by setting PCRE2_DOTALL and
+PCRE2_DOLLAR_ENDONLY when calling \fBpcre2_compile()\fP directly, but there is
+no way to make PCRE2 behave exactly as for the REG_NEWLINE action. When using
+the POSIX API, passing REG_NEWLINE to PCRE2's \fBregcomp()\fP function
+causes PCRE2_MULTILINE to be passed to \fBpcre2_compile()\fP, and REG_DOTALL
+passes PCRE2_DOTALL. There is no way to pass PCRE2_DOLLAR_ENDONLY.
 .
 .
 .SH "MATCHING A PATTERN"
@ -207,16 +211,18 @@ to have a terminating NUL located at \fIstring\fP + \fIpmatch[0].rm_eo\fP
 IEEE Standard 1003.2 (POSIX.2), and should be used with caution in software
 intended to be portable to other systems. Note that a non-zero \fIrm_so\fP does
 not imply REG_NOTBOL; REG_STARTEND affects only the location of the string, not
-how it is matched.
+how it is matched. Setting REG_STARTEND and passing \fIpmatch\fP as NULL are
+mutually exclusive; doing both causes the error REG_INVARG to be returned.
 .P
 If the pattern was compiled with the REG_NOSUB flag, no data about any matched
 strings is returned. The \fInmatch\fP and \fIpmatch\fP arguments of
-\fBregexec()\fP are ignored.
+\fBregexec()\fP are ignored (except possibly as input for REG_STARTEND).
 .P
-If the value of \fInmatch\fP is zero, or if the value \fIpmatch\fP is NULL,
-no data about any matched strings is returned.
+The value of \fInmatch\fP may be zero, and the value of \fIpmatch\fP may be NULL
+(unless REG_STARTEND is set); in both these cases no data about any matched
+strings is returned.
 .P
-Otherwise,the portion of the string that was matched, and also any captured
+Otherwise, the portion of the string that was matched, and also any captured
 substrings, are returned via the \fIpmatch\fP argument, which points to an
 array of \fInmatch\fP structures of type \fIregmatch_t\fP, containing the
 members \fIrm_so\fP and \fIrm_eo\fP. These contain the byte offset to the first
@ -236,9 +242,11 @@ header file, of which REG_NOMATCH is the "expected" failure code.
 The \fBregerror()\fP function maps a non-zero errorcode from either
 \fBregcomp()\fP or \fBregexec()\fP to a printable message. If \fIpreg\fP is not
 NULL, the error should have arisen from the use of that structure. A message
-terminated by a binary zero is placed in \fIerrbuf\fP. The length of the
-message, including the zero, is limited to \fIerrbuf_size\fP. The yield of the
-function is the size of buffer needed to hold the whole message.
+terminated by a binary zero is placed in \fIerrbuf\fP. If the buffer is too
+short, only the first \fIerrbuf_size\fP - 1 characters of the error message are
+used. The yield of the function is the size of buffer needed to hold the whole
+message, including the terminating zero. This value is greater than
+\fIerrbuf_size\fP if the message was truncated.
 .
 .
 .SH MEMORY USAGE
@ -263,6 +271,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 October 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 31 January 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
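The revised regerror() text in the pcre2posix.3 changes above says that the return value is the buffer size needed for the whole message, and that it exceeds errbuf_size when the message was truncated. A minimal sketch of using that return value follows. It is written against the system <regex.h> so that it is self-contained; the assumption is that with PCRE2 you would include pcre2posix.h and link the POSIX wrapper library instead. The helper name full_error_message is hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdlib.h>
#include <string.h>

/* Return a heap-allocated copy of the complete error message for errcode,
 * or NULL if memory cannot be obtained. The caller frees the result. */
char *full_error_message(int errcode, const regex_t *preg)
{
    char small[16];

    /* First attempt with a deliberately small buffer. The return value is
     * the size needed for the whole message, including the final zero. */
    size_t needed = regerror(errcode, preg, small, sizeof small);
    assert(needed > 0);

    char *msg = malloc(needed);
    if (msg == NULL)
        return NULL;

    if (needed > sizeof small)
        regerror(errcode, preg, msg, needed);   /* truncated: fetch it again */
    else
        memcpy(msg, small, needed);             /* it fitted the first time */
    return msg;
}
```

Comparing the return value with the buffer size is the only truncation check needed; the truncated text itself is still terminated by a zero.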
--- a/pcre2/doc/pcre2sample.3
+++ b/pcre2/doc/pcre2sample.3
@ -1,4 +1,4 @@
-.TH PCRE2SAMPLE 3 "20 October 2014" "PCRE2 10.00"
+.TH PCRE2SAMPLE 3 "02 February 2016" "PCRE2 10.22"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 SAMPLE PROGRAM"
@ -13,23 +13,28 @@ distribution. A listing of this program is given in the
 documentation. If you do not have a copy of the PCRE2 distribution, you can
 save this listing to re-create the contents of \fIpcre2demo.c\fP.
 .P
-The demonstration program, which uses the PCRE2 8-bit library, compiles the
-regular expression that is its first argument, and matches it against the
-subject string in its second argument. No PCRE2 options are set, and default
-character tables are used. If matching succeeds, the program outputs the
-portion of the subject that matched, together with the contents of any captured
-substrings.
+The demonstration program compiles the regular expression that is its
+first argument, and matches it against the subject string in its second
+argument. No PCRE2 options are set, and default character tables are used. If
+matching succeeds, the program outputs the portion of the subject that matched,
+together with the contents of any captured substrings.
 .P
 If the -g option is given on the command line, the program then goes on to
 check for further matches of the same regular expression in the same subject
 string. The logic is a little bit tricky because of the possibility of matching
 an empty string. Comments in the code explain what is going on.
 .P
+The code in \fBpcre2demo.c\fP is an 8-bit program that uses the PCRE2 8-bit
+library. It handles strings and characters that are stored in 8-bit code units.
+By default, one character corresponds to one code unit, but if the pattern
+starts with "(*UTF)", both it and the subject are treated as UTF-8 strings,
+where characters may occupy multiple code units.
+.P
 If PCRE2 is installed in the standard include and library directories for your
 operating system, you should be able to compile the demonstration program using
-this command:
+a command like this:
 .sp
-  gcc -o pcre2demo pcre2demo.c -lpcre2-8
+  cc -o pcre2demo pcre2demo.c -lpcre2-8
 .sp
 If PCRE2 is installed elsewhere, you may need to add additional options to the
 command line. For example, on a Unix-like system that has PCRE2 installed in
@ -37,12 +42,11 @@ command line. For example, on a Unix-like system that has PCRE2 installed in
 like this:
 .sp
 .\" JOINSH
-  gcc -o pcre2demo -I/usr/local/include pcre2demo.c \e
-      -L/usr/local/lib -lpcre2-8
+  cc -o pcre2demo -I/usr/local/include pcre2demo.c \e
+     -L/usr/local/lib -lpcre2-8
 .sp
-.P
-Once you have compiled and linked the demonstration program, you can run simple
-tests like this:
+Once you have built the demonstration program, you can run simple tests like
+this:
 .sp
  ./pcre2demo 'cat|dog' 'the cat sat on the mat'
  ./pcre2demo -g 'cat|dog' 'the dog sat on the cat'
@ -51,12 +55,13 @@ Note that there is a much more comprehensive test program, called
 .\" HREF
 \fBpcre2test\fP,
 .\"
-which supports many more facilities for testing regular expressions using the
-PCRE2 libraries. The
+which supports many more facilities for testing regular expressions using all
+three PCRE2 libraries (8-bit, 16-bit, and 32-bit, though not all three need be
+installed). The
 .\" HREF
 \fBpcre2demo\fP
 .\"
-program is provided as a simple coding example.
+program is provided as a relatively simple coding example.
 .P
 If you try to run
 .\" HREF
@ -65,7 +70,7 @@ If you try to run
 when PCRE2 is not installed in the standard library directory, you may get an
 error like this on some operating systems (e.g. Solaris):
 .sp
-  ld.so.1: a.out: fatal: libpcre2.so.0: open failed: No such file or directory
+  ld.so.1: pcre2demo: fatal: libpcre2-8.so.0: open failed: No such file or directory
 .sp
 This is caused by the way shared library support works on those systems. You
 need to add
@ -89,6 +94,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 October 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 02 February 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2serialize.3
+++ b/pcre2/doc/pcre2serialize.3
@ -1,4 +1,4 @@
-.TH PCRE2SERIALIZE 3 "20 January 2015" "PCRE2 10.10"
+.TH PCRE2SERIALIZE 3 "24 May 2016" "PCRE2 10.22"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "SAVING AND RE-USING PRECOMPILED PCRE2 PATTERNS"
@ -22,12 +22,22 @@ If you are running an application that uses a large number of regular
 expression patterns, it may be useful to store them in a precompiled form
 instead of having to compile them every time the application is run. However,
 if you are using the just-in-time optimization feature, it is not possible to
-save and reload the JIT data, because it is position-dependent. In addition,
-the host on which the patterns are reloaded must be running the same version of
-PCRE2, with the same code unit width, and must also have the same endianness,
-pointer width and PCRE2_SIZE type. For example, patterns compiled on a 32-bit
-system using PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor
-can they be reloaded using the 8-bit library.
+save and reload the JIT data, because it is position-dependent. The host on
+which the patterns are reloaded must be running the same version of PCRE2, with
+the same code unit width, and must also have the same endianness, pointer width
+and PCRE2_SIZE type. For example, patterns compiled on a 32-bit system using
+PCRE2's 16-bit library cannot be reloaded on a 64-bit system, nor can they be
+reloaded using the 8-bit library.
+.
+.
+.SH "SECURITY CONCERNS"
+.rs
+.sp
+The facility for saving and restoring compiled patterns is intended for use
+within individual applications. As such, the data supplied to
+\fBpcre2_serialize_decode()\fP is expected to be trusted data, not data from
+arbitrary external sources. There is only some simple consistency checking, not
+complete validation of what is being re-loaded.
 .
 .
 .SH "SAVING COMPILED PATTERNS"
@ -129,20 +139,26 @@ is filled with those that fit, and the remainder are ignored. The yield of the
 function is the number of decoded patterns, or one of the following negative
 error codes:
 .sp
-  PCRE2_ERROR_BADDATA   second argument is zero or less
-  PCRE2_ERROR_BADMAGIC  mismatch of id bytes in the data
-  PCRE2_ERROR_BADMODE   mismatch of variable unit size or PCRE2 version
-  PCRE2_ERROR_MEMORY    memory allocation failed
-  PCRE2_ERROR_NULL      first or third argument is NULL
+  PCRE2_ERROR_BADDATA    second argument is zero or less
+  PCRE2_ERROR_BADMAGIC   mismatch of id bytes in the data
+  PCRE2_ERROR_BADMODE    mismatch of code unit size or PCRE2 version
+  PCRE2_ERROR_BADSERIALIZEDDATA  other sanity check failure
+  PCRE2_ERROR_MEMORY     memory allocation failed
+  PCRE2_ERROR_NULL       first or third argument is NULL
 .sp
 PCRE2_ERROR_BADMAGIC may mean that the data is corrupt, or that it was compiled
 on a system with different endianness.
 .P
 Decoded patterns can be used for matching in the usual way, and must be freed
-by calling \fBpcre2_code_free()\fP as normal. A single copy of the character
-tables is used by all the decoded patterns. A reference count is used to
+by calling \fBpcre2_code_free()\fP. However, be aware that there is a potential
+race issue if you are using multiple patterns that were decoded from a single
+byte stream in a multithreaded application. A single copy of the character
+tables is used by all the decoded patterns and a reference count is used to
 arrange for its memory to be automatically freed when the last pattern is
-freed.
+freed, but there is no locking on this reference count. Therefore, if you want
+to call \fBpcre2_code_free()\fP for these patterns in different threads, you
+must arrange your own locking, and ensure that \fBpcre2_code_free()\fP cannot
+be called by two threads at the same time.
 .P
 If a pattern was processed by \fBpcre2_jit_compile()\fP before being
 serialized, the JIT data is discarded and so is no longer available after a
@ -165,6 +181,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 January 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 24 May 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2stack.3
+++ b/pcre2/doc/pcre2stack.3
@ -1,4 +1,4 @@
-.TH PCRE2STACK 3 "21 November 2014" "PCRE2 10.00"
+.TH PCRE2STACK 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 DISCUSSION OF STACK USAGE"
@ -43,11 +43,12 @@ assertion and "once-only" subpatterns, which are handled like subroutine calls.
 Normally, these are never very deep, and the limit on the complexity of
 \fBpcre2_dfa_match()\fP is controlled by the amount of workspace it is given.
 However, it is possible to write patterns with runaway infinite recursions;
-such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack. At
-present, there is no protection against this.
+such patterns will cause \fBpcre2_dfa_match()\fP to run out of stack unless a
+limit is applied (see below).
 .P
-The comments that follow do NOT apply to \fBpcre2_dfa_match()\fP; they are
-relevant only for \fBpcre2_match()\fP without the JIT optimization.
+The comments in the next three sections do not apply to
+\fBpcre2_dfa_match()\fP; they are relevant only for \fBpcre2_match()\fP without
+the JIT optimization.
 .
 .
 .SS "Reducing \fBpcre2_match()\fP's stack usage"
@ -106,7 +107,7 @@ in the
 \fBpcre2api\fP
 .\"
 documentation. Since the block sizes are always the same, it may be possible to
-implement customized a memory handler that is more efficient than the standard
+implement a customized memory handler that is more efficient than the standard
 function. The memory blocks obtained for this purpose are retained and re-used
 if possible while \fBpcre2_match()\fP is running. They are all freed just
 before it exits.
@ -147,6 +148,15 @@ pattern to match. This is done by calling \fBpcre2_match()\fP repeatedly with
 different limits.
 .
 .
+.SS "Limiting \fBpcre2_dfa_match()\fP's stack usage"
+.rs
+.sp
+The recursion limit, as described above for \fBpcre2_match()\fP, also applies
+to \fBpcre2_dfa_match()\fP, whose use of recursive function calls for
+recursions in the pattern can lead to runaway stack usage. The non-recursive
+match limit is not relevant for DFA matching, and is ignored.
+.
+.
 .SS "Changing stack size in Unix-like systems"
 .rs
 .sp
@ -197,6 +207,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 21 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 23 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2syntax.3
+++ b/pcre2/doc/pcre2syntax.3
@ -1,4 +1,4 @@
-.TH PCRE2SYNTAX 3 "13 June 2015" "PCRE2 10.20"
+.TH PCRE2SYNTAX 3 "23 December 2016" "PCRE2 10.23"
 .SH NAME
 PCRE2 - Perl-compatible regular expressions (revised API)
 .SH "PCRE2 REGULAR EXPRESSION SYNTAX SUMMARY"
@ -81,9 +81,10 @@ it matches a literal "u".
  \eW         a "non-word" character
  \eX         a Unicode extended grapheme cluster
 .sp
-The application can lock out the use of \eC by setting the
-PCRE2_NEVER_BACKSLASH_C option. It is dangerous because it may leave the
-current matching point in the middle of a UTF-8 or UTF-16 character.
+\eC is dangerous because it may leave the current matching point in the middle
+of a UTF-8 or UTF-16 character. The application can lock out the use of \eC by
+setting the PCRE2_NEVER_BACKSLASH_C option. It is also possible to build PCRE2
+with the use of \eC permanently disabled.
 .P
 By default, \ed, \es, and \ew match only ASCII characters, even in UTF-8 mode
 or in the 16-bit and 32-bit libraries. However, if locale-specific matching is
@ -159,6 +160,8 @@ at release 5.18.
 .SH "SCRIPT NAMES FOR \ep AND \eP"
 .rs
 .sp
+Ahom,
+Anatolian_Hieroglyphs,
 Arabic,
 Armenian,
 Avestan,
@ -199,6 +202,7 @@ Gurmukhi,
 Han,
 Hangul,
 Hanunoo,
+Hatran,
 Hebrew,
 Hiragana,
 Imperial_Aramaic,
@ -235,12 +239,14 @@ Miao,
 Modi,
 Mongolian,
 Mro,
+Multani,
 Myanmar,
 Nabataean,
 New_Tai_Lue,
 Nko,
 Ogham,
 Ol_Chiki,
+Old_Hungarian,
 Old_Italic,
 Old_North_Arabian,
 Old_Permic,
@ -262,6 +268,7 @@ Saurashtra,
 Sharada,
 Shavian,
 Siddham,
+SignWriting,
 Sinhala,
 Sora_Sompeng,
 Sundanese,
@ -421,9 +428,10 @@ appear.
  (*UCP)          set PCRE2_UCP (use Unicode properties for \ed etc)
 .sp
 Note that LIMIT_MATCH and LIMIT_RECURSION can only reduce the value of the
-limits set by the caller of pcre2_match(), not increase them. The application
-can lock out the use of (*UTF) and (*UCP) by setting the PCRE2_NEVER_UTF or
-PCRE2_NEVER_UCP options, respectively, at compile time.
+limits set by the caller of \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP, not
+increase them. The application can lock out the use of (*UTF) and (*UCP) by
+setting the PCRE2_NEVER_UTF or PCRE2_NEVER_UCP options, respectively, at
+compile time.
 .
 .
 .SH "NEWLINE CONVENTION"
@ -466,6 +474,9 @@ Each top-level branch of a look behind must be of a fixed length.
  \en              reference by number (can be ambiguous)
  \egn             reference by number
  \eg{n}           reference by number
+  \eg+n            relative reference by number (PCRE2 extension)
+  \eg-n            relative reference by number
+  \eg{+n}          relative reference by number (PCRE2 extension)
  \eg{-n}          relative reference by number
  \ek<name>        reference by name (Perl)
  \ek'name'        reference by name (Perl)
@ -504,13 +515,17 @@ Each top-level branch of a look behind must be of a fixed length.
  (?(-n)              relative reference condition
  (?(<name>)          named reference condition (Perl)
  (?('name')          named reference condition (Perl)
-  (?(name)            named reference condition (PCRE2)
+  (?(name)            named reference condition (PCRE2, deprecated)
  (?(R)               overall recursion condition
-  (?(Rn)              specific group recursion condition
-  (?(R&name)          specific recursion condition
+  (?(Rn)              specific numbered group recursion condition
+  (?(R&name)          specific named group recursion condition
  (?(DEFINE)          define subpattern for reference
  (?(VERSION[>]=n.m)  test PCRE2 version
  (?(assert)          assertion condition
+.sp
+Note the ambiguity of (?(R) and (?(Rn), which might be named reference
+conditions or recursion tests. Such a condition is interpreted as a reference
+condition if the relevant named group exists.
 .
 .
 .SH "BACKTRACKING CONTROL"
@ -570,6 +585,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 13 June 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 23 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2test.1
+++ b/pcre2/doc/pcre2test.1
@ -1,4 +1,4 @@
-.TH PCRE2TEST 1 "20 May 2015" "PCRE 10.20"
+.TH PCRE2TEST 1 "28 December 2016" "PCRE 10.23"
 .SH NAME
 pcre2test - a program for testing Perl-compatible regular expressions.
 .SH SYNOPSIS
@ -29,7 +29,7 @@ subject is processed, and what output is produced.
 .P
 As the original fairly simple PCRE library evolved, it acquired many different
 features, and as a result, the original \fBpcretest\fP program ended up with a
-lot of options in a messy, arcane syntax, for testing all the features. The
+lot of options in a messy, arcane syntax for testing all the features. The
 move to the new PCRE2 API provided an opportunity to re-implement the test
 program as \fBpcre2test\fP, with a cleaner modifier syntax. Nevertheless, there
 are still many obscure modifiers, some of which are specifically designed for
@ -47,31 +47,63 @@ strings that are encoded in 8-bit, 16-bit, or 32-bit code units. One, two, or
 all three of these libraries may be simultaneously installed. The
 \fBpcre2test\fP program can be used to test all the libraries. However, its own
 input and output are always in 8-bit format. When testing the 16-bit or 32-bit
-libraries, patterns and subject strings are converted to 16- or 32-bit format
-before being passed to the library functions. Results are converted back to
-8-bit code units for output.
+libraries, patterns and subject strings are converted to 16-bit or 32-bit
+format before being passed to the library functions. Results are converted back
+to 8-bit code units for output.
 .P
 In the rest of this document, the names of library functions and structures
 are given in generic form, for example, \fBpcre2_compile()\fP. The actual
 names used in the libraries have a suffix _8, _16, or _32, as appropriate.
 .
 .
+.\" HTML <a name="inputencoding"></a>
 .SH "INPUT ENCODING"
 .rs
 .sp
 Input to \fBpcre2test\fP is processed line by line, either by calling the C
-library's \fBfgets()\fP function, or via the \fBlibreadline\fP library (see
-below). The input is processed using using C's string functions, so must not
-contain binary zeroes, even though in Unix-like environments, \fBfgets()\fP
-treats any bytes other than newline as data characters. In some Windows
-environments character 26 (hex 1A) causes an immediate end of file, and no
-further data is read.
+library's \fBfgets()\fP function, or via the \fBlibreadline\fP library. In some
+Windows environments character 26 (hex 1A) causes an immediate end of file, and
+no further data is read, so this character should be avoided unless you really
+want that action.
 .P
-For maximum portability, therefore, it is safest to avoid non-printing
-characters in \fBpcre2test\fP input files. There is a facility for specifying a
-pattern's characters as hexadecimal pairs, thus making it possible to include
-binary zeroes in a pattern for testing purposes. Subject lines are processed
-for backslash escapes, which makes it possible to include any data value.
+The input is processed using C's string functions, so must not
+contain binary zeroes, even though in Unix-like environments, \fBfgets()\fP
+treats any bytes other than newline as data characters. An error is generated
+if a binary zero is encountered. Subject lines are processed for backslash
+escapes, which makes it possible to include any data value in strings that are
+passed to the library for matching. For patterns, there is a facility for
+specifying some or all of the 8-bit input characters as hexadecimal pairs,
+which makes it possible to include binary zeroes.
+.
+.
+.SS "Input for the 16-bit and 32-bit libraries"
+.rs
+.sp
+When testing the 16-bit or 32-bit libraries, there is a need to be able to
+generate character code points greater than 255 in the strings that are passed
+to the library. For subject lines, backslash escapes can be used. In addition,
+when the \fButf\fP modifier (see
+.\" HTML <a href="#optionmodifiers">
+.\" </a>
+"Setting compilation options"
+.\"
+below) is set, the pattern and any following subject lines are interpreted as
+UTF-8 strings and translated to UTF-16 or UTF-32 as appropriate.
+.P
+For non-UTF testing of wide characters, the \fButf8_input\fP modifier can be
+used. This is mutually exclusive with \fButf\fP, and is allowed only in 16-bit
+or 32-bit mode. It causes the pattern and following subject lines to be treated
+as UTF-8 according to the original definition (RFC 2279), which allows for
+character values up to 0x7fffffff. Each character is placed in one 16-bit or
+32-bit code unit (in the 16-bit case, values greater than 0xffff cause an error
+to occur).
+.P
+UTF-8 is not capable of encoding values greater than 0x7fffffff, but such
+values can be handled by the 32-bit library. When testing this library in
+non-UTF mode with \fButf8_input\fP set, if any character is preceded by the
+byte 0xff (which is an illegal byte in UTF-8), 0x80000000 is added to the
+character's value. This is the only way of passing such code points in a
+pattern string. For subject strings, using an escape sequence is preferable.
 .
 .
 .SH "COMMAND LINE OPTIONS"
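The utf8_input description in the hunk above (RFC 2279 style UTF-8 with values up to 0x7fffffff, plus a 0xff prefix byte that adds 0x80000000) can be illustrated with a small decoder. This is a sketch based only on that description, not the code used by pcre2test; the function name decode_wide is hypothetical, and the 16-bit restriction (values above 0xffff are an error) is left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one character from RFC 2279 style UTF-8 (1 to 6 bytes).
 * A leading 0xff byte adds 0x80000000 to the value that follows it.
 * Returns the number of bytes consumed, or 0 if the input is malformed
 * or truncated. */
size_t decode_wide(const unsigned char *s, size_t len, uint32_t *out)
{
    size_t used = 0;
    uint32_t add = 0;

    assert(s != NULL && out != NULL);

    if (len > 0 && s[0] == 0xff) {      /* prefix byte for values above 0x7fffffff */
        add = 0x80000000u;
        s++;
        len--;
        used = 1;
    }
    if (len == 0)
        return 0;

    unsigned char lead = s[0];
    size_t extra;                       /* number of continuation bytes */
    uint32_t value;

    if (lead < 0x80)      { extra = 0; value = lead; }
    else if (lead < 0xc0) { return 0; } /* stray continuation byte */
    else if (lead < 0xe0) { extra = 1; value = lead & 0x1f; }
    else if (lead < 0xf0) { extra = 2; value = lead & 0x0f; }
    else if (lead < 0xf8) { extra = 3; value = lead & 0x07; }
    else if (lead < 0xfc) { extra = 4; value = lead & 0x03; }
    else if (lead < 0xfe) { extra = 5; value = lead & 0x01; }
    else                  { return 0; } /* 0xfe, or a second 0xff */

    if (len < extra + 1)
        return 0;
    for (size_t i = 1; i <= extra; i++) {
        if ((s[i] & 0xc0) != 0x80)
            return 0;
        value = (value << 6) | (uint32_t)(s[i] & 0x3f);
    }

    *out = value + add;                 /* value <= 0x7fffffff, so no overflow */
    return used + extra + 1;
}
```

The six-byte sequence fd bf bf bf bf bf decodes to 0x7fffffff, the largest value the original UTF-8 definition can carry, which is why the extra 0xff prefix is needed for anything larger.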
@ -92,8 +124,12 @@ If the 32-bit library has been built, this option causes it to be used. If only
 the 32-bit library has been built, this is the default. If the 32-bit library
 has not been built, this option causes an error.
 .TP 10
+\fB-ac\fP
+Behave as if each pattern has the \fBauto_callout\fP modifier, that is, insert
+automatic callouts into every pattern that is compiled.
+.TP 10
 \fB-b\fP
-Behave as if each pattern has the \fB/fullbincode\fP modifier; the full
+Behave as if each pattern has the \fBfullbincode\fP modifier; the full
 internal binary form of the pattern is output after compilation.
 .TP 10
 \fB-C\fP
@ -122,12 +158,13 @@ following options output the value and set the exit code as indicated:
 The following options output 1 for true or 0 for false, and set the exit code
 to the same value:
 .sp
-  ebcdic     compiled for an EBCDIC environment
-  jit        just-in-time support is available
-  pcre2-16   the 16-bit library was built
-  pcre2-32   the 32-bit library was built
-  pcre2-8    the 8-bit library was built
-  unicode    Unicode support is available
+  backslash-C  \eC is supported (not locked out)
+  ebcdic       compiled for an EBCDIC environment
+  jit          just-in-time support is available
+  pcre2-16     the 16-bit library was built
+  pcre2-32     the 32-bit library was built
+  pcre2-8      the 8-bit library was built
+  unicode      Unicode support is available
 .sp
 If an unknown option is given, an error message is output; the exit code is 0.
 .TP 10
@ -141,11 +178,17 @@ Behave as if each subject line has the \fBdfa\fP modifier; matching is done
 using the \fBpcre2_dfa_match()\fP function instead of the default
 \fBpcre2_match()\fP.
 .TP 10
+\fB-error\fP \fInumber[,number,...]\fP
+Call \fBpcre2_get_error_message()\fP for each of the error numbers in the
+comma-separated list, display the resulting messages on the standard output,
+then exit with zero exit code. The numbers may be positive or negative. This is
+a convenience facility for PCRE2 maintainers.
+.TP 10
 \fB-help\fP
 Output a brief summary of these options and then exit.
 .TP 10
 \fB-i\fP
-Behave as if each pattern has the \fB/info\fP modifier; information about the
+Behave as if each pattern has the \fBinfo\fP modifier; information about the
 compiled pattern is given after compilation.
 .TP 10
 \fB-jit\fP
@ -217,9 +260,9 @@ Each subject line is matched separately and independently. If you want to do
 multi-line matches, you have to use the \en escape sequence (or \er or \er\en,
 etc., depending on the newline setting) in a single line of input to encode the
 newline sequences. There is no limit on the length of subject lines; the input
-buffer is automatically extended if it is too small. There is a replication
-feature that makes it possible to generate long subject lines without having to
-supply them explicitly.
+buffer is automatically extended if it is too small. There are replication
+features that make it possible to generate long repetitive pattern or subject
+lines without having to supply them explicitly.
 .P
 An empty line or the end of the file signals the end of the subject lines for a
 test, at which point a new pattern or command line is expected if there is
@ -259,6 +302,34 @@ described in the section entitled "Saving and restoring compiled patterns"
 .\" </a>
 below.
 .\"
+.sp
+  #newline_default [<newline-list>]
+.sp
+When PCRE2 is built, a default newline convention can be specified. This
+determines which characters and/or character pairs are recognized as indicating
+a newline in a pattern or subject string. The default can be overridden when a
+pattern is compiled. The standard test files contain tests of various newline
+conventions, but the majority of the tests expect a single linefeed to be
+recognized as a newline by default. Without special action the tests would fail
+when PCRE2 is compiled with either CR or CRLF as the default newline.
+.P
+The #newline_default command specifies a list of newline types that are
+acceptable as the default. Each type must be one of CR, LF, CRLF, ANYCRLF, or
+ANY (in upper or lower case), for example:
+.sp
+  #newline_default LF Any anyCRLF
+.sp
+If the default newline is in the list, this command has no effect. Otherwise,
+except when testing the POSIX API, a \fBnewline\fP modifier that specifies the
+first newline convention in the list (LF in the above example) is added to any
+pattern that does not already have a \fBnewline\fP modifier. If the newline
+list is empty, the feature is turned off. This command is present in a number
+of the standard test input files.
+.P
+When the POSIX API is being tested there is no way to override the default
+newline convention, though it is possible to set the newline convention from
+within the pattern. A warning is given if the \fBposix\fP modifier is used when
+\fB#newline_default\fP would set a default for the non-POSIX API.
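The decision that #newline_default makes can be sketched as follows. The function `forced_newline` and its arguments are hypothetical, chosen only to restate the rules in the text; pcre2test's real implementation is in C and is not shown here.

```python
from typing import List, Optional

VALID_TYPES = {"CR", "LF", "CRLF", "ANYCRLF", "ANY"}


def forced_newline(build_default: str,
                   acceptable: List[str],
                   pattern_modifiers: List[str],
                   posix: bool = False) -> Optional[str]:
    """Return the newline type to add to a pattern, or None for no change."""
    accepted = [t.upper() for t in acceptable]  # case is not significant
    for t in accepted:
        if t not in VALID_TYPES:
            raise ValueError("unknown newline type: " + t)
    if not accepted:
        return None  # an empty list turns the feature off
    if build_default.upper() in accepted:
        return None  # the build default is acceptable as it is
    if posix:
        return None  # the POSIX API cannot override the default
    for modifier in pattern_modifiers:
        if modifier.split("=", 1)[0].strip() == "newline":
            return None  # the pattern already chooses its own convention
    return accepted[0]  # first type in the list wins
```

With the example list from the text, a library built with CRLF as its default would have LF applied to any pattern that lacks its own newline modifier.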
 .sp
  #pattern <modifier-list>
 .sp
@ -276,9 +347,10 @@ test files that are also processed by \fBperltest.sh\fP. The \fB#perltest\fP
 command helps detect tests that are accidentally put in the wrong file.
 .sp
  #pop [<modifiers>]
+  #popcopy [<modifiers>]
 .sp
-This command is used to manipulate the stack of compiled patterns, as described
-in the section entitled "Saving and restoring compiled patterns"
+These commands are used to manipulate the stack of compiled patterns, as
+described in the section entitled "Saving and restoring compiled patterns"
 .\" HTML <a href="#saverestore">
 .\" </a>
 below.
@ -303,12 +375,13 @@ subject lines. Modifiers on a subject line can change these settings.
 .rs
 .sp
 Modifier lists are used with both pattern and subject lines. Items in a list
-are separated by commas and optional white space. Some modifiers may be given
-for both patterns and subject lines, whereas others are valid for one or the
-other only. Each modifier has a long name, for example "anchored", and some of
-them must be followed by an equals sign and a value, for example, "offset=12".
-Modifiers that do not take values may be preceded by a minus sign to turn off a
-previous setting.
+are separated by commas followed by optional white space. Trailing whitespace
+in a modifier list is ignored. Some modifiers may be given for both patterns
+and subject lines, whereas others are valid only for one or the other. Each
+modifier has a long name, for example "anchored", and some of them must be
+followed by an equals sign and a value, for example, "offset=12". Values cannot
+contain comma characters, but may contain spaces. Modifiers that do not take
+values may be preceded by a minus sign to turn off a previous setting.
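A minimal sketch of the modifier-list syntax just described, assuming only the rules stated above (comma separators, optional white space after a comma, values that may contain spaces but not commas, and a leading minus sign for valueless modifiers). The function `parse_modifier_list` is hypothetical, and it does not handle the single-letter abbreviations.

```python
from typing import Dict, Union


def parse_modifier_list(text: str) -> Dict[str, Union[str, bool]]:
    """Split a modifier list into {name: value-or-flag}."""
    settings: Dict[str, Union[str, bool]] = {}
    # Trailing white space in the whole list is ignored.
    for item in text.rstrip().split(","):
        item = item.lstrip()  # optional white space after a comma
        if not item:
            # Assumption: an empty item is treated as an error here.
            raise ValueError("empty modifier")
        if "=" in item:
            name, value = item.split("=", 1)
            if name.startswith("-"):
                raise ValueError("only valueless modifiers can be negated")
            settings[name] = value
        elif item.startswith("-"):
            settings[item[1:]] = False  # turn off a previous setting
        else:
            settings[item] = True
    return settings
```

For example, "anchored,offset=12" yields a flag and a value, and "caseless, -caseless" ends with caseless turned off.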
 .P
 A few of the more common modifiers can also be specified as single letters, for
 example "i" for "caseless". In documentation, following the Perl convention,
@ -414,6 +487,12 @@ the start of a modifier list. For example:
 .sp
  abc\e=notbol,notempty
 .sp
+If the subject string is empty and \e= is followed by whitespace, the line is
+treated as a comment line, and is not used for matching. For example:
+.sp
+  \e= This is a comment.
+  abc\e= This is an invalid modifier list.
+.sp
 A backslash followed by any other non-alphanumeric character just escapes that
 character. A backslash followed by anything else causes an error. However, if
 the very last character in the line is a backslash (and there is no modifier
@ -424,10 +503,10 @@ a real empty line terminates the data input.
 .SH "PATTERN MODIFIERS"
 .rs
 .sp
-There are three types of modifier that can appear in pattern lines, two of
-which may also be used in a \fB#pattern\fP command. A pattern's modifier list
-can add to or override default modifiers that were set by a previous
-\fB#pattern\fP command.
+There are several types of modifier that can appear in pattern lines. Except
+where noted below, they may also be used in \fB#pattern\fP commands. A
+pattern's modifier list can add to or override default modifiers that were set
+by a previous \fB#pattern\fP command.
 .
 .
 .\" HTML <a name="optionmodifiers"></a>
@ -437,13 +516,14 @@ can add to or override default modifiers that were set by a previous
 The following modifiers set options for \fBpcre2_compile()\fP. The most common
 ones have single-letter abbreviations. See
 .\" HREF
-\fBpcreapi\fP
+\fBpcre2api\fP
 .\"
 for a description of their effects.
 .sp
      allow_empty_class         set PCRE2_ALLOW_EMPTY_CLASS
      alt_bsux                  set PCRE2_ALT_BSUX
      alt_circumflex            set PCRE2_ALT_CIRCUMFLEX
+      alt_verbnames             set PCRE2_ALT_VERBNAMES
      anchored                  set PCRE2_ANCHORED
      auto_callout              set PCRE2_AUTO_CALLOUT
  /i  caseless                  set PCRE2_CASELESS
@ -464,12 +544,15 @@ for a description of their effects.
      no_utf_check              set PCRE2_NO_UTF_CHECK
      ucp                       set PCRE2_UCP
      ungreedy                  set PCRE2_UNGREEDY
+      use_offset_limit          set PCRE2_USE_OFFSET_LIMIT
      utf                       set PCRE2_UTF
 .sp
 As well as turning on the PCRE2_UTF option, the \fButf\fP modifier causes all
 non-printing characters in output strings to be printed using the \ex{hh...}
 notation. Otherwise, those less than 0x100 are output in hex without the curly
-brackets.
+brackets. Setting \fButf\fP in 16-bit or 32-bit mode also causes pattern and
+subject strings to be translated to UTF-16 or UTF-32, respectively, before
+being passed to library functions.
 .
 .
 .\" HTML <a name="controlmodifiers"></a>
@ -485,18 +568,24 @@ about the pattern:
      debug                     same as info,fullbincode
      fullbincode               show binary code with lengths
  /I  info                      show info about compiled pattern
-      hex                       pattern is coded in hexadecimal
+      hex                       unquoted characters are hexadecimal
      jit[=<number>]            use JIT
      jitfast                   use JIT fast path
      jitverify                 verify JIT use
      locale=<name>             use this locale
+      max_pattern_length=<n>    set the maximum pattern length
      memory                    show memory used
      newline=<type>            set newline type
+      null_context              compile with a NULL context
      parens_nest_limit=<n>     set maximum parentheses depth
      posix                     use the POSIX API
+      posix_nosub               use the POSIX API with REG_NOSUB
      push                      push compiled pattern onto the stack
+      pushcopy                  push a copy onto the stack
      stackguard=<number>       test the stackguard feature
      tables=[0|1|2]            select internal tables
+      use_length                do not zero-terminate the pattern
+      utf8_input                treat input as UTF-8
 .sp
 The effects of these modifiers are described in the following sections.
 .
@ -565,40 +654,148 @@ is requested. For each callout, either its number or string is given, followed
 by the item that follows it in the pattern.
 .
 .
-.SS "Specifying a pattern in hex"
+.SS "Passing a NULL context"
 .rs
 .sp
-The \fBhex\fP modifier specifies that the characters of the pattern are to be
-interpreted as pairs of hexadecimal digits. White space is permitted between
-pairs. For example:
+Normally, \fBpcre2test\fP passes a context block to \fBpcre2_compile()\fP. If
+the \fBnull_context\fP modifier is set, however, NULL is passed. This is for
+testing that \fBpcre2_compile()\fP behaves correctly in this case (it uses
+default values).
+.
+.
+.SS "Specifying the pattern's length"
+.rs
+.sp
+By default, patterns are passed to the compiling functions as zero-terminated
+strings. When using the POSIX wrapper API, there is no other option. However,
+when using PCRE2's native API, patterns can be passed by length instead of
+being zero-terminated. The \fBuse_length\fP modifier causes this to happen.
+Using a length happens automatically (whether or not \fBuse_length\fP is set)
+when \fBhex\fP is set, because patterns specified in hexadecimal may contain
+binary zeros.
+.
+.
+.SS "Specifying pattern characters in hexadecimal"
+.rs
+.sp
+The \fBhex\fP modifier specifies that the characters of the pattern, except for
+substrings enclosed in single or double quotes, are to be interpreted as pairs
+of hexadecimal digits. This feature is provided as a way of creating patterns
+that contain binary zeros and other non-printing characters. White space is
+permitted between pairs of digits. For example, this pattern contains three
+characters:
 .sp
  /ab 32 59/hex
 .sp
-This feature is provided as a way of creating patterns that contain binary zero
-and other non-printing characters. By default, \fBpcre2test\fP passes patterns
-as zero-terminated strings to \fBpcre2_compile()\fP, giving the length as
-PCRE2_ZERO_TERMINATED. However, for patterns specified in hexadecimal, the
-actual length of the pattern is passed.
+Parts of such a pattern are taken literally if quoted. This pattern contains
+nine characters, only two of which are specified in hexadecimal:
+.sp
+  /ab "literal" 32/hex
+.sp
+Either single or double quotes may be used. There is no way of including
+the delimiter within a substring. The \fBhex\fP and \fBexpand\fP modifiers are
+mutually exclusive.
+.P
+The POSIX API cannot be used with patterns specified in hexadecimal because
+they may contain binary zeros, which conflicts with \fBregcomp()\fP's
+requirement for a zero-terminated string. Such patterns are always passed to
+\fBpcre2_compile()\fP as a string with a length, not as zero-terminated.
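The two examples above can be checked with a short sketch of the hex rules. `decode_hex_pattern` is a hypothetical name, and the code follows only what the text states: pairs of hexadecimal digits, white space allowed between pairs, and quoted substrings taken literally.

```python
import string


def decode_hex_pattern(text: str) -> bytes:
    """Turn the body of a /.../hex pattern into raw pattern bytes."""
    out = bytearray()
    i = 0
    while i < len(text):
        ch = text[i]
        if ch.isspace():
            i += 1
            continue
        if ch in "'\"":
            # Literal substring: runs to the next matching quote.
            end = text.find(ch, i + 1)
            if end < 0:
                raise ValueError("unterminated quoted substring")
            out += text[i + 1:end].encode("latin-1")
            i = end + 1
            continue
        pair = text[i:i + 2]
        if len(pair) != 2 or any(c not in string.hexdigits for c in pair):
            raise ValueError("expected a pair of hexadecimal digits")
        out.append(int(pair, 16))
        i += 2
    return bytes(out)
```

Applied to the examples, "ab 32 59" gives three bytes and 'ab "literal" 32' gives nine, matching the counts in the text.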
+.
+.
+.SS "Specifying wide characters in 16-bit and 32-bit modes"
+.rs
+.sp
+In 16-bit and 32-bit modes, all input is automatically treated as UTF-8 and
+translated to UTF-16 or UTF-32 when the \fButf\fP modifier is set. For testing
+the 16-bit and 32-bit libraries in non-UTF mode, the \fButf8_input\fP modifier
+can be used. It is mutually exclusive with \fButf\fP. Input lines are
+interpreted as UTF-8 as a means of specifying wide characters. More details are
+given in
+.\" HTML <a href="#inputencoding">
+.\" </a>
+"Input encoding"
+.\"
+above.
+.
+.
+.SS "Generating long repetitive patterns"
+.rs
+.sp
+Some tests use long patterns that are very repetitive. Instead of creating a
+very long input line for such a pattern, you can use a special repetition
+feature, similar to the one described for subject lines above. If the
+\fBexpand\fP modifier is present on a pattern, parts of the pattern that have
+the form
+.sp
+  \e[<characters>]{<count>}
+.sp
+are expanded before the pattern is passed to \fBpcre2_compile()\fP. For
+example, \e[AB]{6000} is expanded to "ABAB..." 6000 times. This construction
+cannot be nested. An initial "\e[" sequence is recognized only if "]{" followed
+by decimal digits and "}" is found later in the pattern. If not, the characters
+remain in the pattern unaltered. The \fBexpand\fP and \fBhex\fP modifiers are
+mutually exclusive.
+.P
+If part of an expanded pattern looks like an expansion, but is really part of
+the actual pattern, unwanted expansion can be avoided by giving two values in
+the quantifier. For example, \e[AB]{6000,6000} is not recognized as an
+expansion item.
+.P
+If the \fBinfo\fP modifier is set on an expanded pattern, the result of the
+expansion is included in the information that is output.
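The expansion rule can be approximated with a regular expression. This is a sketch under the stated rules, not pcre2test's implementation; `expand_pattern` is a hypothetical name, and behaviour on nested constructions (which the text says are not supported) is not meant to match the real program.

```python
import re

# "\[" then characters, then "]{", decimal digits, and "}".
_EXPAND_ITEM = re.compile(r"\\\[(.*?)\]\{(\d+)\}", re.DOTALL)


def expand_pattern(pattern: str) -> str:
    """Replace each \\[<characters>]{<count>} item by its repetition."""
    return _EXPAND_ITEM.sub(lambda m: m.group(1) * int(m.group(2)), pattern)
```

Because the expression requires a single decimal count, an item written with two values, such as \[AB]{6000,6000}, does not match and is left unaltered.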
 .
 .
 .SS "JIT compilation"
 .rs
 .sp
-The \fB/jit\fP modifier may optionally be followed by an equals sign and a
-number in the range 0 to 7:
+Just-in-time (JIT) compiling is a heavyweight optimization that can greatly
+speed up pattern matching. See the
+.\" HREF
+\fBpcre2jit\fP
+.\"
+documentation for details. JIT compiling happens, optionally, after a pattern
+has been successfully compiled into an internal form. The JIT compiler converts
+this to optimized machine code. It needs to know whether the match-time options
+PCRE2_PARTIAL_HARD and PCRE2_PARTIAL_SOFT are going to be used, because
+different code is generated for the different cases. See the \fBpartial\fP
+modifier in "Subject Modifiers"
+.\" HTML <a href="#subjectmodifiers">
+.\" </a>
+below
+.\"
+for details of how these options are specified for each match attempt.
+.P
+JIT compilation is requested by the \fBjit\fP pattern modifier, which may
+optionally be followed by an equals sign and a number in the range 0 to 7.
+The three bits that make up the number specify which of the three JIT operating
+modes are to be compiled:
+.sp
+  1  compile JIT code for non-partial matching
+  2  compile JIT code for soft partial matching
+  4  compile JIT code for hard partial matching
+.sp
+The possible values for the \fBjit\fP modifier are therefore:
 .sp
  0  disable JIT
-  1  use JIT for normal match only
-  2  use JIT for soft partial match only
-  3  use JIT for normal match and soft partial match
-  4  use JIT for hard partial match only
-  6  use JIT for soft and hard partial match
+  1  normal matching only
+  2  soft partial matching only
+  3  normal and soft partial matching
+  4  hard partial matching only
+  6  soft and hard partial matching only
  7  all three modes
 .sp
-If no number is given, 7 is assumed. If JIT compilation is successful, the
-compiled JIT code will automatically be used when \fBpcre2_match()\fP is run
-for the appropriate type of match, except when incompatible run-time options
-are specified. For more details, see the
+If no number is given, 7 is assumed. The phrase "partial matching" means a call
+to \fBpcre2_match()\fP with either the PCRE2_PARTIAL_SOFT or the
+PCRE2_PARTIAL_HARD option set. Note that such a call may return a complete
+match; the options enable the possibility of a partial match, but do not
+require it. Note also that if you request JIT compilation only for partial
+matching (for example, /jit=2) but do not set the \fBpartial\fP modifier on a
+subject line, that match will not use JIT code because none was compiled for
+non-partial matching.
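The relationship between the jit value and the kind of match can be written as a bit test. The names below are hypothetical and the sketch ignores the separate case of incompatible run-time options; it only restates the three-bit table above.

```python
from typing import Optional

JIT_NORMAL = 1  # non-partial matching
JIT_SOFT = 2    # soft partial matching
JIT_HARD = 4    # hard partial matching


def jit_code_compiled_for(jit_value: int, partial: Optional[str] = None) -> bool:
    """True if JIT code was compiled for this kind of match attempt.

    partial is None for a non-partial match, or "soft" or "hard".
    """
    if not 0 <= jit_value <= 7:
        raise ValueError("jit value must be in the range 0 to 7")
    needed = {None: JIT_NORMAL, "soft": JIT_SOFT, "hard": JIT_HARD}[partial]
    return bool(jit_value & needed)
```

This shows the case called out in the text: with jit=2 and no partial modifier on the subject line, no suitable JIT code exists, so the match does not use JIT.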
+.P
+If JIT compilation is successful, the compiled JIT code will automatically be
+used when an appropriate type of match is run, except when incompatible
+run-time options are specified. For more details, see the
 .\" HREF
 \fBpcre2jit\fP
 .\"
@ -622,14 +819,14 @@ code was actually used in the match.
 .SS "Setting a locale"
 .rs
 .sp
-The \fB/locale\fP modifier must specify the name of a locale, for example:
+The \fBlocale\fP modifier must specify the name of a locale, for example:
 .sp
  /pattern/locale=fr_FR
 .sp
 The given locale is set, \fBpcre2_maketables()\fP is called to build a set of
 character tables for the locale, and this is then passed to
 \fBpcre2_compile()\fP when compiling the regular expression. The same tables
-are used when matching the following subject lines. The \fB/locale\fP modifier
+are used when matching the following subject lines. The \fBlocale\fP modifier
 applies only to the pattern on which it appears, but can be given in a
 \fB#pattern\fP command if a default is needed. Setting a locale and alternate
 character tables are mutually exclusive.
@ -638,7 +835,7 @@ character tables are mutually exclusive.
 .SS "Showing pattern memory"
 .rs
 .sp
-The \fB/memory\fP modifier causes the size in bytes of the memory used to hold
+The \fBmemory\fP modifier causes the size in bytes of the memory used to hold
 the compiled pattern to be output. This does not include the size of the
 \fBpcre2_code\fP block; it is just the actual compiled data. If the pattern is
 subsequently passed to the JIT compiler, the size of the JIT compiled code is
@ -660,30 +857,54 @@ sets its own default of 220, which is required for running the standard test
 suite.
 .
 .
+.SS "Limiting the pattern length"
+.rs
+.sp
+The \fBmax_pattern_length\fP modifier sets a limit, in code units, to the
+length of pattern that \fBpcre2_compile()\fP will accept. Breaching the limit
+causes a compilation error. The default is the largest number a PCRE2_SIZE
+variable can hold (essentially unlimited).
+.
+.
 .SS "Using the POSIX wrapper API"
 .rs
 .sp
-The \fB/posix\fP modifier causes \fBpcre2test\fP to call PCRE2 via the POSIX
-wrapper API rather than its native API. This supports only the 8-bit library.
-When the POSIX API is being used, the following pattern modifiers set options
-for the \fBregcomp()\fP function:
+The \fBposix\fP and \fBposix_nosub\fP modifiers cause \fBpcre2test\fP to call
+PCRE2 via the POSIX wrapper API rather than its native API. When
+\fBposix_nosub\fP is used, the POSIX option REG_NOSUB is passed to
+\fBregcomp()\fP. The POSIX wrapper supports only the 8-bit library. Note that
+it does not imply POSIX matching semantics; for more detail see the
+.\" HREF
+\fBpcre2posix\fP
+.\"
+documentation. The following pattern modifiers set options for the
+\fBregcomp()\fP function:
 .sp
  caseless           REG_ICASE
  multiline          REG_NEWLINE
-  no_auto_capture    REG_NOSUB
  dotall             REG_DOTALL     )
  ungreedy           REG_UNGREEDY   ) These options are not part of
  ucp                REG_UCP        )   the POSIX standard
  utf                REG_UTF8       )
 .sp
+The \fBregerror_buffsize\fP modifier specifies a size for the error buffer that
+is passed to \fBregerror()\fP in the event of a compilation error. For example:
+.sp
+  /abc/posix,regerror_buffsize=20
+.sp
+This provides a means of testing the behaviour of \fBregerror()\fP when the
+buffer is too small for the error message. If this modifier has not been set, a
+large buffer is used.
+.P
 The \fBaftertext\fP and \fBallaftertext\fP subject modifiers work as described
-below. All other modifiers cause an error.
+below. All other modifiers are either ignored, with a warning message, or cause
+an error.
 .
 .
 .SS "Testing the stack guard feature"
 .rs
 .sp
-The \fB/stackguard\fP modifier is used to test the use of
+The \fBstackguard\fP modifier is used to test the use of
 \fBpcre2_set_compile_recursion_guard()\fP, a function that is provided to
 enable stack availability to be checked during compilation (see the
 .\" HREF
@ -700,7 +921,7 @@ be aborted.
 .SS "Using alternative character tables"
 .rs
 .sp
-The value specified for the \fB/tables\fP modifier must be one of the digits 0,
+The value specified for the \fBtables\fP modifier must be one of the digits 0,
 1, or 2. It causes a specific set of built-in character tables to be passed to
 \fBpcre2_compile()\fP. This is used in the PCRE2 tests to check behaviour with
 different character tables. The digit specifies the tables as follows:
@ -720,17 +941,22 @@ are mutually exclusive.
 .sp
 The following modifiers are really subject modifiers, and are described below.
 However, they may be included in a pattern's modifier list, in which case they
-are applied to every subject line that is processed with that pattern. They do
-not affect the compilation process.
+are applied to every subject line that is processed with that pattern. They may
+not appear in \fB#pattern\fP commands. These modifiers do not affect the
+compilation process.
 .sp
-      aftertext           show text after match
-      allaftertext        show text after captures
-      allcaptures         show all captures
-      allusedtext         show all consulted text
-  /g  global              global matching
-      mark                show mark values
-      replace=<string>    specify a replacement string
-      startchar           show starting character when relevant
+      aftertext                  show text after match
+      allaftertext               show text after captures
+      allcaptures                show all captures
+      allusedtext                show all consulted text
+  /g  global                     global matching
+      mark                       show mark values
+      replace=<string>           specify a replacement string
+      startchar                  show starting character when relevant
+      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
 .sp
 These modifiers may not appear in a \fB#pattern\fP command. If you want them as
 defaults, set them in a \fB#subject\fP command.
@ -746,15 +972,20 @@ facility is used when saving compiled patterns to a file, as described in the
 section entitled "Saving and restoring compiled patterns"
 .\" HTML <a href="#saverestore">
 .\" </a>
-below.
+below. If \fBpushcopy\fP is used instead of \fBpush\fP, a copy of the compiled
+pattern is stacked, leaving the original as current, ready to match the
+following input lines. This provides a way of testing the
+\fBpcre2_code_copy()\fP function.
 .\"
-The \fBpush\fP modifier is incompatible with compilation modifiers such as
-\fBglobal\fP that act at match time. Any that are specified are ignored, with a
-warning message, except for \fBreplace\fP, which causes an error. Note that,
-\fBjitverify\fP, which is allowed, does not carry through to any subsequent
-matching that uses this pattern.
+The \fBpush\fP and \fBpushcopy\fP modifiers are incompatible with compilation
+modifiers such as \fBglobal\fP that act at match time. Any that are specified
+are ignored (for the stacked copy), with a warning message, except for
+\fBreplace\fP, which causes an error. Note that \fBjitverify\fP, which is
+allowed, does not carry through to any subsequent matching that uses a stacked
+pattern.
 .
 .
+.\" HTML <a name="subjectmodifiers"></a>
 .SH "SUBJECT MODIFIERS"
 .rs
 .sp
@ -775,6 +1006,7 @@ for a description of their effects.
      anchored                  set PCRE2_ANCHORED
      dfa_restart               set PCRE2_DFA_RESTART
      dfa_shortest              set PCRE2_DFA_SHORTEST
+      no_jit                    set PCRE2_NO_JIT
      no_utf_check              set PCRE2_NO_UTF_CHECK
      notbol                    set PCRE2_NOTBOL
      notempty                  set PCRE2_NOTEMPTY
@ -786,11 +1018,11 @@ for a description of their effects.
 The partial matching modifiers are provided with abbreviations because they
 appear frequently in tests.
 .P
-If the \fB/posix\fP modifier was present on the pattern, causing the POSIX
+If the \fBposix\fP modifier was present on the pattern, causing the POSIX
 wrapper API to be used, the only option-setting modifiers that have any effect
 are \fBnotbol\fP, \fBnotempty\fP, and \fBnoteol\fP, causing REG_NOTBOL,
 REG_NOTEMPTY, and REG_NOTEOL, respectively, to be passed to \fBregexec()\fP.
-Any other modifiers cause an error.
+The other modifiers are ignored, with a warning message.
 .
 .
 .SS "Setting match controls"
@ -801,33 +1033,44 @@ information. Some of them may also be specified on a pattern line (see above),
 in which case they apply to every subject line that is matched against that
 pattern.
 .sp
-      aftertext                 show text after match
-      allaftertext              show text after captures
-      allcaptures               show all captures
-      allusedtext               show all consulted text (non-JIT only)
-      altglobal                 alternative global matching
-      callout_capture           show captures at callout time
-      callout_data=<n>          set a value to pass via callouts
-      callout_fail=<n>[:<m>]    control callout failure
-      callout_none              do not supply a callout function
-      copy=<number or name>     copy captured substring
-      dfa                       use \fBpcre2_dfa_match()\fP
-      find_limits               find match and recursion limits
-      get=<number or name>      extract captured substring
-      getall                    extract all captured substrings
-  /g  global                    global matching
-      jitstack=<n>              set size of JIT stack
-      mark                      show mark values
-      match_limit=>n>           set a match limit
-      memory                    show memory usage
-      offset=<n>                set starting offset
-      ovector=<n>               set size of output vector
-      recursion_limit=<n>       set a recursion limit
-      replace=<string>          specify a replacement string
-      startchar                 show startchar when relevant
-      zero_terminate            pass the subject as zero-terminated
+      aftertext                  show text after match
+      allaftertext               show text after captures
+      allcaptures                show all captures
+      allusedtext                show all consulted text (non-JIT only)
+      altglobal                  alternative global matching
+      callout_capture            show captures at callout time
+      callout_data=<n>           set a value to pass via callouts
+      callout_error=<n>[:<m>]    control callout error
+      callout_fail=<n>[:<m>]     control callout failure
+      callout_none               do not supply a callout function
+      copy=<number or name>      copy captured substring
+      dfa                        use \fBpcre2_dfa_match()\fP
+      find_limits                find match and recursion limits
+      get=<number or name>       extract captured substring
+      getall                     extract all captured substrings
+  /g  global                     global matching
+      jitstack=<n>               set size of JIT stack
+      mark                       show mark values
+      match_limit=<n>            set a match limit
+      memory                     show memory usage
+      null_context               match with a NULL context
+      offset=<n>                 set starting offset
+      offset_limit=<n>           set offset limit
+      ovector=<n>                set size of output vector
+      recursion_limit=<n>        set a recursion limit
+      replace=<string>           specify a replacement string
+      startchar                  show startchar when relevant
+      startoffset=<n>            same as offset=<n>
+      substitute_extended        use PCRE2_SUBSTITUTE_EXTENDED
+      substitute_overflow_length use PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+      substitute_unknown_unset   use PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+      substitute_unset_empty     use PCRE2_SUBSTITUTE_UNSET_EMPTY
+      zero_terminate             pass the subject as zero-terminated
 .sp
-The effects of these modifiers are described in the following sections.
+The effects of these modifiers are described in the following sections. When
+matching via the POSIX wrapper API, the \fBaftertext\fP, \fBallaftertext\fP,
+and \fBovector\fP subject modifiers work as described below. All other
+modifiers are either ignored, with a warning message, or cause an error.
 .
 .
 .SS "Showing more text"
@ -882,7 +1125,8 @@ The \fBallcaptures\fP modifier requests that the values of all potential
 captured parentheses be output after a match. By default, only those up to the
 highest one actually used in the match are output (corresponding to the return
 code from \fBpcre2_match()\fP). Groups that did not take part in the match
-are output as "<unset>".
+are output as "<unset>". This modifier is not relevant for DFA matching (which
+does no capturing); it is ignored, with a warning message, if present.
 .
 .
 .SS "Testing callouts"
@ -890,14 +1134,20 @@ are output as "<unset>".
 .sp
 A callout function is supplied when \fBpcre2test\fP calls the library matching
 functions, unless \fBcallout_none\fP is specified. If \fBcallout_capture\fP is
-set, the current captured groups are output when a callout occurs.
+set, the current captured groups are output when a callout occurs. The default
+return from the callout function is zero, which allows matching to continue.
 .P
 The \fBcallout_fail\fP modifier can be given one or two numbers. If there is
-only one number, 1 is returned instead of 0 when a callout of that number is
-reached. If two numbers are given, 1 is returned when callout <n> is reached
-for the <m>th time. Note that callouts with string arguments are always given
-the number zero. See "Callouts" below for a description of the output when a
-callout it taken.
+only one number, 1 is returned instead of 0 (causing matching to backtrack)
+when a callout of that number is reached. If two numbers (<n>:<m>) are given, 1
+is returned when callout <n> is reached and there have been at least <m>
+callouts. The \fBcallout_error\fP modifier is similar, except that
+PCRE2_ERROR_CALLOUT is returned, causing the entire matching process to be
+aborted. If both these modifiers are set for the same callout number,
+\fBcallout_error\fP takes precedence.
+.P
+Note that callouts with string arguments are always given the number zero. See
+"Callouts" below for a description of the output when a callout is taken.
 .P
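The return-value rules for callout_fail and callout_error described above can be sketched as a small decision function. This is an illustration only, not the pcre2test source: the callout_trigger type, the callout_result() function and the SKETCH_ERROR_CALLOUT constant are hypothetical names, and the sketch assumes that "seen" counts the callouts that have occurred so far, including the current one.

```c
#include <stdbool.h>

/* Stand-in for the library's error code (assumption for this sketch);
   real code should use the PCRE2_ERROR_CALLOUT macro from pcre2.h. */
#define SKETCH_ERROR_CALLOUT (-37)

/* Hypothetical description of one callout_fail or callout_error modifier. */
typedef struct {
    bool set;         /* was the modifier given at all? */
    unsigned number;  /* <n>: the callout number it applies to */
    unsigned count;   /* <m>: minimum number of callouts; 0 if omitted */
} callout_trigger;

/* A trigger fires when callout <n> is reached and at least <m> callouts
   have happened. With <m> omitted (0) it fires every time. */
static bool trigger_fires(const callout_trigger *t, unsigned number,
                          unsigned seen)
{
    return t->set && number == t->number && seen >= t->count;
}

/* Returns 0 (continue matching), 1 (force a backtrack), or the error
   stand-in (abort the whole match). The error trigger is tested first,
   so it takes precedence when both apply to the same callout. */
int callout_result(const callout_trigger *fail, const callout_trigger *error,
                   unsigned number, unsigned seen)
{
    if (trigger_fires(error, number, seen)) return SKETCH_ERROR_CALLOUT;
    if (trigger_fires(fail, number, seen)) return 1;
    return 0;
}
```

Testing the error trigger before the fail trigger is what gives callout_error its precedence; swapping the two checks would invert that rule.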
 The \fBcallout_data\fP modifier can be given an unsigned or a negative number.
 This is set as the "user data" that is passed to the matching function, and
@ -909,7 +1159,7 @@ used as a return from \fBpcre2test\fP's callout function.
 .rs
 .sp
 Searching for all possible matches within a subject can be requested by the
-\fBglobal\fP or \fB/altglobal\fP modifier. After finding a match, the matching
+\fBglobal\fP or \fBaltglobal\fP modifier. After finding a match, the matching
 function is called again to search the remainder of the subject. The difference
 between \fBglobal\fP and \fBaltglobal\fP is that the former uses the
 \fIstart_offset\fP argument to \fBpcre2_match()\fP or \fBpcre2_dfa_match()\fP
@ -957,18 +1207,30 @@ by name.
 .rs
 .sp
 If the \fBreplace\fP modifier is set, the \fBpcre2_substitute()\fP function is
-called instead of one of the matching functions. Unlike subject strings,
-\fBpcre2test\fP does not process replacement strings for escape sequences. In
-UTF mode, a replacement string is checked to see if it is a valid UTF-8 string.
-If so, it is correctly converted to a UTF string of the appropriate code unit
-width. If it is not a valid UTF-8 string, the individual code units are copied
-directly. This provides a means of passing an invalid UTF-8 string for testing
-purposes.
+called instead of one of the matching functions. Note that replacement strings
+cannot contain commas, because a comma signifies the end of a modifier. This is
+not thought to be an issue in a test program.
 .P
-If the \fBglobal\fP modifier is set, PCRE2_SUBSTITUTE_GLOBAL is passed to
-\fBpcre2_substitute()\fP. After a successful substitution, the modified string
-is output, preceded by the number of replacements. This may be zero if there
-were no matches. Here is a simple example of a substitution test:
+Unlike subject strings, \fBpcre2test\fP does not process replacement strings
+for escape sequences. In UTF mode, a replacement string is checked to see if it
+is a valid UTF-8 string. If so, it is correctly converted to a UTF string of
+the appropriate code unit width. If it is not a valid UTF-8 string, the
+individual code units are copied directly. This provides a means of passing an
+invalid UTF-8 string for testing purposes.
+.P
+The following modifiers set options (in addition to the normal match options)
+for \fBpcre2_substitute()\fP:
+.sp
+  global                      PCRE2_SUBSTITUTE_GLOBAL
+  substitute_extended         PCRE2_SUBSTITUTE_EXTENDED
+  substitute_overflow_length  PCRE2_SUBSTITUTE_OVERFLOW_LENGTH
+  substitute_unknown_unset    PCRE2_SUBSTITUTE_UNKNOWN_UNSET
+  substitute_unset_empty      PCRE2_SUBSTITUTE_UNSET_EMPTY
+.sp
+.P
+After a successful substitution, the modified string is output, preceded by the
+number of replacements. This may be zero if there were no matches. Here is a
+simple example of a substitution test:
 .sp
  /abc/replace=xxx
      =abc=abc=
@ -976,12 +1238,12 @@ were no matches. Here is a simple example of a substitution test:
      =abc=abc=\e=global
   2: =xxx=xxx=
 .sp
-Subject and replacement strings should be kept relatively short for
-substitution tests, as fixed-size buffers are used. To make it easy to test for
-buffer overflow, if the replacement string starts with a number in square
-brackets, that number is passed to \fBpcre2_substitute()\fP as the size of the
-output buffer, with the replacement string starting at the next character. Here
-is an example that tests the edge case:
+Subject and replacement strings should be kept relatively short (fewer than 256
+characters) for substitution tests, as fixed-size buffers are used. To make it
+easy to test for buffer overflow, if the replacement string starts with a
+number in square brackets, that number is passed to \fBpcre2_substitute()\fP as
+the size of the output buffer, with the replacement string starting at the next
+character. Here is an example that tests the edge case:
 .sp
  /abc/
      123abc123\e=replace=[10]XYZ
@ -989,6 +1251,19 @@ is an example that tests the edge case:
      123abc123\e=replace=[9]XYZ
  Failed: error -47: no more memory
 .sp
+The default action of \fBpcre2_substitute()\fP is to return
+PCRE2_ERROR_NOMEMORY when the output buffer is too small. However, if the
+PCRE2_SUBSTITUTE_OVERFLOW_LENGTH option is set (by using the
+\fBsubstitute_overflow_length\fP modifier), \fBpcre2_substitute()\fP continues
+to go through the motions of matching and substituting, in order to compute the
+size of buffer that is required. When this happens, \fBpcre2test\fP shows the
+required buffer length (which includes space for the trailing zero) as part of
+the error message. For example:
+.sp
+  /abc/substitute_overflow_length
+      123abc123\e=replace=[9]XYZ
+  Failed: error -47: no more memory: 10 code units are needed
+.sp
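The "10 code units are needed" figure in the example above can be reproduced with a rough length calculation. The sketch below does not call pcre2_substitute(); it assumes a literal pattern and a literal replacement and uses strstr() in place of real matching, and required_length() is a hypothetical helper name. It only illustrates that the required size is the length of the substituted result plus one code unit for the trailing zero.

```c
#include <stddef.h>
#include <string.h>

/* Number of code units (bytes in this sketch) needed to hold the result of
   replacing the literal `needle` in `subject` with `replacement`, including
   the trailing zero. If `global` is zero, only the first occurrence is
   replaced, as when the global modifier is absent. */
size_t required_length(const char *subject, const char *needle,
                       const char *replacement, int global)
{
    size_t nlen = strlen(needle);
    size_t rlen = strlen(replacement);
    size_t total = 0;
    const char *p = subject;

    if (nlen > 0) {
        const char *hit;
        while ((hit = strstr(p, needle)) != NULL) {
            total += (size_t)(hit - p) + rlen;  /* text before match + replacement */
            p = hit + nlen;                     /* continue after the match */
            if (!global) break;
        }
    }
    return total + strlen(p) + 1;               /* remaining text + trailing zero */
}
```

For the subject 123abc123 with abc replaced by XYZ, the result is nine characters long, so the function returns 10: a [10] buffer is just large enough and a [9] buffer is one code unit too small.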
 A replacement string is ignored with POSIX and DFA matching. Specifying partial
 matching provokes an error return ("bad option value") from
 \fBpcre2_substitute()\fP.
@ -1059,6 +1334,16 @@ The \fBoffset\fP modifier sets an offset in the subject string at which
 matching starts. Its value is a number of code units, not characters.
 .
 .
+.SS "Setting an offset limit"
+.rs
+.sp
+The \fBoffset_limit\fP modifier sets a limit for unanchored matches. If a match
+cannot be found starting at or before this offset in the subject, a "no match"
+return is given. The data value is a number of code units, not characters. When
+this modifier is used, the \fBuse_offset_limit\fP modifier must have been set
+for the pattern; if not, an error is generated.
+.
+.
 .SS "Setting the size of the output vector"
 .rs
 .sp
@ -1089,6 +1374,17 @@ When testing \fBpcre2_substitute()\fP, this modifier also has the effect of
 passing the replacement string as zero-terminated.
 .
 .
+.SS "Passing a NULL context"
+.rs
+.sp
+Normally, \fBpcre2test\fP passes a context block to \fBpcre2_match()\fP,
+\fBpcre2_dfa_match()\fP or \fBpcre2_jit_match()\fP. If the \fBnull_context\fP
+modifier is set, however, NULL is passed. This is for testing that the matching
+functions behave correctly in this case (they use default values). This
+modifier cannot be used with the \fBfind_limits\fP modifier or when testing the
+substitution function.
+.
+.
 .SH "THE ALTERNATIVE MATCHING FUNCTION"
 .rs
 .sp
@ -1156,7 +1452,7 @@ unset substring is shown as "<unset>", as for the second data line.
 If the strings contain any non-printing characters, they are output as \exhh
 escapes if the value is less than 256 and UTF mode is not set. Otherwise they
 are output as \ex{hh...} escapes. See below for the definition of non-printing
-characters. If the \fB/aftertext\fP modifier is set, the output for substring
+characters. If the \fBaftertext\fP modifier is set, the output for substring
 0 is followed by the rest of the subject string, identified by "0+" like
 this:
 .sp
@ -1286,7 +1582,9 @@ item to be tested. For example:
 This output indicates that callout number 0 occurred for a match attempt
 starting at the fourth character of the subject string, when the pointer was at
 the seventh character, and when the next pattern item was \ed. Just
-one circumflex is output if the start and current positions are the same.
+one circumflex is output if the start and current positions are the same, or if
+the current position precedes the start position, which can happen if the
+callout is in a lookbehind assertion.
 .P
 Callouts numbered 255 are assumed to be automatic callouts, inserted as a
 result of the \fB/auto_callout\fP pattern modifier. In this case, instead of
@ -1352,7 +1650,7 @@ therefore shown as hex escapes.
 .P
 When \fBpcre2test\fP is outputting text that is a matched part of a subject
 string, it behaves in the same way, unless a different locale has been set for
-the pattern (using the \fB/locale\fP modifier). In this case, the
+the pattern (using the \fBlocale\fP modifier). In this case, the
 \fBisprint()\fP function is used to distinguish printing and non-printing
 characters.
 .
@ -1382,11 +1680,15 @@ can be used to test these functions.
 .P
 When a pattern with \fBpush\fP modifier is successfully compiled, it is pushed
 onto a stack of compiled patterns, and \fBpcre2test\fP expects the next line to
-contain a new pattern (or command) instead of a subject line. By this means, a
-number of patterns can be compiled and retained. The \fBpush\fP modifier is
-incompatible with \fBposix\fP, and control modifiers that act at match time are
-ignored (with a message). The \fBjitverify\fP modifier applies only at compile
-time. The command
+contain a new pattern (or command) instead of a subject line. By contrast,
+the \fBpushcopy\fP modifier causes a copy of the compiled pattern to be
+stacked, leaving the original available for immediate matching. By using
+\fBpush\fP and/or \fBpushcopy\fP, a number of patterns can be compiled and
+retained. These modifiers are incompatible with \fBposix\fP, and control
+modifiers that act at match time are ignored (with a message) for the stacked
+patterns. The \fBjitverify\fP modifier applies only at compile time.
+.P
+The command
 .sp
  #save <filename>
 .sp
@ -1406,7 +1708,8 @@ modifier list containing only
 control modifiers
 .\"
 that act after a pattern has been compiled. In particular, \fBhex\fP,
-\fBposix\fP, and \fBpush\fP are not allowed, nor are any
+\fBposix\fP, \fBposix_nosub\fP, \fBpush\fP, and \fBpushcopy\fP are not allowed,
+nor are any
 .\" HTML <a href="#optionmodifiers">
 .\" </a>
 option-setting modifiers.
@ -1426,6 +1729,10 @@ reloads two patterns.
 .sp
 If \fBjitverify\fP is used with #pop, it does not automatically imply
 \fBjit\fP, which is different behaviour from when it is used on a pattern.
+.P
+The #popcopy command is analogous to the \fBpushcopy\fP modifier in that it
+makes current a copy of the topmost stack pattern, leaving the original still
+on the stack.
 .
 .
 .
@ -1451,6 +1758,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 20 May 2015
-Copyright (c) 1997-2015 University of Cambridge.
+Last updated: 28 December 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi
--- a/pcre2/doc/pcre2test.txt
+++ b/pcre2/doc/pcre2test.txt
--- a/pcre2/doc/pcre2unicode.3
+++ b/pcre2/doc/pcre2unicode.3
@ -1,4 +1,4 @@
-.TH PCRE2UNICODE 3 "23 November 2014" "PCRE2 10.00"
+.TH PCRE2UNICODE 3 "03 July 2016" "PCRE2 10.22"
 .SH NAME
 PCRE - Perl-compatible regular expressions (revised API)
 .SH "UNICODE AND UTF SUPPORT"
@ -57,17 +57,21 @@ individual code units.
 In UTF modes, the dot metacharacter matches one UTF character instead of a
 single code unit.
 .P
-The escape sequence \eC can be used to match a single code unit, in a UTF mode,
+The escape sequence \eC can be used to match a single code unit in a UTF mode,
 but its use can lead to some strange effects because it breaks up multi-unit
 characters (see the description of \eC in the
 .\" HREF
 \fBpcre2pattern\fP
 .\"
-documentation). The use of \eC is not supported in the alternative matching
-function \fBpcre2_dfa_match()\fP, nor is it supported in UTF mode by the JIT
-optimization. If JIT optimization is requested for a UTF pattern that contains
-\eC, it will not succeed, and so the matching will be carried out by the normal
-interpretive function.
+documentation).
+.P
+The use of \eC is not supported by the alternative matching function
+\fBpcre2_dfa_match()\fP when in UTF-8 or UTF-16 mode, that is, when a character
+may consist of more than one code unit. The use of \eC in these modes provokes
+a match-time error. Also, the JIT optimization does not support \eC in these
+modes. If JIT optimization is requested for a UTF-8 or UTF-16 pattern that
+contains \eC, it will not succeed, and so when \fBpcre2_match()\fP is called,
+the matching will be carried out by the normal interpretive function.
 .P
 The character escapes \eb, \eB, \ed, \eD, \es, \eS, \ew, and \eW correctly test
 characters of any code value, but, by default, the characters that PCRE2
@ -117,11 +121,21 @@ UTF-16 and UTF-32 strings can indicate their endianness by special code knows
 as a byte-order mark (BOM). The PCRE2 functions do not handle this, expecting
 strings to be in host byte order.
 .P
-The entire string is checked before any other processing takes place. In
-addition to checking the format of the string, there is a check to ensure that
-all code points lie in the range U+0 to U+10FFFF, excluding the surrogate area.
-The so-called "non-character" code points are not excluded because Unicode
-corrigendum #9 makes it clear that they should not be.
+A UTF string is checked before any other processing takes place. In the case of
+\fBpcre2_match()\fP and \fBpcre2_dfa_match()\fP calls with a non-zero starting
+offset, the check is applied only to that part of the subject that could be
+inspected during matching, and there is a check that the starting offset points
+to the first code unit of a character or to the end of the subject. If there
+are no lookbehind assertions in the pattern, the check starts at the starting
+offset. Otherwise, it starts at the length of the longest lookbehind before the
+starting offset, or at the start of the subject if there are not that many
+characters before the starting offset. Note that the sequences \eb and \eB are
+one-character lookbehinds.
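As a rough illustration of the paragraph above, the following sketch computes where a UTF-8 validity check would start for a given starting offset and longest lookbehind, and tests whether an offset lies on a character boundary. The function names are hypothetical and this is not the library's own code; it assumes 8-bit code units and that the starting offset has already passed the boundary test.

```c
#include <stddef.h>
#include <stdint.h>

/* Returns nonzero if `offset` points to the first code unit of a character
   or to the end of the subject (UTF-8 continuation bytes are 10xxxxxx). */
int utf8_offset_is_char_start(const uint8_t *subject, size_t length,
                              size_t offset)
{
    if (offset > length) return 0;
    return offset == length || (subject[offset] & 0xC0) != 0x80;
}

/* Steps back up to `max_lookbehind` characters from `start_offset` and
   returns the code unit offset at which checking would begin. The result is
   0 if there are not that many characters before the starting offset. */
size_t utf8_check_start(const uint8_t *subject, size_t start_offset,
                        size_t max_lookbehind)
{
    size_t pos = start_offset;
    while (max_lookbehind > 0 && pos > 0) {
        pos--;
        /* Skip continuation bytes to reach the first byte of the character. */
        while (pos > 0 && (subject[pos] & 0xC0) == 0x80) pos--;
        max_lookbehind--;
    }
    return pos;
}
```

With no lookbehind the check starts at the starting offset itself; a pattern containing \eb or \eB would be treated as having a lookbehind of at least one character.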
+.P
+In addition to checking the format of the string, there is a check to ensure
+that all code points lie in the range U+0 to U+10FFFF, excluding the surrogate
+area. The so-called "non-character" code points are not excluded because
+Unicode corrigendum #9 makes it clear that they should not be.
 .P
 Characters in the "Surrogate Area" of Unicode are reserved for use by UTF-16,
 where they are used in pairs to encode code points with values greater than
@ -221,9 +235,9 @@ never occur in a valid UTF-8 string.
 .sp
 The following negative error codes are given for invalid UTF-16 strings:
 .sp
-  PCRE_UTF16_ERR1  Missing low surrogate at end of string
-  PCRE_UTF16_ERR2  Invalid low surrogate follows high surrogate
-  PCRE_UTF16_ERR3  Isolated low surrogate
+  PCRE2_ERROR_UTF16_ERR1  Missing low surrogate at end of string
+  PCRE2_ERROR_UTF16_ERR2  Invalid low surrogate follows high surrogate
+  PCRE2_ERROR_UTF16_ERR3  Isolated low surrogate
 .sp
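The three UTF-16 conditions listed above can be told apart with a single pass over the code units. The sketch below is illustrative only: utf16_check() is a hypothetical function, and the SKETCH_UTF16_* values are placeholders, not the numeric values of the PCRE2_ERROR_UTF16_ERR* codes in pcre2.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder result codes for this sketch (not the library's values). */
enum {
    SKETCH_UTF16_OK   = 0,
    SKETCH_UTF16_ERR1 = -1,  /* missing low surrogate at end of string */
    SKETCH_UTF16_ERR2 = -2,  /* invalid low surrogate follows high surrogate */
    SKETCH_UTF16_ERR3 = -3   /* isolated low surrogate */
};

/* Checks `len` host-byte-order code units. On failure, *erroroffset is set
   to the offset of the offending code unit. */
int utf16_check(const uint16_t *s, size_t len, size_t *erroroffset)
{
    for (size_t i = 0; i < len; i++) {
        uint16_t c = s[i];
        if ((c & 0xF800) != 0xD800) continue;      /* not a surrogate */
        if ((c & 0x0400) != 0) {                   /* 0xDC00-0xDFFF first */
            *erroroffset = i;
            return SKETCH_UTF16_ERR3;
        }
        if (i + 1 >= len) {                        /* high surrogate at end */
            *erroroffset = i;
            return SKETCH_UTF16_ERR1;
        }
        if ((s[i + 1] & 0xFC00) != 0xDC00) {       /* next is not a low one */
            *erroroffset = i;
            return SKETCH_UTF16_ERR2;
        }
        i++;                                       /* skip the low surrogate */
    }
    return SKETCH_UTF16_OK;
}
```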
 .
 .
@ -233,8 +247,8 @@ The following negative error codes are given for invalid UTF-16 strings:
 .sp
 The following negative error codes are given for invalid UTF-32 strings:
 .sp
-  PCRE_UTF32_ERR1  Surrogate character (range from 0xd800 to 0xdfff)
-  PCRE_UTF32_ERR2  Code point is greater than 0x10ffff
+  PCRE2_ERROR_UTF32_ERR1  Surrogate character (0xd800 to 0xdfff)
+  PCRE2_ERROR_UTF32_ERR2  Code point is greater than 0x10ffff
 .sp
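The two UTF-32 conditions above amount to a range test on each code unit. The following sketch uses a hypothetical utf32_check() function and placeholder SKETCH_UTF32_* values, which are not the numeric values of the PCRE2_ERROR_UTF32_ERR* codes in pcre2.h.

```c
#include <stddef.h>
#include <stdint.h>

/* Placeholder result codes for this sketch (not the library's values). */
enum {
    SKETCH_UTF32_OK   = 0,
    SKETCH_UTF32_ERR1 = -1,  /* surrogate character (0xd800 to 0xdfff) */
    SKETCH_UTF32_ERR2 = -2   /* code point is greater than 0x10ffff */
};

/* Checks `len` host-byte-order code units. On failure, *erroroffset is set
   to the offset of the offending code unit. */
int utf32_check(const uint32_t *s, size_t len, size_t *erroroffset)
{
    for (size_t i = 0; i < len; i++) {
        uint32_t c = s[i];
        if (c >= 0xD800 && c <= 0xDFFF) {
            *erroroffset = i;
            return SKETCH_UTF32_ERR1;
        }
        if (c > 0x10FFFF) {
            *erroroffset = i;
            return SKETCH_UTF32_ERR2;
        }
    }
    return SKETCH_UTF32_OK;
}
```

Note that "non-character" code points such as 0xFFFE pass this check, which matches the statement elsewhere in this page that they are not excluded.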
 .
 .
@ -252,6 +266,6 @@ Cambridge, England.
 .rs
 .sp
 .nf
-Last updated: 23 November 2014
-Copyright (c) 1997-2014 University of Cambridge.
+Last updated: 03 July 2016
+Copyright (c) 1997-2016 University of Cambridge.
 .fi