Update bundled PCRE2 library to version 10.23

Some manual changes made to the library were lost in this update.
They will be restored in the next commit.
Esa Korhonen
2017-05-29 15:31:42 +03:00
parent 7231563937
commit 36af74cb25
218 changed files with 49218 additions and 26130 deletions

@ -604,6 +604,19 @@ AB.VE the turtle
010203040506
match 1:
a
match 2:
b
match 3:
c
match 4:
d
match 5:
e
Rhubarb
Custard Tart
PUT NEW DATA ABOVE THIS LINE.
=============================

@ -10,7 +10,7 @@ RC=0
7:PATTERN at the start of a line.
8:In the middle of a line, PATTERN appears.
10:This pattern is in lower case.
610:Check up on PATTERN near the end.
623:Check up on PATTERN near the end.
RC=0
---------------------------- Test 4 ------------------------------
4
@ -19,7 +19,7 @@ RC=0
./testdata/grepinput:7:PATTERN at the start of a line.
./testdata/grepinput:8:In the middle of a line, PATTERN appears.
./testdata/grepinput:10:This pattern is in lower case.
./testdata/grepinput:610:Check up on PATTERN near the end.
./testdata/grepinput:623:Check up on PATTERN near the end.
./testdata/grepinputx:3:Here is the pattern again.
./testdata/grepinputx:5:Pattern
./testdata/grepinputx:42:This line contains pattern not on a line by itself.
@ -28,7 +28,7 @@ RC=0
7:PATTERN at the start of a line.
8:In the middle of a line, PATTERN appears.
10:This pattern is in lower case.
610:Check up on PATTERN near the end.
623:Check up on PATTERN near the end.
3:Here is the pattern again.
5:Pattern
42:This line contains pattern not on a line by itself.
@ -324,10 +324,10 @@ RC=0
./testdata/grepinput-9-
./testdata/grepinput:10:This pattern is in lower case.
--
./testdata/grepinput-607-PUT NEW DATA ABOVE THIS LINE.
./testdata/grepinput-608-=============================
./testdata/grepinput-609-
./testdata/grepinput:610:Check up on PATTERN near the end.
./testdata/grepinput-620-PUT NEW DATA ABOVE THIS LINE.
./testdata/grepinput-621-=============================
./testdata/grepinput-622-
./testdata/grepinput:623:Check up on PATTERN near the end.
--
./testdata/grepinputx-1-This is a second file of input for the pcregrep tests.
./testdata/grepinputx-2-
@ -349,8 +349,8 @@ RC=0
./testdata/grepinput-12-Here follows a whole lot of stuff that makes the file over 24K long.
./testdata/grepinput-13-
--
./testdata/grepinput:610:Check up on PATTERN near the end.
./testdata/grepinput-611-This is the last line of this file.
./testdata/grepinput:623:Check up on PATTERN near the end.
./testdata/grepinput-624-This is the last line of this file.
--
./testdata/grepinputx:3:Here is the pattern again.
./testdata/grepinputx-4-
@ -456,8 +456,8 @@ over the lazy dog.
This time it jumps and jumps and jumps.
RC=0
---------------------------- Test 52 ------------------------------
fox jumps
This time it jumps and jumps and jumps.
fox jumps
This time it jumps and jumps and jumps.
RC=0
---------------------------- Test 53 ------------------------------
36972,6
@ -474,9 +474,9 @@ RC=0
597:32,4
RC=0
---------------------------- Test 55 -----------------------------
Here is the pattern again.
That time it was on a line by itself.
This line contains pattern not on a line by itself.
Here is the pattern again.
That time it was on a line by itself.
This line contains pattern not on a line by itself.
RC=0
---------------------------- Test 56 -----------------------------
./testdata/grepinput:456
@ -588,56 +588,57 @@ RC=0
---------------------------- Test 70 -----------------------------
triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt
triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt
triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt
triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt
RC=0
RC=0
---------------------------- Test 71 -----------------------------
01
RC=0
---------------------------- Test 72 -----------------------------
010203040506
010203040506
RC=0
---------------------------- Test 73 -----------------------------
01
01
RC=0
---------------------------- Test 74 -----------------------------
01
02
RC=0
---------------------------- Test 75 -----------------------------
010203040506
010203040506
RC=0
---------------------------- Test 76 -----------------------------
01
02
01
02
RC=0
---------------------------- Test 77 -----------------------------
01
03
RC=0
---------------------------- Test 78 -----------------------------
010203040506
010203040506
RC=0
---------------------------- Test 79 -----------------------------
01
03
01
03
RC=0
---------------------------- Test 80 -----------------------------
01
RC=0
---------------------------- Test 81 -----------------------------
010203040506
010203040506
RC=0
---------------------------- Test 82 -----------------------------
01
01
RC=0
---------------------------- Test 83 -----------------------------
pcre2grep: line 4 of file ./testdata/grepinput3 is too long for the internal buffer
pcre2grep: check the --buffer-size option
pcre2grep: the maximum buffer size is 100
pcre2grep: use the --max-buffer-size option to change it
RC=2
---------------------------- Test 84 -----------------------------
testdata/grepinputv:fox jumps
@ -701,9 +702,9 @@ RC=0
./testdata/grepinput:zerothe.
RC=0
---------------------------- Test 101 ------------------------------
./testdata/grepinput:.|zero|the|.
./testdata/grepinput:zero|a
./testdata/grepinput:.|zero|the|.
./testdata/grepinput:.|zero|the|.
./testdata/grepinput:zero|a
./testdata/grepinput:.|zero|the|.
RC=0
---------------------------- Test 102 -----------------------------
2:
@ -724,21 +725,21 @@ RC=0
14:
RC=0
---------------------------- Test 105 -----------------------------
triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt

triple: t2_txt s1_tag s_txt p_tag p_txt o_tag
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.

triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt

triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt

triple: t5_txt s1_tag s_txt p_tag p_txt o_tag
o_txt

triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt

triple: t7_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t1_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t2_txt s1_tag s_txt p_tag p_txt o_tag
Lorem ipsum dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor incididunt ut labore et dolore magna aliqua. Ut enim ad minim veniam, quis nostrud exercitation ullamco laboris nisi ut aliquip ex ea commodo consequat. Duis aute irure dolor in reprehenderit in voluptate velit esse cillum dolore eu fugiat nulla pariatur. Excepteur sint occaecat cupidatat non proident, sunt in culpa qui officia deserunt mollit anim id est laborum.
triple: t3_txt s2_tag s_txt p_tag p_txt o_tag o_txt
triple: t4_txt s1_tag s_txt p_tag p_txt o_tag o_txt
triple: t5_txt s1_tag s_txt p_tag p_txt o_tag
o_txt
triple: t6_txt s2_tag s_txt p_tag p_txt o_tag o_txt
triple: t7_txt s1_tag s_txt p_tag p_txt o_tag o_txt
RC=0
---------------------------- Test 106 -----------------------------
a
@ -751,3 +752,80 @@ RC=0
2:3,1
2:4,1
RC=0
---------------------------- Test 108 ------------------------------
RC=0
---------------------------- Test 109 -----------------------------
RC=0
---------------------------- Test 110 -----------------------------
match 1:
a
/1/a
match 2:
b
/2/b
match 3:
c
/3/c
match 4:
d
/4/d
match 5:
e
/5/e
RC=0
---------------------------- Test 111 -----------------------------
607:0,12
609:0,12
611:0,12
613:0,12
615:0,12
RC=0
---------------------------- Test 112 -----------------------------
37168,12
37180,12
37192,12
37204,12
37216,12
RC=0
---------------------------- Test 113 -----------------------------
476
RC=0
---------------------------- Test 114 -----------------------------
testdata/grepinput:469
testdata/grepinput3:0
testdata/grepinput8:0
testdata/grepinputv:1
testdata/grepinputx:6
TOTAL:476
RC=0
---------------------------- Test 115 -----------------------------
testdata/grepinput:469
testdata/grepinputv:1
testdata/grepinputx:6
TOTAL:476
RC=0
---------------------------- Test 116 -----------------------------
476
RC=0
---------------------------- Test 117 -----------------------------
469
0
0
1
6
476
RC=0
---------------------------- Test 118 -----------------------------
testdata/grepinput3
testdata/grepinput8
RC=0
---------------------------- Test 119 -----------------------------
123
456
789
---
abc
def
xyz
---
RC=0

pcre2/testdata/grepoutputC

@ -0,0 +1,8 @@
Arg1: [T] [he ] [ ] Arg2: |T| () () (0)
Arg1: [T] [his] [s] Arg2: |T| () () (0)
The quick brown
This time it jumps and jumps and jumps.
Arg1: [qu] [qu]
Arg1: [ t] [ t]
The quick brown
This time it jumps and jumps and jumps.

File diff suppressed because it is too large

@ -1,45 +1,7 @@
# This set of tests is for UTF-8 support and Unicode property support, with
# relevance only for the 8-bit library.
/X(\C{3})/utf
X\x{1234}
/X(\C{4})/utf
X\x{1234}YZ
/X\C*/utf
XYZabcdce
/X\C*?/utf
XYZabcde
/X\C{3,5}/utf
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
X\x{1234}\x{512}YZ
/X\C{3,5}?/utf
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
/a\Cb/utf
aXb
a\nb
/a\C\Cb/utf
a\x{100}b
/ab\Cde/utf
abXde
/a\C\Cb/utf
a\x{100}b
** Failers
a\x{12257}b
# The next 4 patterns have UTF-8 errors
/[�]/utf
@ -47,7 +9,12 @@
/���xxx/utf
/��������/utf
# Now test subjects
/badutf/utf
\= Expect UTF-8 errors
X\xdf
XX\xef
XXX\xef\x80
@ -89,11 +56,13 @@
\xff
/badutf/utf
\= Expect UTF-8 errors
XX\xfb\x80\x80\x80\x80
XX\xfd\x80\x80\x80\x80\x80
XX\xf7\xbf\xbf\xbf
/shortutf/utf
\= Expect UTF-8 errors
XX\xdf\=ph
XX\xef\=ph
XX\xef\x80\=ph
@ -111,6 +80,7 @@
\xfd\x80\x80\x80\x80\=ph
/anything/utf
\= Expect UTF-8 errors
X\xc0\x80
XX\xc1\x8f
XXX\xe0\x9f\x80
@ -119,20 +89,57 @@
\xfc\x83\x80\x80\x80\x80
\xfe\x80\x80\x80\x80\x80
\xff\x80\x80\x80\x80\x80
\xf8\x88\x80\x80\x80
\xf9\x87\x80\x80\x80
\xfc\x84\x80\x80\x80\x80
\xfd\x83\x80\x80\x80\x80
\= Expect no match
\xc3\x8f
\xe0\xaf\x80
\xe1\x80\x80
\xf0\x9f\x80\x80
\xf1\x8f\x80\x80
\xf8\x88\x80\x80\x80
\xf9\x87\x80\x80\x80
\xfc\x84\x80\x80\x80\x80
\xfd\x83\x80\x80\x80\x80
\xf8\x88\x80\x80\x80\=no_utf_check
\xf9\x87\x80\x80\x80\=no_utf_check
\xfc\x84\x80\x80\x80\x80\=no_utf_check
\xfd\x83\x80\x80\x80\x80\=no_utf_check
# Similar tests with offsets
/badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
\= Expect no match
X\xdfabcd\=offset=2
/(?<=x)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
X\xdfabcd\=offset=2
X\xdfabcd\xdf\=offset=3
\= Expect no match
X\xdfabcd\=offset=3
/(?<=xx)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
X\xdfabcd\=offset=2
X\xdfabcd\=offset=3
/(?<=xxxx)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
X\xdfabcd\=offset=1
X\xdfabcd\=offset=2
X\xdfabcd\=offset=3
X\xdfabc\xdf\=offset=6
X\xdfabc\xdf\=offset=7
\= Expect no match
X\xdfabcd\=offset=6
/\x{100}/IB,utf
/\x{1000}/IB,utf
@ -167,27 +174,12 @@
/\x{212ab}/IB,utf
# This one is here not because it's different to Perl, but because the way
# the captured single-byte is displayed. (In Perl it becomes a character, and you
# can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
X\nabc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
*** Failers
a\x{100}b
/[^ab\xC0-\xF0]/IB,utf
\x{f1}
\x{bf}
\x{100}
\x{1000}
*** Failers
\= Expect no match
\x{c0}
\x{f0}
@ -214,7 +206,6 @@
\x{100}
Z\x{100}
\x{100}Z
*** Failers
/[\xff]/IB,utf
>\x{ff}<
@ -236,21 +227,23 @@
# This tests the stricter UTF-8 check according to RFC 3629.
/X/utf
\= Expect UTF-8 errors
\x{d800}
\x{d800}\=no_utf_check
\x{da00}
\x{da00}\=no_utf_check
\x{dfff}
\x{dfff}\=no_utf_check
\x{110000}
\x{110000}\=no_utf_check
\x{2000000}
\x{2000000}\=no_utf_check
\x{7fffffff}
\= Expect no match
\x{d800}\=no_utf_check
\x{da00}\=no_utf_check
\x{dfff}\=no_utf_check
\x{110000}\=no_utf_check
\x{2000000}\=no_utf_check
\x{7fffffff}\=no_utf_check
/(*UTF8)\x{1234}/
abcd\x{1234}pqr
abcd\x{1234}pqr
/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
@ -290,11 +283,14 @@
/a+/utf
a\x{123}aa\=offset=1
a\x{123}aa\=offset=2
a\x{123}aa\=offset=3
a\x{123}aa\=offset=4
a\x{123}aa\=offset=5
\= Expect bad offset value
a\x{123}aa\=offset=6
\= Expect bad UTF-8 offset
a\x{123}aa\=offset=2
\= Expect no match
a\x{123}aa\=offset=5
/\x{1234}+/Ii,utf
@ -395,7 +391,6 @@
Z\x{100}
\x{100}
\x{100}Z
*** Failers
/[z-\x{100}]/IB,utf
@ -421,7 +416,7 @@
\x{104}
\x{105}
\x{109}
** Failers
\= Expect no match
\x{100}
\x{10a}
@ -435,7 +430,7 @@
\x{ff}
\x{100}
\x{101}
** Failers
\= Expect no match
\x{102}
Y
y
@ -445,6 +440,22 @@
/\x{3a3}B/IBi,utf
/abc/utf,replace=�
abc
abc
/(?<=(a)(?-1))x/I,utf
a\x80zx\=offset=3
/[\W\p{Any}]/B
abc
123
/[\W\pL]/B
abc
\= Expect no match
123
/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/utf
/[\s[:^ascii:]]/B,ucp
# End of testinput10

@ -4,11 +4,8 @@
# different, so they have separate output files.
#forbid_utf
#newline_default LF ANY ANYCRLF
/a\Cb/
aXb
a\nb
/[^\x{c4}]/IB
/\x{100}/I
@ -343,7 +340,7 @@
# Non-UTF characters
/\C{2,3}/
/.{2,3}/
\x{400000}\x{400001}\x{400002}\x{400003}
/\x{400000}\x{800000}/IBi
@ -354,4 +351,21 @@
/[\V]/IB
/(*THEN:\[A]{65501})/expand
# We can use pcre2test's utf8_input modifier to create wide pattern characters,
# even though this test is run when UTF is not supported.
/ab������z/utf8_input
ab������z
ab\x{7fffffff}z
/ab�������z/utf8_input
ab�������z
ab\x{ffffffff}z
/ab�Az/utf8_input
ab�Az
ab\x{80000041}z
# End of testinput11

@ -7,49 +7,6 @@
/abc/utf
�]
/X(\C{3})/utf
X\x{11234}Y
X\x{11234}YZ
/X(\C{4})/utf
X\x{11234}YZ
X\x{11234}YZW
/X\C*/utf
XYZabcdce
/X\C*?/utf
XYZabcde
/X\C{3,5}/utf
Xabcdefg
X\x{11234}Y
X\x{11234}YZ
X\x{11234}\x{512}
X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
/X\C{3,5}?/utf
Xabcdefg
X\x{11234}Y
X\x{11234}YZ
X\x{11234}\x{512}YZ
*** Failers
X\x{11234}
/a\Cb/utf
aXb
a\nb
/a\C\Cb/utf
a\x{12257}b
a\x{12257}\x{11234}b
** Failers
a\x{100}b
/ab\Cde/utf
abXde
# Check maximum character size
/\x{ffff}/IB,utf
@ -90,27 +47,12 @@
/\x{212ab}/IB,utf
# This one is here not because it's different to Perl, but because the way
# the captured single-byte is displayed. (In Perl it becomes a character, and you
# can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
X\nabc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
*** Failers
a\x{100}b
/[^ab\xC0-\xF0]/IB,utf
\x{f1}
\x{bf}
\x{100}
\x{1000}
*** Failers
\= Expect no match
\x{c0}
\x{f0}
@ -137,7 +79,6 @@
\x{100}
Z\x{100}
\x{100}Z
*** Failers
/[\xff]/IB,utf
>\x{ff}<
@ -157,18 +98,24 @@
/^[\QĀ\E-\QŐ\E/B,utf
/X/utf
XX\x{d800}
XX\x{d800}\=no_utf_check
XX\x{da00}
XX\x{da00}\=no_utf_check
XX\x{dc00}
XX\x{dc00}\=no_utf_check
XX\x{de00}
XX\x{de00}\=no_utf_check
XX\x{dfff}
XX\x{dfff}\=no_utf_check
\= Expect UTF error
XX\x{d800}
XX\x{da00}
XX\x{dc00}
XX\x{de00}
XX\x{dfff}
XX\x{110000}
XX\x{d800}\x{1234}
\= Expect no match
XX\x{d800}\=offset=3
/(?<=.)X/utf
XX\x{d800}\=offset=3
/(*UTF16)\x{11234}/
abcd\x{11234}pqr
@ -229,7 +176,9 @@
a\x{123}aa\=offset=1
a\x{123}aa\=offset=2
a\x{123}aa\=offset=3
\= Expect no match
a\x{123}aa\=offset=4
\= Expect bad offset error
a\x{123}aa\=offset=5
a\x{123}aa\=offset=6
@ -250,11 +199,16 @@
# Check bad offset
/a/utf
\= Expect bad UTF-16 offset, or no match in 32-bit
\x{10000}\=offset=1
\x{10000}ab\=offset=1
\= Expect 16-bit match, 32-bit no match
\x{10000}ab\=offset=2
\= Expect no match
\x{10000}ab\=offset=3
\= Expect no match in 16-bit, bad offset in 32-bit
\x{10000}ab\=offset=4
\= Expect bad offset
\x{10000}ab\=offset=5
/���/utf
@ -329,9 +283,6 @@
/\o{4200000}/utf
/\C/utf
\x{110000}
/\x{100}*A/IB,utf
A
@ -341,7 +292,6 @@
Z\x{100}
\x{100}
\x{100}Z
*** Failers
/[z-\x{100}]/IB,utf
@ -367,7 +317,7 @@
\x{104}
\x{105}
\x{109}
** Failers
\= Expect no match
\x{100}
\x{10a}
@ -381,7 +331,7 @@
\x{ff}
\x{100}
\x{101}
** Failers
\= Expect no match
\x{102}
Y
y
@ -390,4 +340,24 @@
/\x{3a3}B/IBi,utf
/./utf
\x{110000}
/(*UTF)ab������z/B
/ab������z/utf
/[\W\p{Any}]/B
abc
123
/[\W\pL]/B
abc
\x{100}
\x{308}
\= Expect no match
123
/[\s[:^ascii:]]/B,ucp
# End of testinput12

@ -1,112 +1,37 @@
# These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 16.
# These test special (mostly error) UTF features of DFA matching. They are a
# selection of the more comprehensive tests that are run for non-DFA matching.
# The output is different for the different widths.
# (2) Other tests that must not be run with JIT.
#subject dfa
/(a+)*zz/I
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
aaaaaaaaaaaaaz\=find_limits
/X/utf
XX\x{d800}
XX\x{d800}\=offset=3
XX\x{d800}\=no_utf_check
XX\x{da00}
XX\x{da00}\=no_utf_check
XX\x{dc00}
XX\x{dc00}\=no_utf_check
XX\x{de00}
XX\x{de00}\=no_utf_check
XX\x{dfff}
XX\x{dfff}\=no_utf_check
XX\x{110000}
XX\x{d800}\x{1234}
/badutf/utf
X\xdf
XX\xef
XXX\xef\x80
X\xf7
XX\xf7\x80
XXX\xf7\x80\x80
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
/* this is a C style comment */\=find_limits
/^(?>a)++/
aa\=find_limits
aaaaaaaaa\=find_limits
/(a)(?1)++/
aa\=find_limits
aaaaaaaaa\=find_limits
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits
/(*LIMIT_MATCH=12bc)abc/
/(*LIMIT_MATCH=4294967290)abc/
/(*LIMIT_RECURSION=4294967280)abc/I
/(a+)*zz/
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=match_limit=3000
/(a+)*zz/
aaaaaaaaaaaaaz\=recursion_limit=10
/(*LIMIT_MATCH=3000)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=match_limit=60000
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
aaaaaaaaaaaaaz
/(*LIMIT_MATCH=60000)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=match_limit=3000
/(*LIMIT_RECURSION=10)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=recursion_limit=1000
/(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I
aaaaaaaaaaaaaz
/(*LIMIT_RECURSION=1000)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=recursion_limit=10
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
/((?(R2)a+|(?1)b))/
aaaabcde
/(?(R)a*(?1)|((?R))b)/
aaaabcde
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
abcxyzpqr\=aftertext
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
xyzpqrabcxyzpqr\=aftertext
/a\b/
a.\=allusedtext
a\=allusedtext
/abc\Kxyz/
abcxyz\=allusedtext
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
/shortutf/utf
XX\xdf\=ph
XX\xef\=ph
XX\xef\x80\=ph
\xf7\=ph
\xf7\x80\=ph
# End of testinput14

@ -1,9 +1,168 @@
# This test is run only when JIT support is not available. It checks that an
# attempt to use it has the expected behaviour. It also tests things that
# are different without JIT.
# These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 17.
/abc/I,jit,jitverify
# (2) Other tests that must not be run with JIT.
/a*/I
/(a+)*zz/I
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
aaaaaaaaaaaaaz\=find_limits
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
/* this is a C style comment */\=find_limits
/^(?>a)++/
aa\=find_limits
aaaaaaaaa\=find_limits
/(a)(?1)++/
aa\=find_limits
aaaaaaaaa\=find_limits
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits
/(*LIMIT_MATCH=12bc)abc/
/(*LIMIT_MATCH=4294967290)abc/
/(*LIMIT_RECURSION=4294967280)abc/I
/(a+)*zz/
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=match_limit=3000
/(a+)*zz/
aaaaaaaaaaaaaz\=recursion_limit=10
/(*LIMIT_MATCH=3000)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=match_limit=60000
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
aaaaaaaaaaaaaz
/(*LIMIT_MATCH=60000)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=match_limit=3000
/(*LIMIT_RECURSION=10)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=recursion_limit=1000
/(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I
aaaaaaaaaaaaaz
/(*LIMIT_RECURSION=1000)(a+)*zz/I
aaaaaaaaaaaaaz
aaaaaaaaaaaaaz\=recursion_limit=10
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
/((?(R2)a+|(?1)b))()/
aaaabcde
/(?(R)a*(?1)|((?R))b)/
aaaabcde
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
abcxyzpqr\=aftertext
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
xyzpqrabcxyzpqr\=aftertext
/a\b/
a.\=allusedtext
a\=allusedtext
/abc\Kxyz/
abcxyz\=allusedtext
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
# These tests provoke recursion loops, which give a different error message
# when JIT is used.
/(?R)/I
abcd
/(a|(?R))/I
abcd
defg
/(ab|(bc|(de|(?R))))/I
abcd
fghi
/(ab|(bc|(de|(?1))))/I
abcd
fghi
/x(ab|(bc|(de|(?1)x)x)x)/I
xab123
xfghi
/(?!\w)(?R)/
abcd
=abc
/(?=\w)(?R)/
=abc
abcd
/(?<!\w)(?R)/
abcd
/(?<=\w)(?R)/
abcd
/(a+|(?R)b)/
aaa
bbb
/[^\xff]((?1))/BI
abcd
# These tests don't behave the same with JIT
/\w+(?C1)/BI,no_auto_possess
abc\=callout_fail=1
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
abc\=callout_fail=1
# This test breaks the JIT stack limit
/(|]+){2,2452}/
(|]+){2,2452}
# End of testinput15

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long

@ -1,17 +1,112 @@
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface with UTF/UCP support, which is supported only with the 8-bit
# library. This test should not be run with JIT (which is not available for the
# POSIX interface).
# interface, which is supported only with the 8-bit library. This test should
# not be run with JIT (which is not available for the POSIX interface).
#forbid_utf
#pattern posix
/a\x{1234}b/utf
a\x{1234}b
# Test invalid options
/\w/
+++\x{c2}
/abc/auto_callout
/\w/ucp
+++\x{c2}
# End of testdata/testinput17
/abc/
abc\=find_limits
/abc/
abc\=partial_hard
# Real tests
/abc/
abc
/^abc|def/
abcdef
abcdef\=notbol
/.*((abc)$|(def))/
defabc
defabc\=noteol
/the quick brown fox/
the quick brown fox
\= Expect no match
The Quick Brown Fox
/the quick brown fox/i
the quick brown fox
The Quick Brown Fox
/(*LF)abc.def/
\= Expect no match
abc\ndef
/(*LF)abc$/
abc
abc\n
/(abc)\2/
/(abc\1)/
\= Expect no match
abc
/a*(b+)(z)(z)/
aaaabbbbzzzz
aaaabbbbzzzz\=ovector=0
aaaabbbbzzzz\=ovector=1
aaaabbbbzzzz\=ovector=2
/(*ANY)ab.cd/
ab-cd
ab=cd
\= Expect no match
ab\ncd
/ab.cd/s
ab-cd
ab=cd
ab\ncd
/a(b)c/posix_nosub
abc
/a(?P<name>b)c/posix_nosub
abc
/(a)\1/posix_nosub
zaay
/a?|b?/
abc
\= Expect no match
ddd\=notempty
/\w+A/
CDAAAAB
/\w+A/ungreedy
CDAAAAB
/\Biss\B/I,aftertext
Mississippi
/abc/\
"(?(?C)"
"(?(?C))"
/abcd/substitute_extended
/\[A]{1000000}**/expand,regerror_buffsize=31
/\[A]{1000000}**/expand,regerror_buffsize=32
//posix_nosub
\=offset=70000
/(?=(a\K))/
a
# End of testdata/testinput18

@ -1,62 +1,18 @@
# This set of tests exercises the serialization/deserialization functions in
# the library. It does not use UTF or JIT.
#forbid_utf
# Compile several patterns, push them onto the stack, and then write them
# all to a file.
#pattern push
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
#save testsaved1
# Do it again for some more patterns.
/(*MARK:A)(*SKIP:B)(C|X)/mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
#save testsaved2
#pattern -push
# Reload the patterns, then pop them one by one and check them.
#load testsaved1
#load testsaved2
#pop info
foofoo
barbar
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface with UTF/UCP support, which is supported only with the 8-bit
# library. This test should not be run with JIT (which is not available for the
# POSIX interface).
#pop mark
C
D
#pattern posix
/a\x{1234}b/utf
a\x{1234}b
/\w/
\= Expect no match
+++\x{c2}
/\w/ucp
+++\x{c2}
#pop
AmanaplanacanalPanama
#pop info
metcalfe 33
# Check for an error when different tables are used.
/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1
#pop
xyz
#pop
abc
#pop should give an error
pqr
# End of testinput19
# End of testdata/testinput19

File diff suppressed because it is too large

pcre2/testdata/testinput20

@ -0,0 +1,100 @@
# This set of tests exercises the serialization/deserialization and code copy
# functions in the library. It does not use UTF or JIT.
#forbid_utf
# Compile several patterns, push them onto the stack, and then write them
# all to a file.
#pattern push
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
#save testsaved1
# Do it again for some more patterns.
/(*MARK:A)(*SKIP:B)(C|X)/mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
#save testsaved2
#pattern -push
# Reload the patterns, then pop them one by one and check them.
#load testsaved1
#load testsaved2
#pop info
foofoo
barbar
#pop mark
C
\= Expect no match
D
#pop
AmanaplanacanalPanama
#pop info
metcalfe 33
# Check for an error when different tables are used.
/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1
#pop
xyz
#pop
abc
#pop should give an error
pqr
/abcd/pushcopy
abcd
#pop
abcd
#pop should give an error
/abcd/push
#popcopy
abcd
#pop
abcd
/abcd/push
#save testsaved1
#pop should give an error
#load testsaved1
#popcopy
abcd
#pop
abcd
#pop should give an error
/abcd/pushtablescopy
abcd
#popcopy
abcd
#pop
abcd
# End of testinput20

pcre2/testdata/testinput21

@ -0,0 +1,16 @@
# These are tests of \C that do not involve UTF. They are not run when \C is
# disabled by compiling with --enable-never-backslash-C.
/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/Bx
/\D+\C \d+\C \S+\C \s+\C \W+\C \w+\C .+\C \R+\C \H+\C \h+\C \V+\C \v+\C a+\C \n+\C \C+\C/Bx
/ab\Cde/never_backslash_c
/ab\Cde/info
abXde
/(?<=ab\Cde)X/
abZdeX
# End of testinput21

pcre2/testdata/testinput22

@ -0,0 +1,97 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
ab!deXYZ
# Autopossessification tests
/\C+\X \X+\C/Bx
/\C+\X \X+\C/Bx,utf
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
/X(\C{3})/utf
X\x{1234}
X\x{11234}Y
X\x{11234}YZ
/X(\C{4})/utf
X\x{1234}YZ
X\x{11234}YZ
X\x{11234}YZW
/X\C*/utf
XYZabcdce
/X\C*?/utf
XYZabcde
/X\C{3,5}/utf
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
X\x{1234}\x{512}YZ
X\x{11234}Y
X\x{11234}YZ
X\x{11234}\x{512}
X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
/X\C{3,5}?/utf
Xabcdefg
X\x{1234}
X\x{1234}YZ
X\x{1234}\x{512}
X\x{11234}Y
X\x{11234}YZ
X\x{11234}\x{512}YZ
X\x{11234}
/a\Cb/utf
aXb
a\nb
a\x{100}b
/a\C\Cb/utf
a\x{100}b
a\x{12257}b
a\x{12257}\x{11234}b
/ab\Cde/utf
abXde
# This one is here not because it's different to Perl, but because the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
X\nabc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b

pcre2/testdata/testinput23

@ -0,0 +1,7 @@
# This test is run when PCRE2 has been built with --enable-never-backslash-C,
# which disables the use of \C. All we can do is check that it gives the
# correct error message.
/a\Cb/
# End of testinput23

@ -8,35 +8,35 @@
#forbid_utf
/^[\w]+/
*** Failers
\= Expect no match
�cole
/^[\w]+/locale=fr_FR
�cole
/^[\w]+/
*** Failers
\= Expect no match
�cole
/^[\W]+/
�cole
/^[\W]+/locale=fr_FR
*** Failers
\= Expect no match
�cole
/[\b]/
\b
*** Failers
\= Expect no match
a
/[\b]/locale=fr_FR
\b
*** Failers
\= Expect no match
a
/^\w+/
*** Failers
\= Expect no match
�cole
/^\w+/locale=fr_FR
@ -46,12 +46,12 @@
�cole
/(.+)\b(.+)/locale=fr_FR
*** Failers
\= Expect no match
�cole
/�cole/i
�cole
*** Failers
\= Expect no match
�cole
/�cole/i,locale=fr_FR
@ -72,7 +72,7 @@
/^[\xc8-\xc9]/
�cole
*** Failers
\= Expect no match
�cole
/\W+/

File diff suppressed because it is too large

@ -3,6 +3,8 @@
# results in 8-bit, 16-bit, and 32-bit modes are excluded (see tests 10 and
# 12).
#newline_default lf any anycrlf
# PCRE2 and Perl disagree about the characteristics of certain Unicode
# characters. For example, 061C is considered by Perl to be Arabic, though
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
@ -11,11 +13,11 @@
# test 4.
/^[\p{Arabic}]/utf
** Failers
\= Expect no match
\x{061c}
/^[[:graph:]]+$/utf,ucp
** Failers
\= Expect no match
\x{61c}
\x{2066}
\x{2067}
@ -23,7 +25,7 @@
\x{2069}
/^[[:print:]]+$/utf,ucp
** Failers
\= Expect no match
\x{61c}
\x{2066}
\x{2067}
@ -54,6 +56,7 @@
A\x{85}\x{2005}Z
/^[[:graph:]]+$/utf,ucp
\= Expect no match
\x{180e}
/^[[:print:]]+$/utf,ucp
@ -63,6 +66,7 @@
\x{09}\x{0a}\x{1D}\x{20}\x{85}\x{a0}\x{61c}\x{1680}\x{180e}
/^[[:^print:]]+$/utf,ucp
\= Expect no match
\x{180e}
# End of U+180E tests.
@ -109,12 +113,9 @@
/.{3,5}?/IB,utf
\x{212ab}\x{212ab}\x{212ab}\x{861}
/(?<=\C)X/utf
Should produce an error diagnostic
/^[ab]/IB,utf
bar
*** Failers
\= Expect no match
c
\x{ff}
\x{100}
@ -123,7 +124,7 @@
c
\x{ff}
\x{100}
*** Failers
\= Expect no match
aaa
/\x{100}*(\d+|"(?1)")/utf
@ -133,7 +134,7 @@
"\x{100}1234"
\x{100}\x{100}12ab
\x{100}\x{100}"12"
*** Failers
\= Expect no match
\x{100}\x{100}abcd
/\x{100}*/IB,utf
@ -147,7 +148,7 @@
/[Ā-Ą]/utf
\x{100}
\x{104}
*** Failers
\= Expect no match
\x{105}
\x{ff}
@ -217,7 +218,7 @@
a\x{85}b
a\x{2028}b
a\x{2029}b
** Failers
\= Expect no match
a\n\rb
/^a\R*b/bsr=unicode,utf
@ -240,7 +241,7 @@
a\x{85}b
a\n\rb
a\n\r\x{85}\x0cb
** Failers
\= Expect no match
ab
/^a\R{1,3}b/bsr=unicode,utf
@ -251,34 +252,34 @@
a\r\n\r\n\r\nb
a\n\r\n\rb
a\n\n\r\nb
** Failers
\= Expect no match
a\n\n\n\rb
a\r
/\H\h\V\v/utf
X X\x0a
X\x09X\x0b
** Failers
\= Expect no match
\x{a0} X\x0a
/\H*\h+\V?\v{3,4}/utf
\x09\x20\x{a0}X\x0a\x0b\x0c\x0d\x0a
\x09\x20\x{a0}\x0a\x0b\x0c\x0d\x0a
\x09\x20\x{a0}\x0a\x0b\x0c
** Failers
\= Expect no match
\x09\x20\x{a0}\x0a\x0b
/\H\h\V\v/utf
\x{3001}\x{3000}\x{2030}\x{2028}
X\x{180e}X\x{85}
** Failers
\= Expect no match
\x{2009} X\x0a
/\H*\h+\V?\v{3,4}/utf
\x{1680}\x{180e}\x{2007}X\x{2028}\x{2029}\x0c\x0d\x0a
\x09\x{205f}\x{a0}\x0a\x{2029}\x0c\x{2028}\x0a
\x09\x20\x{202f}\x0a\x0b\x0c
** Failers
\= Expect no match
\x09\x{200a}\x{a0}\x{2028}\x0b
/[\h]/B,utf
@ -300,7 +301,7 @@
a\rb
a\nb
a\r\nb
** Failers
\= Expect no match
a\x{85}b
a\x0bb
@ -315,7 +316,7 @@
a\rb
a\nb
a\r\nb
** Failers
\= Expect no match
a\x{85}b
a\x0bb
@ -325,11 +326,10 @@
a\r\nb
a\x{85}b
a\x0bb
** Failers
/.*a.*=.b.*/utf,newline=any
QQQ\x{2029}ABCaXYZ=!bPQR
** Failers
\= Expect no match
a\x{2029}b
\x61\xe2\x80\xa9\x62
@ -338,13 +338,13 @@
/a[^]b/utf,alt_bsux,allow_empty_class,match_unset_backref
a\x{1234}b
a\nb
** Failers
\= Expect no match
ab
/a[^]+b/utf,alt_bsux,allow_empty_class,match_unset_backref
aXb
a\nX\nX\x{1234}b
** Failers
\= Expect no match
ab
/(\x{de})\1/
@ -396,6 +396,7 @@
X\x{123}\x{123}\x{123}\x{123}\=ps
/X\x{123}{2,4}b/utf
\= Expect no match
Xx\=ps
X\x{123}x\=ps
X\x{123}\x{123}x\=ps
@ -403,6 +404,7 @@
X\x{123}\x{123}\x{123}\x{123}x\=ps
/X\x{123}{2,4}?b/utf
\= Expect no match
Xx\=ps
X\x{123}x\=ps
X\x{123}\x{123}x\=ps
@ -410,6 +412,7 @@
X\x{123}\x{123}\x{123}\x{123}x\=ps
/X\x{123}{2,4}+b/utf
\= Expect no match
Xx\=ps
X\x{123}x\=ps
X\x{123}\x{123}x\=ps
@ -804,6 +807,7 @@
/[^\x{100}]*[^\x{10000}]+[^\x{10ffff}]??[^\x{8000}]{4,}[^\x{7fff}]{2,9}?[^\x{fffff}]{5,6}+/Bi,utf
/(?<=\x{1234}\x{1234})\bxy/I,utf
/(?<!^)ETA/utf
\= Expect no match
ETA
@ -834,7 +838,7 @@
/[\p{Nd}+-]+/IB,utf
1234
12-34
12-34
12+\x{661}-34
\= Expect no match
abcd
@ -901,7 +905,7 @@
\x{2068}
\x{2069}
/^\p{Cs}/utf
/^\p{Cs}/utf
\x{dfff}\=no_utf_check
\= Expect no match
\x{09f}
@ -918,7 +922,7 @@
\x{230a}
/^\p{Sc}+/utf
$\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
$\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
\x{9f2}
\= Expect no match
X
@ -928,7 +932,7 @@
\ \
\x{a0}
\x{1680}
\x{2000}
\x{2000}
\x{2001}
\= Expect no match
\x{2028}
@ -937,31 +941,31 @@
# These are here because Perl has problems with the negative versions of the
# properties and has changed how it behaves for caseless matching.
/\p{^Lu}/i,utf
/\p{^Lu}/i,utf
1234
\= Expect no match
ABC
/\P{Lu}/i,utf
/\P{Lu}/i,utf
1234
\= Expect no match
ABC
/\p{Ll}/i,utf
a
a
Az
\= Expect no match
ABC
/\p{Lu}/i,utf
A
A
a\x{10a0}B
\= Expect no match
a
\x{1d00}
/\p{Lu}/i,utf
A
A
aZ
\= Expect no match
abc
@ -1018,12 +1022,12 @@
ABCD
1234
\x{6ca}
\x{a6c}
\x{a6c}
\x{10a7}
\= Expect no match
_ABC
/^\p{Xan}+/utf
/^\p{Xan}+/utf
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
\= Expect no match
_ABC
@ -1044,18 +1048,18 @@
ABCD1234_
1234abcd_
\x{6ca}
\x{a6c}
\x{a6c}
\x{10a7}
\= Expect no match
_ABC
/^[\p{Xan}]+/utf
/^[\p{Xan}]+/utf
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
\= Expect no match
_ABC
/^>\p{Xsp}/utf
>\x{1680}\x{2028}\x{0b}
>\x{1680}\x{2028}\x{0b}
>\x{a0}
\= Expect no match
\x{0b}
@ -1082,7 +1086,7 @@
> \x{09}\x{0a}\x{0c}\x{0d}\x{a0}\x{1680}\x{2028}\x{0b}
/^>\p{Xps}/utf
>\x{1680}\x{2028}\x{0b}
>\x{1680}\x{2028}\x{0b}
>\x{a0}
\= Expect no match
\x{0b}
@ -1113,7 +1117,7 @@
1234
\x{6ca}
\x{a6c}
\x{10a7}
\x{10a7}
_ABC
\= Expect no match
[]
@ -1138,7 +1142,7 @@
1234abcd_
\x{6ca}
\x{a6c}
\x{10a7}
\x{10a7}
_ABC
\= Expect no match
[]
@ -1232,7 +1236,7 @@
# Without PCRE_UCP, non-ASCII always fail, even if < 256
/\b...\B/utf
/\b...\B/utf
abc_
\= Expect no match
\x{37e}abc\x{376}
@ -1288,9 +1292,11 @@
/A+\p{N}A+\dB+\p{N}*B+\d*/B,ucp
# These behaved oddly in Perl, so they are kept in this test
/(\x{23a}\x{23a}\x{23a})?\1/i,utf
\= Expect no match
\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}
/(ȺȺȺ)?\1/i,utf
\= Expect no match
ȺȺȺⱥⱥ
@ -1300,9 +1306,11 @@
/(ȺȺȺ)?\1/i,utf
ȺȺȺⱥⱥⱥ
/(\x{23a}\x{23a}\x{23a})\1/i,utf
\= Expect no match
\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}
/(ȺȺȺ)\1/i,utf
\= Expect no match
ȺȺȺⱥⱥ
@ -1328,19 +1336,19 @@
# These scripts weren't yet in Perl when I added Unicode 6.0.0 to PCRE
/^[\p{Batak}]/utf
\x{1bc0}
\x{1bc0}
\x{1bff}
\= Expect no match
\x{1bf4}
/^[\p{Brahmi}]/utf
\x{11000}
\x{11000}
\x{1106f}
\= Expect no match
\x{1104e}
/^[\p{Mandaic}]/utf
\x{840}
\x{840}
\x{85e}
\= Expect no match
\x{85c}
@ -1355,11 +1363,9 @@
/^\X/utf
́réo
/^a\X41z/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/^a\X41z/alt_bsux,allow_empty_class,match_unset_backref,dupnames
aX41z
\= Expect no match
aAz
aAz
/\X/
@ -1453,7 +1459,7 @@
/\x{3a3}+./i,utf,aftertext
\x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2}
/\x{3a3}++./i,utf,aftertext
\= Expect no match
\x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2}
@ -1463,19 +1469,24 @@
/[^\x{3a3}]*\x{3c2}/Bi,utf
/[^a]*\x{3c2}/Bi,utf
/ist/Bi,utf
\= Expect no match
ikt
/is+t/i,utf
iSs\x{17f}t
\= Expect no match
ikt
/is+?t/i,utf
\= Expect no match
ikt
/is?t/i,utf
\= Expect no match
ikt
/is{2}t/i,utf
\= Expect no match
iskt
@ -1485,52 +1496,52 @@
/^\p{Xuc}/utf
$abc
@abc
`abc
`abc
\x{1234}abc
\= Expect no match
abc
/^\p{Xuc}+/utf
/^\p{Xuc}+/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^\p{Xuc}+?/utf
/^\p{Xuc}+?/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^\p{Xuc}+?\*/utf
/^\p{Xuc}+?\*/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^\p{Xuc}++/utf
/^\p{Xuc}++/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^\p{Xuc}{3,5}/utf
/^\p{Xuc}{3,5}/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^\p{Xuc}{3,5}?/utf
/^\p{Xuc}{3,5}?/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^[\p{Xuc}]/utf
/^[\p{Xuc}]/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^[\p{Xuc}]+/utf
/^[\p{Xuc}]+/utf
$@`\x{a0}\x{1234}\x{e000}**
\= Expect no match
\x{9f}
/^\P{Xuc}/utf
/^\P{Xuc}/utf
abc
\= Expect no match
$abc
@ -1538,7 +1549,7 @@
`abc
\x{1234}abc
/^[\P{Xuc}]/utf
/^[\P{Xuc}]/utf
abc
\= Expect no match
$abc
@ -1603,13 +1614,13 @@
/[\p{N}]?+/B,no_auto_possess
/[\p{L}ab]{2,3}+/B,no_auto_possess
/[\p{L}ab]{2,3}+/B,no_auto_possess
/\D+\X \d+\X \S+\X \s+\X \W+\X \w+\X \R+\X \H+\X \h+\X \V+\X \v+\X a+\X \n+\X .+\X/Bx
/.+\X/Bsx
/\X+$/Bmx
/\X+$/Bmx
/\X+\D \X+\d \X+\S \X+\s \X+\W \X+\w \X+. \X+\R \X+\H \X+\h \X+\V \X+\v \X+\X \X+\Z \X+\z \X+$/Bx
@ -1634,9 +1645,7 @@
/ábc/utf,replace=XሴZ
123ábc123
/(?<=abc)(|def)/g,utf,replace=<$0>
123abcáyzabcdef789abcሴqr
/(?<=abc)(|def)/g,utf,replace=<$0>
123abcáyzabcdef789abcሴqr
@ -1651,4 +1660,107 @@
"\xa\xf<(.\pZ*\P{Xwd}+^\xa8\3'3yq.::?(?J:()\xd1+!~:3'(8?:)':(?'d'(?'d'^u]!.+.+\\A\Ah(n+?9){7}+\K;(?'X'u'(?'c'(?'z'(?<y>\xb::\xf0'|\xd3(\xae?'w(z\x8?P>l)\x8?P>a)'\H\R\xd1+!!~:3'(?:h$N{26875}\W+?\\=D{2}\x89(?i:Uy0\N({2\xa(\v\x85*){y*\A(()\p{L}+?\P{^Xan}'+?\xff\+pS\?|).{;y*\A(()\p{L}+?\8}\d?1(|)(/1){7}.+[Lp{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(\xbf(R))\x8a\X*?\x8a\xb\xd1^9\3*+(\xc1,\k'R'\xb4)\xcc(z\z(?J)(?'X'\x1b(\xb\xd1^9\?'3*+P{^Xan}+?\xff\+(\xc1.]k+\xb'Pm'\xb4)\xcc4f\xa7'\xd1V(?i:U,{2,2})'(?'X'))?-%--\x95$9*\4'|\xd1(\x9c''%\x94$9)#(?'R')3\x7?('P\xed7'\xa8\xb1^u\xeaw\1\0\0\(|(?1){7}.+[\p{Me}].\s\xdcC*^\x14?(?(<y>))(?<!^)$C((;*?(R*?))+(?(R)\x8a\X*?\x8a\xb\xd1^9\3*+|(\xc1,\k'R'\xb4)\xcc! z)\z(?JJ)(?'X';(\xb\xd1^9\?'3*+(\xc1.]k+\xb'Pm'\xb4))':(?'d')(?'RD'(d')|)|$)'|(?<x>\g{d});\g{x}\x11\g{d}\x81\|$((?'X'\'X'(?'W''\x92()'9'\x83*))\xba*\!?^ <){)':;\xcc4'\xd1'(?'X'28))?-%--\x95$9*\4'|\xd1((''e\x94*$9:)*#(?'R')3)\x7?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+0!~:(?)'d'E:yD!\s(?'R'\x1e;\x10:U))|'\x9g!\xb0*){)\\x16:;()\x1e\x10\x87*:(?<y>)\xd1+!~:(?)'}'\d'E:yD!\s(?'R'\x1e;\x10:U))|'))|)g!\xb0*R+9{29+)#(?'P'})*?pS\{3,}\x85,{0,}l{*UTF)(\xe{7}){3722,{9,}d{2,?|))|{)\(A?&d}}{\xa,}2}){3,}7,l{)22}(,}l:7{2,4}}29\x19+)#?'P'})*v?))\x5"
/$(&.+[\p{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(?(R)){0,6}?|){12\x8a\X*?\x8a\x0b\xd1^9\3*+(\xc1,\k'P'\xb4)\xcc(z\z(?JJ)(?'X'8};(\x0b\xd1^9\?'3*+(\xc1.]k+\x0b'Pm'\xb4\xcc4'\xd1'(?'X'))?-%--\x95$9*\4'|\xd1(''%\x95*$9)#(?'R')3\x07?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/
"(*UTF)(*UCP)(.UTF).+X(\V+;\^(\D|)!999}(?(?C{7(?C')\H*\S*/^\x5\xa\\xd3\x85n?(;\D*(?m).[^mH+((*UCP)(*U:F)})(?!^)(?'"
/[\pS#moq]/
=
/(*:a\x{12345}b\t(d\)c)xxx/utf,alt_verbnames,mark
cxxxz
/abcd/utf,replace=x\x{824}y\o{3333}z(\Q12\$34$$\x34\E5$$),substitute_extended
abcd
/a(\x{e0}\x{101})(\x{c0}\x{102})/utf,replace=a\u$1\U$1\E$1\l$2\L$2\Eab\U\x{e0}\x{101}\L\x{d0}\x{160}\EDone,substitute_extended
a\x{e0}\x{101}\x{c0}\x{102}
/((?<digit>\d)|(?<letter>\p{L}))/g,substitute_extended,replace=<${digit:+digit; :not digit; }${letter:+letter:not a letter}>
ab12cde
/(*UCP)(*UTF)[[:>:]]X/B
/abc/utf,replace=xyz
abc\=zero_terminate
/a[[:punct:]b]/ucp,bincode
/a[[:punct:]b]/utf,ucp,bincode
/a[b[:punct:]]/utf,ucp,bincode
/[[:^ascii:]]/utf,ucp,bincode
/[[:^ascii:]\w]/utf,ucp,bincode
/[\w[:^ascii:]]/utf,ucp,bincode
/[^[:ascii:]\W]/utf,ucp,bincode
\x{de}
\x{200}
\= Expect no match
\x{300}
\x{37e}
/[[:^ascii:]a]/utf,ucp,bincode
/L(?#(|++<!(2)?/B,utf,no_auto_possess,auto_callout
/L(?#(|++<!(2)?/B,utf,ucp,auto_callout
/(*UTF)C\x09((?<!'(?x)!*H? #\xcc\x9a[^$]/
/[\D]/utf
\x{1d7cf}
/[\D\P{Nd}]/utf
\x{1d7cf}
/[^\D]/utf
a9b
\= Expect no match
\x{1d7cf}
/[^\D\P{Nd}]/utf
a9b
\x{1d7cf}
\= Expect no match
\x{10000}
# Hex uses pattern length, not zero-terminated. This tests for overrunning
# the given length of a pattern.
/'(*UTF)'/hex
/'#('/hex,extended,utf
/a(?<=A\XB)/utf
/ab(?<=A\RB)/utf
/../utf,auto_callout
\n\x{123}\x{123}\x{123}\x{123}
# This tests processing wide characters in extended mode.
/XȀ/x,utf
# These three test a bug fix that was not clearing up after a locale setting
# when the test or a subsequent one matched a wide character.
//locale=C
/[\P{Yi}]/utf
\x{2f000}
/[\P{Yi}]/utf,locale=C
\x{2f000}
/^(?<!(?=􃡜))/B,utf
# Horizontal and vertical space lists ignore caseless
/[\HH]/Bi,utf
/[^\HH]/Bi,utf

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,8 +1,11 @@
# These are a few representative patterns whose lengths and offsets are to be
# shown when the link size is 2. This is just a doublecheck test to ensure the
# sizes don't go horribly wrong when something is changed. The pattern contents
# are all themselves checked in other tests. Unicode, including property
# support, is required for these tests.
# There are two sorts of patterns in this test. A number of them are
# representative patterns whose lengths and offsets are checked. This is just a
# doublecheck test to ensure the sizes don't go horribly wrong when something
# is changed. The operation of these patterns is checked in other tests.
#
# This file also contains tests whose output varies with code unit size and/or
# link size. Unicode support is required for these tests. There are separate
# output files for each code unit size and link size.
#pattern fullbincode,memory
@ -67,7 +70,7 @@
/\xff/utf
/\x{0041}\x{2262}\x{0391}\x{002e}/I,utf
/\x{D55c}\x{ad6d}\x{C5B4}/I,utf
/\x{65e5}\x{672c}\x{8a9e}/I,utf
@ -150,10 +153,33 @@
# Check the absolute limit on nesting (?| etc. This varies with code unit
# width because the workspace is a different number of bytes. It will fail
# in 8-bit and 16-bit but not in 32-bit.
# with link size 2 in 8-bit and 16-bit but not in 32-bit.
/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
/parens_nest_limit=1000,-fullbincode
# Use "expand" to create some very long patterns with nested parentheses, in
# order to test workspace overflow. Again, this varies with code unit width,
# and even when it fails in two modes, the error offset differs. It also varies
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
/(?(1)(?1)){8,}+()/debug
abcd
/(?(1)|a(?1)b){2,}+()/debug
abcde
/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
# End of testinput8


@ -2,11 +2,10 @@
# UTF-8 or Unicode property support. */
#forbid_utf
#newline_default lf any anycrlf
/a\Cb/
aXb
a\nb
** Failers (too big char)
/ab/
\= Expect error message (too big char) and no match
A\x{123}B
A\o{443}B
@ -240,9 +239,15 @@
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark,alt_verbnames
XX
/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames
/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
@ -251,4 +256,8 @@
/[^\s]*\s* [^\W]+\W+ [^\d]*?\d0 [^\d\w]{4,6}?\w*A/B
/(*MARK:a\x{100}b)z/alt_verbnames
/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/
# End of testinput9

File diff suppressed because it is too large


@ -1,70 +1,10 @@
# This set of tests is for UTF-8 support and Unicode property support, with
# relevance only for the 8-bit library.
/X(\C{3})/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
/X(\C{4})/utf
X\x{1234}YZ
0: X\x{1234}Y
1: \x{1234}Y
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
0: X\x{1234}\x{512}
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}
X\x{1234}\x{512}
0: X\x{1234}
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
/a\C\Cb/utf
a\x{100}b
0: a\x{100}b
/ab\Cde/utf
abXde
0: abXde
/a\C\Cb/utf
a\x{100}b
0: a\x{100}b
** Failers
No match
a\x{12257}b
No match
# The next 4 patterns have UTF-8 errors
/[�]/utf
Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
Failed: error -8 at offset 1: UTF-8 error: byte 2 top bits not 0x80
/�/utf
Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end
@ -72,7 +12,13 @@ Failed: error -3 at offset 0: UTF-8 error: 1 byte missing at end
/���xxx/utf
Failed: error -8 at offset 0: UTF-8 error: byte 2 top bits not 0x80
/��������/utf
Failed: error -22 at offset 2: UTF-8 error: isolated byte with 0x80 bit set
# Now test subjects
/badutf/utf
\= Expect UTF-8 errors
X\xdf
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1
XX\xef
@ -146,13 +92,14 @@ Failed: error -20: UTF-8 error: overlong 5-byte sequence at offset 0
\xfc\x80\x80\x80\x80\x8f
Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0
\x80
Failed: error -22: UTF-8 error: isolated 0x80 byte at offset 0
Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 0
\xfe
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
\xff
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
/badutf/utf
\= Expect UTF-8 errors
XX\xfb\x80\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 2
XX\xfd\x80\x80\x80\x80\x80
@ -161,6 +108,7 @@ Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at of
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2
/shortutf/utf
\= Expect UTF-8 errors
XX\xdf\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
XX\xef\=ph
@ -193,6 +141,7 @@ Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 0
/anything/utf
\= Expect UTF-8 errors
X\xc0\x80
Failed: error -17: UTF-8 error: overlong 2-byte sequence at offset 1
XX\xc1\x8f
@ -209,6 +158,15 @@ Failed: error -21: UTF-8 error: overlong 6-byte sequence at offset 0
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
\xff\x80\x80\x80\x80\x80
Failed: error -23: UTF-8 error: illegal byte (0xfe or 0xff) at offset 0
\xf8\x88\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
\xf9\x87\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
\xfc\x84\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
\xfd\x83\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
\= Expect no match
\xc3\x8f
No match
\xe0\xaf\x80
@ -219,14 +177,6 @@ No match
No match
\xf1\x8f\x80\x80
No match
\xf8\x88\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
\xf9\x87\x80\x80\x80
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
\xfc\x84\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
\xfd\x83\x80\x80\x80\x80
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
\xf8\x88\x80\x80\x80\=no_utf_check
No match
\xf9\x87\x80\x80\x80\=no_utf_check
@ -235,7 +185,62 @@ No match
No match
\xfd\x83\x80\x80\x80\x80\=no_utf_check
No match
# Similar tests with offsets
/badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
\= Expect no match
X\xdfabcd\=offset=2
No match
/(?<=x)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=2
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\xdf\=offset=3
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 6
\= Expect no match
X\xdfabcd\=offset=3
No match
/(?<=xx)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=2
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=3
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
/(?<=xxxx)badutf/utf
\= Expect UTF-8 errors
X\xdfabcd
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=1
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=2
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabcd\=offset=3
Failed: error -8: UTF-8 error: byte 2 top bits not 0x80 at offset 1
X\xdfabc\xdf\=offset=6
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 5
X\xdfabc\xdf\=offset=7
Failed: error -33: bad offset value
\= Expect no match
X\xdfabcd\=offset=6
No match
/\x{100}/IB,utf
------------------------------------------------------------------
Bra
@ -448,29 +453,6 @@ First code unit = \xf0
Last code unit = \xab
Subject length lower bound = 1
# This one is here not because it's different to Perl, but because of the way
# the captured single byte is displayed. (In Perl it becomes a character, and you
# can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{e1}
2: \x{88}\x{b4}
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
*** Failers
No match
a\x{100}b
No match
/[^ab\xC0-\xF0]/IB,utf
------------------------------------------------------------------
Bra
@ -499,8 +481,7 @@ Subject length lower bound = 1
0: \x{100}
\x{1000}
0: \x{1000}
*** Failers
0: *
\= Expect no match
\x{c0}
No match
\x{f0}
@ -659,8 +640,6 @@ Subject length lower bound = 1
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[\xff]/IB,utf
------------------------------------------------------------------
@ -750,33 +729,35 @@ Failed: error 106 at offset 15: missing terminating ] for character class
# This tests the stricter UTF-8 check according to RFC 3629.
/X/utf
\= Expect UTF-8 errors
\x{d800}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0
\x{d800}\=no_utf_check
No match
\x{da00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0
\x{da00}\=no_utf_check
No match
\x{dfff}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 0
\x{dfff}\=no_utf_check
No match
\x{110000}
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 0
\x{110000}\=no_utf_check
No match
\x{2000000}
Failed: error -13: UTF-8 error: 5-byte character is not allowed (RFC 3629) at offset 0
\x{2000000}\=no_utf_check
No match
\x{7fffffff}
Failed: error -14: UTF-8 error: 6-byte character is not allowed (RFC 3629) at offset 0
\= Expect no match
\x{d800}\=no_utf_check
No match
\x{da00}\=no_utf_check
No match
\x{dfff}\=no_utf_check
No match
\x{110000}\=no_utf_check
No match
\x{2000000}\=no_utf_check
No match
\x{7fffffff}\=no_utf_check
No match
/(*UTF8)\x{1234}/
abcd\x{1234}pqr
abcd\x{1234}pqr
0: \x{1234}
/(*CRLF)(*UTF)(*BSR_UNICODE)a\Rb/I
@ -887,16 +868,19 @@ Subject length lower bound = 3
/a+/utf
a\x{123}aa\=offset=1
0: aa
a\x{123}aa\=offset=2
Error -36 (bad UTF-8 offset)
a\x{123}aa\=offset=3
0: aa
a\x{123}aa\=offset=4
0: a
a\x{123}aa\=offset=5
No match
\= Expect bad offset value
a\x{123}aa\=offset=6
Failed: error -33: bad offset value
\= Expect bad UTF-8 offset
a\x{123}aa\=offset=2
Error -36 (bad UTF-8 offset)
\= Expect no match
a\x{123}aa\=offset=5
No match
/\x{1234}+/Ii,utf
Capturing subpattern count = 0
@ -1281,8 +1265,6 @@ Subject length lower bound = 1
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[z-\x{100}]/IB,utf
------------------------------------------------------------------
@ -1467,8 +1449,7 @@ Subject length lower bound = 1
0: \x{105}
\x{109}
0: \x{109}
** Failers
No match
\= Expect no match
\x{100}
No match
\x{10a}
@ -1507,8 +1488,7 @@ Subject length lower bound = 1
0: \x{100}
\x{101}
0: \x{101}
** Failers
No match
\= Expect no match
\x{102}
No match
Y
@ -1547,7 +1527,52 @@ Last code unit = 'B' (caseless)
Subject length lower bound = 2
/abc/utf,replace=�
abc
abc
Failed: error -3: UTF-8 error: 1 byte missing at end
/(?<=(a)(?-1))x/I,utf
Capturing subpattern count = 1
Max lookbehind = 2
Options: utf
First code unit = 'x'
Subject length lower bound = 1
a\x80zx\=offset=3
Failed: error -22: UTF-8 error: isolated byte with 0x80 bit set at offset 1
/[\W\p{Any}]/B
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{Any}]
Ket
End
------------------------------------------------------------------
abc
0: a
123
0: 1
/[\W\pL]/B
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{L}]
Ket
End
------------------------------------------------------------------
abc
0: a
\= Expect no match
123
No match
/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/utf
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
/[\s[:^ascii:]]/B,ucp
------------------------------------------------------------------
Bra
[\x80-\xff\p{Xsp}]
Ket
End
------------------------------------------------------------------
# End of testinput10


@ -4,13 +4,8 @@
# different, so they have separate output files.
#forbid_utf
#newline_default LF ANY ANYCRLF
/a\Cb/
aXb
0: aXb
a\nb
0: a\x0ab
/[^\x{c4}]/IB
------------------------------------------------------------------
Bra
@ -581,7 +576,7 @@ Failed: error 134 at offset 11: character code point value in \x{} or \o{} is to
# Non-UTF characters
/\C{2,3}/
/.{2,3}/
\x{400000}\x{400001}\x{400002}\x{400003}
** Character \x{400000} is greater than 0xffff and UTF-16 mode is not enabled.
** Truncation will probably give the wrong result.
@ -646,4 +641,24 @@ Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\xfc \xfd \xfe \xff
Subject length lower bound = 1
/(*THEN:\[A]{65501})/expand
# We can use pcre2test's utf8_input modifier to create wide pattern characters,
# even though this test is run when UTF is not supported.
/ab������z/utf8_input
** Failed: character value greater than 0xffff cannot be converted to 16-bit in non-UTF mode
ab������z
ab\x{7fffffff}z
/ab�������z/utf8_input
** Failed: invalid UTF-8 string cannot be converted to 16-bit string
ab�������z
ab\x{ffffffff}z
/ab�Az/utf8_input
** Failed: invalid UTF-8 string cannot be converted to 16-bit string
ab�Az
ab\x{80000041}z
# End of testinput11


@ -4,13 +4,8 @@
# different, so they have separate output files.
#forbid_utf
#newline_default LF ANY ANYCRLF
/a\Cb/
aXb
0: aXb
a\nb
0: a\x0ab
/[^\x{c4}]/IB
------------------------------------------------------------------
Bra
@ -582,7 +577,7 @@ Subject length lower bound = 2
# Non-UTF characters
/\C{2,3}/
/.{2,3}/
\x{400000}\x{400001}\x{400002}\x{400003}
0: \x{400000}\x{400001}\x{400002}
@ -649,4 +644,27 @@ Starting code units: \x00 \x01 \x02 \x03 \x04 \x05 \x06 \x07 \x08 \x09 \x0e
\xfc \xfd \xfe \xff
Subject length lower bound = 1
/(*THEN:\[A]{65501})/expand
# We can use pcre2test's utf8_input modifier to create wide pattern characters,
# even though this test is run when UTF is not supported.
/ab������z/utf8_input
ab������z
0: ab\x{7fffffff}z
ab\x{7fffffff}z
0: ab\x{7fffffff}z
/ab�������z/utf8_input
ab�������z
0: ab\x{ffffffff}z
ab\x{ffffffff}z
0: ab\x{ffffffff}z
/ab�Az/utf8_input
ab�Az
0: ab\x{80000041}z
ab\x{80000041}z
0: ab\x{80000041}z
# End of testinput11


@ -9,78 +9,6 @@
�]
** Failed: invalid UTF-8 string cannot be used as input in UTF mode
/X(\C{3})/utf
X\x{11234}Y
0: X\x{11234}Y
1: \x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
1: \x{11234}Y
/X(\C{4})/utf
X\x{11234}YZ
0: X\x{11234}YZ
1: \x{11234}YZ
X\x{11234}YZW
0: X\x{11234}YZ
1: \x{11234}YZ
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}
0: X\x{11234}\x{512}
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{512}\x{11234}
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}
*** Failers
No match
X\x{11234}
No match
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
/a\C\Cb/utf
a\x{12257}b
0: a\x{12257}b
a\x{12257}\x{11234}b
No match
** Failers
No match
a\x{100}b
No match
/ab\Cde/utf
abXde
0: abXde
# Check maximum character size
/\x{ffff}/IB,utf
@ -310,29 +238,6 @@ First code unit = \x{d844}
Last code unit = \x{deab}
Subject length lower bound = 1
# This one is here not because it's different to Perl, but because of the way
# the captured single byte is displayed. (In Perl it becomes a character, and you
# can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
2:
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
*** Failers
No match
a\x{100}b
0: a\x{100}b
/[^ab\xC0-\xF0]/IB,utf
------------------------------------------------------------------
Bra
@ -362,8 +267,7 @@ Subject length lower bound = 1
0: \x{100}
\x{1000}
0: \x{1000}
*** Failers
0: *
\= Expect no match
\x{c0}
No match
\x{f0}
@ -520,8 +424,6 @@ Subject length lower bound = 1
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[\xff]/IB,utf
------------------------------------------------------------------
@ -607,30 +509,38 @@ Subject length lower bound = 2
Failed: error 106 at offset 13: missing terminating ] for character class
/X/utf
XX\x{d800}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{dfff}\=no_utf_check
0: X
\= Expect UTF error
XX\x{d800}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{da00}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{dc00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{de00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{dfff}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{110000}
** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16
XX\x{d800}\x{1234}
Failed: error -25: UTF-16 error: invalid low surrogate at offset 3
\= Expect no match
XX\x{d800}\=offset=3
No match
/(?<=.)X/utf
XX\x{d800}\=offset=3
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
/(*UTF16)\x{11234}/
abcd\x{11234}pqr
@ -647,7 +557,7 @@ Subject length lower bound = 1
0: \x{11234}
/(*UTF-32)\x{11234}/
Failed: error 134 at offset 17: character code point value in \x{} or \o{} is too large
Failed: error 160 at offset 5: (*VERB) not recognized or malformed
abcd\x{11234}pqr
/(*UTF-32)\x{112}/
@ -788,8 +698,10 @@ Subject length lower bound = 3
0: aa
a\x{123}aa\=offset=3
0: a
\= Expect no match
a\x{123}aa\=offset=4
No match
\= Expect bad offset error
a\x{123}aa\=offset=5
Failed: error -33: bad offset value
a\x{123}aa\=offset=6
@ -854,16 +766,21 @@ Subject length lower bound = 1
# Check bad offset
/a/utf
\= Expect bad UTF-16 offset, or no match in 32-bit
\x{10000}\=offset=1
Error -36 (bad UTF-16 offset)
\x{10000}ab\=offset=1
Error -36 (bad UTF-16 offset)
\= Expect 16-bit match, 32-bit no match
\x{10000}ab\=offset=2
0: a
\= Expect no match
\x{10000}ab\=offset=3
No match
\= Expect no match in 16-bit, bad offset in 32-bit
\x{10000}ab\=offset=4
No match
\= Expect bad offset
\x{10000}ab\=offset=5
Failed: error -33: bad offset value
@ -1123,10 +1040,6 @@ Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too
/\o{4200000}/utf
Failed: error 134 at offset 10: character code point value in \x{} or \o{} is too large
/\C/utf
\x{110000}
** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16
/\x{100}*A/IB,utf
------------------------------------------------------------------
Bra
@ -1174,8 +1087,6 @@ Subject length lower bound = 1
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[z-\x{100}]/IB,utf
------------------------------------------------------------------
@ -1365,8 +1276,7 @@ Subject length lower bound = 1
0: \x{105}
\x{109}
0: \x{109}
** Failers
No match
\= Expect no match
\x{100}
No match
\x{10a}
@ -1410,8 +1320,7 @@ Subject length lower bound = 1
0: \x{100}
\x{101}
0: \x{101}
** Failers
No match
\= Expect no match
\x{102}
No match
Y
@ -1454,4 +1363,56 @@ Starting code units: \xff
Last code unit = 'B' (caseless)
Subject length lower bound = 2
/./utf
\x{110000}
** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16
/(*UTF)ab������z/B
------------------------------------------------------------------
Bra
ab\x{fd}\x{bf}\x{bf}\x{bf}\x{bf}\x{bf}z
Ket
End
------------------------------------------------------------------
/ab������z/utf
** Failed: character value greater than 0x10ffff cannot be converted to UTF
/[\W\p{Any}]/B
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{Any}\x{100}-\x{ffff}]
Ket
End
------------------------------------------------------------------
abc
0: a
123
0: 1
/[\W\pL]/B
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{L}\x{100}-\x{ffff}]
Ket
End
------------------------------------------------------------------
abc
0: a
\x{100}
0: \x{100}
\x{308}
0: \x{308}
\= Expect no match
123
No match
/[\s[:^ascii:]]/B,ucp
------------------------------------------------------------------
Bra
[\x80-\xff\p{Xsp}\x{100}-\x{ffff}]
Ket
End
------------------------------------------------------------------
# End of testinput12


@ -9,76 +9,6 @@
�]
** Failed: invalid UTF-8 string cannot be used as input in UTF mode
/X(\C{3})/utf
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
1: \x{11234}YZ
/X(\C{4})/utf
X\x{11234}YZ
No match
X\x{11234}YZW
0: X\x{11234}YZW
1: \x{11234}YZW
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}
No match
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{512}\x{11234}Z
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}Y
*** Failers
No match
X\x{11234}
No match
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
/a\C\Cb/utf
a\x{12257}b
No match
a\x{12257}\x{11234}b
0: a\x{12257}\x{11234}b
** Failers
No match
a\x{100}b
No match
/ab\Cde/utf
abXde
0: abXde
# Check maximum character size
/\x{ffff}/IB,utf
@ -303,29 +233,6 @@ Options: utf
First code unit = \x{212ab}
Subject length lower bound = 1
# This one is here not because it's different to Perl, but because the way
# the captured single-byte is displayed. (In Perl it becomes a character, and you
# can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
2:
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
*** Failers
No match
a\x{100}b
0: a\x{100}b
/[^ab\xC0-\xF0]/IB,utf
------------------------------------------------------------------
Bra
@ -355,8 +262,7 @@ Subject length lower bound = 1
0: \x{100}
\x{1000}
0: \x{1000}
*** Failers
0: *
\= Expect no match
\x{c0}
No match
\x{f0}
@ -513,8 +419,6 @@ Subject length lower bound = 1
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[\xff]/IB,utf
------------------------------------------------------------------
@ -600,30 +504,38 @@ Subject length lower bound = 2
Failed: error 106 at offset 13: missing terminating ] for character class
/X/utf
XX\x{d800}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}\=no_utf_check
0: X
\= Expect UTF error
XX\x{d800}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{110000}
Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 2
XX\x{d800}\x{1234}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
\= Expect no match
XX\x{d800}\=offset=3
No match
/(?<=.)X/utf
XX\x{d800}\=offset=3
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
/(*UTF16)\x{11234}/
Failed: error 160 at offset 5: (*VERB) not recognized or malformed
@ -780,8 +692,10 @@ Subject length lower bound = 3
0: aa
a\x{123}aa\=offset=3
0: a
\= Expect no match
a\x{123}aa\=offset=4
No match
\= Expect bad offset error
a\x{123}aa\=offset=5
Failed: error -33: bad offset value
a\x{123}aa\=offset=6
@ -846,16 +760,21 @@ Subject length lower bound = 1
# Check bad offset
/a/utf
\= Expect bad UTF-16 offset, or no match in 32-bit
\x{10000}\=offset=1
No match
\x{10000}ab\=offset=1
0: a
\= Expect 16-bit match, 32-bit no match
\x{10000}ab\=offset=2
No match
\= Expect no match
\x{10000}ab\=offset=3
No match
\= Expect no match in 16-bit, bad offset in 32-bit
\x{10000}ab\=offset=4
Failed: error -33: bad offset value
\= Expect bad offset
\x{10000}ab\=offset=5
Failed: error -33: bad offset value
@ -1115,10 +1034,6 @@ Failed: error 134 at offset 9: character code point value in \x{} or \o{} is too
/\o{4200000}/utf
Failed: error 134 at offset 10: character code point value in \x{} or \o{} is too large
/\C/utf
\x{110000}
Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 0
/\x{100}*A/IB,utf
------------------------------------------------------------------
Bra
@ -1166,8 +1081,6 @@ Subject length lower bound = 1
0: \x{100}
\x{100}Z
0: \x{100}
*** Failers
No match
/[z-\x{100}]/IB,utf
------------------------------------------------------------------
@ -1357,8 +1270,7 @@ Subject length lower bound = 1
0: \x{105}
\x{109}
0: \x{109}
** Failers
No match
\= Expect no match
\x{100}
No match
\x{10a}
@ -1402,8 +1314,7 @@ Subject length lower bound = 1
0: \x{100}
\x{101}
0: \x{101}
** Failers
No match
\= Expect no match
\x{102}
No match
Y
@ -1446,4 +1357,56 @@ Starting code units: \xff
Last code unit = 'B' (caseless)
Subject length lower bound = 2
/./utf
\x{110000}
Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 0
/(*UTF)ab������z/B
------------------------------------------------------------------
Bra
ab\x{fd}\x{bf}\x{bf}\x{bf}\x{bf}\x{bf}z
Ket
End
------------------------------------------------------------------
/ab������z/utf
** Failed: character value greater than 0x10ffff cannot be converted to UTF
/[\W\p{Any}]/B
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{Any}\x{100}-\x{ffffffff}]
Ket
End
------------------------------------------------------------------
abc
0: a
123
0: 1
/[\W\pL]/B
------------------------------------------------------------------
Bra
[\x00-/:-@[-^`{-\xff\p{L}\x{100}-\x{ffffffff}]
Ket
End
------------------------------------------------------------------
abc
0: a
\x{100}
0: \x{100}
\x{308}
0: \x{308}
\= Expect no match
123
No match
/[\s[:^ascii:]]/B,ucp
------------------------------------------------------------------
Bra
[\x80-\xff\p{Xsp}\x{100}-\x{ffffffff}]
Ket
End
------------------------------------------------------------------
# End of testinput12


@ -1,242 +0,0 @@
# These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 16.
# (2) Other tests that must not be run with JIT.
/(a+)*zz/I
Capturing subpattern count = 1
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
Minimum match limit = 8
Minimum recursion limit = 6
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz
1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaz\=find_limits
Minimum match limit = 32768
Minimum recursion limit = 29
No match
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
Capturing subpattern count = 1
May match empty string
Subject length lower bound = 0
/* this is a C style comment */\=find_limits
Minimum match limit = 120
Minimum recursion limit = 6
0: /* this is a C style comment */
1: /* this is a C style comment */
/^(?>a)++/
aa\=find_limits
Minimum match limit = 5
Minimum recursion limit = 2
0: aa
aaaaaaaaa\=find_limits
Minimum match limit = 12
Minimum recursion limit = 2
0: aaaaaaaaa
/(a)(?1)++/
aa\=find_limits
Minimum match limit = 7
Minimum recursion limit = 4
0: aa
1: a
aaaaaaaaa\=find_limits
Minimum match limit = 21
Minimum recursion limit = 4
0: aaaaaaaaa
1: a
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum match limit = 65
Minimum recursion limit = 2
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum match limit = 86
Minimum recursion limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum match limit = 86
Minimum recursion limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits
Minimum match limit = 7
Minimum recursion limit = 2
0: aabbccddee
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits
Minimum match limit = 17
Minimum recursion limit = 16
0: aabbccddee
1: aa
2: bb
3: cc
4: dd
5: ee
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits
Minimum match limit = 13
Minimum recursion limit = 10
0: aabbccddee
1: aa
2: cc
3: ee
/(*LIMIT_MATCH=12bc)abc/
Failed: error 160 at offset 0: (*VERB) not recognized or malformed
/(*LIMIT_MATCH=4294967290)abc/
Failed: error 160 at offset 0: (*VERB) not recognized or malformed
/(*LIMIT_RECURSION=4294967280)abc/I
Capturing subpattern count = 0
Recursion limit = 4294967280
First code unit = 'a'
Last code unit = 'c'
Subject length lower bound = 3
/(a+)*zz/
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -47: match limit exceeded
/(a+)*zz/
aaaaaaaaaaaaaz\=recursion_limit=10
Failed: error -53: recursion limit exceeded
/(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -47: match limit exceeded
aaaaaaaaaaaaaz\=match_limit=60000
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(a+)*zz/I
Capturing subpattern count = 1
Match limit = 60000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -47: match limit exceeded
/(*LIMIT_RECURSION=10)(a+)*zz/I
Capturing subpattern count = 1
Recursion limit = 10
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -53: recursion limit exceeded
aaaaaaaaaaaaaz\=recursion_limit=1000
Failed: error -53: recursion limit exceeded
/(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I
Capturing subpattern count = 1
Recursion limit = 1000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
/(*LIMIT_RECURSION=1000)(a+)*zz/I
Capturing subpattern count = 1
Recursion limit = 1000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=recursion_limit=10
Failed: error -53: recursion limit exceeded
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
Failed: error -52: nested recursion at the same subject position
/((?(R2)a+|(?1)b))/
aaaabcde
Failed: error -52: nested recursion at the same subject position
/(?(R)a*(?1)|((?R))b)/
aaaabcde
Failed: error -52: nested recursion at the same subject position
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
0: abcxyz
>>>
abcxyzpqr\=aftertext
0: abcxyz
>>>
0+ xyzpqr
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
0: pqrabcxyz
<<< >>>
xyzpqrabcxyzpqr\=aftertext
0: pqrabcxyz
<<< >>>
0+ xyzpqr
/a\b/
a.\=allusedtext
0: a.
>
a\=allusedtext
0: a
/abc\Kxyz/
abcxyz\=allusedtext
0: abcxyz
<<<
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
0: abcxyz
>>>
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
0: abcabcde
>>>>>
# End of testinput14

61
pcre2/testdata/testoutput14-16 vendored Normal file

@ -0,0 +1,61 @@
# These test special (mostly error) UTF features of DFA matching. They are a
# selection of the more comprehensive tests that are run for non-DFA matching.
# The output is different for the different widths.
#subject dfa
/X/utf
XX\x{d800}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{d800}\=offset=3
No match
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -24: UTF-16 error: missing low surrogate at end at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -26: UTF-16 error: isolated low surrogate at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
** Failed: character \x{110000} is greater than 0x10ffff and so cannot be converted to UTF-16
XX\x{d800}\x{1234}
Failed: error -25: UTF-16 error: invalid low surrogate at offset 3
/badutf/utf
X\xdf
No match
XX\xef
No match
XXX\xef\x80
No match
X\xf7
No match
XX\xf7\x80
No match
XXX\xf7\x80\x80
No match
/shortutf/utf
XX\xdf\=ph
No match
XX\xef\=ph
No match
XX\xef\x80\=ph
No match
\xf7\=ph
No match
\xf7\x80\=ph
No match
# End of testinput14

61
pcre2/testdata/testoutput14-32 vendored Normal file

@ -0,0 +1,61 @@
# These test special (mostly error) UTF features of DFA matching. They are a
# selection of the more comprehensive tests that are run for non-DFA matching.
# The output is different for the different widths.
#subject dfa
/X/utf
XX\x{d800}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{d800}\=offset=3
No match
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
Failed: error -28: UTF-32 error: code points greater than 0x10ffff are not defined at offset 2
XX\x{d800}\x{1234}
Failed: error -27: UTF-32 error: code points 0xd800-0xdfff are not defined at offset 2
/badutf/utf
X\xdf
No match
XX\xef
No match
XXX\xef\x80
No match
X\xf7
No match
XX\xf7\x80
No match
XXX\xf7\x80\x80
No match
/shortutf/utf
XX\xdf\=ph
No match
XX\xef\=ph
No match
XX\xef\x80\=ph
No match
\xf7\=ph
No match
\xf7\x80\=ph
No match
# End of testinput14

61
pcre2/testdata/testoutput14-8 vendored Normal file

@ -0,0 +1,61 @@
# These test special (mostly error) UTF features of DFA matching. They are a
# selection of the more comprehensive tests that are run for non-DFA matching.
# The output is different for the different widths.
#subject dfa
/X/utf
XX\x{d800}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{d800}\=offset=3
Error -36 (bad UTF-8 offset)
XX\x{d800}\=no_utf_check
0: X
XX\x{da00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{da00}\=no_utf_check
0: X
XX\x{dc00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dc00}\=no_utf_check
0: X
XX\x{de00}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{de00}\=no_utf_check
0: X
XX\x{dfff}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
XX\x{dfff}\=no_utf_check
0: X
XX\x{110000}
Failed: error -15: UTF-8 error: code points greater than 0x10ffff are not defined at offset 2
XX\x{d800}\x{1234}
Failed: error -16: UTF-8 error: code points 0xd800-0xdfff are not defined at offset 2
/badutf/utf
X\xdf
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 1
XX\xef
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XXX\xef\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
X\xf7
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 1
XX\xf7\x80
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XXX\xf7\x80\x80
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 3
/shortutf/utf
XX\xdf\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
XX\xef\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 2
XX\xef\x80\=ph
Failed: error -3: UTF-8 error: 1 byte missing at end at offset 2
\xf7\=ph
Failed: error -5: UTF-8 error: 3 bytes missing at end at offset 0
\xf7\x80\=ph
Failed: error -4: UTF-8 error: 2 bytes missing at end at offset 0
# End of testinput14


@ -1,17 +1,390 @@
# This test is run only when JIT support is not available. It checks that an
# attempt to use it has the expected behaviour. It also tests things that
# are different without JIT.
# These are:
#
# (1) Tests of the match-limiting features. The results are different for
# interpretive or JIT matching, so this test should not be run with JIT. The
# same tests are run using JIT in test 17.
/abc/I,jit,jitverify
# (2) Other tests that must not be run with JIT.
/(a+)*zz/I
Capturing subpattern count = 1
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazzbbbbbb\=find_limits
Minimum match limit = 8
Minimum recursion limit = 6
0: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaazz
1: aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa
aaaaaaaaaaaaaz\=find_limits
Minimum match limit = 32768
Minimum recursion limit = 29
No match
!((?:\s|//.*\\n|/[*](?:\\n|.)*?[*]/)*)!I
Capturing subpattern count = 1
May match empty string
Subject length lower bound = 0
/* this is a C style comment */\=find_limits
Minimum match limit = 120
Minimum recursion limit = 6
0: /* this is a C style comment */
1: /* this is a C style comment */
/^(?>a)++/
aa\=find_limits
Minimum match limit = 5
Minimum recursion limit = 2
0: aa
aaaaaaaaa\=find_limits
Minimum match limit = 12
Minimum recursion limit = 2
0: aaaaaaaaa
/(a)(?1)++/
aa\=find_limits
Minimum match limit = 7
Minimum recursion limit = 4
0: aa
1: a
aaaaaaaaa\=find_limits
Minimum match limit = 21
Minimum recursion limit = 4
0: aaaaaaaaa
1: a
/a(?:.)*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum match limit = 65
Minimum recursion limit = 2
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum match limit = 86
Minimum recursion limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/a(?:.(*THEN:ABC))*?a/ims
abbbbbbbbbbbbbbbbbbbbba\=find_limits
Minimum match limit = 86
Minimum recursion limit = 45
0: abbbbbbbbbbbbbbbbbbbbba
/^(?>a+)(?>b+)(?>c+)(?>d+)(?>e+)/
aabbccddee\=find_limits
Minimum match limit = 7
Minimum recursion limit = 2
0: aabbccddee
/^(?>(a+))(?>(b+))(?>(c+))(?>(d+))(?>(e+))/
aabbccddee\=find_limits
Minimum match limit = 17
Minimum recursion limit = 16
0: aabbccddee
1: aa
2: bb
3: cc
4: dd
5: ee
/^(?>(a+))(?>b+)(?>(c+))(?>d+)(?>(e+))/
aabbccddee\=find_limits
Minimum match limit = 13
Minimum recursion limit = 10
0: aabbccddee
1: aa
2: cc
3: ee
/(*LIMIT_MATCH=12bc)abc/
Failed: error 160 at offset 17: (*VERB) not recognized or malformed
/(*LIMIT_MATCH=4294967290)abc/
Failed: error 160 at offset 24: (*VERB) not recognized or malformed
/(*LIMIT_RECURSION=4294967280)abc/I
Capturing subpattern count = 0
Recursion limit = 4294967280
First code unit = 'a'
Last code unit = 'c'
Subject length lower bound = 3
JIT support is not available in this version of PCRE2
/a*/I
/(a+)*zz/
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -47: match limit exceeded
/(a+)*zz/
aaaaaaaaaaaaaz\=recursion_limit=10
Failed: error -53: recursion limit exceeded
/(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -47: match limit exceeded
aaaaaaaaaaaaaz\=match_limit=60000
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(*LIMIT_MATCH=3000)(a+)*zz/I
Capturing subpattern count = 1
Match limit = 3000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -47: match limit exceeded
/(*LIMIT_MATCH=60000)(a+)*zz/I
Capturing subpattern count = 1
Match limit = 60000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=match_limit=3000
Failed: error -47: match limit exceeded
/(*LIMIT_RECURSION=10)(a+)*zz/I
Capturing subpattern count = 1
Recursion limit = 10
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
Failed: error -53: recursion limit exceeded
aaaaaaaaaaaaaz\=recursion_limit=1000
Failed: error -53: recursion limit exceeded
/(*LIMIT_RECURSION=10)(*LIMIT_RECURSION=1000)(a+)*zz/I
Capturing subpattern count = 1
Recursion limit = 1000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
/(*LIMIT_RECURSION=1000)(a+)*zz/I
Capturing subpattern count = 1
Recursion limit = 1000
Starting code units: a z
Last code unit = 'z'
Subject length lower bound = 2
aaaaaaaaaaaaaz
No match
aaaaaaaaaaaaaz\=recursion_limit=10
Failed: error -53: recursion limit exceeded
# These three have infinitely nested recursions.
/((?2))((?1))/
abc
Failed: error -52: nested recursion at the same subject position
/((?(R2)a+|(?1)b))()/
aaaabcde
Failed: error -52: nested recursion at the same subject position
/(?(R)a*(?1)|((?R))b)/
aaaabcde
Failed: error -52: nested recursion at the same subject position
# The allusedtext modifier does not work with JIT, which does not maintain
# the leftchar/rightchar data.
/abc(?=xyz)/allusedtext
abcxyzpqr
0: abcxyz
>>>
abcxyzpqr\=aftertext
0: abcxyz
>>>
0+ xyzpqr
/(?<=pqr)abc(?=xyz)/allusedtext
xyzpqrabcxyzpqr
0: pqrabcxyz
<<< >>>
xyzpqrabcxyzpqr\=aftertext
0: pqrabcxyz
<<< >>>
0+ xyzpqr
/a\b/
a.\=allusedtext
0: a.
>
a\=allusedtext
0: a
/abc\Kxyz/
abcxyz\=allusedtext
0: abcxyz
<<<
/abc(?=xyz(*ACCEPT))/
abcxyz\=allusedtext
0: abcxyz
>>>
/abc(?=abcde)(?=ab)/allusedtext
abcabcdefg
0: abcabcde
>>>>>
# These tests provoke recursion loops, which give a different error message
# when JIT is used.
/(?R)/I
Capturing subpattern count = 0
May match empty string
Subject length lower bound = 0
abcd
Failed: error -52: nested recursion at the same subject position
/(a|(?R))/I
Capturing subpattern count = 1
May match empty string
Subject length lower bound = 0
abcd
0: a
1: a
defg
Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?R))))/I
Capturing subpattern count = 3
May match empty string
Subject length lower bound = 0
abcd
0: ab
1: ab
fghi
Failed: error -52: nested recursion at the same subject position
/(ab|(bc|(de|(?1))))/I
Capturing subpattern count = 3
May match empty string
Subject length lower bound = 0
abcd
0: ab
1: ab
fghi
Failed: error -52: nested recursion at the same subject position
/x(ab|(bc|(de|(?1)x)x)x)/I
Capturing subpattern count = 3
First code unit = 'x'
Subject length lower bound = 3
xab123
0: xab
1: ab
xfghi
Failed: error -52: nested recursion at the same subject position
/(?!\w)(?R)/
abcd
Failed: error -52: nested recursion at the same subject position
=abc
Failed: error -52: nested recursion at the same subject position
/(?=\w)(?R)/
=abc
Failed: error -52: nested recursion at the same subject position
abcd
Failed: error -52: nested recursion at the same subject position
/(?<!\w)(?R)/
abcd
Failed: error -52: nested recursion at the same subject position
/(?<=\w)(?R)/
abcd
Failed: error -52: nested recursion at the same subject position
/(a+|(?R)b)/
aaa
0: aaa
1: aaa
bbb
Failed: error -52: nested recursion at the same subject position
/[^\xff]((?1))/BI
------------------------------------------------------------------
Bra
[^\x{ff}]
CBra 1
Recurse
Ket
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 1
Subject length lower bound = 1
abcd
Failed: error -52: nested recursion at the same subject position
# These tests don't behave the same with JIT
/\w+(?C1)/BI,no_auto_possess
------------------------------------------------------------------
Bra
\w+
Callout 1 8 0
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Options: no_auto_possess
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
No match
/(*NO_AUTO_POSSESS)\w+(?C1)/BI
------------------------------------------------------------------
Bra
\w+
Callout 1 26 0
Ket
End
------------------------------------------------------------------
Capturing subpattern count = 0
Compile options: <none>
Overall options: no_auto_possess
Starting code units: 0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P
Q R S T U V W X Y Z _ a b c d e f g h i j k l m n o p q r s t u v w x y z
Subject length lower bound = 1
abc\=callout_fail=1
--->abc
1 ^ ^
1 ^ ^
1 ^^
1 ^ ^
1 ^^
1 ^^
No match
# This test breaks the JIT stack limit
/(|]+){2,2452}/
(|]+){2,2452}
0:
1:
# End of testinput15

File diff suppressed because one or more lines are too long

File diff suppressed because one or more lines are too long


@ -1,20 +1,171 @@
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface with UTF/UCP support, which is supported only with the 8-bit
# library. This test should not be run with JIT (which is not available for the
# POSIX interface).
# interface, which is supported only with the 8-bit library. This test should
# not be run with JIT (which is not available for the POSIX interface).
#forbid_utf
#pattern posix
/a\x{1234}b/utf
a\x{1234}b
0: a\x{1234}b
# Test invalid options
/\w/
+++\x{c2}
/abc/auto_callout
** Ignored with POSIX interface: auto_callout
/abc/
abc\=find_limits
** Ignored with POSIX interface: find_limits
0: abc
/abc/
abc\=partial_hard
** Ignored with POSIX interface: partial_hard
0: abc
# Real tests
/abc/
abc
0: abc
/^abc|def/
abcdef
0: abc
abcdef\=notbol
0: def
/.*((abc)$|(def))/
defabc
0: defabc
1: abc
2: abc
defabc\=noteol
0: def
1: def
3: def
/the quick brown fox/
the quick brown fox
0: the quick brown fox
\= Expect no match
The Quick Brown Fox
No match: POSIX code 17: match failed
/\w/ucp
+++\x{c2}
0: \xc2
# End of testdata/testinput17
/the quick brown fox/i
the quick brown fox
0: the quick brown fox
The Quick Brown Fox
0: The Quick Brown Fox
/(*LF)abc.def/
\= Expect no match
abc\ndef
No match: POSIX code 17: match failed
/(*LF)abc$/
abc
0: abc
abc\n
0: abc
/(abc)\2/
Failed: POSIX code 15: bad back reference at offset 6
/(abc\1)/
\= Expect no match
abc
No match: POSIX code 17: match failed
/a*(b+)(z)(z)/
aaaabbbbzzzz
0: aaaabbbbzz
1: bbbb
2: z
3: z
aaaabbbbzzzz\=ovector=0
Matched without capture
aaaabbbbzzzz\=ovector=1
0: aaaabbbbzz
aaaabbbbzzzz\=ovector=2
0: aaaabbbbzz
1: bbbb
/(*ANY)ab.cd/
ab-cd
0: ab-cd
ab=cd
0: ab=cd
\= Expect no match
ab\ncd
No match: POSIX code 17: match failed
/ab.cd/s
ab-cd
0: ab-cd
ab=cd
0: ab=cd
ab\ncd
0: ab\x0acd
/a(b)c/posix_nosub
abc
Matched with REG_NOSUB
/a(?P<name>b)c/posix_nosub
abc
Matched with REG_NOSUB
/(a)\1/posix_nosub
zaay
Matched with REG_NOSUB
/a?|b?/
abc
0: a
\= Expect no match
ddd\=notempty
No match: POSIX code 17: match failed
/\w+A/
CDAAAAB
0: CDAAAA
/\w+A/ungreedy
CDAAAAB
0: CDA
/\Biss\B/I,aftertext
** Ignored with POSIX interface: info
Mississippi
0: iss
0+ issippi
/abc/\
Failed: POSIX code 9: bad escape sequence at offset 4
"(?(?C)"
Failed: POSIX code 11: unbalanced () at offset 6
"(?(?C))"
Failed: POSIX code 3: pattern error at offset 6
/abcd/substitute_extended
** Ignored with POSIX interface: substitute_extended
/\[A]{1000000}**/expand,regerror_buffsize=31
Failed: POSIX code 4: ? * + invalid at offset 100000
** regerror() message truncated
/\[A]{1000000}**/expand,regerror_buffsize=32
Failed: POSIX code 4: ? * + invalid at offset 1000001
//posix_nosub
\=offset=70000
** Ignored with POSIX interface: offset
Matched with REG_NOSUB
/(?=(a\K))/
a
Start of matched string is beyond its end - displaying from end to start.
0: a
1: a
# End of testdata/testinput18


@ -1,100 +1,21 @@
# This set of tests exercises the serialization/deserialization functions in
# the library. It does not use UTF or JIT.
#forbid_utf
# Compile several patterns, push them onto the stack, and then write them
# all to a file.
#pattern push
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
#save testsaved1
# Do it again for some more patterns.
/(*MARK:A)(*SKIP:B)(C|X)/mark
** Ignored when compiled pattern is stacked with 'push': mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
#save testsaved2
#pattern -push
# Reload the patterns, then pop them one by one and check them.
#load testsaved1
#load testsaved2
#pop info
Capturing subpattern count = 2
Max back reference = 2
Named capturing subpatterns:
n 1
n 2
Options: dupnames
Starting code units: b f
Subject length lower bound = 6
foofoo
0: foofoo
1: foo
barbar
0: barbar
1: <unset>
2: bar
# This set of tests is run only with the 8-bit library. It tests the POSIX
# interface with UTF/UCP support, which is supported only with the 8-bit
# library. This test should not be run with JIT (which is not available for the
# POSIX interface).
#pop mark
C
0: C
1: C
MK: A
D
No match, mark = A
#pattern posix
/a\x{1234}b/utf
a\x{1234}b
0: a\x{1234}b
/\w/
\= Expect no match
+++\x{c2}
No match: POSIX code 17: match failed
/\w/ucp
+++\x{c2}
0: \xc2
#pop
AmanaplanacanalPanama
0: AmanaplanacanalPanama
1: <unset>
2: <unset>
3: AmanaplanacanalPanama
4: A
#pop info
Capturing subpattern count = 4
Named capturing subpatterns:
ADDR 2
ADDRESS_PAT 4
NAME 1
NAME_PAT 3
Options: extended
Subject length lower bound = 3
metcalfe 33
0: metcalfe 33
1: metcalfe
2: 33
# Check for an error when different tables are used.
/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1
Serialization failed: error -30: patterns do not all use the same character tables
#pop
xyz
0: xyz
#pop
abc
0: abc
#pop should give an error
** Can't pop off an empty stack
pqr
# End of testinput19
# End of testdata/testinput19

File diff suppressed because it is too large

150
pcre2/testdata/testoutput20 vendored Normal file

@ -0,0 +1,150 @@
# This set of tests exercises the serialization/deserialization and code copy
# functions in the library. It does not use UTF or JIT.
#forbid_utf
# Compile several patterns, push them onto the stack, and then write them
# all to a file.
#pattern push
/(?<NAME>(?&NAME_PAT))\s+(?<ADDR>(?&ADDRESS_PAT))
(?(DEFINE)
(?<NAME_PAT>[a-z]+)
(?<ADDRESS_PAT>\d+)
)/x
/^(?:((.)(?1)\2|)|((.)(?3)\4|.))$/i
#save testsaved1
# Do it again for some more patterns.
/(*MARK:A)(*SKIP:B)(C|X)/mark
** Ignored when compiled pattern is stacked with 'push': mark
/(?:(?<n>foo)|(?<n>bar))\k<n>/dupnames
#save testsaved2
#pattern -push
# Reload the patterns, then pop them one by one and check them.
#load testsaved1
#load testsaved2
#pop info
Capturing subpattern count = 2
Max back reference = 2
Named capturing subpatterns:
n 1
n 2
Options: dupnames
Starting code units: b f
Subject length lower bound = 6
foofoo
0: foofoo
1: foo
barbar
0: barbar
1: <unset>
2: bar
#pop mark
C
0: C
1: C
MK: A
\= Expect no match
D
No match, mark = A
#pop
AmanaplanacanalPanama
0: AmanaplanacanalPanama
1: <unset>
2: <unset>
3: AmanaplanacanalPanama
4: A
#pop info
Capturing subpattern count = 4
Named capturing subpatterns:
ADDR 2
ADDRESS_PAT 4
NAME 1
NAME_PAT 3
Options: extended
Subject length lower bound = 3
metcalfe 33
0: metcalfe 33
1: metcalfe
2: 33
# Check for an error when different tables are used.
/abc/push,tables=1
/xyz/push,tables=2
#save testsaved1
Serialization failed: error -30: patterns do not all use the same character tables
#pop
xyz
0: xyz
#pop
abc
0: abc
#pop should give an error
** Can't pop off an empty stack
pqr
/abcd/pushcopy
abcd
0: abcd
#pop
abcd
0: abcd
#pop should give an error
** Can't pop off an empty stack
/abcd/push
#popcopy
abcd
0: abcd
#pop
abcd
0: abcd
/abcd/push
#save testsaved1
#pop should give an error
** Can't pop off an empty stack
#load testsaved1
#popcopy
abcd
0: abcd
#pop
abcd
0: abcd
#pop should give an error
** Can't pop off an empty stack
/abcd/pushtablescopy
abcd
0: abcd
#popcopy
abcd
0: abcd
#pop
abcd
0: abcd
# End of testinput20

94
pcre2/testdata/testoutput21 vendored Normal file

@ -0,0 +1,94 @@
# These are tests of \C that do not involve UTF. They are not run when \C is
# disabled by compiling with --enable-never-backslash-C.
/\C+\D \C+\d \C+\S \C+\s \C+\W \C+\w \C+. \C+\R \C+\H \C+\h \C+\V \C+\v \C+\Z \C+\z \C+$/Bx
------------------------------------------------------------------
Bra
AllAny+
\D
AllAny+
\d
AllAny+
\S
AllAny+
\s
AllAny+
\W
AllAny+
\w
AllAny+
Any
AllAny+
\R
AllAny+
\H
AllAny+
\h
AllAny+
\V
AllAny+
\v
AllAny+
\Z
AllAny++
\z
AllAny+
$
Ket
End
------------------------------------------------------------------
/\D+\C \d+\C \S+\C \s+\C \W+\C \w+\C .+\C \R+\C \H+\C \h+\C \V+\C \v+\C a+\C \n+\C \C+\C/Bx
------------------------------------------------------------------
Bra
\D+
AllAny
\d+
AllAny
\S+
AllAny
\s+
AllAny
\W+
AllAny
\w+
AllAny
Any+
AllAny
\R+
AllAny
\H+
AllAny
\h+
AllAny
\V+
AllAny
\v+
AllAny
a+
AllAny
\x0a+
AllAny
AllAny+
AllAny
Ket
End
------------------------------------------------------------------
/ab\Cde/never_backslash_c
Failed: error 183 at offset 4: using \C is disabled by the application
/ab\Cde/info
Capturing subpattern count = 0
Contains \C
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 5
abXde
0: abXde
/(?<=ab\Cde)X/
abZdeX
0: X
# End of testinput21

169
pcre2/testdata/testoutput22-16 vendored Normal file

@ -0,0 +1,169 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
Capturing subpattern count = 0
Contains \C
Options: utf
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 0
abXde
0: abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-16 mode
ab!deXYZ
# Autopossessification tests
/\C+\X \X+\C/Bx
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C+\X \X+\C/Bx,utf
------------------------------------------------------------------
Bra
Anybyte+
extuni
extuni+
Anybyte
Ket
End
------------------------------------------------------------------
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
No match
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
No match
/X(\C{3})/utf
X\x{1234}
No match
X\x{11234}Y
0: X\x{11234}Y
1: \x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
1: \x{11234}Y
/X(\C{4})/utf
X\x{1234}YZ
No match
X\x{11234}YZ
0: X\x{11234}YZ
1: \x{11234}YZ
X\x{11234}YZW
0: X\x{11234}YZ
1: \x{11234}YZ
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}YZ
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}
0: X\x{11234}\x{512}
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{512}\x{11234}
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}
X\x{11234}
No match
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
a\x{100}b
0: a\x{100}b
/a\C\Cb/utf
a\x{100}b
No match
a\x{12257}b
0: a\x{12257}b
a\x{12257}\x{11234}b
No match
/ab\Cde/utf
abXde
0: abXde
# This one is here not because it's different to Perl, but because the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
2:
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b
0: a\x{100}b

167
pcre2/testdata/testoutput22-32 vendored Normal file

@ -0,0 +1,167 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
Capturing subpattern count = 0
Contains \C
Options: utf
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 5
abXde
0: abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
ab!deXYZ
0: X
# Autopossessification tests
/\C+\X \X+\C/Bx
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C+\X \X+\C/Bx,utf
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
No match
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
No match
/X(\C{3})/utf
X\x{1234}
No match
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
1: \x{11234}YZ
/X(\C{4})/utf
X\x{1234}YZ
No match
X\x{11234}YZ
No match
X\x{11234}YZW
0: X\x{11234}YZW
1: \x{11234}YZW
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}YZ
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}
No match
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}YZ
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{512}\x{11234}Z
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
No match
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
No match
X\x{11234}Y
No match
X\x{11234}YZ
0: X\x{11234}YZ
X\x{11234}\x{512}YZ
0: X\x{11234}\x{512}Y
X\x{11234}
No match
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
a\x{100}b
0: a\x{100}b
/a\C\Cb/utf
a\x{100}b
No match
a\x{12257}b
No match
a\x{12257}\x{11234}b
0: a\x{12257}\x{11234}b
/ab\Cde/utf
abXde
0: abXde
# This one is here not because it's different to Perl, but because the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
2:
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b
0: a\x{100}b

171
pcre2/testdata/testoutput22-8 vendored Normal file

@ -0,0 +1,171 @@
# Tests of \C when Unicode support is available. Note that \C is not supported
# for DFA matching in UTF mode, so this test is not run with -dfa. The output
# of this test is different in 8-, 16-, and 32-bit modes. Some tests may match
# in some widths and not in others.
/ab\Cde/utf,info
Capturing subpattern count = 0
Contains \C
Options: utf
First code unit = 'a'
Last code unit = 'e'
Subject length lower bound = 0
abXde
0: abXde
# This should produce an error diagnostic (\C in UTF lookbehind) in 8-bit and
# 16-bit modes, but not in 32-bit mode.
/(?<=ab\Cde)X/utf
Failed: error 136 at offset 0: \C is not allowed in a lookbehind assertion in UTF-8 mode
ab!deXYZ
# Autopossessification tests
/\C+\X \X+\C/Bx
------------------------------------------------------------------
Bra
AllAny+
extuni
extuni+
AllAny
Ket
End
------------------------------------------------------------------
/\C+\X \X+\C/Bx,utf
------------------------------------------------------------------
Bra
Anybyte+
extuni
extuni+
Anybyte
Ket
End
------------------------------------------------------------------
/\C\X*TӅ;
{0,6}\v+
F
/utf
\= Expect no match
Ӆ\x0a
No match
/\C(\W?ſ)'?{{/utf
\= Expect no match
\\C(\\W?ſ)'?{{
No match
/X(\C{3})/utf
X\x{1234}
0: X\x{1234}
1: \x{1234}
X\x{11234}Y
0: X\x{f0}\x{91}\x{88}
1: \x{f0}\x{91}\x{88}
X\x{11234}YZ
0: X\x{f0}\x{91}\x{88}
1: \x{f0}\x{91}\x{88}
/X(\C{4})/utf
X\x{1234}YZ
0: X\x{1234}Y
1: \x{1234}Y
X\x{11234}YZ
0: X\x{11234}
1: \x{11234}
X\x{11234}YZW
0: X\x{11234}
1: \x{11234}
/X\C*/utf
XYZabcdce
0: XYZabcdce
/X\C*?/utf
XYZabcde
0: X
/X\C{3,5}/utf
Xabcdefg
0: Xabcde
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}YZ
X\x{1234}\x{512}
0: X\x{1234}\x{512}
X\x{1234}\x{512}YZ
0: X\x{1234}\x{512}
X\x{11234}Y
0: X\x{11234}Y
X\x{11234}YZ
0: X\x{11234}Y
X\x{11234}\x{512}
0: X\x{11234}\x{d4}
X\x{11234}\x{512}YZ
0: X\x{11234}\x{d4}
X\x{11234}\x{512}\x{11234}Z
0: X\x{11234}\x{d4}
/X\C{3,5}?/utf
Xabcdefg
0: Xabc
X\x{1234}
0: X\x{1234}
X\x{1234}YZ
0: X\x{1234}
X\x{1234}\x{512}
0: X\x{1234}
X\x{11234}Y
0: X\x{f0}\x{91}\x{88}
X\x{11234}YZ
0: X\x{f0}\x{91}\x{88}
X\x{11234}\x{512}YZ
0: X\x{f0}\x{91}\x{88}
X\x{11234}
0: X\x{f0}\x{91}\x{88}
/a\Cb/utf
aXb
0: aXb
a\nb
0: a\x{0a}b
a\x{100}b
No match
/a\C\Cb/utf
a\x{100}b
0: a\x{100}b
a\x{12257}b
No match
a\x{12257}\x{11234}b
No match
/ab\Cde/utf
abXde
0: abXde
# This one is here not because it's different to Perl, but because the way
# the captured single code unit is displayed. (In Perl it becomes a character,
# and you can't tell the difference.)
/X(\C)(.*)/utf
X\x{1234}
0: X\x{1234}
1: \x{e1}
2: \x{88}\x{b4}
X\nabc
0: X\x{0a}abc
1: \x{0a}
2: abc
# This one is here because Perl gives out a grumbly error message (quite
# correctly, but that messes up comparisons).
/a\Cb/utf
\= Expect no match in 8-bit mode
a\x{100}b
No match

8
pcre2/testdata/testoutput23 vendored Normal file

@ -0,0 +1,8 @@
# This test is run when PCRE2 has been built with --enable-never-backslash-C,
# which disables the use of \C. All we can do is check that it gives the
# correct error message.
/a\Cb/
Failed: error 185 at offset 3: using \C is disabled in this PCRE2 library
# End of testinput23


@ -8,8 +8,7 @@
#forbid_utf
/^[\w]+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -18,8 +17,7 @@ No match
0: �cole
/^[\w]+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -28,30 +26,26 @@ No match
0: \xc9
/^[\W]+/locale=fr_FR
*** Failers
0: ***
\= Expect no match
�cole
No match
/[\b]/
\b
0: \x08
*** Failers
No match
\= Expect no match
a
No match
/[\b]/locale=fr_FR
\b
0: \x08
*** Failers
No match
\= Expect no match
a
No match
/^\w+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -66,18 +60,14 @@ No match
2: cole
/(.+)\b(.+)/locale=fr_FR
*** Failers
0: *** Failers
1: ***
2: Failers
\= Expect no match
�cole
No match
/�cole/i
�cole
0: \xc9cole
*** Failers
No match
\= Expect no match
�cole
No match
@ -114,8 +104,7 @@ Subject length lower bound = 1
/^[\xc8-\xc9]/
�cole
0: �
*** Failers
No match
\= Expect no match
�cole
No match


@ -8,8 +8,7 @@
#forbid_utf
/^[\w]+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -18,8 +17,7 @@ No match
0: �cole
/^[\w]+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -28,30 +26,26 @@ No match
0: \xc9
/^[\W]+/locale=fr_FR
*** Failers
0: ***
\= Expect no match
�cole
No match
/[\b]/
\b
0: \x08
*** Failers
No match
\= Expect no match
a
No match
/[\b]/locale=fr_FR
\b
0: \x08
*** Failers
No match
\= Expect no match
a
No match
/^\w+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -66,18 +60,14 @@ No match
2: cole
/(.+)\b(.+)/locale=fr_FR
*** Failers
0: *** Failers
1: ***
2: Failers
\= Expect no match
�cole
No match
/�cole/i
�cole
0: \xc9cole
*** Failers
No match
\= Expect no match
�cole
No match
@ -114,8 +104,7 @@ Subject length lower bound = 1
/^[\xc8-\xc9]/
�cole
0: �
*** Failers
No match
\= Expect no match
�cole
No match


@ -8,8 +8,7 @@
#forbid_utf
/^[\w]+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -18,8 +17,7 @@ No match
0: �cole
/^[\w]+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -28,30 +26,26 @@ No match
0: \xc9
/^[\W]+/locale=fr_FR
*** Failers
0: ***
\= Expect no match
�cole
No match
/[\b]/
\b
0: \x08
*** Failers
No match
\= Expect no match
a
No match
/[\b]/locale=fr_FR
\b
0: \x08
*** Failers
No match
\= Expect no match
a
No match
/^\w+/
*** Failers
No match
\= Expect no match
�cole
No match
@ -66,18 +60,14 @@ No match
2: cole
/(.+)\b(.+)/locale=fr_FR
*** Failers
0: *** Failers
1: ***
2: Failers
\= Expect no match
�cole
No match
/�cole/i
�cole
0: \xc9cole
*** Failers
No match
\= Expect no match
�cole
No match
@ -114,8 +104,7 @@ Subject length lower bound = 1
/^[\xc8-\xc9]/
�cole
0: �
*** Failers
No match
\= Expect no match
�cole
No match

File diff suppressed because it is too large


@ -3,6 +3,8 @@
# results in 8-bit, 16-bit, and 32-bit modes are excluded (see tests 10 and
# 12).
#newline_default lf any anycrlf
# PCRE2 and Perl disagree about the characteristics of certain Unicode
# characters. For example, 061C is considered by Perl to be Arabic, though
# is it not listed as such in the Unicode Scripts.txt file, and 2066-2069 are
@ -11,14 +13,12 @@
# test 4.
/^[\p{Arabic}]/utf
** Failers
No match
\= Expect no match
\x{061c}
No match
/^[[:graph:]]+$/utf,ucp
** Failers
No match
\= Expect no match
\x{61c}
No match
\x{2066}
@ -31,8 +31,7 @@ No match
No match
/^[[:print:]]+$/utf,ucp
** Failers
0: ** Failers
\= Expect no match
\x{61c}
No match
\x{2066}
@ -76,6 +75,7 @@ No match
0: A\x{85}\x{2005}Z
/^[[:graph:]]+$/utf,ucp
\= Expect no match
\x{180e}
No match
@ -88,6 +88,7 @@ No match
0: \x{09}\x{0a}\x{1d} \x{85}\x{a0}\x{61c}\x{1680}\x{180e}
/^[[:^print:]]+$/utf,ucp
\= Expect no match
\x{180e}
No match
@ -182,10 +183,6 @@ Subject length lower bound = 3
\x{212ab}\x{212ab}\x{212ab}\x{861}
0: \x{212ab}\x{212ab}\x{212ab}
/(?<=\C)X/utf
Failed: error 136 at offset 6: \C is not allowed in a lookbehind assertion
Should produce an error diagnostic
/^[ab]/IB,utf
------------------------------------------------------------------
Bra
@ -200,8 +197,7 @@ Overall options: anchored utf
Subject length lower bound = 1
bar
0: b
*** Failers
No match
\= Expect no match
c
No match
\x{ff}
@ -227,8 +223,7 @@ Subject length lower bound = 1
0: \x{ff}
\x{100}
0: \x{100}
*** Failers
0: *
\= Expect no match
aaa
No match
@ -251,8 +246,7 @@ No match
\x{100}\x{100}"12"
0: \x{100}\x{100}"12"
1: "12"
*** Failers
No match
\= Expect no match
\x{100}\x{100}abcd
No match
@ -303,8 +297,7 @@ Failed: error 108 at offset 15: range out of order in character class
0: \x{100}
\x{104}
0: \x{104}
*** Failers
No match
\= Expect no match
\x{105}
No match
\x{ff}
@ -581,8 +574,7 @@ Matched, but too many substrings
0: a\x{2028}b
a\x{2029}b
0: a\x{2029}b
** Failers
No match
\= Expect no match
a\n\rb
No match
@ -623,8 +615,7 @@ No match
0: a\x{0a}\x{0d}b
a\n\r\x{85}\x0cb
0: a\x{0a}\x{0d}\x{85}\x{0c}b
** Failers
No match
\= Expect no match
ab
No match
@ -643,8 +634,7 @@ No match
0: a\x{0a}\x{0d}\x{0a}\x{0d}b
a\n\n\r\nb
0: a\x{0a}\x{0a}\x{0d}\x{0a}b
** Failers
No match
\= Expect no match
a\n\n\n\rb
No match
a\r
@ -655,8 +645,7 @@ No match
0: X X\x{0a}
X\x09X\x0b
0: X\x{09}X\x{0b}
** Failers
No match
\= Expect no match
\x{a0} X\x0a
No match
@ -667,8 +656,7 @@ No match
0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}\x{0d}
\x09\x20\x{a0}\x0a\x0b\x0c
0: \x{09} \x{a0}\x{0a}\x{0b}\x{0c}
** Failers
No match
\= Expect no match
\x09\x20\x{a0}\x0a\x0b
No match
@ -677,8 +665,7 @@ No match
0: \x{3001}\x{3000}\x{2030}\x{2028}
X\x{180e}X\x{85}
0: X\x{180e}X\x{85}
** Failers
No match
\= Expect no match
\x{2009} X\x0a
No match
@ -689,8 +676,7 @@ No match
0: \x{09}\x{205f}\x{a0}\x{0a}\x{2029}\x{0c}\x{2028}
\x09\x20\x{202f}\x0a\x0b\x0c
0: \x{09} \x{202f}\x{0a}\x{0b}\x{0c}
** Failers
No match
\= Expect no match
\x09\x{200a}\x{a0}\x{2028}\x0b
No match
@ -755,8 +741,7 @@ Subject length lower bound = 3
0: a\x{0a}b
a\r\nb
0: a\x{0d}\x{0a}b
** Failers
No match
\= Expect no match
a\x{85}b
No match
a\x0bb
@ -793,8 +778,7 @@ Subject length lower bound = 2
0: a\x{0a}b
a\r\nb
0: a\x{0d}\x{0a}b
** Failers
No match
\= Expect no match
a\x{85}b
No match
a\x0bb
@ -817,14 +801,11 @@ Subject length lower bound = 2
0: a\x{85}b
a\x0bb
0: a\x{0b}b
** Failers
No match
/.*a.*=.b.*/utf,newline=any
QQQ\x{2029}ABCaXYZ=!bPQR
0: ABCaXYZ=!bPQR
** Failers
No match
\= Expect no match
a\x{2029}b
No match
\x61\xe2\x80\xa9\x62
@ -838,8 +819,7 @@ Failed: error 130 at offset 3: unknown POSIX class name
0: a\x{1234}b
a\nb
0: a\x{0a}b
** Failers
No match
\= Expect no match
ab
No match
@ -848,8 +828,7 @@ No match
0: aXb
a\nX\nX\x{1234}b
0: a\x{0a}X\x{0a}X\x{1234}b
** Failers
No match
\= Expect no match
ab
No match
@ -935,6 +914,7 @@ Partial match: X\x{123}\x{123}\x{123}
Partial match: X\x{123}\x{123}\x{123}\x{123}
/X\x{123}{2,4}b/utf
\= Expect no match
Xx\=ps
No match
X\x{123}x\=ps
@ -947,6 +927,7 @@ No match
No match
/X\x{123}{2,4}?b/utf
\= Expect no match
Xx\=ps
No match
X\x{123}x\=ps
@ -959,6 +940,7 @@ No match
No match
/X\x{123}{2,4}+b/utf
\= Expect no match
Xx\=ps
No match
X\x{123}x\=ps
@ -1745,6 +1727,7 @@ Last code unit = 'y'
First code unit = 'x'
Last code unit = 'y'
Subject length lower bound = 2
/(?<!^)ETA/utf
\= Expect no match
ETA
@ -1765,7 +1748,7 @@ No match
Ket
End
------------------------------------------------------------------
/\ud800/utf,alt_bsux,allow_empty_class,match_unset_backref
Failed: error 173 at offset 6: disallowed Unicode code point (>= 0xd800 && <= 0xdfff)
@ -1874,8 +1857,7 @@ Subject length lower bound = 1
0: 1234
12-34
0: 12-34
12+\x{661}-34
0: 12+\x{661}-34
12+\x{661}-34
0: 12+\x{661}-34
\= Expect no match
abcd
@ -1995,8 +1977,7 @@ No match
0: \x{2069}
/^\p{Cs}/utf
\x{dfff}\=no_utf_check
0: \x{dfff}
\x{dfff}\=no_utf_check
0: \x{dfff}
\= Expect no match
\x{09f}
@ -2021,8 +2002,7 @@ No match
/^\p{Sc}+/utf
$\x{a2}\x{a3}\x{a4}\x{a5}\x{a6}
0: $\x{a2}\x{a3}\x{a4}\x{a5}
\x{9f2}
0: \x{9f2}
\x{9f2}
0: \x{9f2}
\= Expect no match
X
@ -2039,8 +2019,7 @@ No match
0: \x{1680}
\x{2000}
0: \x{2000}
\x{2001}
0: \x{2001}
\x{2001}
0: \x{2001}
\= Expect no match
\x{2028}
@ -2052,16 +2031,14 @@ No match
# properties and has changed how it behaves for caseless matching.
/\p{^Lu}/i,utf
1234
0: 1
1234
0: 1
\= Expect no match
ABC
No match
/\P{Lu}/i,utf
1234
0: 1
1234
0: 1
\= Expect no match
ABC
@ -2070,8 +2047,7 @@ No match
/\p{Ll}/i,utf
a
0: a
Az
0: z
Az
0: z
\= Expect no match
ABC
@ -2080,8 +2056,7 @@ No match
/\p{Lu}/i,utf
A
0: A
a\x{10a0}B
0: \x{10a0}
a\x{10a0}B
0: \x{10a0}
\= Expect no match
a
@ -2092,8 +2067,7 @@ No match
/\p{Lu}/i,utf
A
0: A
aZ
0: Z
aZ
0: Z
\= Expect no match
abc
@ -2182,16 +2156,14 @@ No match
0: \x{6ca}
\x{a6c}
0: \x{a6c}
\x{10a7}
0: \x{10a7}
\x{10a7}
0: \x{10a7}
\= Expect no match
_ABC
No match
/^\p{Xan}+/utf
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
\= Expect no match
_ABC
@ -2222,16 +2194,14 @@ No match
0: \x{6ca}
\x{a6c}
0: \x{a6c}
\x{10a7}
0: \x{10a7}
\x{10a7}
0: \x{10a7}
\= Expect no match
_ABC
No match
/^[\p{Xan}]+/utf
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
ABCD1234\x{6ca}\x{a6c}\x{10a7}_
0: ABCD1234\x{6ca}\x{a6c}\x{10a7}
\= Expect no match
_ABC
@ -2240,8 +2210,7 @@ No match
/^>\p{Xsp}/utf
>\x{1680}\x{2028}\x{0b}
0: >\x{1680}
>\x{a0}
0: >\x{a0}
>\x{a0}
0: >\x{a0}
\= Expect no match
\x{0b}
@ -2278,8 +2247,7 @@ No match
/^>\p{Xps}/utf
>\x{1680}\x{2028}\x{0b}
0: >\x{1680}
>\x{a0}
0: >\x{a0}
>\x{a0}
0: >\x{a0}
\= Expect no match
\x{0b}
@ -2324,8 +2292,7 @@ No match
0: \x{a6c}
\x{10a7}
0: \x{10a7}
_ABC
0: _
_ABC
0: _
\= Expect no match
[]
@ -2362,8 +2329,7 @@ No match
0: \x{a6c}
\x{10a7}
0: \x{10a7}
_ABC
0: _
_ABC
0: _
\= Expect no match
[]
@ -2630,8 +2596,7 @@ No match
# Without PCRE_UCP, non-ASCII always fail, even if < 256
/\b...\B/utf
abc_
0: abc
abc_
0: abc
\= Expect no match
\x{37e}abc\x{376}
@ -2825,10 +2790,12 @@ No match
------------------------------------------------------------------
# These behaved oddly in Perl, so they are kept in this test
/(\x{23a}\x{23a}\x{23a})?\1/i,utf
\= Expect no match
\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}
No match
/(ȺȺȺ)?\1/i,utf
\= Expect no match
ȺȺȺⱥⱥ
@ -2843,10 +2810,12 @@ No match
ȺȺȺⱥⱥⱥ
0: \x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}\x{2c65}
1: \x{23a}\x{23a}\x{23a}
/(\x{23a}\x{23a}\x{23a})\1/i,utf
\= Expect no match
\x{23a}\x{23a}\x{23a}\x{2c65}\x{2c65}
No match
/(ȺȺȺ)\1/i,utf
\= Expect no match
ȺȺȺⱥⱥ
@ -2887,8 +2856,7 @@ No match
/^[\p{Batak}]/utf
\x{1bc0}
0: \x{1bc0}
\x{1bff}
0: \x{1bff}
\x{1bff}
0: \x{1bff}
\= Expect no match
\x{1bf4}
@ -2897,8 +2865,7 @@ No match
/^[\p{Brahmi}]/utf
\x{11000}
0: \x{11000}
\x{1106f}
0: \x{1106f}
\x{1106f}
0: \x{1106f}
\= Expect no match
\x{1104e}
@ -2907,8 +2874,7 @@ No match
/^[\p{Mandaic}]/utf
\x{840}
0: \x{840}
\x{85e}
0: \x{85e}
\x{85e}
0: \x{85e}
\= Expect no match
\x{85c}
@ -2933,14 +2899,10 @@ No match
0: \x{301}
/^a\X41z/alt_bsux,allow_empty_class,match_unset_backref,dupnames
aX41z
0: aX41z
aX41z
0: aX41z
\= Expect no match
aAz
No match
/(?<=ab\Cde)X/utf
No match
/\X/
@ -3138,8 +3100,7 @@ Subject length lower bound = 3
\x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2}
0: \x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2}
0+
/\x{3a3}++./i,utf,aftertext
/\x{3a3}++./i,utf,aftertext
\= Expect no match
\x{3a3}\x{3c3}\x{3c2}\x{3a3}\x{3c3}\x{3c2}
@ -3179,24 +3140,29 @@ No match
clist 0053 0073 017f
/i t
Ket
End
------------------------------------------------------------------
\= Expect no match
ikt
No match
/is+t/i,utf
iSs\x{17f}t
0: iSs\x{17f}t
\= Expect no match
ikt
No match
/is+?t/i,utf
\= Expect no match
ikt
No match
/is?t/i,utf
\= Expect no match
ikt
No match
/is{2}t/i,utf
\= Expect no match
iskt
@ -3211,80 +3177,70 @@ No match
0: @
`abc
0: `
\x{1234}abc
0: \x{1234}
\x{1234}abc
0: \x{1234}
\= Expect no match
abc
No match
/^\p{Xuc}+/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}
\= Expect no match
\x{9f}
No match
/^\p{Xuc}+?/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $
$@`\x{a0}\x{1234}\x{e000}**
0: $
\= Expect no match
\x{9f}
No match
/^\p{Xuc}+?\*/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}*
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}*
\= Expect no match
\x{9f}
No match
/^\p{Xuc}++/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}
\= Expect no match
\x{9f}
No match
/^\p{Xuc}{3,5}/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}
\= Expect no match
\x{9f}
No match
/^\p{Xuc}{3,5}?/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $@`
$@`\x{a0}\x{1234}\x{e000}**
0: $@`
\= Expect no match
\x{9f}
No match
/^[\p{Xuc}]/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $
$@`\x{a0}\x{1234}\x{e000}**
0: $
\= Expect no match
\x{9f}
No match
/^[\p{Xuc}]+/utf
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}
$@`\x{a0}\x{1234}\x{e000}**
0: $@`\x{a0}\x{1234}\x{e000}
\= Expect no match
\x{9f}
No match
/^\P{Xuc}/utf
abc
0: a
abc
0: a
\= Expect no match
$abc
@ -3297,8 +3253,7 @@ No match
No match
/^[\P{Xuc}]/utf
abc
0: a
abc
0: a
\= Expect no match
$abc
@ -3843,7 +3798,7 @@ No match
[ab\p{L}]{2,3}+
Ket
End
------------------------------------------------------------------
------------------------------------------------------------------
/\D+\X \d+\X \S+\X \s+\X \W+\X \w+\X \R+\X \H+\X \h+\X \V+\X \v+\X a+\X \n+\X .+\X/Bx
------------------------------------------------------------------
@ -3858,8 +3813,6 @@ No match
extuni
\W+
extuni
\w+
extuni
\w+
extuni
\R+
@ -3898,7 +3851,7 @@ No match
/m $
Ket
End
------------------------------------------------------------------
------------------------------------------------------------------
/\X+\D \X+\d \X+\S \X+\s \X+\W \X+\w \X+. \X+\R \X+\H \X+\h \X+\V \X+\v \X+\X \X+\Z \X+\z \X+$/Bx
------------------------------------------------------------------
@ -3916,8 +3869,6 @@ No match
extuni+
\w
extuni+
Any
extuni+
Any
extuni+
\R
@ -4003,12 +3954,9 @@ Subject length lower bound = 1
/ábc/utf,replace=XሴZ
123ábc123
1: 123X\x{1234}Z123
/(?<=abc)(|def)/g,utf,replace=<$0>
123abcáyzabcdef789abcሴqr
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
/[^\xff]((?1))/utf,debug
4: 123abc<>\x{e1}yzabc<><def>789abc<>\x{1234}qr
/[A-`]/iB,utf
@ -4050,4 +3998,238 @@ Failed: error 122 at offset 1227: unmatched closing parenthesis
"\xa\xf<(.\pZ*\P{Xwd}+^\xa8\3'3yq.::?(?J:()\xd1+!~:3'(8?:)':(?'d'(?'d'^u]!.+.+\\A\Ah(n+?9){7}+\K;(?'X'u'(?'c'(?'z'(?<y>\xb::\xf0'|\xd3(\xae?'w(z\x8?P>l)\x8?P>a)'\H\R\xd1+!!~:3'(?:h$N{26875}\W+?\\=D{2}\x89(?i:Uy0\N({2\xa(\v\x85*){y*\A(()\p{L}+?\P{^Xan}'+?\xff\+pS\?|).{;y*\A(()\p{L}+?\8}\d?1(|)(/1){7}.+[Lp{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(\xbf(R))\x8a\X*?\x8a\xb\xd1^9\3*+(\xc1,\k'R'\xb4)\xcc(z\z(?J)(?'X'\x1b(\xb\xd1^9\?'3*+P{^Xan}+?\xff\+(\xc1.]k+\xb'Pm'\xb4)\xcc4f\xa7'\xd1V(?i:U,{2,2})'(?'X'))?-%--\x95$9*\4'|\xd1(\x9c''%\x94$9)#(?'R')3\x7?('P\xed7'\xa8\xb1^u\xeaw\1\0\0\(|(?1){7}.+[\p{Me}].\s\xdcC*^\x14?(?(<y>))(?<!^)$C((;*?(R*?))+(?(R)\x8a\X*?\x8a\xb\xd1^9\3*+|(\xc1,\k'R'\xb4)\xcc! z)\z(?JJ)(?'X';(\xb\xd1^9\?'3*+(\xc1.]k+\xb'Pm'\xb4))':(?'d')(?'RD'(d')|)|$)'|(?<x>\g{d});\g{x}\x11\g{d}\x81\|$((?'X'\'X'(?'W''\x92()'9'\x83*))\xba*\!?^ <){)':;\xcc4'\xd1'(?'X'28))?-%--\x95$9*\4'|\xd1((''e\x94*$9:)*#(?'R')3)\x7?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+0!~:(?)'d'E:yD!\s(?'R'\x1e;\x10:U))|'\x9g!\xb0*){)\\x16:;()\x1e\x10\x87*:(?<y>)\xd1+!~:(?)'}'\d'E:yD!\s(?'R'\x1e;\x10:U))|'))|)g!\xb0*R+9{29+)#(?'P'})*?pS\{3,}\x85,{0,}l{*UTF)(\xe{7}){3722,{9,}d{2,?|))|{)\(A?&d}}{\xa,}2}){3,}7,l{)22}(,}l:7{2,4}}29\x19+)#?'P'})*v?))\x5"
Failed: error 122 at offset 1227: unmatched closing parenthesis
/$(&.+[\p{Me}].\s\xdcC*?(?(<y>))(?<!^)$C((;*?(R))+(?(R)){0,6}?|){12\x8a\X*?\x8a\x0b\xd1^9\3*+(\xc1,\k'P'\xb4)\xcc(z\z(?JJ)(?'X'8};(\x0b\xd1^9\?'3*+(\xc1.]k+\x0b'Pm'\xb4\xcc4'\xd1'(?'X'))?-%--\x95$9*\4'|\xd1(''%\x95*$9)#(?'R')3\x07?('P\xed')\\x16:;()\x1e\x10*:(?<y>)\xd1+!~:(?)''(d'E:yD!\s(?'R'\x1e;\x10:U))|')g!\xb0*){29+))#(?'P'})*?/
"(*UTF)(*UCP)(.UTF).+X(\V+;\^(\D|)!999}(?(?C{7(?C')\H*\S*/^\x5\xa\\xd3\x85n?(;\D*(?m).[^mH+((*UCP)(*U:F)})(?!^)(?'"
Failed: error 162 at offset 113: subpattern name expected
/[\pS#moq]/
=
0: =
/(*:a\x{12345}b\t(d\)c)xxx/utf,alt_verbnames,mark
cxxxz
0: xxx
MK: a\x{12345}b\x{09}(d)c
/abcd/utf,replace=x\x{824}y\o{3333}z(\Q12\$34$$\x34\E5$$),substitute_extended
abcd
1: x\x{824}y\x{6db}z(12\$34$$\x345$)
/a(\x{e0}\x{101})(\x{c0}\x{102})/utf,replace=a\u$1\U$1\E$1\l$2\L$2\Eab\U\x{e0}\x{101}\L\x{d0}\x{160}\EDone,substitute_extended
a\x{e0}\x{101}\x{c0}\x{102}
1: a\x{c0}\x{101}\x{c0}\x{100}\x{e0}\x{101}\x{e0}\x{102}\x{e0}\x{103}ab\x{c0}\x{100}\x{f0}\x{161}Done
/((?<digit>\d)|(?<letter>\p{L}))/g,substitute_extended,replace=<${digit:+digit; :not digit; }${letter:+letter:not a letter}>
ab12cde
7: <not digit; letter><not digit; letter><digit; not a letter><digit; not a letter><not digit; letter><not digit; letter><not digit; letter>
/(*UCP)(*UTF)[[:>:]]X/B
------------------------------------------------------------------
Bra
\b
AssertB
Reverse
prop Xwd
Ket
X
Ket
End
------------------------------------------------------------------
/abc/utf,replace=xyz
abc\=zero_terminate
1: xyz
/a[[:punct:]b]/ucp,bincode
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/a[[:punct:]b]/utf,ucp,bincode
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/a[b[:punct:]]/utf,ucp,bincode
------------------------------------------------------------------
Bra
a
[b[:punct:]]
Ket
End
------------------------------------------------------------------
/[[:^ascii:]]/utf,ucp,bincode
------------------------------------------------------------------
Bra
[\x80-\xff] (neg)
Ket
End
------------------------------------------------------------------
/[[:^ascii:]\w]/utf,ucp,bincode
------------------------------------------------------------------
Bra
[\x80-\xff\p{Xwd}\x{100}-\x{10ffff}]
Ket
End
------------------------------------------------------------------
/[\w[:^ascii:]]/utf,ucp,bincode
------------------------------------------------------------------
Bra
[\x80-\xff\p{Xwd}\x{100}-\x{10ffff}]
Ket
End
------------------------------------------------------------------
/[^[:ascii:]\W]/utf,ucp,bincode
------------------------------------------------------------------
Bra
[^\x00-\x7f\P{Xwd}]
Ket
End
------------------------------------------------------------------
\x{de}
0: \x{de}
\x{200}
0: \x{200}
\= Expect no match
\x{300}
No match
\x{37e}
No match
/[[:^ascii:]a]/utf,ucp,bincode
------------------------------------------------------------------
Bra
[a\x80-\xff] (neg)
Ket
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/B,utf,no_auto_possess,auto_callout
------------------------------------------------------------------
Bra
Callout 255 0 14
L?
Callout 255 14 0
Ket
End
------------------------------------------------------------------
/L(?#(|++<!(2)?/B,utf,ucp,auto_callout
------------------------------------------------------------------
Bra
Callout 255 0 14
L?+
Callout 255 14 0
Ket
End
------------------------------------------------------------------
/(*UTF)C\x09((?<!'(?x)!*H? #\xcc\x9a[^$]/
Failed: error 114 at offset 39: missing closing parenthesis
/[\D]/utf
\x{1d7cf}
0: \x{1d7cf}
/[\D\P{Nd}]/utf
\x{1d7cf}
0: \x{1d7cf}
/[^\D]/utf
a9b
0: 9
\= Expect no match
\x{1d7cf}
No match
/[^\D\P{Nd}]/utf
a9b
0: 9
\x{1d7cf}
0: \x{1d7cf}
\= Expect no match
\x{10000}
No match
# Hex uses pattern length, not zero-terminated. This tests for overrunning
# the given length of a pattern.
/'(*UTF)'/hex
/'#('/hex,extended,utf
/a(?<=A\XB)/utf
Failed: error 125 at offset 1: lookbehind assertion is not fixed length
/ab(?<=A\RB)/utf
Failed: error 125 at offset 2: lookbehind assertion is not fixed length
/../utf,auto_callout
\n\x{123}\x{123}\x{123}\x{123}
--->\x{0a}\x{123}\x{123}\x{123}\x{123}
+0 ^ .
+0 ^ .
+1 ^ ^ .
+2 ^ ^
0: \x{123}\x{123}
# This tests processing wide characters in extended mode.
/XȀ/x,utf
# These three test a bug fix that was not clearing up after a locale setting
# when the test or a subsequent one matched a wide character.
//locale=C
/[\P{Yi}]/utf
\x{2f000}
0: \x{2f000}
/[\P{Yi}]/utf,locale=C
\x{2f000}
0: \x{2f000}
/^(?<!(?=􃡜))/B,utf
------------------------------------------------------------------
Bra
^
AssertB not
Assert
\x{10385c}
Ket
Ket
Ket
End
------------------------------------------------------------------
# Horizontal and vertical space lists ignore caseless
/[\HH]/Bi,utf
------------------------------------------------------------------
Bra
[\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}]
Ket
End
------------------------------------------------------------------
/[^\HH]/Bi,utf
------------------------------------------------------------------
Bra
[^\x00-\x08\x0a-\x1f!-\x9f\xa1-\xff\x{100}-\x{167f}\x{1681}-\x{180d}\x{180f}-\x{1fff}\x{200b}-\x{202e}\x{2030}-\x{205e}\x{2060}-\x{2fff}\x{3001}-\x{10ffff}]
Ket
End
------------------------------------------------------------------
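
Note on the `substitute_extended` case earlier in this hunk (`${digit:+digit; :not digit; }${letter:+letter:not a letter}` applied to `ab12cde`): the expected line `7: <not digit; letter>...` can be cross-checked outside PCRE2. The sketch below emulates the `${name:+set:unset}` conditional with Python's standard `re` module. Python has neither `\p{L}` nor a conditional replacement syntax, so the character class `[^\W\d_]` and the helper names `replacement` and `substitute` are stand-ins chosen for illustration, not PCRE2 behaviour.

```python
from __future__ import annotations

import re

# Stand-in for /((?<digit>\d)|(?<letter>\p{L}))/g. Python's re has no \p{L},
# so [^\W\d_] (a word character that is neither digit nor underscore) is
# assumed to be close enough for this ASCII subject.
PATTERN = re.compile(r"((?P<digit>\d)|(?P<letter>[^\W\d_]))")


def replacement(match: re.Match) -> str:
    """Emulate <${digit:+digit; :not digit; }${letter:+letter:not a letter}>."""
    # ${name:+A:B} expands to A when the group is set, otherwise to B.
    digit_part = "digit; " if match.group("digit") is not None else "not digit; "
    letter_part = "letter" if match.group("letter") is not None else "not a letter"
    return f"<{digit_part}{letter_part}>"


def substitute(subject: str) -> tuple[int, str]:
    """Return (substitution count, result), mirroring pcre2test's 'N: text' line."""
    result, count = PATTERN.subn(replacement, subject)
    return count, result


if __name__ == "__main__":
    count, result = substitute("ab12cde")
    print(f"{count}: {result}")
```

Run on `ab12cde`, this reports seven substitutions and the same replacement text as the expected output in the hunk, which suggests the expected line is consistent with the documented set/unset semantics.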

File diff suppressed because it is too large

File diff suppressed because it is too large


@ -1,8 +1,11 @@
# These are a few representative patterns whose lengths and offsets are to be
# shown when the link size is 2. This is just a doublecheck test to ensure the
# sizes don't go horribly wrong when something is changed. The pattern contents
# are all themselves checked in other tests. Unicode, including property
# support, is required for these tests.
# There are two sorts of patterns in this test. A number of them are
# representative patterns whose lengths and offsets are checked. This is just a
# doublecheck test to ensure the sizes don't go horribly wrong when something
# is changed. The operation of these patterns is checked in other tests.
#
# This file also contains tests whose output varies with code unit size and/or
# link size. Unicode support is required for these tests. There are separate
# output files for each code unit size and link size.
#pattern fullbincode,memory
@ -378,7 +381,7 @@ Options: utf
First code unit = 'A'
Last code unit = '.'
Subject length lower bound = 4
/\x{D55c}\x{ad6d}\x{C5B4}/I,utf
Memory allocation (code space): 22
------------------------------------------------------------------
@ -842,11 +845,185 @@ Memory allocation (code space): 14
# Check the absolute limit on nesting (?| etc. This varies with code unit
# width because the workspace is a different number of bytes. It will fail
# in 8-bit and 16-bit but not in 32-bit.
# with link size 2 in 8-bit and 16-bit but not in 32-bit.
/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
/parens_nest_limit=1000,-fullbincode
Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too deeply nested
# Use "expand" to create some very long patterns with nested parentheses, in
# order to test workspace overflow. Again, this varies with code unit width,
# and even when it fails in two modes, the error offset differs. It also varies
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated
/(?(1)(?1)){8,}+()/debug
------------------------------------------------------------------
0 79 Bra
2 70 Once
4 6 Cond
6 1 Cond ref
8 74 Recurse
10 6 Ket
12 6 Cond
14 1 Cond ref
16 74 Recurse
18 6 Ket
20 6 Cond
22 1 Cond ref
24 74 Recurse
26 6 Ket
28 6 Cond
30 1 Cond ref
32 74 Recurse
34 6 Ket
36 6 Cond
38 1 Cond ref
40 74 Recurse
42 6 Ket
44 6 Cond
46 1 Cond ref
48 74 Recurse
50 6 Ket
52 6 Cond
54 1 Cond ref
56 74 Recurse
58 6 Ket
60 10 SBraPos
62 6 SCond
64 1 Cond ref
66 74 Recurse
68 6 Ket
70 10 KetRpos
72 70 Ket
74 3 CBra 1
77 3 Ket
79 79 Ket
81 End
------------------------------------------------------------------
Capturing subpattern count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
abcd
0:
1:
/(?(1)|a(?1)b){2,}+()/debug
------------------------------------------------------------------
0 43 Bra
2 34 Once
4 4 Cond
6 1 Cond ref
8 8 Alt
10 a
12 38 Recurse
14 b
16 12 Ket
18 16 SBraPos
20 4 SCond
22 1 Cond ref
24 8 Alt
26 a
28 38 Recurse
30 b
32 12 Ket
34 16 KetRpos
36 34 Ket
38 3 CBra 1
41 3 Ket
43 43 Ket
45 End
------------------------------------------------------------------
Capturing subpattern count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
abcde
No match
/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug
------------------------------------------------------------------
0 133 Bra
2 41 CBra 1
5 2 Recurse
7 88 Recurse
9 93 Recurse
11 98 Recurse
13 103 Recurse
15 108 Recurse
17 113 Recurse
19 118 Recurse
21 123 Recurse
23 123 Recurse
25 118 Recurse
27 113 Recurse
29 108 Recurse
31 103 Recurse
33 98 Recurse
35 93 Recurse
37 88 Recurse
39 2 Recurse
41 0 Recurse
43 41 Ket
45 41 SCBra 1
48 2 Recurse
50 88 Recurse
52 93 Recurse
54 98 Recurse
56 103 Recurse
58 108 Recurse
60 113 Recurse
62 118 Recurse
64 123 Recurse
66 123 Recurse
68 118 Recurse
70 113 Recurse
72 108 Recurse
74 103 Recurse
76 98 Recurse
78 93 Recurse
80 88 Recurse
82 2 Recurse
84 0 Recurse
86 41 KetRmax
88 3 CBra 2
91 3 Ket
93 3 CBra 3
96 3 Ket
98 3 CBra 4
101 3 Ket
103 3 CBra 5
106 3 Ket
108 3 CBra 6
111 3 Ket
113 3 CBra 7
116 3 Ket
118 3 CBra 8
121 3 Ket
123 3 CBra 9
126 3 Ket
128 3 CBra 10
131 3 Ket
133 133 Ket
135 End
------------------------------------------------------------------
Capturing subpattern count = 10
May match empty string
Subject length lower bound = 0
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
Failed: error 114 at offset 509: missing closing parenthesis
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
# End of testinput8

1026
pcre2/testdata/testoutput8-16-3 vendored Normal file

File diff suppressed because it is too large


@ -1,8 +1,11 @@
# These are a few representative patterns whose lengths and offsets are to be
# shown when the link size is 2. This is just a doublecheck test to ensure the
# sizes don't go horribly wrong when something is changed. The pattern contents
# are all themselves checked in other tests. Unicode, including property
# support, is required for these tests.
# There are two sorts of patterns in this test. A number of them are
# representative patterns whose lengths and offsets are checked. This is just a
# doublecheck test to ensure the sizes don't go horribly wrong when something
# is changed. The operation of these patterns is checked in other tests.
#
# This file also contains tests whose output varies with code unit size and/or
# link size. Unicode support is required for these tests. There are separate
# output files for each code unit size and link size.
#pattern fullbincode,memory
@ -378,7 +381,7 @@ Options: utf
First code unit = 'A'
Last code unit = '.'
Subject length lower bound = 4
/\x{D55c}\x{ad6d}\x{C5B4}/I,utf
Memory allocation (code space): 44
------------------------------------------------------------------
@ -842,10 +845,184 @@ Memory allocation (code space): 28
# Check the absolute limit on nesting (?| etc. This varies with code unit
# width because the workspace is a different number of bytes. It will fail
# in 8-bit and 16-bit but not in 32-bit.
# with link size 2 in 8-bit and 16-bit but not in 32-bit.
/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
/parens_nest_limit=1000,-fullbincode
# Use "expand" to create some very long patterns with nested parentheses, in
# order to test workspace overflow. Again, this varies with code unit width,
# and even when it fails in two modes, the error offset differs. It also varies
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5813: regular expression is too complicated
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated
/(?(1)(?1)){8,}+()/debug
------------------------------------------------------------------
0 79 Bra
2 70 Once
4 6 Cond
6 1 Cond ref
8 74 Recurse
10 6 Ket
12 6 Cond
14 1 Cond ref
16 74 Recurse
18 6 Ket
20 6 Cond
22 1 Cond ref
24 74 Recurse
26 6 Ket
28 6 Cond
30 1 Cond ref
32 74 Recurse
34 6 Ket
36 6 Cond
38 1 Cond ref
40 74 Recurse
42 6 Ket
44 6 Cond
46 1 Cond ref
48 74 Recurse
50 6 Ket
52 6 Cond
54 1 Cond ref
56 74 Recurse
58 6 Ket
60 10 SBraPos
62 6 SCond
64 1 Cond ref
66 74 Recurse
68 6 Ket
70 10 KetRpos
72 70 Ket
74 3 CBra 1
77 3 Ket
79 79 Ket
81 End
------------------------------------------------------------------
Capturing subpattern count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
abcd
0:
1:
/(?(1)|a(?1)b){2,}+()/debug
------------------------------------------------------------------
0 43 Bra
2 34 Once
4 4 Cond
6 1 Cond ref
8 8 Alt
10 a
12 38 Recurse
14 b
16 12 Ket
18 16 SBraPos
20 4 SCond
22 1 Cond ref
24 8 Alt
26 a
28 38 Recurse
30 b
32 12 Ket
34 16 KetRpos
36 34 Ket
38 3 CBra 1
41 3 Ket
43 43 Ket
45 End
------------------------------------------------------------------
Capturing subpattern count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
abcde
No match
/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug
------------------------------------------------------------------
0 133 Bra
2 41 CBra 1
5 2 Recurse
7 88 Recurse
9 93 Recurse
11 98 Recurse
13 103 Recurse
15 108 Recurse
17 113 Recurse
19 118 Recurse
21 123 Recurse
23 123 Recurse
25 118 Recurse
27 113 Recurse
29 108 Recurse
31 103 Recurse
33 98 Recurse
35 93 Recurse
37 88 Recurse
39 2 Recurse
41 0 Recurse
43 41 Ket
45 41 SCBra 1
48 2 Recurse
50 88 Recurse
52 93 Recurse
54 98 Recurse
56 103 Recurse
58 108 Recurse
60 113 Recurse
62 118 Recurse
64 123 Recurse
66 123 Recurse
68 118 Recurse
70 113 Recurse
72 108 Recurse
74 103 Recurse
76 98 Recurse
78 93 Recurse
80 88 Recurse
82 2 Recurse
84 0 Recurse
86 41 KetRmax
88 3 CBra 2
91 3 Ket
93 3 CBra 3
96 3 Ket
98 3 CBra 4
101 3 Ket
103 3 CBra 5
106 3 Ket
108 3 CBra 6
111 3 Ket
113 3 CBra 7
116 3 Ket
118 3 CBra 8
121 3 Ket
123 3 CBra 9
126 3 Ket
128 3 CBra 10
131 3 Ket
133 133 Ket
135 End
------------------------------------------------------------------
Capturing subpattern count = 10
May match empty string
Subject length lower bound = 0
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
Failed: error 114 at offset 509: missing closing parenthesis
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
# End of testinput8

1028
pcre2/testdata/testoutput8-32-3 vendored Normal file

File diff suppressed because it is too large

1028
pcre2/testdata/testoutput8-32-4 vendored Normal file

File diff suppressed because it is too large


@ -1,8 +1,11 @@
# These are a few representative patterns whose lengths and offsets are to be
# shown when the link size is 2. This is just a doublecheck test to ensure the
# sizes don't go horribly wrong when something is changed. The pattern contents
# are all themselves checked in other tests. Unicode, including property
# support, is required for these tests.
# There are two sorts of patterns in this test. A number of them are
# representative patterns whose lengths and offsets are checked. This is just a
# doublecheck test to ensure the sizes don't go horribly wrong when something
# is changed. The operation of these patterns is checked in other tests.
#
# This file also contains tests whose output varies with code unit size and/or
# link size. Unicode support is required for these tests. There are separate
# output files for each code unit size and link size.
#pattern fullbincode,memory
@ -378,7 +381,7 @@ Options: utf
First code unit = 'A'
Last code unit = '.'
Subject length lower bound = 4
/\x{D55c}\x{ad6d}\x{C5B4}/I,utf
Memory allocation (code space): 19
------------------------------------------------------------------
@ -842,11 +845,184 @@ Memory allocation (code space): 10
# Check the absolute limit on nesting (?| etc. This varies with code unit
# width because the workspace is a different number of bytes. It will fail
# in 8-bit and 16-bit but not in 32-bit.
# with link size 2 in 8-bit and 16-bit but not in 32-bit.
/(?|(?|(?J:(?|(?x:(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|(?|
)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))
/parens_nest_limit=1000,-fullbincode
Failed: error 184 at offset 1540: (?| and/or (?J: or (?x: parentheses are too deeply nested
# Use "expand" to create some very long patterns with nested parentheses, in
# order to test workspace overflow. Again, this varies with code unit width,
# and even when it fails in two modes, the error offset differs. It also varies
# with link size - hence multiple tests with different values.
/(?'ABC'\[[bar](]{792}*THEN:\[A]{255}\[)]{793}/expand,-fullbincode,parens_nest_limit=1000
/(?'ABC'\[[bar](]{793}*THEN:\[A]{255}\[)]{794}/expand,-fullbincode,parens_nest_limit=1000
Failed: error 186 at offset 5820: regular expression is too complicated
/(?'ABC'\[[bar](]{1793}*THEN:\[A]{255}\[)]{1794}/expand,-fullbincode,parens_nest_limit=2000
Failed: error 186 at offset 12820: regular expression is too complicated
/(?(1)(?1)){8,}+()/debug
------------------------------------------------------------------
0 119 Bra
3 105 Once
6 9 Cond
9 1 Cond ref
12 111 Recurse
15 9 Ket
18 9 Cond
21 1 Cond ref
24 111 Recurse
27 9 Ket
30 9 Cond
33 1 Cond ref
36 111 Recurse
39 9 Ket
42 9 Cond
45 1 Cond ref
48 111 Recurse
51 9 Ket
54 9 Cond
57 1 Cond ref
60 111 Recurse
63 9 Ket
66 9 Cond
69 1 Cond ref
72 111 Recurse
75 9 Ket
78 9 Cond
81 1 Cond ref
84 111 Recurse
87 9 Ket
90 15 SBraPos
93 9 SCond
96 1 Cond ref
99 111 Recurse
102 9 Ket
105 15 KetRpos
108 105 Ket
111 5 CBra 1
116 5 Ket
119 119 Ket
122 End
------------------------------------------------------------------
Capturing subpattern count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
abcd
0:
1:
/(?(1)|a(?1)b){2,}+()/debug
------------------------------------------------------------------
0 61 Bra
3 47 Once
6 6 Cond
9 1 Cond ref
12 10 Alt
15 a
17 53 Recurse
20 b
22 16 Ket
25 22 SBraPos
28 6 SCond
31 1 Cond ref
34 10 Alt
37 a
39 53 Recurse
42 b
44 16 Ket
47 22 KetRpos
50 47 Ket
53 5 CBra 1
58 5 Ket
61 61 Ket
64 End
------------------------------------------------------------------
Capturing subpattern count = 1
Max back reference = 1
May match empty string
Subject length lower bound = 0
abcde
No match
/((?1)(?2)(?3)(?4)(?5)(?6)(?7)(?8)(?9)(?9)(?8)(?7)(?6)(?5)(?4)(?3)(?2)(?1)(?0)){2,}()()()()()()()()()/debug
------------------------------------------------------------------
0 205 Bra
3 62 CBra 1
8 3 Recurse
11 133 Recurse
14 141 Recurse
17 149 Recurse
20 157 Recurse
23 165 Recurse
26 173 Recurse
29 181 Recurse
32 189 Recurse
35 189 Recurse
38 181 Recurse
41 173 Recurse
44 165 Recurse
47 157 Recurse
50 149 Recurse
53 141 Recurse
56 133 Recurse
59 3 Recurse
62 0 Recurse
65 62 Ket
68 62 SCBra 1
73 3 Recurse
76 133 Recurse
79 141 Recurse
82 149 Recurse
85 157 Recurse
88 165 Recurse
91 173 Recurse
94 181 Recurse
97 189 Recurse
100 189 Recurse
103 181 Recurse
106 173 Recurse
109 165 Recurse
112 157 Recurse
115 149 Recurse
118 141 Recurse
121 133 Recurse
124 3 Recurse
127 0 Recurse
130 62 KetRmax
133 5 CBra 2
138 5 Ket
141 5 CBra 3
146 5 Ket
149 5 CBra 4
154 5 Ket
157 5 CBra 5
162 5 Ket
165 5 CBra 6
170 5 Ket
173 5 CBra 7
178 5 Ket
181 5 CBra 8
186 5 Ket
189 5 CBra 9
194 5 Ket
197 5 CBra 10
202 5 Ket
205 205 Ket
208 End
------------------------------------------------------------------
Capturing subpattern count = 10
May match empty string
Subject length lower bound = 0
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)/
Failed: error 114 at offset 509: missing closing parenthesis
/([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00]([00](*ACCEPT)))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))))/-fullbincode
# End of testinput8
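
Note on the testoutput8 hunks above: the `/(?(1)|a(?1)b){2,}+()/debug` listings share one opcode sequence but have different offsets, and `/\x{D55c}\x{ad6d}\x{C5B4}/I,utf` reports 22, 44 and 19 bytes of code space. Both differences can be reproduced with a small size model. The per-item sizes below are inferred from the listings and are assumptions: with link size 2, a link is taken to occupy 2 code units in 8-bit mode and 1 in 16- and 32-bit mode, and the same for the group number stored after `Cond ref` and `CBra`. The mapping of hunks to code unit widths is likewise inferred from the neighbouring file names. All function and table names are hypothetical.

```python
from __future__ import annotations

# Assumed sizes in code units for link size 2 (inferred from the listings).
LINK_UNITS = {8: 2, 16: 1, 32: 1}
IMM2_UNITS = {8: 2, 16: 1, 32: 1}

# Opcode names as printed by pcre2test that are assumed to be opcode + link.
LINK_OPS = {"Bra", "Once", "Cond", "SCond", "SBraPos", "Alt",
            "Recurse", "Ket", "KetRpos"}


def encoded_units(char: str, width: int) -> int:
    """Code units needed for one character in UTF-8, UTF-16 or UTF-32."""
    if width == 8:
        return len(char.encode("utf-8"))
    if width == 16:
        return len(char.encode("utf-16-le")) // 2
    return 1


def item_units(op: str, width: int) -> int:
    """Size of one compiled item in code units under the assumptions above."""
    if op in LINK_OPS:
        return 1 + LINK_UNITS[width]
    if op == "Cond ref":
        return 1 + IMM2_UNITS[width]
    if op == "CBra":
        return 1 + LINK_UNITS[width] + IMM2_UNITS[width]
    if op == "End":
        return 1
    if op.startswith("char:"):
        return 1 + encoded_units(op[len("char:"):], width)
    raise ValueError(f"unknown item: {op}")


def offsets(ops: list[str], width: int) -> list[int]:
    """Start offset of every item, as shown in the left column of the listings."""
    result, position = [], 0
    for op in ops:
        result.append(position)
        position += item_units(op, width)
    return result


def code_bytes(ops: list[str], width: int) -> int:
    """Total compiled size in bytes (code units times bytes per unit)."""
    return sum(item_units(op, width) for op in ops) * (width // 8)


# Item sequence of /(?(1)|a(?1)b){2,}+()/ as listed in the hunks.
COND_OPS = ["Bra", "Once", "Cond", "Cond ref", "Alt", "char:a", "Recurse",
            "char:b", "Ket", "SBraPos", "SCond", "Cond ref", "Alt", "char:a",
            "Recurse", "char:b", "Ket", "KetRpos", "Ket", "CBra", "Ket",
            "Ket", "End"]

# Assumed item sequence of /\x{D55c}\x{ad6d}\x{C5B4}/utf: three literal characters.
HANGUL_OPS = ["Bra", "char:\ud55c", "char:\uad6d", "char:\uc5b4", "Ket", "End"]

if __name__ == "__main__":
    for width in (8, 16, 32):
        print(width, offsets(COND_OPS, width)[-1], code_bytes(HANGUL_OPS, width))
```

Under these assumptions the model places `End` at offset 64 in 8-bit mode and 45 in 16- and 32-bit mode, and yields 19, 22 and 44 bytes for the Hangul pattern, matching the figures in the hunks. It is a consistency check on the expected output, not a description of how PCRE2 lays out its compiled code.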

1026
pcre2/testdata/testoutput8-8-3 vendored Normal file

File diff suppressed because it is too large

1026
pcre2/testdata/testoutput8-8-4 vendored Normal file

File diff suppressed because it is too large


@ -2,14 +2,10 @@
# UTF-8 or Unicode property support. */
#forbid_utf
#newline_default lf any anycrlf
/a\Cb/
aXb
0: aXb
a\nb
0: a\x0ab
** Failers (too big char)
No match
/ab/
\= Expect error message (too big char) and no match
A\x{123}B
** Character \x{123} is greater than 255 and UTF-8 mode is not enabled.
** Truncation will probably give the wrong result.
@ -311,22 +307,31 @@ Subject length lower bound = 1
------------------------------------------------------------------
/\777/I
Failed: error 151 at offset 3: octal value is greater than \377 in 8-bit non-UTF-8 mode
Failed: error 151 at offset 4: octal value is greater than \377 in 8-bit non-UTF-8 mode
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF)XX/mark,alt_verbnames
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
XX
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark
XX
0: XX
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE
/(*:0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE)XX/mark,alt_verbnames
XX
0: XX
MK: 0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDEF0123456789ABCDE
/\u0100/alt_bsux,allow_empty_class,match_unset_backref,dupnames
Failed: error 177 at offset 5: character code point value in \u.... sequence is too large
Failed: error 177 at offset 6: character code point value in \u.... sequence is too large
/[\u0100-\u0200]/alt_bsux,allow_empty_class,match_unset_backref,dupnames
Failed: error 177 at offset 6: character code point value in \u.... sequence is too large
Failed: error 177 at offset 7: character code point value in \u.... sequence is too large
/[^\x00-a]{12,}[^b-\xff]*/B
------------------------------------------------------------------
@ -356,4 +361,10 @@ Failed: error 177 at offset 6: character code point value in \u.... sequence is
End
------------------------------------------------------------------
/(*MARK:a\x{100}b)z/alt_verbnames
Failed: error 134 at offset 14: character code point value in \x{} or \o{} is too large
/(*:*++++++++++++''''''''''''''''''''+''+++'+++x+++++++++++++++++++++++++++++++++++(++++++++++++++++++++:++++++%++:''''''''''''''''''''''''+++++++++++++++++++++++++++++++++++++++++++++++++++++-++++++++k+++++++''''+++'+++++++++++++++++++++++''''++++++++++++':ƿ)/
Failed: error 176 at offset 259: name is too long in (*MARK), (*PRUNE), (*SKIP), or (*THEN)
# End of testinput9
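
Note on the `(*:NAME)` cases in this hunk: the failing and passing patterns appear to differ by a single name character, and the reported error offset 259 fits a 256-character name following the 3-character prefix `(*:`. The sketch below only checks that arithmetic. It assumes the names are built from the 16-character block `0123456789ABCDEF` as they appear in the hunk, and that the 8-bit limit is 255 code units, which is inferred from the two outcomes rather than taken from PCRE2 documentation. The constant and variable names are hypothetical.

```python
# Inferred from the test outcomes; an assumption, not a documented constant.
ASSUMED_MAX_NAME_UNITS_8BIT = 255

BLOCK = "0123456789ABCDEF"   # the 16-character block repeated in the test names
PREFIX = "(*:"               # what precedes the name in the pattern

too_long_name = BLOCK * 16            # name used in the failing patterns
longest_ok_name = too_long_name[:-1]  # same name without the final 'F'

# Offset just past the name in the failing pattern, counted from 0.
offset_after_name = len(PREFIX) + len(too_long_name)

if __name__ == "__main__":
    print(len(too_long_name), len(too_long_name) > ASSUMED_MAX_NAME_UNITS_8BIT)
    print(len(longest_ok_name), len(longest_ok_name) > ASSUMED_MAX_NAME_UNITS_8BIT)
    print(offset_after_name)
```

The 256-character name exceeds the assumed limit, the 255-character one does not, and the computed offset equals the 259 shown in the `error 176` lines.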

15
pcre2/testdata/valgrind-jit.supp vendored Normal file

@ -0,0 +1,15 @@
{
name
Memcheck:Addr16
obj:???
obj:???
obj:???
}
{
name
Memcheck:Cond
obj:???
obj:???
obj:???
}


@ -159,7 +159,7 @@ No match
/[[:alpha:]][[:lower:]][[:upper:]]/IB
------------------------------------------------------------------
Bra
[A-Za-z\x83\x8a\x8c\x8e\x9a\x9c\x9e\x9f\xaa\xb2\xb3\xb5\xb9\xba\xc0-\xd6\xd8-\xf6\xf8-\xff]
[A-Za-z\x83\x8a\x8c\x8e\x9a\x9c\x9e\x9f\xaa\xb5\xba\xc0-\xd6\xd8-\xf6\xf8-\xff]
[a-z\x83\x9a\x9c\x9e\xaa\xb5\xba\xdf-\xf6\xf8-\xff]
[A-Z\x8a\x8c\x8e\x9f\xc0-\xd6\xd8-\xde]
Ket
@ -167,9 +167,9 @@ No match
------------------------------------------------------------------
Capturing subpattern count = 0
Starting code units: A B C D E F G H I J K L M N O P Q R S T U V W X Y Z
a b c d e f g h i j k l m n o p q r s t u v w x y z � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � �
a b c d e f g h i j k l m n o p q r s t u v w x y z � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � � �
� � � � � � � � � � � � � � � � � � � � � � � � �
Subject length lower bound = 3
# End of testinput3