Delphi Inspiration

products:pcre2:history [2019/03/07 18:01]
products:pcre2:history [2019/12/24 20:00] (current)
 +====== YuPcre2: Version History ======
 +{{page>​header}}
 +=====YuPcre2 1.12.0 – 24 Dec 2019=====
 +
 +  * Add a check for the maximum number of capturing subpatterns,​ which is 65535.
 +  * Improve the invalid utf32 support of the JIT compiler. Now it correctly detects invalid characters in the 0xd800-0xdfff range.
 +  * Fix minor typo bug in JIT compile when \X is used in a non-UTF string.
 +  * Add support for matching in invalid UTF strings to the ''​pcre2_match''​ interpreter,​ and integrate with the existing JIT support via the new ''​PCRE2_MATCH_INVALID_UTF''​ compile-time option.
 +  * Adjust the limit for "must have" code unit searching, in particular, increase it substantially for non-anchored patterns.
 +  * Allow ''​(*ACCEPT)''​ to be quantified, because an ungreedy quantifier with a zero minimum is potentially useful.
 +  * Some changes to the way the minimum subject length is handled:
 +    * When ''​PCRE2_NO_START_OPTIMIZE''​ is set, no minimum length is computed.
 +    * An incorrect minimum length could be calculated for a pattern that contained ''(*ACCEPT)'' inside a quantified group whose minimum repetition was zero, for example ''A(?:(*ACCEPT))?B'', which incorrectly computed a minimum of 2. The minimum length scan no longer happens for a pattern that contains ''(*ACCEPT)''.
 +    * When no minimum length is set by the normal scan, but a first and/or last code unit is recorded, set the minimum to 1 or 2 as appropriate.
 +    * When a pattern contains multiple groups with the same number, a back reference cannot know which one to scan for a minimum length. This used to cause the minimum length finder to give up with no result. Now it treats such references as not adding to the minimum length (which it should have done all along).
 +    * Furthermore,​ the above action now happens only if the back reference is to a group that exists more than once in a pattern instead of any back reference in a pattern with duplicate numbers.
 +  * A ''​(*MARK)''​ value inside a successful condition was not being returned by the interpretive matcher (it was returned by JIT). This bug has been mended.
 +  * The quantifier ''​{1}''​ was always being ignored, but this is incorrect when it is made possessive and applied to an item in parentheses,​ because a parenthesized item may contain multiple branches or other backtracking points, for example ''​(a|ab){1}+c''​ or ''​(a+){1}+a''​.
 +  * DFA matching (using ''​pcre2_dfa_match''​) was not recognising a partial match if the end of the subject was encountered in a lookahead (conditional or otherwise), an atomic group, or a recursion.
 +  * Check for integer overflow when computing lookbehind lengths.
 +  * Implement non-atomic positive lookaround assertions.
 +  * If a lookbehind contained a lookahead that contained another lookbehind within it, the nested lookbehind was not correctly processed. For example, if ''​(?​%%<​%%=(?​=(?​%%<​%%=a)))b''​ was matched to "​ab"​ it gave no match instead of matching "​b"​.
 +  * Implemented ''​pcre2_get_match_data_size''​.
 +  * Two alterations to partial matching:
 +    * The definition of a partial match is slightly changed: if a pattern contains any lookbehinds, an empty partial match may be given, because this is another situation where adding characters to the current subject can lead to a full match. Example: ''c*+(?%%<%%=[bc])'' with subject "ab". Similarly, if a pattern could match an empty string, an empty partial match may be given. Example: ''(?![ab]).*'' with subject "ab". This case applies only to ''PCRE2_PARTIAL_HARD''.
 +    * An empty string partial hard match can be returned for ''​\z''​ and ''​\Z''​ as it is documented that they shouldn'​t match.
 +  * A branch that started with ''​(*ACCEPT)''​ was not being recognized as one that could match an empty string.
 +  * Corrected ''​pcre2_set_character_tables''​ tables data type: was const ''​C_unsigned_char_num_ptr''​ instead of const ''​C_uint8_t_ptr'',​ as generated by ''​pcre2_maketables''​.
 +  * Upgraded to Unicode 12.1.0.
 +  * If the length of one branch of a group exceeded 65535 (the maximum value that is remembered as a minimum length), the whole group'​s length was incorrectly recorded as 65535, leading to incorrect "no match" when start-up optimizations were in force.
 +  * The "​rightmost consulted character"​ value was not always correct; in particular, if a pattern ended with a negative lookahead, characters that were inspected in that lookahead were not included.
 +  * Add the ''​pcre2_maketables_free''​ function.
 +  * The start-up optimization that looks for a unique initial matching code unit in the interpretive engines uses memchr() in 8-bit mode. When the search is caseless, it was doing so inefficiently,​ which ended up slowing down the match drastically when the subject was very long. The revised code (a) remembers if one case is not found, so it never repeats the search for that case after a bumpalong and (b) when one case has been found, it searches only up to that position for an earlier occurrence of the other case. This fix applies to both interpretive ''​pcre2_match''​ and to ''​pcre2_dfa_match''​.
 +  * While scanning to find the minimum length of a group, if any branch has minimum length zero, there is no need to scan any subsequent branches (a small compile-time performance improvement).
 +  * Add a check in JIT for the underflow that may occur when the value of the subject string pointer is close to 0.
 +  * Arrange for classes such as ''​[Aa]''​ which contain just the two cases of the same character, to be treated as a single caseless character. This causes the first and required code unit optimizations to kick in where relevant.
 +  * Improve the bitmap of starting bytes for positive classes that include wide characters, but no property types, in UTF-8 mode. Previously, on encountering such a class, the bits for all bytes greater than $c4 were set, thus specifying any character with codepoint >= $100. Now the only bits that are set are for the relevant bytes that start the wide characters. This can give a noticeable performance improvement.
 +  * If the bitmap of starting code units contains only 1 or 2 bits, replace it with a single starting code unit (1 bit) or a caseless single starting code unit if the two relevant characters are case-partners. This is particularly relevant to the 8-bit library, though it applies to all. It can give a performance boost for patterns such as ''[Ww]ord'' and ''(word|WORD)''. However, this optimization doesn't happen if there is a "required" code unit of the same value (because the search for a "required" code unit starts at the match start for non-unique first code unit patterns, but after a unique first code unit, and patterns such as ''a*a'' need the former action).
 +  * If a non-ASCII character was the first in a starting assertion in a caseless match, the "first code unit" optimization did not get the casing right, and the assertion failed to match a character in the other case if it did not start with the same code unit.
 +  * Detect empty matches in JIT.
 +  * Fix a JIT bug that allowed the fields of the compiled pattern to be read before its existence was checked.
 +  * Capturing groups that contained recursive back references to themselves are no longer atomic.
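The revised caseless start-up search in the ''memchr()'' item above can be sketched in Python. This is only an illustration of refinements (a) and (b), not PCRE2's code: the class name `CaselessFirstByteSearch` and its fields are hypothetical, and `bytes.find` stands in for memchr().

```python
class CaselessFirstByteSearch:
    """Find the earliest occurrence of either case of one byte (hypothetical helper)."""

    def __init__(self, subject: bytes, lower: int, upper: int):
        self.subject = subject
        self.cases = [bytes([lower]), bytes([upper])]
        # (a) Remember when a case is known not to occur in the rest of the subject.
        self.absent = [False, False]

    def find(self, start: int) -> int:
        """Return the first position >= start holding either case, or -1."""
        best = -1
        for i, needle in enumerate(self.cases):
            if self.absent[i]:
                continue  # never repeat a search that already reached the end
            # (b) Once one case is found, look for the other only before it.
            end = len(self.subject) if best < 0 else best
            pos = self.subject.find(needle, start, end)
            if pos >= 0:
                best = pos
            elif best < 0:
                # The search ran to the end of the subject, so this case
                # cannot appear after any later bumpalong either.
                self.absent[i] = True
        return best


search = CaselessFirstByteSearch(b"xxxWxxwxx", ord("w"), ord("W"))
first = search.find(0)   # both cases present; the earlier one wins
second = search.find(4)  # the bounded search for "W" stops before the "w"
```

Because bumpalongs only move forward, an "absent" flag set once stays valid for the rest of the match attempt.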
 +
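The narrower starting-byte bitmap for wide-character classes in UTF-8 mode (see the 1.12.0 item on positive classes above) can be illustrated as follows. This is a sketch under simplifying assumptions: the class is given as a plain list of code points, and the function names `starting_byte_bitmap` and `may_start_match` are hypothetical.

```python
def starting_byte_bitmap(code_points):
    """Set one bit per distinct first UTF-8 byte of the given code points."""
    bitmap = bytearray(32)  # 256 bits, one per possible starting byte
    for cp in code_points:
        first = chr(cp).encode("utf-8")[0]
        bitmap[first >> 3] |= 1 << (first & 7)
    return bitmap


def may_start_match(bitmap, byte):
    """True if a match could start with this byte value."""
    return bool(bitmap[byte >> 3] & (1 << (byte & 7)))


# A class such as [\x{100}\x{20ac}]: U+0100 starts with 0xC4, U+20AC with 0xE2.
bitmap = starting_byte_bitmap([0x100, 0x20AC])
```

With only two bits set, a subject scan can skip every position whose byte is neither 0xC4 nor 0xE2, instead of stopping at every byte of any multi-byte character.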
 +=====YuPcre2 1.11.0 – 8 Oct 2019=====
 +
 +  * Fix subject buffer overread in JIT when UTF is disabled and ''​\X''​ or ''​\R''​ has a greater than 1 fixed quantifier.
 +  * Added support for callouts from ''​pcre2_substitute''​.
 +  * Fix an xclass matching issue in JIT.
 +  * Implement ''​PCRE2_EXTRA_ESCAPED_CR_IS_LF''​.
 +  * Implement the Perl 5.28 experimental alphabetic names for atomic groups and lookaround assertions, for example, ''​(*pla:​...)''​ and ''​(*atomic:​...)''​. These are characterized by a lower case letter following ''​(*''​.
 +  * Implement the new Perl "​script run" features ''​(*script_run:​...)''​ and ''​(*atomic_script_run:​...)''​ aka ''​(*sr:​...)''​ and ''​(*asr:​...)''​.
 +  * Implement ''PCRE2_COPY_MATCHED_SUBJECT'' for ''pcre2_match'' (including JIT via ''pcre2_match'') and ''pcre2_dfa_match'', but **not** the ''pcre2_jit_match'' fast path. Also, when a match fails, set the subject field in the match data to nil for tidiness; none of the substring extractors should reference this after match failure.
 +  * If a pattern started with a subroutine call that had a quantifier with a minimum of zero, an incorrect "match must start with this character"​ could be recorded. Example: ''​(?&​xxx)*ABC(?​%%<​%%xxx>​XYZ)''​ would (incorrectly) expect '​A'​ to be the first character of a match.
 +  * The heap limit checking code in ''​pcre2_dfa_match''​ could suffer from overflow if the heap limit was set very large. This could cause incorrect "heap limit exceeded"​ errors.
 +  * If a pattern started with ''(*MARK)'', ''(*COMMIT)'', ''(*PRUNE)'', ''(*SKIP)'', or ''(*THEN)'' followed by ''^'' it was not recognized as anchored.
 +  * With ''​PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL''​ set, escape sequences such as ''​\s''​ which are valid in character classes, but not as the end of ranges, were being treated as literals. An example is ''​[_-\s]''​ (but not ''​[\s-_]''​ because that gave an error at the //start// of a range). Now an "​invalid range" error is given independently of ''​PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL''​.
 +  * ''​PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL''​ was affecting known escape sequences such as ''​\eX''​ when they appeared invalidly in a character class. Now the option applies only to unrecognized or malformed escape sequences.
 +  * The ''​pcre2_dfa_match''​ function was incorrectly handling conditional version tests such as ''​(?​(VERSION>​=0)...)''​ when the version test was true. Incorrect processing or a crash could result.
 +  * When ''​PCRE2_UTF''​ is set, allow non-ASCII letters and decimal digits in group names, as Perl does.
 +  * Implemented ''​PCRE2_EXTRA_ALT_BSUX''​ to support ECMAScript 6's ''​\u{hhh}''​ construct.
 +  * Compile ''​\p{Any}''​ to be the same as ''​.''​ in ''​PCRE2_DOTALL''​ mode, so that it benefits from auto-anchoring if ''​\p{Any}*''​ starts a pattern.
 +  * Disable SSE2 JIT optimizations in x86 CPUs when SSE2 is not available.
 +  * Improve DIUtils.pas Unicode processing to support Unicode Code Points from $000000 to $10FFFF. Adjust remaining source code accordingly.
 +  * Update DIUtils.pas Unicode functions to Unicode 12.1.0.
 +  * Remove ''​DI.inc''​ include file. Directly link in ''​DICompilers.inc''​ instead.
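The 1.11.0 fix for patterns that begin with a backtracking verb followed by ''^'' can be pictured with a toy check. This is a deliberately simplified sketch and not how PCRE2 decides anchoring: it works on the pattern text only, ignores multiline mode and alternation, and the helper name `starts_with_caret` is hypothetical.

```python
import re

# The leading verbs named in the changelog entry, with an optional ":name" argument.
_LEADING_VERB = re.compile(r"\(\*(?:MARK|COMMIT|PRUNE|SKIP|THEN)(?::[^)]*)?\)")


def starts_with_caret(pattern: str) -> bool:
    """Skip any leading backtracking verbs, then test for '^'."""
    pos = 0
    while True:
        m = _LEADING_VERB.match(pattern, pos)
        if m is None:
            break
        pos = m.end()
    return pattern.startswith("^", pos)


anchored = starts_with_caret("(*COMMIT)^abc")    # verb is skipped, '^' is seen
unanchored = starts_with_caret("(*COMMIT)abc^")  # '^' is not at the start
```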
 +
 +=====YuPcre2 1.10.0 – 7 Mar 2019=====
 +
 +  * Fix: ''​TDIRegEx2_8.Replace''​ and ''​TDIRegEx2_16.Replace''​ did not return the start of the string if StartOffset > 0.
 +  * Adjust ''TDIRegEx2SearchStream_Enc'' to DIConverters 1.18.0: Converter functions now use the native unsigned integer type for the length of a string and support strings longer than 2 GB. This change only affects projects using DIConverters 1.18.0.
 +
 +=====YuPcre2 1.9.2 – 8 Jan 2019=====
 +
 +  * Matching the pattern ''​(*UTF)\C[^\v]+\x80''​ against an 8-bit string containing multi-code-unit characters caused bad behaviour and possibly a crash.
 +  * When returning an error from ''pcre2_pattern_convert'', ensure the error offset is set to zero for early errors.
 +  * Refactored ''​pcre2_dfa_match''​ so that the internal recursive calls no longer use the stack for local workspace and local ovectors. Instead, an initial block of stack is reserved, but if this is insufficient,​ heap memory is used. The heap limit parameter now applies to ''​pcre2_dfa_match''​.
 +  * In ''pcre2_substitute'', with global matching, a pattern that matched an empty string, but never at the starting match offset, was not handled in a Perl-compatible way. The pattern ''(?%%<%%=\G.)'' is an example of such a pattern. Because ''\G'' is in a lookbehind assertion, there has to be a "bumpalong" before there can be a match. The automatic "advance by one character after an empty string match" rule is therefore inappropriate. A more complicated algorithm has now been implemented.
 +  * When checking to see if a lookbehind is of fixed length, lookaheads were correctly ignored, but quantifiers on lookaheads were not being ignored, leading to an incorrect "lookbehind assertion is not fixed length" error.
 +  * Updated to Unicode version 11.0.0. As well as the usual addition of new scripts and characters, this involved re-jigging the grapheme break property algorithm because Unicode has changed the way emojis are handled.
 +  * Fixed an obscure bug that struck when there were two atomic groups not separated by something with a backtracking point. There could be an incorrect backtrack into the first of the atomic groups. A complicated example is ''(?>a(*:1))(?>b)(*SKIP:1)x|.*'' matched against "abc", where the ''*SKIP'' shouldn't find a MARK (because it is in an atomic group), but it did.
 +  * ''​(*ACCEPT:​ARG)'',​ ''​(*FAIL:​ARG)'',​ and ''​(*COMMIT:​ARG)''​ are now supported.
 +  * A ''​(*MARK)''​ name was not being passed back for positive assertions that were terminated by ''​(*ACCEPT)''​.
 +  * Add support for ''​\N{U+dddd}'',​ but only in Unicode mode.
 +  * Add support for ''​(?​^)''​ for unsetting all ''​imnsx''​ options.
 +  * The ''​PCRE2_EXTENDED''​ (''/​x''​) option only ever discarded space characters whose code point was less than 256. Now, when Unicode support is compiled, ''​PCRE2_EXTENDED''​ also discards U+0085, U+200E, U+200F, U+2028, and U+2029, which are additional characters defined by Unicode as "​Pattern White Space"​. This makes PCRE2 compatible with Perl.
 +  * In certain circumstances,​ option settings within patterns were not being correctly processed. For example, the pattern ''​%%((%%?​i)A)(?​m)B''​ incorrectly matched "​ab"​. (The ''​(?​m)''​ setting lost the fact that ''​(?​i)''​ should be reset at the end of its group during the parse process, but without another setting such as ''​(?​m)''​ the compile phase got it right.)
 +  * When serializing a pattern, set the memctl, executable_jit,​ and tables fields (that is, all the fields that contain pointers) to zeros so that the result of serializing is always the same. These fields are re-set when the pattern is deserialized.
 +  * In a pattern such as ''​[^\x{100}-\x{ffff}]*[\x80-\xff]''​ which has a repeated negative class with no characters less than 0x100 followed by a positive class with only characters less than 0x100, the first class was incorrectly being auto-possessified,​ causing incorrect match failures.
 +  * If the only branch in a conditional subpattern was anchored, the whole subpattern was treated as anchored, when it should not have been, since the assumed empty second branch cannot be anchored. Demonstrated by test patterns such as ''​(?​(1)^())b''​ or ''​(?​(?​=^))b''​.
 +  * A repeated conditional subpattern that could match an empty string was always assumed to be unanchored. Now it is checked just like any other repeated conditional subpattern, and can be found to be anchored if the minimum quantifier is one or more.
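The "Pattern White Space" set mentioned in the ''PCRE2_EXTENDED'' item above consists of the six ASCII white space characters plus the five code points listed there. A minimal sketch of the membership test follows; the names `PATTERN_WHITE_SPACE` and `strip_pattern_white_space` are hypothetical, and a real extended-mode parser must additionally respect escapes, character classes, and comments, which this sketch ignores.

```python
# Tab, LF, VT, FF, CR, space, then U+0085, U+200E, U+200F, U+2028, U+2029.
PATTERN_WHITE_SPACE = frozenset(
    "\t\n\x0b\x0c\r \u0085\u200e\u200f\u2028\u2029"
)


def is_pattern_white_space(ch: str) -> bool:
    """True if ch is one of the Pattern White Space characters."""
    return ch in PATTERN_WHITE_SPACE


def strip_pattern_white_space(text: str) -> str:
    """Drop every Pattern White Space character (no escape handling)."""
    return "".join(ch for ch in text if ch not in PATTERN_WHITE_SPACE)


cleaned = strip_pattern_white_space("a\u2028b \u200ec")
```

Note that U+00A0 (no-break space) is not in the set, so it would remain part of the pattern.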
 +
 +=====YuPcre2 1.9.1 – 1 Jan 2019=====
 +
 +  * Fix ''TDIRegEx2_16.MatchNext'' which might not have properly advanced the start offset if the previous match was an empty string.
 +  * In YuPcre2_RegEx2.pas,​ replace a few character constants with ordinal constants to work around duplicate case label errors with at least one Delphi 10.3 Rio installation.
 +
 +=====YuPcre2 1.9.0 – 24 Dec 2018=====
 +
 +  * Support Delphi 10.3 Rio Win32 and Win64.
 +
 +=====YuPcre2 1.8.0 – 2 Mar 2018=====
 +
 +  * Add new ''​pcre2_config''​ options: ''​PCRE2_CONFIG_NEVER_BACKSLASH_C''​ and ''​PCRE2_CONFIG_COMPILED_WIDTHS''​.
 +  * Defined public names for all the ''​pcre2_compile''​ error numbers.
 +  * When an assertion contained (*ACCEPT) it caused all open capturing groups to be closed (as for a non-assertion ACCEPT), which was wrong and could lead to misbehaviour for subsequent references to groups that started outside the assertion. ACCEPT in an assertion now closes only those groups that were started within that assertion.
 +  * Although ''​pcre2_jit_match''​ checks whether the pattern is compiled in a given mode, it was also expected that at least one mode is available. This is fixed and ''​pcre2_jit_match''​ returns with ''​PCRE2_ERROR_JIT_BADOPTION''​ when the pattern is not optimized by JIT at all.
 +  * If a backreference with a minimum repeat count of zero was first in a pattern, apart from assertions, an incorrect first matching character could be recorded. For example, for the pattern ''​(?​=(a))\1?​b'',​ "​b"​ was incorrectly set as the first character of a match.
 +  * Characters in a leading positive assertion are considered for recording a first character of a match when the rest of the pattern does not provide one. However, a character in a non-assertive group within a leading assertion such as in the pattern ''​(?​=(a))\1?​b''​ caused this process to fail. This was an infelicity rather than an outright bug, because it did not affect the result of a match, just its speed. (In fact, in this case, the starting '​a'​ was subsequently picked up in the study.)
 +  * Allocate a single callout block on the stack at the start of ''​pcre2_match''​ and set its never-changing fields once only. Do the same for ''​pcre2_dfa_match''​.
 +  * Save the extra compile options (set in the compile context) with the compiled pattern (they were not previously saved), add ''​PCRE2_INFO_EXTRAOPTIONS''​ to retrieve them.
 +  * Added ''​PCRE2_CALLOUT_STARTMATCH''​ and ''​PCRE2_CALLOUT_BACKTRACK''​ bits to a new field callout_flags in callout blocks. The bits are set by ''​pcre2_match'',​ but not by JIT or ''​pcre2_dfa_match''​. These bits are provided to help with tracking how a backtracking match is proceeding.
 +  * When ''​PCRE2_FIRSTLINE''​ without ''​PCRE2_NO_START_OPTIMIZE''​ was used in non-JIT matching (both ''​pcre2_match''​ and ''​pcre2_dfa_match''​) and the matched string started with the first code unit of a newline sequence, matching failed because it was not tried at the newline.
 +  * Code for giving up a non-partial match after failing to find a starting code unit anywhere in the subject was missing when searching for one of a number of code units (the bitmap case) in both ''​pcre2_match''​ and ''​pcre2_dfa_match''​. This was a missing optimization rather than a bug.
 +  * The JIT compiler has been updated.
 +  * Avoid pointer overflow for unset captures in ''​pcre2_substring_list_get''​. This could not actually cause a crash because it was always used in a memcpy() call with zero length.
 +  * Auto-possessification at the end of a capturing group was dependent on what follows the group (e.g. ''​(a+)b''​ would auto-possessify the ''​a+''​) but this caused incorrect behaviour when the group was called recursively from elsewhere in the pattern where something different might follow. Iterators at the ends of capturing groups are no longer considered for auto-possessification if the pattern contains any recursions.
 +
 +=====YuPcre2 1.7.0 – 16 Aug 2017=====
 +
 +  * Implement ''​PCRE2_ENDANCHORED'',​ ''​coEndAnchored'',​ and ''​moEndAnchored''​.
 +  * Add an explicit limit on the amount of heap used by ''​pcre2_match'',​ set by ''​pcre2_set_heap_limit'',​ ''​TDIPerlRegEx2_8.HeapLimit'',​ ''​TDIDfaRegEx2_16.HeapLimit'',​ and the pattern start ''​(*LIMIT_HEAP=xxx)''​.
 +  * Extend auto-anchoring etc. to ignore groups with a zero quantifier and single-branch conditions with a false condition (e.g. DEFINE) at the start of a branch. For example, ''(?(DEFINE)...)^A'' and ''(...){0}^B'' are now flagged as anchored.
 +  * Implement ''​PCRE2_EXTENDED_MORE''​ and ''​coExtendedMore'',​ and related ''/​xx''​ and ''​(?​xx)''​ features.
 +  * Implement ''​(?​n:''​ for ''​PCRE2_NO_AUTO_CAPTURE''​ and ''​coNoAutoCapture'',​ because Perl now has this.
 +  * Implement extra compile options in the compile context:
 +    * ''​PCRE2_EXTRA_ALLOW_SURROGATE_ESCAPES''​ and ''​coAllowSurrogateEscapes'';​
 +    * ''​PCRE2_EXTRA_BAD_ESCAPE_IS_LITERAL''​ and ''​coBadEscapeIsLiteral'';​
 +    * ''​PCRE2_EXTRA_MATCH_LINE''​ and ''​coMatchLine'';​
 +    * ''​PCRE2_EXTRA_MATCH_WORD''​ and ''​coMatchWord''​.
 +  * Implement newline type ''​PCRE2_NEWLINE_NUL''​.
 +  * A lookbehind assertion that had a zero-length branch caused undefined behaviour when processed by ''​pcre2_dfa_match''​.
 +  * The match limit value now also applies to ''​pcre2_dfa_match''​ as there are patterns that can use up a lot of resources without necessarily recursing very deeply.
 +  * Implement ''​PCRE2_LITERAL''​ and ''​coLiteral''​.
 +  * Increased the limit for searching for a "must be present"​ code unit in subjects from 1000 to 2000 for 8-bit searches, since they are much faster.
 +  * Arrange for anchored patterns to record and use "first code unit" data, because this can give a fast "no match" without searching for a "​required code unit". Previously only non-anchored patterns did this.
 +  * Upgraded the Unicode tables from Unicode 8.0.0 to Unicode 10.0.0.
 +  * Update extended grapheme breaking rules to the latest set that are in Unicode Standard Annex #29.
 +  * Added experimental foreign pattern conversion facilities (''​pcre2_pattern_convert''​ and friends).
 +  * If a hyphen that follows a character class is the last character in the class, Perl does not give a warning. PCRE2 now also treats this as a literal.
 +  * PCRE2 was not throwing an error for ''​[\d-X]''​ (and similar escapes), as is documented.
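The effect of the ''PCRE2_EXTRA_MATCH_LINE'' and ''PCRE2_EXTRA_MATCH_WORD'' options listed above can be approximated by wrapping the pattern, which is how the PCRE2 API documentation describes them: as if "^(?:" and ")$", or "\b(?:" and ")\b", were placed around the pattern. The sketch below uses Python's `re` module as a stand-in engine; the helper names `match_line` and `match_word` are hypothetical.

```python
import re


def match_line(pattern: str) -> "re.Pattern[str]":
    """Approximate PCRE2_EXTRA_MATCH_LINE: the whole line must match."""
    return re.compile("^(?:" + pattern + ")$")


def match_word(pattern: str) -> "re.Pattern[str]":
    """Approximate PCRE2_EXTRA_MATCH_WORD: the match must be a whole word."""
    return re.compile(r"\b(?:" + pattern + r")\b")


whole_line = match_line("cat|dog").search("dog")       # matches
partial_line = match_line("cat|dog").search("hotdog")  # no match
whole_word = match_word("cat").search("a cat sat")     # matches
inside_word = match_word("cat").search("concatenate")  # no match
```

The non-capturing group matters: without it, ''^cat|dog$'' would let the second alternative match at the end of "hotdog".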
 +
 +=====YuPcre2 1.6.0 – 3 Apr 2017=====
 +
 +**New features:**
 +
 +  * Support Delphi 10.2 Tokyo Win32 and Win64.
 +  * The main interpreter,​ ''​pcre2_match'',​ has been refactored into a new version that does not use recursive function calls (and therefore the stack) for remembering backtracking positions. The new implementation allows backtracking into recursive group calls in patterns, making it more compatible with Perl, and also fixes some other hard-to-do issues.
 +    * Now that ''​pcre2_match''​ no longer uses recursive function calls (see above), the "match limit recursion"​ value seems misnamed. It still exists, and limits the depth of tree that is searched. To avoid future confusion, it has been renamed as "depth limit" in all relevant places (''​TDIRegEx2Base.MatchLimitDepth'',​ ''​PCRE2_INFO_DEPTHLIMIT'',​ ''​PCRE2_CONFIG_DEPTHLIMIT'',​ ''​PCRE2_ERROR_DEPTHLIMIT'',​ ''​pcre2_set_depth_limit'',​ etc.) but the old names are still available for backwards compatibility.
 +    * ''PCRE2_CONFIG_STACKRECURSE'' is no longer used and is deprecated.
 +  * Added the ''​PCRE2_INFO_FRAMESIZE''​ item to ''​pcre2_pattern_info''​ and the ''​InfoFrameSize''​ property to ''​TDIRegEx2_8''​ as well as ''​TDIRegEx2_16.InfoFrameSize''​.
 +  * The depth (formerly recursion) limit now applies to DFA matching.
 +
 +**Bug fixes:**
 +
 +  * In the 32-bit library in non-UTF mode, an attempt to find a Unicode property for a character with a code point greater than 0x10ffff (the Unicode maximum) caused a crash.
 +  * If a lookbehind assertion that contained a back reference to a group appearing later in the pattern was compiled with the ''​PCRE2_ANCHORED''​ option, undefined actions (often a segmentation fault) could occur, depending on what other options were set. An example assertion is ''​(?​%%<​%%!\1(abc))''​ where the reference ''​\1''​ precedes the group ''​(abc)''​.
 +  * Fix memory leak in ''​pcre2_serialize_decode''​ when the input is invalid.
 +  * Fix potential nil dereference in ''​pcre2_callout_enumerate''​ if called with a nil pattern pointer.
 +  * The alternative matching function, ''pcre2_dfa_match'', misbehaved if it encountered a character class with a possessive repeat, for example ''[a-f]{3}+''.
 +
 +=====YuPcre2 1.5.0 – 17 Feb 2017=====
 +
 +**New features:**
 +
 +  * Implemented ''​pcre2_code_copy_with_tables''​.
 +  * ''​\g{+%%<​%%number>​}''​ (e.g. ''​\g{+2}''​) is now supported. It is a "​forward back reference"​ and can be useful in repetitions (compare ''​\g{-%%<​%%number>​}''​). Perl does not recognize this syntax.
 +
 +**Optimizations:​**
 +
 +  * When a pattern is too complicated,​ PCRE2 gives up trying to find a minimum matching length and just records zero. Typically this happens when there are too many nested or recursive back references. If the limit was reached in certain recursive cases it failed to be triggered and an internal error could be the result.
 +  * The ''pcre2_dfa_match'' function now takes note of the recursion limit for the internal recursive calls that are used for lookarounds and recursions within the pattern.
 +  * Detecting patterns that are too large inside the length-measuring loop saves processing ridiculously long patterns to their end.
 +  * When autopossessifying,​ skip empty branches without recursion, to reduce stack usage. Example pattern: ''​X?​(R||){3335}''​.
 +  * A pattern with very many explicit back references to a group that is a long way from the start of the pattern could take a long time to compile because searching for the referenced group in order to find the minimum length was being done repeatedly. Now up to 128 group minimum lengths are cached and the attempt to find a minimum length is abandoned if there is a back reference to a group whose number is greater than 128. (In that case, the pattern is so complicated that this optimization probably isn't worth it.)
 +
 +**Bug fixes:**
 +
 +  * In any wide-character mode (8-bit UTF or any 16-bit or 32-bit mode), without PCRE2_UCP set, a negative character type such as ''​\D''​ in a positive class should cause all characters greater than 255 to match, whatever else is in the class. There was a bug that caused this not to happen if a Unicode property item was added to such a class, for example ''​[\D\P{Nd}]''​ or ''​[\W\pL]''​.
 +  * There has been a major re-factoring of ''​pcre2_compile''​. Most syntax checking is now done in the pre-pass that identifies capturing groups. While doing this, some minor bugs and Perl incompatibilities were fixed, including:
 +    - ''​\Q\E''​ in the middle of a quantifier such as ''​A+\Q\E+''​ is now ignored instead of giving an invalid quantifier error.
 +    - ''​{0}''​ can now be used after a group in a lookbehind assertion; previously this caused an "​assertion is not fixed length"​ error.
 +    - Perl always treats ''​(?​(DEFINE)''​ as a "​define"​ group, even if a group with the name "​DEFINE"​ exists. PCRE2 now does likewise.
 +    - A recursion condition test such as ''​(?​(R2)...)''​ must now refer to an existing subpattern.
 +    - A conditional recursion test such as ''​(?​(R)...)''​ misbehaved if there was a group whose name began with "​R"​.
 +    - A hyphen appearing immediately after a POSIX character class (for example ''​%%[[%%:​ascii:​]-z]''​) now generates an error. Perl does accept this as a literal, but gives a warning, so it seems best to fail it in PCRE.
 +    - An empty ''\Q\E'' sequence may appear after a callout that precedes an assertion condition (it is, of course, ignored).\\ \\ One effect of the refactoring is that some error numbers and messages have changed, and the pattern offset given for compiling errors is not always the right-most character that has been read. In particular, for a variable-length lookbehind assertion it now points to the start of the assertion. Another change is that when a callout appears before a group, the "length of next pattern item" that is passed now just gives the length of the opening parenthesis item, not the length of the whole group. A length of zero is now given only for a callout at the end of the pattern. Automatic callouts are no longer inserted before and after explicit callouts in the pattern.
 +  * Back references are now permitted in lookbehind assertions when there are no duplicated group numbers (that is, ''(?|'' has not been used), and, if the reference is by name, there is only one group of that name. The referenced group must, of course, be of fixed length.
 +  * Automatic callouts are no longer generated before and after callouts in the pattern.
 +  * A number of bugs have been mended relating to match start-up optimizations when the first thing in a pattern is a positive lookahead. These all applied only when ''PCRE2_NO_START_OPTIMIZE'' was **not** set:
 +    - A pattern such as ''​(?​=.*X)X$''​ was incorrectly optimized as if it needed both an initial '​X'​ and a following '​X'​.
 +    - Some patterns starting with an assertion that started with ''​.*''​ were incorrectly optimized as having to match at the start of the subject or after a newline. There are cases where this is not true, for example, ''​(?​=.*[A-Z])(?​=.{8,​16})(?​!.*[\s])''​ matches after the start in lines that start with spaces. Starting ''​.*''​ in an assertion is no longer taken as an indication of matching at the start (or after a newline).
 +  * A pattern with ''​PCRE2_DOTALL''​ (''/​s''​) set but not ''​PCRE2_NO_DOTSTAR_ANCHOR'',​ and which started with ''​.*''​ inside a positive lookahead was incorrectly being compiled as implicitly anchored.
 +  * Fix out-of-bounds read for partial matching of ''​.''​ against an empty string when the newline type is CRLF.
 +  * The appearance of ''​\p'',​ ''​\P'',​ or ''​\X''​ in a substitution string when ''​PCRE2_SUBSTITUTE_EXTENDED''​ was set caused a segmentation fault (''​nil''​ dereference).
 +  * If the starting offset was specified as greater than the subject length in a call to ''​pcre2_substitute''​ an out-of-bounds memory reference could occur.
 +  * Incorrect data was compiled for a pattern with ''​PCRE2_UCP''​ set without ''​PCRE2_UTF''​ if a class required all wide characters to match (for example, ''​[\s[:​^ascii:​]]''​).
 +  * The limit in the auto-possessification code that was intended to catch overly-complicated patterns and not spend too much time auto-possessifying was being reset too often, resulting in very long compile times for some patterns. Now such patterns are no longer completely auto-possessified.
 +  * Ignore ''​PCRE2_CASELESS''​ when processing ''​\h'',​ ''​\H'',​ ''​\v'',​ and ''​\V''​ in classes as it just wastes time. In the UTF case it can also produce redundant entries in XCLASS lists caused by characters with multiple other cases and pairs of characters in the same "​not-x"​ sublists.
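The two lookahead patterns from the start-up optimization fixes above can be checked against any backtracking engine to see what the correct results are. The sketch below uses Python's `re` module as a stand-in (an assumption: for these particular patterns its semantics agree with PCRE2), and the helper `first_match_start` is hypothetical.

```python
import re


def first_match_start(pattern: str, subject: str):
    """Return the offset of the first match, or None if there is none."""
    m = re.search(pattern, subject)
    return None if m is None else m.start()


# Only one 'X' is required: the lookahead and the literal share it.
single_x = first_match_start(r"(?=.*X)X$", "abX")

# The match starts after the leading spaces, so a leading .* inside an
# assertion must not be taken to imply matching at the start of the subject.
after_spaces = first_match_start(
    r"(?=.*[A-Z])(?=.{8,16})(?!.*[\s])", "  Passw0rd"
)
```

An optimizer that demanded two ''X'' characters, or that tried the second pattern only at offset 0, would wrongly report "no match" for these subjects.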
 +
 +=====YuPcre2 1.4.0 – 31 Jul 2016=====
 +
 +**New features:**
 +
 +  * Implemented ''​pcre2_code_copy''​ to make a copy of a compiled pattern.
 +  * Implemented the ''​PCRE2_NO_JIT''​ option for ''​pcre2_match''​ and ''​moNoJit''​ option for ''​TDIRegEx2Base.MatchOptions''​.
 +  * Calls to ''​pcre2_get_error_message''​ with error numbers that are never returned by PCRE2 functions were returning empty strings. Now the error code ''​PCRE2_ERROR_BADDATA''​ is returned.
 +  * Allow ''​\C''​ in lookbehinds and DFA matching in UTF-32 mode.
 +
 +**Bug fixes:**
 +
 +  * Detect unmatched closing parentheses and give the error in the pre-scan instead of later. Previously the pre-scan carried on and could give a misleading incorrect error message. For example, ''​(?​J)(?'​a'​))(?'​a'​)''​ gave a message about invalid duplicate group names.
 +  * A pattern that included ''​(*ACCEPT)''​ in the middle of a sufficiently deeply nested set of parentheses of sufficient size caused an overflow of the compiling workspace (which was diagnosed, but of course is not desirable).
 +  * Detect missing closing parentheses during the pre-pass for group identification.
 +  * Fix a race condition in JIT.
 +  * Fix register overwrite in JIT when SSE2 acceleration is enabled.
 +
 +=====YuPcre2 1.3.0 – 7 May 2016=====
 +
 +  * Support Delphi 10.1 Berlin Win32 and Win64.
 +
 +=====YuPcre2 1.2.0 – 4 Mar 2016=====
 +
 +**New Features:**
 +
 +  * New option to limit the length of a pattern: ''​TDIRegEx2Base.MaxPatternLength''​ and ''​pcre2_set_max_pattern_length''​.
 +  * New option to limit the offset of unanchored matches: ''​TDIRegEx2Base.OffsetLimit''​ and ''​pcre2_set_offset_limit''​.
 +  * New ''​pcre2_substitute''​ options ''​PCRE2_SUBSTITUTE_EXTENDED'',​ ''​PCRE2_SUBSTITUTE_UNSET_EMPTY'',​ ''​PCRE2_SUBSTITUTE_UNKNOWN_UNSET'',​ and ''​PCRE2_SUBSTITUTE_OVERFLOW_LENGTH''​.
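
The offset limit listed above restricts how far into the subject an unanchored match may start. The following is a minimal sketch of that idea only, written with Python's standard ''re'' module rather than YuPcre2 or the PCRE2 C API. The function name ''search_with_offset_limit'' is hypothetical, and the sketch assumes that a match starting exactly at the limit is still accepted.

```python
import re


def search_with_offset_limit(pattern, subject, offset_limit, start=0):
    """Emulate an offset limit with Python's re module (illustration only).

    Returns the leftmost match if it starts at or before offset_limit,
    otherwise None. This is not the YuPcre2 implementation.
    """
    match = re.compile(pattern).search(subject, start)
    # search() returns the leftmost match, so if that one starts beyond
    # the limit, no other match can start within the limit either.
    if match is None or match.start() > offset_limit:
        return None
    return match


if __name__ == "__main__":
    # "abc" starts at offset 3 in "123abc".
    print(search_with_offset_limit("abc", "123abc", 2))  # limit too small
    print(search_with_offset_limit("abc", "123abc", 3))  # limit reached
```

A real offset limit also saves work, because the matcher stops advancing through the subject once the limit is passed. This sketch filters the result afterwards and so shows only the visible outcome.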
 +
 +**Bug fixes:**
 +
 +  * In a character class such as ''​[\W\p{Any}]''​ where both a negative-type escape ("not a word character"​) and a property escape were present, the property escape was being ignored.
 +  * Fixed integer overflow for patterns whose minimum matching length is extremely large.
 +  * The special sequences ''​%%[%%[:​%%<​%%:​]]''​ and ''​%%[%%[:>:​]]''​ gave rise to incorrect compiling errors or other strange effects if compiled in UCP mode.
 +  * Adding group information caching improves the speed of compiling when checking whether a group has a fixed length and/or could match an empty string, especially when recursion or subroutine calls are involved.
 +  * If ''​[:​^ascii:​]''​ or ''​[:​^xdigit:​]''​ are present in a non-negated class, all characters with code points greater than 255 are in the class. When a Unicode property was also in the class (if ''​PCRE2_UCP''​ is set, escapes such as ''​\w''​ are turned into Unicode properties),​ wide characters were not correctly handled, and could fail to match. Negated classes such as ''​[^[:​^ascii:​]\d]''​ were also not working correctly in UCP mode.
 +  * If ''​PCRE2_AUTO_CALLOUT''​ was set on a pattern that had a ''​(?#''​ comment between an item and its quantifier (for example, ''​A(?#​comment)?​B''​),​ ''​pcre2_compile''​ misbehaved.
 +  * Similarly, if an isolated ''​\E''​ was present between an item and its quantifier when ''​PCRE2_AUTO_CALLOUT''​ was set, ''​pcre2_compile''​ misbehaved.
 +  * The error for an invalid UTF pattern string always gave the code unit offset as zero instead of where the invalidity was found.
 +  * An empty ''​\Q\E''​ sequence between an item and its quantifier caused ''​pcre2_compile''​ to misbehave when auto callouts were enabled.
 +  * If both ''​PCRE2_ALT_VERBNAMES''​ and ''​PCRE2_EXTENDED''​ were set, and a ''​(*MARK)''​ or other verb "​name"​ ended with whitespace immediately before the closing parenthesis,​ ''​pcre2_compile''​ misbehaved. Example: ''​(*:​abc )'',​ but only when both those options were set.
 +  * In a number of places ''​pcre2_compile''​ was not handling NUL (binary zero) characters correctly.
 +  * If a pattern that was compiled with ''​PCRE2_EXTENDED''​ started with white space or a #-type comment that was followed by ''​(?​-x)'',​ which turns off ''​PCRE2_EXTENDED'',​ and there was no subsequent ''​(?​x)''​ to turn it on again, ''​pcre2_compile''​ assumed that ''​(?​-x)''​ applied to the whole pattern and consequently mis-compiled it. The fix for this bug means that a setting of any of the ''​(?​imsxU)''​ options at the start of a pattern is no longer transferred to the options that are returned by ''​PCRE2_INFO_ALLOPTIONS''​. In fact, this was an anachronism that should have changed when the effects of those options were all moved to compile time.
 +  * An escaped closing parenthesis in the "​name"​ part of a ''​(*verb)''​ when ''​PCRE2_ALT_VERBNAMES''​ was set caused ''​pcre2_compile''​ to malfunction.
 +
 +=====YuPcre2 1.1.0 – 15 Sep 2015=====
 +
 +  * Support Delphi 10 Seattle Win32 and Win64.
 +
 +  * Match limit check added to recursion.
 +  * Arrange for the UTF check in ''​pcre2_match''​ and ''​pcre2_dfa_match''​ to look only at the part of the subject that is relevant when the starting offset is non-zero.
 +  * Improve first character match in JIT with SSE2 on x86.
 +  * Fixed two assertion failures in JIT.
 +  * Fixed a corner case of range optimization in JIT.
 +  * Add the ''​${*MARK}''​ facility to ''​pcre2_substitute''​.
 +  * Implemented ''​PCRE2_ALT_VERBNAMES''​ and ''​coAltVerbnames''​.
 +  * Fixed two issues in JIT.
 +
 +=====YuPcre2 1.0.1 – 8 Aug 2015=====
 +
 +  * Pathological patterns containing many nested occurrences of ''​[:''​ caused ''​pcre2_compile''​ to run for a very long time.
 +  * A missing closing parenthesis for a callout with a string argument was not being diagnosed, possibly leading to a buffer overflow.
 +  * A conditional group with only one branch has an implicit empty alternative branch and must therefore be treated as potentially matching an empty string.
 +  * If ''​(?​R''​ was followed by ''​-''​ or ''​+'',​ the pattern was processed incorrectly instead of producing a diagnostic.
 +  * Conditional groups whose condition was an assertion preceded by an explicit callout with a string argument might be incorrectly processed, especially if the string contained ''​\Q''​.
 +  * Fix buffer overflow while checking a UTF-8 string if the final multi-byte UTF-8 character was truncated.
 +  * Finding the minimum matching length of complex patterns with back references and/or recursions can take a long time. There is now a cut-off that gives up trying to find a minimum length when things get too complex.
 +  * An optimization has been added that speeds up finding the minimum matching length for patterns containing repeated capturing groups or recursions.
 +  * If a pattern contained a back reference to a group whose number was duplicated as a result of appearing in a ''​(?​|...)''​ group, the computation of the minimum matching length gave a wrong result, which could cause incorrect "no match" errors. For such patterns, a minimum matching length cannot at present be computed.
 +  * Added a check for integer overflow in conditions ''​(?​(%%<​%%digits>​)''​ and ''​(?​(R%%<​%%digits>​)''​.
 +  * Fixed an issue when ''​\p{Any}''​ inside an xclass did not read the current character.
 +  * The JIT compiler did not restore the control verb head in the case of ''​(*THEN)''​ control verbs.
 +  * The way recursive references such as ''​(?​3)''​ are compiled has been re-written because the old way was the cause of many issues. Now, conversion of the group number into a pattern offset does not happen until the pattern has been completely compiled. This does mean that detection of all infinitely looping recursions is postponed till match time. In the past, some easy ones were detected at compile time.
 +  * A test for a back reference to a non-existent group was missing for items such as ''​\987''​. This caused incorrect code to be compiled.
 +  * Error messages for syntax errors following ''​\g''​ and ''​\k''​ were giving inaccurate offsets in the pattern.
 +  * Improve the performance of starting single character repetitions in JIT.
 +  * ''​(*LIMIT_MATCH=)''​ now gives an error instead of setting the value to 0.
 +  * Error messages for syntax errors in ''​(*LIMIT_MATCH=)''​ and ''​(*LIMIT_RECURSION=)''​ now give the right offset instead of zero.
 +  * The JIT compiler should not check repeats after a ''​{0,​1}''​ repeat byte code.
 +  * The JIT compiler should restore the control chain for empty possessive repeats.
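
One entry above notes that a conditional group with only one branch has an implicit empty alternative and can therefore match an empty string. A small sketch of that behaviour follows, using Python's standard ''re'' module as a stand-in (it is not YuPcre2 and its conditional syntax covers only group-reference conditions). The pattern and the helper name ''matches_fully'' are chosen purely for illustration.

```python
import re

# (a)?      optionally captures "a" as group 1
# (?(1)b)   conditional with a single branch: if group 1 took part in the
#           match, require "b"; otherwise use the implicit empty alternative
PATTERN = re.compile(r"(a)?(?(1)b)")


def matches_fully(subject):
    """Return True if the whole subject matches PATTERN."""
    return PATTERN.fullmatch(subject) is not None


if __name__ == "__main__":
    for subject in ("", "ab", "a", "b"):
        print(repr(subject), matches_fully(subject))
```

The empty subject matches because group 1 is unset and the missing "no" branch behaves as an empty alternative. This is why any minimum-length computation has to treat such a group as potentially matching nothing.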
 +
 +=====YuPcre2 1.0.0 – 22 Jul 2015=====
 +
 +  * Initial release.
 +