YuPcre2 is an up-to-date regular expression library for Delphi with Perl syntax. It directly supports UnicodeString, AnsiString, and UCS4String, as well as UTF-8 and UTF-16.
This document describes the differences and similarities between the new YuPcre2 and the old DIRegEx to help convert existing projects. If you have never used DIRegEx or are starting a new project with YuPcre2, you can skip this document.
YuPcre2 is a new project, not just a drastic update to DIRegEx. A lot has changed, even though some units, classes, and functions carry familiar names. Unfortunately, it was not possible to keep identical identifiers because Delphi rejects them if both YuPcre2 and DIRegEx are installed into the IDE. Overall, DIRegEx names have changed to DIRegEx2 where possible, which should simplify the transition to YuPcre2.
Unit names had to be changed to allow YuPcre2 to be installed into the IDE in parallel with DIRegEx. Unit names start with the YuPcre2 prefix. The native PCRE2 API is in YuPcre2.pas. DIRegEx units with class wrappers and helper routines have been renamed to YuPcre2_RegEx2…:
| DIRegEx | YuPcre2 |
|---|---|
| DIRegEx_Api.pas | YuPcre2.pas |
| n/a | YuPcre2OptInfo.pas |
| DIRegEx_Reg.pas | YuPcre2Reg.pas |
| DIRegEx.pas | YuPcre2_RegEx2.pas |
| DIRegEx_Consts.pas | YuPcre2_RegEx2_Consts.pas |
| DIRegEx_MaskControls.pas | YuPcre2_RegEx2_MaskControls.pas |
| DIRegEx_SearchStream.pas | YuPcre2_RegEx2_SearchStream.pas |
| DIRegEx_Utils.pas | YuPcre2_RegEx2_Utils.pas |
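
As a rough sketch of how the renamed units could be adopted step by step, a project might select between the old and new units with a conditional define. The unit name `RegExCompat` and the define `USE_YUPCRE2` are hypothetical and only serve as illustration; the library unit names are taken from the table above.

```delphi
unit RegExCompat; // hypothetical unit name, for illustration only

interface

uses
{$IFDEF USE_YUPCRE2} // hypothetical project-level define
  // New units, named as in the YuPcre2 column of the table
  YuPcre2_RegEx2, YuPcre2_RegEx2_Utils;
{$ELSE}
  // Old units, named as in the DIRegEx column of the table
  DIRegEx, DIRegEx_Utils;
{$ENDIF}

implementation

end.
```

Switching the uses clause alone is not enough for a full conversion, because class names differ between the two libraries as well, so code that refers to them needs the same kind of switch or a one-time rename.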
Class names now contain “RegEx2”: the number 2 is appended to “RegEx”. Most members, helper routines, and identifier names are unchanged. Deprecation warnings are issued where appropriate.
TDIRegEx2Base.CompileOptions is empty by default. In DIRegEx, coCaseLess and coDotAll were set by default. YuPcre2 excludes them for compatibility with PCRE2. If matching relies on these options, set them like this:
{ Set YuPcre2 CompileOptions to DIRegEx default: }
RegEx.CompileOptions := [coCaseLess, coDotAll];
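
If a project sets further compile options elsewhere, assigning the set outright would discard them. A minimal sketch of a helper that adds the two former defaults while keeping whatever is already set could look like this. The unit name `RegExCompatDefaults` and the procedure `ApplyDIRegExDefaults` are hypothetical, and the sketch assumes that TDIRegEx2Base, coCaseLess, and coDotAll are declared in YuPcre2_RegEx2 and that CompileOptions is accessible on the base class.

```delphi
unit RegExCompatDefaults; // hypothetical unit name, for illustration only

interface

uses
  YuPcre2_RegEx2; // assumed to declare TDIRegEx2Base and the co... options

{ Hypothetical helper: adds the former DIRegEx default options. }
procedure ApplyDIRegExDefaults(const RegEx: TDIRegEx2Base);

implementation

procedure ApplyDIRegExDefaults(const RegEx: TDIRegEx2Base);
begin
  // Set union keeps options that are already present and adds the two
  // options that DIRegEx used to enable by default.
  RegEx.CompileOptions := RegEx.CompileOptions + [coCaseLess, coDotAll];
end;

end.
```

Calling the helper once after creating each regular expression object keeps the old matching behavior in one place, which makes it easier to remove later if the patterns are updated to state case-insensitivity or dot-all explicitly.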
TDIRegEx2Base.BSR and TDIRegEx2Base.NewLine are new properties of their own. In DIRegEx, these options were part of CompileOptions and MatchOptions. As a consequence, BSR and NewLine options can no longer be passed to CompileMatchPatternStrOpt; they must be set beforehand.
- pcre_exec has become pcre2_match.
- The PCRE_JAVASCRIPT_COMPAT option has been split into independent functional options PCRE2_ALT_BSUX, PCRE2_ALLOW_EMPTY_CLASS, and PCRE2_MATCH_UNSET_BACKREF.
- PCRE2_ZERO_TERMINATED for zero-terminated strings.
- PCRE2_SIZE elements instead of Integers. The special value PCRE2_UNSET is used for unset elements.
- pcre2_jit_compile after a successful return from pcre2_compile.
- The capture_last field of the pcre2_callout_block is now an unsigned integer, set to zero if there have been no captures.
- pcre2_substitute that performs “find and replace” operations.
- PCRE2_NO_DOTSTAR_ANCHOR, PCRE2_NEVER_BACKSLASH_C, and PCRE2_ALT_CIRCUMFLEX options.
- (*NOTEMPTY) or (*NOTEMPTY_ATSTART) to set the PCRE2_NOTEMPTY or PCRE2_NOTEMPTY_ATSTART options for every subject line that is matched by that pattern.
- (?(VERSION>=10)yes|no) against a string such as “yesno”.
- If ^(\x{23a})\1*(.) is matched caselessly (and in UTF-8 mode) against \x{23a}\x{2c65}\x{2c65}\x{2c65}, group 2 should capture the final character, which is the three bytes E2, B1, and A5 in UTF-8. Incorrect backtracking meant that group 2 captured only the last two bytes. This bug has been fixed; the new code is slower, but it is used only when the strings matched by the repetition are not all the same length.
- ()a was not setting the “first character must be 'a'” information. This applied to any pattern with a group that matched no characters, for example: (?:(?=.)|(?<!x))a.
- When (*ACCEPT) is triggered inside capturing parentheses, it arranges for those parentheses to be closed with whatever has been captured so far. However, it was failing to mark any other groups between the highest capture so far and the current group as “unset”. Thus, the ovector for those groups contained whatever was previously there. An example is the pattern (x)|((*ACCEPT)) when matched against “abcd”.
- (*NO_JIT) pattern feature.