Delphi Inspiration

Components and Applications

products:pcre2:changes [2016/01/22 15:08] (current)
 +====== YuPcre2: Changes from DIRegEx ======
 +This document describes the differences and similarities between the new [[products:pcre2:|YuPcre2]] and the old [[products:regex:|DIRegEx]] to help convert existing projects. If you have never used [[products:regex:|DIRegEx]], or are starting a new project with [[products:pcre2:|YuPcre2]], you can skip this document.
 +[[products:pcre2:|YuPcre2]] is a new project, not just a drastic update to DIRegEx. A lot has changed, even though some units, classes, and functions carry familiar names. Identical identifiers could not be kept because Delphi rejects them when both YuPcre2 and DIRegEx are installed into the IDE. Overall, ''DIRegEx'' names have changed to ''DIRegEx2'' where possible, which should simplify the transition to YuPcre2.
 +===== Unit Name Changes =====
 +Unit names had to be changed to allow [[products:pcre2:|YuPcre2]] to be installed into the IDE in parallel with DIRegEx. Unit names start with the ''YuPcre2'' prefix. The native PCRE2 API is in ''YuPcre2.pas''. ''DIRegEx'' units with class wrappers and helper routines have been renamed to ''YuPcre2_RegEx2...'':
 +^ DIRegEx                  ^ YuPcre2                         ^
 +| DIRegEx_Api.pas          | YuPcre2.pas                     |
 +| n/a                      | YuPcre2OptInfo.pas              |
 +| DIRegEx_Reg.pas          | YuPcre2Reg.pas                  |
 +| DIRegEx.pas              | YuPcre2_RegEx2.pas              |
 +| DIRegEx_Consts.pas       | YuPcre2_RegEx2_Consts.pas       |
 +| DIRegEx_MaskControls.pas | YuPcre2_RegEx2_MaskControls.pas |
 +| DIRegEx_SearchStream.pas | YuPcre2_RegEx2_SearchStream.pas |
 +| DIRegEx_Utils.pas        | YuPcre2_RegEx2_Utils.pas        |
 +===== Class and Identifier Name Changes =====
 +Class names now contain "RegEx2" – the number 2 is appended to "RegEx". Most members, helper routines, and identifier names are unchanged. Deprecation warnings are issued where appropriate.
 +^ DIRegEx ^ YuPcre2 ^
 +| {{products:regex:tdiperlregex16.png|TDIPerlRegEx16}} TDIPerlRegEx16 | {{TDIPerlRegEx2_16.png|TDIPerlRegEx2_16.png}} TDIPerlRegEx2_16 |
 +| {{products:regex:tdidfaregex16.png|TDIDfaRegEx16.gif}} TDIDfaRegEx16 | {{TDIDfaRegEx2_16.png|TDIDfaRegEx2_16.png}} TDIDfaRegEx2_16 |
 +| {{products:regex:tdiperlregex.gif|TDIPerlRegEx.gif}} TDIPerlRegEx | {{TDIPerlRegEx2_8.png|TDIPerlRegEx2_8.png}} TDIPerlRegEx2_8 |
 +| {{products:regex:tdidfaregex.gif|TDIDfaRegEx.gif}} TDIDfaRegEx | {{TDIDfaRegEx2_8.png|TDIDfaRegEx2_8.png}} TDIDfaRegEx2_8 |
 +| {{products:regex:tdiregexmaskedit.gif|TDIRegExMaskEdit.gif}} TDIRegExMaskEdit | {{TDIRegEx2MaskEdit.png|TDIRegEx2MaskEdit.png}} TDIRegEx2MaskEdit |
 +| {{products:regex:tdiregexmaskcombobox.gif|TDIRegExMaskComboBox.gif}} TDIRegExMaskComboBox | {{TDIRegEx2MaskComboBox.png|TDIRegEx2MaskComboBox.png}} TDIRegEx2MaskComboBox |
 +''TDIRegEx2Base.CompileOptions'' is empty by default. In [[products:regex:|DIRegEx]], ''coCaseLess'' and ''coDotAll'' were set by default. [[products:pcre2:|YuPcre2]] excludes them for compatibility with PCRE2. If matching relies on these options, set them like this:
 +<code pascal>
 +{ Set YuPcre2 CompileOptions to DIRegEx default: }
 +RegEx.CompileOptions := [coCaseLess, coDotAll];
 +</code>
 +''TDIRegEx2Base.BSR'' and ''TDIRegEx2Base.NewLine'' are now properties of their own. In [[products:regex:|DIRegEx]] they were part of ''CompileOptions'' and ''MatchOptions''. As a consequence, ''BSR'' and ''NewLine'' options can no longer be passed to ''CompileMatchPatternStrOpt'' but must be set beforehand.
 +===== PCRE2 Native API Changes =====
 +  * Names of the native API functions start with the "pcre2_" prefix. The "_8", "_16", and "_32" suffixes denote the width of the function's string code unit in bits.
 +  * Many names have been changed; in particular, ''pcre_exec'' has become ''pcre2_match''. The ''PCRE_JAVASCRIPT_COMPAT'' option has been split into the independent functional options ''PCRE2_ALT_BSUX'', ''PCRE2_ALLOW_EMPTY_CLASS'', and ''PCRE2_MATCH_UNSET_BACKREF''.
 +  * Patterns, subject strings, and replacement strings may all contain binary zeros and for this reason are always passed as a pointer and a length. However, the length may be given as ''PCRE2_ZERO_TERMINATED'' for zero-terminated strings.
 +  * The output vector that holds offsets of matched strings is now a vector of ''PCRE2_SIZE'' elements instead of Integers. The special value ''PCRE2_UNSET'' is used for unset elements.
 +  * Error handling has been redesigned and error messages are available in all code unit widths. The error codes have been redesignated.
 +  * Explicit "studying" of compiled patterns has been abolished – it now always happens automatically. JIT compiling is done by calling a new function, ''pcre2_jit_compile'', after a successful return from ''pcre2_compile''.
 +  * The ''capture_last'' field of the ''pcre2_callout_block'' is now an unsigned integer, set to zero if there have been no captures.
 +  * Saving / restoring a compiled pattern is accomplished by a set of serializing functions.
 +  * There is a new function called ''pcre2_substitute'' that performs "find and replace" operations.
 +  * The ''PCRE2_NO_DOTSTAR_ANCHOR'', ''PCRE2_NEVER_BACKSLASH_C'', and ''PCRE2_ALT_CIRCUMFLEX'' options have been implemented.
 +===== PCRE2 Functionality Changes =====
 +  * Patterns may start with ''(*NOTEMPTY)'' or ''(*NOTEMPTY_ATSTART)'' to set the ''PCRE2_NOTEMPTY'' or ''PCRE2_NOTEMPTY_ATSTART'' options for every subject line that is matched by that pattern.
 +  * For the benefit of those who use PCRE2 via some other application, that is, not writing the function calls themselves, it is possible to check the PCRE2 version by matching a pattern such as ''(?(VERSION>=10)yes|no)'' against a string such as "yesno".
 +  * There are case-equivalent Unicode characters whose encodings use different numbers of code units in UTF-8. U+023A and U+2C65 are one example. (It is theoretically possible for this to happen in UTF-16 too.) If a backreference to a group containing one of these characters was greedily repeated, and during the match a backtrack occurred, the subject might be backtracked by the wrong number of code units. For example, if ''^(\x{23a})\1*(.)'' is matched caselessly (and in UTF-8 mode) against ''\x{23a}\x{2c65}\x{2c65}\x{2c65}'', group 2 should capture the final character, which is the three bytes E2, B1, and A5 in UTF-8. Incorrect backtracking meant that group 2 captured only the last two bytes. This bug has been fixed; the new code is slower, but it is used only when the strings matched by the repetition are not all the same length.
 +  * Update Unicode to 8.0.0.
 +  * A pattern such as ''()a'' was not setting the "first character must be 'a'" information. This applied to any pattern with a group that matched no characters, for example: ''(?:(?=.)|(?%%<%%!x))a''.
 +  * When an ''(*ACCEPT)'' is triggered inside capturing parentheses, it arranges for those parentheses to be closed with whatever has been captured so far. However, it was failing to mark any other groups between the highest capture so far and the current group as "unset". Thus, the ovector for those groups contained whatever was previously there. An example is the pattern ''(x)|%%((*%%ACCEPT))'' when matched against "abcd".
 +  * Add the ''(*NO_JIT)'' pattern feature.
 +  * Add callouts with string arguments.
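 +The backtracking fix listed above depends on U+023A and U+2C65 having different encoded lengths in UTF-8. The following minimal sketch checks this with plain UTF-8 arithmetic. It does not use YuPcre2 at all; the program name and the ''Utf8Length'' helper are hypothetical and exist only for illustration:
 +<code pascal>
 +program Utf8LengthDemo;
 +
 +{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
 +{$APPTYPE CONSOLE}
 +
 +{ Hypothetical helper, not part of YuPcre2: returns the number of UTF-8
 +  code units (bytes) needed to encode a Unicode code point. }
 +function Utf8Length(CodePoint: Cardinal): Integer;
 +begin
 +  if CodePoint < $80 then
 +    Result := 1
 +  else if CodePoint < $800 then
 +    Result := 2
 +  else if CodePoint < $10000 then
 +    Result := 3
 +  else
 +    Result := 4;
 +end;
 +
 +begin
 +  { The two characters are case-equivalent, yet their encodings differ in length. }
 +  WriteLn('U+023A: ', Utf8Length($023A), ' byte(s)'); { prints "U+023A: 2 byte(s)" }
 +  WriteLn('U+2C65: ', Utf8Length($2C65), ' byte(s)'); { prints "U+2C65: 3 byte(s)" }
 +end.
 +</code>
 +Because one character takes two bytes and the other three, a caseless backreference repetition can match strings of unequal length, which is the case the slower backtracking code handles.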
products/pcre2/changes.txt · Last modified: 2016/01/22 15:08 (external edit)