Update DIUtils.pas Unicode functions to Unicode 14.0.0.
Extend character support to the full range of Unicode Code Points from $000000 to $10FFFF.
Up to now, DIUnicode stored code points as WideChars. This limited Unicode support to the Basic Multilingual Plane (BMP) from $0000 to $FFFF. Code points from the Supplementary Planes were converted to the $FFFD replacement character. This worked well for a great number of languages, but less common scripts did not work, and neither did the increasingly popular emojis from the Symbols and Pictographs Unicode blocks.
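The following Python sketch makes this limitation concrete. It is not DIUnicode code: `to_ucs2_legacy` is a hypothetical helper that stores each code point in a single 16-bit unit, as a one-WideChar-per-code-point buffer would, and substitutes U+FFFD for anything outside the BMP.

```python
REPLACEMENT_CHAR = 0xFFFD

def to_ucs2_legacy(text: str) -> list[int]:
    """Hypothetical helper: map each code point to one 16-bit unit (BMP only).

    Code points above $FFFF do not fit into a single unit and become U+FFFD,
    so the original character is lost.
    """
    return [ord(ch) if ord(ch) <= 0xFFFF else REPLACEMENT_CHAR for ch in text]

# "A", "é" (U+00E9) and the emoji U+1F600 (grinning face)
units = to_ucs2_legacy("A\u00e9\U0001F600")
print([hex(u) for u in units])  # ['0x41', '0xe9', '0xfffd']
```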
DIUnicode 7.0.0 overcomes these limitations and now covers the complete Unicode range. Changes are almost entirely internal and maintain backwards compatibility as much as possible. Existing applications should compile with no or only minor changes. WideChar routines are marked as deprecated and point to their new UCP counterparts.
TDIUnicode.Data is still a WideChar buffer. However, its contents are now fully UTF-16 encoded. This means that it may contain code points above $FFFF, each of which takes up two WideChars (a surrogate pair). As a result, a WideChar index into the buffer no longer necessarily corresponds to a code point index.
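A minimal Python sketch of the standard UTF-16 surrogate pair arithmetic behind this buffer layout. `encode_utf16` and `decode_surrogate_pair` are hypothetical names, not DIUnicode routines. The example buffer shows why a WideChar index no longer equals a code point index.

```python
def encode_utf16(cp: int) -> list[int]:
    """Encode one code point ($0000..$10FFFF) as one or two UTF-16 code units."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]                   # BMP: fits into a single WideChar
    v = cp - 0x10000                  # 20 bits remain
    return [0xD800 | (v >> 10),       # high (lead) surrogate: top 10 bits
            0xDC00 | (v & 0x3FF)]     # low (trail) surrogate: bottom 10 bits

def decode_surrogate_pair(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into a supplementary code point."""
    if not (0xD800 <= high <= 0xDBFF and 0xDC00 <= low <= 0xDFFF):
        raise ValueError("not a surrogate pair")
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)

# UTF-16 buffer for "a", U+1F600 (grinning face), "b"
buf = [u for ch in "a\U0001F600b" for u in encode_utf16(ord(ch))]
print([hex(u) for u in buf])  # ['0x61', '0xd83d', '0xde00', '0x62']
# Index 2 holds a trail surrogate, not a character; "b" is at index 3.
```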
TDIUnicode.Data related methods, like TDIUnicode.DataAsStrTrimW, are adjusted accordingly.
UnicodeString utility routines are rewritten to handle full UTF-16, including surrogate pairs. Most of them are in YuUtf.pas, which also contains new utility routines for UTF-16 testing, encoding, and decoding. Where possible, string handling routines now take NativeInt parameters for the buffer length.
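To illustrate what UTF-16 testing and decoding involve, here is a Python sketch that walks a buffer of UTF-16 code units with an explicit length, joining surrogate pairs and rejecting unpaired surrogates. `iter_code_points` and `is_valid_utf16` are hypothetical names and do not reflect the actual YuUtf.pas API; the `length` parameter only mirrors the buffer-length parameters mentioned above.

```python
def iter_code_points(units: list[int], length: int):
    """Yield code points from the first `length` UTF-16 units of `units`.

    Raises ValueError on an unpaired surrogate, i.e. on invalid UTF-16.
    """
    i = 0
    while i < length:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:  # high surrogate: a low one must follow
            if i + 1 < length and 0xDC00 <= units[i + 1] <= 0xDFFF:
                yield 0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00)
                i += 2
                continue
            raise ValueError(f"unpaired high surrogate at index {i}")
        if 0xDC00 <= u <= 0xDFFF:  # low surrogate without a preceding high one
            raise ValueError(f"unpaired low surrogate at index {i}")
        yield u
        i += 1

def is_valid_utf16(units: list[int], length: int) -> bool:
    """Return True if the first `length` units contain no unpaired surrogates."""
    try:
        for _ in iter_code_points(units, length):
            pass
        return True
    except ValueError:
        return False

buf = [0x61, 0xD83D, 0xDE00, 0x62]
print(len(buf), len(list(iter_code_points(buf, len(buf)))))  # 4 3
print(is_valid_utf16([0xDE00, 0x62], 2))                     # False
```

Note that cutting a buffer in the middle of a surrogate pair (a `length` of 1 for `[0xD83D, 0xDE00]`) also yields invalid UTF-16, which is one reason length handling matters once surrogate pairs are allowed.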
Other noteworthy changes:
EndPos changed from unsigned Cardinal to signed NativeInt.
TDIUnicode always descends from …; the Classes unit is always used. Source code only.
The DI.inc include file is no longer used. Directly link in DICompilers.inc instead. Source code only.
Fix a TDIUnicodeWriter memory leak if TDIUnicodeWriteMethods.Init allocates its own memory.
TDIUnicodeWriteMethods.Flush to reset encoder state.
Read_iso_2022_jp_ms read methods and …
… TDIUnicodeReader.SourceStream, the size of the internal source buffer was not correctly calculated. Depending on the decoding, this slowed down reading or even stopped it before the end of the stream was reached.
TDIUnicodeReader.SkipEmptyLines consumed additional characters after the line break.
TDIUnicodeReader.FillSourceBuffer (source code edition only).
TDIUnicodeReader when a pushed source was popped at the end of a nested document.
TDIUnicodeReader.ReadBOM function, which returns the Byte Order Mark (BOM) found at the current position and advances the position accordingly.
TDIUnicodeReader.SourceFile property as a simple means to read from a file.
TDIUnicodeReader.SaveDataToStream … which controls whether a UTF-16/UCS-2 little endian byte order mark is written in front of the data.
Write_UTF_7_ODC / … UTF-7 is written either with (Write_UTF_7) or without (Write_UTF_7_ODC) encoding optional direct characters. UTF-7 reading (Read_UTF_7) works equally well for both writing methods.
TDIUnicodeWriter to allow data buffering between consecutive reads and writes.
TDIUnicodeReader.PopSource methods added, which allow inserting one source into another, like for Pascal …
TDIUnicodeReader can optionally free its source stream when reading reaches the end of the stream. This is especially useful when reading nested files using the … TDIUnicodeReader.PopSource methods. The protected property TDIUnicodeReader.AutoFreeSourceStreams may be used by descendant classes which implement specialized reading / parsing.
TDIUnicodeReader, as well as for retrieving data as trimmed strings.
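The TDIUnicodeReader.ReadBOM and TDIUnicodeReader.SaveDataToStream entries above both deal with byte order marks. The Python sketch below shows the general technique only; `read_bom` and `utf16le_bytes` are hypothetical helpers, and the set of recognized marks (UTF-8, plus UTF-16 and UTF-32 in both byte orders) is an assumption, not necessarily what DIUnicode detects.

```python
from typing import Optional

# Assumed set of marks. Longest first: the UTF-32 LE mark starts with the UTF-16 LE mark.
BOMS = [
    (b"\xff\xfe\x00\x00", "UTF-32LE"),
    (b"\x00\x00\xfe\xff", "UTF-32BE"),
    (b"\xef\xbb\xbf", "UTF-8"),
    (b"\xff\xfe", "UTF-16LE"),
    (b"\xfe\xff", "UTF-16BE"),
]

def read_bom(data: bytes, pos: int = 0) -> tuple[Optional[str], int]:
    """Hypothetical helper: return (encoding, position after the BOM), or (None, pos)."""
    for mark, name in BOMS:
        if data.startswith(mark, pos):
            return name, pos + len(mark)
    return None, pos

def utf16le_bytes(text: str, write_bom: bool) -> bytes:
    """Hypothetical helper: UTF-16 LE bytes, optionally preceded by the FF FE mark."""
    body = text.encode("utf-16-le")
    return b"\xff\xfe" + body if write_bom else body

data = utf16le_bytes("Hi", write_bom=True)
encoding, pos = read_bom(data)
print(encoding, data[pos:].decode("utf-16-le"))  # UTF-16LE Hi
```

Checking longer marks first is a design choice with a trade-off: FF FE 00 00 may be a UTF-32 LE mark or a UTF-16 LE mark followed by U+0000, and this sketch resolves the ambiguity in favour of UTF-32 LE.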
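The UTF-7 entry above distinguishes writing with and without encoding of optional direct characters (Set O in RFC 2152). The following Python sketch is a simplified encoder, not DIUnicode's implementation; `encode_utf7` and its `encode_optional` flag are hypothetical. It uses Python's built-in `utf-7` codec as the reader, which accepts both forms, much as the entry describes for Read_UTF_7.

```python
import base64

# RFC 2152 Set D, plus space, tab, CR and LF, which may always be written directly.
DIRECT = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz"
             "0123456789'(),-./:? \t\r\n")
# RFC 2152 Set O: the "optional direct characters".
OPTIONAL_DIRECT = set('!"#$%&*;<=>@[]^_`{|}')

def encode_utf7(text: str, encode_optional: bool) -> bytes:
    """Hypothetical UTF-7 encoder.

    encode_optional=True base64-encodes Set O characters (as the entry describes
    for Write_UTF_7); False writes them directly (as for Write_UTF_7_ODC).
    """
    direct = DIRECT if encode_optional else DIRECT | OPTIONAL_DIRECT
    out = []
    pending = []  # characters waiting to be base64-encoded

    def flush() -> None:
        if pending:
            utf16 = "".join(pending).encode("utf-16-be")  # includes surrogate pairs
            b64 = base64.b64encode(utf16).decode("ascii").rstrip("=")
            # Always closing with '-' is simpler, though sometimes one byte longer.
            out.append("+" + b64 + "-")
            pending.clear()

    for ch in text:
        if ch in direct:
            flush()
            out.append(ch)
        elif ch == "+":
            flush()
            out.append("+-")  # a literal plus sign
        else:
            pending.append(ch)
    flush()
    return "".join(out).encode("ascii")

for flag in (True, False):
    encoded = encode_utf7("Hi! \U0001F600", flag)
    print(encoded, encoded.decode("utf-7"))
```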