- TDICsvParser.SkipBlankRows property.
- TDICsvParser.ReadNextData improvements:
- Update DIUtils.pas Unicode functions to Unicode 14.0.0.
- Extend character support to the full range of Unicode code points from $000000 to $10FFFF.
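For orientation, the range $000000 to $10FFFF comprises 17 planes of $10000 code points each. Only plane 0, the Basic Multilingual Plane, fits into a single WideChar. The Python sketch below is a language-neutral illustration, not DIUnicode code; `plane_of` and `is_bmp` are hypothetical names.

```python
MAX_CODE_POINT = 0x10FFFF  # $10FFFF, the highest Unicode code point


def plane_of(cp: int) -> int:
    """Return the Unicode plane (0..16) that contains code point cp."""
    if not 0 <= cp <= MAX_CODE_POINT:
        raise ValueError(f"not a Unicode code point: {cp:#x}")
    return cp >> 16  # each plane holds $10000 code points


def is_bmp(cp: int) -> bool:
    """True for the Basic Multilingual Plane ($0000..$FFFF)."""
    return plane_of(cp) == 0


print(plane_of(0x41), plane_of(0x1F600), plane_of(0x10FFFF))  # 0 1 16
```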
Up to now, DIUnicode stored code points as WideChars. This limited Unicode support to the Basic Multilingual Plane (BMP), from $0000 to $FFFF. Code points from the Supplementary Planes were converted to the $FFFD replacement character. This worked well for a great number of languages. However, less common scripts did not work, and neither did the increasingly popular emojis from the Symbols and Pictographs Unicode blocks.
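The old limitation can be pictured as follows. A store that holds exactly one 16-bit unit per character has no room for code points above $FFFF, so it must substitute U+FFFD. This is a hypothetical Python sketch of that BMP-only (UCS-2) rule, not the former DIUnicode source.

```python
REPLACEMENT_CHARACTER = 0xFFFD


def store_bmp_only(text: str) -> list[int]:
    """Store one 16-bit unit per code point; anything above $FFFF is lost."""
    return [ord(ch) if ord(ch) <= 0xFFFF else REPLACEMENT_CHARACTER for ch in text]


print([hex(u) for u in store_bmp_only("A\u00e9\U0001F600")])  # ['0x41', '0xe9', '0xfffd']
```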
DIUnicode 7.0.0 overcomes these limitations and now covers the complete Unicode range. Changes are almost entirely internal and maintain backwards compatibility as much as possible. Existing applications should compile with no or only minor changes. WideChar routines are marked as deprecated, and their deprecation hints point to the new complementary UCP routines.
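A WideChar routine cannot simply be reused for supplementary characters. The first WideChar of a supplementary code point is a surrogate, so a property lookup on that WideChar describes the surrogate, not the character. In the hypothetical Python sketch below, `unicodedata` stands in for DIUnicode's own character tables. `utf16_units` is an illustrative helper, not a DIUnicode routine.

```python
import unicodedata


def utf16_units(text: str) -> list[int]:
    """Split text into its UTF-16 code units (what a WideChar buffer holds)."""
    data = text.encode("utf-16-le")
    return [int.from_bytes(data[i:i + 2], "little") for i in range(0, len(data), 2)]


deseret = "\U00010428"  # DESERET SMALL LETTER LONG I, plane 1
units = utf16_units(deseret)  # [0xD801, 0xDC28]

# WideChar view: the first unit alone is a high surrogate (category "Cs").
print(unicodedata.category(chr(units[0])))  # Cs
# Code point view: the whole character is a lowercase letter (category "Ll").
print(unicodedata.category(deseret))  # Ll
```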
TDIUnicodeReader.Data is still a WideChar buffer. However, its contents are now fully UTF-16 encoded: code points above $FFFF take up two WideChars (a surrogate pair). As a result, a WideChar index into the buffer no longer necessarily points to a complete code point, because it may address one half of a surrogate pair. Methods related to TDIUnicodeReader.Data, like TDIUnicodeReader.DataAsStrTrimW, are adjusted accordingly.
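The surrogate-pair arithmetic is fixed by the UTF-16 definition: subtract $10000, then split the remaining 20 bits into two 10-bit halves. The Python sketch below shows this arithmetic and why `Data[i]` no longer maps to the i-th character. The function names are hypothetical and not part of DIUnicode.

```python
def encode_utf16(cp: int) -> list[int]:
    """Return the UTF-16 code units (WideChars) for one Unicode scalar value."""
    if not 0 <= cp <= 0x10FFFF or 0xD800 <= cp <= 0xDFFF:
        raise ValueError(f"not a Unicode scalar value: {cp:#x}")
    if cp <= 0xFFFF:
        return [cp]  # BMP: one WideChar
    v = cp - 0x10000  # 20 bits remain
    return [0xD800 | (v >> 10), 0xDC00 | (v & 0x3FF)]  # high, low surrogate


def decode_surrogates(high: int, low: int) -> int:
    """Combine a high and a low surrogate back into a code point."""
    return 0x10000 + ((high - 0xD800) << 10) + (low - 0xDC00)


buf = [u for ch in "a\U0001F600b" for u in encode_utf16(ord(ch))]
print([hex(u) for u in buf])  # ['0x61', '0xd83d', '0xde00', '0x62']
# Three characters, four WideChars: buf[2] is the low surrogate, not "b".
```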
UnicodeString utility routines are rewritten to handle full UTF-16, including surrogate pairs. Most of them are in DIUtils.pas. YuUtf.pas also contains new utility routines for UTF-16 testing, encoding, and decoding. Where possible, string handling routines now take NativeInt parameters for the buffer length.
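The following Python sketch illustrates UTF-16 decoding over a buffer with an explicit length, the role the NativeInt length parameters play. It is hypothetical, not the DIUtils.pas or YuUtf.pas implementation. In particular, mapping unpaired surrogates to U+FFFD is an assumption made for this example.

```python
from collections.abc import Iterator


def iter_code_points(buf: list[int], length: int) -> Iterator[int]:
    """Yield the code points in the first `length` UTF-16 units of buf.

    A high surrogate is combined with a following low surrogate; an unpaired
    surrogate (including a pair cut off by `length`) yields U+FFFD.
    """
    i = 0
    while i < length:
        u = buf[i]
        if 0xD800 <= u <= 0xDBFF and i + 1 < length and 0xDC00 <= buf[i + 1] <= 0xDFFF:
            yield 0x10000 + ((u - 0xD800) << 10) + (buf[i + 1] - 0xDC00)
            i += 2
        elif 0xD800 <= u <= 0xDFFF:
            yield 0xFFFD  # unpaired surrogate
            i += 1
        else:
            yield u
            i += 1


buf = [0x61, 0xD83D, 0xDE00, 0x62]  # "a", U+1F600, "b"
print(sum(1 for _ in iter_code_points(buf, len(buf))))  # 3 code points in 4 units
```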
Other noteworthy changes:
- TDIUnicodeReader.UCP complements TDIUnicodeReader.Char.
- Removed DI_No_Classes and DI_No_Unicode_Component. TDIUnicodeReader always descends from TComponent, and the Classes unit is always used. Source code only.
- Removed the DI.inc include file. Link in DICompilers.inc directly instead. Source code only.
- Fixed a TDIUnicodeWriter memory leak when TDIUnicodeWriteMethods.Init allocates its own memory.
- TDIUnicodeWriter.Clear calls TDIUnicodeWriteMethods.Flush to reset the encoder state.
- Read_iso_2022_jp_ms read methods and Write_iso_2022_jp_ms write methods.
- Fixed: when reading from TDIUnicodeReader.SourceStream, the size of the internal source buffer was not calculated correctly. Depending on the decoding, this slowed down reading or even stopped it before the end of the stream was reached.
- Fixed: TDIUnicodeReader.SkipEmptyLines consumed additional chars after the line break.
- TDIUnicodeReader.FillSourceBuffer (source code edition only).
- TDIUnicodeWriter.WriteStr8 and WriteBuf8 methods.
- TDIUnicodeReader.DataAsStrTrim8 method.
- TDIUnicodeReader when a pushed source was popped at the end of a nested document.
- TDIUnicodeReader.ReadBOM function, which returns the Byte Order Mark (BOM) found at the current position and advances the position accordingly.
- TDIUnicodeReader.SourceFile property as a simple means to read from a file.
- WriteByteOrderMark parameter to TDIUnicodeReader.SaveDataToFile and TDIUnicodeReader.SaveDataToStream, which controls whether a UTF-16/UCS-2 little endian byte order mark is written in front of the data.
- UTF-7 (Write_UTF_7 / Read_UTF_7)
- UTF-7, optional direct characters unencoded (Write_UTF_7_ODC / reads as Read_UTF_7)
- UTF-7 can be written with (Write_UTF_7) or without (Write_UTF_7_ODC) encoding optional direct characters. UTF-7 reading (Read_UTF_7) works equally well for both writing methods.
- TDIUnicodeReader and TDIUnicodeWriter to allow data buffering between consecutive reads and writes.
- TDIUnicodeReader.PushSource and TDIUnicodeReader.PopSource methods added, which allow inserting one source into another, like the Pascal {$INCLUDE …} directive.
- TDIUnicodeReader can optionally free its source stream when reading reaches the end of the stream. This is especially useful when reading nested files using the TDIUnicodeReader.PushSource and TDIUnicodeReader.PopSource methods. The protected property TDIUnicodeReader.AutoFreeSourceStreams may be used by descendant classes which implement specialized reading / parsing.
- TDIUnicodeReader, as well as for retrieving data as trimmed strings.
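Resetting the encoder state matters for stateful encodings such as ISO-2022-JP. These encodings switch character sets with escape sequences and must switch back before the output ends. As a rough analogy only, Python's incremental `iso2022_jp` codec (not the Microsoft variant, and not DIUnicode's TDIUnicodeWriteMethods.Flush) shows the bytes that a final flush adds:

```python
import codecs

encoder = codecs.getincrementalencoder("iso2022_jp")()

body = encoder.encode("日本")  # switches to JIS X 0208 via ESC $ B, no switch back yet
tail = encoder.encode("", final=True)  # flush: return to ASCII via ESC ( B
encoder.reset()  # start from a clean state for the next document

print(tail)  # b'\x1b(B'
print((body + tail).decode("iso2022_jp"))  # 日本
```

Without the flush, the output would end while still in the JIS X 0208 character set.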
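Byte order mark handling, as in TDIUnicodeReader.ReadBOM and the WriteByteOrderMark parameter, comes down to checking for and emitting a few fixed byte sequences. The following is a hypothetical Python sketch; `read_bom` and `save_utf16le` are illustrative names, not the DIUnicode signatures.

```python
import codecs
from typing import Optional

# Longest BOMs first: the UTF-32 LE BOM (FF FE 00 00) begins with the UTF-16 LE BOM (FF FE).
BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def read_bom(data: bytes, pos: int = 0) -> tuple[Optional[str], int]:
    """Return the encoding named by a BOM at pos and the position after it."""
    for bom, encoding in BOMS:
        if data.startswith(bom, pos):
            return encoding, pos + len(bom)
    return None, pos  # no BOM: position unchanged


def save_utf16le(text: str, write_bom: bool = True) -> bytes:
    """Encode as UTF-16 LE, optionally preceded by the FF FE byte order mark."""
    body = text.encode("utf-16-le")
    return codecs.BOM_UTF16_LE + body if write_bom else body


data = save_utf16le("A\U0001F600")
print(read_bom(data))  # ('utf-16-le', 2)
```

The check order has one known ambiguity. UTF-16 LE text that starts with U+0000 right after its BOM looks like a UTF-32 LE BOM.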
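Here is how the two UTF-7 writing methods differ. RFC 2152 lists "optional direct characters" (such as ! # $ and @) that an encoder may either write as plain ASCII or put into a modified-base64 run. The hypothetical Python encoder below is a simplified sketch of that choice, not DIUnicode's Write_UTF_7 or Write_UTF_7_ODC code. Python's built-in `utf-7` decoder reads both outputs, much as Read_UTF_7 is described as reading both.

```python
import base64

SET_D = set("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789'(),-./:?")
SET_O = set('!"#$%&*;<=>@[]^_`{|}')  # optional direct characters (RFC 2152)
WHITESPACE = set(" \t\r\n")


def _base64_run(chars: list[str]) -> str:
    """Encode a run as modified base64 of its UTF-16 BE units, without padding."""
    raw = "".join(chars).encode("utf-16-be")  # surrogate pairs for code points > $FFFF
    return "+" + base64.b64encode(raw).decode("ascii").rstrip("=") + "-"


def utf7_encode(text: str, encode_optional: bool = True) -> bytes:
    """Encode text as UTF-7; encode_optional=False writes Set O characters directly."""
    direct = SET_D | WHITESPACE if encode_optional else SET_D | WHITESPACE | SET_O
    out: list[str] = []
    run: list[str] = []
    for ch in text:
        if ch in direct:
            if run:
                out.append(_base64_run(run))
                run = []
            out.append(ch)
        elif ch == "+" and not run:
            out.append("+-")  # a literal plus sign
        else:
            run.append(ch)
    if run:
        out.append(_base64_run(run))
    return "".join(out).encode("ascii")


print(utf7_encode("Hi!"))  # b'Hi+ACE-'
print(utf7_encode("Hi!", encode_optional=False))  # b'Hi!'
```

The sketch always closes a base64 run with an explicit "-". RFC 2152 permits this, and it keeps the logic simple.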
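The include mechanism behind TDIUnicodeReader.PushSource and TDIUnicodeReader.PopSource can be modelled as a stack of sources. Reading always takes characters from the top source. When a nested source ends, it is popped (and, if auto-freeing is enabled, closed), so reading resumes in the including source. This is a hypothetical Python sketch; `NestedReader` and its members are illustrative, not the TDIUnicodeReader API.

```python
import io
from typing import TextIO


class NestedReader:
    """Reads characters from a stack of sources, most recently pushed first."""

    def __init__(self, source: TextIO, auto_free: bool = True) -> None:
        self._stack = [source]
        self.auto_free = auto_free  # close sources when they are popped

    def push_source(self, source: TextIO) -> None:
        """Insert a source at the current position, like an {$INCLUDE} directive."""
        self._stack.append(source)

    def pop_source(self) -> TextIO:
        """Remove the innermost source and resume reading the one below it."""
        source = self._stack.pop()
        if self.auto_free:
            source.close()
        return source

    def read_char(self) -> str:
        """Return the next character, or "" at the end of the outermost source."""
        while self._stack:
            ch = self._stack[-1].read(1)
            if ch:
                return ch
            if len(self._stack) == 1:
                return ""  # the outermost document is never popped here
            self.pop_source()  # nested source exhausted
        return ""


main = io.StringIO("abcd")
included = io.StringIO("XY")
reader = NestedReader(main)
text = reader.read_char() + reader.read_char()  # "ab"
reader.push_source(included)
text += "".join(iter(reader.read_char, ""))
print(text)  # abXYcd
```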