...mostly done with a rewriting Clang plugin, with just some manual tweaking
necessary to fix poor macro usage.
Change-Id: I71fa20213e86be10de332ece0aa273239df7b61a
Issue:
OUString uses UTF-16, so a Unicode code point outside the BMP is stored as a
surrogate pair, i.e. 2 values, not just 1.
As a result, an assertion fails in the "rtl_uString_iterateCodePoints" function.
erAck: Underlying cause was that the dictionary breakiterator misused UTF-16 positions as Unicode code point positions.
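
The difference between UTF-16 positions and code point positions can be
illustrated with a minimal sketch. It uses std::u16string rather than OUString,
and the helper names (nextCodePoint, utf16IndexOfCodePoint) are hypothetical,
not the sal API:

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// True if c is the first (high) half of a surrogate pair.
inline bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
// True if c is the second (low) half of a surrogate pair.
inline bool isLowSurrogate(char16_t c) { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical helper: decode the code point that starts at UTF-16 index
// `pos` and advance `pos` by one or two code units.
inline char32_t nextCodePoint(const std::u16string& s, std::size_t& pos)
{
    assert(pos < s.size());
    const char16_t hi = s[pos++];
    if (isHighSurrogate(hi) && pos < s.size() && isLowSurrogate(s[pos]))
    {
        const char16_t lo = s[pos++];
        return 0x10000 + ((char32_t(hi) - 0xD800) << 10) + (char32_t(lo) - 0xDC00);
    }
    return hi; // BMP character, or an unpaired surrogate passed through as-is
}

// Hypothetical helper: convert a code point offset into a UTF-16 index.
inline std::size_t utf16IndexOfCodePoint(const std::u16string& s, std::size_t cpOffset)
{
    std::size_t pos = 0;
    while (cpOffset > 0 && pos < s.size())
    {
        nextCodePoint(s, pos);
        --cpOffset;
    }
    return pos;
}
```

In u"a\U0001F600b" the third code point ('b') sits at UTF-16 index 3, not 2;
treating one kind of position as the other lands in the middle of the pair.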
Change-Id: I923485f56c2d879b63687adaea2b489a3479991c
Reviewed-on: https://gerrit.libreoffice.org/6955
Reviewed-by: Eike Rathke <erack@redhat.com>
Tested-by: Eike Rathke <erack@redhat.com>
Convert code like:
aOStringBuf.append( RTL_CONSTASCII_STRINGPARAM( " is missing )") );
to:
aOStringBuf.append( " is missing )" );
which compiles down to the same code.
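
A rough sketch of why both forms can end up equivalent, assuming an overload
that takes the literal as an array reference so the length is known at compile
time. SketchBuffer and SKETCH_STRINGPARAM are hypothetical stand-ins, not the
real rtl::OStringBuffer or RTL_CONSTASCII_STRINGPARAM:

```cpp
#include <cstddef>
#include <string>

// Hypothetical stand-in for a string buffer class.
class SketchBuffer
{
public:
    // Old style: pointer plus explicit length.
    SketchBuffer& append(const char* str, std::size_t len)
    {
        data_.append(str, len);
        return *this;
    }

    // New style: the literal binds as an array reference, so N (which
    // includes the terminating NUL) is deduced at compile time.
    template <std::size_t N>
    SketchBuffer& append(const char (&literal)[N])
    {
        data_.append(literal, N - 1);
        return *this;
    }

    const std::string& str() const { return data_; }

private:
    std::string data_;
};

// Hypothetical macro that expands a literal into "pointer, length".
#define SKETCH_STRINGPARAM(lit) (lit), (sizeof(lit) - 1)
```

Both b.append(SKETCH_STRINGPARAM(" is missing )")) and
b.append(" is missing )") append the same 13 characters without a run-time
strlen.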
Change-Id: I3d8ed0cbf96a881686524a167412d5f303c06b71
commit 30c303 "Make charmap.cxx compile with icu >= 4.4." was incomplete
and had wrong version checks. After ICU 4.8 (4.8.1.1) the next version
of ICU was 49 (49.1) so U_ICU_VERSION_MAJOR_NUM contains two digits (49),
earlier than that it was just one digit (4). The correct header to include to
do version checks is unicode/uversion.h. USCRIPT_MANDAEAN is the old
alias of USCRIPT_MANDAIC (same numeric value). U_JG_FARSI_YEH is only
available since ICU 4.4. Note that on older ICU versions (4.2.1) the
200B (ZWSP) Zero Width Space breakiterator testcase fails (others
succeed).
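
A sketch of a version check that survives the jump from 4.8 to 49. The helper
icuAtLeast is hypothetical and takes the version numbers as plain integers so
the example compiles without ICU; in real code the values would come from
U_ICU_VERSION_MAJOR_NUM and U_ICU_VERSION_MINOR_NUM after including
unicode/uversion.h:

```cpp
// Hypothetical helper: true if ICU (major, minor) is at least
// (reqMajor, reqMinor). Comparing the major number first keeps the check
// correct both for 4.x releases and for 49 and later.
constexpr bool icuAtLeast(int major, int minor, int reqMajor, int reqMinor)
{
    return major > reqMajor || (major == reqMajor && minor >= reqMinor);
}

// A naive check of the form "major == 4 && minor >= 4" wrongly rejects 49.1.
constexpr bool naiveCheck(int major, int minor)
{
    return major == 4 && minor >= 4;
}

static_assert(icuAtLeast(49, 1, 4, 4), "ICU 49.1 is newer than 4.4");
static_assert(icuAtLeast(4, 8, 4, 4), "ICU 4.8 is newer than 4.4");
static_assert(!icuAtLeast(4, 2, 4, 4), "ICU 4.2 is older than 4.4");
static_assert(!naiveCheck(49, 1), "the naive check misses ICU 49.1");

// Preprocessor form of the same comparison (assumes unicode/uversion.h):
// #if U_ICU_VERSION_MAJOR_NUM > 4 || \
//     (U_ICU_VERSION_MAJOR_NUM == 4 && U_ICU_VERSION_MINOR_NUM >= 4)
```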
Change-Id: If73c1402239a28546077437e9382f0bd38642bad
Reviewed-on: https://gerrit.libreoffice.org/4139
Reviewed-by: Luboš Luňák <l.lunak@suse.cz>
Tested-by: Luboš Luňák <l.lunak@suse.cz>
Modules sal, salhelper, cppu, cppuhelper, codemaker (selectively) and odk
have kept them, in order not to break external API (the automatic using declaration
is LO-internal).
Change-Id: I588fc9e0c45b914f824f91c0376980621d730f09
Done with a perl regex:
s/OUString\s*\(\s*RTL_CONSTASCII_USTRINGPARAM\s*\((\s*"[^")]*?"\s*)\)\s*\)/OUString\($1\)/gms
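
For illustration only, the same substitution can be sketched with std::regex
(ECMAScript grammar). The function name dropUStringParam is hypothetical. The
perl /m and /s flags have no counterpart here, but the pattern uses neither
"." nor anchors, so that should not change what it matches:

```cpp
#include <regex>
#include <string>

// Hypothetical helper: rewrite
//   OUString( RTL_CONSTASCII_USTRINGPARAM( "..." ) )
// to
//   OUString( "..." )
// String literals containing '"' or ')' are deliberately not matched.
inline std::string dropUStringParam(const std::string& source)
{
    static const std::regex pattern(
        R"re(OUString\s*\(\s*RTL_CONSTASCII_USTRINGPARAM\s*\((\s*"[^")]*?"\s*)\)\s*\))re");
    return std::regex_replace(source, pattern, "OUString($1)");
}
```

Because the character class excludes ')', a literal such as "a)b" is left
alone and has to be converted by hand.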
Change-Id: Idf28320817cdcbea6d0f7ec06a9bf51bd2c3b3ec
Reviewed-on: https://gerrit.libreoffice.org/2832
Reviewed-by: Thomas Arnhold <thomas@arnhold.org>
Tested-by: Thomas Arnhold <thomas@arnhold.org>
a) the default properties for the code point mean it does not split a word it
appears in into two different words in any break mode we have, which is what we
want from a CH_TXTATR_INWORD
b) Unicode TR#20 gives for the interlinear annotation anchor: "What to do if
detected: In a proxy context or browser context, remove U+FFF9", so when we
need to strip it from text to run that text through e.g. the spellchecker or
word counting then there's a solid precedent for stripping it
In addition I *do* want the footnote placeholder to break the word it appears
in, that gives the desired wordcount and cursor travelling behaviour
The BREAKWORD and other *random* selection of CH_TXTATR are still odd choices,
and there are way too many of them.
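
Stripping the anchor before handing text to a spellchecker or word counter
could look roughly like this. The sketch uses std::u16string instead of
OUString, and stripAnnotationAnchors and kAnnotationAnchor are hypothetical
names, not existing sw code:

```cpp
#include <algorithm>
#include <string>

// U+FFF9 INTERLINEAR ANNOTATION ANCHOR, the code point discussed above.
constexpr char16_t kAnnotationAnchor = 0xFFF9;

// Hypothetical helper: return a copy of `text` with every U+FFF9 removed.
inline std::u16string stripAnnotationAnchors(std::u16string text)
{
    text.erase(std::remove(text.begin(), text.end(), kAnnotationAnchor), text.end());
    return text;
}
```

U+FFF9 is a BMP code point, so it occupies a single UTF-16 unit and removing
it cannot split a surrogate pair.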
Change-Id: I930ff8ff806af448829bc1a1ae6cb92053e9a284
a) remove special handling of 0x0002 in our custom icu rules,
which brings us a step closer to getting rid of at least
some of them in favour of the defaults
b) expand the 0x02 in SwTxtNode::BuildConversionMap like we
do for fields so
A good side effect is that our word count and character count now take into
account the actual footnote indicator text, as does our cursor travelling. Both
of which are more word-alike.
Change-Id: I3b0024ac4b10934bee7a9e83b0fce08a18556c7b