From: Stephen Viles (sviles_abi@iinet.net.au)
Date: Thu Jul 24 2003 - 05:15:40 EDT
Re-posting as text (rather than HTML).
24/07/03 3:33:26 AM, Raphael Finkel <raphael@cs.uky.edu> wrote:
I enclose a patch that I have built to strip leading and trailing
non-alphabetic characters from strings sent to the spellcheck apparatus.
It's a bit tricky: alphabetic characters are those with Unicode general
categories Lm, Lo, Lu, Ll, Lt, Mn, Me, and Mc (the L ones are letters, the
M ones are letter-like marks, such as accents).
The patch introduces two new routines in ut_string:
UT_UCS2_isalphaormark() and UT_UCS4_isalphaormark(), which are
implemented by binary search (included) through a table I built with the
uniset program. Then in fl_BlockLayout.cpp I changed two "if"s to
"while"s, so that leading and trailing non-alphabetic characters are
removed repeatedly rather than only once, and changed the test for what
to remove so that it calls my isalphaormark() routines.
There may be more elegant ways to do this.
Raphael
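
For readers without the patch at hand, here is a minimal sketch of the
approach described above. It is not the code from the patch. The names
Range, kAlphaOrMark, is_alpha_or_mark() and strip_non_alpha() are
hypothetical, and the table is only a tiny hand-picked excerpt (basic
Latin, Latin-1 letters, combining diacritical marks), not the full table
generated with uniset. The lookup is a binary search over sorted,
non-overlapping code point ranges, and the stripping uses "while" loops
so that a word such as ("hello!") loses all of its surrounding
punctuation rather than one character from each end.

```cpp
#include <cstddef>
#include <string>

// Hypothetical range type: an inclusive span of code points.
struct Range { char32_t lo, hi; };

// Illustrative excerpt only; sorted and non-overlapping.
// A real table would cover every Lm, Lo, Lu, Ll, Lt, Mn, Me, Mc range.
static const Range kAlphaOrMark[] = {
    {0x0041, 0x005A},  // A-Z
    {0x0061, 0x007A},  // a-z
    {0x00C0, 0x00D6},  // Latin-1 letters before the multiplication sign
    {0x00D8, 0x00F6},  // Latin-1 letters before the division sign
    {0x00F8, 0x00FF},  // remaining Latin-1 letters
    {0x0300, 0x036F},  // combining diacritical marks (Mn)
};

// Binary search for the range containing c, if any.
bool is_alpha_or_mark(char32_t c) {
    std::size_t lo = 0;
    std::size_t hi = sizeof(kAlphaOrMark) / sizeof(kAlphaOrMark[0]);
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (c < kAlphaOrMark[mid].lo) {
            hi = mid;            // c lies below this range
        } else if (c > kAlphaOrMark[mid].hi) {
            lo = mid + 1;        // c lies above this range
        } else {
            return true;         // c falls inside this range
        }
    }
    return false;
}

// Drop every leading and trailing character that is not a letter or mark.
// The loops (rather than single "if" tests) handle runs of punctuation.
std::u32string strip_non_alpha(const std::u32string& word) {
    std::size_t begin = 0;
    std::size_t end = word.size();
    while (begin < end && !is_alpha_or_mark(word[begin])) ++begin;
    while (end > begin && !is_alpha_or_mark(word[end - 1])) --end;
    return word.substr(begin, end - begin);
}
```

Only the ends are trimmed, so an apostrophe inside a word such as
"don't" is left alone, and a string made up entirely of punctuation
comes back empty.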