From: Raphael Finkel (raphael@cs.uky.edu)
Date: Wed Jul 23 2003 - 13:33:26 EDT
I enclose a patch that I have built to strip leading and trailing
non-alphabetic characters from strings sent to the spellcheck apparatus.
It's a bit tricky: alphabetic characters are those whose Unicode general
category is Lm, Lo, Lu, Ll, Lt, Mn, Me, or Mc (the L categories are
letters; the M categories are combining marks, such as accents).
The patch introduces two new routines in ut_string:
UT_UCS2_isalphaormark() and UT_UCS4_isalphaormark(), which are
implemented by binary search (included) through a table I built with the
uniset program. Then in fl_BlockLayout.cpp I changed two "if"s to
"while"s, to repeatedly remove leading/trailing stuff, and changed the
choice of what to remove to call my isalphaormark() routines.
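
For readers without the patch at hand, here is a minimal sketch of the idea, not the patch itself. It assumes a sorted table of inclusive code point ranges and looks a character up by binary search, then trims with two "while" loops. The names Range, kAlphaOrMark, isAlphaOrMark() and stripNonAlpha() are hypothetical, and the table is a tiny hand-written excerpt (basic Latin, Latin-1 letters, combining diacritical marks), not the full uniset-generated table.

```cpp
#include <cstddef>
#include <string>

// Inclusive range of code points [lo, hi].
struct Range { char32_t lo; char32_t hi; };

// Illustrative excerpt only (an assumption for this sketch); the real table
// covers every code point in categories L* and M*. Must stay sorted and
// non-overlapping for the binary search to work.
static const Range kAlphaOrMark[] = {
    {0x0041, 0x005A},  // A-Z
    {0x0061, 0x007A},  // a-z
    {0x00C0, 0x00D6},  // Latin-1 letters before the multiplication sign
    {0x00D8, 0x00F6},  // Latin-1 letters before the division sign
    {0x00F8, 0x00FF},  // remaining Latin-1 letters
    {0x0300, 0x036F},  // combining diacritical marks (Mn)
};

// Binary search over the range table: true if c lies inside some range.
bool isAlphaOrMark(char32_t c) {
    std::size_t lo = 0;
    std::size_t hi = sizeof(kAlphaOrMark) / sizeof(kAlphaOrMark[0]);
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (c < kAlphaOrMark[mid].lo) {
            hi = mid;          // c is below this range: search the left half
        } else if (c > kAlphaOrMark[mid].hi) {
            lo = mid + 1;      // c is above this range: search the right half
        } else {
            return true;       // c falls inside this range
        }
    }
    return false;
}

// Repeatedly drop leading and trailing characters that are neither letters
// nor marks; characters in the interior of the word are left alone.
std::u32string stripNonAlpha(const std::u32string& word) {
    std::size_t begin = 0;
    std::size_t end = word.size();
    while (begin < end && !isAlphaOrMark(word[begin])) ++begin;
    while (end > begin && !isAlphaOrMark(word[end - 1])) --end;
    return word.substr(begin, end - begin);
}
```

The loops matter because a single "if" would strip only one character from each end, so a string with several leading or trailing punctuation marks would still reach the spellchecker with some of them attached.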
There may be more elegant ways to do this.
Raphael