searching accented characters

Subject: searching accented characters
From: Lukas Pietsch
Date: Tue Mar 06 2001 - 04:01:01 CST

I noticed some rather strange (inconsistent?) behaviour in AbiWord 0.7.13
with respect to searching and accented characters (characters in the
Latin Extended-A range). When doing a case-insensitive search for a base
ASCII letter, some, but not all, accented characters are matched.
For instance:

"o" matches U+014F (o with breve)
"g" matches U+011F (g with breve)
"i" matches U+0131 (small dotless i) and U+0130 (capital dotted i)

This is a Good Thing, I should say.

"o" does not match U+0151 (o with double acute)
"a" does not match U+00E4 (a with umlaut) or U+1E00 (A with ring below),
and so on. This is a Bad Thing, because it's inconsistent.
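
To illustrate what a consistent folding might look like, here is a minimal
Python sketch. It is not AbiWord's code, and `fold` is a hypothetical helper
name: it assumes the search key is built by canonical decomposition (NFD),
dropping combining marks, and then case-folding.

```python
import unicodedata

def fold(text):
    """Hypothetical search key: decompose, drop combining marks, case-fold."""
    decomposed = unicodedata.normalize("NFD", text)
    # unicodedata.combining() is non-zero for combining marks such as U+0306.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()

# The code points mentioned in this message.
for cp in (0x014F, 0x011F, 0x0151, 0x00E4, 0x1E00, 0x0130, 0x0131):
    ch = chr(cp)
    print(f"U+{cp:04X} {unicodedata.name(ch)} -> {fold(ch)!r}")
```

Under this assumption all of the characters above fold to their base letter
except U+0131 (dotless i), which has no Unicode decomposition and would need
an explicit extra mapping if it is meant to match "i".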

Incidentally, this seems to be related to two other features affecting
those accented characters that do get matched:

(a) When highlighting a passage and then opening the search dialogue, the
highlighted passage is displayed there. Non-Latin-1 characters are either
left out or converted to ASCII best-match equivalents. Characters such as
U+014F, U+011F, U+0131, or U+0130 are displayed as ASCII (o, g, i, i
respectively). These are the same characters that get matched. U+0151 gets
displayed as '"o' (but searching for neither 'o' nor '"o' will match it).
The Euro symbol gets displayed as "EUR" (but searching for "EUR" will not
match it).
Note that these "best matches" are not the ones Windows defines (I'm on
Win98). They are apparently of AbiWord's own making.

(b) Characters such as U+014F, U+011F, U+0131, or U+0130 are also converted
to plain ASCII when saving a document in .abw or .html format. Needless to
say, this is a Very Bad Thing. (I just filed a bug in Bugzilla about that.)
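
For comparison, here is a small Python sketch of how such characters can
survive an ASCII-only XML or HTML file, namely as numeric character
references. This only illustrates the general technique; I am not claiming
it is how AbiWord's exporters work or should work.

```python
import html

text = "\u014f\u011f\u0131\u0130"  # the four characters mentioned above

# Replace every non-ASCII character with a decimal character reference.
escaped = text.encode("ascii", "xmlcharrefreplace").decode("ascii")
print(escaped)  # &#335;&#287;&#305;&#304;

# Reading the references back restores the original characters.
restored = html.unescape(escaped)
print(restored == text)  # True
```

The file stays pure ASCII on disk, but nothing is lost on a round trip.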

I just thought I'd ask in case there's any hidden logic behind this
behaviour, before feeding it into Bugzilla.


Lukas Pietsch
University of Freiburg
English Department

Phone (p.) (#49) (761) 696 37 23

