Spell checking (was Re: various questions)

Paul Rohr (paul@abisource.com)
Tue, 21 Sep 1999 18:46:48 -0700

At 04:55 AM 9/19/99 -0500, Justin Bradford wrote:
>3. Spell checking
> I have local modifications to ispell reintegrating its simple word
> suggestion code. My plan was to make use of this in the dialog
> and possibly in a right-click menu on squiggled words.
> That and the dialog ought to hit the tree tomorrow.

Cool. For some reason the guy who did the original ispell integration left
out that functionality, which never made sense to me. It'll be good to have
it back. :-)

> Also, I've looked through aspell, and the trick to its good suggestions
> is a combination of ispell's method and metaphones. It doesn't look too
> hard to integrate, but the metaphone transformations will be language
> dependent. We could probably rig up an external ruleset which would
> allow localizers to implement a new language for spell check.

Sounds intriguing. Remember, though, that the longstanding objection to
aspell is its use of advanced C++ features like templates, which greatly
reduce portability.

We'd love to have a cooler engine than ispell, and aspell's results sure
look cool, but given all the platforms people want to run AbiWord on, the
portability problem is a biggie. I suspect that the problems of generating
and distributing aspell-format dictionaries for various languages pale in
comparison to this.

> And, we should have a way to preserve "ignore all" words for at least
> a session (possibly store in the file, too?), and the stripped ispell
> should be expanded to handle user-defined dictionaries.

For both personal dictionaries and ignore lists, we need to decide two things:

1. how they're stored in memory, and
2. how they persist.

Since ignore lists tend to be small, actually using ispell to manage them
seems like rampant overkill. A trivial in-memory representation would be to
just store the words in a per-document UT_AlphaHashTable. Then if we need
to persist that information in the file format -- does Word do this, BTW? --
we could serialize that word list in a header section of the document.
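
Roughly, the per-document ignore list could look like the sketch below. This
is only an illustration: std::set stands in for UT_AlphaHashTable, and the
class name, method names, and the space-separated header format are all
hypothetical, not anything in the tree.

```cpp
#include <cassert>
#include <set>
#include <sstream>
#include <string>

// Hypothetical per-document ignore list. std::set is a stand-in for
// UT_AlphaHashTable; none of these names come from the AbiWord sources.
class IgnoreList {
public:
    // Record a word the user chose to "ignore all" in this document.
    void ignore(const std::string& word) { m_words.insert(word); }

    // True if the spell checker should skip this word.
    bool isIgnored(const std::string& word) const {
        return m_words.count(word) != 0;
    }

    // Flatten the list to one space-separated line that could be stored
    // in a header section of the file (the format is an assumption).
    std::string serialize() const {
        std::ostringstream out;
        for (std::set<std::string>::const_iterator it = m_words.begin();
             it != m_words.end(); ++it) {
            if (it != m_words.begin()) out << ' ';
            out << *it;
        }
        return out.str();
    }

    // Rebuild a list from a line produced by serialize().
    static IgnoreList deserialize(const std::string& line) {
        IgnoreList list;
        std::istringstream in(line);
        std::string word;
        while (in >> word) list.ignore(word);
        return list;
    }

private:
    std::set<std::string> m_words;
};

// Usage: ignore a word, save it with the document, load it back.
inline void ignoreListExample() {
    IgnoreList doc;
    doc.ignore("AbiWord");
    IgnoreList reloaded = IgnoreList::deserialize(doc.serialize());
    assert(reloaded.isIgnored("AbiWord"));
}
```

If session-only ignores turn out to be enough, the serialize/deserialize
pair simply never gets called.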

Likewise, personal dictionaries also tend to be far, far smaller than ispell
dictionaries -- my idiolect is a *lot* smaller than the rest of the English
language :-) -- so a similar approach should work there too. In this case,
the UT_AlphaHashTable would be app-wide, and could easily persist to a
simple text file with one word per line. In fact, iterative calls to
UT_AlphaHashTable::getNthEntryAlpha() would even ensure that the resulting
file is alpha-sorted, which is pretty nice.
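
As a sketch of that persistence step, assuming std::set as a stand-in for
the app-wide UT_AlphaHashTable (iterating a std::set visits words in sorted
order, which plays the role of the repeated getNthEntryAlpha() calls). The
function names are made up for illustration, and the sort here is plain byte
order rather than anything locale-aware.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <set>
#include <sstream>
#include <string>

// Stand-in for the app-wide UT_AlphaHashTable (an assumption, not the
// real class). Iteration order is sorted by byte value.
typedef std::set<std::string> PersonalDictionary;

// Write one word per line; the output is sorted because the set is.
void savePersonalDictionary(const PersonalDictionary& dict, std::ostream& out) {
    for (PersonalDictionary::const_iterator it = dict.begin();
         it != dict.end(); ++it) {
        out << *it << '\n';
    }
}

// Read one word per line, skipping blank lines a user may have left
// behind after hand-editing the file.
PersonalDictionary loadPersonalDictionary(std::istream& in) {
    PersonalDictionary dict;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty()) dict.insert(line);
    }
    return dict;
}

// Usage: round-trip a dictionary through an in-memory stream.
inline void personalDictionaryExample() {
    PersonalDictionary dict;
    dict.insert("idiolect");
    std::ostringstream out;
    savePersonalDictionary(dict, out);
    std::istringstream in(out.str());
    assert(loadPersonalDictionary(in) == dict);
}
```

Taking streams rather than filenames keeps the sketch independent of where
the app decides to keep the file.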

This should also make it quite easy to mimic Word 97's UI trick of editing
personal dictionaries by loading them in as plain-text documents. ;-)

> As these get done, a generic spelling interface should get built, which
> could then be replaced by another spelling checker.

Yep. The current API isn't a very nice hack, is it?

If we follow the alphahash approach suggested above, then the replacement
API could stay quite simple, since personal dictionary management and the
whole ignore concept would both be handled by the app, and not by the
particular dictionary lookup engine.

This should make switching from ispell dictionaries to your favorite *spell
engine much easier, because all that engine would need to do is check and
suggest individual words.
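
To make that split concrete, here is a rough sketch of what such an API
might look like. Everything in it is hypothetical: SpellEngine,
WordListEngine, and AppSpellChecker are invented names, and the toy engine
only exists so the example is complete. A real backend would wrap ispell
(or whatever engine replaces it) behind the same two calls.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical engine interface: a backend only checks and suggests.
class SpellEngine {
public:
    virtual ~SpellEngine() {}
    virtual bool check(const std::string& word) const = 0;
    virtual std::vector<std::string> suggest(const std::string& word) const = 0;
};

// Toy backend for illustration only, backed by a fixed word list.
class WordListEngine : public SpellEngine {
public:
    explicit WordListEngine(const std::set<std::string>& words) : m_words(words) {}

    bool check(const std::string& word) const { return m_words.count(word) != 0; }

    // Suggest known words of the same length that differ in exactly one letter.
    std::vector<std::string> suggest(const std::string& word) const {
        std::vector<std::string> out;
        for (std::set<std::string>::const_iterator it = m_words.begin();
             it != m_words.end(); ++it) {
            if (it->size() != word.size()) continue;
            int diffs = 0;
            for (std::string::size_type i = 0; i < word.size(); ++i) {
                if ((*it)[i] != word[i]) ++diffs;
            }
            if (diffs == 1) out.push_back(*it);
        }
        return out;
    }

private:
    std::set<std::string> m_words;
};

// App-side checker: the ignore list and personal dictionary live here,
// so the engine never needs to know about them.
class AppSpellChecker {
public:
    explicit AppSpellChecker(const SpellEngine& engine) : m_engine(engine) {}

    void ignoreAll(const std::string& word) { m_ignored.insert(word); }
    void addToPersonal(const std::string& word) { m_personal.insert(word); }

    // Consult the app-managed lists first, then fall back to the engine.
    bool isCorrect(const std::string& word) const {
        return m_ignored.count(word) != 0
            || m_personal.count(word) != 0
            || m_engine.check(word);
    }

    std::vector<std::string> suggest(const std::string& word) const {
        return m_engine.suggest(word);
    }

private:
    const SpellEngine& m_engine;
    std::set<std::string> m_ignored;
    std::set<std::string> m_personal;
};

// Usage: an unknown word becomes acceptable once the user ignores it.
inline void spellCheckerExample() {
    std::set<std::string> words;
    words.insert("spell");
    WordListEngine engine(words);
    AppSpellChecker checker(engine);
    assert(!checker.isCorrect("AbiWord"));
    checker.ignoreAll("AbiWord");
    assert(checker.isCorrect("AbiWord"));
}
```

Swapping engines then means writing one new SpellEngine subclass; nothing
in the ignore or personal dictionary code has to change.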

