Re: The AbiWord side of a grammar checker (was Re: Implementing support for barbarisms correction)

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Sun Sep 22 2002 - 21:27:12 EDT

Next message: Andrew Dunbar: "Re: The AbiWord side of a grammar checker (was Re: Implementing support for barbarisms correction)"

Previous message: Dom Lachowicz: "Vietnamese PO file"
In reply to: Martin Sevior: "The AbiWord side of a grammar checker (was Re: Implementing support for barbarisms correction)"
Next in thread: Jordi Mas: "Re: Implementing support for barbarisms correction"
Messages sorted by: [ date ] [ thread ] [ subject ] [ author ] [ attachment ]

--- Martin Sevior <msevior@physics.unimelb.edu.au>
wrote:
>
> On Sun, 22 Sep 2002, Andrew Dunbar
> wrote:
> >
> > What we probably need to do is start designing a
> > grammar checker framework, complete with a plugin
> > interface for extensions, and design the barbarism
> > checker as a plugin for it.
>
> I've discovered that I personally definitely need a
> grammar checker so I'm happy to help out, though not
> to take the lead on a grammar checker.

It's probably time to start building a list of what
we want a grammar checker to actually do. "Grammar
checking" is a pretty vague thing really. I think
there's already an RFE you can add to.

> There are two components: the "squiggling"
> implementation and the actual parsing of text.
>
> Regarding the squiggling, we can borrow much of the
> design from the spell-checker.

In fact I'd like to refactor a little and make a
generic squiggling class that both the spelling and
grammar checkers can use. There are lots of little
oddities in the squiggles (especially on Windows) that
will be much easier to maintain in a single place.
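
A minimal sketch of what such a shared squiggle store might look like. Every name here (Squiggle, SquiggleKind, SquiggleList) is hypothetical and none of it is AbiWord's actual code; the only idea taken from the discussion is that spelling and grammar marks live in one place and differ only by kind.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical names throughout; these are not AbiWord's real classes.
enum class SquiggleKind { Spelling, Grammar };

struct Squiggle {
    std::size_t offset;  // start of the marked text within its block
    std::size_t length;  // number of characters marked
    SquiggleKind kind;   // which checker produced the mark
};

class SquiggleList {
public:
    void add(std::size_t offset, std::size_t length, SquiggleKind kind) {
        marks_.push_back({offset, length, kind});
    }

    // Drop every mark of one kind that overlaps [offset, offset + length),
    // for example after the user edits that part of the block.
    void clearRange(std::size_t offset, std::size_t length, SquiggleKind kind) {
        const std::size_t end = offset + length;
        marks_.erase(std::remove_if(marks_.begin(), marks_.end(),
                         [&](const Squiggle& s) {
                             return s.kind == kind &&
                                    s.offset < end &&
                                    offset < s.offset + s.length;
                         }),
                     marks_.end());
    }

    // Count the marks of one kind that are still present.
    std::size_t count(SquiggleKind kind) const {
        return static_cast<std::size_t>(std::count_if(
            marks_.begin(), marks_.end(),
            [kind](const Squiggle& s) { return s.kind == kind; }));
    }

private:
    std::vector<Squiggle> marks_;
};
```

Keeping the kind inside each mark means the drawing code could pick the colour and vertical offset from one switch, so platform-specific quirks would only need fixing once.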

> To remind people this works by building a vector of
> pointers to fl_BlockLayout classes then processing
> these during idle time in the GUI mainloop.
>
> The fl_BlockLayout classes contain pointers to text
> in the piecetable, which is separated by white space
> characters into words. These words are fed through
> the spell checker.
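
The idle-time scheme described above can be sketched roughly as follows. Block is a hypothetical stand-in for fl_BlockLayout, and IdleCheckQueue, enqueue and processSome are invented names; skipping blocks that are already queued is also an assumption, not something stated in the thread.

```cpp
#include <cstddef>
#include <deque>
#include <functional>

// Hypothetical stand-in for fl_BlockLayout; the real class is far richer.
struct Block { int id; };

class IdleCheckQueue {
public:
    // Queue a block for checking unless it is already waiting.
    void enqueue(Block* block) {
        for (Block* queued : pending_)
            if (queued == block) return;
        pending_.push_back(block);
    }

    // Meant to be called from an idle handler: check at most maxBlocks
    // blocks, then return so the GUI stays responsive.
    // Returns true if there is still work left in the queue.
    bool processSome(std::size_t maxBlocks,
                     const std::function<void(Block&)>& check) {
        while (maxBlocks-- > 0 && !pending_.empty()) {
            Block* block = pending_.front();
            pending_.pop_front();
            check(*block);
        }
        return !pending_.empty();
    }

private:
    std::deque<Block*> pending_;
};
```

The check callback is where a spell checker, a grammar checker, or both could be plugged in without the queue knowing the difference.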

White space is not good enough for word separation.
Some of the problems with quotes and/or apostrophes
causing spellcheck problems will be due to this.
Then there are Asian languages which do not use spaces
between words, but for which there are open source
libraries available to do the segmentation properly.

The first step is to use the Unicode character
functions that are available now in Win32 and glibc
to tell us whether a character is a letter or not.
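
As a rough illustration of that first step, here is a sketch that splits on "is this a letter?" rather than on white space, using the standard iswalpha from <cwctype>. The function name splitWords and the apostrophe rule (keep it only between two letters) are assumptions made for the example. Note that iswalpha follows the current C locale, so a real program would need to call setlocale before non-ASCII letters are classified correctly.

```cpp
#include <cstddef>
#include <cwctype>
#include <string>
#include <vector>

// Split text into words using wide-character classification instead of
// looking for white space. An apostrophe is kept only when it sits between
// two letters, so "don't" stays whole while quote marks are dropped.
std::vector<std::wstring> splitWords(const std::wstring& text) {
    std::vector<std::wstring> words;
    std::wstring current;
    for (std::size_t i = 0; i < text.size(); ++i) {
        const wchar_t c = text[i];
        const bool innerApostrophe =
            c == L'\'' && !current.empty() && i + 1 < text.size() &&
            std::iswalpha(static_cast<std::wint_t>(text[i + 1])) != 0;
        if (std::iswalpha(static_cast<std::wint_t>(c)) != 0 || innerApostrophe) {
            current += c;
        } else if (!current.empty()) {
            words.push_back(current);  // a non-letter ends the current word
            current.clear();
        }
    }
    if (!current.empty()) words.push_back(current);
    return words;
}
```

With this rule, the text 'Hello,' she said: don't stop. yields Hello, she, said, don't and stop, with the quotes and punctuation never reaching the spell checker.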

> A grammar check would do exactly the same except it
> would have to recognize sentences and parse these
> through to the grammar checker.

Perhaps. The existing grammar checkers don't really
seem to do any parsing at all. They seem to have
something maybe similar to a regex engine for finding
patterns.
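
To make the pattern-matching idea concrete, here is a small sketch using std::regex to flag one classic pattern, an accidentally doubled word. The rule itself, the Match struct and findDoubledWords are illustrative assumptions; this is not a description of how any existing grammar checker is implemented.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

struct Match {
    std::size_t offset;  // where the suspect text starts
    std::size_t length;  // how many characters it covers
};

// Find accidentally doubled words such as "the the".
// \1 refers back to the first captured word; \b keeps "the then" from matching.
std::vector<Match> findDoubledWords(const std::string& sentence) {
    static const std::regex doubled(R"(\b(\w+)\s+\1\b)");
    std::vector<Match> found;
    for (std::sregex_iterator it(sentence.begin(), sentence.end(), doubled), end;
         it != end; ++it) {
        found.push_back({static_cast<std::size_t>(it->position()),
                         static_cast<std::size_t>(it->length())});
    }
    return found;
}
```

Each match comes back as an offset and a length, which is the start and end information the squiggle-drawing code would need.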

Having some kind of "sentence iterator" and "word
iterator" is a very good idea though. ICU is open
source and has both. But it's big and I'm not sure
how possible it is to use just pieces of it...
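
For comparison, a deliberately naive sentence iterator fits in a few lines. This sketch is not ICU (whose BreakIterator handles the hard cases properly); it assumes a sentence ends at '.', '!' or '?' followed by white space or the end of the text, and the names Span and splitSentences are made up for the example.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

struct Span {
    std::size_t begin;  // offset of the first character of the sentence
    std::size_t end;    // one past its last character
};

// Naive rule: a sentence ends at '.', '!' or '?' followed by white space or
// the end of the text. Abbreviations such as "e.g." will fool it.
std::vector<Span> splitSentences(const std::string& text) {
    std::vector<Span> spans;
    const std::size_t n = text.size();
    std::size_t start = 0;
    while (start < n) {
        // Skip white space between sentences.
        while (start < n && std::isspace(static_cast<unsigned char>(text[start])))
            ++start;
        if (start >= n) break;
        std::size_t i = start;
        for (; i < n; ++i) {
            const char c = text[i];
            const bool terminator = c == '.' || c == '!' || c == '?';
            const bool atBoundary =
                i + 1 == n || std::isspace(static_cast<unsigned char>(text[i + 1]));
            if (terminator && atBoundary) { ++i; break; }  // include the mark
        }
        spans.push_back({start, i});
        start = i;
    }
    return spans;
}
```

Returning offsets rather than copies of the text keeps each sentence tied to its position in the block, which matters once errors have to be mapped back to squiggles.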

> I think we can reuse much of the spell checker code
> so that fl_BlockLayouts are parsed through to both
> the spell checker and the grammar checker.
>
> If a region of the text is found to be suspect the
> text is marked with a green squiggle two pixels
> below the red squiggle.
>
> Hmm the more I think about this, the easier it
> seems. We can re-use a lot of the existing classes
> and methods and just add extra code to split
> the text into sentences as well as words.
>
> The grammar checker would have to mark the start and
> end points of the dodgy text and send this info
> back. Then we reuse the squiggle code to draw
> between the points.
>
> I think this would not be hard to get working rather
> quickly.

This sounds very good and is one of the things I've
always wanted to work on! I really want it to be
easily extensible, especially via plugins.
For instance, I'd love to see a German extension that
can correct you when you've used the wrong article:
der vs. die vs. das - MS Word doesn't even do this for
German but it would be great for second-language users.
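
A toy sketch of how such an extension might plug in. The GrammarRule interface, the Issue struct and GermanArticleRule are all hypothetical, and the three-noun lexicon is made up for the example. It only knows nominative singular articles, so correct dative or genitive forms such as "der Katze" would be flagged wrongly, and it ignores capitalised articles and nouns with punctuation attached; a real rule would need proper morphology.

```cpp
#include <cstddef>
#include <map>
#include <sstream>
#include <string>
#include <vector>

struct Issue {
    std::size_t offset;      // start of the suspect text in the sentence
    std::size_t length;      // length of the suspect text
    std::string suggestion;  // proposed replacement
};

// Hypothetical plugin interface: each extension checks one sentence.
class GrammarRule {
public:
    virtual ~GrammarRule() = default;
    virtual std::vector<Issue> check(const std::string& sentence) const = 0;
};

// Toy rule: flags a definite article that disagrees with the noun after it.
class GermanArticleRule : public GrammarRule {
public:
    std::vector<Issue> check(const std::string& sentence) const override {
        // Made-up mini lexicon: noun -> nominative singular article.
        static const std::map<std::string, std::string> article = {
            {"Hund", "der"}, {"Katze", "die"}, {"Haus", "das"}};
        std::vector<Issue> issues;
        std::istringstream in(sentence);
        std::string word, prev;
        std::size_t prevOffset = 0, pos = 0;
        while (in >> word) {
            const std::size_t offset = sentence.find(word, pos);
            pos = offset + word.size();
            const auto it = article.find(word);
            const bool prevIsArticle =
                prev == "der" || prev == "die" || prev == "das";
            if (it != article.end() && prevIsArticle && prev != it->second)
                issues.push_back({prevOffset, prev.size(), it->second});
            prev = word;
            prevOffset = offset;
        }
        return issues;
    }
};
```

For "Dort ist das Hund" the rule reports the article at offset 9 with the suggestion "der"; the framework would only ever see the GrammarRule interface, so other languages could ship their own rules the same way.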

Andrew.

> see the code in the file fl_BlockLayout.cpp
>
> Cheers!
>
> Martin
>

=====
http://linguaphile.sourceforge.net/cgi-bin/translator.pl http://www.abisource.com



This archive was generated by hypermail 2.1.4 : Sun Sep 22 2002 - 21:31:58 EDT