From: Andrew Dunbar (firstname.lastname@example.org)
Date: Sun Sep 22 2002 - 00:58:38 EDT
--- Jordi Mas <email@example.com> wrote:
> Hello,
> A common problem in the Catalan language is barbarisms:
> words that are incorrect but widely used. One common
> reason is that in the areas where Catalan is spoken,
> people usually also speak French or Spanish, so they
> easily borrow words from one language into the other.
> In Catalan, for example, there are glossaries of
> barbarisms. They usually list the incorrect word
> (the barbarism) and a proper replacement, for
> example "tamany, mida". "Tamany" is borrowed from
> Spanish "tamaño" but is incorrect in Catalan; the
> proper word is "mida", which means "size". Since
> "tamany" and "mida" are very different words, spell
> checking programs cannot make a good suggestion,
> because this is not a typo, just an incorrect word
> being used.
> Well, coming back to AbiWord. I have been thinking
> of implementing an optional barbarism file for every
> language: if the file is present it is used, if not
> nothing happens. It is just a list of incorrect
> words and their correct replacements.
> Does anybody have a problem with me implementing
> this? Are there other languages where this could be useful?
I think that most languages have a concept like this.
English is probably largely the exception. Languages
with active "police" or language academies could
certainly benefit from this and probably already have
word lists around. French comes to mind straight away.
I think Serbian, Croatian, and Bosnian are now undergoing
a process of separating out their vocabularies in
a way which would be compatible with this too.
It ought to be part of the spelling/grammar/style
infrastructure (not that there's a whole lot of
infrastructure yet). It should be extensible in a
similar way to user dictionaries. Files should be
named by language tag, such as "ca.barb", and should
avoid over-specific tags such as "ca-ES" if the same
rules are applicable to, say, France or Andorra.
Files should either always be in UTF-8 or specify
encoding as part of their format. A very simple XML
format is preferred.
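
To make the file naming and format ideas above a bit more concrete, here is
a rough sketch in Python (not AbiWord code). Everything in it is an
assumption for illustration: the element names "barbarisms", "barbarism"
and "suggestion", the fallback from a specific tag like "ca-ES" to "ca",
and the function names. If no file is found, the table is simply empty and
nothing happens, as Jordi proposed.

```python
import os
import tempfile
import xml.etree.ElementTree as ET

# Hypothetical file layout; the element and attribute names are assumptions.
SAMPLE = (
    '<?xml version="1.0" encoding="UTF-8"?>\n'
    '<barbarisms lang="ca">\n'
    '  <barbarism word="tamany">\n'
    '    <suggestion>mida</suggestion>\n'
    '  </barbarism>\n'
    '</barbarisms>\n'
)


def candidate_tags(lang_tag):
    """Yield the tag itself, then less specific tags ("ca-ES", then "ca")."""
    parts = lang_tag.split("-")
    while parts:
        yield "-".join(parts)
        parts.pop()


def load_barbarisms(directory, lang_tag):
    """Return {barbarism: [replacements]}, or {} if no file is present."""
    for tag in candidate_tags(lang_tag):
        path = os.path.join(directory, tag + ".barb")
        if os.path.isfile(path):
            # The XML parser honours the encoding named in the declaration.
            root = ET.parse(path).getroot()
            table = {}
            for entry in root.iter("barbarism"):
                word = entry.get("word")
                if word is None:
                    continue  # skip malformed entries
                table[word.lower()] = [
                    s.text for s in entry.findall("suggestion") if s.text
                ]
            return table
    return {}  # optional file: absent means no barbarism checking


def suggest(table, word):
    """Return the replacements for a word, or [] if it is not a barbarism."""
    return table.get(word.lower(), [])


def demo():
    """Write the sample as ca.barb and look up a word for locale ca-ES."""
    with tempfile.TemporaryDirectory() as directory:
        path = os.path.join(directory, "ca.barb")
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(SAMPLE)
        table = load_barbarisms(directory, "ca-ES")
        return suggest(table, "Tamany")
```

Here a user running with "ca-ES" still picks up the general "ca.barb"
file, and a region-specific file would take precedence only if someone
actually ships one. Lower-casing the lookup key is a simplification; real
case handling would need more thought.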
In fact, maybe it should be part of the grammar-checker
world, with a separate option switch. This would make
it easy to share, say, green squiggle underlines with
the grammar checker.
If adding it to the spelling-checker makes more sense
that would also work.
> Jordi Mas
This archive was generated by hypermail 2.1.4 : Sun Sep 22 2002 - 01:03:03 EDT