Commit: nl.dic WAS: Re: OTS - Help in translations is needed!

From: Marc Maurer (j.m.maurer@student.utwente.nl)
Date: Wed Jul 09 2003 - 07:44:08 EDT


    Start of a Dutch dictionary

    CVS:
    ----------------------------------------------------------------------
    CVS: Enter Log. Lines beginning with `CVS:' are removed automatically
    CVS:
    CVS: Committing in .
    CVS:
    CVS: Modified Files:
    CVS: dic/Makefile.am
    CVS: Added Files:
    CVS: dic/nl.dic
    CVS:
    ----------------------------------------------------------------------

    Marc

    On Wed 09-07-2003, at 12:55, Nadav Rotem wrote:
    > Hi
    >
    > At the moment Open Text Summarizer can only summarize documents in
    > English and Hebrew. Enabling AbiWord to summarize documents in your
    > language is easy and fun! All you have to do is create a short text file
    > that has about 200 special words in it.
    >
    >
    > Here is how it's done:
    >
    > Name your file (LangCode).dic (for example, en.dic for English).
    > In that file you need to put words that are common in your language but
    > are NOT the subject of any article. For example, the word "the" in
    > English is very common but is not an "important" word.
    > In other words, we can find the word "the" in almost every sentence and
    > we can't tell anything about the sentence from it. Another example is
    > the word "such", which is redundant (for this use).
    > I know it's a little strange, but it works.
    >
    > Here is what I do. I take a UTF-8 text file (it has to be Unicode) and
    > ask OTS to tell me which words it thinks are keywords in the article,
    > like this:
    >
    > ots letter.txt --dic=he --keywords | more
    >
    > where "he" is the "Hebrew" dictionary file and letter.txt is the text
    > file.
    >
    > Here is an example of the output (in English this time):
    > Word[15][to]
    > Word[8][the]
    > Word[6][a]
    > Word[5][love]
    > Word[5][Becky]
    > Word[5][October]
    > Word[5][north]
    > ...
    > ...
    >
    > As you can see, the word "to" appears 15 times in the text. "To" is not a
    > keyword, so we need to place it in our dictionary file. The same goes
    > for "the" and "a". Translating doc/en.dic would work for most Germanic
    > languages. Just play with it until you feel you get it right.
    >
    > For more info, see http://www.abisource.com/lxr/source/ots/README
    >
    > Other OTS related news:
    > * OTS made it into Gentoo! To get OTS 0.2.0 under Gentoo, type "emerge
    > ots".
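
For anyone starting a dictionary for another language: the procedure Nadav describes above (count the words in a sample text, then pick out the frequent ones that say nothing about the subject) can be roughly sketched in Python. This is not OTS code. The function names are made up for illustration, the tokenizer is a plain regex, and words are lowercased here, which the real --keywords output does not seem to do. Treat it as a rough aid for finding candidates, not as a replacement for running ots itself.

```python
import re
from collections import Counter


def word_counts(text):
    """Count how often each word occurs (lowercased, Unicode-aware)."""
    words = re.findall(r"\w+", text.lower())
    return Counter(words)


def stopword_candidates(text, known_stopwords=(), top_n=20):
    """Return the most frequent words not already in the dictionary.

    known_stopwords is whatever you have collected so far; the result
    still has to be reviewed by hand, since a frequent word such as
    "love" may well be a real keyword of the article.
    """
    known = {w.lower() for w in known_stopwords}
    remaining = [(w, c) for w, c in word_counts(text).most_common()
                 if w not in known]
    return remaining[:top_n]


def format_like_ots(pairs):
    """Render (word, count) pairs in the Word[count][word] style quoted above."""
    return "\n".join(f"Word[{count}][{word}]" for word, count in pairs)
```

A typical loop would be: run stopword_candidates on a sample article, move the words that carry no meaning into your list of known stopwords, and repeat on a few more articles until the top of the list consists mostly of real keywords.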

    -- 
    Marc Maurer <j.m.maurer@student.utwente.nl>
    

