User:Chenxiajian

From AbiWiki

h1. Summary of What I Have Done in GSoC 2011

*Chen Xiajian*

* 1 Hyphenation module in Enchant
** 1.1 Add hyphenation function in Enchant
** 1.2 Add five backends to support hyphenation
** 1.3 ISpell
** 1.4 MySpell
** 1.5 Zemberek
** 1.6 Voikko
** 1.7 Deploying Enchant in AbiWord
** 1.8 Testing on Linux
* 2 Calling the hyphenation function in AbiWord
* 3 Simple implementation of Chinese spell-checking in Enchant
* 4 Code refactoring and debugging
* 5 User interface to manage hyphenation
* 6 How to support more languages
** 6.1 How to support more languages in ISpell
** 6.2 How to support more languages in MySpell
* 7 How to extend the Enchant functionality

So far, my GSoC 2011 work consists of the following parts:

*1. Hyphenation module in Enchant*
* Read and fully understood the Enchant source code.
* Reused Enchant's abstraction layer to add a hyphenation function, so that more languages can be added easily.
* Supported more languages.
* Added five backend implementations: ispell, myspell, zemberek, voikko, and uspell.
* Worked on the spell-checking module.

*2. Calling the hyphenation function in AbiWord*
* Find the split information using enchant_dict_hyphenate.
* Split the Text_Run so that a word that passes the line width is split while keeping its format.
* Handle user operations (select, delete, cut, paste).
* Let the user choose whether to enable the hyphenation function.

*3. Simple implementation of Chinese spell-checking in Enchant*
* Added a simple spell-checking framework for Chinese in Enchant.
* Added a library to support it.
* Did some research on Chinese spell-checking.

*4. Code refactoring and debugging*
* Refactored the code, including keeping it flexible.
* Debugged coding problems.

*5. User interface to manage hyphenation*
* Windows, Linux, and Cocoa.

*6. How to support more languages*
* How to support more languages in ISpell.
* How to support more languages in MySpell.

*7. How to extend the Enchant functionality*

The details follow.

h2. 1 Hyphenation module in Enchant

h3. 1.1 Add hyphenation function in Enchant

First, I added a hyphenation method to Enchant.

I think hyphenation can be combined with spell-checking, which keeps the code flexible: the caller requests a dictionary exactly as for spell-checking and then asks it to hyphenate a word. In my design, the hyphenation API is:

```c
EnchantDict *enchant_broker_request_dict (EnchantBroker *broker, const char *const lang); /* same as for spell-checking */

char *enchant_dict_hyphenate (EnchantDict *dict, const char *const word, size_t len);
```

To implement this in the abstraction layer, EnchantDict needs a hyphenation entry, as a function pointer along these lines:

```c
char *(*hyphenate) (struct str_enchant_dict *me,
                    const char *const word, size_t len,
                    size_t *out_n_suggs);
```

The function is implemented by each backend. Taking ISpell as an example:

```cpp
static char *
ispell_dict_hyphenate (EnchantDict *me, const char *const word,
                       size_t len, size_t *out_n_suggs)
{
	ISpellChecker *checker;

	checker = (ISpellChecker *) me->user_data;
	return checker->hyphenate (word, len, out_n_suggs);
}
```

Finally, each backend connects its implementation to the hyphenate entry (not to suggest) when it creates the dictionary:

```cpp
/* in the ISpell backend */
dict->hyphenate = ispell_dict_hyphenate;

/* in the Zemberek backend */
dict->hyphenate = zemberek_dict_hyphenate;

/* HSpell: not connected, since Hebrew does not need hyphenation (see 1.2) */
```

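The three pieces above (the public entry point, the function pointer in the dictionary, and the backend that fills it in) can be tried in isolation with the minimal sketch below. It assumes a trimmed-down, hypothetical MockDict in place of the real EnchantDict, and a toy backend that puts a hyphen after every second character purely to make the dispatch visible; MockDict, mock_dict_hyphenate, toy_hyphenate and toy_backend_init are illustrative names, not part of Enchant.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, trimmed-down stand-in for EnchantDict (not the real struct). */
typedef struct str_mock_dict {
    void *user_data;
    /* the function pointer; NULL if the backend cannot hyphenate */
    char *(*hyphenate)(struct str_mock_dict *me, const char *const word, size_t len);
} MockDict;

/* Public entry point of the abstraction layer, in the spirit of
 * enchant_dict_hyphenate: validate the arguments, then dispatch.
 * len == (size_t) -1 means "word is NUL-terminated". */
char *mock_dict_hyphenate(MockDict *dict, const char *const word, size_t len)
{
    if (dict == NULL || word == NULL)
        return NULL;
    if (len == (size_t) -1)
        len = strlen(word);
    if (len == 0 || dict->hyphenate == NULL)
        return NULL;                        /* nothing to do, or no backend support */
    return dict->hyphenate(dict, word, len);
}

/* Toy backend. NOT real hyphenation: it puts a hyphen after every second
 * character so that the dispatch is observable. */
static char *toy_hyphenate(MockDict *me, const char *const word, size_t len)
{
    (void) me;                              /* a real backend would use me->user_data */
    assert(word != NULL);                   /* guaranteed by mock_dict_hyphenate */
    char *out = malloc(len + len / 2 + 1);  /* at most len / 2 hyphens, plus NUL */
    if (out == NULL)
        return NULL;
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        if (i > 0 && i % 2 == 0)
            out[j++] = '-';
        out[j++] = word[i];
    }
    out[j] = '\0';
    return out;
}

/* The backend connects its implementation to the dictionary. */
void toy_backend_init(MockDict *dict)
{
    dict->user_data = NULL;
    dict->hyphenate = toy_hyphenate;
}
```

In this sketch a backend that leaves hyphenate as NULL simply makes the public call return NULL, so a backend without hyphenation support needs no extra code.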
h3. 1.2 Add five backends to support hyphenation

The five backends are ispell, myspell, zemberek, voikko, and uspell.

* *Hunspell*: uses a separate hyphenation dictionary, such as hyph_en_US.dic. Dictionaries can be downloaded from the Internet.
* *Libhyphenation*: the dictionaries are provided by its author and are sometimes limited.
* *Zemberek*: for Turkish.
* *Voikko*: for Finnish.

The changes:

1. Deleted unneeded connections, such as HSpell.
2. Added the Hunspell (MySpell) hyphenation code.
3. Implemented hyphenation using Hunspell.
4. Implemented hyphenation using Zemberek.

*Deleting unneeded connections, such as HSpell*

Hebrew and Yiddish do not need hyphenation.

*Implementing hyphenation using Hunspell*

To use the hyphenation library, the following files need to be added:

* hyphen/hnjalloc.h
* hyphen/hnjalloc.c
* hyphen/hyph_en_US.dic
* hyphen/hyphen.c
* hyphen/hyphen.gyp
* hyphen/hyphen.h
* hyphen/hyphen.patch
* hyphen/hyphen.tex

*Implementing hyphenation using Zemberek*

This simply uses dbus_g_proxy_call, in the same way as spell-checking in Zemberek. The code is shown in section 1.5.

h3. 1.3 ISpell

In the ISpell backend I used libhyphenate (its Hyphenator class). The backend function is simply:

```cpp
static char *
ispell_dict_hyphenate (EnchantDict *me, const char *const word)
{
	ISpellChecker *checker;

	checker = (ISpellChecker *) me->user_data;
	/* check the tag's contents; "!=" on char pointers only compares addresses */
	if (me->tag != NULL && me->tag[0] != '\0')
		return checker->hyphenate (word, me->tag);
	return checker->hyphenate (word, "en_us");
}
```

The concrete code in ISpellChecker is:

```cpp
char *
ISpellChecker::hyphenate (const char *const utf8Word, const char *const tag)
{
	// we must choose the right language tag
	char *param_value = enchant_broker_get_param (m_broker, "enchant.ispell.hyphenation.dictionary.path");

	if (languageMap[tag] != "")
	{
		string result = Hyphenator (RFC_3066::Language (languageMap[tag]), param_value).hyphenate (utf8Word);

		// +1 for the terminating NUL that strcpy writes
		char *temp = new char[result.length () + 1];
		strcpy (temp, result.c_str ());
		return temp;
	}
	return NULL;
}
```

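ispell_dict_hyphenate and ISpellChecker::hyphenate together do two things: fall back to a default tag when the dictionary has none, and translate the Enchant tag through languageMap into a language the hyphenator understands. The sketch below reproduces that lookup in plain C under stated assumptions: the table entries are illustrative (de, en, es and fr are the languages section 6.1 lists as supported), lookup_hyphenation_language and tag_equal are hypothetical names, and tags are compared case-insensitively so that the "en_us" fallback matches "en_US".

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Illustrative mapping from Enchant-style tags to hyphenation languages.
 * The entries are assumptions, not the real languageMap contents. */
static const struct {
    const char *tag;
    const char *language;
} kLanguageMap[] = {
    { "de_DE", "de" },
    { "en_US", "en" },
    { "es_ES", "es" },
    { "fr_FR", "fr" },
};

/* ASCII case-insensitive string equality, so "en_us" matches "en_US". */
static int tag_equal(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}

/* Returns the hyphenation language for a tag, or NULL if unsupported.
 * A missing or empty tag falls back to "en_us", like ispell_dict_hyphenate.
 * The emptiness test looks at the contents (tag[0]), not the pointer. */
const char *lookup_hyphenation_language(const char *tag)
{
    if (tag == NULL || tag[0] == '\0')
        tag = "en_us";
    for (size_t i = 0; i < sizeof kLanguageMap / sizeof kLanguageMap[0]; i++) {
        if (tag_equal(tag, kLanguageMap[i].tag))
            return kLanguageMap[i].language;
    }
    return NULL;
}
```

Returning NULL for an unknown tag mirrors ISpellChecker::hyphenate, which returns NULL when languageMap has no entry.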
h3. 1.4 MySpell

In MySpell I used the hyphenation library added in section 1.2 (hnj_hyphen_load and hnj_hyphen_hyphenate2). The backend code is simply:

```cpp
char *
MySpellChecker::hyphenate (const char *const word, size_t len, const char *tag)
{
	if (len == static_cast<size_t> (-1))
		len = strlen (word);
	if (len > MAXWORDLEN
	    || !g_iconv_is_valid (m_translate_in)
	    || !g_iconv_is_valid (m_translate_out))
		return 0;

	// Hunspell::hyphenate returns a malloc()ed buffer, or NULL on failure
	return myspell->hyphenate (word, tag);
}
```

The concrete code, in the Hunspell class used by MySpellChecker, is:

```cpp
// Returns a malloc()ed mask of libhyphen digits (an odd digit at position i
// means a hyphen may be inserted after word[i]), or NULL on error.
char *Hunspell::hyphenate (const char *const word, const char *tag)
{
	HyphenDict *dict;
	char **rep = NULL;   // allocated by libhyphen only for non-standard hyphenation
	int *pos = NULL;
	int *cut = NULL;

	/* load the hyphenation dictionary, e.g. hyph_en_US.dic */
	string filePath = "hyph_";
	filePath += tag;
	filePath += ".dic";
	if ((dict = hnj_hyphen_load (filePath.c_str ())) == NULL) {
		fprintf (stderr, "Couldn't find file %s\n", filePath.c_str ());
		return NULL;   // report the failure instead of calling exit() in a library
	}

	int len = strlen (word);
	char *hyphens = (char *) malloc (len + 5);   // libhyphen needs word length + 5
	if (hyphens == NULL
	    || hnj_hyphen_hyphenate2 (dict, word, len, hyphens, NULL, &rep, &pos, &cut)) {
		free (hyphens);
		hnj_hyphen_free (dict);
		fprintf (stderr, "hyphenation error\n");
		return NULL;
	}
	hnj_hyphen_free (dict);

	if (rep) {   // free the optional non-standard hyphenation arrays
		for (int i = 0; i < len; i++)
			free (rep[i]);
		free (rep);
		free (pos);
		free (cut);
	}
	return hyphens;
}
```

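Hunspell::hyphenate returns libhyphen's break-point mask rather than a word with hyphens in it, while the AbiWord code in section 2 searches the result for '-' characters. A small conversion step bridges the two; the sketch below is one way to write it. apply_hyphen_mask is a hypothetical helper, and the mask convention assumed here is libhyphen's: an odd digit at position i allows a break after byte i.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: turns a libhyphen break-point mask into a hyphenated
 * word. Assumed convention: an odd digit at position i means a hyphen may be
 * inserted after byte i of the word. Returns a malloc()ed string, or NULL if
 * the mask is too short or allocation fails. */
char *apply_hyphen_mask(const char *word, const char *hyphens)
{
    assert(word != NULL && hyphens != NULL);
    size_t len = strlen(word);
    if (strlen(hyphens) < len)
        return NULL;
    char *out = malloc(2 * len + 1);    /* at most one hyphen per byte, plus NUL */
    if (out == NULL)
        return NULL;
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        out[j++] = word[i];
        /* odd digit = break allowed; never end the word with a hyphen */
        if (i + 1 < len && ((hyphens[i] - '0') & 1))
            out[j++] = '-';
    }
    out[j] = '\0';
    return out;
}
```

Like libhyphen itself, the helper works byte by byte, so for UTF-8 dictionaries it relies on the mask only marking breaks at character boundaries.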
h3. 1.5 Zemberek

The backend function in Zemberek follows the same pattern as the two above:

```cpp
static char *
zemberek_dict_hyphenate (EnchantDict *me, const char *const word)
{
	Zemberek *checker;

	checker = (Zemberek *) me->user_data;
	return checker->hyphenate (word);
}
```

The concrete implementation, however, differs from the other two: it uses the zemberek_service over D-Bus.

```cpp
char *Zemberek::hyphenate (const char *word)
{
	gchar **syllables = NULL;   // G_TYPE_STRV yields a NULL-terminated string array
	GError *error = NULL;

	if (!dbus_g_proxy_call (proxy, "hecele", &error,
	                        G_TYPE_STRING, word, G_TYPE_INVALID,
	                        G_TYPE_STRV, &syllables, G_TYPE_INVALID)) {
		g_error_free (error);
		return NULL;
	}

	// join the array elements with '-'; the caller frees the result with g_free()
	char *result = g_strjoinv ("-", syllables);
	g_strfreev (syllables);
	return result;
}
```

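The hecele call returns its result as a NULL-terminated string array (G_TYPE_STRV), while the AbiWord code in section 2 searches the hyphenation result for '-', so the pieces have to be joined into one string. GLib's g_strjoinv does this in one call; the plain-C sketch below shows the same step without GLib so it can be tried in isolation. join_syllables is a hypothetical name, and treating each array element as one syllable is an assumption based on the STRV return type.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Joins a NULL-terminated array of syllables with '-', which is what
 * g_strjoinv("-", syllables) does in GLib. Returns a malloc()ed string
 * (empty for an empty array), or NULL if allocation fails. */
char *join_syllables(const char *const *syllables)
{
    assert(syllables != NULL);
    size_t count = 0;
    size_t total = 1;                   /* terminating NUL */
    for (; syllables[count] != NULL; count++)
        total += strlen(syllables[count]);
    if (count > 1)
        total += count - 1;             /* one '-' between neighbours */
    char *out = malloc(total);
    if (out == NULL)
        return NULL;
    char *p = out;
    for (size_t i = 0; i < count; i++) {
        if (i > 0)
            *p++ = '-';
        size_t n = strlen(syllables[i]);
        memcpy(p, syllables[i], n);
        p += n;
    }
    *p = '\0';
    return out;
}
```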
h3. 1.6 Voikko

Hyphenation in Voikko is straightforward because libvoikko already provides a hyphenation API. The backend callback follows the same pattern as the existing Voikko suggestion callback:

```cpp
static char **
voikko_dict_suggest (EnchantDict *me, const char *const word,
                     size_t len, size_t *out_n_suggs)
{
	char **sugg_arr;
	int voikko_handle;

	voikko_handle = (long) me->user_data;
	sugg_arr = voikko_suggest_cstr (voikko_handle, word);
	if (sugg_arr == NULL)
		return NULL;
	/* count the entries of the NULL-terminated array */
	for (*out_n_suggs = 0; sugg_arr[*out_n_suggs] != NULL; (*out_n_suggs)++)
		;
	return sugg_arr;
}
```

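Unlike libhyphen, libvoikko's hyphenation call (voikko_hyphenate_cstr in the integer-handle API this backend uses) returns, as we understand its documentation, a pattern string with one character per character of the word: ' ' for no break, '-' for a break before that character, and '=' for a break where that character is replaced by the hyphen. The sketch below turns such a pattern into a hyphenated word. apply_voikko_pattern is a hypothetical helper, and for simplicity it treats one byte as one character, which only holds for ASCII input (not for Finnish words with ä or ö).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: applies a Voikko-style hyphenation pattern.
 * Assumed pattern semantics, one pattern character per word character:
 *   ' '  no break before this character
 *   '-'  break before this character (the character is kept)
 *   '='  break at this character (the character is replaced by the hyphen)
 * One byte is treated as one character, so only ASCII words are handled.
 * Returns a malloc()ed string, or NULL on length mismatch or allocation failure. */
char *apply_voikko_pattern(const char *word, const char *pattern)
{
    assert(word != NULL && pattern != NULL);
    size_t len = strlen(word);
    if (strlen(pattern) != len)
        return NULL;
    char *out = malloc(2 * len + 1);
    if (out == NULL)
        return NULL;
    size_t j = 0;
    for (size_t i = 0; i < len; i++) {
        switch (pattern[i]) {
        case '-':
            out[j++] = '-';
            out[j++] = word[i];
            break;
        case '=':
            out[j++] = '-';
            break;
        default:
            out[j++] = word[i];
            break;
        }
    }
    out[j] = '\0';
    return out;
}
```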
h3. 1.7 Deploying Enchant in AbiWord

I simply copy the Enchant build output to the right places in the AbiWord tree:

```
enchant\bin\Debug\libenchant_myspell.dll  ->  abiword\msvc2008\Debug\lib\enchant\libenchant_myspell.dll
enchant\bin\Debug\libenchant_ispell.dll   ->  abiword\msvc2008\Debug\lib\enchant\libenchant_ispell.dll
enchant\bin\Debug\libenchant.dll          ->  abiword\msvc2008\Debug\bin\libenchant.dll
```

h3. 1.8 Testing on Linux

I have tested the Enchant module on Red Hat, and it works for me.

h2. 2 Calling the hyphenation function in AbiWord

* Split the run so that the word is split while keeping its format.
* Find the split information.
* Handle user operations (select, delete, cut, paste).

*Main goal*: call Enchant's hyphenation module and display the hyphenation result in AbiWord. After each user operation (adding, deleting, copying, or cutting a word), the hyphenation result is refreshed accordingly.

The main code is added to the formatting function in fb_LineBreaker.h/.cpp:

```cpp
// find the split point
while (pRunToBump && pLine->getNumRunsInLine() && (pLine->getLastRun() != m_pLastRunToKeep))
{
	UT_ASSERT(pRunToBump->getLine() == pLine);
	if (!pLine->removeRun(pRunToBump))
	{
		pRunToBump->setLine(NULL);
	}

	UT_ASSERT(pLine->getLastRun()->getType() != FPRUN_ENDOFPARAGRAPH);
	if (pLine->getLastRun()->getType() == FPRUN_ENDOFPARAGRAPH)
	{
		fp_Run *pNuke = pLine->getLastRun();
		pLine->removeRun(pNuke);
	}

	pRunToBump->printText();           // debug trace (runs twice)
	pNextLine->insertRun(pRunToBump);  // called when a new line is created

	// the last run bumped to the next line holds the word to split
	if (!(pRunToBump->getPrevRun() && pLine->getNumRunsInLine() && (pLine->getLastRun() != m_pLastRunToKeep)))
	{
		pRunToSplit = pRunToBump;
		PD_StruxIterator text(pRunToBump->getBlock()->getStruxDocHandle(),
		                      pRunToBump->getBlockOffset() + fl_BLOCK_STRUX_OFFSET);

		text.setUpperLimit(text.getPosition() + pRunToBump->getLength() - 1);
		UT_ASSERT_HARMLESS(text.getStatus() == UTIter_OK);

		// collect the run's text (printable ASCII characters only)
		UT_UTF8String sTmp;
		while (text.getStatus() == UTIter_OK)
		{
			UT_UCS4Char c = text.getChar();
			UT_DEBUGMSG(("| %d |", c));
			if (c >= ' ' && c < 128)
				sTmp += static_cast<char>(c);
			++text;
		}
		UT_DEBUGMSG(("The Split Text |%s| \n", sTmp.utf8_str()));
		if (sTmp.length() > 0)
		{
			pWordToSplit = sTmp;
			UT_DEBUGMSG(("wordToSplit |%s| \n", pWordToSplit.utf8_str()));
		}
	}
	pRunToBump = pRunToBump->getPrevRun();
	UT_DEBUGMSG(("Next runToBump %p \n", pRunToBump));
}
} // closes an enclosing block that is not part of this excerpt

// modify src/text/fmt/xp/fb_LineBreaker.cpp to place hyphenation points
// split the word
if (pWordToSplit.length() > 0)
{
	pWordHyphenationResult = pBlock->_hyphenateWord(pWordToSplit.ucs4_str().ucs4_str(), 0, 0);
	int tickLeft = pLine->getAvailableWidth();
	if (pWordHyphenationResult && *pWordHyphenationResult)
	{
		gchar *c = g_ucs4_to_utf8(pWordHyphenationResult, -1, NULL, NULL, NULL);
		// max = -1 counts up to the NUL; passing NULL (0) would always return 0
		int iLen = static_cast<int>(g_utf8_strlen(c, -1));
		g_free(c);
		// search from the end for the right-most hyphen
		// NOTE: index is a character position, tickLeft a width in layout units
		for (int index = iLen - 1; index >= 0; --index)
		{
			if (pWordHyphenationResult[index] == '-' && index < tickLeft)
			{
				pBreakPoint = index;
				fp_TextRun *textout = static_cast<fp_TextRun *>(pRunToSplit);
				textout->split(pBreakPoint);
				break; // split only once, at the right-most suitable point
			}
		}
	}
}
```

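One subtlety in the loop above is that index counts positions in the hyphenated string, which contains the inserted '-' characters, while the text run is split at an offset in the original word. The hypothetical helper below shows one way to pick the right-most break point that fits and convert it into an offset in the unhyphenated word. It assumes, purely for illustration, that every character including the hyphen is one unit wide; real layout code would measure widths in layout units instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: picks the right-most hyphen in a hyphenated word
 * ("hy-phen-ation") such that the kept part plus the hyphen fits in `avail`
 * units, and returns how many characters of the ORIGINAL (unhyphenated) word
 * stay on the line, i.e. the split offset for the run. Returns -1 if no
 * break point fits. Simplification: every character, including the hyphen,
 * is one unit wide. */
int choose_break(const char *hyphenated, int avail)
{
    assert(hyphenated != NULL);
    int best = -1;
    int letters = 0;                    /* original characters seen so far */
    for (const char *p = hyphenated; *p != '\0'; p++) {
        if (*p != '-') {
            letters++;
        } else if (letters > 0 && letters + 1 <= avail) {
            best = letters;             /* later, longer fits replace earlier ones */
        }
    }
    return best;
}
```

For example, with "hy-phen-ation" and room for 8 units the helper keeps 6 original characters ("hyphen" plus the hyphen), and with room for only 2 units it returns -1, meaning the whole word moves to the next line.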
h2. 3 Simple implementation of Chinese spell-checking in Enchant

After GSoC 2011, I would like to add Chinese spell-checking to Enchant. Chinese spell-checking is also an important issue for a word processor. I found some libraries that support it, but because time was limited I only built a simple framework.

The main function:

h2. 4 Code refactoring and debugging

I have finished refactoring the code in both Enchant and AbiWord. The refactoring work covered:

1. Cleaning up some ugly code.
2. Handling exceptions.

h2. 5 User interface to manage hyphenation

This is in progress: the user can enable or disable the hyphenation function in the graphical user interface (GUI).

* I have finished the GUI on Windows, Linux, and Cocoa.
* Most languages have been translated for internationalization.

Taking the Windows GUI as an example, the user checks or unchecks a checkbox to enable or disable hyphenation.

Linux and Cocoa need more testing.

h2. 6 How to support more languages

As mentioned before, we use Enchant to support more languages, so there are five backends that can provide them. ISpell and MySpell are taken as examples here.

The folder "abiword\msvc2008\Debug\" contains two folders for hyphenation, ISpell and MySpell, and each of them has a folder for its dictionaries.

h3. 6.1 How to support more languages in ISpell

Inside the ISpell folder you will find a language folder. Copy your language's hyphenation dictionary into it, and AbiWord will support hyphenation for that language.

Currently we support de, en, es, and fr.

h3. 6.2 How to support more languages in MySpell

As with ISpell, to support more languages in MySpell, add the language's hyphenation dictionary to the MySpell folder.

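The MySpell backend in section 1.4 builds the dictionary file name as hyph_<tag>.dic. When adding languages it can help to also accept a dictionary named after the base language only. The sketch below builds both candidate paths, most specific first; the helper name hyphenation_dict_candidates, the directory argument, the '/' separator, and the base-language fallback itself are assumptions for illustration, not existing behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define MAX_DICT_PATH 256

/* Hypothetical helper: writes the dictionary paths to try for a tag, most
 * specific first, e.g. "<dir>/hyph_de_DE.dic" then "<dir>/hyph_de.dic".
 * The hyph_<tag>.dic naming follows the MySpell backend; the fallback to the
 * base language is an assumption. Returns the number of paths written (0-2). */
int hyphenation_dict_candidates(const char *dir, const char *tag,
                                char out[2][MAX_DICT_PATH])
{
    assert(dir != NULL && tag != NULL);
    int n = 0;
    int w = snprintf(out[n], MAX_DICT_PATH, "%s/hyph_%s.dic", dir, tag);
    if (w < 0 || w >= MAX_DICT_PATH)
        return 0;                       /* path too long */
    n++;
    const char *sep = strchr(tag, '_');
    if (sep != NULL && sep != tag) {    /* "de_DE" -> base language "de" */
        int base_len = (int) (sep - tag);
        w = snprintf(out[n], MAX_DICT_PATH, "%s/hyph_%.*s.dic", dir, base_len, tag);
        if (w >= 0 && w < MAX_DICT_PATH)
            n++;
    }
    return n;
}
```

A caller would try each path in order with hnj_hyphen_load and stop at the first one that loads.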
h2. 7 How to extend the Enchant functionality

I have read much of the Enchant code, and I think Enchant is a very useful framework for dictionary-based functions such as spell-checking and hyphenation. To extend Enchant with a new function, do the following:

*1. Add the new function pointer to EnchantDict first, for example:*

```c
char *(*hyphenate) (struct str_enchant_dict *me,
                    const char *const word, size_t len,
                    size_t *out_n_suggs);
```

*2. Implement the function in the backend:*

```cpp
static char *
ispell_dict_hyphenate (EnchantDict *me, const char *const word,
                       size_t len, size_t *out_n_suggs)
{
	ISpellChecker *checker;

	checker = (ISpellChecker *) me->user_data;
	return checker->hyphenate (word, len, out_n_suggs);
}
```

*3. Connect the implementation to the dictionary:*

```cpp
/* in the ISpell backend */
dict->hyphenate = ispell_dict_hyphenate;

/* in the Zemberek backend */
dict->hyphenate = zemberek_dict_hyphenate;

/* HSpell: not connected, since Hebrew does not need hyphenation (see 1.2) */
```

Revision as of 17:44, 20 August 2011
