Subject: Re: String encoding questions
From: Dom Lachowicz (dominicl@seas.upenn.edu)
Date: Wed Aug 22 2001 - 15:58:56 CDT
Quoting David Mandelin <mandelin@cs.wisc.edu>:
> 1. When I see a string variable of type 'char *' in AbiWord, how do I
> tell if it is ASCII, native, or something else?
>
> 2. What is the Right Way to convert a string from ASCII to Unicode (or
> native to Unicode) in AbiWord? I found 4 ways: (1) UT_Mbtowc, (2)
> UT_iconv, (3) UT_convert, and (4) XAP_EncodingManager::nativeToU.
> Methods (1) and (4) convert a character at a time.
>
> 3. What exactly is UT_convert supposed to do? I ask because the comment
> above it is incorrect, it doesn't quite work if you are converting to a
> wide-char encoding, and it doesn't seem to be used anywhere.
Hello,
It really depends on what you want to do. It's safe to say that any char *
string that you see floating around in Abi isn't Unicode. We have 2 string
classes that I suggest you use if you want better string handling:
UT_String
UT_UCS2String
Now, for converting things, I really don't recommend ::nativeToU or UT_Mbtowc
at all. I would use UT_iconv or UT_convert.
UT_convert is a wrapper around iconv because of how much I hate iconv.
UT_convert does a 1-shot conversion between charsets, and I'm pretty sure that
it works (the original code was based on working code found in GLIB, then Mike
Nordell and I rewrote it, and Frodo made sure that it worked for use in our
Pspell spell-checking driver).
I highly recommend UT_convert if you just need a 1-shot conversion and don't
need to keep an iconv_t handle around. Otherwise, I recommend UT_iconv. Do not
call iconv directly - use our wrapper functions.
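To illustrate what "1-shot" means here, this is a minimal sketch of a wrapper
in that style, written directly against POSIX iconv. The name
one_shot_convert and its signature are hypothetical; this is not AbiWord's
UT_convert, and inside AbiWord the wrapper functions are still the thing to
call. It assumes a stateless target encoding (no final flush call is made).

```cpp
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stdlib.h>
#include <string.h>

// Hypothetical helper (NOT AbiWord's UT_convert): open a converter, convert
// the whole buffer, close the converter. Returns a malloc'd buffer that the
// caller must free, or NULL on failure. *bytes_written (if non-NULL) gets
// the output length in bytes, not counting the terminator.
char *one_shot_convert(const char *str, size_t len,
                       const char *from_codeset, const char *to_codeset,
                       size_t *bytes_written)
{
    assert(str && from_codeset && to_codeset);

    iconv_t cd = iconv_open(to_codeset, from_codeset);
    if (cd == (iconv_t)-1)
        return NULL;                      // unsupported charset pair

    size_t cap = len * 4 + 4;             // initial guess, grown on E2BIG
    char *out = (char *)malloc(cap + 4);  // +4 leaves room for a terminator
    if (!out) {
        iconv_close(cd);
        return NULL;
    }

    char *in = (char *)str;               // iconv wants char**, input is not modified
    size_t in_left = len;
    char *outp = out;
    size_t out_left = cap;

    while (in_left > 0) {
        if (iconv(cd, &in, &in_left, &outp, &out_left) != (size_t)-1)
            break;                        // all input consumed
        if (errno != E2BIG) {             // invalid or truncated input
            free(out);
            iconv_close(cd);
            return NULL;
        }
        // Output buffer full: double it and continue where we stopped.
        size_t used = (size_t)(outp - out);
        cap *= 2;
        char *bigger = (char *)realloc(out, cap + 4);
        if (!bigger) {
            free(out);
            iconv_close(cd);
            return NULL;
        }
        out = bigger;
        outp = out + used;
        out_left = cap - used;
    }
    iconv_close(cd);

    // Four zero bytes, so the result is terminated even when the target is
    // a 2-byte or 4-byte encoding.
    memset(outp, 0, 4);
    if (bytes_written)
        *bytes_written = (size_t)(outp - out);
    return out;
}
```

A caller would do something like one_shot_convert("abc", 3, "ASCII",
"UTF-16LE", &n) and free() the result. Writing several zero bytes at the end
is one way to keep wide-char output properly terminated; a single '\0' is not
enough for a 2-byte or 4-byte encoding.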
Also, I need to re-integrate parts of a large patch from Andrew Dunbar that
I had lying around which dealt with UCS-2 strings and character encodings. I'm
afraid that I'll have to do this by hand, though ;-(
Dom