Re: commit (HEAD): IMPORTANT - 32-bit UT_UCSChar

From: Andrew Dunbar (hippietrail@yahoo.com)
Date: Mon May 06 2002 - 10:45:23 EDT


    --- Hubert Figuiere <hfiguiere@teaser.fr> wrote:
    > On Sun, 2002-05-05 at 17:56, Tomas Frydrych wrote:
    > >
    > > I have committed the changes toward 32-bit internal representation
    > > of Unicode and removed the lock from the src directory. These
    > > changes cover only the main module XP, win32 and gtk code and
    > > the wordperfect importer. I will leave the other platforms and
    > > plugins for others to do, see the notes below.
    > >
    >
    > So from now on, I'll no longer back port stuff to STABLE branch...
    > But the reverse, I will to some extent.

    Just keep it stable (:

    > > Summary of the changes
    > > -------------------------------------
    > > There are three new types now: UT_UCS4Char, UT_UCS2Char and
    > > UT_GrowBufElement. There is a new string class UT_UCS4String,
    > > and new sets of UT_UCS4_ and UT_UCS2_ string functions
    > > replacing the UT_UCS_ functions. All internal Unicode processing
    > > should be done using the UT_UCS4Char type and the UT_UCS4_
    > > functions. I have left the UT_UCSChar type in place for the time
    > > being, as an equivalent of the new UT_UCS4Char type; this is a
    > > temporary measure that is meant to make the transition easier,
    > > and once we are done we will do a global replace and remove
    > > UT_UCSChar from the ut_type.h file. Consequently, all new code
    > > should only use UT_UCS4Char.
    > >
    > > Notes on transferring the remaining code:
    > > (1) Replace any UT_UCS_ calls with UT_UCS4_ or UT_UCS2_ as
    > > appropriate; replace any UT_UCS2String instances with
    > > UT_UCS4String, where appropriate. Outside of impexp code and
    > > the input methods and platform specific text drawing calls this
    > > can be done blindly; in these special cases more care is needed.
    >
    > Is there a way to translate UCS2 to UCS4 easily? Because sometimes
    > I get UCS strings from Cocoa and have to pass them as UCS4....

    If it really is UCS-2, yes: you pad each 16-bit unit with zeros
    to 32 bits, assuming the byte order is native, of course.
    If it's actually UTF-16 labelled as UCS-2, as on Windows, then
    surrogate pairs will get broken and bugs that are very
    difficult to track down will appear.
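
    To make the difference concrete, here is a rough sketch of both
    cases. Ucs2Char and Ucs4Char are hypothetical stand-ins for
    UT_UCS2Char and UT_UCS4Char, and neither function is existing
    AbiWord API. Replacing a lone surrogate with U+FFFD is one
    possible policy, not the only one.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for UT_UCS2Char / UT_UCS4Char (assumption).
typedef std::uint16_t Ucs2Char;
typedef std::uint32_t Ucs4Char;

// True UCS-2 in native byte order: every unit is a complete code
// point, so widening is just zero-extension to 32 bits.
std::vector<Ucs4Char> widenUcs2(const Ucs2Char* src, std::size_t len) {
    std::vector<Ucs4Char> out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i)
        out.push_back(static_cast<Ucs4Char>(src[i]));
    return out;
}

// UTF-16 in native byte order: a high surrogate followed by a low
// surrogate encodes one code point above U+FFFF. Unpaired surrogates
// are replaced with U+FFFD here.
std::vector<Ucs4Char> decodeUtf16(const Ucs2Char* src, std::size_t len) {
    std::vector<Ucs4Char> out;
    out.reserve(len);
    for (std::size_t i = 0; i < len; ++i) {
        Ucs4Char c = src[i];
        if (c >= 0xD800 && c <= 0xDBFF) {            // high surrogate
            if (i + 1 < len && src[i + 1] >= 0xDC00 && src[i + 1] <= 0xDFFF) {
                c = 0x10000 + ((c - 0xD800) << 10) + (src[i + 1] - 0xDC00);
                ++i;                                 // consumed the low half
            } else {
                c = 0xFFFD;                          // unpaired high surrogate
            }
        } else if (c >= 0xDC00 && c <= 0xDFFF) {
            c = 0xFFFD;                              // unpaired low surrogate
        }
        out.push_back(c);
    }
    return out;
}
```

    Feeding UTF-16 to widenUcs2 leaves the two surrogate halves as
    two separate bogus "characters", which is exactly the breakage
    described above.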

    The other easy way is to just use iconv. There should
    be functions to convert from a UCS2String to a
    UCS4String and vice versa. If there aren't, they are
    pretty easy to implement using iconv. It might
    require memory allocation though, so you'll have to
    be prepared to handle the case of it failing.
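
    Roughly what such a helper could look like with iconv. This is a
    sketch only: utf16ToUcs4 is a hypothetical name, not an existing
    function, and it assumes the platform's iconv accepts the
    encoding names "UTF-16LE"/"UTF-16BE" and "UCS-4LE"/"UCS-4BE"
    (glibc and GNU libiconv do). Some older platforms declare the
    second iconv argument as const char**, so the cast may need
    adjusting there.

```cpp
#include <iconv.h>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <vector>

// Detect host byte order so explicit encoding names can be used;
// explicit LE/BE names avoid any byte-order mark in the output.
static bool hostIsLittleEndian() {
    const std::uint16_t one = 1;
    return *reinterpret_cast<const unsigned char*>(&one) == 1;
}

// Hypothetical helper: converts native-endian UTF-16 to UCS-4.
// Throws std::runtime_error on conversion failure; the vector
// allocation itself can throw std::bad_alloc.
std::vector<std::uint32_t> utf16ToUcs4(const std::uint16_t* src, std::size_t len) {
    if (len == 0)
        return std::vector<std::uint32_t>();

    const bool le = hostIsLittleEndian();
    iconv_t cd = iconv_open(le ? "UCS-4LE" : "UCS-4BE",
                            le ? "UTF-16LE" : "UTF-16BE");
    if (cd == reinterpret_cast<iconv_t>(-1))
        throw std::runtime_error("iconv_open failed");

    // Each UTF-16 unit yields at most one code point, so len is enough.
    std::vector<std::uint32_t> out(len);
    char* in = reinterpret_cast<char*>(const_cast<std::uint16_t*>(src));
    std::size_t inLeft = len * sizeof(std::uint16_t);
    char* outPtr = reinterpret_cast<char*>(out.data());
    std::size_t outLeft = out.size() * sizeof(std::uint32_t);

    const std::size_t rc = iconv(cd, &in, &inLeft, &outPtr, &outLeft);
    iconv_close(cd);
    if (rc == static_cast<std::size_t>(-1))
        throw std::runtime_error("invalid or truncated UTF-16 input");

    // Trim the unused tail (surrogate pairs collapse two units into one).
    out.resize(out.size() - outLeft / sizeof(std::uint32_t));
    return out;
}
```

    The caller has to be ready for both failure paths: the exception
    from the conversion and the one from the allocation.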

    Oh, and find out if Cocoa is really using UCS-2 or
    UTF-16!

    Andrew Dunbar.

    > Hub

    =====
    http://linguaphile.sourceforge.net http://www.abisource.com




    This archive was generated by hypermail 2.1.4 : Mon May 06 2002 - 10:48:00 EDT