From: F J Franklin (F.J.Franklin@sheffield.ac.uk)
Date: Wed May 08 2002 - 10:34:21 EDT
> > Support is there but incomplete. Byte sequences
> > longer than 3 bytes will cause
> > problems, and there isn't a UTF-8 -> UCS-4
> > conversion yet.
> Sorry to keep whining about this but it was all in my lost huge Unicode
> patch over a year ago. UTF-8 sequences can be up to 6 bytes long. We
> should probably leave it up to iconv anyway since we have to handle
> things like overlong sequences, illegal sequences etc. iconv should
> handle this. I think my implementation used the ByteBuf class so that
> it could handle UCS-2 and UCS-4 properly without worrying about all
> those null bytes looking like string terminators and stuff.
Andrew, Andrew, I know. The reason why only 3-byte sequences are handled
is that the routine was written to convert Abi's internal UCS-2. Now that
Abi uses UCS-4 internally I'll add the code to handle 6-byte sequences.
In general I support the use of iconv for conversion between encodings,
but conversion between validated UTF-8 and UCS-4 is trivial and the
[UT_]UTF8String class was designed to handle the conversion without
resorting to iconv.
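
To illustrate why the validated case is simple, here is a minimal sketch of a UTF-8 -> UCS-4 decoder that accepts sequences of up to 6 bytes (the RFC 2279 layout; the later RFC 3629 restricts UTF-8 to 4 bytes). The names utf8_sequence_length and utf8_to_ucs4 are hypothetical and are not the [UT_]UTF8String interface. The sketch checks structure only (lead byte, length, continuation bytes); it does not reject overlong forms or surrogate code points, so unvalidated input would still need a separate check or iconv.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical helper: length of the sequence introduced by lead byte `b`.
// Returns 0 for a continuation byte or an invalid lead byte (0xFE, 0xFF).
std::size_t utf8_sequence_length(unsigned char b)
{
    if (b < 0x80)           return 1; // 0xxxxxxx
    if ((b & 0xE0) == 0xC0) return 2; // 110xxxxx
    if ((b & 0xF0) == 0xE0) return 3; // 1110xxxx
    if ((b & 0xF8) == 0xF0) return 4; // 11110xxx
    if ((b & 0xFC) == 0xF8) return 5; // 111110xx
    if ((b & 0xFE) == 0xFC) return 6; // 1111110x
    return 0;
}

// Hypothetical helper: append the UCS-4 code points encoded in `in` to `out`.
// Returns false on a bad lead byte, a truncated sequence, or a byte that is
// not a continuation byte where one is expected. Overlong forms are NOT
// detected; the input is assumed to be validated already.
bool utf8_to_ucs4(const std::string& in, std::vector<std::uint32_t>& out)
{
    std::size_t i = 0;
    while (i < in.size()) {
        const unsigned char lead = static_cast<unsigned char>(in[i]);
        const std::size_t len = utf8_sequence_length(lead);
        if (len == 0 || i + len > in.size())
            return false;

        // A lead byte of an n-byte sequence (n >= 2) carries 7 - n payload bits.
        std::uint32_t cp = (len == 1) ? lead : (lead & (0x7Fu >> len));

        // Each continuation byte (10xxxxxx) contributes 6 more bits.
        for (std::size_t k = 1; k < len; ++k) {
            const unsigned char c = static_cast<unsigned char>(in[i + k]);
            if ((c & 0xC0) != 0x80)
                return false;
            cp = (cp << 6) | (c & 0x3Fu);
        }

        out.push_back(cp);
        i += len;
    }
    return true;
}
```

Because the output is a vector of 32-bit values rather than a byte string, embedded zero bytes in UCS-4 cannot be mistaken for string terminators.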
ps. BTW, do you know anything about the overheads of using various iconv
implementations? or their thread-safety, for that matter? (Genuinely
Francis James Franklin
"No, she really likes me. She told me I look like Britney Spears, and why
would you say that to somebody you don't like?"
--- Elle Woods