Subject: font for quotes (was Re: Some problems with the word importer...)
From: Paul Rohr (firstname.lastname@example.org)
Date: Tue Feb 22 2000 - 15:20:05 CST
This is essentially a font problem. The canonical Unicode encodings for
various quote characters include:
#define UCS_LQUOTE ((UT_UCSChar)0x2018)
#define UCS_RQUOTE ((UT_UCSChar)0x2019)
#define UCS_LDBLQUOTE ((UT_UCSChar)0x201c)
#define UCS_RDBLQUOTE ((UT_UCSChar)0x201d)
We're currently importing these as-is from Word, which is the correct
behavior. These are *very* common characters, so on platforms which have
Unicode-aware fonts, everything Just Works.
The trouble starts when using fonts which do *not* have entries at a
given codepoint. In this situation, depending on your platform, the font
renderer will draw a slug, whitespace, or something else for each unmatched
character.
I'm not totally up to speed on the state of the Unix font-handling code, but
I'm pretty sure that most (or all) of the Unix fonts we're using do *not*
have entries at these codepoints, which would explain the behavior you're
seeing.
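To make the failure mode concrete, here is a minimal sketch of the check a renderer effectively performs. It assumes a font whose charset is ISO-8859-1 (for example, an X font whose name ends in `iso8859-1`); such a font can only have entries for U+0000 through U+00FF, so all four quote codepoints above fall outside it. The `CodeRange`/`FontCoverage` types and `font_has_glyph()` are hypothetical illustrations, not AbiWord code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical code point type (not AbiWord's UT_UCSChar). */
typedef uint32_t CodePoint;

/* Hypothetical coverage description: inclusive ranges of code points
   at which the font has an entry. */
typedef struct {
    CodePoint first;
    CodePoint last;
} CodeRange;

typedef struct {
    const char *name;
    const CodeRange *ranges;
    size_t nranges;
} FontCoverage;

/* Returns 1 if the font has an entry at cp, 0 otherwise. */
static int font_has_glyph(const FontCoverage *font, CodePoint cp)
{
    assert(font != NULL);
    for (size_t i = 0; i < font->nranges; i++) {
        if (cp >= font->ranges[i].first && cp <= font->ranges[i].last)
            return 1;
    }
    return 0;
}

/* An ISO-8859-1 font can cover at most U+0000..U+00FF. */
static const CodeRange latin1_ranges[] = { { 0x0000, 0x00FF } };
static const FontCoverage latin1_font = { "example-iso8859-1", latin1_ranges, 1 };

/* What ends up on screen: the glyph, or a slug for an unmatched character. */
static const char *render_outcome(const FontCoverage *font, CodePoint cp)
{
    return font_has_glyph(font, cp) ? "glyph" : "slug";
}
```

With this font, an ASCII apostrophe (U+0027) renders normally, while `UCS_LDBLQUOTE` (U+201C) has no entry and comes out as a slug.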
There are two categories of valid solutions to this kind of problem:
1. Fix the fonts.
Over the long term, this is clearly the ideal solution. It would be a
*very* useful project to remap existing fonts so the glyphs are indexed to
the correct Unicode codepoints (instead of whatever charset they're
currently encoded in).
Unlike attempts to create *new* Unicode fonts, there's no need for
typographic skills here. Basically, you'd just need code which knew how to:
- open up a font file,
- recognize the current charset / encoding,
- remap the index for each code point (probably using libiconv),
- and then save the "Unicoded" font back out to a new file,
- being sure to update the charset indicator for that new font. ;-)
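The remap step in the list above could look roughly like the following sketch, which uses the POSIX `iconv()` interface (built into glibc, or provided by GNU libiconv elsewhere). It assumes the font's glyph table has already been read into a hypothetical 256-slot array indexed by legacy single-byte code (`glyph_for_code`), and that iconv knows the font's charset by name (e.g. `CP1252`). Opening and writing the actual font file, and updating its charset indicator, are left out; `remap_font_index()` and `UnicodeEntry` are illustrative names only.

```c
#include <assert.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>

#define LEGACY_CODES 256

/* One entry of the remapped, Unicode-indexed table. */
typedef struct {
    uint32_t codepoint; /* Unicode code point for this glyph */
    int glyph;          /* glyph id copied from the legacy slot */
} UnicodeEntry;

/* glyph_for_code[c] is the glyph stored at legacy code c, or -1 if the
   slot is empty (a hypothetical in-memory view of a single-byte font).
   Fills out[] with one entry per mappable glyph and returns the count,
   or -1 if iconv does not recognize `charset`. */
static int remap_font_index(const char *charset,
                            const int glyph_for_code[LEGACY_CODES],
                            UnicodeEntry out[LEGACY_CODES])
{
    assert(charset != NULL && glyph_for_code != NULL && out != NULL);

    /* UTF-32BE: exactly 4 bytes per character, no byte-order mark. */
    iconv_t cd = iconv_open("UTF-32BE", charset);
    if (cd == (iconv_t)-1)
        return -1;

    int n = 0;
    for (int c = 0; c < LEGACY_CODES; c++) {
        if (glyph_for_code[c] < 0)
            continue; /* empty slot: nothing to remap */

        unsigned char in_byte = (unsigned char)c;
        unsigned char buf[4];
        char *inp = (char *)&in_byte;
        char *outp = (char *)buf;
        size_t inleft = 1, outleft = sizeof buf;

        if (iconv(cd, &inp, &inleft, &outp, &outleft) == (size_t)-1 ||
            outleft != 0) {
            iconv(cd, NULL, NULL, NULL, NULL); /* reset state after a failure */
            continue; /* byte has no Unicode mapping in this charset */
        }

        out[n].codepoint = ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
                           ((uint32_t)buf[2] << 8) | (uint32_t)buf[3];
        out[n].glyph = glyph_for_code[c];
        n++;
    }

    iconv_close(cd);
    return n;
}
```

For example, with `CP1252` the glyph stored at legacy code 0x93 ends up indexed at U+201C (`UCS_LDBLQUOTE`), which is the codepoint the Word importer produces. Opening the converter once per font, rather than once per character, keeps this a one-time batch job.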
I haven't tried to do this, but I suspect the biggest obstacle here would
be intellectual-property (licensing) issues, if any.
2. Add code workarounds to let people use "broken" fonts.
In the meantime, however, people may want to investigate font-substitution
and character-substitution tricks in our text measurement and rendering
code. Essentially, you'd be recognizing cases where a Unicode character
can't be rendered in a given font, and instead using either:
- the same codepoint in a different font, or
- a different codepoint in the same font.
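The two fallbacks listed above can be combined into a single lookup. This is a minimal sketch under stated assumptions: the `Font` handle with a `has_glyph` predicate, `resolve_char()`, and the ASCII stand-ins in `ascii_substitute()` are all hypothetical, not an existing Graphics-layer API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t UcsChar; /* hypothetical code point type */

/* Hypothetical font handle: a name plus a coverage predicate. */
typedef struct {
    const char *name;
    int (*has_glyph)(UcsChar cp);
} Font;

/* Outcome for one character: which font to draw with, and which code
   point. font == NULL means nothing fits, so a slug gets drawn. */
typedef struct {
    const Font *font;
    UcsChar cp;
} Resolved;

/* Hypothetical character-substitution table: ASCII stand-ins for the
   curly quotes. Returns 0 when no substitute is known. */
static UcsChar ascii_substitute(UcsChar cp)
{
    switch (cp) {
    case 0x2018: case 0x2019: return 0x27; /* ' */
    case 0x201C: case 0x201D: return 0x22; /* " */
    default: return 0;
    }
}

static Resolved resolve_char(const Font *primary, const Font *const *fallbacks,
                             size_t nfallbacks, UcsChar cp)
{
    assert(primary != NULL);
    Resolved r = { primary, cp };

    if (primary->has_glyph(cp))
        return r; /* the normal, fast case */

    /* Option 1: the same code point in a different font. */
    for (size_t i = 0; i < nfallbacks; i++) {
        if (fallbacks[i]->has_glyph(cp)) {
            r.font = fallbacks[i];
            return r;
        }
    }

    /* Option 2: a different code point in the same font. */
    UcsChar sub = ascii_substitute(cp);
    if (sub != 0 && primary->has_glyph(sub)) {
        r.cp = sub;
        return r;
    }

    r.font = NULL; /* give up: the renderer draws a slug */
    return r;
}

/* Two hypothetical fonts for illustration. */
static int latin1_only(UcsChar cp) { return cp <= 0xFF; }
static int quotes_only(UcsChar cp) { return cp >= 0x2018 && cp <= 0x201F; }
static const Font plain_font = { "plain-latin1", latin1_only };
static const Font quote_font = { "quotes-only", quotes_only };
```

With `plain_font` as the primary font and `quote_font` as the only fallback, U+201C is drawn from `quote_font`; with no fallbacks, U+2019 degrades to an ASCII apostrophe in `plain_font`. Every miss walks the fallback list and the substitution table at measurement and drawing time, which is where the per-character cost comes from.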
If anyone's interested in going down this path, remember that the code
needed should probably be isolated either in the Graphics layer, or even in
the underlying platform APIs being called.
Note, however, that this could start turning into a heck of a lot of code.
Worse, the performance implications of doing extra work at measurement and
drawing time can be severe for a GUI-intensive app.
In essence, you'd be doing the same work as in step #1, but interactively,
instead of once per font.
This archive was generated by hypermail 2b25 : Tue Feb 22 2000 - 15:14:38 CST