Subject: Re: how should we localize locale names?
From: Paul Rohr (paul@abisource.com)
Date: Thu Mar 08 2001 - 13:14:08 CST
At 08:40 PM 3/8/01 +1100, Tim Allen wrote:
>You beat me to it, Paul, as soon as I saw the dialog screenshot I started
>wondering about exactly this.
I couldn't wait for the screenshot, so I peeked at the code.  Otherwise, I 
would have been glad to let you beat me to it.  I'm sending *entirely* too 
much mail these days.  ;-)
>The general idea is definitely good. But like Dom, I have slight
>reservations about adding an extra wrinkle to the way our string sets get
>used. This would imply that every language string set would be at least
>partially loaded. 
Agreed.  However, if we're ever going to transition to a world where *all* 
locale-specific stuff gets loaded dynamically at run-time -- ie toolbars and 
menus, too -- then something like this is unavoidable.  
For this dialog to Just Work in that kind of situation, you need to do a 
dynamic scan of some directory or index at run-time to see what locales you 
have available.  
All I was suggesting was that we augment the header of the strings files so 
we could quickly get in, grab this string, and get out.  (For details, see 
below.)  We certainly wouldn't want to keep the whole @#$%^&* thing in 
memory.  Gack.  
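To make that concrete, here's a rough sketch of the run-time scan I have in mind. All the names are made up (the directory, the ".strings" extension, the function), and I'm using plain POSIX readdir() for illustration rather than whatever cross-platform wrapper we'd really use:
-- snip --
// Rough sketch, not real AbiWord code: enumerate the installed strings
// files so the dialog knows which locales are available at run-time.
// The directory layout and ".strings" extension are assumptions.
#include <dirent.h>
#include <string>
#include <vector>

std::vector<std::string> scanAvailableLocales(const char * stringsDir)
{
    std::vector<std::string> locales;
    DIR * dir = opendir(stringsDir);
    if (!dir)
        return locales;                      // no strings directory installed

    struct dirent * entry;
    while ((entry = readdir(dir)) != NULL)
    {
        std::string name(entry->d_name);
        const std::string ext = ".strings";
        if (name.size() > ext.size() &&
            name.compare(name.size() - ext.size(), ext.size(), ext) == 0)
        {
            // "en-US.strings" -> "en-US"
            locales.push_back(name.substr(0, name.size() - ext.size()));
        }
    }
    closedir(dir);
    return locales;
}
-- snip --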
>I still like the idea of moving to gettext at some point
>in the medium-term future; I actually would like to implement that myself,
>except for minor impediments like work, moving house, a wife and son etc
>etc etc :-). 
Ooooh.  You might be the first person who's expressed interest in actually 
*doing* the work to fix gettext for other platforms, instead of just 
complaining about it.  Be careful -- this could make you very very popular.  
;-)
However, you might also want to look at what Eazel's been doing with XML 
i18n tools.  I have *no* idea what that's for, but if they're moving away 
from gettext, that might be worth paying attention to. 
>Adding more cruft to the existing model would seem to make
>such a transition more difficult. Is there some other paradigm we can use?
>We certainly don't want to have to do anything as silly as temporarily
>switching locales to resolve the language name, then switching back.
>
>Maybe we want a list of non-translatable strings somewhere, defined in
>such a way that it's very easy for a new translator to add the name of the
>new language to the list.
You could certainly do that.  A static index would be faster to parse than 
scanning a directory at run-time to extract the same information from 
whatever translations are available.  Of course, it could also be out of 
sync, but it *is* faster.  
Unless someone is willing to do the up-front work to generate a global 
lookup list of UTF-8 locale names, each written in its own language, I assume 
we'd want each translator to add their own entry.  
Hardwiring this list into the binary would be fairly translator-hostile, so 
I expect you'd want to add another installable file to the binary distro.  
If so, here's a minimal XML proposal:
-- snip --
<?xml version="1.0"?>
<!-- some comment explaining how to get UTF-8 characters into this file -->
<locales
   en-US="English -- United States"
   fr-FR="Français -- France"
   de-DE="Deutsch -- Deutschland"
   ...
/>
-- snip --
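Parsing that at startup would be cheap, since the whole file is one element and its attributes. Here's a sketch of what that might look like using expat (everything here -- the file name, the handler, the program structure -- is hypothetical):
-- snip --
// Rough sketch, not real code: pull the tag -> display-name pairs out of
// the proposed <locales> file.  The file name is an assumption.
#include <expat.h>
#include <cstdio>
#include <cstring>

static void startElement(void * userData, const XML_Char * name,
                         const XML_Char ** atts)
{
    (void) userData;
    if (strcmp(name, "locales") != 0)
        return;
    // every attribute is a lang-TAG="Display Name" pair
    for (int i = 0; atts[i]; i += 2)
        printf("%s -> %s\n", atts[i], atts[i + 1]);
}

int main(void)
{
    FILE * fp = fopen("locales.xml", "rb");   // assumed file name
    if (!fp)
        return 1;

    XML_Parser parser = XML_ParserCreate("UTF-8");
    XML_SetElementHandler(parser, startElement, NULL);

    char buf[4096];
    size_t len;
    while ((len = fread(buf, 1, sizeof(buf), fp)) > 0)
        XML_Parse(parser, buf, (int) len, 0);
    XML_Parse(parser, "", 0, 1);              // signal end of document

    XML_ParserFree(parser);
    fclose(fp);
    return 0;
}
-- snip --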
I still think that augmenting the existing strings files to add one more 
attribute as follows is simpler and more reliable:
-- snip --
<AbiStrings app="AbiWord" ver="1.0" language="de-DE" label="Deutsch -- Deutschland">
-- snip --
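The "get in, grab this string, and get out" part stays cheap, too, because the label sits in the attributes of the very first element, so you never have to read past the header. Something like this (hypothetical names, and dumb string matching instead of a real parse, just to show the idea):
-- snip --
// Rough sketch, not real code: read only the head of a strings file and
// pull out the proposed label="..." attribute without parsing the rest.
#include <cstdio>
#include <string>

std::string getLocaleLabel(const char * stringsFile)
{
    FILE * fp = fopen(stringsFile, "rb");
    if (!fp)
        return "";

    char head[1024];
    size_t len = fread(head, 1, sizeof(head) - 1, fp);
    fclose(fp);
    head[len] = '\0';

    std::string s(head);
    std::string::size_type start = s.find("label=\"");
    if (start == std::string::npos)
        return "";                            // old-style file, no label yet
    start += 7;                               // length of label="
    std::string::size_type end = s.find('"', start);
    if (end == std::string::npos)
        return "";
    return s.substr(start, end - start);      // UTF-8 display name
}
-- snip --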
However, I've anted up for a lot more than my $.02 on this, so I'll be quiet 
now.  
>Dom's other point is also sensible, in that you may not have any fonts on
>your system capable of displaying the names of, eg Nihon go and Mandarin,
>in their native character sets. I suppose the previous argument holds, ie
>if you don't have the right fonts then chances are you're not planning to
>use them in your document. 
Yeah.  That's the trickery I was worried about.  
To be clear, though: for this user in this situation, we really *don't* 
support those languages in any meaningful way.  My first reaction would be 
to just display the naked lang tag -- ie, (zh-TW) -- to indicate that either:
  - we don't have an appropriate localization to identify it, or
  - it's not currently usable (due to fonts issues, say).
We'd need to do something like this anyway for users who receive a document 
that got tagged with a locale that they don't have installed.  
Over the long term, this feels like the Right Thing to do.  If they want 
zh-TW support, but they don't have it installed, they need to go get an 
appropriate zh-TW "language pack" which might include some or all of the 
following:
  - string-like stuff to localize the UI
  - fonts to view the content
  - dictionaries, etc. to clean up the content
  - other locale-specific defaults
  - help in that language
  - etc. 
We don't have such a solution now, but I think people generally agree that 
we're likely to head in this direction.  Eventually.  
>But it would be nice to show off the languages
>we support, and ugly if the language choice dialog shows random gibberish.
Oh, it's definitely ugly, but at least it's clear that their text would look 
like that too. ;-)
I have to admit that I've briefly flirted with the idea of creating and 
shipping a single Unicode font with just enough codepoints to render the 
text in this dialog.  Talk about hacks!  
Of course, then users would wind up with the expectation that *choosing* 
that language would also work, when it wouldn't.  So much for that idea.  
Violating expectations you've gone to the trouble of setting is a great way 
to piss people off. 
>To do that we need not language-localised language names, but
>character-set localised names (in practice, Romanised names would do, I
>think, as in eg "Nihon go" for Japanese). And then some way of detecting
>that we can't display the native names, and using the romanised names
>instead.
This sounds like another promising alternative (for people who don't like 
the naked tag approach described above).  Can each of our platform font APIs 
tell us when a string will get rendered using slugs or garbage instead of 
"real" glyphs?
>This doesn't seem to fit all that nicely with the existing localisation
>paradigm, nor with any likely paradigm that would be supported by gettext.
>Pity. More thought required, I think.
I don't know about gettext, but for the current paradigm, you could just as 
easily add two labels to the strings file:
  language="ja-JP" label="#$^#^#^"  romanized="Nihon go"
... where the first looks like the correct line noise (sorry for the bad 
impersonation), and the second is the Latin-1 romanized equivalent.  
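With that in place, the dialog's display logic is just a cascade: the native label if we can actually render it, the romanized label if we have one, and the naked tag from above as the last resort. A sketch (all names made up; canRenderNatively() stands in for whatever per-platform glyph check we end up with):
-- snip --
// Rough sketch of the display-name cascade for the language dialog.
// Everything here is hypothetical; canRenderNatively() stands in for a
// per-platform glyph-coverage check.
#include <string>

struct LocaleInfo
{
    std::string tag;         // e.g. "ja-JP"
    std::string label;       // native-script name from the strings file
    std::string romanized;   // Latin-1 fallback, e.g. "Nihon go"
};

bool canRenderNatively(const std::string & utf8Label);    // per-platform

std::string displayName(const LocaleInfo & loc)
{
    if (!loc.label.empty() && canRenderNatively(loc.label))
        return loc.label;                     // the real native name
    if (!loc.romanized.empty())
        return loc.romanized;                 // e.g. "Nihon go"
    return "(" + loc.tag + ")";               // naked tag, e.g. "(zh-TW)"
}
-- snip --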
>Dreamer? I thought your points seemed reasonably down-to-earth and
>pragmatic :-).
Why thank you.  I try.  :-)
Paul