Re: urmaslt - r29620 - abiword/trunk/plugins/mswrite/xp

From: Urmas <>
Date: Sat Jul 16 2011 - 14:53:27 CEST

From: "Ingo Brückl" <>
Sent: Saturday, July 16, 2011 5:26 PM
To: <>
Subject: Re: urmaslt - r29620 - abiword/trunk/plugins/mswrite/xp

>> - Some documents [mostly from Windows NT 3] are in local codepage, but
>> use plain font names, as 'Times New Roman' instead of tagged 'Times New
>> Roman Cyr'. So it's safe to assume that if document has untagged name,
>> but no tagged, that font has local encoding instead of CP1252.
> No it isn't!
> I had some problems with the codepage patch from the very beginning, because
> there is no such thing as a codepage information in MSWrite files! (See the
> Microsoft font documentation.) CP1252 is default unless a font substitute is
> used. It may be true that there are documents in "Windows local codepage"
> without font substitute, but this is just the usual Windows crap. When such
> documents are shared between two users with different locales, then the same
> thing happens as it did in Windows all the time: The contents created on one
> computer is unreadable on the other one unless the encoding is explicitly
> changed.

For Windows 3, that's not strictly true. The non-Unicode TTF fonts used there identify their codepage precisely, because they cannot contain any other symbols.

On Windows NT or 9x, however, it is true: Write there does not allow selecting the tagged typefaces that imitate the old TTF fonts, so the plain name is stored, yet the text is displayed in the local codepage only.

In that case our plugin should respect the user's configuration and use the user's local codepage to convert the document.

> As I mentioned in the original codepage thread, the only reasonable solution
> would be to have either a compile time parameter for the default codepage
> (should be fairly easy to do: something like --with-mswrite-codepage= with
> default CP1252, leading to something like -DDEFAULT_CODEPAGE=\"CP1252\" in
> the Makefile and a simple source change):

Most users do not run Gentoo Linux, and we don't ship localized versions separately. Specifying the codepage at compile time is therefore unacceptable. A simple heuristic like the one currently implemented works correctly for most documents; a hardcoded 1252 does not.
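
To make the heuristic concrete, here is a minimal sketch of the rule described above. It is not the plugin's actual code. The helper names (tagged_codepage, guess_codepage) and the tag table are hypothetical. The " Cyr" tag comes from the example above, and the " CE" tag for CP1250 is my assumption. I also read "untagged name, but no tagged" as "no tagged font anywhere in the document", which may be stricter or looser than what is committed.

```cpp
#include <string>
#include <vector>

// Hypothetical tag table: font-name suffix -> codepage.
// " Cyr" is taken from the thread; " CE" is an assumption.
struct FontTag { const char *suffix; const char *codepage; };

static const FontTag kFontTags[] = {
    {" Cyr", "CP1251"},
    {" CE",  "CP1250"},
};

static bool ends_with(const std::string &s, const std::string &suffix)
{
    return s.size() >= suffix.size() &&
           s.compare(s.size() - suffix.size(), suffix.size(), suffix) == 0;
}

// Returns the codepage implied by a tagged font name,
// or an empty string if the name carries no known tag.
std::string tagged_codepage(const std::string &font_name)
{
    for (const FontTag &t : kFontTags)
        if (ends_with(font_name, t.suffix))
            return t.codepage;
    return "";
}

// Guess the codepage for one font of a document.
//  - tagged name: use the codepage of the tag;
//  - untagged name, and the document has tagged fonts too: CP1252;
//  - untagged name, and no tagged fonts at all: the user's local codepage.
std::string guess_codepage(const std::string &font_name,
                           const std::vector<std::string> &all_fonts,
                           const std::string &local_codepage)
{
    const std::string cp = tagged_codepage(font_name);
    if (!cp.empty())
        return cp;

    for (const std::string &f : all_fonts)
        if (!tagged_codepage(f).empty())
            return "CP1252";

    return local_codepage;
}
```

With a local codepage of CP1251, a document that only uses plain 'Times New Roman' would be converted as CP1251, while the same font name in a document that also contains 'Arial Cyr' would stay CP1252.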

> inline void IE_Imp_MSWrite::translate_char (char ch, UT_UCS4String &buf)
> or - preferable - a configuration dialog in AbiWord to set the favored
> default codepage for the plugin so that it can be changed on a document basis.

It's a wonderful solution. However, making 1252 the default codepage is apparently suitable for Germans, but definitely not appropriate for the rest of the world. We cannot turn documents that are readable and printable in actual Write for some user into an unreadable mess just because selected European cultures find that more convenient.

>> - If there's an authentic Write document, using codepage different from
>> 1252, 1251 or 1250, with corresponding standard font typefaces, we can
>> include additional tags in the list, but no sooner.

MS Write files are produced by a single application, MS Write itself, so it would be useful to stick to what it actually produced. Do you have actual samples using those names, not just Windows 95 documentation?

> ...And BTW, did you do the major
> work in this plugin so that you can claim to decide what shall go into the
> plugin and what not?


Received on Sat Jul 16 14:54:30 2011
