Re: A Proposal (why we should have setBold(true))

Subject: Re: A Proposal (why we should have setBold(true))
From: Paul Rohr
Date: Wed May 31 2000 - 22:52:19 CDT

At 10:09 PM 5/31/00 -0500, sam th wrote:
>The way it looks to me from the Abi side of the word importer is that wv
>provides us with all the properties for a span at once. However, in HTML,
>you have to deal with lots of messy inheritance. I don't think we should
>assume that every file format we deal with will be as nice to our system
>as wv is currently. But then again, maybe HTML is an exception.

Bingo. Sounds like we've found the core issue.

AFAICT, for most word processing formats (not just Word) you can easily
determine all the properties of a span at once. Since they share this
characteristic with AbiWord's internal format, doing the required mappings
is tedious, but not usually that hard.

By contrast, classic HTML definitely has "lots of messy inheritance" --
which is what makes that importer somewhat harder to write. You have to
keep track of all the goofy nesting situations.

The necessary state machines to flatten nested markup really aren't that
bad, though. For well-formed XHTML, essentially the transformation you're
doing is just a tree-walking exercise:

  <B>one --> font-weight:bold
  <I>two</I> --> font-weight:bold; font-style:italic
  three</B> --> font-weight:bold
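
To make the tree walk concrete, here is a rough Python sketch. It is only an illustration, not anything from the AbiWord or wv sources: the TAG_PROPS table and the flatten() helper are hypothetical names, and it assumes the input is well-formed enough for a stock XML parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical tag-to-property table; only B and I come from the example above.
TAG_PROPS = {"b": ("font-weight", "bold"), "i": ("font-style", "italic")}

def flatten(elem, inherited=None, spans=None):
    """Walk the element tree depth-first and return a list of (text, props) spans."""
    if spans is None:
        spans = []
    # Start from the parent's properties, then add this element's own, if any.
    props = dict(inherited or {})
    prop = TAG_PROPS.get(elem.tag.lower())
    if prop:
        props[prop[0]] = prop[1]
    if elem.text:
        spans.append((elem.text, props))
    for child in elem:
        flatten(child, props, spans)
        # Tail text follows the child's end tag, so it carries the parent's props.
        if child.tail:
            spans.append((child.tail, props))
    return spans

spans = flatten(ET.fromstring("<B>one <I>two</I> three</B>"))
# spans[0] is "one " with bold, spans[1] is "two" with bold + italic,
# spans[2] is " three" with bold again.
```

Each span comes out with its full property set, which is the "all properties at once" shape the rest of the importer wants.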

However, you'll need a different state machine for classic HTML so that you
can properly interpret format-toggling "messes" like this:

  <B>one --> font-weight:bold
  <I>two</B> --> font-weight:bold; font-style:italic
  three</I> --> font-style:italic
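
A rough sketch of that second state machine, again in Python and again purely illustrative: instead of trusting the nesting, it keeps a count of open tags per format and treats each start or end tag as switching that format on or off. ToggleFlattener, flatten_classic() and TAG_PROPS are made-up names; the only real API used is the standard library's html.parser.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical tag-to-property table; only B and I come from the example above.
TAG_PROPS = {"b": ("font-weight", "bold"), "i": ("font-style", "italic")}

class ToggleFlattener(HTMLParser):
    """Track which formats are currently on, ignoring how the tags nest."""

    def __init__(self):
        super().__init__()
        self.open_counts = Counter()  # open-tag count per formatting tag
        self.spans = []               # collected (text, props) pairs

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag names before calling the handlers.
        if tag in TAG_PROPS:
            self.open_counts[tag] += 1

    def handle_endtag(self, tag):
        # A stray end tag with nothing open is simply ignored.
        if tag in TAG_PROPS and self.open_counts[tag] > 0:
            self.open_counts[tag] -= 1

    def handle_data(self, data):
        if not data.strip():
            return
        # Snapshot every format that is on right now.
        props = {name: value
                 for tag, (name, value) in TAG_PROPS.items()
                 if self.open_counts[tag] > 0}
        self.spans.append((data, props))

def flatten_classic(html):
    parser = ToggleFlattener()
    parser.feed(html)
    parser.close()  # flush any buffered trailing text
    return parser.spans

spans = flatten_classic("<B>one <I>two</B> three</I>")
# "one " is bold, "two" is bold + italic, " three" is italic only.
```

Either way the output has the same flat shape as the tree-walking version, so the difference stays inside the importer.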

Sounds to me like those importers are *exactly* where such code belongs, no?

