Subject: Re: UCS-2 vs. UCS-4
From: Thomas Fletcher (thomasf@qnx.com)
Date: Tue Jun 26 2001 - 14:35:57 CDT
On Tue, 26 Jun 2001, Joaquin Cuenca Abela wrote:
>
> --- Thomas Fletcher <thomasf@qnx.com> wrote:
> > On Sat, 23 Jun 2001, Martin Sevior wrote:
> > >
> > > This is an interesting debate. One extra point we should all keep in mind
> > > is that we probably don't waste much more space going from 16 => 32 bits
> > > for character representation.
> >
> > [Other comments about sizes of data structures snipped]
> >
> > Martin,
> >
> > Call me crazy ... but I _totally_ don't believe this statement. For
> > anyone working on documents of any size, our memory consumption is an
> > issue. Deciding to double the per-character memory requirements will
> > add up. While some systems are swappable ... we certainly don't want
> > to count out the fact that Abi could be used on smaller devices.
>
> While I agree that we should try to stay as small
> as possible, I agree with Martin.
>
> Suppose a document of 66 chars per line, 30 lines per
> page, 100 pages. If the doc contains no images (only
> lines and lines of chars), we have:
>
> 66 * 30 * 100 = 198000 chars
>
> If we use UCS-2, we will need ~400K to store only the
> text. If we use UCS-4, we will need ~800K.

Assuming that you do absolutely nothing with the
document. As soon as you start editing, this value
starts to grow, since we are an "append only"
system. Agreed that we are looking at a relatively
small amount of memory ... but we are still doubling
it. Also, since all internals will likely get
converted to UCS-4 out of convenience, other
parts will grow too (helper functions, other data
structures, caches and the like).

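To put numbers on the estimate quoted above, here is a rough sketch of the arithmetic in Python. The page dimensions are the ones from Joaquin's example; the function name `text_bytes` and the `edit_growth` factor are assumptions made for illustration, not anything measured from AbiWord. The factor only models an append-only buffer that has kept some fraction of extra characters around after editing.

```python
# Rough estimate of raw text storage for fixed-width encodings.
# All names here are illustrative; nothing is taken from AbiWord's source.

CHARS_PER_LINE = 66
LINES_PER_PAGE = 30
PAGES = 100

UCS2_BYTES_PER_CHAR = 2
UCS4_BYTES_PER_CHAR = 4


def text_bytes(chars, bytes_per_char, edit_growth=0.0):
    """Bytes needed to store `chars` characters.

    edit_growth is a hypothetical fraction of extra characters that an
    append-only buffer retains after editing (0.0 means never edited).
    """
    stored_chars = int(chars * (1.0 + edit_growth))
    return stored_chars * bytes_per_char


chars = CHARS_PER_LINE * LINES_PER_PAGE * PAGES  # 198000 characters

ucs2 = text_bytes(chars, UCS2_BYTES_PER_CHAR)
ucs4 = text_bytes(chars, UCS4_BYTES_PER_CHAR)
print("chars:", chars)
print("UCS-2 bytes:", ucs2)
print("UCS-4 bytes:", ucs4)

# Assumed example: editing has appended 50% more text to the buffer.
print("UCS-4 bytes after edits:",
      text_bytes(chars, UCS4_BYTES_PER_CHAR, edit_growth=0.5))
```

Whatever growth factor is plugged in, the UCS-4 figure stays at twice the UCS-2 figure, which is the point being made: the absolute numbers are small, but the ratio does not go away.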
> Last time I looked at files this big (with the test
> that I ran with the Perl bindings), gtop was
> telling me that AbiWord was using 10M of memory (and I
> think the file was not 100 pages long; it was ~50
> pages long, I think).
>
> So we're using only ~5% of the space to store the text (if
> we use UCS-4 we will need 10%). So IMO we can switch
> to UCS-4 without worrying about memory consumption (at
> least at first).

Perhaps ... but my experience is that a change like
this is very uni-directional. Once performed, we won't
be switching back (or attempting to reduce the impact).
Of course, it goes without saying that it troubles me
that our internal data structures and housekeeping
cause the actual document to be only 5% of the memory
consumption of the overall program. Where oh where
is that other memory going ... and how can we reduce
it?

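For a feel of how much of the footprint is not text, the percentages in this thread can be reworked as a small Python sketch. The 10M total and the 100-page text size are the figures quoted above; treating "10M" as 10 MiB, and assuming that only the text grows when the encoding changes, are simplifications of mine and not measurements.

```python
# Back-of-the-envelope share of process memory taken by raw text.
# Figures come from the thread; reading "10M" as 10 MiB is an assumption.

TOTAL_FOOTPRINT = 10 * 1024 * 1024   # "10M" as reported by gtop (assumed MiB)
UCS2_TEXT = 198000 * 2               # 100-page example stored as UCS-2
UCS4_TEXT = 198000 * 4               # same text stored as UCS-4


def text_share(text_bytes, total_bytes):
    """Fraction of the total footprint occupied by the text itself."""
    return text_bytes / total_bytes


# Everything that is not text: data structures, housekeeping, caches.
overhead = TOTAL_FOOTPRINT - UCS2_TEXT

# Hypothetical: only the text grows when switching encodings.
total_after_switch = overhead + UCS4_TEXT

share_ucs2 = text_share(UCS2_TEXT, TOTAL_FOOTPRINT)
share_ucs4 = text_share(UCS4_TEXT, total_after_switch)

print(f"overhead: {overhead} bytes")
print(f"UCS-2 text share: {share_ucs2:.1%}")
print(f"UCS-4 text share: {share_ucs4:.1%}")
```

Under these assumptions the text stays a single-digit percentage of the footprint either way, roughly in line with the estimates in the thread. Growth in helper functions, other data structures, and caches is not modelled here.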
> Of course, if somebody cares enough to change from the
> simplistic "only UCS-4" approach to a more complex
> approach such as Mike's that saves memory &/| speed,
> I will be more than happy, but it seems to me that
> this time would be better spent fixing the remaining
> 90% of memory.

Sure ... as long as we come out ahead in the end.

Thomas ... smaller is better ... Fletcher
-------------------------------------------------------------
Thomas (toe-mah) Fletcher QNX Software Systems
thomasf@qnx.com Neutrino Development Group
(613)-591-0931 http://www.qnx.com/~thomasf