Re: UCS-2 vs. UCS-4


Subject: Re: UCS-2 vs. UCS-4
From: Joaquin Cuenca Abela (e98cuenc@yahoo.com)
Date: Tue Jun 26 2001 - 11:14:38 CDT


--- Martin Sevior <msevior@mccubbin.ph.unimelb.edu.au> wrote:
>
>
> On Tue, 26 Jun 2001, Joaquin Cuenca Abela wrote:
>
> >
> > --- Thomas Fletcher <thomasf@qnx.com> wrote:
> > > On Sat, 23 Jun 2001, Martin Sevior wrote:
> > > >
> > > > This is an interesting debate. One extra point we should all
> > > > keep in mind is that we probably don't waste much more space
> > > > going from 16 => 32 bits for character representation.
> > >
> > > [Other comments about sizes of data structures snipped]
> > >
> > > Martin,
> > >
> > > Call me crazy ... but I _totally_ don't believe this statement.
> > > For anyone working on documents of any size, our memory
> > > consumption is an issue. Deciding to double the per character
> > > memory requirements will add up. While some systems are
> > > swappable ... we certainly don't want to count out the fact that
> > > Abi could be used on smaller devices.
> >
> > While I agree that we should try to stay as small as possible,
> > I agree with Martin.
> >
> > Suppose a document of 66 chars per line, 30 lines per page,
> > 100 pages. If the doc contains no images (only lines and lines
> > of chars), we have:
> >
> > 66 * 30 * 100 = 198000 chars
> >
> > If we use UCS-2, we will need ~400K to store only the text.
> > If we use UCS-4, we will need ~800K.
> >
> > Last time I looked at files that big (in the test that I ran
> > with the perl bindings), gtop was telling me that AbiWord was
> > using 10M of memory (and I think the file was not 100 pages
> > long, it was ~50 pages long).
> >
>
> Cool! I get to explain "theory" to Joaquin as to why his
> experimental numbers are so big.

that's getting recursive ;-)

> OK. For *Single View* on Joaquin's document, assuming there are
> no changes of any formatting properties whatsoever, look at this:
>
> Remember all our text is not only stored in a big 16 bit array
> but it is also stored in all sorts of classes.
>
> A Frag_strux class and a fl_BlockLayout class per paragraph.
> (Every Frag_strux gets its own class, every fl_BlockLayout gets
> its own Block.)
> One Line class per line. (Every line gets its own class.)
> Let's say an average of two runs per line. (Every Run gets its
> own class. The text Frags are hard to work out but let's assume a
> frag per run too.)
>
> Assume Joaquin's document has 4 lines per paragraph.
>
> 100 pages => 33 lines * 100 + 66 runs * 100 + 132 Frags * 100
> + 8 blocks * 100
>
> This is the main source of memory usage. Other sources include
> the Hash table for each unique attribute/property combination,
> the Page classes, the container classes.
>
> Now look through the header files of fp_Line.h, fp_Run.h,
> fp_TextRun.h, pf_Frag.h, fl_BlockLayout.h

until here, I agree 100% (my script was a little more
savage changing the formatting, but that's a good
approximation).
 
> Each function listed is worth 4 bytes on a 32 bit CPU.

? You mean 4 bytes per class? Per object?
It's neither of them. You can have a class with 500
functions, and if none of them are virtual, you will
pay 0 bytes per object. If at least one of them is
virtual, you will pay 4 bytes per object on a 32 bit
CPU (one vtable pointer), however many virtual
functions the class has.
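To make that concrete, here is a minimal sketch. The classes are hypothetical (they are not taken from the AbiWord sources), and the checks describe what mainstream compilers do rather than something the C++ standard guarantees.

```cpp
#include <cstddef>

// Hypothetical classes for illustration only, not AbiWord code.
struct PlainData {              // two ints, no functions at all
    int a;
    int b;
};

struct ManyMethods {            // same data plus non-virtual functions
    int a;
    int b;
    int sum() const { return a + b; }
    int diff() const { return a - b; }
    void reset() { a = 0; b = 0; }
};

struct OneVirtual {             // same data plus one virtual function
    int a;
    int b;
    virtual ~OneVirtual() {}
};

struct ManyVirtuals {           // same data plus several virtual functions
    int a;
    int b;
    virtual ~ManyVirtuals() {}
    virtual int sum() const { return a + b; }
    virtual int diff() const { return a - b; }
};

// Non-virtual functions live in the code segment, not in the object.
static_assert(sizeof(ManyMethods) == sizeof(PlainData),
              "non-virtual methods should not grow the object");

// Extra virtual functions add vtable entries (per class), not object bytes.
static_assert(sizeof(ManyVirtuals) == sizeof(OneVirtual),
              "one vtable pointer regardless of virtual function count");

// Per-object cost of having at least one virtual function.
std::size_t vptrOverhead() {
    return sizeof(OneVirtual) - sizeof(PlainData);
}
```

On a 32 bit build vptrOverhead() should come out as 4, on a typical 64 bit build as 8, i.e. the size of one pointer.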

> Each Member variable is also worth about 4 bytes. Don't forget
> all the static variables in each function either. They have to
> get counted in the total memory per class instance too. Finally
> there are classes embedded in these classes that can also grow
> (like a UT_Vector of squiggles.)

yup

> Now a quick glance through fl_BlockLayout makes me guess there
> are about 200 methods and member variables. That's 800 bytes
> right there. It's too hard to add up all the static variables. A
> sizeof(fl_BlockLayout) would be the most scientific.

yes, but it will only give you the size per object.
The static variables will not be counted.
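A small sketch of what I mean, using a made-up class (Counter is hypothetical, nothing like it is claimed to exist in Abi). Static data members and function-local statics exist once per program, so sizeof only sees the non-static members.

```cpp
#include <cstddef>

// Hypothetical class for illustration only.
struct Counter {
    int value;                    // per-object data: counted by sizeof

    static int instances;         // one copy for the whole class
    static char scratch[4096];    // 4K, but still not part of any object

    int next() {
        static int calls = 0;     // function-local static, shared by all objects
        ++calls;
        return value + calls;
    }
};

int Counter::instances = 0;
char Counter::scratch[4096];

// Only the single int member contributes to the object size.
static_assert(sizeof(Counter) == sizeof(int),
              "static members are not stored inside the object");
```

The statics are paid once, whether there is one Counter or a hundred thousand, so they should not be multiplied by the number of instances.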

> I guess there are around 130 methods and member variables per
> fp_Run, that's 520 bytes per run class.
>
> And around 110 methods and member variables per fp_Line, so
> that's around 440 bytes per line.
>
> OK. So our calculation based only on the layout classes is:
>
> 100 pages => 33 lines * 100 * 440 + 66 runs * 100 * 520
> + 8 blocks * 100 * 800
>
> = 1452000 + 3432000 + 640000 = 5524000 bytes for a 100 page doc.
> Which is half Joaquin's measurement of 10 megabytes for a 50 page
> document. Not bad given the crude calculation!

well, you should recalculate it without the 4 bytes
per function * instance, but yes, a nice approximation
(especially taking into account that when abi starts, it
takes up to 5M, 4 of which are shared)
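For anyone who wants to redo the arithmetic, here is a rough sketch. The struct and function names are invented for this mail; the numbers fed to them in the usage note are only the ones quoted in this thread. Real per-object sizes would have to come from sizeof on the actual classes, which I have not measured.

```cpp
#include <cstdint>

// Names below are hypothetical helpers, not AbiWord code.
struct LayoutCounts {
    std::int64_t pages;
    std::int64_t linesPerPage;
    std::int64_t runsPerPage;
    std::int64_t blocksPerPage;
};

struct ObjectSizes {              // bytes per object
    std::int64_t line;
    std::int64_t run;
    std::int64_t block;
};

// Memory taken by the layout objects alone (lines, runs, blocks).
std::int64_t layoutBytes(const LayoutCounts& c, const ObjectSizes& s) {
    return c.pages * (c.linesPerPage * s.line +
                      c.runsPerPage * s.run +
                      c.blocksPerPage * s.block);
}

// Memory taken by the raw text: 2 bytes per char for UCS-2, 4 for UCS-4.
std::int64_t textBytes(std::int64_t charsPerLine, std::int64_t linesPerPage,
                       std::int64_t pages, std::int64_t bytesPerChar) {
    return charsPerLine * linesPerPage * pages * bytesPerChar;
}
```

With Martin's figures, layoutBytes({100, 33, 66, 8}, {440, 520, 800}) gives 5524000, while textBytes(66, 30, 100, 2) gives 396000 and textBytes(66, 30, 100, 4) gives 792000. Replacing the guessed object sizes with measured ones only means changing the second argument.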

Cheers,

--
Joaquin Cuenca Abela
e98cuenc@yahoo.com




This archive was generated by hypermail 2b25 : Tue Jun 26 2001 - 11:27:05 CDT