From: Paul Rohr (paul@abisource.com)
Date: Mon Jun 03 2002 - 12:01:31 EDT
At 08:32 AM 6/3/02 -0700, I wrote:
>However, I'd also like to point out that this entire category of problems 
>could be solved *permanently* by a sufficiently-clever programmer who's 
>willing to:
>
>  Do the ugly, thankless work needed to allow us to load *any* 
>  commonly-available variant of the ispell hash formats, instead 
>  of just one.  
>
>For some hints on the work required to do so, see:
>
>  http://www.abisource.com/mailinglists/abiword-dev/01/April/1030.html
>  http://www.abisource.com/mailinglists/abiword-dev/01/March/0769.html
>
>To date, we haven't found any volunteers who are both brave enough and 
>talented enough to tackle this, but I'm still hopeful that we will.  :-)
Bummer.  I just Googled myself *after* hitting send -- doh! -- and realized 
I'd missed the following hints:
  http://www.abisource.com/mailinglists/abiword-dev/01/May/0251.html
  http://www.abisource.com/mailinglists/abiword-dev/01/May/0151.html
As alluded to in that thread, there are (at least) three factors that affect 
the variability of ispell hash file formats:
  bits/flags (aka MASKBITS)/characters
The specific permutations found in the wild tend to vary a lot -- some 
distros ship 8/56/128 hashes, others ship 7/26/100, and so on.  
Since we *already* have code that allows for variability in the *middle* 
of these (we handle 8/N/100 hashes, for N <= 64), my suggestion is 
essentially to add code for even more flexibility here.  For instance, we 
could probably get very far if we could handle any of the following 
permutations:
  B/N/C  (for say, B = 7|8; N <= 64; C <= 128)
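
As a rough illustration of that range, here is a minimal sketch of the 
acceptance check such a loader might perform.  The struct and function names 
are hypothetical (they are not taken from the ispell or AbiWord sources), and 
the limits are simply the ones suggested above:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical description of one hash variant: bits/flags/characters. */
struct hash_variant {
    int bits;      /* B: 7 or 8 */
    int maskbits;  /* N: number of flag bits (aka MASKBITS) */
    int chars;     /* C: number of characters */
};

/* Returns true if the variant lies within B = 7|8; N <= 64; C <= 128. */
static bool variant_supported(const struct hash_variant *v)
{
    assert(v != NULL);
    if (v->bits != 7 && v->bits != 8)
        return false;
    if (v->maskbits < 1 || v->maskbits > 64)
        return false;
    if (v->chars < 1 || v->chars > 128)
        return false;
    return true;
}
```

With this check, both of the in-the-wild examples mentioned earlier (8/56/128 
and 7/26/100) would be accepted, while something like 8/72/128 would not.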
The key insight here remains:
  1.  ispell hashes define a family of *very* closely-related file formats.  
  2.  The variances are simply different values of known #defines.  
  3.  These variances change the widths of key structs on disk. 
  4.  The file format includes sanity check fields which explicitly tell 
      you which #defines were used. 
In short, #1-4 provide all the information needed for a smart hashfile 
loader to determine which variant of the file format is being read.  The 
problem with all legacy ispell implementations is that they do something 
incredibly dumb at this point: 
  5.  Try to do a struct copy (!!) from disk to memory using a specific 
      permutation of #defines.  
  6.  *Recognize* the existence of other valid #define permutations... 
      and refuse to load those files at all!
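
For contrast with #5, here is a hedged sketch of the alternative: read the 
sanity-check values field by field, then derive the struct widths at run time 
instead of baking them in at compile time.  The 6-byte little-endian header 
layout, the names, and the round-up-to-bytes rule are all assumptions made for 
illustration; the real ispell header is laid out differently and has more 
fields.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical in-memory description of the variant being loaded. */
struct variant_info {
    unsigned bits;        /* B: 7 or 8 */
    unsigned maskbits;    /* N: flag bits per entry */
    unsigned chars;       /* C: character count */
    size_t   mask_bytes;  /* derived on-disk width of the flag mask */
};

/* Read a 16-bit little-endian value one byte at a time, so the result does
 * not depend on host endianness, alignment, or struct padding. */
static unsigned read_u16le(const unsigned char *p)
{
    return (unsigned)p[0] | ((unsigned)p[1] << 8);
}

/* Parse a hypothetical 6-byte header: bits, maskbits, chars (u16le each).
 * Returns 0 on success, -1 if the buffer is too short or the variant is
 * outside the range B = 7|8; N <= 64; C <= 128. */
static int parse_variant(const unsigned char *buf, size_t len,
                         struct variant_info *out)
{
    assert(buf != NULL && out != NULL);
    if (len < 6)
        return -1;

    out->bits     = read_u16le(buf);
    out->maskbits = read_u16le(buf + 2);
    out->chars    = read_u16le(buf + 4);

    if (out->bits != 7 && out->bits != 8)
        return -1;
    if (out->maskbits == 0 || out->maskbits > 64)
        return -1;
    if (out->chars == 0 || out->chars > 128)
        return -1;

    /* The width comes from the file's own values rather than a #define;
     * rounding up to whole bytes is an assumption of this sketch. */
    out->mask_bytes = (out->maskbits + 7) / 8;
    return 0;
}
```

The rest of a loader written this way would use the derived widths to walk the 
on-disk entries one field at a time, so a single binary could read any variant 
in the supported range instead of rejecting everything but its own.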
I claim that it's much easier to do something sufficiently smart here than 
it is to, say, reverse-engineer the family of Word binary file formats.  
;-)
Paul,
design evangelist