File sniffing help

Subject: File sniffing help
From: Kevin Vajk
Date: Sat Feb 05 2000 - 13:07:18 CST

I've written up some preliminary code to do automatic file type
detection based on file contents. It works for me (tm).

I was wondering if an AbiWord guru would look over my patch,
since I am new to C++ and a little unclear on your coding
conventions.

Also, a few issues have come up, which I'd like to discuss
with someone, although maybe not on the list since it would
most likely bore most people to tears.

For now, I do it by slurping in a few thousand bytes and
passing that buffer to the sniffer routines. However, I
don't think this is going to work in the long term.

Consider an OLE file: right now I just assume that any OLE
file is an MS Word document. (Terrible, I know.) Eventually,
we'll want to use Caolan's file(1) code to do this. But that
code needs access to the entire file, since the OLE header
could say that the file's "table of contents" is at the very
end of the file.

Or consider gzipped AbiWord documents. Right now I just assume
that any gzipped file is a gzipped AbiWord file. That
assumption will break down if we ever support any other
compressed formats, so we really should be using zlib. But as
near as I can tell, zlib isn't happy being passed a buffer; it
wants to open the file itself.

I don't know what to do about these cases; my current best
solution is a hack.

Help graciously solicited. :)

- Kevin Vajk
