FORWARDED MAIL FROM RAVEN

From: Brian Buhrow (buhrow@moria)
Date: Fri Apr 08 1994 - 22:15:25 PDT


This is a followup to Robert's response.

Robert Jaquiss (503)627-4444 DS 50-454 <robertj@tekgen.bv.tek.com> writes:

>Hello:

> This is in reply to the article on PDF file format. This morning I spoke
>with Paul Fontaine of the GSA.
I also met Paul Fontaine at the CSUN conference, and we had some very
interesting discussions on accessibility.
I've been talking to Paul about these issues both during and since the conference.

>PDF is a format that incorporates ASCII.
The above statement is misleading. Taken by itself, it is easy to misread
as meaning that PDF is nothing to worry about.
I think we'd be making a big mistake if we assumed this.
As I explained, the information is well-nigh impossible to extract from a PDF
file at present.

>It is like PostScript in that it contains the ASCII text to be printed or viewed
>as well as codes for Font, color, location on the page etc.
That's the problem. It contains only the above information. A PDF file knows
that something appears at position (x,y) in a particular font on a page; the
information about why that particular visual layout was chosen is not
stored. This is the loss of structural information I was talking
about in my original posting.
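To make the point concrete, here is a rough sketch (Python, purely illustrative) of what a program sees when it reads the text on a PDF page. The content stream is a hypothetical, hand-written, uncompressed fragment using the standard text operators (BT/ET begin and end a text object, Tf selects a font and size, Td moves the text position, Tj shows a string); real files are usually compressed and use many more operators, and the tiny parser is a simplified assumption, not how any particular viewer works. Each run it recovers is only a font, a size, a position and a string. Nothing marks the first line as a heading or the second as a field label; a program can only guess that from the larger font.

```python
import re
from typing import List, NamedTuple, Optional


class TextRun(NamedTuple):
    font: Optional[str]
    size: float
    x: float
    y: float
    text: str


# Hypothetical, uncompressed content stream fragment (illustration only).
CONTENT_STREAM = """
BT
/F2 14 Tf
72 720 Td
(Section title) Tj
ET
BT
/F1 10 Tf
72 690 Td
(Body text line) Tj
ET
"""


def extract_runs(stream: str) -> List[TextRun]:
    """Collect (font, size, position, string) for each Tj in a simple stream.

    Simplified: one operator per line, no escaped parentheses in strings,
    and no text-matrix operators other than Td.
    """
    runs: List[TextRun] = []
    font: Optional[str] = None  # font state persists across BT/ET
    size = 0.0
    x = y = 0.0
    for raw in stream.strip().splitlines():
        line = raw.strip()
        if line == "BT":
            x, y = 0.0, 0.0  # BT resets the text position to the origin
        elif line.endswith(" Tf"):
            name, sz, _ = line.split()
            font, size = name.lstrip("/"), float(sz)
        elif line.endswith(" Td"):
            dx, dy, _ = line.split()
            x, y = x + float(dx), y + float(dy)  # offset from current line start
        elif line.endswith(" Tj"):
            m = re.match(r"\((.*)\) Tj$", line)
            if m:
                runs.append(TextRun(font, size, x, y, m.group(1)))
    return runs


for run in extract_runs(CONTENT_STREAM):
    print(run)
```

Everything a screen reader could be told about the page is in those tuples; the reading order and the role of each piece of text have to be reconstructed by heuristics.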

>Although the current
>viewers for PDF files are the Acrobat products from Adobe, it is possible
>to write a viewer to make the files accessible.
The above is a blanket statement that is not justified: I have been
carrying on an email discussion with people on the technical side of
PDF, and extracting structural information from PDF files is a hard research
problem.
Just saying that PDF files contain ASCII and can be made accessible
side-steps the issue.

>In the case of the IRS forms, the IRS went off on its own with this project.
>It turns out that the PDF forms can only be viewed or printed. You cannot
>fill in a PDF form.

This is because, as I pointed out above, PDF files are not
designed to exchange document structure and content; they are designed to
let people view documents on disparate platforms. Depending on who
generates the PDF file and why, it can contain anything from about as much
information as a standard PostScript file (i.e. character 'c' appears in font
'f' at position (x,y)) to nothing but bitmap images. The IRS tax forms,
for instance, contain mostly bitmap images.
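A similarly rough sketch shows why the bitmap case is worse still. Both content streams are hypothetical; `Do` paints a named XObject, which for a scanned form is typically an image, so a page like the first has no text operators at all for a screen reader to work from. The `page_kind` heuristic is an illustrative assumption, not a real tool: it ignores, for example, form XObjects that themselves contain text.

```python
import re

# Hypothetical page that is one scanned image: save graphics state, scale
# the image to a 612 x 792 point (US Letter) page, paint /Im0, restore state.
IMAGE_ONLY_PAGE = "q 612 0 0 792 0 0 cm /Im0 Do Q"

# Hypothetical page that shows real text.
TEXT_PAGE = "BT /F1 10 Tf 72 720 Td (Some text) Tj ET"


def page_kind(stream: str) -> str:
    """Rough heuristic: does the page show text, or only paint XObjects?"""
    # Remove string literals so their contents are not mistaken for
    # operators (simplified: ignores escaped or nested parentheses).
    tokens = re.sub(r"\([^)]*\)", " ", stream).split()
    shows_text = any(t in ("Tj", "TJ", "'", '"') for t in tokens)
    paints_xobject = "Do" in tokens
    if shows_text:
        return "text"
    if paints_xobject:
        return "image-only (no extractable text)"
    return "empty or graphics-only"


print(page_kind(IMAGE_ONLY_PAGE))
print(page_kind(TEXT_PAGE))
```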

>Adobe is aware of the issues involved.
Their merely being aware of it does not really help anyone.

> NIST may get involved with the creation of
>a viewer. I suggest that we track activities in this area
That was precisely why I posted my article on this list in the first place. We
need to be aware of developments in this area and know what is happening
before it is too late. In the case of the IRS forms, no one knew it was
happening until it had happened. If the same happens with a few more
important government agencies, we'll be left out in the cold.

Once a large amount of information has been created and stored in PDF, we'll
have no alternative but to live with it.
In fact, my gut feeling is that we are already at that point.

>and we should
>consider working with Master Soft (The creator of Word For Word) to make a
>converter package. Word for Word is the conversion package used by Arkenstone
>and Raised Dot Computing for conversion between word processor formats.
Conversion packages are all good, and we should work with anyone and everyone
who is designing accessible converters. However, it is wrong to go ahead with
things that are inaccessible at present in the hope that conversion packages
will become available. One problem blind users have always faced is that
we've been playing 'catch-up' for a long time, and here is another situation
heading in the same direction.

I hope this message fuels some interesting technical discussion.
--Raman

> Robert Jaquiss

>Internet: robertj@tekgen.bv.tek.com



This archive was generated by hypermail 2b29 : Sun Dec 02 2012 - 01:30:03 PST