*** Resending note of 04/21/95 11:01
From: Steve Jacobson - IT Marketing Applications
3M Company - 555-01-03 Phone: (612) 733-9780
St. Paul, MN 55144 FAX: (612) 736-6037
Subject: Adobe PDF
To: MAIL --INTERNET INTERNET, MAIL
Since the subject of Adobe's PDF has been raised several times here, I
would like to pass on several things I have heard and read over the past
six months, some of which I haven't seen in writing anywhere. I
apologize in advance if this material is "old hat" to you.
There are times when the producers of documents want to keep their
documents in the original format, with protection against unauthorized
modification. Apparently, the Adobe product provides this functionality.
Among other things, this means that the ability to reformat a page using
a larger font, for example, is lost since part of the intent is to
preserve the original page format. There are questions about the passing
of structural information. Can a heading line be easily identified, for
example? Some have said that the bit-mapped nature of PDF excludes the
blind from accessing PDF documents. As most things seem to go nowadays,
most questions do not have yes or no answers, and most statements made
are neither completely true nor completely false.
For example, take a look at this note that recently appeared on the
ICADD list from Doug Wakefield:
On Tue, 18 Apr 1995, T. V. Raman wrote:
> You can't use PDF with speech/braille.
>
> Fortunately most of the WWW is not yet PDF
>
Sorry, but this message is very incorrect. Many PDF files read very
nicely with print and or braille. Some do not. The eight page new york
times is totally unreadable. The IRS tax forms take a little tweaking,
other text docs are fine. I have acroread plugged into netscape. You do
have to use Windows. Adobe is being very helpful, I'll be spending today
with an engineer from adobe to test some solutions that may make more
or maybe even all PDF files readable. I really feel this is a problem
that must be solved because the WEB is going to move rapidly in this
direction.
(End of note)
Of course, Mr. Raman is right, as things stand now, if you don't or
can't use WINDOWS.
One bit of confusion is that the letters PDF can mean different things.
"Portable Document Format" is really a descriptive phrase that isn't
necessarily tied to Adobe. The PDF standard being developed by the
federal government will probably not simply be Adobe's PDF, although it
will likely be very similar. Government standard file formats, as I
understand it (are you there to correct me, Lloyd?), must be in the
public domain, and Adobe will probably want to keep some control over
their format for commercial use. Therefore, the government standard will
probably not be completely tied to Adobe's PDF, so there may well be
some room for experimentation as to what changes can make such documents
more accessible to us.
Even if we get everything we want into a government standard, problems
associated with Adobe's PDF will not go away. There seems to be a real
groundswell to use Adobe's PDF on the Internet by commercial entities.
It has been announced that the ACROREAD program, which permits viewing
Adobe PDF documents, will be supplied with future IBM and Macintosh
personal computers. Netscape will be including the ability to read Adobe
PDF documents in their World Wide Web products. Several newspapers with
on-line services are using the product. A number of large companies have
settled upon Adobe's PDF to distribute catalogs electronically to their
customers. Even if the federal government scrapped their portable
document standard, and they won't, we'd still need to take this question
seriously.
There is some movement on Adobe's part to try to make Adobe PDF more
accessible. It was reported at the October ICADD meeting at Closing the
Gap that Adobe was willing to commit personnel and resources to this
end. Specifically mentioned was the creation of a DOS reader and the
inclusion of more document structure information.
A possible benefit involves bit-mapped graphics. In most markup
languages, bit-mapped graphics are supported by simply inserting a code
that identifies what follows as bit-mapped. As I understand it, Adobe's
PDF conceptually builds a page in layers. It is often the case, as was
related at the October ICADD meeting, that the text that is printed as
part of a bit-mapped graphic can be recovered because it is still
treated as text by Adobe PDF. However, I would guess that such text
would only be available if the graphic was built using this layered
approach.
Another development that may be more good than bad is that Adobe has
announced a software package that can be used to scan paper documents
preserving their original appearance. This can be accomplished since
typeface and font information are stored with the document. This is not
a bit-map image of a page. It is, rather, a scanned image to which OCR
has been applied along with a process that identifies or creates fonts
that correspond to the appearance of the characters being scanned. As it
stands now, we would have some difficulty accessing such documents, and
the verdict is still out on how well we'll be able to do so in the
future. However, this is the first time I have heard of a product that
advocates mass storage of documents in electronic format where OCR
techniques have been applied but the original appearance is preserved.
Preserving the original appearance should increase the appeal of
electronic storage of documents to libraries and the like. The software
will cost several thousand dollars and is obviously intended for
large-scale document scanning.
My conclusions are that (1) Adobe's PDF will be widely employed to
distribute information; (2) since it preserves appearance, it will
reduce the likelihood of mass storage of scanned documents in bit-mapped
images; and (3) the complexity of document format together with the wide
variety of techniques and software used to prepare them will always
require that we apply some degree of format analysis when converting
such documents to speech, braille, or large print.
What this all means to me is that we must separate what we would like to
have from what we must have to get the text out of PDF documents. Part
of making this determination is establishing which elements of document
structure can be recreated and what must be in the original document.
Our experience with scanner software should help here. For now, though,
we'll probably need to wait and see how far Adobe is willing to go to
help before we know what the problems are likely to be. At least they
seem more willing to help than Prodigy ever was.
Regards,
Steve Jacobson
INTERNET: SOJACOBSON@MMM.COM