PDF in the modern world
by Simon Williams
Adobe’s PDF became an ISO standard four years ago and is now being used to solve all sorts of business problems, as Simon Williams discovers.
HardCopy Issue: 58 | Published: November 1, 2012
The Portable Document Format (PDF), originally created by Adobe, was designed as a platform-independent file format that could hold text, graphics and images in a form that accurately duplicates a paper layout. Created as long ago as 1993 by Adobe’s co-founder John Warnock, it remained a proprietary format until 2008, when Adobe released it and it was adopted as an ISO standard.
Since then, Adobe and other software makers have worked hard to improve the way PDFs can be created and to expand the uses to which they can be put. The ability to save a document in PDF format has been added to many applications, including Microsoft Office from version 2007 SP2 onwards. Other forms of the PDF standard, such as PDF/A and PDF/X, have also grown up to address specialist needs.
PDF/A, where the ‘A’ stands for Archive, is a subset of the PDF standard. The key goal for this version of the format is that the file should be completely self-contained. This means all elements, including fonts, have to be embedded within the file.
The successful digital preservation of data depends on both hardware and software. With regard to software, the main problem is the obsolescence of file formats. As requirements change and software is rewritten to deal with them, it’s not always possible to maintain compatibility with legacy file types. If that compatibility is lost, access to information saved in those formats may also be lost.
Herbert Smith LLP uses PDF
Herbert Smith LLP is an international law firm with its HQ in London. It specialises in finance, real estate, competition, intellectual property, employment, pensions and incentives in the energy, natural resources and finance industries. It’s also well known for its dispute resolution services.
The new UK Supreme Court requires all court documents to be submitted in electronic form, and there’s a growing trend for other courts to follow suit. The firm decided to start the switch from paper to electronic documentation by standardising on Adobe Acrobat.
One problem that was hard for the firm to address with paper documentation is a sudden, late change to the content of a ‘bundle’: the paper evidence, which might include witness statements, expert reports and other supporting texts. Courts require such bundles to be numbered sequentially, so removing any pages means a complete reprint. A move to electronic documentation, in this case PDF files, makes such changes comparatively trivial.
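To see why renumbering is trivial once a bundle is electronic, here is a minimal Python sketch. It is an illustration only, not how Acrobat or the firm does it: the document titles, page counts and the `paginate` helper are all made up for the example.

```python
from typing import List, Tuple

# Each entry is (document title, number of pages). Titles and counts are invented.
Bundle = List[Tuple[str, int]]

def paginate(bundle: Bundle) -> List[Tuple[str, int, int]]:
    """Assign sequential bundle page numbers as (title, first_page, last_page)."""
    ranges = []
    next_page = 1
    for title, page_count in bundle:
        ranges.append((title, next_page, next_page + page_count - 1))
        next_page += page_count
    return ranges

bundle = [
    ("Witness statement A", 12),
    ("Expert report", 30),
    ("Supporting correspondence", 8),
]

before = paginate(bundle)
# A late change: the expert report is withdrawn, so everything after it is renumbered.
after = paginate([doc for doc in bundle if doc[0] != "Expert report"])
```

With the expert report in place the correspondence occupies pages 43 to 50; once it is withdrawn, the same call renumbers the correspondence to pages 13 to 20 with no reprinting.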
Another problem Acrobat can help with is the secure redaction of non-relevant parts of evidence documents which may still be confidential. There have been high-profile cases where information reached the press or other parties unintentionally because insecure redaction marks could simply be removed.
Because of their work with intellectual property cases, it’s also important for staff at Herbert Smith LLP to work collaboratively and mark up diagrams and charts with notes and comments. This is only useful if it leaves the underlying material unaltered, so in a paper environment, it would have been done on photocopies of the original pages.
All these advantages have made the move to PDF through Acrobat and Acrobat Pro easy to justify. The firm has seen a quick staff take-up of the offer to provide Acrobat to every desk in the organisation.
(Since this was written, Herbert Smith has merged with the Australian law firm Freehills to become Herbert Smith Freehills LLP.)
Making files as self-contained as possible helps; if a file references information from auxiliary files, such as a document needing photos or even fonts from external sources, there’s a greater likelihood of losing content. This can happen if all the files needed to reconstitute the main file are not held together, or if any of them degrade.
Simplifying the contents of a file can also help, so restricting the type of content and hence the number of file formats that have to be maintained, such as those for audio, video or 3D content, makes successful long-term archiving more likely.
PDF/X is designed primarily for document eXchange and is formalised into two ISO standards, 15929 and 15930, though 15930 is the only current one. There have been eight revisions to this standard so far, with the latest based on PDF 1.6. It’s likely there will be further releases of this standard, designed to take advantage of updates to PDF itself.
Because of its specialism in dealing with graphics, early versions of the standard required all images to use CMYK or spot colours. Later versions have eased this requirement, also allowing calibrated RGB and CIELAB colours.
A PDF/X file may contain more than one graphic, and in this case each graphic gets its own colour profile. Active content is not permitted in PDF/X files, so forms, signatures, comments, and audio and video material may not be included.
And there are other variants of the PDF standard. PDF/E is designed for engineering documents. The first part of this standard addresses the electronic conversion of items such as technical drawings, but doesn’t include support for 3D content.
PDF/UA, for Universal Access, is designed to provide additional accessibility for people who need to use assistive technology such as screen readers, screen magnifiers, joysticks and other navigational tools. PDF/VT is optimised for Variable and Transactional printing, which is mainly relevant to large, commercial systems.
Although the release of PDF as an ISO standard has helped Adobe by establishing it as a ubiquitous platform for all kinds of printed and electronic documents, the company now has to work with other interested parties – often its rivals – within the ISO committees when it comes to revising the standards.
Three important improvements to PDF have been electronic signing, form filling, and the ability to access and edit the contents of a PDF file directly.
If you’re using PDF documents to replace paper ones, it’s important to be able to sign them in a way that’s as secure as a physical signature. A signature made on paper with pen and ink can be easily distinguished from a photocopy or other facsimile. The electronic equivalent is a digital signature, which uses an encrypted code to tie a particular person and, in some cases, a time and place of signing, to a document.
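The idea of a code that ties a person and a time to a document can be sketched in a few lines of Python. This is a deliberately simplified stand-in: real PDF signatures rely on public-key certificates, whereas the sketch below uses a shared secret key and an HMAC from the standard library, purely to show how any change to the document, the signer or the time invalidates the code. The `sign` and `verify` helpers, the key and the sample data are all assumptions made for illustration.

```python
import hashlib
import hmac
import json

def sign(document: bytes, signer: str, signed_at: str, key: bytes) -> dict:
    """Bind signer and time to the document's hash with a keyed code (HMAC)."""
    claim = {
        "signer": signer,
        "signed_at": signed_at,
        "sha256": hashlib.sha256(document).hexdigest(),
    }
    payload = json.dumps(claim, sort_keys=True).encode("utf-8")
    claim["code"] = hmac.new(key, payload, hashlib.sha256).hexdigest()
    return claim

def verify(document: bytes, claim: dict, key: bytes) -> bool:
    """Recompute the code; a changed document, signer or time no longer matches."""
    expected = sign(document, claim["signer"], claim["signed_at"], key)
    return hmac.compare_digest(expected["code"], claim["code"])

# Illustrative values only; a shared demo key is not how real signing keys work.
key = b"demo-key-not-for-real-use"
document = b"Contract text as agreed by both parties."
claim = sign(document, "A. Signer", "2012-11-01T09:00:00Z", key)

untouched_ok = verify(document, claim, key)
edited_ok = verify(document + b" (edited)", claim, key)
```

Here `untouched_ok` is true and `edited_ok` is false: the code only verifies against the exact bytes that were signed.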
PDF is of course ideal for replicating the appearance of a form, and now it is possible to enter data directly into such forms. Several of the packages highlighted below support such features and allow you to collate the data for storage and analysis.
The development of searchable PDF has also been a big step forward in the use of the format. It’s all very well scanning paper documents into an archive, but what you get is effectively a photo of the page. This can be displayed and read, but its content isn’t accessible, which means it can’t be searched by computer.
If an archive is made up of thousands of photos of pages, the ease of finding particular text is no greater than with the original paper documents. It’s only when the text on the page is converted into searchable text, using character recognition technologies, that electronic archives become more useful than their paper equivalents.
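Once character recognition has turned page images into text, finding a phrase becomes a simple lookup. The Python sketch below assumes the OCR step has already happened and that its output is available as plain text per page; the sample pages and the `build_index` and `search` helpers are invented for illustration and are not taken from any of the products discussed here.

```python
import re
from collections import defaultdict
from typing import Dict, List, Set

def build_index(pages: Dict[int, str]) -> Dict[str, Set[int]]:
    """Map each lower-cased word to the set of page numbers containing it."""
    index: Dict[str, Set[int]] = defaultdict(set)
    for page_number, text in pages.items():
        for word in re.findall(r"[a-z0-9']+", text.lower()):
            index[word].add(page_number)
    return index

def search(index: Dict[str, Set[int]], query: str) -> List[int]:
    """Return, in order, the pages that contain every word in the query."""
    words = re.findall(r"[a-z0-9']+", query.lower())
    if not words:
        return []
    hits = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(hits)

# Hypothetical OCR output, keyed by page number.
pages = {
    1: "Lease agreement between the parties",
    2: "Schedule of rent payments",
    3: "The parties agree the rent review date",
}
index = build_index(pages)
rent_pages = search(index, "rent")
both_pages = search(index, "parties rent")
```

A search for "rent" returns pages 2 and 3, while "parties rent" narrows it to page 3. With page photos alone, neither query could be answered without someone reading every page.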
It’s no coincidence that two major providers of Optical Character Recognition (OCR) software – Nuance, which makes OmniPage, and ABBYY, the producer of FineReader – have PDF conversion applications (see panel overleaf).
The latest incarnation of Adobe’s own PDF creator is Acrobat Pro XI. This new version includes the ability to edit text and graphics in situ, without having to refer back to the source document and the application that created it. Although the editing is basic in comparison with, say, Word, text can be changed, searched and replaced and one graphic can be swapped for another.
PDF conversion tools
ABBYY PDF Transformer 3 uses the company’s OCR technology to produce searchable and editable PDF files from scanned pages and claims to reproduce page layouts accurately, including elements like tables, charts and titles.
The other big advantage with PDF Transformer 3 comes when you need to archive documents that are in several languages. The program supports 184 different languages, including those using different alphabets such as Greek, Hebrew, Chinese and Japanese. Language detection is automatic so you should be able to batch process multi-lingual originals.
Slightly strangely, given that older scanned archives might well be saved as graphic files, PDF Transformer only converts from PDF formats. It’ll take a PDF composed of a page photo and convert it into a searchable document, but it won’t take a graphic file saved as JPG or TIF and turn it into a PDF. It can convert a PDF into a Word file, but to make PDFs from other sources, you’ll need to look to Nuance.
Nuance PDF Converter Pro 8 has many of the capabilities of Acrobat Pro XI, but for a small fraction of the price. The headline feature is direct editing of PDF documents. As with Acrobat, the editing tools may seem basic when compared to a fully-fledged word processor, but text editing and creation, find and replace, cut and paste and graphics replacement are all supported.
The FormTyper utility can add interactive fields to a paper form, though creation of a PDF form from scratch is basic. PDF files can also be exported to Word, Excel, PowerPoint, WordPerfect and other applications, and saved directly to Cloud services such as Dropbox, Evernote or Nuance’s own PaperPort Anywhere.
Nuance’s other well-known application is Dragon NaturallySpeaking and the company has incorporated Dragon Notes into PDF Converter Pro 8, so you can create mark-ups in peer review or while collaborating on PDF documents through speech, something no other PDF tool can do.
FormsCentral. This desktop app can be used to create electronic or printed forms, distribute them as HTML or PDF files, and collect and analyse the responses. Once the forms have been returned, FormsCentral can be used to view responses in tables and to produce summaries, including graphs, which can themselves be incorporated into presentations or other PDFs.
Existing paper forms can be converted into their electronic equivalents. Acrobat Pro XI intelligently searches through a scanned form, or one created in Word or Excel, and converts fields into their interactive counterparts. Although the conversion needs to be checked and some field names may need to be changed to ensure they’re unique, it’s much faster than recreating a form from scratch.
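The check for unique field names is easy to automate in principle. The following Python sketch shows one way duplicates could be renamed after conversion; it makes no claim about how Acrobat itself handles them, and the `make_unique` helper and the field names are assumptions for illustration.

```python
from typing import List

def make_unique(field_names: List[str]) -> List[str]:
    """Keep the first use of each name; rename later copies with _2, _3, ..."""
    used = set()
    result = []
    for name in field_names:
        candidate = name
        suffix = 2
        # Keep trying suffixes until the name is not already taken.
        while candidate in used:
            candidate = f"{name}_{suffix}"
            suffix += 1
        used.add(candidate)
        result.append(candidate)
    return result

# Field names as they might come out of a converted form (invented example).
detected = ["Name", "Date", "Signature", "Date", "Date"]
unique = make_unique(detected)
```

The three "Date" fields become "Date", "Date_2" and "Date_3". More meaningful names, such as "Date signed", would still need a human eye.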
PDF files can now be converted and exported to Microsoft Word, Excel and PowerPoint (something that some other PDF tools, such as Nuance PDF Converter, have been able to do for a while). You can export complete PDF files or select items, such as tables, graphics or paragraphs of text, for conversion. Most formatting is retained as you move from PDF to other formats, and the facilities make it a lot quicker to repurpose PDF material.
Electronic signatures can be typed or added as images or electronic certificates and are then copied to Adobe’s EchoSign service, where they are archived in the Cloud, in case of loss at the local level.
All versions of Adobe Creative Suite 6 except Production Premium include Acrobat Pro X. Most of the other applications within the suites can also produce PDF files directly, and there are more original PDFs created from InDesign and Photoshop than from Acrobat – although this doesn’t hold true if you include archival materials, of course.
CS6 is the foremost integrated graphics suite on the market, with special abilities to transfer file formats between its component applications, such as Photoshop, Illustrator and InDesign, and to call on the specialist apps to edit photos, drawings and other elements in place on desktop published pages.
Certain improvements in CS6 components have particular relevance to the use of PDF files. For example, InDesign can now create and edit interactive form elements which translate directly to PDF forms when saved.
Another popular business PDF creation tool is Nitro Pro 8, which is claimed to be used by 50 percent of Fortune 100 companies. It performs the same kind of tasks as Adobe Acrobat and can convert to PDF from over 300 different file formats.
The program can combine files of different formats into a single PDF and includes a virtual print driver to enable PDFs to be produced from any application that can print. The program links into Word, Excel and PowerPoint, so PDF files can be created with a single click from their ribbons. It supports PDF/A archival and enables direct scanning to PDF from a variety of devices.
Nitro Pro 8 supports direct editing of text with automatic reflow and paragraph detection and image editing, from resolution changes to adjustment of contrast, colour and brightness. As well as being able to export to Word and Excel, it can extract text to plain text files and export images from PDF pages in the most popular graphics formats.
Electronic form creation is helped by the ability to add live fields to existing static PDF forms, and to add form logic to produce automatically calculated field values.
activePDF provides modular, company-wide PDF support through products such as DocConverter, Meridian, Portal, Server, Toolkit and WebGrabber. DocConverter converts up to 300 different formats to PDF, including 40 image types, and can be used to create ‘watch folders’ where files are simply dragged and dropped for conversion.
Meridian provides network PDF printing by setting up virtual printers to which jobs can be sent. Both the virtual printers and settings for designated physical output devices can be set centrally so that all PDF output adheres to house standards. The other modules handle PDF form filling, PDF generation, PDF tools such as merging and encryption and direct conversion from HTML pages.
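The ‘watch folder’ idea mentioned above is simple to picture: a program checks a folder at intervals and passes anything new to a converter. The Python sketch below shows a single pass of such a check. It is not activePDF’s implementation; `scan_once` is a hypothetical helper, and the `convert` argument is a placeholder for whatever would really produce the PDF.

```python
import os
import tempfile
from typing import Callable, List, Set

def scan_once(folder: str, seen: Set[str], convert: Callable[[str], None]) -> List[str]:
    """Hand every not-yet-seen file in `folder` to `convert`; return their names."""
    new_files = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        if name in seen or not os.path.isfile(path):
            continue
        convert(path)  # placeholder for the real conversion step
        seen.add(name)
        new_files.append(name)
    return new_files

# Demonstration in a temporary folder; here "conversion" just records the path.
converted: List[str] = []
with tempfile.TemporaryDirectory() as folder:
    seen: Set[str] = set()
    for name in ("report.doc", "chart.png"):
        open(os.path.join(folder, name), "w").close()
    first = scan_once(folder, seen, converted.append)
    open(os.path.join(folder, "notes.txt"), "w").close()
    second = scan_once(folder, seen, converted.append)
```

The first pass picks up the two files already in the folder and the second picks up only the newcomer. A real watch folder would repeat the scan on a timer and would also need to cope with files that are still being copied in.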
Corel PDF Fusion is a versatile document viewer which can also generate PDF files. It supports up to 100 different file types, where pages can be pulled in, viewed, rearranged, edited and annotated, before being saved out as a PDF. It also supports the output of XPS and Microsoft Word formats and has the added advantage of a comparatively low price for basic PDF conversion.
Global Graphics gDoc Fusion is a PDF portfolio tool, providing quick and easy compilation of disparate files into a single PDF. It supports up to 200 different file types including raster and vector graphics, simple text, word-processed text, spreadsheets and presentations.
The program lets you arrange the imported files by dragging and dropping thumbnails, and view them quickly page by page, before saving as PDF or Word format documents. The imported files can be of dissimilar formats, so a PDF can be compiled from photographs, charts, tables and words all produced by different applications and saved in their native formats.
Foxit IFilter is an indexing tool for PDF files. It can batch and index large numbers of PDF documents which can then be searched from the desktop, from a server or through the Web. IFilter indexes content, titles, subjects, authors, keywords, annotations, bookmarks and attachments, giving more precise search results than Adobe’s own search tools. It can work with PDF documents from a variety of different sources, including email attachments and database records.
pdfFactory Professional is for PDF-based archiving and includes direct support of PDF/A. When creating an archive it gives you the option to add page numbering, headers, footers and watermarks, and to automatically generate a table of contents. The program supports annotations, fill-in forms and signatures, and enables quick emailing of resulting PDFs.