PDF/A – The format of the future – Part 2: PDF/A-2

As discussed in the first part of this blog series, PDF/A – The format of the future – Part 1: PDF/A-1, the PDF/A format (A = Archive) is already being used with great success for archiving data.

Because the benefits of PDF/A are so compelling, even public authorities and the administrative sector, which have to archive documents for years if not decades, have come to accept and appreciate the format. It bears repeating: because it is an ISO standard, PDF/A is well suited as an archiving format for government agencies, archives, libraries, publishing houses and similar institutions. The format is also seeing wider use in connection with the e-case file (electronic records), because it is well suited to legally compliant archiving.

Using PDF/A for long-term archiving

Storing data over long periods only makes sense if the files can still be found and opened at any later date. For a PDF/A document to be reproduced without restriction in the future, everything needed to render it must be contained in the document itself, including text, fonts and graphics. Archived documents must remain readable and permanently retain their visual appearance so that statutory retention requirements can be met dependably.

Several further parts of the standard have been developed since PDF/A was published as the ISO standard for archiving (ISO 19005-1) in 2005. In 2011, ISO 19005-2 extended PDF/A-1 (with its levels PDF/A-1a and PDF/A-1b) by adding the PDF/A-2 format, which is divided into three conformance levels: PDF/A-2a, PDF/A-2b and PDF/A-2u.

What sets PDF/A-2 apart and what is it used for?

The Portable Document Format (PDF) is especially well suited for long-term archiving, particularly for businesses faced with organizing and managing large volumes of documents (including e-mails and their attachments). Just like PDF/A-1, the PDF/A-2 format offers exact visual reproducibility and, at conformance level a, satisfies the requirements for accessibility. In addition, it supports JPEG 2000 image compression and very large page formats. JPEG 2000 matters for scanned documents because it achieves better image quality. It also offers a lossless compression mode, which plays an especially important role for libraries and archives that scan and preserve historical documents (such as maps).

With PDF/A-2 (unlike PDF/A-1) you can also combine multiple files in one container PDF, provided the embedded files are themselves PDF/A-compliant. This feature can also be used in e-mail archiving when you want to file the attachments separately from the e-mail text. In addition, PDF/A-2 permits transparency effects, which matters when the original file is a PowerPoint presentation or a PDF with highlighted text. Other important features of PDF/A-2 are that layers are allowed, OpenType fonts can be embedded, and digital signatures are supported in line with the PAdES standard.

The PDF/A levels 2a, 2b and 2u are intended as an extension of PDF/A-1 (i.e. PDF/A-1a and PDF/A-1b). They allow long-term archiving of documents that contain transparent objects, layers, page scaling and OpenType fonts. PDF/A-2 stores these in a central file (which in turn allows more fonts). With PDF/A-2, the stored information is preserved faithfully and can be reproduced reliably, even after an extended period of time and when different tools and systems are used to access and render it.

The subformats PDF/A-2a, PDF/A-2b and PDF/A-2u differ as follows:

  • PDF/A-2a, where a stands for accessible, mostly concerns the structural and semantic properties of the documents to be preserved.
  • PDF/A-2b, where b stands for basic, emphasizes exact visual reproducibility and is especially well-suited for long-term archiving; an ideal format for preserving images and graphics.
  • PDF/A-2u, where u stands for Unicode, requires that all text in the document has Unicode equivalents, so that the text can be reliably searched and extracted later, regardless of the language or writing system used.
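
To make the levels above more concrete: a PDF/A file declares its part and conformance level in its XMP metadata, using the properties pdfaid:part and pdfaid:conformance. The following Python sketch reads that declaration from an XMP packet that has already been extracted from a PDF. The function name pdfa_flavour, the KNOWN_LEVELS table and the sample packet are illustrative assumptions, not part of any library, and the sketch only reads what the file claims about itself; it does not validate the file.

```python
import xml.etree.ElementTree as ET

# Namespace URIs used by RDF and by the PDF/A identification schema in XMP.
RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
PDFAID_NS = "http://www.aiim.org/pdfa/ns/id/"

# Only the levels named in this article (hypothetical helper table).
KNOWN_LEVELS = {"1": {"A", "B"}, "2": {"A", "B", "U"}}


def pdfa_flavour(xmp_packet: str):
    """Return e.g. 'PDF/A-2b', or None if no known level is declared."""
    root = ET.fromstring(xmp_packet)
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # The properties may be written as attributes or as child elements.
        part = desc.get(f"{{{PDFAID_NS}}}part") or desc.findtext(
            f"{{{PDFAID_NS}}}part"
        )
        level = desc.get(f"{{{PDFAID_NS}}}conformance") or desc.findtext(
            f"{{{PDFAID_NS}}}conformance"
        )
        if part and level:
            part, level = part.strip(), level.strip().upper()
            if level in KNOWN_LEVELS.get(part, set()):
                return f"PDF/A-{part}{level.lower()}"
    return None


# Made-up sample packet declaring PDF/A-2u.
SAMPLE_XMP = """<x:xmpmeta xmlns:x="adobe:ns:meta/">
  <rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#">
    <rdf:Description rdf:about=""
        xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">
      <pdfaid:part>2</pdfaid:part>
      <pdfaid:conformance>U</pdfaid:conformance>
    </rdf:Description>
  </rdf:RDF>
</x:xmpmeta>"""

print(pdfa_flavour(SAMPLE_XMP))  # prints PDF/A-2u
```

A declared level is only a claim made by the file. Whether the document actually meets the requirements of that level has to be checked with a dedicated PDF/A validator.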

Important: Both PDF/A-1 and PDF/A-2 remain valid even after the introduction of PDF/A-3. Level b (1b or 2b) is the minimum conformance level and already ensures that the visual appearance of text, graphics, images and tables is preserved; level a (1a or 2a) adds structural and semantic requirements on top of that. The PDF/A-2 format described here is used in particular when custom fonts and images are involved, as these are harder to archive in other formats. The next format in the series is PDF/A-3, which we will cover in more detail in Part 3 of our blog series.