Standardised long-term archiving with PDF/A


Why an ISO standard from 2005 is still future-proof

Analogue archiving media such as microfilm and paper have largely been superseded. Even TIFF, an early digital image format long used for scanned documents, has been giving way since the 1990s to a more efficient format: the Portable Document Format (PDF) published by Adobe Systems. The archive format standard PDF/A was developed on this basis and still plays a decisive role today in legally required long-term archiving.

What became standard in 2005

In 2002, a working group for digital long-term archiving began its work at the International Organization for Standardization (ISO). Three years later, ISO published the PDF/A format as "ISO 19005-1:2005", the first worldwide file format standard for electronic long-term archiving.

Why long-term archiving at all?

Business documents, such as invoices or contracts, must be archived for up to ten years under German law. The most important retention periods for companies are set out in the German Fiscal Code (AO), the German Commercial Code (HGB) and the German Value Added Tax Act (UStG). In addition, there are sector-specific retention obligations for documents in public administration, hospitals, the construction industry and other fields.

For digital long-term archiving, the following criteria apply to file formats:

  • openly standardized (or at least openly specified); no proprietary format
  • widespread
  • low complexity
  • without access protection mechanisms such as copy protection or encryption
  • self-documenting
  • robust
  • no dependencies on other file formats
  • license-free
  • validatable

Companies must therefore consider how they will store business documents in the long term. In addition, high standards apply to data archiving because content must always be presented consistently and meet archiving criteria.

What is behind PDF/A?

Almost everyone has opened or worked with a PDF document at some point. The format is secure, easy to use, and storage-efficient. PDFs look the same everywhere, regardless of device or operating system. Another advantage is that different file elements can be embedded, such as fonts, graphics, 3D objects, and audio/video.

However, standard PDF files are not sufficient for long-term archiving, because they may depend on things that cannot be guaranteed to work in the future, such as fonts that are not embedded, passwords or active content. PDF/A, on the other hand, enforces strict archival requirements. For example, archived documents must not be password-encrypted, so that content remains accessible at all times. In addition, video or audio files are not permitted, because content requiring external playback software must be excluded. JavaScript is also not allowed in archival documents.
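The restrictions above can be illustrated with a rough pre-check. The following is a minimal sketch, not a PDF/A validator: it only scans the raw bytes of a file for a few dictionary keys that point to encryption, JavaScript, or sound and movie content. The function name and the marker table are assumptions made for this example, and markers hidden inside compressed object streams would not be found this way.

```python
import re

# Heuristic markers (assumption for this sketch): PDF dictionary keys that
# usually indicate features not permitted in archival documents.
FORBIDDEN_MARKERS = {
    "encryption": re.compile(rb"/Encrypt\b"),
    "javascript": re.compile(rb"/(?:JavaScript|JS)\b"),
    "audio/video": re.compile(rb"/(?:Sound|Movie)\b"),
}


def find_forbidden_features(pdf_bytes: bytes) -> list:
    """Return the names of all markers found in the raw PDF bytes.

    This is a plain byte scan: it does not parse the PDF structure and
    cannot see inside compressed streams, so an empty result does not
    prove that a file is PDF/A compliant.
    """
    return [
        name
        for name, pattern in FORBIDDEN_MARKERS.items()
        if pattern.search(pdf_bytes)
    ]


if __name__ == "__main__":
    sample = (
        b"%PDF-1.4\n"
        b"1 0 obj\n<< /Type /Catalog /OpenAction << /S /JavaScript "
        b"/JS (app.alert(1)) >> >>\nendobj\n"
        b"trailer\n<< /Encrypt 5 0 R >>\n"
    )
    print(find_forbidden_features(sample))
```

A check like this is only useful for sorting out obvious problem files before conversion; whether a file actually conforms can only be confirmed by a full PDF/A validator.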

After PDF/A became the global archiving standard in 2005, further parts of the standard were developed.

PDF/A-1, with conformance levels a and b, is based on PDF version 1.4. In 2011 it was supplemented by the ISO 19005-2 standard (PDF/A-2), which offers extended possibilities for archiving PDF documents. PDF/A-2 has three conformance levels: PDF/A-2a, PDF/A-2b and PDF/A-2u. PDF/A-3 has been available since 2012.

PDF/A-1 (2005):

  • Based on PDF 1.4
  • ISO 19005-1
  • Exact visual reproducibility + accessibility

PDF/A-2 (2011):

  • Based on PDF 1.7 (ISO 32000-1)
  • ISO 19005-2
  • Exact visual reproducibility + accessibility + extensions (e.g., JPEG 2000 support and very large page formats)

PDF/A-3 (2012):

  • Based on PDF 1.7 (ISO 32000-1)
  • ISO 19005-3
  • Exact visual reproducibility + accessibility + extensions (e.g., embedding source files such as XML, including support for e-invoicing use cases)
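For tooling that has to handle these variants, the overview can be kept as a small lookup table. The sketch below is illustrative only: the class name, field names and the helper function are assumptions made for this example and are not part of any standard or product API. It lists levels a and b for PDF/A-1 and levels a, b and u for PDF/A-2 and PDF/A-3.

```python
import re
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class PdfAPart:
    """One part of the PDF/A standard family (illustrative structure)."""
    part: int
    iso_standard: str
    year: int
    base_pdf_version: str
    conformance_levels: Tuple[str, ...]


# Values taken from the overview; the layout itself is an assumption.
PDFA_PARTS = {
    1: PdfAPart(1, "ISO 19005-1", 2005, "1.4", ("a", "b")),
    2: PdfAPart(2, "ISO 19005-2", 2011, "1.7", ("a", "b", "u")),
    3: PdfAPart(3, "ISO 19005-3", 2012, "1.7", ("a", "b", "u")),
}


def is_known_flavour(flavour: str) -> bool:
    """Check whether a string such as 'PDF/A-2u' names a part/level pair
    that is listed in PDFA_PARTS."""
    match = re.fullmatch(r"PDF/A-(\d)([abu])", flavour)
    if match is None:
        return False
    part = PDFA_PARTS.get(int(match.group(1)))
    return part is not None and match.group(2) in part.conformance_levels


if __name__ == "__main__":
    for name in ("PDF/A-1b", "PDF/A-1u", "PDF/A-3a"):
        print(name, is_known_flavour(name))
```

Keeping the permitted levels next to each part makes it easy to reject combinations such as PDF/A-1u, which does not exist because level u was only introduced with PDF/A-2.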

Why different conformance levels?

The different conformance levels a, b and u reflect the quality of the archived documents and are based on the input material and the intended use.

For PDF/A-1, a distinction is made between PDF/A-1a (level a) and PDF/A-1b (level b). The latter stands for unambiguous visual long-term reproducibility. With PDF/A-1b, all fonts and images used must be embedded in the document so that it is completely self-contained. PDF/A-1a (level a) offers more. In addition to unambiguous visual reproducibility, it requires that the text can be mapped to Unicode, so that it remains searchable and extractable, and that the content of the document is structured (tagged) for accessibility.

While PDF/A-1b meets minimum ISO requirements, level a goes further and should be used, for example, when document structure and accessibility requirements must be fully met.
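A PDF/A file states its part and conformance level in its XMP metadata, using the properties pdfaid:part and pdfaid:conformance. The following minimal sketch reads that declaration with a simple byte search. The function name is an assumption made for this example, the search assumes the metadata is stored uncompressed, and the result is only what the file claims about itself, not a verified conformance level.

```python
import re
from typing import Optional

# XMP may store the values as XML elements (<pdfaid:part>1</pdfaid:part>)
# or as attributes (pdfaid:part="1"); both forms are matched here.
_PART = re.compile(rb"pdfaid:part(?:=[\"']|>)\s*(\d)")
_CONFORMANCE = re.compile(rb"pdfaid:conformance(?:=[\"']|>)\s*([ABUabu])")


def declared_pdfa_flavour(pdf_bytes: bytes) -> Optional[str]:
    """Return the declared flavour, e.g. 'PDF/A-1b', or None if the file
    carries no readable PDF/A identification.

    This reads the claim only; it does not validate the document.
    """
    part = _PART.search(pdf_bytes)
    conformance = _CONFORMANCE.search(pdf_bytes)
    if part is None or conformance is None:
        return None
    level = conformance.group(1).decode("ascii").lower()
    return "PDF/A-" + part.group(1).decode("ascii") + level


if __name__ == "__main__":
    xmp = (
        b'<rdf:Description xmlns:pdfaid="http://www.aiim.org/pdfa/ns/id/">'
        b"<pdfaid:part>1</pdfaid:part>"
        b"<pdfaid:conformance>B</pdfaid:conformance>"
        b"</rdf:Description>"
    )
    print(declared_pdfa_flavour(xmp))
```

Reading the declaration is a quick way to sort incoming files by level a, b or u, but a file that merely claims PDF/A-1a may still fail validation.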

Archiving with PDF/A

The PDF/A-1 format is well suited to long-term archiving. Both PDF/A-1a and PDF/A-1b meet the requirements for reliable exchange and archiving of documents: archived documents remain readable, and their appearance is preserved, regardless of the software used. This also supports legally compliant storage for the prescribed retention periods.

In addition to the original PDF/A-1 format, there are now the newer versions PDF/A-2 and PDF/A-3, which contain more features. Compared to PDF/A-1, PDF/A-2 additionally supports JPEG 2000 image compression and very large page formats, which is especially interesting for libraries or large archives. PDF/A-2 also allows several PDF/A files to be combined in a container PDF.

Where PDF/A is used for long-term archiving

The paperless office is far from being an everyday reality in all companies. For purely digital archiving, paper files and documents must be scanned in order to be digitised.

Incoming e-mail, including document attachments, and incoming letters must be stored for up to ten years, just like other office documents (text documents, spreadsheets, presentations, etc.). Brochures or magazines that originate from layout programmes or editorial systems must also be converted to PDF/A for storage.

Since PDF/A-3, image files and complex CAD drawings can be embedded in a PDF/A file in their original format. Hybrid archiving of PDF document plus original file is therefore no longer necessary.

Conversion to PDF/A format

PDF/A has established itself as a widely used standard for long-term archiving. Many companies are faced with the challenge of making the path of their documents to PDF/A format as efficient as possible. After all, different source formats should be converted directly into PDF/A for archiving without much effort.

The most important advantages of webPDF for PDF/A conversion at a glance:

  • webPDF converts documents from over 100 file formats directly into PDF/A and applies required corrections and additions.
  • All conformance levels (a for accessible, b for basic, and u for Unicode) are supported.
  • On request, the PDF engine checks compliance with PDF/A-1 (ISO 19005-1:2005), PDF/A-2 (ISO 19005-2:2011), and PDF/A-3 (ISO 19005-3:2012).
  • webPDF can output detailed reports in XML format.

Learn more about webPDF

This is why PDF/A will remain up-to-date for a long time to come

Companies benefit from the ISO standard because it helps to keep digital documents compliant with legal requirements. PDF/A is the preferred long-term archiving format, and for good reasons. Experts spent a whole decade developing PDF/A and its conformance levels. Experience shows that an established standard does not disappear from the market quickly. Experts agree that PDF/A will remain a future-proof format that companies and public authorities can rely on. The fact that Microsoft allows PDF/A documents to be created directly from its Office suite is a further indication that PDF/A is more than just a flash in the pan.