Transferring PDF/A documents into an archiving system

In this project, a customer needed to automatically transfer PDF/A documents created with webPDF into an archiving system and to add a set of metadata to the documents.

The source files were available in many different formats; they had to be converted to PDF/A and enriched with metadata.

Aim and content of the project

The customer’s requirement, and therefore the aim of the project, was to develop a graphical application which, after a file folder or IMAP folder has been selected, converts the documents it contains to PDF/A (PDF/A-3b). The subfolders of the selected base folder were to be included in the conversion, and the folder structure was to be retained in the output. The created PDF/A documents were also to be supplemented with metadata so that a downstream archiving system can use it to archive the documents in a targeted manner.

For this purpose, a Java application (OpenJDK-based) with a graphical user interface (GUI) was developed. The programme had to run as a stand-alone application without installation under Windows (version 10 or higher) or Linux and to ship with all necessary resources (e.g. Java). A prerequisite for the project was an installed webPDF so that the web services required for the conversion are available.

Conversion details

For the conversion, the user can choose either a local file folder (“File-To-PDF”) or an IMAP folder (“IMAP-To-PDF”) as the base folder. Access to the IMAP mailbox is configured via the administration. During conversion, the folder structure of the base folder (also for IMAP) is retained, as are the names of the files/documents it contains. Duplicate file names are numbered consecutively.
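The path handling described above can be sketched roughly as follows. This is a minimal illustration, not the project's actual code: the class name OutputPathMapper and the "_1", "_2" suffix scheme are assumptions, since the numbering format used in the real application is not documented here.

```java
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

/** Hypothetical helper: maps source files to PDF output paths below a target folder. */
class OutputPathMapper {
    private final Path sourceBase;
    private final Path targetBase;
    // Output paths handed out so far, used to detect duplicate names.
    private final Set<Path> used = new HashSet<>();

    OutputPathMapper(Path sourceBase, Path targetBase) {
        this.sourceBase = sourceBase;
        this.targetBase = targetBase;
    }

    /** Returns the output path for a source file, keeping its subfolder and base name. */
    Path map(Path sourceFile) {
        Path relative = sourceBase.relativize(sourceFile);
        String fileName = relative.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        String stem = dot > 0 ? fileName.substring(0, dot) : fileName;

        // getParent() is null for files that sit directly in the base folder.
        Path parent = relative.getParent();
        Path targetDir = parent == null ? targetBase : targetBase.resolve(parent);

        Path candidate = targetDir.resolve(stem + ".pdf");
        int counter = 1;
        // Assumed numbering scheme: report.pdf, report_1.pdf, report_2.pdf, ...
        while (!used.add(candidate)) {
            candidate = targetDir.resolve(stem + "_" + counter + ".pdf");
            counter++;
        }
        return candidate;
    }
}
```

With this sketch, "report.docx" and "report.xlsx" in the same subfolder would end up as "report.pdf" and "report_1.pdf" in the matching output subfolder. The same idea applies to IMAP folders if each mailbox folder is treated as a path segment.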

The settings are saved in a configuration file for the programme and are loaded and applied when the application is restarted.
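One simple way to persist such settings in Java is a properties file. The sketch below only illustrates the save-and-reload cycle; the class name ConverterSettings is an assumption, and the actual file format used by the application is not specified in this article.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

/** Hypothetical settings store based on java.util.Properties. */
class ConverterSettings {

    /** Writes all settings to the given configuration file. */
    static void save(Properties settings, Path file) throws IOException {
        try (OutputStream out = Files.newOutputStream(file)) {
            settings.store(out, "Converter settings");
        }
    }

    /** Loads settings from the file; returns empty settings if the file does not exist yet. */
    static Properties load(Path file) throws IOException {
        Properties settings = new Properties();
        if (Files.exists(file)) {
            try (InputStream in = Files.newInputStream(file)) {
                settings.load(in);
            }
        }
        return settings;
    }
}
```

On start-up the application would call load and fall back to defaults for missing keys, for example settings.getProperty("mode", "File-To-PDF"), where the key name "mode" is again only illustrative.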

All files in the selected IMAP/file folder and its subfolders are converted to PDF/A-3b. Certain file formats can be excluded from the conversion via the configuration. For files that are excluded or cannot be converted, a PDF containing a reference to the original file is created instead.

Furthermore, it was defined how to handle file attachments from e-mails, password-protected files and image formats. For image formats, for example, OCR (optical character recognition) can be enabled per format as an option in the dialogue.
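The configurable exclusion could, for instance, be based on file extensions. The following sketch assumes exactly that; the class name ConversionFilter and the extension-based matching are illustrative assumptions and not taken from the project.

```java
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

/** Hypothetical filter: decides by file extension whether a file is excluded from conversion. */
class ConversionFilter {
    private final Set<String> excluded = new HashSet<>();

    ConversionFilter(Set<String> excludedExtensions) {
        // Normalise to lower case so that "ZIP" and "zip" are treated alike.
        for (String ext : excludedExtensions) {
            excluded.add(ext.toLowerCase(Locale.ROOT));
        }
    }

    /** Returns true if the file name ends with one of the excluded extensions. */
    boolean isExcluded(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return false; // no extension, so nothing to match against
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return excluded.contains(ext);
    }
}
```

A file for which isExcluded returns true would then skip the conversion and receive the reference PDF instead.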

Individual functions

In addition, details regarding administration, logging and metadata were agreed upon, so that it was clearly defined which settings were reserved for administration. For logging, all (error) messages of the programme are displayed in the GUI and additionally written, with extended information, to a log file of the application, so that they can be used for support purposes. During conversion, additional metadata is written to the XMP block of the PDF/A document. Part of this data is entered via the GUI, and part is determined from the document itself.
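The two logging targets described here can be illustrated with java.util.logging. This is only a sketch under assumptions: the class name ConverterLogging, the logger name and the choice of log levels are hypothetical, and the GUI is represented by a plain Consumer<String> that a real application would connect to its message panel.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.function.Consumer;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.Level;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

/** Hypothetical logging setup: detailed log file plus a shorter message stream for the GUI. */
class ConverterLogging {

    static Logger create(Path logFile, Consumer<String> guiSink) throws IOException {
        Logger logger = Logger.getLogger("converter.sketch");
        logger.setUseParentHandlers(false); // do not echo to the console
        logger.setLevel(Level.ALL);

        // The log file receives everything, including fine-grained detail for support.
        FileHandler fileHandler = new FileHandler(logFile.toString(), true);
        fileHandler.setFormatter(new SimpleFormatter());
        fileHandler.setLevel(Level.ALL);
        logger.addHandler(fileHandler);

        // The GUI only receives messages of level INFO and above.
        Handler guiHandler = new Handler() {
            @Override
            public void publish(LogRecord record) {
                if (isLoggable(record)) {
                    guiSink.accept(record.getLevel().getName() + ": " + record.getMessage());
                }
            }

            @Override
            public void flush() {
            }

            @Override
            public void close() {
            }
        };
        guiHandler.setLevel(Level.INFO);
        logger.addHandler(guiHandler);
        return logger;
    }
}
```

In a Swing GUI the sink would typically hand the text over to the event dispatch thread, for example via SwingUtilities.invokeLater, before appending it to a text component.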

If you have any further questions about archiving projects, please do not hesitate to contact us. We are also able to deal with individual cases and will be happy to work out solutions for them: https://www.webpdf.de/en/support.
