Transferring PDF/A documents into an archiving system

In this specific case, a customer required PDF/A documents created with webPDF to be transferred automatically into an archiving system and enriched with metadata.
The source files were available in many different formats and first had to be converted to PDF/A before the metadata was added.
Aim and content of the project
The customer's requirement, and therefore the project's goal, was to develop a graphical application that converts documents from a selected local file folder or IMAP folder to PDF/A (PDF/A-3b). Subfolders of the selected base folder should be included in the conversion, and the folder structure should be preserved in the output. The generated PDF/A documents should also be enriched with metadata so that a downstream archiving system can archive them accurately.
For this purpose, an OpenJDK-based Java application with a graphical user interface (GUI) was developed. The program should run as a stand-alone application without installation on Windows (version 10 or higher) or Linux and include all required resources (for example, Java). A prerequisite for the project was the installation of webPDF so that the required web services for conversion were available.
Conversion details
For the conversion, the user chooses either a local file folder ("File-To-PDF") or an IMAP folder ("IMAP-To-PDF") as the base folder. Access to the IMAP mailbox is configured in the administration. The folder structure of the base folder (including IMAP folders) is preserved during conversion, and file and document names are retained. Duplicate file names are numbered consecutively.
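The path handling described above can be sketched as follows. This is only an illustration, not the project's actual code: the class name OutputPathResolver, the "_1", "_2" numbering scheme, and the assumption that duplicates arise when two sources (for example report.docx and report.xlsx) map to the same PDF name are all hypothetical.

```java
import java.nio.file.Path;
import java.util.HashSet;
import java.util.Set;

// Hypothetical helper: maps a source file below the base folder to a target
// PDF path below the output folder, keeping the relative folder structure.
final class OutputPathResolver {
    private final Path baseDir;
    private final Path outputDir;
    // Target paths already handed out in this run (used to detect duplicates).
    private final Set<Path> assigned = new HashSet<>();

    OutputPathResolver(Path baseDir, Path outputDir) {
        this.baseDir = baseDir;
        this.outputDir = outputDir;
    }

    Path resolve(Path source) {
        // Path of the source relative to the base folder, e.g. sub/report.docx
        Path relative = baseDir.relativize(source);
        String name = relative.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String stem = dot > 0 ? name.substring(0, dot) : name;

        // Recreate the same subfolder below the output folder.
        Path parent = relative.getParent();
        Path targetDir = (parent == null) ? outputDir : outputDir.resolve(parent);

        // Keep the original name; number consecutively on a collision
        // (the "_n" suffix is an assumed naming scheme).
        Path candidate = targetDir.resolve(stem + ".pdf");
        for (int n = 1; !assigned.add(candidate); n++) {
            candidate = targetDir.resolve(stem + "_" + n + ".pdf");
        }
        return candidate;
    }
}
```

A caller would create one resolver per conversion run and ask it for the target path of every source file before handing the file to the conversion service. Creating the target directories and writing the PDF are left out of the sketch.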
Configured settings are saved in a configuration file and loaded again when the application is restarted.
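One common way to persist such settings in Java is java.util.Properties. The sketch below assumes a simple key/value file; the class name ConverterSettings and any keys used with it are hypothetical, and the real application may use a different file format.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Properties;

// Hypothetical settings holder backed by a .properties file.
final class ConverterSettings {
    private final Properties props = new Properties();

    String get(String key, String fallback) {
        return props.getProperty(key, fallback);
    }

    void set(String key, String value) {
        props.setProperty(key, value);
    }

    // Writes all settings to the given file, replacing previous content.
    void save(Path file) {
        try (Writer out = Files.newBufferedWriter(file, StandardCharsets.UTF_8)) {
            props.store(out, "converter settings");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Loads settings from the file; a missing file yields empty settings,
    // so the first start of the application falls back to defaults.
    static ConverterSettings load(Path file) {
        ConverterSettings settings = new ConverterSettings();
        if (Files.exists(file)) {
            try (Reader in = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
                settings.props.load(in);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }
        return settings;
    }
}
```

In this sketch the application would call load once at startup and save whenever the user confirms changed settings.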
All files in the selected file or IMAP folder and its subfolders are converted to PDF/A-3b. Certain file formats can be excluded from the conversion by configuration. If a file should not or cannot be converted, a PDF with a reference to the original file is created instead.
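The exclusion rule could be decided by file extension, as in the following sketch. Matching by extension is an assumption (the article only says that formats can be excluded by configuration), and the class name ConversionFilter is hypothetical.

```java
import java.util.Collection;
import java.util.HashSet;
import java.util.Locale;
import java.util.Set;

// Hypothetical filter: decides whether a file is sent to conversion or
// replaced by a placeholder PDF that refers to the original file.
final class ConversionFilter {
    private final Set<String> excluded = new HashSet<>();

    // Accepts entries such as "exe" or ".ZIP"; case and a leading dot are ignored.
    ConversionFilter(Collection<String> excludedExtensions) {
        for (String ext : excludedExtensions) {
            String normalized = ext.trim().toLowerCase(Locale.ROOT);
            if (normalized.startsWith(".")) {
                normalized = normalized.substring(1);
            }
            if (!normalized.isEmpty()) {
                excluded.add(normalized);
            }
        }
    }

    boolean shouldConvert(String fileName) {
        int dot = fileName.lastIndexOf('.');
        if (dot < 0 || dot == fileName.length() - 1) {
            return true; // no extension: nothing to match against
        }
        String ext = fileName.substring(dot + 1).toLowerCase(Locale.ROOT);
        return !excluded.contains(ext);
    }
}
```

When shouldConvert returns false, or when the conversion itself fails, the application would generate the reference PDF instead of a converted document. That step is not shown here.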
It was also defined how to handle email attachments, password-protected files, and image formats. For images, optional OCR text recognition can be enabled in the dialog.
Individual functions
In addition, details regarding administration, logging, and metadata were defined so that it was clear which settings are intended for administration. For logging, all messages (including errors) should be displayed in the GUI and additionally written to the application's log file with extended information. This ensures that logs can be used for support purposes.
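The two logging targets can be sketched with java.util.logging: a custom handler forwards a short line to the GUI, while a FileHandler writes the more detailed record. Whether the project used java.util.logging or another framework is not stated, so the class names GuiLogHandler and ConverterLogging, the logger name, and the message layout are assumptions.

```java
import java.io.IOException;
import java.util.function.Consumer;
import java.util.logging.FileHandler;
import java.util.logging.Handler;
import java.util.logging.LogRecord;
import java.util.logging.Logger;
import java.util.logging.SimpleFormatter;

// Hypothetical handler that passes a short, readable line to the GUI.
final class GuiLogHandler extends Handler {
    private final Consumer<String> sink;
    private final SimpleFormatter formatter = new SimpleFormatter();

    GuiLogHandler(Consumer<String> sink) {
        this.sink = sink;
    }

    @Override
    public void publish(LogRecord record) {
        if (!isLoggable(record)) {
            return;
        }
        // formatMessage only resolves message parameters; no timestamp or stack trace.
        sink.accept(record.getLevel().getName() + ": " + formatter.formatMessage(record));
    }

    @Override
    public void flush() {
        // nothing buffered
    }

    @Override
    public void close() {
        // no resources to release
    }
}

final class ConverterLogging {
    // Builds a logger that writes to both the GUI sink and a log file.
    static Logger create(Consumer<String> guiSink, String logFilePattern) throws IOException {
        Logger logger = Logger.getLogger("converter"); // assumed logger name
        logger.setUseParentHandlers(false);
        logger.addHandler(new GuiLogHandler(guiSink));

        // SimpleFormatter adds timestamp, source, and any stack trace to the file output.
        FileHandler file = new FileHandler(logFilePattern, true);
        file.setFormatter(new SimpleFormatter());
        logger.addHandler(file);
        return logger;
    }
}
```

In a real GUI, the sink would typically hand the text over to the UI thread before appending it to a log view. That detail depends on the toolkit, which the article does not name.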
During conversion, additional metadata should be written to the XMP block of the PDF/A document. Some of this data is entered via the GUI, while other values are extracted from the document itself.
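To illustrate what such an XMP block can look like, the sketch below builds a minimal packet with the PDF/A identification entries for PDF/A-3b and two Dublin Core properties. It is not the project's metadata schema: the class name XmpPacketSketch and the choice of title and creator are assumptions. Note that custom properties outside the predefined XMP schemas generally also need an extension schema description to remain PDF/A-conformant, which is omitted here.

```java
// Hypothetical builder for a minimal XMP packet (PDF/A-3b identification
// plus Dublin Core title and creator). Embedding the packet into the PDF
// is left to the conversion service or a PDF library.
final class XmpPacketSketch {

    // Escapes the characters that are not allowed in XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String build(String title, String creator) {
        return "<?xpacket begin=\"\uFEFF\" id=\"W5M0MpCehiHzreSzNTczkc9d\"?>\n"
            + "<x:xmpmeta xmlns:x=\"adobe:ns:meta/\">\n"
            + " <rdf:RDF xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\">\n"
            + "  <rdf:Description rdf:about=\"\"\n"
            + "    xmlns:dc=\"http://purl.org/dc/elements/1.1/\"\n"
            + "    xmlns:pdfaid=\"http://www.aiim.org/pdfa/ns/id/\">\n"
            // PDF/A identification: part 3, conformance level B
            + "   <pdfaid:part>3</pdfaid:part>\n"
            + "   <pdfaid:conformance>B</pdfaid:conformance>\n"
            // dc:title is a language alternative, dc:creator an ordered list
            + "   <dc:title><rdf:Alt><rdf:li xml:lang=\"x-default\">"
            + escape(title) + "</rdf:li></rdf:Alt></dc:title>\n"
            + "   <dc:creator><rdf:Seq><rdf:li>"
            + escape(creator) + "</rdf:li></rdf:Seq></dc:creator>\n"
            + "  </rdf:Description>\n"
            + " </rdf:RDF>\n"
            + "</x:xmpmeta>\n"
            + "<?xpacket end=\"w\"?>";
    }
}
```

Values entered in the GUI and values extracted from the document would both end up as arguments to such a builder. The mapping of the archiving system's fields to XMP properties is project-specific and not covered by this sketch.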
If you have any further questions about archiving projects, feel free to contact us: https://www.webpdf.de/en/support.