Happy Birthday: 25 years of PDF

The most important data format in today's working world, whether for online applications, email archiving, or long-term archiving, is without question the Portable Document Format (PDF).

A barcode is a representation of data that can be read electronically or optoelectronically. Classic linear barcodes encode data in bars and spaces, while 2D barcodes use geometric patterns to store more information in less space. A QR code, for example, uses a square matrix pattern.

This example explains how to use the OCR webservice of webPDF, which is based on Tesseract. German, English, French, Spanish, and Italian are supported by default; additional languages can be installed in the Tesseract folder (see the webPDF manual for details).
Languages that use a multibyte character set, such as Arabic and several Far Eastern languages, are currently not supported. OCR is mainly useful for documents that show text only visually, for example as a scanned image, without embedded searchable text. To extract text that is already embedded in a PDF document, webPDF provides an option in the Toolbox webservice.
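
The language handling described above can be illustrated with a small Python sketch. It does not call webPDF and does not show the webPDF API; it only assumes the standard Tesseract language codes (deu, eng, fra, spa, ita) and Tesseract's convention of joining several languages with "+". The function names and the needs_ocr heuristic are hypothetical and exist only for illustration.

```python
# Tesseract language codes for the five languages listed as defaults.
# The mapping itself is an illustration, not part of the webPDF API.
DEFAULT_LANGUAGES = {
    "German": "deu",
    "English": "eng",
    "French": "fra",
    "Spanish": "spa",
    "Italian": "ita",
}


def tesseract_language_argument(languages, installed=DEFAULT_LANGUAGES):
    """Build a Tesseract-style language string such as 'deu+eng'.

    Raises ValueError if a requested language has no installed data,
    for example a language that is not among the defaults.
    """
    if not languages:
        raise ValueError("at least one language is required")
    unknown = [name for name in languages if name not in installed]
    if unknown:
        raise ValueError("language data not installed: " + ", ".join(unknown))
    return "+".join(installed[name] for name in languages)


def needs_ocr(embedded_text):
    """Hypothetical heuristic: OCR is only worthwhile when the document
    carries no embedded searchable text."""
    return not embedded_text.strip()


if __name__ == "__main__":
    print(tesseract_language_argument(["German", "English"]))
    print(needs_ocr(""))
```

A caller would first check whether a document already contains embedded text, and request OCR with an explicit language list only if it does not.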