In a previous post, we gave an overview of TIFF and PDF, the two leading output formats in document capture. We already know that they are the two most widely used file formats in document imaging projects, and we know their main characteristics. But when we scan a paper document, which format should we use?
There are a number of criteria to keep in mind:
- Conservation: PDF/A, thanks to the ISO 19005-1:2005 standard, is the best option when it comes to guaranteeing the long-term preservation of digitized files.
- Size: Normal PDFs take up less space than TIFF files. That changes with PDF/A, however: those files are larger because the fonts they use must be embedded in them. With image PDFs, the size depends on the compression applied to the image inside the PDF file. "Searchable" PDFs are typically about 10% bigger than the equivalent image. Conclusion: generally speaking, PDFs tend to be lighter, but we need to consider which type of PDF we will be working with before drawing conclusions.
- Searching within content: PDF comes out on top again, since the TIFF format was created to store images, not text. Microsoft has developed a searchable TIFF format, but it is not an industry standard. To search for text within a TIFF image, we need an OCR application, and the extracted text has to be stored separately (in a database or another file).
- Security: Unlike TIFF, PDF allows access to be restricted with passwords and other mechanisms.
- Multiplatform support: Both formats are fully supported on UNIX and Windows operating systems.
- Metadata: Both formats can store metadata. However, PDF's mechanism is more sophisticated, since it allows metadata to be embedded in the file in XML format (XMP).
- Rich text: The winner, once again, is PDF: it allows you to include links, annotations, marks, labels and other elements in the file’s content.
- Accessibility: Unlike TIFF files, PDF files can be used with assistive technologies for people with disabilities; for example, a screen reader can read a PDF that contains text, whereas with a TIFF that isn't possible.
- Presentation and viewing quality: Both formats display documents well, but TIFF files and image PDFs are limited by the resolution of the scanned image. In this respect, a normal PDF is the best option. Many viewers exist for both formats, although the choice is wider for PDF. As for online viewing, neither format is natively supported by web browsers, although most browsers already include the Adobe Reader plug-in to fill that gap. PDF files can also be optimized for the web (linearized, or "Fast Web View", PDFs), so the first page can be displayed before the whole file has downloaded.
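To make the TIFF search workflow concrete, here is a minimal Python sketch of the "stored separately" step, using only the standard `sqlite3` module. It assumes the OCR application has already run: the `ocr_results` dictionary, the file names, and the helper functions `build_index` and `search` are hypothetical, chosen for illustration only.

```python
import sqlite3

# Hypothetical OCR output: TIFF file name -> text extracted by an OCR engine.
# In a real project this would be produced by the OCR application.
ocr_results = {
    "invoice_0001.tif": "Invoice 0001 ACME Corp total due 120.00",
    "contract_0002.tif": "Service contract between ACME Corp and Example Ltd",
    "letter_0003.tif": "Dear customer, thank you for your letter",
}


def build_index(results):
    """Store the OCR text of each TIFF file in an in-memory SQLite table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE ocr_text (filename TEXT PRIMARY KEY, content TEXT)")
    conn.executemany(
        "INSERT INTO ocr_text (filename, content) VALUES (?, ?)",
        results.items(),
    )
    conn.commit()
    return conn


def search(conn, term):
    """Return the TIFF files whose OCR text contains `term`.

    SQLite's LIKE is case-insensitive for ASCII letters by default.
    The term is not escaped, so '%' or '_' in it act as wildcards.
    """
    rows = conn.execute(
        "SELECT filename FROM ocr_text WHERE content LIKE ? ORDER BY filename",
        (f"%{term}%",),
    )
    return [filename for (filename,) in rows]


if __name__ == "__main__":
    index = build_index(ocr_results)
    print(search(index, "acme"))
```

The TIFF files themselves stay untouched; the search runs entirely against the separate table, which is exactly the extra moving part a searchable PDF avoids by carrying its text inside the file.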
Judged on all of these criteria together, PDF is clearly the format to choose. However, the criteria do not carry the same weight in every project, so each case should be analyzed on its own. And even if we conclude that PDF is the best option, we still have to decide which type of PDF best meets our needs.
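The per-project analysis can be sketched as a simple weighted decision matrix. This is a minimal Python sketch, not a method taken from the source: the winner of each criterion is read off the list above, treating size as a tie is an assumption (the post says it depends on the type of PDF), and the weights are hypothetical values each project would set for itself.

```python
# Winner of each criterion, as summarized in the comparison above.
# "tie" marks criteria with no clear winner; treating "size" as a tie
# is an assumption, since it depends on the type of PDF used.
WINNERS = {
    "conservation": "pdf",
    "size": "tie",
    "search": "pdf",
    "security": "pdf",
    "multiplatform": "tie",
    "metadata": "pdf",
    "rich_text": "pdf",
    "accessibility": "pdf",
    "presentation": "pdf",
}


def score(weights):
    """Return the weighted score of each format.

    `weights` maps criterion names to project-specific (hypothetical)
    weights; criteria left out count with weight 0. A tie splits the
    criterion's weight evenly between the two formats.
    """
    unknown = set(weights) - set(WINNERS)
    if unknown:
        raise ValueError(f"unknown criteria: {sorted(unknown)}")
    totals = {"pdf": 0.0, "tiff": 0.0}
    for criterion, winner in WINNERS.items():
        w = weights.get(criterion, 0.0)
        if winner == "tie":
            totals["pdf"] += w / 2
            totals["tiff"] += w / 2
        else:
            totals[winner] += w
    return totals


if __name__ == "__main__":
    # Hypothetical archive project: preservation matters most, then size.
    print(score({"conservation": 3.0, "size": 2.0, "search": 1.0}))
```

Because PDF wins or ties every criterion in this comparison, the weights can only change the margin, never hand the win to TIFF; a project that cares only about, say, multiplatform support comes out as a tie.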
NB: A fair amount of the information contained in this post has been taken from the document called "TIFF versus PDF for Document Storage".