In document capture projects, whether they involve scanning paper documents or capturing from mobile devices, it’s important to choose a file format that lets us save the scanned documents with the highest quality and the most information possible. With this in mind, we’ve got two winning formats:
- TIFF (Tagged Image File Format): These files carry the .tif or .tiff suffix. TIFF was created in 1986 by Aldus, a company Adobe later acquired, with the objective of providing a standardized format for document imaging. TIFF is probably the best option for preserving images, for more than one reason: a single file can hold one or more pages, and it supports a wide range of color modes and many compression algorithms. It has one major drawback, though: the size of the files. Sharing images in TIFF format probably isn’t the best solution, but capture and document management solutions have options for converting .tiff files into easier-to-carry formats.
- PDF (Portable Document Format): An open format, published as an international standard by the ISO (ISO 32000). It’s an Adobe invention and, even though it’s a bit younger, it’s more widely used than TIFF is. In order to guarantee the survival and conservation of PDF documents, ISO 32000 tells software developers who produce, read or operate on PDF files which characteristics these files should have. PDF handles multi-page documents, and its strongest point is that it lets users view a document independently of the tech environment it was created in or is being viewed in (multiplatform). There are a lot of different classes of PDFs; the two most important groups are normal PDFs and image PDFs. Normal PDFs (“true” PDFs) include formatted text, and users can search within the content or copy and paste text. Image PDFs (“wrapped” PDFs) are PDF files that contain an image of each page, generally originating from a scan or a TIFF file. Because they’re images within a PDF, you can’t search the text or copy and paste it. For this category, OCR software is vital for indexing file content, running searches or extracting data. There’s also a third group, “searchable” PDFs: image PDFs with a layer of text added to them. This layer is generated by an OCR engine and offers the search and copy/paste capabilities of a normal PDF.
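A file’s suffix isn’t always trustworthy, so a capture pipeline may want to check the first bytes of a file instead. The sketch below is a minimal illustration using only Python’s standard library, not a full parser. The function names `sniff_format` and `count_tiff_pages` are made up for this example. It assumes a classic TIFF (not BigTIFF), it only looks for the PDF header at the very start of the file, and it treats every image directory in the TIFF’s main chain as one page, which is a simplification.

```python
import struct


def sniff_format(data: bytes) -> str:
    """Guess the format from the leading bytes of a file (hypothetical helper)."""
    # Classic TIFF starts with a byte-order mark ("II" or "MM") and the number 42.
    if data[:4] in (b"II*\x00", b"MM\x00*"):
        return "tiff"
    # PDF files start with a "%PDF-" header followed by the version number.
    if data[:5] == b"%PDF-":
        return "pdf"
    return "unknown"


def count_tiff_pages(data: bytes) -> int:
    """Count the image file directories (IFDs) chained in a classic TIFF.

    Assumption: each IFD in the main chain is one page. Files that store
    thumbnails or masks as extra IFDs would be over-counted.
    """
    if sniff_format(data) != "tiff":
        raise ValueError("not a classic TIFF file")
    endian = "<" if data[:2] == b"II" else ">"
    # Bytes 4..7 of the header hold the offset of the first IFD.
    (offset,) = struct.unpack_from(endian + "I", data, 4)
    pages = 0
    seen = set()
    while offset != 0:
        if offset in seen:
            raise ValueError("IFD chain loops back on itself")
        seen.add(offset)
        # An IFD is a 2-byte entry count, then 12 bytes per entry,
        # then the 4-byte offset of the next IFD (0 means "no more").
        (entries,) = struct.unpack_from(endian + "H", data, offset)
        (offset,) = struct.unpack_from(endian + "I", data, offset + 2 + 12 * entries)
        pages += 1
    return pages
```

In practice you would read the file in binary mode, for example `data = open(path, "rb").read()`, and pass the bytes to these functions. For real projects, an imaging or PDF library is a better choice than hand-rolled parsing; the point here is only to show what “multi-page” and “format” mean at the byte level.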
In a future post, we’ll explain how to choose between the two formats; today, we just wanted to highlight the two formats that are most commonly used when it comes time to undertake a document imaging project.