Wednesday, June 5, 2013

Improvements with Digitized Document Images

As you know, we at Athento have been researching document capture for some time. Our goal is to fully automate many manual tasks, such as data extraction and document classification, with the highest possible precision.


For these tasks to be automated, especially data extraction, the images have to meet certain minimum quality criteria. Anyone who has ever scanned a document knows that the result can end up with defects such as blurring, black (or white) borders, or off-center content.


When data is extracted from a document, one of the core technologies applied is OCR (Optical Character Recognition). Current OCR engines have trouble reading a document's content when the image has quality defects such as noise. “Salt and pepper” noise, which is simply a scattering of grainy specks throughout the image, degrades OCR performance.


Below, you can see a digitized image that is grainy and contains a fair bit of noise:



For data extraction to be as precise as possible, noise has to be removed from the image. Francisco González, one of our engineers (affectionately known as “Kurro”), has made it possible for Athento to significantly reduce the noise in digitized images.
Here’s the same image after being cleaned up by Athento:




Congratulations, Kurro: impressive work!
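
For readers curious about how salt and pepper noise can be removed in principle, here is a minimal sketch of a median filter, a common textbook technique for this kind of noise. It is not a description of Athento's actual method, which this post does not detail. The function name `median_filter`, the `radius` parameter, and the tiny sample image are all made up for illustration. Each pixel is replaced by the median of its neighbourhood, so an isolated speck is outvoted by the clean pixels around it.

```python
from statistics import median_low


def median_filter(image, radius=1):
    """Return a denoised copy of a grayscale image.

    `image` is a list of rows, each a list of ints in 0..255 (an assumed,
    illustrative representation). Every pixel is replaced by the median of
    its (2*radius+1) x (2*radius+1) neighbourhood, clipped at the borders.
    """
    height = len(image)
    width = len(image[0]) if height else 0
    result = []
    for y in range(height):
        row = []
        for x in range(width):
            # Collect the neighbourhood around (x, y), staying inside the image.
            window = [
                image[ny][nx]
                for ny in range(max(0, y - radius), min(height, y + radius + 1))
                for nx in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            # median_low always returns one of the original pixel values.
            row.append(median_low(window))
        result.append(row)
    return result


# A tiny white "page" (255) with two isolated dark specks (0).
noisy = [
    [255, 255, 255, 255, 255],
    [255,   0, 255, 255, 255],
    [255, 255, 255, 255, 255],
    [255, 255, 255,   0, 255],
    [255, 255, 255, 255, 255],
]

cleaned = median_filter(noisy)
for row in cleaned:
    print(row)
```

On this sample both specks disappear and the page comes back uniformly white. On real scans a median filter can also erode thin strokes, so the window size is a trade-off between removing noise and preserving the text that OCR needs to read.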


