Monday, September 27, 2010


OCR (Optical Character Recognition) software, in short, extracts text from images.
According to Wikipedia, it is the mechanical or electronic translation of scanned images of handwritten, typewritten or printed text into machine-encoded text. It is widely used to convert books and documents into electronic files, to computerize a record-keeping system in an office, or to publish the text on a website. OCR makes it possible to edit the text, search for a word or phrase, store it more compactly, display or print a copy free of scanning artifacts, and apply techniques such as machine translation, text-to-speech and text mining to it.

Here we present a video. If a picture is worth a thousand words, we will need a calculator to estimate what a video is worth ;-)

Of course, "interacting with a text editor program" has many implications:
-Being able to search documents in our intranet, corporate document management system or ECM with a full-text search, which means searching inside the document text.
-Being able to convert the document (TIFF, JPG, unindexed PDF, PNG or other image formats) to Word (.doc), OpenOffice (.odt) or any other editable format we choose, in order to edit and improve the document text.
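
To make the full-text search idea concrete, here is a minimal sketch in Python. It assumes the OCR step has already produced plain text for each scanned file; the file names, the sample texts and the `build_index` and `search` helpers are hypothetical and only illustrate the principle, not how any particular product implements it.

```python
import re
from collections import defaultdict


def build_index(documents):
    """Map each lowercase word to the set of document names containing it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in re.findall(r"\w+", text.lower()):
            index[word].add(name)
    return index


def search(index, query):
    """Return the names of documents that contain every word of the query."""
    words = re.findall(r"\w+", query.lower())
    if not words:
        return set()
    # Use .get so that looking up an unknown word does not add it to the index.
    results = [index.get(word, set()) for word in words]
    return set.intersection(*results)


# Hypothetical OCR output, keyed by the name of the scanned file.
ocr_texts = {
    "scan_001.tiff": "Invoice number 4521 for office supplies",
    "scan_002.png": "Employment contract between the company and the employee",
}

index = build_index(ocr_texts)
print(search(index, "invoice supplies"))
```

Without the OCR step there would be nothing to index: the scanned files would remain images, and only their file names could be searched.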

It also has some implications of its own in the Intelligent Document Management environment we provide at Yerbabuena:

  • We can generate metadata automatically, because we can reason about the contents of the documents.

  • We can start a workflow for review, approval, etc. based on the content of the document: because we can extract all of its text, we can use that information to decide what type of document it is (invoice, contract, etc.) and start the corresponding workflow.

  • We can improve the outcome of any OCR software with ICR (Intelligent Character Recognition): we can automatically infer a word that was not scanned properly because the paper was folded at a corner, the ink had worn off, or for any other reason.
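
The last two points can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the keyword sets, the workflow names and the helper functions are hypothetical, and fuzzy matching against a small vocabulary (via the standard `difflib` module) stands in for the semantic technology described above; it is not our actual implementation.

```python
import difflib

# Hypothetical keyword sets for each document type.
KEYWORDS = {
    "invoice": {"invoice", "total", "vat", "amount"},
    "contract": {"contract", "party", "clause", "agreement"},
}

# Hypothetical workflow names for each document type.
WORKFLOWS = {"invoice": "accounting_approval", "contract": "legal_review"}

# Every known keyword, used as the vocabulary for repairing damaged words.
VOCABULARY = sorted(set().union(*KEYWORDS.values()))


def correct_word(word, vocabulary=VOCABULARY, cutoff=0.8):
    """Replace a badly recognized word with its closest known word, if any."""
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


def classify(text):
    """Guess the document type from the keywords found in the text."""
    words = {correct_word(word) for word in text.lower().split()}
    scores = {doc_type: len(words & kws) for doc_type, kws in KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else None


def workflow_for(text):
    """Pick the workflow to start, falling back to a manual review."""
    return WORKFLOWS.get(classify(text), "manual_review")


# "invo1ce" imitates a word damaged during scanning.
print(workflow_for("invo1ce total amount 120"))
```

A document whose text matches no keyword at all is sent to a manual review instead of being forced into one of the known types.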

There are proprietary solutions such as Kofax, ABBYY and many others, integrated with ECM solutions.
As always, the advantage of our OCR is that it is based on open source, so the price advantage is considerable, with no loss of technological power and a gain in freedom: you can choose another company to evolve this software, if you think it necessary, without any limitation.

In addition, the price of these proprietary solutions scales steeply: you usually purchase per-user licenses, so if your organization has 2,000 users, you multiply the cost of each license by 2,000 (less volume discounts).
In the case of our OCR module, the cost is only for implementation. We charge for the car's engine; we do not care whether you drive the car full or empty.
Obviously, once the organization reaches a certain size, the cost becomes a fraction of that of proprietary OCR systems, without loss of reliability (unlike the proprietary solutions, which do not incorporate semantic technology to provide intelligent OCR, or ICR).

You can watch more on Yerbabuena Software's YouTube channel.
