Wednesday, September 4, 2013

How does Athento classify documents and extract the data?

Some of you are already familiar with Athento’s document capture features. We’ve also told you that version 2.0 of Athento will be available very soon. But for those of you who are new to the platform, we’d like to explain how Athento works.

Athento works by defining models. A model represents a type of document and tells Athento two things:

  • The physical appearance of a document and its content;
  • The metadata that should be extracted from a document type.

Defining a model in Athento makes two things possible:
  • Classification: Athento can identify a document that is uploaded to the system and determine that it belongs to a particular document type (such as “invoice from Amazon”).
  • Extraction: Athento can extract the metadata previously defined for that document type (such as the total invoice amount).

That means it’s necessary to create models before beginning to classify documents and extract data from them. Creating a model means defining the following characteristics for a certain class or type of document:

  • Basic data: A name and, most importantly, a sample document that Athento can use to learn the physical characteristics of this type of document (layout, colors, limits, etc.).
  • Key words (regular expressions): Expressions, words or groups of terms that normally appear together in a document of this type.
  • Metadata: Expressions that help locate the metadata to be extracted within the text of the documents.
  • Extraction templates: Templates which define the physical location (coordinates) of the metadata within a document, so that the OCR system can extract them.
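
To make this more concrete, here is a minimal sketch in Python of what a model with these four characteristics could look like, and how key words and metadata expressions might be used to classify a document and extract data from its text. This is not Athento's actual API: the names DocumentModel, keyword_score, classify and extract_metadata, the 0.5 threshold, the file name and the sample invoice text are all hypothetical and chosen only for illustration. The extraction template is stored as coordinates only; the OCR step that would read those zones is not shown.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class DocumentModel:
    """Hypothetical stand-in for a model (not Athento's real API)."""
    name: str                                   # basic data: model name
    sample_document: str                        # basic data: path to an example document
    keywords: List[str] = field(default_factory=list)            # regexes typical of this type
    metadata_patterns: Dict[str, str] = field(default_factory=dict)  # field -> regex with one group
    # field -> (x, y, width, height) zone for the OCR system; stored but unused in this sketch
    extraction_template: Dict[str, Tuple[int, int, int, int]] = field(default_factory=dict)


def keyword_score(model: DocumentModel, text: str) -> float:
    """Fraction of the model's key word expressions found in the text."""
    if not model.keywords:
        return 0.0
    hits = sum(1 for kw in model.keywords if re.search(kw, text, re.IGNORECASE))
    return hits / len(model.keywords)


def classify(text: str, models: List[DocumentModel],
             threshold: float = 0.5) -> Optional[DocumentModel]:
    """Return the best-scoring model, or None if no model reaches the threshold."""
    best = max(models, key=lambda m: keyword_score(m, text), default=None)
    if best is None or keyword_score(best, text) < threshold:
        return None
    return best


def extract_metadata(model: DocumentModel, text: str) -> Dict[str, str]:
    """Apply each metadata expression and keep the first captured group."""
    result = {}
    for name, pattern in model.metadata_patterns.items():
        match = re.search(pattern, text, re.IGNORECASE)
        if match:
            result[name] = match.group(1)
    return result


# Illustrative model and document text (both made up for this example).
amazon_invoice = DocumentModel(
    name="invoice from Amazon",
    sample_document="amazon_invoice_sample.pdf",
    keywords=[r"amazon", r"invoice", r"order\s+number"],
    metadata_patterns={"total_amount": r"total[:\s]+\$?([\d.,]+)"},
    extraction_template={"total_amount": (400, 700, 150, 30)},
)

text = "Amazon.com Invoice\nOrder number: 123-456\nTotal: $42.50"
model = classify(text, [amazon_invoice])
if model is not None:
    print(model.name, extract_metadata(model, text))
```

In this sketch, classification only counts how many key word expressions match, which is a simplification: as described above, Athento also relies on the physical appearance of the sample document.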

It’s really easy to create models in Athento. To see how it’s done, we invite you to consult our Athento Documentation Center and, specifically, the entry called “How to create a new model in Athento.”

Discover how smart document capture can make any document imaging process more efficient.
