Monday, January 9, 2012

Why should we apply semantic technology to Document Management?

"Semantics is starting to be seen as the ally of document management to address the excess of digital information we have"

Today we work with and share all kinds of digital information. We search the internet for whatever we need, and information is available everywhere. With so much of it, finding what you are looking for becomes increasingly hard. According to a McAfee study covering 10 countries and a sample of 3,000 users of new technologies, each of these people has, on average, 2,777 digital files stored on at least one device.

Multiply this figure by the 500 employees of a mid-size company and we get more than a million files: 1,388,500, to be precise. Document management (DMS) and enterprise content management (ECM) systems help us keep them organized and safely accessible, and they reduce the amount of paper we use. The truth, however, is that it is becoming harder to find what we need quickly. Worse, it is increasingly difficult to take advantage of the information contained in company documents: there are so many of them that building knowledge from them is hard.

Nor is it easy nowadays to find a document or file using the traditional data that identifies it, such as its name, author or creation date, because these are not details people tend to remember accurately.

Semantic technology can help solve all these difficulties. By describing documents and the entities they mention (people, companies, countries, or simply any relevant data within them), and by associating documents that share similar content or entities, we can dramatically improve the mechanisms we have for retrieving documents or finding information we need.
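The idea of relating documents through the entities they share can be sketched in a few lines. This is a minimal illustration, not Athento's implementation: it assumes each document has already been reduced to a set of entity names (the file names and entities below are hypothetical), and it uses Jaccard similarity, the share of entities two documents have in common, as one simple way to decide which documents are related. The `threshold` value is an arbitrary assumption.

```python
# Minimal sketch: each document is described by the set of entities it mentions.
# The document names and entities are hypothetical; a real system would obtain
# them from an entity-extraction step, which is not shown here.


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of entities two documents have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def related_documents(
    target: str, docs: dict[str, set[str]], threshold: float = 0.25
) -> list[tuple[str, float]]:
    """Return other documents whose entity overlap with `target` reaches `threshold`, best first."""
    scores = [
        (name, jaccard(docs[target], entities))
        for name, entities in docs.items()
        if name != target
    ]
    matches = [(name, score) for name, score in scores if score >= threshold]
    return sorted(matches, key=lambda pair: pair[1], reverse=True)


docs = {
    "contract_acme.pdf": {"Acme Corp", "Spain", "John Smith"},
    "invoice_acme.pdf": {"Acme Corp", "Spain"},
    "report_q3.docx": {"Globex", "France"},
}

print(related_documents("contract_acme.pdf", docs))
```

Here the contract and the invoice share two of their three distinct entities (a score of about 0.67), so they are linked, while the quarterly report shares none and is left out.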

Let's look at an example. The picture below shows two distinct mechanisms for finding a document. The first is a traditional form that asks you to remember precise details, such as the title, description or creation date range. The second, on the right, is a tag cloud: clicking a word in it takes us to a set of documents whose content is related to that term.


Which method is more effective? It depends on whether we know all of the document's data precisely, and the problem is that we don't always know all that data for each of our 2,777 files (the per-person average). Which is faster? A click will, of course, always be faster than filling out a form.
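Behind a tag cloud, one click is typically a single lookup in an index that maps each tag to the documents carrying it. The sketch below assumes hypothetical documents that are already tagged, builds that tag-to-documents index, and sizes each tag by how many documents it covers. The font range and the linear scaling are illustrative choices, not how any particular DMS draws its cloud.

```python
from collections import defaultdict

# Hypothetical documents and their tags; in a DMS these would come from
# manual tagging or from an automatic tagging step.
doc_tags = {
    "plan_2012.docx": ["innovation", "strategy"],
    "rd_report.pdf": ["innovation", "research"],
    "budget.xlsx": ["strategy", "finance"],
    "patent_draft.doc": ["innovation"],
}


def build_index(doc_tags: dict[str, list[str]]) -> dict[str, set[str]]:
    """Invert document -> tags into tag -> documents, so a click is one lookup."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc, tags in doc_tags.items():
        for tag in tags:
            index[tag].add(doc)
    return dict(index)


def cloud_sizes(
    index: dict[str, set[str]], min_pt: int = 10, max_pt: int = 24
) -> dict[str, int]:
    """Scale each tag's font size linearly with the number of documents it covers."""
    counts = {tag: len(docs) for tag, docs in index.items()}
    lo, hi = min(counts.values()), max(counts.values())
    span = (hi - lo) or 1  # avoid dividing by zero when all tags are equally common
    return {
        tag: min_pt + (n - lo) * (max_pt - min_pt) // span
        for tag, n in counts.items()
    }


index = build_index(doc_tags)
print(sorted(index["innovation"]))  # what clicking "innovation" would show
print(cloud_sizes(index))
```

Clicking "innovation" returns its three documents directly, and because it is the most common tag it gets the largest font (24 pt), while tags used only once get the smallest (10 pt).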

But the secret is not the search mechanism itself. That alone is not an example of semantics, since most document management systems can tag documents. The difference comes when the system itself manages to understand the content of documents and identify relevant information in them, so that it can relate all the documents dealing with the same content.


That is where semantics comes in: when the system manages to understand that a document is about "innovation" and groups it with others that address the same subject.
In this particular example, then, semantics saves us work, such as having to think about which words summarize a document's contents, and helps us quickly find other documents we may also want to review because they deal with the same subject.
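The post does not describe how Athento autotagging works internally, so the following is only a deliberately naive stand-in for the idea: it tags each document with its most frequent words after dropping a small, hypothetical stop-word list, then groups documents that receive the same tag. Real semantic systems rely on much richer language analysis (entity recognition, term weighting such as TF-IDF, ontologies), but the sketch shows the general flow from content, to tags, to groups of related documents.

```python
import re
from collections import Counter, defaultdict

# Tiny illustrative stop-word list (an assumption); real systems use far larger ones.
STOPWORDS = {"a", "an", "and", "for", "in", "is", "of", "on", "our", "the", "to", "we", "with"}


def auto_tags(text: str, k: int = 1) -> list[str]:
    """Return the k most frequent non-stop-words as tags (ties broken alphabetically)."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOPWORDS]
    counts = Counter(words)
    return sorted(counts, key=lambda w: (-counts[w], w))[:k]


def group_by_tag(docs: dict[str, str], k: int = 1) -> dict[str, list[str]]:
    """Map each automatically assigned tag to the documents that received it."""
    groups: dict[str, list[str]] = defaultdict(list)
    for name, text in docs.items():
        for tag in auto_tags(text, k):
            groups[tag].append(name)
    return dict(groups)


# Hypothetical document contents.
docs = {
    "memo.txt": "Innovation drives growth. Our innovation plan puts innovation first.",
    "lab_report.txt": "The lab report: prototypes tested, innovation budget approved, innovation team grows.",
    "payroll.txt": "Payroll dates and payroll calendar for staff.",
}

groups = group_by_tag(docs)
print(groups["innovation"])
```

With these sample texts, the memo and the lab report are both tagged "innovation" without anyone typing that word as a tag, so they end up in the same group, while the payroll file lands in its own "payroll" group. Raising `k` assigns more tags per document, at the cost of noisier ones.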

This is just one example of semantics applied to document management (Athento autotagging), but we are also developing other applications that will gradually make it easier to manage such huge amounts of information.

