Thursday, May 2, 2013

Good Practices in Document Management: Folders vs Tags vs Taxonomy vs Metadata

A few days ago, the Document Management System Group (where there is always a good number of interesting discussions) hosted a debate about the best way to store, organize and retrieve documents. The options put forward included storage in folders, taxonomies and metadata.

Personally, I believe the question rests on a few misunderstandings. Below I try to clear up some of them.

1. Metadata
In a real Document Management System, metadata is simply ESSENTIAL. It is not optional: if your provider doesn't offer metadata, we are not talking about a Document Management System. In fact, metadata has to be the most basic means of retrieving documents: we should be able to find a document by its title, creator or modification date at least. The system should support the Dublin Core schema, but it becomes much more flexible and tailored to our needs if we can also define our own specific metadata for different types of documents.
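To make the idea concrete, here is a minimal Python sketch of a document record with a few basic fields plus custom metadata per document type. The `Document` class, the `find` helper and the field names `customer` and `expires` are all made up for illustration; they are not the API of Athento or of any other product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Document:
    # Basic fields, loosely modeled on Dublin Core elements.
    title: str
    creator: str
    modified: date
    # Custom metadata defined per document type (hypothetical names).
    custom: dict = field(default_factory=dict)


def find(documents, **criteria):
    """Return the documents whose metadata matches every given criterion."""
    matches = []
    for doc in documents:
        # Basic fields take precedence over custom ones with the same name.
        metadata = {**doc.custom, "title": doc.title,
                    "creator": doc.creator, "modified": doc.modified}
        if all(metadata.get(k) == v for k, v in criteria.items()):
            matches.append(doc)
    return matches


docs = [
    Document("Invoice 001", "alice", date(2013, 4, 30), {"customer": "ACME"}),
    Document("Contract 007", "bob", date(2013, 5, 1), {"expires": date(2014, 5, 1)}),
]

print([d.title for d in find(docs, creator="alice")])
print([d.title for d in find(docs, customer="ACME")])
```

Both searches go through the same function, which is the point: a custom field such as `customer` is just as searchable as the title or the creator.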

2. What is useful to browse and what is useful to find
Browsing documents, that is, moving around a document hierarchy, is one thing; finding a document directly, without browsing, is another. Indexing a document's content and its metadata lets us go straight to the document, whereas a folder structure lets us browse. If we know where a document sits in the folder structure, we can navigate to it. If we have no idea, which is frequent when we work with many users and many documents, what we need is technology that allows us to find, not to browse. Physical classifications may serve to browse; virtual classifications serve to find.
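The difference between browsing and finding can be sketched in a few lines of Python. This is only a toy model under my own assumptions: the paths and texts are invented, `browse` lists what sits directly under a folder, and `find` uses a simple inverted index over the document content. A real system would use a proper search engine.

```python
from collections import defaultdict

# Hypothetical documents: storage path -> extracted text content.
documents = {
    "/2013/invoices/inv-001.txt": "invoice for consulting service",
    "/2013/contracts/c-007.txt": "service contract with supplier",
    "/2012/invoices/inv-940.txt": "invoice for printer product",
}


def browse(folder):
    """Browsing: list the documents stored directly under a folder."""
    prefix = folder.rstrip("/") + "/"
    return sorted(path for path in documents
                  if path.startswith(prefix) and "/" not in path[len(prefix):])


def build_index(docs):
    """Map every word to the set of documents that contain it."""
    index = defaultdict(set)
    for path, text in docs.items():
        for word in text.lower().split():
            index[word].add(path)
    return index


index = build_index(documents)


def find(term):
    """Finding: go straight to the documents, wherever they are stored."""
    return sorted(index.get(term.lower(), set()))


print(browse("/2013/invoices"))
print(find("invoice"))
```

`browse` only helps if we already know that the invoice is under `/2013/invoices`; `find("invoice")` returns the invoices from both years without us knowing anything about the folders.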

3. Physical and logical classifications
With a Document Management System we can, and should, be flexible when organizing our documents and when retrieving them.

The physical classification is the folder structure where we store the documents. These folders can be organized in many ways: by year, by type of document, by department, or by any other characteristic that fits the company's working method.

Logical or virtual classifications are not bound to a physical structure; they take a criterion and generate a classification from it. For example, a blog can show posts grouped by year, but this doesn't mean that each year has a folder where the entries are stored. Virtual classifications are not linked to storage, otherwise they would lose their virtual nature. They are linked to document retrieval instead: they are the options that various systems offer for recovering documents by different criteria. Usually, virtual classifications take their criteria from document metadata.
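The blog example can be sketched in Python as follows. The posts live in one flat list, with no folder per year, and each classification is computed on demand from a metadata field. The post titles, the field names and the `classify` helper are assumptions made for this sketch only.

```python
from collections import defaultdict
from datetime import date

# Flat storage: no folder per year, just posts with metadata (invented data).
posts = [
    {"title": "Folders vs tags", "published": date(2013, 5, 2), "subject": "document management"},
    {"title": "Capture tools compared", "published": date(2012, 11, 20), "subject": "document capture"},
    {"title": "Why metadata matters", "published": date(2013, 1, 15), "subject": "document management"},
]


def classify(items, criterion):
    """Build a virtual classification: group titles by a metadata criterion."""
    groups = defaultdict(list)
    for item in items:
        groups[criterion(item)].append(item["title"])
    return dict(groups)


by_year = classify(posts, lambda post: post["published"].year)
by_subject = classify(posts, lambda post: post["subject"])

print(by_year)
print(by_subject)
```

The same stored posts yield two different classifications, and adding a third one (by author, by coverage) would not require moving a single document.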

For example, in Athento, we have the following virtual classifications:

Tag clouds: They are tags generated automatically by the system. They allow us to browse all the documents on the system by key terms that might describe the document content.

Classification by subjects: Usually, these classifications organize documents by the metadata “subject” or “theme” included in the Dublin Core Schema.

Classification by geographical coverage: These classifications organize documents by the metadata “coverage” that represents the geographical context where the document is relevant.

Here are other possibilities in Athento for recovering documents. These are not classifications, but virtual organization mechanisms:

Document geolocalization: This requires the geographical point (GPS coordinates) where the document was created. In Athento, documents can be viewed on a map by their location.

Relations: These allow access to a document through its relation with another document.

4. Taxonomy
Taxonomies are categorizations based on meanings related to the document content. We can have physical classifications that follow a given taxonomy and, at the same time, a taxonomical logical classification. For example, suppose we have to organize the company's bills. We can organize them by the year they were generated or by the managing department. In this case we are talking about a business classification, not a taxonomical one, because it doesn't take the document content into account. We can also organize them according to whether they relate to a product or a service. In this basic case, we are using the meaning and content of the bills to categorize them.

Categorizing documents by topic is one way of using a taxonomy.

Taxonomies enrich both the browsing and the document retrieval possibilities.

In conclusion, all of these methods are complementary. In fact, very few systems use only one method, and those that do are inefficient for it. The more options our DM system offers to browse and find, the better, since the chances of it being useful to us will be much greater.
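As a rough illustration of the product/service example, the Python sketch below assigns bills to taxonomy categories by looking at their content. The `TAXONOMY` terms and the `categorize` function are invented for this sketch; real systems rely on far richer vocabularies and classification techniques than plain keyword matching.

```python
# A tiny, hypothetical taxonomy: category -> terms that signal it.
TAXONOMY = {
    "product": {"printer", "laptop", "license"},
    "service": {"consulting", "maintenance", "training"},
}


def categorize(text):
    """Return the taxonomy categories whose terms appear in the text."""
    words = set(text.lower().split())
    return sorted(category for category, terms in TAXONOMY.items()
                  if words & terms)


print(categorize("Bill for consulting and training"))
print(categorize("Laptop purchase with maintenance plan"))
```

Unlike a classification by year or department, the category here depends on what the bill actually says, and a bill can fall into more than one category.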
