Wednesday, November 22, 2017

Countdown to GDPR

Soon the new European data protection regulation will come into force. Are we ready to face it?
GDPR is about to take effect in Europe. It applies from May 25, 2018, and by that day companies must be ready to comply with the new regulation. Fines for non-compliance can reach 20 million euros or 4% of the company's worldwide annual turnover, whichever is higher.
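
As a rough illustration of that ceiling, GDPR caps fines for the most serious infringements at the higher of 20 million euros or 4% of worldwide annual turnover. The function name and the turnover figures below are made up for the example; the fine actually imposed in a given case is decided by the supervisory authority and can be far lower.

```python
from decimal import Decimal

FIXED_CAP_EUR = Decimal("20000000")  # EUR 20 million
TURNOVER_RATE = Decimal("0.04")      # 4% of worldwide annual turnover


def max_gdpr_fine(annual_turnover_eur: Decimal) -> Decimal:
    """Upper bound of the fine: whichever of the two caps is higher.

    Hypothetical helper for illustration only.
    """
    return max(FIXED_CAP_EUR, annual_turnover_eur * TURNOVER_RATE)


# Illustrative turnovers, not real companies.
small_company = max_gdpr_fine(Decimal("100000000"))   # EUR 100 million turnover
large_company = max_gdpr_fine(Decimal("2000000000"))  # EUR 2 billion turnover
print(small_company, large_company)
```

For the smaller company the fixed cap of 20 million euros applies, because 4% of its turnover is only 4 million; for the larger one the 4% figure (80 million euros) is the ceiling.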

However, many organizations have not yet implemented the processes and technologies needed to comply with this new regulation on the privacy of personal information. Cases of ransomware and leakware have increased over the last two years, with serious consequences for the finances and reputation of their victims. Are we prepared to reduce the impact of these types of attacks on our companies, and to comply with GDPR?

GDPR is the most stringent data protection regulation in the world. It has been designed so that individuals have the greatest possible control over their personal data: how it is processed, used and stored. This has a special impact not only on multinational organizations with access to personal information of European citizens, but also on companies in general. 90% of corporate documents contain some type of personal information, whether about customers, employees or third parties.

For this reason, the document management software used in a company is key to complying with the rules on personal data protection and privacy. The software that we use to store our documents must guarantee the protection of individuals' personal data and help us manage both the rights of those individuals and our obligations as holders of their data.

The functionality available in the document management software should cover user requests, such as the "right to be forgotten", as well as the ability to classify information according to levels of privacy and confidentiality. It must also give us the tools to make sure that we can report any incident to the competent authority within 72 hours of becoming aware of it.
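
The 72-hour reporting window can be tracked with very little code. The following is a minimal sketch, assuming the clock starts when the company becomes aware of the incident; the function names and the sample timestamp are hypothetical and not part of any document management product.

```python
from datetime import datetime, timedelta, timezone

NOTIFICATION_WINDOW = timedelta(hours=72)


def notification_deadline(aware_at: datetime) -> datetime:
    """Latest moment to notify the authority (hypothetical helper)."""
    return aware_at + NOTIFICATION_WINDOW


def hours_remaining(aware_at: datetime, now: datetime) -> float:
    """Hours left before the deadline; negative once it has passed."""
    return (notification_deadline(aware_at) - now).total_seconds() / 3600


# Illustrative incident timestamp, in UTC to avoid time zone ambiguity.
detected = datetime(2018, 6, 1, 9, 30, tzinfo=timezone.utc)
print(notification_deadline(detected).isoformat())
print(hours_remaining(detected, detected + timedelta(hours=24)))
```

An incident workflow could store the detection time with the incident record and raise an alert when the remaining hours drop below a chosen threshold.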

It is important to consider performing a situation analysis to determine if our company and our document management software are prepared to comply with the GDPR standard. This analysis should include the following steps:

  • Thoroughly analyze the current state of personal data management. It is important to map the data that enters and leaves the company: what information is stored, how it is processed, who accesses it, etc. This includes exchanges with suppliers, partners and government agencies, which should also comply with GDPR.
  • Define what data is unnecessary. You cannot accumulate data indiscriminately. Every piece of stored data must have a current purpose and should not be kept just for potential future use. This will reduce both the liability and the workload associated with that data.
  • Identify the interactions with clients that require their consent, or about which they should be notified. You must draft terms that meet the requirements of the regulation and are still easy to understand. These documents must be easily available to the client.
  • Identify other actors involved. Compliance breaks down if a single link in our chain of access to the information fails to comply. The rules on personal data deletion apply to the entire chain. As part of compliance, we must notify third parties who access our data so that they delete it on their side as well.
  • Define the "right of access" correctly. This aspect has several facets. Our clients should not only know that they have the right to obtain a copy of their data; we must also establish the mechanism by which they can do so. This is a joint task for the technology and legal areas. The document manager will be key here.
  • Develop a breach notification process. GDPR requires that we notify the competent data protection authority when user data may have been accessed without authorization. The regulation specifies a window of 72 hours. The processes and mechanisms for making this notification, and the texts and messages that will be sent, must be defined. A specific workflow for managing these incidents is one technological tool that can be used.
  • Appoint someone to lead the transition, either by assigning the task to someone in the company or by hiring additional specialists if necessary. The regulation provides for the role of the "Data Protection Officer" (DPO).
There is little time left to adapt to the regulation, but with the right tools a quick transition can be achieved. The situation analysis should be approached in manageable stages so that it reaches a successful conclusion and the transition is as painless as possible.

Friday, November 10, 2017

Semantic Tagging in Box with Athento

Below we will learn how to use the Athento app on Box to tag documents automatically. Follow our semantic tagging tutorial for Box. Box is one of the largest companies in cloud-based file and document exchange, storage and collaboration. According to the Gartner Magic Quadrant, Box continues to be a leader in the Collaboration market in 2017, backed by its more than 74,000 corporate clients.

One of the most frequent challenges for users of file-sharing applications is keeping control over how documents are organized.

Today we are going to show you how to organize and virtually categorize documents automatically through tags generated by the Athento app in Box.

Semantic tagging

Semantic technologies, in very simple language, seek to find meaning within a given set of data. If we transfer the application of semantic technologies to the world of documents and, specifically, to semantic tagging, the goal is for these technologies to be able to "understand" what the text of the document is about and return a series of concepts or entities that describe that text. That is, technology that will tell us what the document is about without us having to read it.

That covers the semantic side. On top of it we have a very simple but powerful feature: labels. A label gives us access to all the documents that share it. If the labeling has been semantic, it gives us access to all the documents whose content refers to the same concepts or entities.

That is, we create a virtual classification, detached from a physical folder structure, which allows us to access documents that are related in meaning or that share entities (names of people, companies, cities, themes, etc.).
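
The idea of a virtual classification built from labels can be sketched as a simple index from each tag to the documents that carry it. This is only an illustration of the concept, not how Athento or Box store tags internally; the file names, tags and function name are invented for the example.

```python
from collections import defaultdict


def build_tag_index(doc_tags):
    """Map each tag (case-insensitive) to the set of documents that carry it."""
    index = defaultdict(set)
    for doc, tags in doc_tags.items():
        for tag in tags:
            index[tag.lower()].add(doc)
    return index


# Hypothetical documents and the entities a tagger might return for them.
doc_tags = {
    "contract_2017.pdf": ["Acme Corp", "Madrid", "Contract"],
    "invoice_0042.pdf": ["Acme Corp", "Invoice"],
    "minutes_nov.pdf": ["Madrid", "Board meeting"],
}

index = build_tag_index(doc_tags)
print(sorted(index["acme corp"]))  # every document that mentions the company
print(sorted(index["madrid"]))     # every document that mentions the city
```

The documents can live in any folder; the index groups them by what they are about rather than by where they are stored.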

Let's now explain step by step how to do semantic tagging on Box with Athento.

Tutorial: Athento's Semantic tagging in Box

You can follow this tutorial step by step to do semantic tagging in Box thanks to the Athento application. All you need is a Box account, free or paid. Once you have your account, just follow the steps below to tag documents automatically with Athento in Box.

1. Install the application: To do this, look for "Athento" among the Box applications. The applications option is found in your user menu, in the upper right corner of the screen. Once there, you can type "Athento" in the search box. When you find it, add it from the left panel and then choose the option to accept.

2. Upload documents: If you are already a Box user and you have your documents there, you do not need to upload any new documents. If you have created a Box account to test Athento's semantic tagging, go to the document panel and upload a new document. You can do it by dragging it from the desktop to the web page or from the "Load" option in the corner panel. Important: in its free version, automatic labeling is only available for PDF files.

3. Analyze the document with the Athento app: Locate the file and select the three-dot "more options" menu that appears to the right of the file name. In the options that appear, go to "Integrations" and then select "Athento Semantic Tagging". Finally, accept and wait for the result.

4. Review the results: Wait for Athento to extract the text from the document (OCR) and then apply its semantic technology. When the result is ready, you will see a green message at the top of the screen. You will see the new labels under the name of the document.

Semantic tagging on Box is quite easy, as you have seen in this tutorial. For clients with more complex needs, the Athento team can activate specific ontologies (related to your business), as well as analyze documents in bulk.


Monday, November 6, 2017

Reasons to use MongoDB (NoSQL) as your ECM's Database

In this article we will explore the advantages of using NoSQL databases such as MongoDB to store data and documents.

MongoDB is currently one of the most popular NoSQL databases. Unlike relational databases, the data is not stored in tables but as documents in a JSON-like format (JSON stands for JavaScript Object Notation), a standard widely used by current applications. This makes integration between MongoDB and these applications much easier.
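
To make the document format concrete, here is a minimal sketch of how a record from a document management system might look as a JSON document. The field names and values are invented for the example, and it uses only Python's standard `json` module rather than a MongoDB driver.

```python
import json

# Hypothetical document record; all field names are illustrative.
document = {
    "title": "Supplier contract",
    "type": "contract",
    "metadata": {"supplier": "Acme Corp", "signed": "2017-10-03"},
    "tags": ["contract", "supplier"],
}

# Serialize to JSON text, as it could be exchanged with another application.
as_json = json.dumps(document, indent=2)
print(as_json)

# Any JSON-aware application can read the same structure back.
restored = json.loads(as_json)
print(restored["metadata"]["supplier"])
```

Nested fields and lists travel with the record itself, with no need to split them across several tables.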

The term NoSQL is usually read as "Not only SQL". NoSQL databases do not use a relational model, which is useful when the structure of the data you use can vary. It is possible to change the schema without having to stop the database. NoSQL databases can be adapted to real projects more easily than an entity-relationship model.
MongoDB also has a decentralized structure, which allows distributed deployments. This feature makes it easily scalable. The scalability is horizontal: you can use more machines with less computing capacity, instead of having to resort to a single, more powerful machine. Choosing a NoSQL database is more reasonable if you do not have a large budget for equipment.

Another advantage is that in a NoSQL database, queries for large amounts of data are optimized. To give you an idea, Facebook, Twitter, Reddit or Foursquare use NoSQL databases.

As for limitations, NoSQL databases generally do not offer such strict control over the atomicity of transactions, which is a significant advantage of relational databases. Atomicity is what allows a complete operation involving several tables to be performed without its intermediate changes becoming visible before the transaction has finished. That is, either the entire transaction is carried out or none of it is. For example, atomicity ensures that a bank transfer is never left half done: if the money is credited to one account, it must be debited from the other. While this property has a cost in database performance, atomicity is what maintains the integrity of the data and allows an orderly rollback if necessary. Many NoSQL databases, on the other hand, offer only eventual consistency of the data.
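
The bank transfer case can be demonstrated with a relational database in a few lines. This is a minimal sketch using SQLite from Python's standard library; the table, account names and amounts are invented for the example. The credit is applied first on purpose, so that a failed debit has something to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money atomically: both updates are committed or neither is."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        # The CHECK constraint rejected an overdraft; the credit was rolled back too.
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts ORDER BY name"))


print(transfer(conn, "alice", "bob", 30), balances(conn))
print(transfer(conn, "alice", "bob", 500), balances(conn))
```

The second transfer fails on the debit, and the credit that had already been applied to the other account is undone, so the balances stay exactly as they were after the first transfer.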

The relational model has been in use for more than 40 years, so relational database products and tools are very mature. NoSQL databases are not yet fully standardized, so each of them has its own query characteristics and does not necessarily maintain compatibility with SQL statements. However, the need to deal with such large amounts of data today opens up wide possibilities for these data repositories. NoSQL is a good option for companies that run into performance, scalability or cost problems due to large volumes of data.

Use in Document Management Systems

In a document management system, with large volumes of documents and high numbers of query, write or update transactions, a NoSQL approach can be much more efficient.

Relational databases require a schema to be defined. A schema is simply a structure, written in a formal language interpreted by the database engine, that describes the skeleton of the existing tables and the relationships between them. This introduces a limitation, since the metadata of the stored documents is restricted to the fields and data types defined in this schema. This limitation does not exist in a NoSQL database.
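
The difference can be seen in a small experiment. This is a sketch under simplifying assumptions: SQLite stands in for the relational database, and a plain Python list of dictionaries stands in for a schemaless document collection (it is not MongoDB itself). Table, field names and values are invented for the example.

```python
import sqlite3

# Relational side: the table only accepts the columns declared in its schema.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE documents (title TEXT, author TEXT)")
conn.execute(
    "INSERT INTO documents (title, author) VALUES (?, ?)", ("Invoice 42", "Ana")
)

try:
    conn.execute(
        "INSERT INTO documents (title, author, invoice_total) VALUES (?, ?, ?)",
        ("Invoice 43", "Ana", 1200),
    )
    extra_field_accepted = True
except sqlite3.OperationalError:
    # The column does not exist; the schema would have to be altered first.
    extra_field_accepted = False

# Document side (simulated): each record carries whatever metadata it needs.
collection = [
    {"title": "Invoice 42", "author": "Ana"},
    {"title": "Invoice 43", "author": "Ana", "invoice_total": 1200},
    {"title": "Contract", "parties": ["Acme", "Globex"]},
]
with_total = [doc["title"] for doc in collection if "invoice_total" in doc]

print(extra_field_accepted, with_total)
```

A new metadata field on the relational side requires a schema change, while on the document side it is just one more key in the record.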

In addition, within each table you must define restrictions on the rows and columns, as well as the data type that can be stored in each column. With a NoSQL database such as MongoDB, field types follow from the data being stored, so these definitions do not have to be written by hand, which reduces development time.
The high performance and scalability of MongoDB make it well suited to a system of this type. It provides a JSON-like document structure, stored internally in a binary format called BSON, and supports a dynamic schema. In a relational database, files can be stored as BLOB (binary large object) data types, whose maximum size depends on the engine (around 4 GB in some of them), which limits the maximum size of the stored document. Although MongoDB limits each document to 16 MB, this limitation can be overcome with GridFS, which splits the file into chunks in order to store it, so the total file size is virtually unlimited.
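
The chunking idea behind GridFS can be sketched in plain Python. This is a simplified simulation, not the GridFS API: real GridFS keeps file metadata and chunks in two MongoDB collections, while here the chunks are just dictionaries in a list. The 255 KiB chunk size matches the GridFS default; the function names and the sample payload are invented for the example.

```python
CHUNK_SIZE = 255 * 1024  # GridFS default chunk size (255 KiB)


def split_into_chunks(data: bytes, chunk_size: int = CHUNK_SIZE):
    """Split a file's bytes into numbered chunks (simulation only)."""
    return [
        {"n": i, "data": data[offset:offset + chunk_size]}
        for i, offset in enumerate(range(0, len(data), chunk_size))
    ]


def reassemble(chunks):
    """Rebuild the original bytes by concatenating chunks in order."""
    return b"".join(c["data"] for c in sorted(chunks, key=lambda c: c["n"]))


payload = bytes(600 * 1024)  # a 600 KiB file filled with zero bytes
chunks = split_into_chunks(payload)
print(len(chunks), len(chunks[-1]["data"]))
print(reassemble(chunks) == payload)
```

Each chunk stays far below the per-document size limit, so the size of the whole file is bounded only by available storage.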

It also has a feature called "sharding", which balances load between servers by assigning different data to each one of them, so that queries and data insertion are distributed. Because documents map closely to the objects of the application, little or no mapping or transformation between application objects and database objects is necessary. It keeps the working set in RAM, which allows faster access to data. MongoDB supports dynamic queries using a powerful document-based query language.
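
How sharding spreads data can be illustrated with a toy router that hashes a key to choose a server. This is a deliberately simplified sketch, not MongoDB's actual mechanism, which assigns ranges of shard key values (optionally hashed) to shards and rebalances them over time. The function name, the choice of MD5 and the document ids are assumptions made for the example.

```python
import hashlib


def shard_for(key: str, shard_count: int) -> int:
    """Pick a shard by hashing the key (toy router, illustration only)."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % shard_count


SHARD_COUNT = 3
shards = {i: [] for i in range(SHARD_COUNT)}

# Route 1000 hypothetical document ids to the three simulated servers.
for doc_id in (f"doc-{n}" for n in range(1000)):
    shards[shard_for(doc_id, SHARD_COUNT)].append(doc_id)

print({i: len(docs) for i, docs in shards.items()})
```

The same key always lands on the same shard, so both inserts and lookups for a document go to a single server while the overall load is split between all of them.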

These general characteristics make MongoDB a very suitable database engine for a document management system.