Monday, November 28, 2016

Elasticsearch fine tuning for optimal ECM performance in Nuxeo


One of the key elements of any Enterprise Content Management tool is its ability to scale. At Athento, we build our core ECM on Nuxeo precisely for scalability reasons.

In October 2014, Nuxeo reached an important milestone in ECM scalability and performance: one billion documents, using Elasticsearch with PostgreSQL as the database.

Alfresco, another key player in Enterprise Content Management and perhaps one of the leaders according to Gartner, published a similar milestone, but only a full year later (October 2015).

Nuxeo has kept making progress ever since, innovating by adding NoSQL support through MongoDB. Recently, however, we came across an article titled "Goodbye MongoDB hello PostgreSQL" about moving back from Mongo to Postgres.


The Nuxeo documentation gives some hints on tuning an Elasticsearch node, including assigning half of the machine's total memory (RAM) to the Elasticsearch heap (in its example, 6g on a machine with 12g in total). This post, however, is mainly about configuring Page Providers to query Elasticsearch instead of a less efficient SQL database. In our experience, this can bring slow queries that take 5, 7 or 10 seconds down to under 1 second, typically in the range of 0 to 200 ms.
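As a quick illustration of the "half of the RAM" rule of thumb mentioned above, here is a minimal Python sketch. The function names are ours and purely illustrative. How the resulting value is passed to Elasticsearch depends on the Elasticsearch version, so check the documentation for the release you run.

```python
def recommended_heap_gb(total_ram_gb: float) -> int:
    """Half of the machine's RAM, rounded down to whole gigabytes (minimum 1)."""
    return max(1, int(total_ram_gb // 2))


def heap_setting(total_ram_gb: float) -> str:
    """Format the heap size the way it is written in this post, e.g. '6g'."""
    return f"{recommended_heap_gb(total_ram_gb)}g"


# The example from the Nuxeo documentation: a machine with 12g of RAM.
print(heap_setting(12))  # 6g
```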

The Nuxeo documentation covers this in "How to make a Page Provider or Content View Query Elasticsearch", but Page Providers can also be set up from the Administrator Panel.


Scroll all the way down and select the "Advanced setup" button.


You will be warned that the advanced setup is intended for advanced users.


Inside the advanced setup area, scroll down to the elasticsearch.override.pageproviders property.


You can see that "default_search" appears by default, but you can add most providers. Here is the full list we use, based on our experience:

elasticsearch.override.pageproviders=default_search,document_content,section_content,tree_children,default_document_suggestion,simple_search,advanced_search,nxql_search,DEFAULT_DOCUMENT_SUGGESTION,GET_TASKS_FOR_ACTORS,GET_TASKS_FOR_PROCESS,GET_TASKS_FOR_PROCESS_AND_ACTORS,GET_TASKS_FOR_PROCESS_AND_NODE,GET_TASKS_FOR_TARGET_DOCUMENT,GET_TASKS_FOR_TARGET_DOCUMENT_AND_ACTORS,GET_TASKS_FOR_TARGET_DOCUMENTS,GET_TASKS_FOR_TARGET_DOCUMENTS_AND_ACTORS,GET_TASKS_FOR_TARGET_DOCUMENTS_AND_ACTORS_OR_DELEGATED_ACTORS,SAVED_SEARCHES,user_sections,user_workspaces,user_documents,user_favorites,domain_published_documents,GET_TASKS_FOR_ACTORS_OR_DELEGATED_ACTORS,domain_documents
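A list this long is easy to get wrong by hand (stray spaces, repeated names). As a minimal sketch, assuming you keep the provider names in a plain comma-separated string, the following Python helper trims whitespace, drops repeated names while preserving order, and prints the line to paste into the configuration. The helper name build_override_line is ours, not part of Nuxeo. Names are compared case-sensitively, so default_document_suggestion and DEFAULT_DOCUMENT_SUGGESTION are kept as two separate entries.

```python
def build_override_line(providers: str) -> str:
    """Build the elasticsearch.override.pageproviders line from a raw list.

    Whitespace around names is stripped and repeated names are dropped,
    keeping the first occurrence and the original order.
    """
    seen = set()
    unique = []
    for name in providers.split(","):
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            unique.append(name)
    return "elasticsearch.override.pageproviders=" + ",".join(unique)


# A shortened sample of the list above, with a repeated name and stray spaces.
raw = "default_search, document_content,section_content,document_content, tree_children"
print(build_override_line(raw))
```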

One of the few Page Providers NOT to use with Elasticsearch is the one used by the content view orderable_document_content.

The reason is that this involves reindexing the position of documents, which has a high cost.

To summarize: running Elasticsearch as a separate instance on its own physical or virtual machine (the embedded version that Nuxeo provides by default is not recommended) and adding the Page Providers discussed above will improve performance a great deal.


Thursday, February 11, 2016

Athento integrates its document capture software with Alfresco 5.x

Now you can export processed documents from Athento’s document capture module to Alfresco Enterprise and Community editions.

Thanks to Athento’s document capture module (Athento Smart Engine), auto-classified documents can be sent to Alfresco with their metadata already extracted.

San Jose, CA. February 10, 2016

Athento Smart Engine, the document capture software by Athento, is now integrated with Alfresco. The integration, available for versions 5 and up, gives Alfresco ECM users a data capture application that automates digital document handling, including splitting batches, automatically classifying documents, OCR and data extraction.

Integration is carried out using CMIS 1.0, which lets users export documents and data from Athento to the Alfresco platform.

According to Athento CEO Jose Luis de la Rosa:

“The benefit of using CMIS as the vehicle for integration is that it allows integration to become easily configurable and can potentially be used for any version of Alfresco that supports CMIS.”

Athento’s Documentation Center now provides information on how to integrate Athento and Alfresco. With the integration, storage folders and routes in Alfresco can be defined dynamically according to metadata values or document types. It also allows data extracted from documents to be sent to Alfresco.

In addition to the CMIS integration, Athento Smart Engine has an API that provides access to its features as services. Athento SE is available as a cloud service and also for on-premise deployments.




About Athento:

Athento incorporates cutting-edge technology such as Machine Learning, Semantics and Image Processing to automate document capture, document management, storage and all the other operations needed to cover the complete life cycle of documents. Athento currently works with more than 100 clients in Europe, Africa and the Americas. It also works with a wide-reaching network of authorized partners, and has been chosen by Barclaycard, Reed Elsevier, Leroy Merlin, Yellow Pages and the Spanish General Traffic Directorate to manage documents.
