Monday, November 28, 2016

Elasticsearch fine tuning for optimal ECM performance in Nuxeo

One of the key requirements for any Enterprise Content Management (ECM) tool is its ability to scale. At Athento, we build our core ECM on Nuxeo, precisely for scalability reasons.

In October 2014, Nuxeo reached an important milestone in ECM scalability and performance: one billion documents, using Elasticsearch with PostgreSQL as the database.

Alfresco, another key player in Enterprise Content Management and perhaps one of the leaders according to Gartner, published a similar milestone, but only a full year later (October 2015).

Nuxeo has kept making progress ever since, adding NoSQL support through MongoDB. Recently, however, we found an article named "Goodbye MongoDB hello PostgreSQL" about transitioning from MongoDB back to PostgreSQL.

The Nuxeo documentation gives some hints on tuning an Elasticsearch node, including allocating half of the machine's total memory (RAM) to the Elasticsearch heap (in its example, 6g on a machine with a total of 12g). This post, however, is mainly about configuring Page Providers to query Elasticsearch instead of the less efficient SQL database. In our experience, this can bring slow queries of 5, 7 or 10 seconds down to under 1 second, typically in the range of 0 to 200 ms.
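The "half of the RAM" rule is simple arithmetic, and a minimal Python sketch makes it explicit. The function names are hypothetical, and the 31 GB ceiling is not from the Nuxeo hint quoted here: it is an assumption based on the general Elasticsearch advice to keep the heap below roughly 32 GB. The output uses the standard JVM -Xms/-Xmx flags.

```python
def recommended_heap_gb(total_ram_gb, cap_gb=31):
    """Return half of the machine's RAM in whole GB.

    cap_gb is an assumption (general Elasticsearch guidance to stay
    below ~32 GB of heap); it is not part of the Nuxeo hint itself.
    """
    if total_ram_gb < 2:
        raise ValueError("need at least 2 GB of RAM to split in half")
    return min(total_ram_gb // 2, cap_gb)


def jvm_heap_flags(total_ram_gb):
    """Format the heap size as JVM flags, with min and max heap equal."""
    heap = recommended_heap_gb(total_ram_gb)
    return f"-Xms{heap}g -Xmx{heap}g"


if __name__ == "__main__":
    # The documentation's example: a machine with 12g of RAM.
    print(jvm_heap_flags(12))  # prints "-Xms6g -Xmx6g"
```

Where these flags go depends on the Elasticsearch version in use, so check the documentation for your release before applying them.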

Nuxeo documents this in "How to make a Page Provider or Content View Query Elasticsearch", but Page Providers can also be set up from the Administrator Panel:

Scroll all the way down and select the "Advanced setup" button:

You'll be warned that the advanced setup is intended for advanced users.

Inside the advanced setup area, scroll down to the elasticsearch.override.pageproviders property.

By default only "default_search" appears, but you can add most providers. Here is a complete list based on our experience:

elasticsearch.override.pageproviders=default_search,document_content,section_content,tree_children,default_document_suggestion,simple_search,advanced_search,nxql_search,DEFAULT_DOCUMENT_SUGGESTION,GET_TASKS_FOR_ACTORS,GET_TASKS_FOR_PROCESS,GET_TASKS_FOR_PROCESS_AND_ACTORS,GET_TASKS_FOR_PROCESS_AND_NODE,GET_TASKS_FOR_TARGET_DOCUMENT,GET_TASKS_FOR_TARGET_DOCUMENT_AND_ACTORS,GET_TASKS_FOR_TARGET_DOCUMENTS,GET_TASKS_FOR_TARGET_DOCUMENTS_AND_ACTORS,GET_TASKS_FOR_TARGET_DOCUMENTS_AND_ACTORS_OR_DELEGATED_ACTORS,SAVED_SEARCHES,user_sections,user_workspaces,user_documents,user_favorites,domain_published_documents,GET_TASKS_FOR_ACTORS_OR_DELEGATED_ACTORS,domain_documents
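A list this long easily picks up repeated entries or stray spaces when edited by hand. As a rough sketch (the helper names are hypothetical, and this is plain string handling, not a Nuxeo API), the following Python snippet trims each name, drops repeats while keeping the original order, and rebuilds the property line. Names are compared case-sensitively, so default_document_suggestion and DEFAULT_DOCUMENT_SUGGESTION are treated as two different providers.

```python
def normalize_providers(raw):
    """Split a comma-separated provider list, trim whitespace,
    and drop empty or repeated names while preserving order."""
    seen = set()
    result = []
    for name in raw.split(","):
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            result.append(name)
    return result


def to_conf_line(raw, key="elasticsearch.override.pageproviders"):
    """Rebuild the property line from a possibly messy list."""
    return f"{key}={','.join(normalize_providers(raw))}"


if __name__ == "__main__":
    messy = "default_search,document_content, section_content,document_content"
    print(to_conf_line(messy))
```

The resulting line can be pasted into the advanced setup field in place of the hand-edited one.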

One of the few Page Providers that should NOT be switched to Elasticsearch is the one used by the content view orderable_document_content, because it involves reindexing the position of documents, which has a high cost.

To summarize: running Elasticsearch as a separate instance on its own physical or virtual machine (using the default embedded version that Nuxeo provides is not recommended) and adding the Page Providers discussed above will improve performance a great deal.