Elasticsearch

Elasticsearch is a real-time search and analytics engine. It is built on Apache Lucene, an open-source search library.

It is designed to be scalable: it is distributed and has built-in node discovery, so a node can automatically find other Elasticsearch nodes and connect to them when required. It shards data automatically, and in a very simple way: every document has an identifier, and a hash of that identifier modulo the number of shards determines which shard the document goes to. Because placement is deterministic, Elasticsearch can do a lot of smart things, such as routing a query or an update directly to the shard that holds the relevant document so that the work stays local. It also distributes queries: a query sent to any one node is forwarded to all the shards in the cluster, and the partial results are merged. Everything one needs in this cloud-oriented world comes built into Elasticsearch, for free.
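The routing and query-distribution ideas above can be sketched as a small in-memory model. This is a hypothetical illustration in Python, not Elasticsearch code: the `ShardedIndex` class, its methods and the toy title-count scoring are invented for the example, and `zlib.crc32` merely stands in for the hash Elasticsearch actually applies to the routing value (the document ID by default; murmur3 in recent versions).

```python
import heapq
import itertools
import zlib
from typing import Dict, List, Tuple


def shard_for(doc_id: str, num_shards: int) -> int:
    # Hypothetical routing rule: hash(doc_id) % num_shards.
    # zlib.crc32 is only a deterministic stand-in for Elasticsearch's own hash.
    return zlib.crc32(doc_id.encode("utf-8")) % num_shards


class ShardedIndex:
    """Toy model of one index split into a fixed number of shards."""

    def __init__(self, num_shards: int) -> None:
        self.num_shards = num_shards
        # One dict per shard, mapping doc_id -> document.
        self.shards: List[Dict[str, dict]] = [{} for _ in range(num_shards)]

    def index(self, doc_id: str, doc: dict) -> int:
        # Writes and later updates for the same ID always land on the same shard.
        shard = shard_for(doc_id, self.num_shards)
        self.shards[shard][doc_id] = doc
        return shard

    def get(self, doc_id: str) -> dict:
        # A lookup by ID only has to consult the one shard that owns it.
        return self.shards[shard_for(doc_id, self.num_shards)][doc_id]

    def search(self, term: str, size: int = 10) -> List[Tuple[str, int]]:
        # Scatter: each shard scores its own documents and keeps its local top `size`.
        per_shard: List[List[Tuple[int, str]]] = []
        for shard in self.shards:
            hits = []
            for doc_id, doc in shard.items():
                # Toy score: how often the term appears in the title.
                score = doc.get("title", "").lower().split().count(term.lower())
                if score > 0:
                    hits.append((-score, doc_id))
            hits.sort()
            per_shard.append(hits[:size])
        # Gather: merge the sorted per-shard lists and keep the global top `size`.
        merged = heapq.merge(*per_shard)
        return [(doc_id, -neg_score) for neg_score, doc_id in itertools.islice(merged, size)]


if __name__ == "__main__":
    idx = ShardedIndex(num_shards=3)
    idx.index("1", {"title": "Elasticsearch in action"})
    idx.index("2", {"title": "search engines and search"})
    idx.index("3", {"title": "cooking basics"})
    print(idx.search("search"))
```

Because the shard is a pure function of the ID and the shard count, a lookup or update by ID touches exactly one shard, while a search has to visit every shard and merge the partial results. It is also why the number of primary shards cannot simply be changed on an existing index: a different shard count would send existing IDs to different shards.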

It has a RESTful HTTP API, with client wrappers for just about any language one can think of, and new ones keep appearing. Elasticsearch also speaks JSON throughout, which fits the document model well, because documents are themselves JSON structures. For example, if a book has several authors and each author has a last name, Elasticsearch indexes that structure so the authors' last names can be searched.
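To illustrate how a nested JSON structure becomes searchable fields, here is a minimal Python sketch, assuming the default behaviour for plain objects, where Elasticsearch indexes inner fields under dotted paths such as `authors.last_name` and treats arrays as multi-valued fields. The `flatten` helper and the sample book are hypothetical; on a recent Elasticsearch version the document itself would be sent with a request such as `PUT /books/_doc/1`.

```python
import json
from typing import Any, Dict, List, Optional


def flatten(doc: Any, prefix: str = "",
            out: Optional[Dict[str, List[Any]]] = None) -> Dict[str, List[Any]]:
    """Collect every leaf value of a JSON document under its dotted field path."""
    if out is None:
        out = {}
    if isinstance(doc, dict):
        for key, value in doc.items():
            path = f"{prefix}.{key}" if prefix else key
            flatten(value, path, out)
    elif isinstance(doc, list):
        # Array elements share the same field path, giving a multi-valued field.
        for item in doc:
            flatten(item, prefix, out)
    else:
        out.setdefault(prefix, []).append(doc)
    return out


book = json.loads("""
{
  "title": "Search Engines",
  "authors": [
    {"first_name": "Jane", "last_name": "Doe"},
    {"first_name": "John", "last_name": "Roe"}
  ]
}
""")

print(flatten(book))
# {'title': ['Search Engines'], 'authors.first_name': ['Jane', 'John'], 'authors.last_name': ['Doe', 'Roe']}
```

Flattening like this loses the link between each author's first and last name (a query for "Jane Roe" would match this book), which is the problem Elasticsearch's nested document type addresses.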

Elasticsearch is schema-less. It recognizes field types automatically: JSON values are not all strings, and Elasticsearch can tell whether a value is a date, an integer or a floating-point number. If no schema (mapping) is provided, it infers a type for each new field and tries to be smart about it. Every JSON document that is indexed is stored as-is in its source document (the _source field). Elasticsearch also maintains a version number automatically: each time a document is updated, its internal version number is incremented.
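The schema-less behaviour can be sketched in a few lines of Python. This is a simplified, hypothetical model, assuming roughly the default dynamic-mapping rules of recent Elasticsearch versions (JSON booleans become `boolean`, integers `long`, floating-point numbers `float`, date-like strings `date`, other strings `text`). `datetime.fromisoformat` is only a stand-in for Elasticsearch's own date detection, and `VersionedStore` is an invented class that mimics how `_source` and `_version` are kept.

```python
import json
from datetime import datetime
from typing import Any, Dict, Optional


def guess_type(value: Any) -> str:
    # Simplified reading of the default dynamic-mapping rules for scalar JSON values.
    if isinstance(value, bool):  # test bool first: in Python bool is a subclass of int
        return "boolean"
    if isinstance(value, int):
        return "long"
    if isinstance(value, float):
        return "float"
    if isinstance(value, str):
        try:
            # Stand-in for Elasticsearch date detection (its real default formats differ).
            datetime.fromisoformat(value)
            return "date"
        except ValueError:
            # Recent versions map other strings as text (plus a keyword sub-field).
            return "text"
    raise TypeError(f"not a JSON scalar: {value!r}")


def infer_mapping(doc: Dict[str, Any]) -> Dict[str, Any]:
    """Infer a mapping for every field of a document that has no schema yet."""
    properties: Dict[str, Any] = {}
    for field, value in doc.items():
        # Arrays take the type of their first non-null element.
        while isinstance(value, list):
            value = next((v for v in value if v is not None), None)
        if value is None:
            continue  # null or empty fields get no mapping yet
        if isinstance(value, dict):
            properties[field] = {"properties": infer_mapping(value)}
        else:
            properties[field] = {"type": guess_type(value)}
    return properties


class VersionedStore:
    """Keeps the original JSON (_source) and bumps _version on every write."""

    def __init__(self) -> None:
        self._docs: Dict[str, Dict[str, Any]] = {}

    def index(self, doc_id: str, source: Dict[str, Any]) -> int:
        previous = self._docs.get(doc_id)
        version = previous["_version"] + 1 if previous else 1
        # Store an independent copy of the document exactly as submitted.
        self._docs[doc_id] = {"_source": json.loads(json.dumps(source)), "_version": version}
        return version

    def get(self, doc_id: str) -> Optional[Dict[str, Any]]:
        return self._docs.get(doc_id)
```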

Elasticsearch has integrated faceting, which is very fast because it makes use of all the available caches. It also provides statistical aggregates, such as sum and average over numeric fields, which are very powerful: many of the queries one wants to run can actually be answered this way. It supports many field types: strings, all the numeric types, geospatial types, attachments and arrays (arrays of numbers, arrays of strings). Documents can contain both sub-documents and nested documents.
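The faceting and statistics described above can be illustrated with a small Python sketch that computes, over an in-memory list of documents, what a terms facet and a statistical aggregate return. The helper names and the sample fields (`genre`, `price`) are hypothetical. Faceting was later superseded by the aggregations API introduced in Elasticsearch 1.0, and the `AGGREGATION_REQUEST` dictionary shows a roughly equivalent request body in that syntax; the `genre.keyword` field name assumes the default dynamic mapping of recent versions.

```python
from collections import Counter
from typing import Any, Dict, List, Tuple


def terms_facet(docs: List[Dict[str, Any]], field: str, size: int = 10) -> List[Tuple[Any, int]]:
    """Count how many documents carry each distinct value of `field`."""
    counts: Counter = Counter()
    for doc in docs:
        value = doc.get(field)
        values = value if isinstance(value, list) else [value]
        # Count each value at most once per document.
        counts.update({v for v in values if v is not None})
    return counts.most_common(size)


def stats_agg(docs: List[Dict[str, Any]], field: str) -> Dict[str, Any]:
    """count / min / max / avg / sum over the numeric values of `field`."""
    values = [
        doc[field]
        for doc in docs
        if isinstance(doc.get(field), (int, float)) and not isinstance(doc.get(field), bool)
    ]
    if not values:
        return {"count": 0, "min": None, "max": None, "avg": None, "sum": 0}
    total = sum(values)
    return {"count": len(values), "min": min(values), "max": max(values),
            "avg": total / len(values), "sum": total}


# Roughly the same request in the aggregations syntax that replaced facets,
# assuming "genre" was dynamically mapped as text with a keyword sub-field.
AGGREGATION_REQUEST = {
    "size": 0,
    "aggs": {
        "by_genre": {"terms": {"field": "genre.keyword"}},
        "price_stats": {"stats": {"field": "price"}},
    },
}
```

Both helpers make a single pass over the documents, which mirrors why such summaries are cheap to compute alongside a search rather than as separate queries.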

Elasticsearch makes sensible default assumptions about data, sharding and configuration, and all of them can be adjusted through its RESTful HTTP interface. It supports cross-index searching and searching across multiple document types, for example across all the books, journals and other documents that have an author. It is a very flexible tool that supports almost everything one would expect, and it is set to become the next evolution of search.
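Cross-index searching is expressed in the URL: listing several index names, separated by commas, before `_search` runs one query across all of them. The sketch below only builds such a request with Python's standard library; the node address `http://localhost:9200`, the index names `books` and `journals`, the `author` field and the helper name `build_multi_index_search` are assumptions for illustration.

```python
import json
import urllib.request
from typing import Iterable


def build_multi_index_search(base_url: str, indices: Iterable[str],
                             field: str, text: str) -> urllib.request.Request:
    """Build a POST /<index1>,<index2>/_search request carrying a match query."""
    path = ",".join(indices) + "/_search"
    body = {"query": {"match": {field: text}}}
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/{path}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_multi_index_search("http://localhost:9200", ["books", "journals"], "author", "Smith")
print(req.full_url)
```

With a node running, `urllib.request.urlopen(req)` sends the request, and each entry in the response's `hits.hits` array reports the index it came from in its `_index` field.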
