You don’t have to stretch for search: introduction to elasticsearch


Warning: this is going to be technical

We haven’t published many technical posts in recent months, so consider this a fair warning. This post is an introduction to a fascinating piece of software: elasticsearch.

DevOps engineers love it because it gives them a useful tool for log analysis. Developers love it for its great search and text analysis capabilities and its REST API. If you dare to dive deeper, just follow along.

Elasticsearch – What it is

Elasticsearch is a database built around Apache Lucene, the formidable open source search library. It is a document-oriented database, comparable to the popular MongoDB. And it comes with a complete REST API!

Database features

The database offers several data types:

  • string
  • integer
  • long
  • float
  • double
  • boolean
  • date
  • geo_point
  • geo_shape
  • null
  • array
  • ip
  • attachment
  • object

You can define a document’s data types explicitly via mappings. If you don’t, elasticsearch guesses the mapping from the data you index.
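To get a feel for what that guessing does, here is a toy Python model of dynamic mapping. It is a simplified sketch, not elasticsearch’s code, and rests on these assumptions: JSON integers become `long`, floats `double`, booleans `boolean`, nested objects get their own `properties`, arrays take the type of their first non-null element, and strings become `date` only if they look like an ISO date (the real date detection uses configurable formats). The names `guess_type` and `guess_mapping` are made up for illustration.

```python
import re

# Rough ISO date check; elasticsearch's real date detection uses configurable formats
DATE_RE = re.compile(r"^\d{4}-\d{2}-\d{2}(T\d{2}:\d{2}:\d{2})?")


def guess_type(value):
    """Guess a field mapping for one JSON value (toy model, hypothetical helper)."""
    if isinstance(value, bool):  # check bool first: bool is a subclass of int in Python
        return {"type": "boolean"}
    if isinstance(value, int):
        return {"type": "long"}
    if isinstance(value, float):
        return {"type": "double"}
    if isinstance(value, str):
        return {"type": "date"} if DATE_RE.match(value) else {"type": "string"}
    if isinstance(value, dict):
        return {"properties": guess_mapping(value)}
    if isinstance(value, list):
        # An array is mapped by the type of its first non-null element
        for item in value:
            if item is not None:
                return guess_type(item)
    return None  # null or empty array: no mapping can be derived yet


def guess_mapping(doc):
    """Build a properties dict for a whole document, skipping undecidable fields."""
    props = {}
    for name, value in doc.items():
        mapping = guess_type(value)
        if mapping is not None:
            props[name] = mapping
    return props


doc = {"company": "crealytics", "employees": 42, "rating": 4.5, "active": True,
       "created": "2014-05-12", "tags": ["wow", "search"], "meta": {"source": "blog"}}
print(guess_mapping(doc))
```

A wrong guess (a postal code sent as a JSON number ends up as `long`, for instance) is exactly the case where an explicit mapping pays off.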

The database provides automatic, yet configurable, clustering and sharding. I haven’t tried those features extensively, but given my overall positive impression of elasticsearch, I expect them to work nicely as well.
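The elasticsearch documentation describes how a document finds its shard: `shard = hash(_routing) % number_of_primary_shards`, where `_routing` defaults to the document ID. That formula is also why, in the versions this post is about, the number of primary shards can’t be changed after an index is created. A minimal Python sketch of the idea, assuming `zlib.crc32` as a stand-in hash (elasticsearch uses its own hash function); `shard_for` is a hypothetical name.

```python
import zlib


def shard_for(routing, number_of_primary_shards):
    """Pick a primary shard for a routing value (by default the document ID).

    crc32 is only a stand-in here; elasticsearch uses its own hash function.
    """
    return zlib.crc32(routing.encode("utf-8")) % number_of_primary_shards


for doc_id in ("1", "2", "3"):
    print(doc_id, "-> shard", shard_for(doc_id, 5))
```

Because the result depends on the shard count, changing that count would send existing IDs to different shards than the ones holding their documents.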

A note on terminology: what is called a “collection” in MongoDB or a “database” in an RDBMS is called an “index” in elasticsearch. This can cause some confusion, since “index” traditionally means a lookup structure that speeds up queries on a table.

Search features

Now to the really cool part: search.

Lucene has a lot to offer, but its configuration can sometimes be cumbersome. Elasticsearch helps the user with sensible defaults and a consistent JSON-based API.

Elasticsearch offers a large number of queries and filters. The main difference between the two is that filters don’t calculate a relevance score and can be cached, so they are preferred over queries when performance matters. Both queries and filters are defined as JSON documents that you send to the server. A simple match query looks like this:

    "match" : {
        "company" : "crealytics"
    }

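Building these JSON bodies in code can make the query/filter split more tangible. The Python sketch below assumes the 1.x-style `filtered` query (deprecated in later elasticsearch releases in favour of the `filter` clause of the `bool` query), in which a cacheable filter narrows the candidate documents and the query scores what is left. The helper names are hypothetical, and the `tag` field is borrowed from the bool example in this post.

```python
import json


def match_query(field, text):
    """Full-text query: analyzed and scored."""
    return {"match": {field: text}}


def term_filter(field, value):
    """Exact-value filter: no scoring, so elasticsearch can cache the result."""
    return {"term": {field: value}}


def filtered_search(query, filt):
    # 1.x-style "filtered" query: the filter restricts the candidates, the query scores them
    return {"query": {"filtered": {"query": query, "filter": filt}}}


body = filtered_search(match_query("company", "crealytics"), term_filter("tag", "wow"))
print(json.dumps(body, sort_keys=True))
```

The printed string is what you would send as the request body of a `_search` call.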
Queries can easily be combined with a bool query. A simple example:

    "bool" : {
        "must" : {
            "term" : { "company" : "crealytics" }
        },
        "should" : [
            { "term" : { "tag" : "wow" } }
        ],
        "minimum_should_match" : 1,
        "boost" : 1.0
    }

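When clauses are assembled dynamically, for example from user input, a small builder avoids hand-editing nested JSON. Below is a minimal Python sketch with the hypothetical helpers `term` and `bool_query` (not part of any elasticsearch client) that produces the bool query shown above; `must_not`, the third bool clause, is included for completeness.

```python
import json


def term(field, value):
    """Exact-term clause."""
    return {"term": {field: value}}


def bool_query(must=(), should=(), must_not=(), minimum_should_match=None, boost=None):
    """Assemble a bool query, leaving out empty clauses and unset options."""
    clause = {}
    if must:
        clause["must"] = list(must)
    if should:
        clause["should"] = list(should)
    if must_not:
        clause["must_not"] = list(must_not)
    if minimum_should_match is not None:
        clause["minimum_should_match"] = minimum_should_match
    if boost is not None:
        clause["boost"] = boost
    return {"bool": clause}


q = bool_query(must=[term("company", "crealytics")],
               should=[term("tag", "wow")],
               minimum_should_match=1, boost=1.0)
print(json.dumps(q, indent=4))
```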
If you want to find out which parts of a document’s text matched, use the highlighting feature.
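Highlighting is requested by adding a `highlight` section that names the fields of interest, and elasticsearch wraps the matching fragments in `<em>` tags by default. The Python sketch below builds such a request body and, to show the kind of output to expect, imitates the wrapping locally. `toy_highlight` is a hypothetical, simplified stand-in that matches whole words case-insensitively; it is not elasticsearch’s highlighter.

```python
import re


def highlight_request(query, field):
    """Search body that asks elasticsearch to highlight matches in one field."""
    return {"query": query, "highlight": {"fields": {field: {}}}}


def toy_highlight(text, terms, pre="<em>", post="</em>"):
    """Wrap every word equal (ignoring case) to one of the terms in pre/post tags."""
    wanted = {t.lower() for t in terms}

    def wrap(match):
        word = match.group(0)
        return f"{pre}{word}{post}" if word.lower() in wanted else word

    return re.sub(r"\w+", wrap, text)


print(highlight_request({"match": {"company": "crealytics"}}, "company"))
print(toy_highlight("Crealytics builds search tools", ["crealytics"]))
```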

There is also the percolator API, which stores queries in the database instead of documents and matches new documents against them. That way you can easily detect whether a new real estate listing matches the saved house search of one of your users, for example.
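The percolator turns the usual search around: the queries are stored, and each new document is run against them. The toy Python model below only imitates that idea in memory. The names `ToyPercolator`, `register` and `percolate` are hypothetical, and plain predicate functions stand in for real elasticsearch queries, so this is not how elasticsearch implements percolation.

```python
class ToyPercolator:
    """In-memory imitation of the percolator idea: store queries, then match documents against them."""

    def __init__(self):
        self.queries = {}  # query id -> {field name: predicate}

    def register(self, query_id, **criteria):
        # Each keyword argument maps a field name to a predicate on that field's value
        self.queries[query_id] = criteria

    def percolate(self, doc):
        # Ids of all stored queries whose criteria are all satisfied by the new document
        return sorted(
            query_id
            for query_id, criteria in self.queries.items()
            if all(name in doc and test(doc[name]) for name, test in criteria.items())
        )


searches = ToyPercolator()
searches.register("anna", city=lambda c: c == "Berlin", rooms=lambda r: r >= 3)
searches.register("ben", city=lambda c: c == "Munich")
print(searches.percolate({"city": "Berlin", "rooms": 4}))
```

Each incoming listing triggers one `percolate` call, and the returned ids tell you which users to notify.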

There are many more options, such as the suggest feature or faceted search, but I haven’t tried those out yet. So there’s still land for you to conquer.

A good way to find out how the queries and settings work is to play around with your own elasticsearch instance. Several nice plugins help with that, so you don’t have to write a single line of code or wield curl like a pro.

Cool add-ons

A very useful plugin is sense, now included in the official marvel plugin. Another plugin that I found very practical is inquisitor.

All known plugins can be found here:

Debugging queries

There will be times when you ask yourself: “Why did this query match word X?” Elasticsearch provides several useful tools to help answer this question. The following approaches have worked very well for me:

Is the token stored in the index the way you expect?

Take a look at the mappings and settings of the index. Are your customizations actually there? If you pass them in incorrectly, elasticsearch silently falls back to default settings and mappings without an error message. You can inspect both via the routes http://localhost:9200/INDEXNAME/_mapping and http://localhost:9200/INDEXNAME/_settings.
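Spotting a silently dropped customization by eye gets tedious for larger mappings. The minimal Python sketch below makes two assumptions: you have already fetched the JSON from the `_mapping` or `_settings` route (the `fetch_json` helper uses only the standard library and needs a running node), and you have dug out the relevant part, e.g. the `properties` of your type. The hypothetical helper `diff_settings` then lists every dotted path whose expected value is missing or different. Keep in mind that the `_settings` route returns values as strings, such as `"5"` for the number of shards, so write expected values the same way.

```python
import json
from urllib.request import urlopen


def fetch_json(url):
    """GET a URL such as http://localhost:9200/INDEXNAME/_mapping and decode the JSON body."""
    with urlopen(url) as resp:
        return json.loads(resp.read().decode("utf-8"))


def diff_settings(expected, actual, path=""):
    """Return dotted paths whose expected value is missing from, or different in, actual."""
    problems = []
    for key, want in expected.items():
        here = f"{path}.{key}" if path else key
        if key not in actual:
            problems.append(here)
        elif isinstance(want, dict) and isinstance(actual[key], dict):
            problems.extend(diff_settings(want, actual[key], here))
        elif actual[key] != want:
            problems.append(here)
    return problems


# What you meant to configure vs. what the index reports (custom analyzer silently lost)
expected = {"company": {"type": "string", "analyzer": "keyword"}}
actual = {"company": {"type": "string"}}
print(diff_settings(expected, actual))
```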

What does the tokenizer do to the token?

You can check this by requesting the tokenization result from the API.

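    curl -XGET 'localhost:9200/_analyze?analyzer=standard' -d 'crealytics is awesome'

yields a different result than

    curl -XGET 'localhost:9200/_analyze?analyzer=keyword' -d 'crealytics is awesome'

The standard analyzer splits the text into the three tokens crealytics, is and awesome, while the keyword analyzer keeps the whole input as a single token.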
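To see why the two calls differ, here is a rough local imitation of both analyzers in Python. It is an assumption-laden sketch. `standard_like` approximates the standard analyzer by splitting on runs of word characters and lowercasing, whereas the real standard tokenizer follows Unicode text segmentation rules and treats punctuation such as apostrophes differently. `keyword_like` mirrors the keyword analyzer, which emits the whole input unchanged as one token. Both names are hypothetical.

```python
import re


def standard_like(text):
    """Rough stand-in for the standard analyzer: split on non-word characters, lowercase."""
    return [tok.lower() for tok in re.findall(r"\w+", text)]


def keyword_like(text):
    """The keyword analyzer emits the entire input as a single, unmodified token."""
    return [text]


for name, analyze in (("standard", standard_like), ("keyword", keyword_like)):
    print(name, analyze("crealytics is awesome"))
```

A match query analyzes its input with the field’s analyzer, so a field analyzed with `keyword` only matches the complete, identical string.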

Why exactly does this match?

If a search match looks strange, you can use the explain API to take a closer look at the internals of elasticsearch. This is loosely comparable to the EXPLAIN command of other databases like PostgreSQL.

Usage:

    curl -XGET 'localhost:9200/INDEX_NAME/TYPE_NAME/ID/_explain' -d 'YOUR_QUERY'

Elasticsearch then explains why YOUR_QUERY matched (or didn’t match) the document with the given ID, including how its score was calculated.
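The explanation in the response is a tree: each node carries a `value`, a `description` and optional `details` holding child nodes. A small recursive Python function makes that tree easier to read. `render_explanation` is a hypothetical helper, and the sample response below is made up for illustration rather than copied from a real cluster.

```python
def render_explanation(node, depth=0):
    """Turn the nested "explanation" of an _explain response into indented text lines."""
    lines = ["  " * depth + f'{node["value"]:.4f}  {node["description"]}']
    for child in node.get("details", []):
        lines.extend(render_explanation(child, depth + 1))
    return lines


# Made-up response shaped like an _explain result; real descriptions are much longer
response = {
    "matched": True,
    "explanation": {
        "value": 0.5,
        "description": "weight(company:crealytics)",
        "details": [
            {"value": 1.0, "description": "tf"},
            {"value": 0.5, "description": "idf"},
        ],
    },
}
print("matched:", response["matched"])
print("\n".join(render_explanation(response["explanation"])))
```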


There’s a lot to discover in elasticsearch. It’s really exciting to play with the technology, because it has never been easier to use such a great search solution. I hope this short introduction and the tips have whetted your appetite.

Additional useful links

Securing your elasticsearch cluster



Udo Gröbner

Developer at the crealytics innovation hub. Interested in new technologies, pragmatic solutions and stuff that works.
