MongoDB, Elastic Search, Setup

I've been working on another project recently and have decided to stray away from typical relational databases (i.e. SQL) and get involved in the NoSQL revolution, primarily because I was sold on the idea by a colleague!

This naturally brought up the question of searching. I looked at solutions such as Lucene, but finally landed on ElasticSearch (which is based on Lucene).

Getting this to work with the latest version of MongoDB (2.4.6), however, proved to be somewhat of a challenge, so I've written this guide for anyone else suffering! We will be installing and configuring ElasticSearch 0.90.5, together with the mapper-attachments, MongoDB river and ElasticSearch-Head plugins, on Ubuntu Server 12.04 LTS.

Prerequisites

This guide is not going to show you how to set up MongoDB. You're on your own for that one, although it's rather easy.

Note: As I discovered, the versions of the components you use are extremely important. Therefore, please stick to the versions I use in order to get it working correctly.

Configuration of MongoDB

ElasticSearch is kept up to date with MongoDB through a "river". The river follows MongoDB's oplog, which only exists on replica set members, so you need to configure a replica set even if you're using a standalone instance. To do this, follow the instructions at: http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
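
For reference, here is a minimal sketch of that step on a single machine. It assumes MongoDB 2.4's INI-style config file. The path /etc/mongodb.conf, the service name "mongodb" and the set name "rs0" are my assumptions (adjust them to your install), and enable_replset is just a hypothetical helper, not something MongoDB ships.

```shell
# Hypothetical helper: append a replSet line to a MongoDB 2.4-style
# config file, unless the file already contains one.
enable_replset() {
  conf_file="$1"
  set_name="$2"
  if grep -q '^[[:space:]]*replSet' "$conf_file"; then
    echo "replSet already configured in $conf_file"
  else
    echo "replSet = $set_name" >> "$conf_file"
    echo "added replSet = $set_name to $conf_file"
  fi
}

# Example usage, run as root (path, service name and set name are assumptions):
#   enable_replset /etc/mongodb.conf rs0
#   service mongodb restart
#   mongo --eval 'printjson(rs.initiate())'   # turn the instance into a one-member set
#   mongo --eval 'printjson(rs.status())'     # the member should report PRIMARY shortly after
```

The linked MongoDB tutorial remains the authoritative version of these steps; the sketch only shows roughly what they boil down to for a single node.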

Installation of Elasticsearch

The following script will download and install ElasticSearch and the required plugins. It will also install ElasticSearch-Head, which is the best GUI I've found to date.

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-0.90.5.deb
sudo dpkg -i elasticsearch-0.90.5.deb
sudo /usr/share/elasticsearch/bin/plugin -install elasticsearch/elasticsearch-mapper-attachments/1.9.0
sudo /usr/share/elasticsearch/bin/plugin --url "http://jambr.blob.core.windows.net/articledownloads/elasticsearch-river-mongodb-1.7.1-SNAPSHOT.zip" --install elasticsearch-river-mongod
sudo /usr/share/elasticsearch/bin/plugin -install mobz/elasticsearch-head
sudo service elasticsearch restart
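
ElasticSearch can take a few seconds to come back after the restart, so a request sent straight away may fail. Below is a small sketch of a wait loop; wait_for_es is a hypothetical helper of mine, and the default URL assumes the stock HTTP port 9200 used throughout this post.

```shell
# Hypothetical helper: poll the ElasticSearch HTTP port until it answers,
# giving up after a number of one-second attempts.
wait_for_es() {
  url="${1:-http://localhost:9200}"
  tries="${2:-30}"
  i=0
  while [ "$i" -lt "$tries" ]; do
    # -f makes curl return non-zero on HTTP errors, -o /dev/null hides the body
    if curl -s -f -o /dev/null "$url"; then
      echo "Elasticsearch is up at $url"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "Elasticsearch did not respond at $url" >&2
  return 1
}

# Example usage:
#   wait_for_es && curl -s localhost:9200
```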

Once this is done, you should be able to access the GUI at the following URL, where there will be a default index called "_river": http://localhost:9200/_plugin/head/

We don't need this index, so remove it with the following command:

curl -XDELETE localhost:9200/_river

Setting up a River

The next thing we need to do is create our "River" and Index. This script is a simple example of how to do this:

curl -XPUT "localhost:9200/_river/artist/_meta" -d '
{
  "type": "mongodb",
  "mongodb": {
    "db": "DatabaseManager",
    "collection": "CollectionName"
  },
  "index": {
    "name": "NameForYourIndex",
    "type": "NameForYourObjectType"
  }
}'

You should get a response like this (the "_type" value is the river name taken from the URL, "artist" in this example):

{"ok":true,"_index":"_river","_type":"artist","_id":"_meta","_version":1}
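
If you need more than one river, it can be handy to generate the definition rather than edit it by hand. The following is only a sketch: river_meta is a hypothetical helper, and the database, collection, index and type names in the usage line are made-up examples.

```shell
# Hypothetical helper: print a MongoDB river definition.
# Arguments: database, collection, index name, index type.
# Note: values are inserted as-is, with no JSON escaping.
river_meta() {
  cat <<EOF
{
  "type": "mongodb",
  "mongodb": { "db": "$1", "collection": "$2" },
  "index": { "name": "$3", "type": "$4" }
}
EOF
}

# Example usage (all names are illustrative); curl reads the body from stdin via "-d @-":
#   river_meta music artists music_index artist | curl -XPUT "localhost:9200/_river/artist/_meta" -d @-
```

One thing to watch: ElasticSearch requires index names to be lowercase, so pick a lowercase value for the index name.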

Done

That should be it. If you go to the Head GUI linked further up the post, you should see your index being populated. Test it with a query on the "Any Request" tab; try:

{"query":{"match":{"_all":{"query":"YourQuery","operator":"or"}}}}
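
The same query can also be run from the command line. This is a rough sketch rather than anything official: search_all is a hypothetical helper, it assumes ElasticSearch is listening on localhost:9200, and it does no JSON escaping, so keep the search text free of quotes and backslashes.

```shell
# Hypothetical helper: run the "_all" match query from above against one index.
# Arguments: index name, search text (inserted as-is, not JSON-escaped).
search_all() {
  index="$1"
  text="$2"
  curl -s -XPOST "http://localhost:9200/$index/_search?pretty" \
    -d "{\"query\":{\"match\":{\"_all\":{\"query\":\"$text\",\"operator\":\"or\"}}}}"
}

# Example usage (index name and search text are illustrative):
#   search_all music_index "YourQuery"
```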