Wednesday, June 29, 2016

Elasticsearch Install on Linux

Introduction

Elasticsearch (ES) is an easy-to-use yet powerful search platform. Built on the trusted Lucene library, ES offers everything Lucene does and more, such as replication and sharding for horizontal scaling and faceted filtering (aggregations) of results. One of its biggest advantages over Solr is that you can get an instance up and running in less time. All you have to do is install, add documents, and then query for results. Of course, there is much more to it if you want to get the most out of the platform, but you can have a simple system up and running in no time.


Installation

  1. sudo -i (root user)
  2. yum install java-1.8.0-openjdk.x86_64 (latest Java)
  3. java -version (verify)
  4. wget https://download.elastic.co/elasticsearch/release/org/elasticsearch/distribution/rpm/elasticsearch/2.3.3/elasticsearch-2.3.3.rpm (download)
  5. rpm -ivh elasticsearch-2.3.3.rpm (install)
Elasticsearch is now installed in:
/usr/share/elasticsearch/

Configuration files are in:
/etc/elasticsearch

Init script in:
/etc/init.d/elasticsearch

Execution Commands:
service elasticsearch start
service elasticsearch stop
service elasticsearch restart
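
After a start or restart, the node can take a few seconds before it answers on its HTTP port. The following is a minimal sketch of a wait loop, assuming curl is installed and ES listens on the default port 9200. The function name wait_for_es is hypothetical and not part of Elasticsearch.

```shell
# Poll the ES HTTP port until it answers or the attempts run out.
# $1 = base URL (default http://localhost:9200), $2 = max attempts (default 30).
# wait_for_es is a hypothetical helper name, not part of Elasticsearch.
wait_for_es() {
  url="${1:-http://localhost:9200}"
  attempts="${2:-30}"
  i=0
  while [ "$i" -lt "$attempts" ]; do
    # curl exits 0 as soon as it gets any HTTP response from the node;
    # --noproxy '*' bypasses any HTTP proxy, since the node is addressed directly
    if curl -s -o /dev/null --max-time 2 --noproxy '*' "$url"; then
      echo "up"
      return 0
    fi
    i=$((i + 1))
    sleep 1
  done
  echo "down"
  return 1
}
```

A possible use is: service elasticsearch start && wait_for_es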


Configuration

Two configuration files are located in: /etc/elasticsearch

elasticsearch.yml - Everything except logging
logging.yml - Logging (by default all logs are in: /var/log/elasticsearch)

For the most part, all your config changes should take place in the elasticsearch.yml file. This is where you decide whether ES listens only on localhost or is also bound to an external IP address. In a production environment you would choose the localhost option. If you do have to bind to an external IP for the purpose of testing and visualizing your data, make sure these config lines are present:

network.host: localhost
network.bind_host: 0.0.0.0
http.port: 9200

In your firewall settings, please open up port 9200 to all incoming traffic.  This is the default port for ES.
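
The post does not say which firewall is in use. As a sketch only, and assuming iptables, the helper below prints (but does not run) a rule that accepts TCP traffic on a given port, optionally limited to one source address or range. The name allow_port_rule and the example range 203.0.113.0/24 are illustrative assumptions.

```shell
# Print (not run) an iptables rule that accepts TCP traffic on a port.
# $1 = port, $2 = optional source address or CIDR to restrict the rule to.
# Assumes iptables; allow_port_rule is a hypothetical helper name.
allow_port_rule() {
  port="$1"
  source_cidr="${2:-}"
  if [ -n "$source_cidr" ]; then
    printf 'iptables -I INPUT -p tcp -s %s --dport %s -j ACCEPT\n' "$source_cidr" "$port"
  else
    printf 'iptables -I INPUT -p tcp --dport %s -j ACCEPT\n' "$port"
  fi
}
```

For example, allow_port_rule 9200 203.0.113.0/24 prints a rule that only admits that range. Review the printed line before running it as root.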


Usage

ES does not officially have a web UI that displays system status and health the way Solr does. ES performs all of its tasks through a RESTful API. Therefore, if you point your browser to:

http://IP:9200

All you will see is a JSON response object. If the installation was successful, it shows basic info about the cluster and the node. Some other URLs you can try are:

http://IP:9200/
http://IP:9200/_nodes
http://IP:9200/_cluster/health
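
Since these URLs return JSON, they are also easy to check from a script. The cluster health response carries a "status" field whose value is green, yellow or red. The sketch below pulls that value out with sed. The helper name health_status is hypothetical, and a real JSON parser would be more robust than this pattern match.

```shell
# Read a _cluster/health JSON response on stdin and print its "status"
# value (green, yellow or red). health_status is a hypothetical helper.
health_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([a-z]*\)".*/\1/p' | head -n 1
}
```

A possible use is: curl -s 'http://IP:9200/_cluster/health' | health_status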


GUI Tools

Having a tool to visualize your data would be great. Several tools are available, and they all make RESTful calls to ES and turn the resulting JSON into something more pleasant to look at. One such tool is the Head plugin. It displays system status visually instead of as raw JSON and lets you run search queries.

To install, issue the following commands:

cd /usr/share
elasticsearch/bin/plugin install mobz/elasticsearch-head

To view, point your browser here:
http://IP:9200/_plugin/head/

With this tool, you can see your cluster, nodes, and shards, as well as issue search commands and much more.


Another option is Kibana. It also visualizes your data, and it can additionally render results as graphs and charts for more detailed analysis.

To install, please issue the following commands:
  1. sudo -i
  2. rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch (download and install public signing key)
  3. nano /etc/yum.repos.d/kibana.repo (create repo)
  4. Copy, paste, save, exit:

    [kibana-4.5]
    name=Kibana repository for 4.5.x packages
    baseurl=http://packages.elastic.co/kibana/4.5/centos
    gpgcheck=1
    gpgkey=http://packages.elastic.co/GPG-KEY-elasticsearch
    enabled=1
  5. yum install kibana
  6. service kibana start

To view, please point your browser here:

http://IP:5601/

Again, please make sure your firewall inbound rules allow TCP connections to port 5601.

The default config file should allow external connections. If you want to disable that in a production environment, change 0.0.0.0 to localhost in this file:

/opt/kibana/config/kibana.yml


Mappings and Schemas 

A major difference between ES and Solr is that in Solr, the index schema has to be defined before you can add documents to the index. Field types such as string or integer, and whether or not a field is analyzed, have to be declared right at the beginning. With ES, you can start adding documents to an index right away. If the index does not exist, it will be created. ES has mappings rather than schemas. A mapping defines what the fields are and what their datatypes are. If a mapping is not defined, ES guesses each field's type from the first document in which the field appears. The only complication with this approach arises if your data changes after that first document was added. If you keep your data consistent, there is no reason to worry. Even if you change your mind in the future, it is easy enough to define a new mapping in a new index and re-index everything from the old index into the new one. This is necessary because once a field is defined in a mapping, you cannot change it.
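
To make the guessing concrete, here is a deliberately simplified model of it in shell. It is not how ES implements dynamic mapping. It only mirrors the basic rules for ES 2.x (whole numbers become long, decimals become double, true/false become boolean, quoted values become string) and ignores date detection, objects and arrays. It is also stricter than ES, which can coerce some values. The names guess_type and check_consistent are hypothetical.

```shell
# Guess a field type from a raw JSON value, roughly as dynamic mapping would.
# Simplified: no date detection, no objects or arrays.
guess_type() {
  case "$1" in
    true|false)          echo "boolean" ;;
    \"*\")               echo "string" ;;
    -[0-9]*.*|[0-9]*.*)  echo "double" ;;
    -[0-9]*|[0-9]*)      echo "long" ;;
    *)                   echo "other" ;;
  esac
}

# Compare the type guessed from the first value of a field with the type
# of a value that arrives in a later document.
check_consistent() {
  first="$(guess_type "$1")"
  later="$(guess_type "$2")"
  if [ "$first" = "$later" ]; then
    echo "ok: both $first"
  else
    echo "conflict: mapped as $first, got $later"
  fi
}
```

For instance, check_consistent 1 '"many"' reports a conflict, which is the situation where a field first seen as a number later arrives as free text.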


Sample Commands

Add a new document and show response formatted neatly:

curl -X PUT 'localhost:9200/tutorial/helloworld/1?pretty' -d '
    {
      "message": "Hello People!"
    }' 

This document will be stored in the "tutorial" index with the type "helloworld".
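
The URL in that command follows the pattern host/index/type/id. A small sketch that assembles it from its parts may make the structure easier to see. The helper name doc_url is hypothetical, and the default host simply matches the examples in this post.

```shell
# Build the REST URL for a single document: host/index/type/id.
# $1 = index, $2 = type, $3 = document id, $4 = optional host:port.
# doc_url is a hypothetical helper name.
doc_url() {
  index="$1"
  type="$2"
  id="$3"
  host="${4:-localhost:9200}"
  printf '%s/%s/%s/%s?pretty\n' "$host" "$index" "$type" "$id"
}
```

With it, the command above could be written as: curl -X PUT "$(doc_url tutorial helloworld 1)" -d '{"message": "Hello People!"}'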


Retrieve a document and show response formatted neatly:

curl -X GET 'localhost:9200/tutorial/helloworld/1?pretty'


Search for a term in specific fields:

curl -X GET "http://localhost:9200/_search?pretty" -d'
{
    "query": {
        "query_string": {
            "query": "hello",
            "fields": ["id","message"]
        }
    }
}'
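
If you run this kind of search often, the request body can be generated instead of typed. The sketch below builds the same query_string body from a search term and a list of field names. The helper name query_string_body is hypothetical, and the term is inserted as-is, so it must not contain characters that need JSON escaping.

```shell
# Print a query_string search body.
# $1 = search term (not JSON-escaped), remaining arguments = field names.
# query_string_body is a hypothetical helper name.
query_string_body() {
  term="$1"
  shift
  fields=""
  for f in "$@"; do
    # Separate entries with commas, quoting each field name
    if [ -n "$fields" ]; then
      fields="$fields,"
    fi
    fields="$fields\"$f\""
  done
  printf '{"query":{"query_string":{"query":"%s","fields":[%s]}}}\n' "$term" "$fields"
}
```

A possible use is: curl -X GET 'http://localhost:9200/_search?pretty' -d "$(query_string_body hello id message)"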


Retrieve a mapping:

curl -X GET 'http://localhost:9200/_mapping?pretty'


Add a new field to existing mapping:

curl -X PUT 'http://localhost:9200/tutorial/helloworld/_mapping' -d '
{
    "helloworld" : {
        "properties" : {
            "message2" : {"type" : "string", "store" : "true"}
        }
    }
}'


Re-index documents from one index to another without retaining versions:

    curl -X POST 'localhost:9200/tutorial2' -d '{
     "mappings" : {
      "helloworld2" : {
       "properties" : {
        "message" : { "type" : "string", "store": "true" }
       }
      }
     }
    }'

    curl -X POST 'localhost:9200/_reindex' -d '{
     "source" : {
      "index" : "tutorial"
     },
     "dest" : {
      "index" : "tutorial2",
      "version_type": "internal"
     }
    }'
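
One way to sanity-check a re-index is to compare document counts in the old and new index using the _count endpoint, whose response contains a "count" field. The sketch below extracts that number. The helper name count_from_json is hypothetical, and a real JSON parser would be more robust than this pattern match.

```shell
# Read a _count JSON response on stdin and print the number in its
# "count" field. count_from_json is a hypothetical helper name.
count_from_json() {
  sed -n 's/.*"count"[[:space:]]*:[[:space:]]*\([0-9][0-9]*\).*/\1/p' | head -n 1
}
```

A possible use is to run curl -s 'localhost:9200/tutorial/_count' | count_from_json and the same for tutorial2, then compare the two numbers. A freshly re-indexed index may need a moment (or a refresh) before its count is up to date.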


Summary

We have just scratched the surface of what ES can do. ES is great if time is crucial and little of it is allocated for setup. A common pitfall is that, because ES is so easy to set up, novice developers can run into long-term issues and end up spending time reconfiguring their indexes via mapping updates and re-indexing. Solr guards against data inconsistencies because you have to know in advance what the data is supposed to look like. ES allows for changes, and changes can create problems with searching. A dynamic mapping is created upon initial document insertion. If later documents carry fields whose datatypes are inconsistent with the definitions in that mapping, you will run into errors, either when the documents are indexed or when they are searched. There are also other reasons to change mapping definitions besides changing your mind about field datatypes. A good one is to index fields in a different way, for example to allow case-insensitive searching or partial matches in strings.
