Monday, June 16, 2014

Elasticsearch Notes

Elasticsearch short introduction

Elasticsearch use case

  • GitHub – search code base(Github uses Elasticsearch to search 20TB data,including 1.3 billion files  and 130 billion code lines)
  • Foursquare – location search(50 million location data search)
  • SoundCloud – music search(provide music search service for 180 million users)
  • Fog Creek – search code base(support 30 million search in 40 billion code lines every month)

Elasticsearch installation

Java install

cd ~
sudo apt-get update
sudo apt-get install openjdk-7-jre-headless -y

install elasticsearch 1.2.1

wget https://download.elasticsearch.org/elasticsearch/elasticsearch/elasticsearch-1.2.1.deb
sudo dpkg -i elasticsearch-1.2.1.deb
sudo service elasticsearch start

Configuration

  • system config file: /etc/elasticsearch/elasticsearch.yml
    • Application-wide settings (zen discovery, available analyzers)
    • index default configurations (number of shards)
    • Cluster Name and Node Name
  • log config file:  /etc/elasticsearch/logging.yml, default logging files are located in /var/log/elasticsearch/
  • Another important part is tuning your operating system
    • Elasticsearch will created several files when indexing, so the system cannot limit the open file descriptors to less than 32000, can be edit in /etc/security/limits.conf
    • ES is written in Java and obviously runs inside a JVM. The most apparent JVM option is -Xmx. I set it to about 50% of the total physical memory, it happened to be a 64GB of RAM machine, so I set the ES_HEAP_SIZE size to 32GB. 
Cluster name is important because elasticsearch will auto discover new nodes and connect nodes in to cluster based on cluaster name. Node name is also configured to make management easier.

Elasticsearch distribution model

  • Node
    • A running ES instance, or process running on a machine
    • Run on same or different machine
    • Testing: same machine can have several nodes. Production:suggestion one machine single node
  • Cluster
    • Distributed es system made of several nodes
    • Dynamic master election, no single node fail(fail as a whole)
    • Communication between nodes and data distribution and balancing is automatically handled
    • View as a whole from outside
  • Index
    • Multiple index(like mysql database) support
    • Multiple types(like mysql tables) inside indexes
  • Shard
    • Building blocks of index, index is divided in to shards
    • Each Shard is a Lucence index
    • Shards will be placed on different machines
    • Moving of shards does not require reindex
  • Replica
    • Each shards can have 0 or more replicas
    • True copy of primary shard
    • Increase system fault tollerace, and search performance

Usage

Please refer to http://www.elasticsearch.org/resources/ for more information or please take a look at the ppt attachment in the first section.

Monitoring

We use ElastcHQ to monitor and manage our cluster, it's simply a local website. No configuration is needed.
Important coloring of cluster status:
  • Red – Index cannot be used, some primary shards are not allocated
  • Yellow – Index can still be used, but some replicas are not allocated
  • Green – Index run as normal

Manage outdated indexes

We use elasticsearch-curator to manage our outdated indexes

Installation

cd
sudo git clone https://github.com/elasticsearch/curator.git curator
cd curator
sudo python setup.py install

Configuration cron job

Manage indexes with test- as a prefix and - as date separator(example index name: test-2014-06-18), to delete indexes order than 4 days and close indexes order than 1 day:
20 12 * * * /usr/local/bin/curator --host 192.168.1.1 -s - --prefix test- -d 4 -c 1 -s -

Some problems

Address yellow or red cluster status

red status usually caused by cluster fail, will be fixed by restarting problem nodes(take a look at the logs located at /var/log/elasticsearch/{cluster name}.log)
 yellow status usually means some replica shards are not allocated. Can be solved by manually allocate the shard.
  1. Take a look at the cluster management tool (ElasticHQ) for index status: you can find out which shard fails, for example: shard 1 is not allocated
  2. find an available node which the shard is not allocated on, say Xemu.
  3. Invoke api to reroute:
    1. put the following json in a file reroute.json
      {
          "commands" : [
             {
                "allocate" : {
                    "index" "test""shard" 1"node" "Xemu"
                }
              }
          ]
      }
    2. invoke
      curl -XPOST 'localhost:9200/_cluster/reroute' -d @reroute

Add replicas on the fly

curl -XPUT 'localhost:9200/test/_settings' -d '
{
    "index" : {
        "number_of_replicas" 1
    }
}'

18 comments:

  1. This comment has been removed by the author.

    ReplyDelete
  2. Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.

    rpa training in velachery| rpa training in tambaram |rpa training in sholinganallur | rpa training in annanagar| rpa training in kalyannagar

    ReplyDelete
  3. I love the blog. Great post. It is very true, people must learn how to learn before they can learn. lol i know it sounds funny but its very true. . .

    Data Science course in kalyan nagar | Data Science course in OMR
    Data Science course in chennai | Data science course in velachery
    Data science course in jaya nagar | Data science training in tambaram

    ReplyDelete
  4. This is ansuperior writing service point that doesn't always sink in within the context of the classroom. In the first superior writing service paragraph you either hook the reader's interest or lose it. Of course your teacher, who's getting paid to teach you how to write an good essay, 
    python training in rajajinagar | Python training in btm | Python training in usa

    ReplyDelete
  5. Thanks for sharing such an informative post with us. I hope you will share more such post. Please keep sharing.
    Apple Service center in Ameerpet, Hyderabad
    Best Laptop Service center in Ameerpet, Hyderabad

    ReplyDelete
  6. You can install office setup by visiting official website of MS Office.
    Office.com/setup

    ReplyDelete
  7. Students अपने विश्वविद्यालयों के Results के बारे में नवीनतम अपडेट की जाँच करें। Timetable-Result.com यह साइट आपकी सभी क्वेरी को समझने में मदद करेगी है।
    BA 3rd semester Exam result 2020
    BCom Final/3rd Year Exam result 2020

    BA Exam result Part 1st, 2nd, Final Year
    BCom Exam result 1st, 2nd, 3rd semester

    ReplyDelete
  8. Watching TV the historic common way is notable till you are caught away from domestic and pass over tonight's recreation or a new episode of your favourite show. With FuboTV, you can watch stay TV channels and stay sports activities somewhere — even away from domestic — the usage of a gadget like your smartphone or tablet. how to use fubo tv guide

    ReplyDelete
  9. Hi I read your blog and found that your blog is full of informative content. So keep posting thanks for share this article. Your article is very amazing. I like your article. thanks you ones again.
    B.Com Theory Exam Routine

    bcom 1st year date sheet | B Com Part 2 Schedule | b com 3rd year time time

    ReplyDelete
  10. Really great article, Glad to read the article. It is very informative for us. Thanks for posting.
    Satta King
    Disawar

    ReplyDelete
  11. Sarswatienterprises is a trusted Die Set Manufacturers, Power Press Manufacturer, and Flip off Seals Machinery in Delhi, India. For more information visit our website.
    Air Blower Machine Manufacturer in Delhi

    ReplyDelete
  12. Amazing Blog. Visit AAR Fragrance for Perfumes for Men Online at Best Prices in India, Buy Perfumes for Women Online in India, Davidoff Cool Water For Men EDT 125ml, and Versace Pour Homme Dylan Blue EDT 100ml.
    Buy Perfumes for Women Online in India

    ReplyDelete
  13. Nice information, thank you so much sharing with us. Visit Amfez for Thakur ji ke vastra, kanha ji poshak, and krishna dress at an affordable price.
    krishna dress

    ReplyDelete