Friday, April 25, 2014

Why you need a centralised logging system

When you run a large system serving millions of users, things cannot be done with one single server. It could be a distributed system that scales vertically, or a stateless service such as PHP websites or APIs spread across a fleet of servers. The problem is how you are going to store the logs. A centralised logging system comes in handy, as it can address the following constraints your system may run into:
  1. Running out of disk space to save logs
  2. Too many servers to trace a single log entry
  3. The need to analyse the logs
  4. Giving other services or teams a way to check their calls into your service

A list of Centralised logging system

You may want to keep your logging system separate from the online service, because if anything happens to the offline centralised logging system, you don't want it to affect the online service. Searching through the available complete solutions, you will find the following good candidates:
  • Logstash+Elasticsearch+Kibana
  • Flume+Elasticsearch+Kibana or Flume+HDFS+HIVE+PIG
  • Graylog2
  • Fluentd+MongoDB
I'm not here to compare the pros and cons. Here I am going to give an introduction to the Logstash+Elasticsearch+Kibana solution.

Official sites:
  • Logstash: http://logstash.net
  • Elasticsearch and Kibana: http://www.elasticsearch.org


Simple configuration

The simplest configuration is shown below: directly dump the log files into Elasticsearch (ES), and use Kibana to inspect them.
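
A minimal sketch of such a configuration (the log path and cluster name are placeholders for your setup):
#simplest logstash conf: tail log files and index them into ES
input {
  file {
    path => "/var/log/nginx/*.log"
  }
}
output {
  elasticsearch {
    # must match your ES cluster name
    cluster => "elasticsearch"
  }
}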


Complex configuration

When you consider the size of your system, such a simple configuration is usually not enough. Instead, you may need a more complex setup:

Extra components:
  • logstash-forwarder: a lightweight log-shipping agent, used to reduce the memory consumption of running a full logstash instance on every node.
  • RabbitMQ: a queue service acting as a buffer in between. The official suggestion is Redis; however, I believe RabbitMQ is better, as you can do a lot of other things with it, like monitoring the logstash pipeline just by checking the status of the queue and its TCP connections (see the sketch below).
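
For example, a quick pipeline health check from the RabbitMQ side (the columns are queue name, message backlog, and consumer count; a growing backlog or zero consumers means the consumer side is in trouble):
#list queues with backlog and consumer counts
sudo rabbitmqctl list_queues name messages consumers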

Installation and configuration:

Install Logstash

#install logstash
sudo wget https://download.elasticsearch.org/logstash/logstash/logstash-1.3.3-flatjar.jar
sudo mkdir /opt/logstash
sudo mv logstash-1.3.3-flatjar.jar /opt/logstash/logstash.jar
sudo wget http://logstash.net/docs/1.3.2/tutorials/10-minute-walkthrough/hello.conf
sudo wget http://logstash.net/docs/1.3.2/tutorials/10-minute-walkthrough/hello-search.conf
sudo mv hello.conf /opt/logstash/hello.conf
sudo mv hello-search.conf /opt/logstash/hello-search.conf
cd /opt/logstash/
#run the example configurations
java -jar logstash.jar agent -f hello.conf
java -jar logstash.jar agent -f hello-search.conf

Install Logstash-forwarder

Logstash-forwarder provides faster and more secure log collection than running a full logstash agent on every node. You can build the deb package yourself:
#install Go, then build lumberjack (logstash-forwarder's original name)
wget https://go.googlecode.com/files/go1.2.linux-amd64.tar.gz
tar -C /usr/local -xzf go1.2.linux-amd64.tar.gz
export PATH=$PATH:/usr/local/go/bin

sudo apt-get install rubygems
sudo gem install fpm
export PATH=$PATH:/var/lib/gems/1.8/bin
git clone https://github.com/jordansissel/lumberjack.git
cd lumberjack
make
make deb
I have already made the deb file; you can use it to install directly:
#use the deb file to install
wget https://dl.dropboxusercontent.com/s/7s7fplenhx768ii/logstash-forwarder_0.3.1_amd64.deb
sudo dpkg -i logstash-forwarder_0.3.1_amd64.deb

Configure Logstash

For both the logstash producer and the consumer you will have to write your own configuration. Below is an example for collecting the nginx access log:
sudo mkdir /etc/logstash
sudo vim /etc/logstash/logstash.conf
#logstash producer conf
input {
   lumberjack {
    # The port to listen on
    port => 5000

    # The paths to your ssl cert and key
    ssl_certificate => "/etc/logstash/logstash.crt"
    ssl_key => "/etc/logstash/logstash.key"

    # Set this to whatever you want.
    type => "nginx-accesslog"
  } 
}

filter {
  grok {
    match => ["message","%{IPORHOST:clientip} - (?:%{USER:ident}|-) \[%{HTTPDATE:timestamp}\] \"(?:%{WORD:verb} %{NOTSPACE:request}(?: HTTP/%{NUMBER:httpversion})?|%{DATA:rawrequest})\" %{QS:referrer} %{QS:agent} %{NUMBER:response} (?:%{NUMBER:bytes}|-) (?:%{NUMBER:requesttime}|-)"]
  }
}

output {
  rabbitmq {
    exchange => "slog1"
    host => "192.168.xxx.xxx"
    exchange_type => "topic" # We use topic here to enable pub/sub with routing keys
    key => "slog1"
  }
}

#logstash consumer conf
input {
  rabbitmq {
    queue => "xx"
    host => "192.168.xxx.xxx"
    durable => true
    key => "xx"
    #ack => false
    exchange => "xx" # This matches the exchange declared above
    auto_delete => false
    exclusive => false
  }
}

output {
  # Print each event to stdout.
  stdout {
    # Enabling 'rubydebug' codec on the stdout output will make logstash
    # pretty-print the entire event as something similar to a JSON representation.
    codec => rubydebug
  }

  # You can have multiple outputs. All events generally go to all outputs.
  # Output events to elasticsearch
  elasticsearch {
    # 'cluster' must match the cluster name configured on your elasticsearch
    # nodes, so that logstash can discover and join the cluster.
    cluster => "elasticsearch"
  }
}
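
With the configuration written, you can test either side in the foreground before daemonising it with the service wrapper below:
#run logstash in the foreground to verify the configuration
cd /opt/logstash
java -jar logstash.jar agent -f /etc/logstash/logstash.conf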

Now you may also need a logstash service wrapper to start and stop the service:
cd /etc/init.d
sudo vim logstash
#logstash service wrapper
#! /bin/sh
 
### BEGIN INIT INFO
# Provides:          logstash
# Required-Start:    $remote_fs $syslog
# Required-Stop:     $remote_fs $syslog
# Default-Start:     2 3 4 5
# Default-Stop:      0 1 6
# Short-Description: Start daemon at boot time
# Description:       Enable service provided by daemon.
### END INIT INFO
 
. /lib/lsb/init-functions
 
name="logstash"
logstash_bin="/usr/bin/java -- -jar /opt/logstash/logstash.jar"
logstash_conf="/etc/logstash/logstash.conf"
logstash_log="/var/log/logstash.log"
pid_file="/var/run/$name.pid"
 
start () {
        command="${logstash_bin} agent -f $logstash_conf --log ${logstash_log}"
 
        log_daemon_msg "Starting $name" "$name"
        if start-stop-daemon --start --quiet --oknodo --pidfile "$pid_file" -b -m --exec $command; then
                log_end_msg 0
        else
                log_end_msg 1
        fi
}
 
stop () {
        log_daemon_msg "Stopping $name" "$name"
        start-stop-daemon --stop --quiet --oknodo --pidfile "$pid_file"
}
 
status () {
        status_of_proc -p $pid_file "" "$name"
}
 
case $1 in
        start)
                if status; then exit 0; fi
                start
                ;;
        stop)
                stop
                ;;
        reload)
                stop
                start
                ;;
        restart)
                stop
                start
                ;;
        status)
                status && exit 0 || exit $?
                ;;
        *)
                echo "Usage: $0 {start|stop|restart|reload|status}"
                exit 1
                ;;
esac
 
exit 0

sudo chmod 755 logstash
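
The wrapper can then be used like any other init script and registered to start at boot:
#control the service and enable it at boot (Debian/Ubuntu)
sudo service logstash start
sudo service logstash status
sudo update-rc.d logstash defaults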

Configure Logstash-forwarder

sudo vim /etc/logstash-forwarder
#change the configuration file
{
  "network": {
    "servers": [ "localhost:5000"],
    "ssl ca": "/etc/logstash/logstash.crt",
    "timeout": 15
  },

  "files": [
    {
      "paths": [
        "/var/log/nginx/shabikplus.access_log"
      ],
      "fields": { "host": "192.168.xxx.xxxx" }
    }
  ]
}
#generate the openssl key and certificate referenced in the configs above (copy logstash.crt to every forwarder host)
sudo openssl req -x509 -batch -nodes -newkey rsa:2048 -keyout /etc/logstash/logstash.key -out /etc/logstash/logstash.crt
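
Assuming the fpm-built deb installs the binary under /opt/logstash-forwarder (check with dpkg -L logstash-forwarder if yours differs), you can then run the forwarder against this configuration:
#run the forwarder with the config above
sudo /opt/logstash-forwarder/bin/logstash-forwarder -config /etc/logstash-forwarder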

Install Elasticsearch

Refer to another post of mine: http://jakege.blogspot.sg/2014/03/how-to-install-elasticsearch.html

Configure Elasticsearch

You can use an ES index template to manage how your logstash indices are mapped and stored:
#template:
{
    "template": "logstash-*",
    "settings" : {
        "number_of_shards" : 5,
        "number_of_replicas" : 0,
        "index.cache.field.type" : "soft",
        "index.refresh_interval" : "5s",
        "index" : {
            "query" : { "default_field" : "message" },
            "store" : { "compress" : { "stored" : true, "tv": true } }
        }
    },
    "mappings": {
        "_default_": {
            "_all": { "enabled": false },
            "_source": { "compress": true },
            "_ttl": { "enabled": true, "default": "4w" },
            "dynamic_templates": [
                {
                    "string_template" : {
                        "match" : "*",
                        "mapping": { "type": "string", "index": "not_analyzed" },
                        "match_mapping_type" : "string"
                     }
                 }
             ],
             "properties" : {
                "fields": { "type": "object", "dynamic": true, "path": "full" },
                "message": { "type" : "string", "index" : "not_analyzed" },
                "agent" : { "type" : "string", "index" : "analyzed" },
                "request" : { "type" : "string", "index" : "analyzed" },
                "host" : { "type" : "string", "index" : "not_analyzed" },
                "clientip" : { "type" : "string", "index" : "not_analyzed" },
                "file" : { "type" : "string", "index" : "not_analyzed" },
                "bytes": { "type": "integer"},
                "offset": {"type": "integer"},
                "requesttime": {"type": "float"},
                "@timestamp": { "type": "date", "index": "not_analyzed" },
                "timestamp": {"type":"string", "index": "not_analyzed"},
                "type": { "type": "string", "index": "not_analyzed" }
            }
        }
    }
}
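
To apply it, save the JSON above as template.json and register it through the template API (host and template name are your choice):
#register the template so new logstash-* indices pick it up
curl -XPUT 'http://localhost:9200/_template/logstash' -d @template.json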

Manage Elasticsearch Indices

Usually we manage our indices with Elasticsearch Curator: https://github.com/elasticsearch/curator. You can schedule optimise, close, and delete operations with the tool, as sketched below.
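
If you would rather not install another tool, the same housekeeping can be scripted directly against the REST API; a minimal sketch with example index names:
#close an index you rarely query but want to keep
curl -XPOST 'http://localhost:9200/logstash-2014.03.01/_close'
#optimise an old index down to fewer segments
curl -XPOST 'http://localhost:9200/logstash-2014.03.02/_optimize?max_num_segments=2'
#delete an index you no longer need
curl -XDELETE 'http://localhost:9200/logstash-2014.03.01'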

Install Kibana

Kibana 3 is just a static website; extract it and open index.html in the browser:
wget https://download.elasticsearch.org/kibana/kibana/kibana-3.0.0milestone4.tar.gz
tar xzvf kibana-3.0.0milestone4.tar.gz
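
Out of the box Kibana 3 expects Elasticsearch on the same host as the browser; if your ES lives elsewhere, point config.js at it first:
#point kibana at your elasticsearch instance
cd kibana-3.0.0milestone4
vim config.js   #set the elasticsearch entry, e.g. "http://your-es-host:9200"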

[Screenshot of the final solution]
