
This post is about how to aggregate your log files in a single place and make them easily searchable via a convenient web interface.

You might think: wait a minute; doesn’t this exist already? Well, yes and no. Let me explain.
a) Traditional log management: traditionally, logs are (or were) searched for certain strings, typically error messages or parts of error messages, by a shell script run from cron. When such a string was found, a system that was set up to be able to email would send a notification that the string had been found (a minimal sketch of this approach follows this list).
b) Log management via a monitoring tool: the next step was that the individual script on each server was replaced by a monitoring tool, which performed the same task as the shell script. In reality, quite often a default monitoring set or template was enabled instead of the specific strings the shell script searched for. Sometimes this was an improvement, sometimes it meant the specific issues (and thus messages) became invisible. This is still the way monitoring works in 90% of the cases, using a general, completely standard monitoring template. At least in my experience.
c) Other log gathering and indexing: there are many more products that perform this function. The first one that comes to mind is Splunk, plus all the options for doing this cloud based (many there!), and a lot of tools based on Elasticsearch, like the “ELK stack”.
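
To make the approach in a) concrete, here is a minimal sketch; the alert.log path and the email address are examples, not taken from a real setup, and a real script would also keep track of what it already reported:

#!/bin/bash
# minimal sketch of the traditional approach: run from cron, for example every 5 minutes.
# grep the alert.log for ORA- messages and send a mail when one is found.
if grep -q "ORA-" /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log; then
  mail -s "ORA- error found on $(hostname)" dba@example.com < /dev/null
fi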

I think it’s clear “a)” and “b)” are not dynamic and in fact very static. My major concern is that they don’t allow exploratory investigation: a warning is simply raised, and any investigation means you have to log on and start browsing the available information locally. Everybody who has worked with HP OpenView or Oracle Enterprise Manager will recognise this. Yes, it’s probably all possible with these tools, but it never (ever(!)) is implemented.

For the last category, the likes of Splunk, the ELK stack and the cloud based log tools: I feel the first two definitely serve a function, but they are aimed at aggregating logs of multiple servers, and are simply too much to set up on a server alongside the processes they are meant to monitor. For the cloud based tools: it might be my conservatism, but getting a subscription and uploading logging feels awkward, especially if it’s for my own test purposes.

This is where Loki comes in. Finally, I would say, there is a tool that can function on a small scale (!) and perform log aggregation and provide a searchable database without a huge setup. This is ideal for your own test or development box, to be able to discover what is going on, and have all the log files at your fingertips without endlessly going through the filesystem performing cd, tail, grep, wc, sort, uniq, et cetera. I think lots of people recognise travelling from log file to log file.

Loki gives you a database that orders log entries based on time, and Grafana provides a web based UI to view and query the Loki database. This is what it looks like:

This is an example: what it shows is my test machine, where I decided to see when Linux started, as well as when the Oracle database instance was started.
* The first query uses the label job with the value “rdbms_alert” (indicating it came from the text-based Oracle alert.log file; this is a label I added to it), and within the log lines with that label, I added a filter for the string “Starting ORACLE”, which indicates an Oracle database instance start.
* The second query uses the label job with the value “messages” (indicating it came from the Linux /var/log/messages file; this label is mine too), and within the log lines with that label, I added a filter for the string “kernel: Command line”, which indicates Linux startup. I additionally added a negative filter for “loki”, because loki logs the queries to the messages file, which I don’t want to see. Both queries are shown as LogQL below.
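
Written as LogQL (Loki’s query language), and assuming the label name job with the values described above, the two queries look roughly like this:

{job="rdbms_alert"} |= "Starting ORACLE"
{job="messages"} |= "kernel: Command line" != "loki"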

I hope you can see the power of having all the logs in a single place, and completely searchable.

This is just a start; this is a very simple proof-of-concept setup. For example, the date/time in the log lines is not used: the date/time of a log line is the moment it was ingested into loki. It is possible to have loki interpret the timestamps in the lines themselves; a sketch of what that could look like in promtail follows below.
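
A minimal sketch of such a timestamp configuration in the promtail scrape config; please mind the alert.log path and the ISO-8601 style timestamp format are assumptions for illustration, not part of my setup scripts:

scrape_configs:
  - job_name: rdbms_alert
    static_configs:
      - targets: [localhost]
        labels:
          job: rdbms_alert
          __path__: /u01/app/oracle/diag/rdbms/*/*/trace/alert_*.log   # example path
    pipeline_stages:
      # extract the leading timestamp of each line, for example 2020-05-01T12:00:00.123456+02:00
      - regex:
          expression: '^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})'
      # use that timestamp as the log entry time instead of the ingestion time
      - timestamp:
          source: ts
          format: RFC3339Nano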

If you are interested, but are uncertain if this is for you, and would like to test this: I’ve got a couple of Ansible scripts that can set up the combination of:
* promtail (the default loki log streaming tool)
* loki (the database)
* Grafana (the web UI)
The scripts are created on Oracle Linux 7.8.

Install git, and clone the https://gitlab.com/FritsHoogland/loki-setup.git repository using a user that has sudo rights:

git clone https://gitlab.com/FritsHoogland/loki-setup.git

Install ansible (you might have to install the EPEL repository).
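
On Oracle Linux 7 that would look something like this (the same EPEL repository install is used further down in this post):

# yum -y localinstall https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum -y install ansible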

In the loki-setup repo you find the setup scripts for loki, promtail and grafana.
You can execute the scripts in that order (loki, promtail and grafana) by running the setup_loki.yml, setup_promtail.yml and setup_grafana.yml playbooks.
IMPORTANT: do proofread the scripts, and validate the variables for your situation. Don’t worry: the scripts are easy to read.
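
Running them could look like this, assuming the playbooks are executed from inside the cloned repository by the sudo-capable user mentioned above:

$ cd loki-setup
$ ansible-playbook setup_loki.yml
$ ansible-playbook setup_promtail.yml
$ ansible-playbook setup_grafana.yml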

After installing, you can go to http://yourmachine:3000, login with admin and the password you set in the setup_grafana.yml script, and click on the compass (explore) option, and you can perform your own log browsing.

If you decide you want to remove it, I got a remove script for each of the components, which will remove the specific component entirely. Same applies here too: validate the script.

This blogpost is a follow-up of this blogpost, with the exception that the install method in this blogpost is way easier: it uses an Ansible playbook to do most of the installation.

1. Install git and ansible via EPEL:

# yum -y localinstall https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum -y install ansible git

2. Clone my ‘prometheus_node_exp_grafana_install’ repository:

# git clone https://gitlab.com/FritsHoogland/prometheus_node_exp_grafana_install.git

3. Run the prometheus.yml playbook to install prometheus, node_exporter and grafana:

# cd prometheus_node_exp_grafana_install
# ansible-playbook prometheus.yml

Now all you need to do is set up grafana (at port 3000) to talk to prometheus to fetch data.
If you are interested, the memory usage dashboard can be imported: dashboard id 2747.

In fact, it seems node_exporter has renamed the operating system statistics, so the old Linux memory dashboard didn’t work anymore. I updated it and uploaded a new version of the memory usage dashboard.
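
To give an idea of the kind of rename (my assumption, based on node_exporter 0.16 and later adding the unit to the metric names):

node_memory_MemFree   ->  node_memory_MemFree_bytes
node_memory_Cached    ->  node_memory_Cached_bytes
node_memory_MemTotal  ->  node_memory_MemTotal_bytes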

Prometheus is an open source systems monitoring and alerting toolkit originally built at SoundCloud. This blogpost shows how to install the needed components to do visualisation of linux system statistics via Grafana.

Addition June 29, 2018: a really quick and simple install is provided in this blogpost: very quick install of prometheus, node exporter and grafana. This uses an ansible script that does most of the installation and configuration for you.

The setup consists of 3 components:
node_exporter, an exporter of system and hardware metrics.
prometheus, a metric collection and persistence layer.
grafana, the visualisation layer.

1. Preparation
The needed components are installed in the home directory of the user ‘prometheus’. In order for that user to exist, it must obviously first be created:

# useradd prometheus
# su - prometheus
$

This installation guide uses Oracle Linux 7.3, but should work for RHEL or CentOS too.

2. Node exporter
The next thing to do is install the node exporter. Please mind new versions do come out, so you might want to verify the latest release on the node_exporter releases page on github.

$ curl -LO "https://github.com/prometheus/node_exporter/releases/download/v0.14.0/node_exporter-0.14.0.linux-amd64.tar.gz"
$ mkdir -p Prometheus/node_exporter
$ cd $_
$ tar xzf ../../node_exporter-0.14.0.linux-amd64.tar.gz

Now become root and create a unit file to automatically startup the node exporter using systemd:

# echo "[Unit]
Description=Node Exporter

[Service]
User=prometheus
ExecStart=/home/prometheus/Prometheus/node_exporter/node_exporter-0.14.0.linux-amd64/node_exporter

[Install]
WantedBy=default.target" > /etc/systemd/system/node_exporter.service

And make systemd start the node exporter:

# systemctl daemon-reload
# systemctl enable node_exporter.service
# systemctl start node_exporter.service

Next you can verify if the node exporter is running by using ‘systemctl status node_exporter.service’:

# systemctl status node_exporter.service
● node_exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-07-31 15:20:54 UTC; 7s ago
 Main PID: 3017 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─3017 /home/prometheus/Prometheus/node_exporter/node_exporter-0.14.0.linux-amd64/node_exporter

Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - hwmon" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - infiniband" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - textfile" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - conntrack" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - diskstats" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - entropy" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - loadavg" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - sockstat" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - wifi" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg="Listening on :9100" source="node_exporter.go:186"

Additionally, you can go to hostname:9100, and look if that page says ‘Node Exporter’ and has a link called ‘Metrics’, which shows all the metrics.
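
This can also be checked from the command line; a quick sketch:

$ curl -s localhost:9100/metrics | grep "^node_" | head -5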

3. Prometheus
After we installed node_exporter to provide measurements, we must install the software that can fetch that information and store it. That is what prometheus does. First, become the prometheus user again, and install prometheus. Here too it is important to realise that newer versions will have come out after this article was written:

# su - prometheus
$ curl -LO "https://github.com/prometheus/prometheus/releases/download/v1.7.1/prometheus-1.7.1.linux-amd64.tar.gz"
$ cd Prometheus
$ tar xzf ../prometheus-1.7.1.linux-amd64.tar.gz
$ cd prometheus-1.7.1.linux-amd64
$ echo "scrape_configs:

  - job_name: 'prometheus'
    scrape_interval: 1s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scrape_interval: 1s
    static_configs:
      - targets: ['localhost:9100']"> prometheus.yml

This downloaded and unpacked prometheus, and created a prometheus scrape config to fetch data from prometheus itself and from the node exporter. Now become root, and install the systemd unit file for prometheus:

# echo "[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
Restart=on-failure
ExecStart=/home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/prometheus -config.file=/home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/prometheus.yml -storage.local.path=/home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/data

[Install]
WantedBy=multi-user.target" > /etc/systemd/system/prometheus.service

And make systemd start prometheus:

# systemctl daemon-reload
# systemctl enable prometheus.service
# systemctl start prometheus.service

And verify prometheus is running:

# systemctl status prometheus.service
● prometheus.service - Prometheus Server
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-07-31 15:36:55 UTC; 9s ago
     Docs: https://prometheus.io/docs/introduction/overview/
 Main PID: 22656 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─22656 /home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/prometheus -config.file=/home/prometheus/Prometheus/prometheus-1.7.1....

Jul 31 15:36:55 test.local systemd[1]: Started Prometheus Server.
Jul 31 15:36:55 test.local systemd[1]: Starting Prometheus Server...
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Starting prometheus (version=1.7.1, branch=mast...n.go:88"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Build context (go=go1.8.3, user=root@0aa1b7fc43...n.go:89"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Host details (Linux 3.10.0-514.26.2.el7.x86_64 ...n.go:90"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Loading configuration file /home/prometheus/Pro....go:252"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Loading series map and head chunks..." source="....go:428"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="0 series loaded." source="storage.go:439"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Starting target manager..." source="targetmanager.go:63"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Listening on :9090" source="web.go:259"
Hint: Some lines were ellipsized, use -l to show in full.

Additionally, you can go to hostname:9090/targets and verify that both node_exporter and prometheus report State as UP.
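
The same check can be done from the command line via the prometheus HTTP API; a quick sketch (both targets should show a value of "1" for the up metric):

$ curl -s 'http://localhost:9090/api/v1/query?query=up'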

At this point, system metrics are fetched and stored. All we need to do is visualise them. An excellent tool for doing so is grafana.

4. Grafana
This webpage shows installation instructions and a link to the latest version. At the time of writing of this blogpost, the latest version was 4.4.1. This is how grafana is installed (please mind installation and systemd require root privileges):

# yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.4.1-1.x86_64.rpm

Next up make systemd handle grafana and start it:

# systemctl daemon-reload
# systemctl enable grafana-server.service
# systemctl start grafana-server.service

And check if grafana is running:

# systemctl status grafana-server.service
● grafana-server.service - Grafana instance
   Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-07-31 15:43:11 UTC; 1min 58s ago
     Docs: http://docs.grafana.org
 Main PID: 22788 (grafana-server)
   CGroup: /system.slice/grafana-server.service
           └─22788 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile= cfg:default.paths.logs=/var/log/grafana cfg:default.path...

Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Starting plugin search" logger=plugins
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=warn msg="Plugin dir does not exist" logger=plugins dir=/...plugins
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Plugin dir created" logger=plugins dir=/var/lib...plugins
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing Alerting" logger=alerting.engine
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing CleanUpService" logger=cleanup
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing Stream Manager"
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing HTTP Server" logger=http.server ad...socket=
Jul 31 15:44:34 test.local grafana-server[22788]: t=2017-07-31T15:44:34+0000 lvl=info msg="Request Completed" logger=context userId=0 orgI...eferer=
Jul 31 15:44:34 test.local grafana-server[22788]: t=2017-07-31T15:44:34+0000 lvl=info msg="Request Completed" logger=context userId=0 orgI...eferer=
Jul 31 15:44:34 test.local grafana-server[22788]: t=2017-07-31T15:44:34+0000 lvl=info msg="Request Completed" logger=context userId=0 orgI...eferer=
Hint: Some lines were ellipsized, use -l to show in full.

5. Grafana configuration
Next, we need to hook up grafana with prometheus. First, go to hostname:3000.
– Login with admin/admin
– Click ‘add datasource’
– Name: prometheus, Type: Prometheus
– HTTP settings: http://localhost:9090, select Access: ‘proxy’.
– Click ‘save and test’. This should result in ‘success’ and ‘datasource updated.’
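
If you prefer to script this step, the same datasource can be created via the grafana HTTP API; a sketch, assuming the default admin/admin credentials are still in place:

$ curl -u admin:admin -H "Content-Type: application/json" -X POST \
    http://localhost:3000/api/datasources \
    -d '{"name":"prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy"}'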

Now click on the grafana symbol in the left upper corner, dashboards, import. Enter ‘2747’ at ‘grafana.com dashboard’. This will show ‘Linux memory’; select the prometheus datasource which you just defined, and click import.

This should result in a dashboard that shows you the linux memory areas.
