This post is about how to get your log files aggregated in a single place and easily searchable via a convenient web interface.
You might think: wait a minute; doesn’t this exist already? Well, yes and no. Let me explain.
a) traditional log management: traditionally, logs are/were searched for certain strings, typically error messages or parts of error messages, by a shell script run by cron. When a string matched, a system that was set up to be able to email would send a message indicating that the aforementioned string was found.
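As a trivial sketch of what such a cron-driven script could look like (the path, search string and mail address are made-up examples; a real script would also keep track of what it already reported):

#!/bin/sh
# search the alert.log for Oracle error messages and mail any hits
if grep 'ORA-' /u01/app/oracle/diag/rdbms/orcl/orcl/trace/alert_orcl.log > /tmp/alert_hits 2>/dev/null; then
    mail -s "alert.log errors on $(hostname)" dba@example.com < /tmp/alert_hits
fi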
b) log management via a monitoring tool: the next step was that the individual script on each server was exchanged for a monitoring tool, which performed the same task as the shell script. In reality, quite often a default monitoring set/template was enabled instead of the specific strings that were searched for with the shell script. Sometimes this was an improvement, sometimes it meant the specific issues (and thus messages) became invisible. At least in my experience, this is still the way monitoring works in 90% of the cases: with a generic, completely standard monitoring template.
c) other log gathering and indexing: there are many more products that perform this function. The first ones that come to my mind are Splunk, all the options for doing this cloud-based (many there!), and a lot of tools based on Elasticsearch, like the “ELK stack”.
I think it’s clear “a)” and “b)” are not dynamic and in fact very static. My major concern is that they don’t allow exploratory investigation: they simply raise a warning, and any investigation means you have to log on and start browsing the available information locally. Everybody who has worked with HP OpenView or Oracle Enterprise Manager will recognise this. Yes, it’s probably all possible with these tools, but it never (ever(!)) gets implemented.
For the last category, the likes of Splunk, the ELK stack and the cloud-based log tools: the first two definitely serve a function, but they are aimed at aggregating the logs of multiple servers, and are simply too much to set up on a server alongside the processes they are meant to monitor. For the cloud-based tools: it might be my conservatism, but getting a subscription and uploading logging feels awkward, especially if it’s for my own test purposes.
This is where Loki comes in. Finally, I would say, there is a tool that can function on a small scale (!) and perform log aggregation and provide a searchable database without a huge setup. This is ideal for your own test or development box, to be able to discover what is going on and have all the log files at your fingertips, without endlessly going through the filesystem performing cd, tail, grep, wc, sort, uniq, et cetera. I think lots of people will recognise travelling from log file to log file.
Loki gives you a database that orders log entries based on time, and Grafana provides a web-based UI to view and query the Loki database. This is what it looks like:
[screenshot: Grafana’s Explore view showing the two Loki queries described below]
This is an example: what it shows is my test machine, where I decided to see when Linux started, as well as when the Oracle database instance was started.
* The first query uses the label “rdbms_alert” (indicating it came from the text-based Oracle alert.log file; this is a label I added to it), and within the log lines with the label job with the value “rdbms_alert”, I added a filter for the string “Starting ORACLE”, which indicates an Oracle database instance start.
* The second query uses the label “messages” (indicating it came from the Linux /var/log/messages file; this label is mine too), and within the log lines with the label job with the value “messages”, I added a filter for the string “kernel: Command line”, which indicates Linux startup. I additionally added a negative filter for “loki”, because Loki logs its queries to the messages file, which I don’t want to see. Both queries are written out as LogQL below.
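For reference, this is roughly what the two queries look like as LogQL in Grafana’s Explore view, based on the labels and filters just described (|= is a “contains” line filter, != a “does not contain” line filter):

{job="rdbms_alert"} |= "Starting ORACLE"
{job="messages"} |= "kernel: Command line" != "loki"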
I hope you can see the power of having all the logs in a single place, and completely searchable.
This is just a start; this is a very simple proof-of-concept setup. For example, the date/time in the log lines themselves is not used: the date/time of a log line is the moment it was ingested into Loki. It is possible to have these timestamps interpreted.
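As a minimal sketch of how that could be done (this is not part of the setup scripts; the path and the regular expression assume the ISO-8601 style timestamps that recent Oracle versions write to the alert.log, so adapt both to your actual log format), promtail can parse the timestamp out of each line in its pipeline:

scrape_configs:
  - job_name: rdbms_alert
    static_configs:
      - targets: [ localhost ]
        labels:
          job: rdbms_alert
          __path__: /u01/app/oracle/diag/rdbms/*/*/trace/alert_*.log
    pipeline_stages:
      # capture the leading timestamp of a line into a group named 'ts'
      - regex:
          expression: '^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}\.\d+[+-]\d{2}:\d{2})'
      # use the captured value as the entry's timestamp instead of the ingestion time
      - timestamp:
          source: ts
          format: RFC3339Nano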
If you are interested, but are uncertain whether this is for you, and would like to test it: I have a couple of Ansible scripts that can set up the combination of:
* promtail (the default loki log streaming tool)
* loki (the database)
* Grafana (the web UI)
The scripts were created on Oracle Linux 7.8.
Install git, and clone the https://gitlab.com/FritsHoogland/loki-setup.git repository using a user that has sudo rights:
git clone https://gitlab.com/FritsHoogland/loki-setup.git
Install ansible (you might have to install the EPEL repository).
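On Oracle Linux 7 that could look roughly like this (the EPEL release package name differs per distribution, so treat this as a sketch):

sudo yum -y install oracle-epel-release-el7
sudo yum -y install ansible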
In the loki-setup repo you will find the setup scripts for loki, promtail and grafana.
You can execute the scripts in that order (loki, promtail and Grafana) by executing the setup_loki.yml, setup_promtail.yml and setup_grafana.yml scripts.
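Assuming the playbooks run against the local machine (check the repository for the exact invocation), executing them would look something like:

cd loki-setup
ansible-playbook setup_loki.yml
ansible-playbook setup_promtail.yml
ansible-playbook setup_grafana.yml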
IMPORTANT: do proofread the scripts, and validate the variables for your situation. Don’t worry: the scripts are easy to read.
After installing, you can go to http://yourmachine:3000, log in with admin and the password you set in the setup_grafana.yml script, click on the compass (Explore) option, and perform your own log browsing.
If you decide you want to remove it, I have a remove script for each of the components, which will remove that specific component entirely. The same applies here too: validate the script before running it.
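If the remove playbooks follow the same naming convention as the setup ones (the remove_*.yml names here are my assumption; check the repository for the actual file names), removal would look like:

ansible-playbook remove_grafana.yml
ansible-playbook remove_promtail.yml
ansible-playbook remove_loki.yml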