This blog post is about two things: first, how you can monitor who is bringing your database up and down (there is a twist at the end!), and second, how you can do that very conveniently with aggregated logs in a browser, using a tool called ‘Kibana’, which is the K in ELK.

What is the ‘ELK stack’?
The ELK stack gets its name from Elasticsearch, Logstash and Kibana.
– Elasticsearch is an open source search engine based on Apache Lucene, which provides a distributed, multitenant-capable full-text search engine with an HTTP web interface and schema-free JSON documents.
– Logstash is a fully configurable open source data processing pipeline that can receive data from multiple sources simultaneously, transform it, and output it based on the output plugin. In this blogpost that is the elasticsearch plugin, but it could be anything: STDOUT, a unix pipe, a file, a file in CSV, HTTP, email, IRC, Jira, graphite, kafka, mongodb, nagios, S3, Solr, … really whatever you want.
– Kibana is an open source data visualisation plugin for Elasticsearch.
Kibana looks quite a lot like the splunk interface.

Installing the ELK stack.
Installing the ELK stack in a basic way is easy. In this blogpost I will install everything on the same host, everything being the ELK stack and an Oracle database installation. In reality you should have a log gatherer (called ‘filebeat’) on every host, and a dedicated host which runs the rest of the stack (logstash, elasticsearch and kibana). The install actions below were executed on a 64 bit Linux host running Oracle Linux 6.8.
To make the installation really easy, I use the yum repository of the elastic company. This is how to set that up (all done as root, ‘#’ indicates root):

# rpm --import https://packages.elastic.co/GPG-KEY-elasticsearch
# vi /etc/yum.repos.d/elastic.repo
[elastic-5.x]
name=Elastic repository for 5.x packages
baseurl=https://artifacts.elastic.co/packages/5.x/yum
gpgcheck=1
gpgkey=https://artifacts.elastic.co/GPG-KEY-elasticsearch
enabled=1
autorefresh=1
type=rpm-md

Install elasticsearch:

# yum install java-1.8.0-openjdk
# yum install elasticsearch
# chkconfig --add elasticsearch
# service elasticsearch start

Install logstash:

# yum install logstash

Configure logstash input and output:

# vi /etc/logstash/conf.d/input.conf
input {
  beats {
    port => 5044
  }
}
# vi /etc/logstash/conf.d/output.conf
output {
  elasticsearch {
    hosts => "localhost:9200"
    manage_template => false
    index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}"
    document_type => "%{[@metadata][type]}"
  }
}
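
The index setting in output.conf can be read as a small template: the name of the beat, a dash, and the date of the event. Here is a minimal Python sketch of that expansion, only to show what index names to expect in elasticsearch. The function name index_name is made up for this illustration, and I am assuming the date part is taken from the event timestamp in UTC.

```python
from datetime import datetime, timedelta, timezone

def index_name(beat: str, event_time: datetime) -> str:
    """Mimic index => "%{[@metadata][beat]}-%{+YYYY.MM.dd}" for a single event.

    Assumption: the date part is derived from the event timestamp in UTC.
    """
    utc = event_time.astimezone(timezone.utc)
    return f"{beat}-{utc:%Y.%m.%d}"

# An event picked up by filebeat on 19 January 2017 (UTC).
ts = datetime(2017, 1, 19, 13, 44, 12, tzinfo=timezone.utc)
print(index_name("filebeat", ts))

# Half past midnight on 20 January at UTC+1 is still 19 January in UTC.
late = datetime(2017, 1, 20, 0, 30, tzinfo=timezone(timedelta(hours=1)))
print(index_name("filebeat", late))
```

So with this output configuration, every day of data ends up in its own index per beat.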

Verify the logstash config files:

# sudo -u logstash /usr/share/logstash/bin/logstash --path.settings /etc/logstash -t
Sending Logstash's logs to /var/log/logstash which is now configured via log4j2.properties
Configuration OK

If you see the ‘Configuration OK’ message, it means logstash could interpret the configuration files. It does not mean it will all work as desired; there could still be runtime issues.
Now let’s start logstash. On this host, logstash uses upstart (meaning a startup script in /etc/init) instead of the legacy startup mechanism using the chkconfig and service utilities.

# initctl start logstash

The last part of the data pipeline to install is ‘filebeat’. There are multiple products that can act as input; in this blogpost I use ‘filebeat’, which keeps track of logfiles.

# yum install filebeat
# chkconfig --add filebeat

We are going to look into linux and oracle auditing. So we need to keep track of a couple of files:
– /var/log/secure: this is the default linux logfile which contains all kinds of authentication messages, as defined in /etc/rsyslog.conf (authpriv.* /var/log/secure).
– /u01/app/oracle/admin/*/adump/*.aud: this is the default place where the oracle database stores its audit files. These audit files provide what is called ‘mandatory auditing’, which includes at least connections to the instance with administrator privilege, database startup and database shutdown. The default format is a normal text based logfile; it can be set to XML.
– /var/log/audit/audit.log: this is the logfile of the linux kernel based audit facility. This is actually a lesser known hidden gem in Linux, and provides audit information from the Linux kernel.

These files need to be configured in filebeat, in the file: /etc/filebeat/filebeat.yml. As the extension of the file indicates, this is a file organised in YAML syntax. The best way to configure the file is to move the file, and create your own file with your desired configuration. First of all we add the output, which is logstash in our case. Please mind the default configuration of filebeat is direct output to elasticsearch, which means we don’t have an option to enrich the data!

# mv /etc/filebeat/filebeat.yml /etc/filebeat/filebeat.yml.orig
# vi /etc/filebeat/filebeat.yml
output.logstash:
  hosts: ["localhost:5044"]

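Please mind the two spaces in front of ‘hosts’: YAML expresses nesting through indentation with spaces (tabs are not allowed), so ‘hosts’ must be indented to make it part of ‘output.logstash’!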
Next up we add the files to monitor in the configuration file. The linux based logfiles are easy:

filebeat.prospectors:
- input_type: log
  paths:
    - /var/log/secure
  document_type: secure

- input_type: log
  paths:
    - /var/log/audit/audit.log
  document_type: audit

One thing to notice is that a type is set for each file (which is really just a name for the file filebeat monitors), which makes it possible to find the data from these specific files later on. Now the Oracle audit file:

- input_type: log
  paths:
    - /u01/app/oracle/admin/*/adump/*.aud
  document_type: oracle_audit
  multiline:
    pattern: '^[A-Za-z]{3} [A-Za-z]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}'
    negate: true
    match: after

This looks a bit more complicated. The reason for the complication is the multiline specification. An Oracle database audit file contains a timestamp, after which the audit data is written; it looks like this:

Thu Jan 19 13:44:12 2017 +00:00
LENGTH : '198'
ACTION :[49] 'ALTER DATABASE OPEN /* db agent *//* {0:0:476} */'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[10] '2622783786'

The important things at this time: the ‘pattern’ keyword specifies a regular expression that matches the timestamp line (compare it with the timestamp in the example above). That line and all the data following it need to be processed together: this is a single record, written over multiple lines. ‘negate: true’ means that any line that does NOT match the pattern is added to an existing piece of data, and ‘match: after’ means that such a line is appended after the line that did match the pattern.
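
To make the effect of the multiline settings a bit more tangible, here is a small Python sketch that groups lines the same way. It is an emulation for illustration only, not how filebeat is implemented. The function name join_multiline and the second timestamp in the sample are made up.

```python
import re

# Same timestamp pattern as used for the Oracle audit file in filebeat.yml.
TIMESTAMP = re.compile(
    r"^[A-Za-z]{3} [A-Za-z]{3} [0-9]{2} [0-9]{2}:[0-9]{2}:[0-9]{2} [0-9]{4}"
)

def join_multiline(lines):
    """Group lines into records.

    A line matching TIMESTAMP starts a new record; every line that does not
    match is appended to the record before it, which is the combined effect
    of 'negate: true' and 'match: after'.
    """
    records, current = [], []
    for line in lines:
        if TIMESTAMP.match(line) and current:
            records.append("\n".join(current))
            current = []
        current.append(line)
    if current:
        records.append("\n".join(current))
    return records

sample = [
    "Thu Jan 19 13:44:12 2017 +00:00",
    "LENGTH : '198'",
    "ACTION :[49] 'ALTER DATABASE OPEN /* db agent *//* {0:0:476} */'",
    "Thu Jan 19 13:45:01 2017 +00:00",  # made-up second record
    "LENGTH : '160'",
]
records = join_multiline(sample)
print(len(records))
print(records[0])
```

The five input lines come out as two records, each starting with its own timestamp line.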

Now that filebeat is set up, we can start the filebeat daemon:

# service filebeat start

The last component is kibana:

# yum install kibana
# chkconfig --add kibana
# service kibana start

Now that we’ve set the entire pipeline up, the next thing to do is to configure logstash to enrich the data. Here’s how it’s done for the Oracle database audit file:

# vi /etc/logstash/conf.d/oracle-audit.conf
filter {
  if [type] == "oracle_audit" {
    grok {
      match => { "message" => "^%{DAY} %{MONTH:M} %{MONTHDAY:d} %{HOUR:h}:%{MINUTE:m}:%{SECOND:s} %{YEAR:y}" }
      add_tag => [ "grok", "oracle_audit" ]
    }
    grok {
      match => { "message" => "ACTION :\[[0-9]*\] '(?<ora_audit_action>.*)'.*DATABASE USER:\[[0-9]*\] '(?<ora_audit_dbuser>.*)'.*PRIVILEGE :\[[0-9]*\] '(?<ora_audit_priv>.*)'.*CLIENT USER:\[[0-9]*\] '(?<ora_audit_osuser>.*)'.*CLIENT TERMINAL:\[[0-9]*\] '(?<ora_audit_term>.*)'.*STATUS:\[[0-9]*\] '(?<ora_audit_status>.*)'.*DBID:\[[0-9]*\] '(?<ora_audit_dbid>.*)'" }
    }
    grok {
      match => { "source" => [ ".*/[a-zA-Z0-9_#$]*_[a-z0-9]*_(?<ora_audit_derived_pid>[0-9]*)_[0-9]*\.aud" ] }
    }
    mutate {
      add_field => { "ts" => "%{y}-%{M}-%{d} %{h}:%{m}:%{s}" }
    }
    date {
      locale => "en"
      match => [ "ts", "YYYY-MMM-dd HH:mm:ss" ]
    }
    mutate {
      remove_field => [ "ts", "y", "M", "d", "h", "m", "s" ]
    }
  }
}

It’s beyond the scope of this article to go through every detail, but as you can see we apply a filter. Everything in this filter takes place for the type “oracle_audit”, which is set by filebeat. The next thing we encounter a couple of times is a ‘grok’. The term grok comes from the Robert Heinlein science-fiction novel ‘Stranger in a Strange Land’. Essentially, a grok in logstash means you specify a pattern, and the actions are applied if the specified pattern matches.

The first grok looks for the date pattern in the field ‘message’, creates extra fields from it (M, d, h, m, s and y, the names after the colon), and adds a tag (a word in the tags field of the record that is created). The second grok also looks in the ‘message’ field, and specifies literal text (ACTION for example), some other characters, and then something like (?<ora_audit_action>.*). This is a custom pattern: the name of the field to be created is in between < and >, and is followed by the pattern to capture. This grok line (including all the patterns) creates fields for all the Oracle audit fields in the audit file! The third grok picks up the PID from the filename of the logfile (the filename is in the field ‘source’).

The two mutates create and destroy a temporary field ‘ts’, which is used by the date filter. The date filter makes sure the timestamp of the data flowing through logstash is set to the date and time in the ts field, instead of the time filebeat picked up the data and sent it through logstash. Please mind that if you add (or change) configuration in a logstash configuration file, you need to restart logstash.
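
If you want to play with these patterns outside of logstash, the following Python sketch applies roughly the same regular expressions to the example audit record. It is an approximation, not logstash itself: Python writes named groups as (?P<name>…), I pass re.DOTALL so that .* can cross the line breaks of the joined record, and the file name in SOURCE as well as the function name enrich are made up for the illustration.

```python
import re
from datetime import datetime

# The audit record from the example above, as one multi-line message.
MESSAGE = """Thu Jan 19 13:44:12 2017 +00:00
LENGTH : '198'
ACTION :[49] 'ALTER DATABASE OPEN /* db agent *//* {0:0:476} */'
DATABASE USER:[1] '/'
PRIVILEGE :[6] 'SYSDBA'
CLIENT USER:[6] 'oracle'
CLIENT TERMINAL:[0] ''
STATUS:[1] '0'
DBID:[10] '2622783786'"""

# Hypothetical file name; only its <sid>_<process>_<pid>_<number>.aud shape matters.
SOURCE = "/u01/app/oracle/admin/orcl/adump/orcl_ora_17748_20170119134412000000.aud"

# First grok: the date parts of the timestamp line.
STAMP = re.compile(
    r"^\w{3} (?P<M>\w{3}) (?P<d>\d{1,2}) (?P<h>\d{2}):(?P<m>\d{2}):(?P<s>\d{2}) (?P<y>\d{4})"
)
# Second grok: one named group per Oracle audit field.
FIELDS = re.compile(
    r"ACTION :\[[0-9]*\] '(?P<ora_audit_action>.*)'.*"
    r"DATABASE USER:\[[0-9]*\] '(?P<ora_audit_dbuser>.*)'.*"
    r"PRIVILEGE :\[[0-9]*\] '(?P<ora_audit_priv>.*)'.*"
    r"CLIENT USER:\[[0-9]*\] '(?P<ora_audit_osuser>.*)'.*"
    r"CLIENT TERMINAL:\[[0-9]*\] '(?P<ora_audit_term>.*)'.*"
    r"STATUS:\[[0-9]*\] '(?P<ora_audit_status>.*)'.*"
    r"DBID:\[[0-9]*\] '(?P<ora_audit_dbid>.*)'",
    re.DOTALL,
)
# Third grok: the PID embedded in the audit file name.
PID = re.compile(
    r".*/[a-zA-Z0-9_#$]*_[a-z0-9]*_(?P<ora_audit_derived_pid>[0-9]*)_[0-9]*\.aud"
)

def enrich(message, source):
    """Return the fields the three groks and the date filter would add."""
    event = FIELDS.search(message).groupdict()
    event.update(PID.match(source).groupdict())
    parts = STAMP.match(message).groupdict()
    ts = "{y}-{M}-{d} {h}:{m}:{s}".format(**parts)  # the temporary 'ts' field
    event["@timestamp"] = datetime.strptime(ts, "%Y-%b-%d %H:%M:%S")
    return event

event = enrich(MESSAGE, SOURCE)
for name in sorted(event):
    print(name, "=", repr(event[name]))
```

Trying a pattern like this on a sample record first is a lot quicker than restarting logstash for every change.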

We are all set now! Last words on this configuration: kibana and elasticsearch by default do not require authentication. Do not expose the ports of these products to the internet! I am using a tunnel to the kibana website, which runs on port 5601. It’s very easy to ssh into the machine running the ELK stack using ssh user@machine -L 5601:localhost:5601, which creates a port on localhost:5601 on my machine at home (-L = local), for which communication is tunnelled to localhost:5601 on the remote machine. The localhost in the ssh example is an address as seen from the machine you ssh in to; it could also be another server which is only visible from the machine you ssh into.

First let’s log in to the machine, and see what information is revealed by /var/log/secure:
[screenshot: kibana-secure-login]
You get this screen when you go to kibana at port 5601, enter ‘type: secure’ in the search bar to display data of the type secure (which is what is set with document_type: secure in filebeat.yml), and log in to the machine where filebeat is watching the /var/log/secure file. As you can see, you get two lines from the ssh daemon: one indicating something’s done with pam (pam_unix), and one which indicates it authenticated user opc via a public key, coming from an ip address (which is anonymised) at port 39282 via ssh2.

With a lot of cloud providers you get a user which has public key authentication set up (which you saw above), after which you need to sudo to, for example, the oracle user. In a lot of companies you get a personalised account to log on to servers, after which you need to sudo to oracle. In both cases you need to use sudo to become the user that you need for administration, for example oracle. This is what sudo generates in the /var/log/secure file:
[screenshot: kibana-secure-sudo]
The secure log shows that sudo was invoked by the user opc on TTY pts/1, and that the command executed via sudo was ‘/bin/su - oracle’.

Now that I have become oracle using sudo, I set the environment of my database using oraenv and started up a database. Now go over to kibana, and issue a search for ‘type: oracle_audit’. This is what that looks like:
[screenshot: kibana-oracle_audit]
Now if you look at what the audit record provides, the only fields that are useful for the purpose of investigating who stopped or started a database are ACTION and CLIENT TERMINAL (I assume the database is stopped and started by the ‘oracle’ user). Now change the ‘selected fields’ in kibana and add the (dynamically created!) fields ora_audit_action, ora_audit_term and ora_audit_derived_pid, and remove message. This is what that looks like:
[screenshot: kibana-oracle-audit-startup]
The important thing to look for here is the ora_audit_action ‘startup’. Then look at the ora_audit_derived_pid, and two rows down we see that ‘pts/1’ was the terminal on which this was entered.

Now that we know the terminal, we can add a search in the message field for the secure type. Enter ‘type: oracle_audit OR (type: secure AND message: "*pts/1*")’ in the search bar.
[screenshot: kibana-secure-oracle_audit]
Okay, this works. But it’s far from perfect. In fact, it only works if the session doing the sudo is the only session with that username; if there is more than one session with that username, any of them could have done the sudo, since the log line contains nothing more than the username. This also means that if there is a direct logon to the oracle user, there is no way to identify a session with a TTY, and thus database startup and shutdown are completely anonymous: there’s no way to link a specific session to that action, other than probably the oracle user and a TTY, which can not be linked to anything like, for example, an ip address.

Is there a better way? Yes! We can also use the linux kernel based auditing service, which is on by default. This service keeps a log file at /var/log/audit/audit.log, and gives way more granular auditing events than the /var/log/secure log. Linux audit generates a lot of diverse types of rows, so it’s actually not easy to grok them all, but in order to understand which session executed a startup or shutdown, the only audit row that is important for this specific use case is an audit type called ‘CRED_ACQ’. The grok for this type looks like this:

# vi /etc/logstash/conf.d/linux-audit.conf
filter {
  if [type] == "audit" {
    grok {
        match => { "message" => "type=%{WORD:audit_type} msg=audit\(%{NUMBER:audit_epoch}:%{NUMBER:audit_counter}\): pid=%{NUMBER:audit_pid} uid=%{NUMBER:audit_uid} auid=%{NUMBER:audit_auid} ses=%{NUMBER:audit_ses} msg='op=%{NOTSPACE:audit_op} ((acct=\"%{GREEDYDATA:audit_acct}\")|(id=%{NUMBER:audit_id})|acct=%{BASE16NUM:audit_acct}) exe=\"%{GREEDYDATA:audit_exe}\" hostname=%{NOTSPACE:audit_hostname} addr=%{NOTSPACE:audit_addr} terminal=%{NOTSPACE:audit_terminal} res=%{NOTSPACE:audit_res}'" }
        add_tag => [ "grok", "audit" ]
    }
    date {
      locale => "en"
      match => [ "audit_epoch", "UNIX" ]
    }
  }
}

This grok matches the CRED_ACQ audit type, which we will use to trace back the session via the audit log. Another nicety of this logstash configuration: the audit log records time as an epoch timestamp, which logstash can translate back to a human readable timestamp. Once this is in place, log in again and use sudo to switch to oracle (or log in directly as oracle, it doesn’t really matter that much now!), and search in kibana for: ‘type: oracle_audit OR (type: audit AND audit_type: CRED_ACQ)’. Now get the relevant fields: remove ‘message’, and add audit_hostname, audit_acct, audit_ses, audit_terminal, ora_audit_term, ora_audit_derived_pid and ora_audit_action. This probably returns a lot of rows. Now scroll (“back in time”), search for the startup or shutdown command, and then follow the trail:
[screenshot: kibana-oracle_audit-audit-raw]
Startup points to (oracle server process) PID 17748, which was instantiated by a session using pts/1 (two rows down). One row further down we see the audit information which shows pts/1, which is connected to audit_ses 4230. The audit_ses number is a number that sticks with a session, regardless of using sudo. If you follow number 4230 down, you see multiple rows with audit_ses 4230, some of them with root, which is typical for sudo switching from one user to another. The final row shows the user logging in with its ip address. In other words: using the linux kernel audit facility, you can get all available information!
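
Following the trail by eye in kibana works, but the logic is simple enough to sketch in a few lines of Python: parse the CRED_ACQ rows, keep the ones with the session number you found, and read them newest first. The log lines below are made up for the illustration (shaped to fit the grok above, not copied from a real audit.log), and the function names parse and trail are my own.

```python
import re
from datetime import datetime, timezone

# Made-up audit.log lines shaped like CRED_ACQ rows; not real log data.
LOG = """\
type=CRED_ACQ msg=audit(1484833200.101:900): pid=17001 uid=0 auid=1000 ses=4230 msg='op=PAM:setcred acct="opc" exe="/usr/sbin/sshd" hostname=192.0.2.10 addr=192.0.2.10 terminal=ssh res=success'
type=CRED_ACQ msg=audit(1484833300.000:950): pid=17100 uid=0 auid=1001 ses=4231 msg='op=PAM:setcred acct="other" exe="/usr/sbin/sshd" hostname=192.0.2.99 addr=192.0.2.99 terminal=ssh res=success'
type=CRED_ACQ msg=audit(1484833390.250:990): pid=17650 uid=0 auid=1000 ses=4230 msg='op=PAM:setcred acct="root" exe="/usr/bin/sudo" hostname=? addr=? terminal=/dev/pts/1 res=success'
type=CRED_ACQ msg=audit(1484833390.400:991): pid=17651 uid=0 auid=1000 ses=4230 msg='op=PAM:setcred acct="oracle" exe="/bin/su" hostname=? addr=? terminal=pts/1 res=success'
"""

HEAD = re.compile(r"type=(\w+) msg=audit\(([0-9.]+):([0-9]+)\): ")
PAIR = re.compile(r'''(\w+)=("[^"]*"|[^\s']+)''')  # key=value or key="value"

def parse(line):
    """Turn one audit line into a dict, or None if it is not an audit line."""
    head = HEAD.match(line)
    if head is None:
        return None
    event = {key: value.strip('"') for key, value in PAIR.findall(line[head.end():])}
    event["audit_type"] = head.group(1)
    # The epoch timestamp becomes a readable UTC time, like the date filter does.
    event["time"] = datetime.fromtimestamp(float(head.group(2)), tz=timezone.utc)
    return event

def trail(log, ses):
    """All CRED_ACQ events of one audit session, newest first."""
    events = [parse(line) for line in log.splitlines()]
    hits = [e for e in events if e and e["audit_type"] == "CRED_ACQ" and e.get("ses") == ses]
    return sorted(hits, key=lambda e: e["time"], reverse=True)

for e in trail(LOG, "4230"):
    print(e["time"].isoformat(), e["acct"], e["terminal"], e["hostname"])
```

Read from top to bottom this goes from the su to oracle, via sudo (root), back to the original ssh logon with its ip address, which is the same trail as in the kibana screen.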

Okay, all happy now? Are you sure? Now for the twist!

Whenever you use RAC, or ASM, or both, or you use the grid infrastructure for a single instance as a framework to track your listener(s) and database(s) and start and stop them automatically, you can still stop and start an instance directly using sqlplus, but in most cases you will be using the grid infrastructure crsctl or srvctl commands. When the grid infrastructure crsctl and srvctl commands are used, this is what the Oracle database audit information looks like:
[screenshot: kibana-oracle_audit-crs-shutdown]
As you can see, because the clusterware brought the database down, there is no terminal associated with the shutdown. So the above mentioned way of first searching for startup and shutdown in the oracle audit information, finding the associated terminal, and then tracing it through the audit records can NOT be used whenever the Oracle clusterware is used, because a grid infrastructure daemon is actually stopping and starting the database, and the grid infrastructure does not keep any information (that I am aware of) about which client invoked a grid infrastructure command. I guess a lot of auditors will be very unhappy about this.

Now the good news: you can solve this issue very easily. The downside is that it requires additional configuration of the linux auditing. The solution is to put an ‘execution watch’ on srvctl and crsctl; this is how that is done:

# auditctl -w /u01/app/12.1.0.2/grid/bin/srvctl -p x -k oracrs
# auditctl -w /u01/app/12.1.0.2/grid/bin/crsctl -p x -k oracrs

In order to validate that this works, I started a database using srvctl, and searched for: ‘(type: oracle_audit AND ora_audit_action: STARTUP) OR (type: audit AND message: key=\"oracrs\")’. This is what that looks like:
[screenshot: kibana-oracle_audit-audit-watch]
As you can see, there’s the Oracle database record indicating the startup of the database, and a little while back in time there’s the linux audit row indicating the execution of the srvctl executable. Once you are at that point, you can use the earlier mentioned way of using the audit_ses number to trace the session, including sudo and ip address at logon time.
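
The step from the watch row to the session can also be sketched in Python. The audit row below is a shortened, made-up example of what a row with key="oracrs" might look like (the field values are assumptions, not taken from my system), and watch_hit and follow_up_query are hypothetical helper names. The idea is just: take the ses number from the row with the key, and use it in the CRED_ACQ search from before.

```python
import re

# Shortened, made-up audit row for an execution of srvctl under the watch.
WATCH_ROW = (
    'type=SYSCALL msg=audit(1484840000.100:5001): arch=c000003e syscall=59 '
    'success=yes exit=0 ppid=17651 pid=18200 auid=1000 uid=54321 ses=4230 '
    'comm="srvctl" exe="/bin/bash" key="oracrs"'
)

PAIR = re.compile(r'(\w+)=("[^"]*"|\S+)')  # key=value or key="value"

def watch_hit(line, key="oracrs"):
    """Return session details if the row carries the given audit key, else None."""
    fields = {name: value.strip('"') for name, value in PAIR.findall(line)}
    if fields.get("key") != key:
        return None
    return {"ses": fields["ses"], "auid": fields["auid"], "comm": fields["comm"]}

def follow_up_query(hit):
    """Build the kibana search that lists the CRED_ACQ rows of that session."""
    return "type: audit AND audit_type: CRED_ACQ AND audit_ses: {}".format(hit["ses"])

hit = watch_hit(WATCH_ROW)
print(hit)
print(follow_up_query(hit))
```

With the session number in hand, the rest is the same trail as before: sudo, su and finally the logon with the ip address.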
