Prometheus is an open source systems monitoring and alerting toolkit originally built at SoundCloud. This blogpost shows how to install the components needed to visualise Linux system statistics with Grafana.

The setup consists of 3 components:
– node_exporter, an exporter of system and hardware metrics.
– prometheus, a metric collection and persistence layer.
– grafana, the visualisation layer.

1. Preparation
The needed components are installed in the home directory of the user ‘prometheus’. In order for that user to exist, it obviously must first be created:

# useradd prometheus
# su - prometheus
$

This installation guide uses Oracle Linux 7.3, but should work on RHEL or CentOS too.

2. Node exporter
The next thing to do is install the node exporter. Please mind that new versions do come out, so you might want to check the node_exporter GitHub releases page for the latest release first:

$ curl -LO "https://github.com/prometheus/node_exporter/releases/download/v0.14.0/node_exporter-0.14.0.linux-amd64.tar.gz"
$ mkdir -p Prometheus/node_exporter
$ cd $_
$ tar xzf ../../node_exporter-0.14.0.linux-amd64.tar.gz

Now become root and create a unit file to automatically start the node exporter using systemd:

# echo "[Unit]
Description=Node Exporter

[Service]
User=prometheus
ExecStart=/home/prometheus/Prometheus/node_exporter/node_exporter-0.14.0.linux-amd64/node_exporter

[Install]
WantedBy=default.target" > /etc/systemd/system/node_exporter.service

And make systemd start the node exporter:

# systemctl daemon-reload
# systemctl enable node_exporter.service
# systemctl start node_exporter.service

Next you can verify if the node exporter is running by using ‘systemctl status node_exporter.service’:

# systemctl status node_exporter.service
● node_exporter.service - Node Exporter
   Loaded: loaded (/etc/systemd/system/node_exporter.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-07-31 15:20:54 UTC; 7s ago
 Main PID: 3017 (node_exporter)
   CGroup: /system.slice/node_exporter.service
           └─3017 /home/prometheus/Prometheus/node_exporter/node_exporter-0.14.0.linux-amd64/node_exporter

Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - hwmon" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - infiniband" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - textfile" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - conntrack" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - diskstats" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - entropy" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - loadavg" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - sockstat" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg=" - wifi" source="node_exporter.go:162"
Jul 31 15:20:54 test.local node_exporter[3017]: time="2017-07-31T15:20:54Z" level=info msg="Listening on :9100" source="node_exporter.go:186"

Additionally, you can go to hostname:9100 and check that the page says ‘Node Exporter’ and has a link called ‘Metrics’, which shows all the metrics.
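
A quick check is also possible from the command line (a sketch, assuming curl is available on the host):

$ curl -s http://localhost:9100/metrics | grep '^node_load'

This should print the load average metrics (node_load1, node_load5 and node_load15), proving the exporter is serving metrics.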

3. Prometheus
Now that we have installed node_exporter to provide measurements, we must install the software that fetches that information and stores it. That is what prometheus does. First, become the prometheus user again, and install prometheus. Here too it is important to realise that newer versions will have come out after this article was written:

# su - prometheus
$ curl -LO "https://github.com/prometheus/prometheus/releases/download/v1.7.1/prometheus-1.7.1.linux-amd64.tar.gz"
$ cd Prometheus
$ tar xzf ../prometheus-1.7.1.linux-amd64.tar.gz
$ cd prometheus-1.7.1.linux-amd64
$ echo "scrape_configs:

  - job_name: 'prometheus'
    scrape_interval: 1s
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node_exporter'
    scrape_interval: 1s
    static_configs:
      - targets: ['localhost:9100']"> prometheus.yml
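
The configuration just written can be sanity checked with promtool, which ships in the same tarball (a quick, optional check; this is the syntax of the prometheus 1.x series):

$ ./promtool check-config prometheus.yml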

This downloaded and extracted prometheus, and created a prometheus scrape config to fetch data from prometheus itself and from the node exporter. Now become root, and install the systemd unit file for prometheus:

# echo "[Unit]
Description=Prometheus Server
Documentation=https://prometheus.io/docs/introduction/overview/
After=network-online.target

[Service]
User=prometheus
Restart=on-failure
ExecStart=/home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/prometheus -config.file=/home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/prometheus.yml -storage.local.path=/home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/data

[Install]
WantedBy=multi-user.target" > /etc/systemd/system/prometheus.service

And make systemd start prometheus:

# systemctl daemon-reload
# systemctl enable prometheus.service
# systemctl start prometheus.service

And verify prometheus is running:

# systemctl status prometheus.service
● prometheus.service - Prometheus Server
   Loaded: loaded (/etc/systemd/system/prometheus.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-07-31 15:36:55 UTC; 9s ago
     Docs: https://prometheus.io/docs/introduction/overview/
 Main PID: 22656 (prometheus)
   CGroup: /system.slice/prometheus.service
           └─22656 /home/prometheus/Prometheus/prometheus-1.7.1.linux-amd64/prometheus -config.file=/home/prometheus/Prometheus/prometheus-1.7.1....

Jul 31 15:36:55 test.local systemd[1]: Started Prometheus Server.
Jul 31 15:36:55 test.local systemd[1]: Starting Prometheus Server...
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Starting prometheus (version=1.7.1, branch=mast...n.go:88"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Build context (go=go1.8.3, user=root@0aa1b7fc43...n.go:89"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Host details (Linux 3.10.0-514.26.2.el7.x86_64 ...n.go:90"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Loading configuration file /home/prometheus/Pro....go:252"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Loading series map and head chunks..." source="....go:428"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="0 series loaded." source="storage.go:439"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Starting target manager..." source="targetmanager.go:63"
Jul 31 15:36:55 test.local prometheus[22656]: time="2017-07-31T15:36:55Z" level=info msg="Listening on :9090" source="web.go:259"
Hint: Some lines were ellipsized, use -l to show in full.

Additionally, you can go to hostname:9090/targets and verify that both node_exporter and prometheus report State=UP.
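
The same check can be done from the command line via the prometheus HTTP API; every target should report the value 1 for the ‘up’ metric:

$ curl -s 'http://localhost:9090/api/v1/query?query=up'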

At this point, system metrics are fetched and stored. All we need to do is visualise them. An excellent tool for doing so is grafana.

4. Grafana
The grafana installation webpage shows installation instructions and a link to the latest version. At the time of writing of this blogpost, the latest version was 4.4.1. This is how grafana is installed (please mind installation and systemd setup require root privileges):

# yum install https://s3-us-west-2.amazonaws.com/grafana-releases/release/grafana-4.4.1-1.x86_64.rpm

Next up make systemd handle grafana and start it:

# systemctl daemon-reload
# systemctl enable grafana-server.service
# systemctl start grafana-server.service

And check if grafana is running:

# systemctl status grafana-server.service
● grafana-server.service - Grafana instance
   Loaded: loaded (/usr/lib/systemd/system/grafana-server.service; enabled; vendor preset: disabled)
   Active: active (running) since Mon 2017-07-31 15:43:11 UTC; 1min 58s ago
     Docs: http://docs.grafana.org
 Main PID: 22788 (grafana-server)
   CGroup: /system.slice/grafana-server.service
           └─22788 /usr/sbin/grafana-server --config=/etc/grafana/grafana.ini --pidfile= cfg:default.paths.logs=/var/log/grafana cfg:default.path...

Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Starting plugin search" logger=plugins
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=warn msg="Plugin dir does not exist" logger=plugins dir=/...plugins
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Plugin dir created" logger=plugins dir=/var/lib...plugins
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing Alerting" logger=alerting.engine
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing CleanUpService" logger=cleanup
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing Stream Manager"
Jul 31 15:43:12 test.local grafana-server[22788]: t=2017-07-31T15:43:12+0000 lvl=info msg="Initializing HTTP Server" logger=http.server ad...socket=
Jul 31 15:44:34 test.local grafana-server[22788]: t=2017-07-31T15:44:34+0000 lvl=info msg="Request Completed" logger=context userId=0 orgI...eferer=
Jul 31 15:44:34 test.local grafana-server[22788]: t=2017-07-31T15:44:34+0000 lvl=info msg="Request Completed" logger=context userId=0 orgI...eferer=
Jul 31 15:44:34 test.local grafana-server[22788]: t=2017-07-31T15:44:34+0000 lvl=info msg="Request Completed" logger=context userId=0 orgI...eferer=
Hint: Some lines were ellipsized, use -l to show in full.

5. Grafana configuration
Next, we need to hook up grafana with prometheus. First, go to hostname:3000.
– Login with admin/admin
– Click ‘add datasource’
– Name: prometheus, Type: Prometheus
– HTTP settings: http://localhost:9090, select Access: ‘proxy’.
– Click ‘save and test’. This should result in ‘success’ and ‘datasource updated.’
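
If you prefer the command line, the same datasource can also be created via grafana’s HTTP API (a sketch, assuming the default admin/admin credentials are still in place):

$ curl -s -u admin:admin -H 'Content-Type: application/json' -X POST http://localhost:3000/api/datasources -d '{"name":"prometheus","type":"prometheus","url":"http://localhost:9090","access":"proxy"}'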

Now click on the grafana symbol in the left upper corner, dashboards, import. Enter ‘2747’ at ‘grafana.com dashboard’. This will show ‘Linux memory’; select the prometheus datasource you just defined, and click import.

This should result in a dashboard that shows you the Linux memory areas.


This is a blogpost about how I set up my test virtual machines. The seasoned sysadmin and DBA will notice that the techniques used here are perfectly usable for real production environments. The most important thing is that there is no need to download or stage any software for installing the virtual machine; everything is downloaded when needed during installation. Obviously this works best when you have got reasonable bandwidth available for connecting to the internet.

The main infrastructure software components of this setup are:
– Virtualbox as the virtualisation software.
– Ansible as the configuration and management engine.
– Vagrant as the virtualisation manager.

Installation (Mac OSX specific).
– Virtualbox is installed by downloading and installing the installation image in the normal way.
– Ansible requires a few steps, because Ansible relies on python.
First install pip using easy_install:

$ sudo easy_install pip

Then install ansible for your current user:

$ sudo pip install ansible --user

Now to pick up the ansible local install, add the following to .bash_profile:

export PATH=$PATH:~/Library/Python/2.7/bin

– Vagrant is installed by downloading and installing the installation image in the normal way.

My versions:

$ VBoxManage -v
5.0.26r108824
$ ansible --version
ansible 2.1.0.0
  config file = /Users/fritshoogland/.ansible.cfg
  configured module search path = Default w/o overrides
$ vagrant version
Installed Version: 1.8.5
Latest Version: 1.8.5

You're running an up-to-date version of Vagrant!

Okay! Now all the software is installed, on to the next step!

SSH keys issue for vagrant (OSX specific).
For ssh key based authentication to work, the private key file needs to be protected by mode 600 (rw-------). Vagrant uses an ssh key to access the virtual machine. I am running the vagrant directory from a share, which forces the mode of the files to be 700. To work around that issue, I set up my own keys in ~/.vagrant_ssh:

$ mkdir .vagrant_ssh
$ cd .vagrant_ssh
$ ssh-keygen -q -N "" -f id_rsa

Later on some settings are made in the Vagrantfile to use the private key just generated.

SSH keys for normal authentication (OSX specific).
The Ansible scripts copy your ssh public key and put it in the authorized_keys file of both the oracle and the root user. By putting your public key in the authorized_keys file of a user, the authentication part of logging on is done via the public key.

However, for this to work, you first need to have a private and public key pair. There are many ways of creating one; this is an example:

Check if you have a dsa key pair:

$ cd ~
$ find .ssh
.ssh/id_dsa
.ssh/id_dsa.pub

In this case, as you can see, there are two id_dsa files: one with the extension “.pub”, which is the public key, and one without an extension, which is the private key. The private key should be kept private (at all times). If you don’t have id_dsa keys, set them up in the following way:

$ cd ~
$ mkdir .ssh
$ chmod 700 .ssh
$ cd .ssh
$ ssh-keygen -q -N "" -t dsa -f id_dsa

Setting the Virtualbox images directory.
Virtualbox will put the virtual machines in its default machine folder. You can see what the default machine folder is set to using the VBoxManage utility:

$ VBoxManage list systemproperties | grep machine
Default machine folder:          /Volumes/VirtualBox

Because the virtual machines are disk space intensive, you might want to change the location; I later decided to dedicate an external disk to them. The way to get the disk images in another location is to change the virtualbox settings:

$ VBoxManage setproperty machinefolder /Volumes/Virtualbox

The vagrant ‘box’ location.
Vagrant uses operating system images that it calls ‘boxes’. These ‘boxes’ are the base image that is copied when a virtual machine is created. They are stored locally to save them from being downloaded over and over. The location of these images or ‘boxes’ is ~/.vagrant.d/boxes. If you want to store the ‘boxes’ somewhere else, you can set the environment variable VAGRANT_HOME to make vagrant use a different location for storing its global state and the ‘boxes’.
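
For example (the path is just an illustration), by adding a line to .bash_profile:

export VAGRANT_HOME=/Volumes/Virtualbox/vagrant.d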

Setting up the vagrant directory.
The next step is to set up the directory where the virtual machine will be run. The simplest way of doing so is cloning the files from github:

$ git clone https://github.com/FritsHoogland/Vagrant-oracle-setup.git

This will create a ‘Vagrant-oracle-setup’ directory.
Now you need to fill out the Vagrantfile in the directory:
a) line 12: set the hostname. (change optional)
b) line 14: set an IP address. (do not choose an already existing network, like the default virtualbox 10.0.2/24 network)
c) line 18/19: set the memory and CPU capacity for the VM to use.
d) line 41/42: set your My Oracle Support credentials to download the installation media.
e) line 43: set the IP address for ansible, as set at b)/line 14.
f) line 46: set the database name to create. (change optional)

Install the virtual machine with linux and the oracle database, and set up a database.
Now the cool part: install linux, install the oracle database software and create a database without any manual intervention! For this you need to go into the Vagrant-oracle-setup directory, and execute:

$ vagrant up

That’s it.

You can follow the steps vagrant and ansible are executing. You’ll see vagrant first fetching (downloading) the OS image (which it saves to prevent downloading it again), and then copying this image to be the image for the virtual machine to run. Vagrant then makes some settings, mostly network related, and hands over control to ansible.

Ansible executes the playbook as set in the Vagrantfile, which I’ve put in the ‘ansible’ directory. Ansible sets up linux, configures the extra block devices using LVM for the software home and the database, gets the installation media from My Oracle Support, installs the software and then runs DBCA to create a database.

Further things.
Handling vagrant to administer the virtual machine is easy. Go into the directory with the Vagrantfile and use:
– vagrant halt: stop the virtual machine.
– vagrant up: for a new machine, get the base image and provision the machine. For an existing machine: startup the virtual machine.
– vagrant destroy: stop the virtual machine and remove all the information it has used like configuration/provisioning status and the disks.
– vagrant provision: run the ansible playbook again.
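
For example, a quick look at the state of the machine (run from the directory with the Vagrantfile):

$ vagrant status

And ‘vagrant ssh’ logs you on to the virtual machine as the vagrant user.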

Anything else? Well, yes, I didn’t set up a listener, for example. It should be really simple and straightforward: just start the listener. The listener will start listening on all network devices, and the database will register itself. There might be much more, depending on your specific needs.

Thanks to Maris Elsins for getMOSPatch.

This post looks like I am jumping on the bandwagon of IT orchestration, like a lot of people are doing. Maybe I should say ‘except for (die hard) Oracle DBAs’. Or maybe not, it is up to you to decide.

Most people who are interested in IT in general will have noticed IT orchestration has gotten attention, especially in the form of Puppet and/or Chef. I _think_ IT orchestration has gotten important with the rise of “web scale” (scaling applications up and down by adding virtual machines to horizontally scale resource intensive tasks), in order to provision/configure the newly added machines without manual intervention, and people are now starting to use it for more tasks than provisioning virtual machines for web applications.

I am surprised by that. I am not surprised that people want boring tasks, like making settings in configuration files and restarting daemons, or installing software with all the correct options, to be automated. Instead, I am surprised that people are only now picking this up, after it has been around for so long.

A little history.
As far as I know, IT orchestration started with cfengine, which was really a configuration engine (hence the name). Despite having a slightly different purpose (configuration vs. orchestration), this tool is the parent of all the orchestration/configuration tools which exist nowadays. cfengine started off as a study in 1993 by Mark Burgess at the university of Oslo, with the creation of the cfengine software as a result. As far as I can see, it has been available as open source software since the beginning.

Now back to what I am surprised at: with cfengine, there has been a way to configure linux/unix systems in a structured way, and to install and configure software on those systems, since the mid-nineties. Yet, this wasn’t picked up (with a few exceptions, of course). Fast forward to today, and we see it is being adopted. And that is a good thing.

I created a setup with cfengine for a client a long time ago, which had the ability to install the Oracle software and different PSUs in different homes, and to remove them, by adding or removing machines to groups in a cfengine configuration file. It wasn’t picked up by the client; it’s way more fun running X to install the software, make the choices by hand, and redo this over and over on every machine, right?

I had almost forgotten about my work with cfengine, until I spoke with Alex Gorbatchev at a conference, where he pointed me to Ansible. At first I didn’t do a lot with it, but lately I’ve given it a go, and I am very happy with it.

Another redo of cfengine?
From what I read, most of the configuration/orchestration engines created after cfengine were created to circumvent all kinds of difficulties with cfengine. I can understand that. It took me a while to learn cfengine, and indeed it forces you to think in a different way.

The Ansible project decided to do it radically differently from all the other engines. It is different in the sense that it advertises itself as simple, agentless and powerful.

Simple.
Simple is a terrific goal. For those of you who have worked with configuration/orchestration engines, there is a steep learning curve. It is just hard to get the basic principles into your head. To be honest, Ansible also took me a while to grasp the basic principles and get the picture correctly in my head. Yet, having worked with cfengine, comparing it with Ansible’s playbooks, which are the scripts to do things on the targets, is a breath of fresh air. Playbooks are so clean they can (almost) be read and understood as plain English.

Agentless.
This is where Ansible is truly different from any of the other configuration/orchestration tools. Ansible does not require any agent installation on the targets. The obvious next question then is: how can this work? Well, quite simply: Ansible uses ssh to connect to the host, and executes commands via the shell. Having said that, it requires a little more detail: Ansible uses python on the remote host for its normal execution. However, you can use it without python, for example to set the host up for Ansible’s normal usage mode, which requires python and the simplejson module.

This is truly important, and makes it an excellent fit for my daily work as an IT consultant.

Powerful.
Ansible is powerful in the way that you can do the configuration and orchestration tasks in a simple clean way.

Summary on the introduction.
Above was a brief personal history, and some of the “marketed” features of Ansible. I think being agentless is the true “killer feature” here. All the other configuration/orchestration engines require you to set up and configure a fixed client-server connection, and install a daemon and a central server process. In case you wondered: yes, authentication is important, and it’s simply brilliant that ssh password authentication or the public key infrastructure can be used.

Because there’s no daemon to install, you can run your playbooks everywhere. So instead of a fixed client configuration, you can create playbooks to do routine tasks, and repeat them at multiple sites.

Okay, how does this work?

Installation: add EPEL and install ansible.
If you are on one of the clones of RedHat Enterprise Linux (I use Oracle Linux), you simply need to add the EPEL repository to your yum source list, and run:

# yum install ansible
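
If the EPEL repository is not configured yet, one way to add it is installing the epel-release package straight from the Fedora project (a sketch for an EL6 based system; pick the release that matches your version, and mind that download locations may have changed since the time of writing):

# yum install https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm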

First steps.
One of the first things I do is create a directory for a typical ansible ‘project’. ‘Project’ here means a set of tasks you want to execute on a set of hosts. Next, I create a file called ‘hosts’, which is the list of hosts you want to execute tasks on. By default, Ansible looks in /etc/ansible/hosts. In this case, I put a single machine in it (a test VM), but it can be a list of machines (ip addresses or hostnames).

$ cat hosts
192.168.101.2

In fact, you can create groups in the hosts file in the “ini style”; an illustration follows below. But I just put one host in for this example.
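
For illustration (the group names are hypothetical), such a grouped hosts file could look like this:

$ cat hosts
[dbservers]
192.168.101.2
192.168.101.3

[webservers]
192.168.101.10

A group name can then be used instead of ‘all’ on the ansible command line to address only that group.
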
The next thing is to check if Ansible reads the file correctly. This is done in the following way:

$ ansible all -i hosts --list-hosts
    192.168.101.2

Okay, this means Ansible will operate on this one host if invoked. The next logical thing to do (typically when you are in a new client environment, to check if you can reach the hosts) is a ping:

$ ansible all -i hosts -m ping
192.168.101.2 | FAILED => FAILED: Authentication failed.

Ping might be a bit misleading for some people. What ping does here (-m means module) is try to connect to the host over ssh and log in. Because I didn’t specify a user, it used the username of the current user on the machine, which is ‘ansible’. A user ‘ansible’ typically doesn’t exist on a normal server (and is not necessary, nor should it be created), and it doesn’t exist on my test server either. So it failed, as the message said, on authentication.

My test VM is a freshly installed, basic Oracle Linux 6 server. This means there’s only one user: root.

So, let’s specify the user root as user:

$ ansible all -i hosts -m ping -u root
192.168.101.2 | FAILED => FAILED: Authentication failed.

The authentication failed again. And it should! What this is doing is trying to log on as root, and we haven’t given any password, nor have I put my local user’s public key in the remote authorized_keys file. So there is no way this could work. This is typically also the state when you want to do stuff on a “fresh” client system. Let’s add the ‘-k’ option (ask for the ssh password), and run again:

$ ansible all -i hosts -m ping -u root -k
SSH password:
192.168.101.2 | success >> {
    "changed": false,
    "ping": "pong"
}

To walk you through the output: it now asks for a password, which I’ve filled out, then it lists the host and the state: success. During this execution, nothing was changed on the remote host, and the ping command resulted in a pong (akin to the ICMP ping response).

With what we have learned now, we can do things like this:

$ ansible all -i hosts -u root -k -a "ifconfig"
SSH password:
192.168.101.2 | success | rc=0 >>
eth0      Link encap:Ethernet  HWaddr 00:0C:29:14:65:ED
          inet addr:192.168.39.145  Bcast:192.168.39.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe14:65ed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:47 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6293 (6.1 KiB)  TX bytes:2594 (2.5 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:14:65:F7
          inet addr:192.168.101.2  Bcast:192.168.101.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe14:65f7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:188 errors:0 dropped:0 overruns:0 frame:0
          TX packets:112 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:142146 (138.8 KiB)  TX bytes:15545 (15.1 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Does this look familiar to you Exadata DBAs? Yes, this replicates some of the functionality of dcli (although dcli is aimed at executing simple tasks on a group of hosts, whilst Ansible is aimed at enterprise configuration and orchestration).

One step beyond! Playbooks.
Now let’s progress to playbooks. An Ansible playbook is where the true strength of Ansible lies. It allows you to specify tasks to execute on the remote hosts, create sequences of tasks, and make decisions based on the outcome of a task for further execution. Let me show you a simple playbook, and guide you through it:

---
- hosts: all
  gather_facts: no
  remote_user: root
  tasks:

  - name: upgrade all packages
    yum: name=* state=latest

  - name: install python-selinux
    yum: name=libselinux-python state=installed

  - name: add public key to authorized_key file of root
    authorized_key: user=root state=present key="{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}"

As you can see, this is a playbook with three tasks: upgrade all packages, install libselinux-python, and add my (local) public key to the authorized_keys file of root (to allow passwordless access).

Line 1 shows three dashes, which means the start of a YAML document.
Line 2 starts with a single dash, which indicates a list. There is one dash at this indention level, so it’s a list of one. The fields of this member are hosts, gather_facts, remote_user and tasks. Tasks got its own list (mind the indention level, that is important). The fields are key/value pairs, with the separation indicated by the colon (:). The first field is ‘hosts’, with the value ‘all’. This means that all hosts in the hosts file are used for this playbook. I don’t think it’s hard to imagine how useful it can be to specify a group/kind of servers the playbook can run on. The next one is ‘gather_facts’. A normal playbook execution first gathers a lot of information from all the hosts it is going to run on before execution; these facts can be used during playbook execution. Setting it to ‘no’ skips the fact gathering. Next is ‘remote_user’, which indicates the user ansible is going to logon with, so we don’t have to specify it on the command line. Then we see ‘tasks’, to indicate the list of tasks to be executed on the hosts.

It’s easy to spot that we got three tasks. What is extremely important is the indention of this list (it’s got a dash, so it’s a list!). Name is not mandatory, but it makes the playbook easy to read if you give the tasks useful names, and these names are shown when the playbook is executed. The first task has the name ‘upgrade all packages’. The next field shows the key ‘yum’, indicating it is making use of the yum module. This key got two values: name=*, which means ‘all packages’, and state=latest, which means we want all packages to be at the latest version. This makes the task the equivalent of ‘yum update’.

The second task is called ‘install python-selinux’. It makes use of the yum module again, and is self explanatory: it installs the libselinux-python package. This package is necessary for working on things that are protected by selinux on a host that has selinux enabled.

The next task is called ‘add public key to authorized_key file of root’. It is making use of the authorized_key module. This module requires a parameter ‘key’, for which we use the lookup function to look up the local (!) public key of the user with which I execute ansible, which is ‘ansible’. ‘state=present’ means we want this key to be present; ‘present’ is the default value, so it wasn’t strictly necessary to put it in. Next ‘user=root’: we want the public key to be added to the authorized_keys file of the user root.

Of course these tasks could be executed as single tasks using the ‘ansible’ executable. To show the importance of the installation of the libselinux-python package on a host with selinux enabled (which is the state of selinux on a freshly installed Oracle Linux machine), let’s execute the task using the authorized_key module:

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
SSH password:
192.168.101.2 | FAILED >> {
    "failed": true,
    "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"
}

Clear, right? The host is selinux protected. Now, let’s execute the installation of the libselinux-python package as a single task, and then add our public key to the authorized_keys file of root:

$ ansible all -i hosts -k -u root -m yum -a "name=libselinux-python state=installed"
SSH password:
192.168.101.2 | success >> {
    "changed": true,
    "msg": "",
    "rc": 0,
    "results": [
        "Loaded plugins: security\nSetting up Install Process\nResolving Dependencies\n--> Running transaction check\n---> Package libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package             Arch     Version                 Repository           Size\n================================================================================\nInstalling:\n libselinux-python   x86_64   2.0.94-5.3.el6_4.1      public_ol6_latest   201 k\n\nTransaction Summary\n================================================================================\nInstall       1 Package(s)\n\nTotal download size: 201 k\nInstalled size: 653 k\nDownloading Packages:\nRunning rpm_check_debug\nRunning Transaction Test\nTransaction Test Succeeded\nRunning Transaction\n\r  Installing : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\r  Verifying  : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\nInstalled:\n  libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1                                 \n\nComplete!\n"
    ]
}

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
SSH password:
192.168.101.2 | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "present",
    "unique": false,
    "user": "root"
}
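
At this point, a quick sanity check that key based access now works is to repeat the ping without the ‘-k’ option; no password should be asked:

$ ansible all -i hosts -u root -m ping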

Maybe your customer doesn’t want you to store your keys on their servers. It’s easy to do the reverse, and remove your key from the authorized_keys file:

$ ansible all -i hosts -u root -m authorized_key -a "user=root state=absent key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
192.168.101.2 | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "absent",
    "unique": false,
    "user": "root"
}

Please mind I didn’t specify ‘-k’ on the command line to send a password: in the previous steps we added our key, so we can access our host using our public key. Another extremely important thing here is ‘changed’: ‘changed’ indicates if the task actually changed something on the destination server.

I have run single tasks until now. Before running the playbook, I changed the state of my test VM back to its state before I started changing it with ansible (by removing the libselinux-python package using ‘ansible all -i hosts -k -u root -m yum -a "name=libselinux-python state=absent"’).

Let’s run the above described playbook:

$ ansible-playbook -i hosts -k linux_setup_example.yml
 [WARNING]: The version of gmp you have installed has a known issue regarding
timing vulnerabilities when used with pycrypto. If possible, you should update
it (ie. yum update gmp).

SSH password:

PLAY [all] ********************************************************************

TASK: [upgrade all packages] **************************************************
changed: [192.168.101.2]

TASK: [install python-selinux] ************************************************
changed: [192.168.101.2]

TASK: [add public key to authorized_key file of root] *************************
changed: [192.168.101.2]

PLAY RECAP ********************************************************************
192.168.101.2              : ok=3    changed=3    unreachable=0    failed=0

Now at this point you might think: I get it, but these are all pretty simple tasks, it’s not special at all. Well, let me show you an actual example which shows the importance of using this, even on a single machine, but even more when you have got a large group of servers to administer.

The next example is a playbook created to apply PSU3 to an Oracle 11.2.0.4 home. It’s still quite simple: it just applies PSU3 to the Oracle home, but totally automatically. The point I am trying to make is that having this automated is already nice for a single home, but it saves a lot of hours (read: a lot of money) when repeated across many servers, and saves you from human error.

---
- hosts: all
  vars:
    u01_size_gb: 1
    tmp_size_gb: 1
    oracle_base: /u01/app/oracle
    oracle_home: /u01/app/oracle/product/11.2.0.4/dbhome_1
    patch_dir: /u01/install
  remote_user: oracle
  tasks:

  - name: check u01 free disk space
    action: shell df -P /u01 | awk 'END { print $4 }'
    register: u01size
    failed_when: u01size.stdout|int < {{ u01_size_gb }} * 1024 * 1024

  - name: check tmp free disk space
    action: shell df -P /tmp | awk 'END { print $4 }'
    register: tmpsize
    failed_when: tmpsize.stdout|int < {{ tmp_size_gb }} * 1024 * 1024

  - name: create directory for installation files
    action: file dest={{ patch_dir }} state=directory owner=oracle group=oinstall

  - name: copy opatch and psu
    copy: src=files/{{ item }} dest={{ patch_dir }} owner=oracle group=oinstall mode=0644
    with_items:
     - p6880880_112000_Linux-x86-64.zip
     - p18522509_112040_Linux-x86-64.zip
     - ocm.rsp

  - name: install opatch in database home
    action: shell unzip -oq {{ patch_dir }}/p6880880_112000_Linux-x86-64.zip -d {{ oracle_home }}

  - name: unzip psu patch
    action: shell unzip -oq {{ patch_dir }}/p18522509_112040_Linux-x86-64.zip -d {{ patch_dir }}

  - name: patch conflict detection
    action: shell export ORACLE_HOME={{ oracle_home }}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    register: conflict_detection
    failed_when: "'Prereq \"checkConflictAgainstOHWithDetail\" passed.' not in conflict_detection.stdout"

  - name: apply psu
    action: shell export ORACLE_HOME={{ oracle_home}}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch apply -silent -ocmrf {{ patch_dir }}/ocm.rsp
    register: apply_psu
    failed_when: "'Composite patch 18522509 successfully applied.' not in apply_psu.stdout"

  - name: clean up install directory
    file: path={{ patch_dir }} state=absent

Let me run you through this playbook! It starts off with the indication of a YAML document: ‘---’. Next is hosts: all again. I just put all the hosts in the hosts file; I did not create all kinds of groups of hosts (which would be fitting when you use it in a fixed environment, but I use it for various customers). Then vars, with a list of variables. As you can see, I can use the variables, which are shown in the playbook as {{ variable }}. Then remote_user: oracle, and tasks.

The first and second task use variables, and use the argument ‘register’ to save all response into a named variable. I also use ‘failed_when’ to make the playbook stop executing when the expression after ‘failed_when’ is true. The expression uses the named variable, from which the standard output is taken (.stdout). Then a filter is used to cast the output to an integer, and it is compared with a calculation on the size variable.

The third task is using the file module to create a directory. The fourth task is using the copy module. The copy module means a file or files (in this case) are copied from the machine from which the playbook is run onto the destination host or hosts. Another trick is used here: processing the task with a list of items. As you can see, the copy line contains a variable {{ item }}, and the task is executed for all the items in the list ‘with_items’. I found this is fine for smaller files (up to a few hundred megabytes), but too slow for bigger files; for those I use http (the get_url module) to speed up the transfer.
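
As an illustration of that alternative (the URL is hypothetical), the same transfer could be done with the get_url module, here as a single task:

$ ansible all -i hosts -u oracle -m get_url -a "url=http://yourwebserver/p18522509_112040_Linux-x86-64.zip dest=/u01/install"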

The fifth and sixth tasks execute a shell command, unzip, to extract the contents of a zip file into a specific place.

The seventh task executes a small list of shell commands, in order to be able to run the conflict detection option of opatch. The same trick as with the first two tasks is used: register a name for the output of the conflict detection. Here I check if the stdout contains what I would manually check for if I ran it by hand. The eighth task is the main task of the whole playbook: the actual patch application. It uses the same technique as task seven. The last task simply removes the installation directory, in order to remove the files we used for this patch.
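
Running this playbook is no different from the earlier example (the playbook file name is just an illustration):

$ ansible-playbook -i hosts -k apply_psu3.yml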

Summary
I hope this shows what a tremendous help Ansible can be for a consultant. This kind of tool is simply mandatory if you have got an environment with more than approximately ten to twenty servers to administer. Ansible can be used even if the organisation does not want to spend time on the implementation of a configuration tool.
