This is a quick post on using git on a server. I use my Synology NAS as a file server, but also as a git repository server. The default git package for Synology enables git usage on the command line, which means via ssh or via WebDAV. Both require a logon to do anything with the repository. That is not very handy if you want to clone and pull from the repository in an automated way. Of course there are ways around that (basically setting up password-less authentication, probably via certificates), but I wanted simple, read-only access without authentication. If you install git on a linux or unix server you get the binaries, but no daemon, which means ssh is the only way to use that server for central git repositories.

Running git via inetd
What I did is use the inetd daemon to launch the git daemon. On any linux or unix server with the inetd daemon, and on Synology too, because it runs linux under the covers, it's easy to set up git as a server.

First, check /etc/services for the following lines:

git               9418/tcp                   # git pack transfer service
git               9418/udp                   # git pack transfer service

Next, add the following line in the inetd.conf (which is /etc/inetd.conf on my synology):

git stream tcp nowait gituser /usr/bin/git git daemon --inetd --verbose --export-all --base-path=/volume1/homes/gituser

What you should look for in your setup is:
– gituser: the user which is used to run the daemon. I created a user 'gituser' for this.
– /usr/bin/git: your git binary should be at that fully specified path, otherwise inetd can't find it.
– git daemon:
  – --inetd: notifies the git executable that it is running under inetd.
  – --export-all: all git repositories underneath the base directory will be made available.
  – --base-path: this sets the git root directory. In my case, I wanted to have all the repositories in the home directory of the gituser, which is /volume1/homes/gituser.

And make the inetd daemon reload its configuration with kill -HUP:

# killall -HUP inetd
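
To verify the daemon answers, you can do an anonymous, read-only clone from another machine using the git protocol. The hostname and repository name below are just examples; the repository directory is expected underneath the base path set above:

$ git clone git://mynas.local/myrepo.git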

Please mind this is a simple and limited setup; if you want to set it up with more granular security, you should look into gitolite, for example.

This post looks like I am jumping on the bandwagon of IT orchestration, like a lot of people are doing. Maybe I should say 'except for (die hard) Oracle DBAs'. Or maybe not, it's up to you to decide.

Most people who are interested in IT in general will have noticed that IT orchestration has gotten attention, especially in the form of Puppet and/or Chef. I _think_ IT orchestration has gotten important with the rise of "web scale" (scaling applications up and down by adding virtual machines to horizontally scale resource intensive tasks), in order to provision/configure the newly added machines without manual intervention, and people are now picking it up to use it for more tasks than provisioning virtual machines for web applications.

I am surprised by that. I am not surprised that people want boring tasks like changing settings in configuration files, restarting daemons, installing software with all the correct options, etc. to be automated. Instead, I am surprised that people are only now picking this up, after it has been around for so long.

A little history.
As far as I know, IT orchestration started with cfengine, which was really a configuration engine (hence the name). Despite having a slightly different purpose (configuration vs. orchestration), this tool is the parent of all the orchestration/configuration tools which exist nowadays. cfengine started off as a study by Mark Burgess at the University of Oslo in 1993, with the creation of the cfengine software as a result. As far as I can see, it has been available as open source software since the beginning.

Now back to what I am surprised at: with cfengine, there has been a way to configure linux/unix systems in a structured way, and to install and configure software on them, since the mid-nineties. Yet, this wasn't picked up (of course with a few exceptions). Fast forward to today, and we see it being adopted. And that is a good thing.

I created a setup with cfengine for a client a long time ago, which had the ability to install the Oracle software, different PSUs in different homes, and remove it, by adding or removing machines to groups in a cfengine configuration file. It wasn't picked up by the client; it's way more fun running the X installer to install the software, make the choices by hand, and redo this over and over on every machine, right?

I had almost forgotten about my work with cfengine, until I spoke with Alex Gorbatchev at a conference, where he pointed me to Ansible. At first I didn't do a lot with it, but lately I've given it a go, and I am very happy with it.

Another redo of cfengine?
From what I read, most of the configuration/orchestration engines created after cfengine were created to circumvent all kinds of difficulties with cfengine. I can understand that. It took me a while to learn cfengine, and indeed it forces you to think in a different way.

The Ansible project decided to do it radically differently from all the other engines. It is different in the sense that it advertises itself as simple, agentless and powerful.

Simple.
Simple is a terrific goal. For those of you that have worked with configuration/orchestration engines, there is a steep learning curve. It is just hard to get the basic principles in your head. To be honest, Ansible took me a while too, to grasp the basic principles and get the picture correct in my head. Yet, having worked with cfengine, comparing it with Ansible's playbooks, which are the scripts to do things on the targets, is a breath of fresh air. Playbooks are so clean they can (almost) be read and understood as plain English.

Agentless.
This is where Ansible is truly different from any of the other configuration/orchestration tools. Ansible does not require any agent installation on the targets. The obvious next question then is: how can this work? Well, quite simply: Ansible uses ssh to connect to the host, and executes commands via the shell. Having said that, it requires a little more detail: Ansible uses Python on the remote host for its normal execution. However, you can use it without Python, for example to set the host up for Ansible's normal usage mode, which requires Python and the simplejson module.
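
As an illustration of that bootstrap scenario (a sketch; the package names depend on your distribution and Python version), the 'raw' module just pushes a plain command over ssh, without needing Python on the target:

$ ansible all -i hosts -u root -k -m raw -a "yum install -y python-simplejson"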

This is truly important, and makes it an excellent fit for my daily work as an IT consultant.

Powerful.
Ansible is powerful in the sense that you can do configuration and orchestration tasks in a simple, clean way.

Summary on the introduction.
Above was a brief personal history, and some of the "marketed" features of Ansible. I think being agentless is the true "killer feature" here. All the other configuration/orchestration engines require you to set up and configure a fixed client-server connection, and install a daemon and a central server process. In case you wondered: yes, authentication is important, and it's simply brilliant that ssh password authentication or public key infrastructure can be used.

Because there's no daemon to install, you can run your playbooks everywhere. So instead of a fixed client configuration, you can create playbooks to do routine tasks, and repeat them at multiple sites.

Okay, how does this work?

Installation: add EPEL and install ansible.
If you are on one of the clones of RedHat Enterprise Linux (I use Oracle Linux), you simply need to add the EPEL repository to your yum source list, and run:

# yum install ansible
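
If the EPEL repository is not present yet, adding it before the yum install can be done with the epel-release package, for example like this (a sketch; the URL shown is for Enterprise Linux 6 and may change over time):

# rpm -Uvh https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm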

First steps.
One of the first things I do is create a directory for a typical ansible 'project'. Project here means a set of tasks you want to execute on a set of hosts. Next, I create a file called 'hosts', which is the list of hosts you want to execute tasks on. By default, Ansible looks in /etc/ansible/hosts. In this case, I put a single machine in it (a test VM), but it can be a list of machines (ip addresses or hostnames).

$ cat hosts
192.168.101.2

In fact, you can create groups in the hosts file in "INI style", but I just put one host in for this example.
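
To give an idea, a grouped hosts file could look like this (the group names and the extra addresses are made up):

$ cat hosts
[dbservers]
192.168.101.2
192.168.101.3

[webservers]
192.168.101.10
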
The next thing is to check if Ansible reads the file correctly. This is done in the following way:

$ ansible all -i hosts --list-hosts
    192.168.101.2

Okay, this means Ansible will operate on this one host if invoked. The next logical thing to do (typically when you are in a new client environment, to check if you can reach the hosts) is this:

$ ansible all -i hosts -m ping
192.168.101.2 | FAILED => FAILED: Authentication failed.

Ping might be a bit misleading for some people. What ping does here (-m means module) is try to connect to the host over ssh and log in. Because I didn't specify a user, it used the username of the current user on the machine, which is 'ansible'. A user 'ansible' typically doesn't exist on a normal server (nor is it necessary, or should it be created), and it doesn't on my test server either. So it failed, as the message said, on authentication.

My test VM has a basic (OL) linux 6 server installation. This means there's only one user: root.

So, let’s specify the user root as user:

$ ansible all -i hosts -m ping -u root
192.168.101.2 | FAILED => FAILED: Authentication failed.

The authentication failed again. And it should! What this does is try to log on as root, and we haven't given any password, nor have I put my local user's public key in the remote authorized_keys file. So there is no way this could work. This is typically also the state when you want to do stuff with a "fresh" client system. Let's add the '-k' option (ask for the ssh password), and run again:

$ ansible all -i hosts -m ping -u root -k
SSH password:
192.168.101.2 | success >> {
    "changed": false,
    "ping": "pong"
}

To walk you through the output: it now asks for a password, which I've filled out, then lists the host and the state: success. During this execution, nothing was changed on the remote host, and the ping command resulted in a pong (much like the ICMP ping response).

With what we have learned now, we can do things like this:

$ ansible all -i hosts -u root -k -a "ifconfig"
SSH password:
192.168.101.2 | success | rc=0 >>
eth0      Link encap:Ethernet  HWaddr 00:0C:29:14:65:ED
          inet addr:192.168.39.145  Bcast:192.168.39.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe14:65ed/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:47 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6293 (6.1 KiB)  TX bytes:2594 (2.5 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:14:65:F7
          inet addr:192.168.101.2  Bcast:192.168.101.255  Mask:255.255.255.0
          inet6 addr: fe80::20c:29ff:fe14:65f7/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:188 errors:0 dropped:0 overruns:0 frame:0
          TX packets:112 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:142146 (138.8 KiB)  TX bytes:15545 (15.1 KiB)

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Does this look familiar to you Exadata DBAs? Yes, this replicates some of the functionality of dcli (although dcli is aimed at executing simple tasks on a group of hosts, whilst Ansible is aimed at enterprise configuration and orchestration).
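
For comparison, the dcli equivalent of the above would be something like this (a sketch; dbs_group being a file containing the compute node names, as commonly found on Exadata compute nodes):

# dcli -g dbs_group -l root "ifconfig"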

One step beyond! Playbooks.
Now let's progress to playbooks. An Ansible playbook is where the true strength of Ansible lies. It allows you to specify tasks to execute on the remote hosts, create sequences of tasks, and make decisions based on the outcome of a task for further execution. Let me show you a simple playbook, and guide you through it:

---
- hosts: all
  gather_facts: no
  remote_user: root
  tasks:

  - name: upgrade all packages
    yum: name=* state=latest

  - name: install python-selinux
    yum: name=libselinux-python state=installed

  - name: add public key to authorized_key file of root
    authorized_key: user=root state=present key="{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}"

As you can see, this is a playbook with three tasks: upgrade all packages, install libselinux-python and add my (local) public key to the authorized_keys file of root (to allow password-less access).

Line 1 shows three dashes, which means the start of a YAML document.
Line 2 starts with a single dash, which indicates a list. There is one dash at this indentation level, so it's a list of one. The fields of this member are hosts, gather_facts, remote_user and tasks. Tasks gets its own list (mind the indentation level, that is important). The fields are key/value pairs, with the separation indicated by the colon (:). The first field is 'hosts', with the value 'all'. This means that all hosts in the hosts file are used for this playbook. I don't think it's hard to imagine how useful it can be to specify a group/kind of servers the playbook can run on. The next one is 'gather_facts'. A normal playbook execution first gathers a lot of information from all the hosts it is going to run on before execution; these "facts" can be used during playbook execution, and setting gather_facts to 'no' skips that step. Next is 'remote_user', which indicates the user Ansible is going to log on with, so we don't have to specify it on the command line. Then we see 'tasks', indicating the list of tasks to be executed on the hosts.

It's easy to spot that we have three tasks. What is extremely important is the indentation of this list (it's got a dash, so it's a list!). Name is not mandatory, but it makes the playbook easy to read if you give the tasks useful names, and these are shown when the playbook is executed. The first task has the name 'upgrade all packages'. The next field shows the key 'yum', indicating it makes use of the yum module. This key has two values: name=*, which means 'all packages', and state=latest, which means we want all packages to be at the latest version. In other words, this task is the equivalent of 'yum update'.
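
As an illustration, the same thing can be done as a single ad-hoc task with the ansible executable, using the same hosts file and root user as before:

$ ansible all -i hosts -u root -k -m yum -a "name=* state=latest"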

The second task is called 'install python-selinux'. It makes use of the yum module again, and is self explanatory: it installs the libselinux-python package. This package is necessary to let Ansible work on things that are protected by selinux on a host which has selinux enabled.

The next task is called 'add public key to authorised_key file of root'. It makes use of the authorized_key module. This module requires a parameter 'key', for which we use the lookup function to look up the local (!) public key of the user with which I execute ansible, which is 'ansible'. 'state=present' means we want this key to be present; 'present' is the default value, so it wasn't necessary to put this in. Next, 'user=root': we want the public key to be added to the authorized_keys file of the user root.

Of course these tasks could also be executed as single tasks using the 'ansible' executable. To show the importance of the installation of the libselinux-python package on a host with selinux enabled (which is the state of selinux on a freshly installed Oracle Linux machine), let's execute the task using the authorized_key module:

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
SSH password:
192.168.101.2 | FAILED >> {
    "failed": true,
    "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"
}

Clear, right? The host is selinux protected. Now, let's execute the installation of the libselinux-python package as a single task, and then add our public key to the authorized_keys file of root:

$ ansible all -i hosts -k -u root -m yum -a "name=libselinux-python state=installed"
SSH password:
192.168.101.2 | success >> {
    "changed": true,
    "msg": "",
    "rc": 0,
    "results": [
        "Loaded plugins: security\nSetting up Install Process\nResolving Dependencies\n--> Running transaction check\n---> Package libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package             Arch     Version                 Repository           Size\n================================================================================\nInstalling:\n libselinux-python   x86_64   2.0.94-5.3.el6_4.1      public_ol6_latest   201 k\n\nTransaction Summary\n================================================================================\nInstall       1 Package(s)\n\nTotal download size: 201 k\nInstalled size: 653 k\nDownloading Packages:\nRunning rpm_check_debug\nRunning Transaction Test\nTransaction Test Succeeded\nRunning Transaction\n\r  Installing : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\r  Verifying  : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\nInstalled:\n  libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1                                 \n\nComplete!\n"
    ]
}

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
SSH password:
192.168.101.2 | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "present",
    "unique": false,
    "user": "root"
}

Maybe your customer doesn't want you to store your keys on their servers. It's easy to do the reverse, and remove your key from the authorized_keys file:

$ ansible all -i hosts -u root -m authorized_key -a "user=root state=absent key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""
192.168.101.2 | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "absent",
    "unique": false,
    "user": "root"
}

Please mind I didn't specify '-k' on the command line to send a password: in the previous step we added our key, so we can access our host using our public key. Another extremely important thing is 'changed': it indicates whether the task actually changed something on the destination server.
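
To illustrate: if you would run the exact same removal command again (adding '-k' again, because our key is now gone from the server), the module finds nothing to remove and should report "changed": false. This is the idempotent nature of the modules, which makes re-running tasks safe:

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=absent key=\"{{ lookup('file','/home/ansible/.ssh/id_rsa.pub') }}\""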

I have run single tasks until now. I changed the state of my test VM back to its state before I started changing it with ansible (by removing the libselinux-python package using 'ansible all -i hosts -k -u root -m yum -a "name=libselinux-python state=absent"').

Let’s run the above described playbook:

$ ansible-playbook -i hosts -k linux_setup_example.yml
 [WARNING]: The version of gmp you have installed has a known issue regarding
timing vulnerabilities when used with pycrypto. If possible, you should update
it (ie. yum update gmp).

SSH password:

PLAY [all] ********************************************************************

TASK: [upgrade all packages] **************************************************
changed: [192.168.101.2]

TASK: [install python-selinux] ************************************************
changed: [192.168.101.2]

TASK: [add public key to authorized_key file of root] *************************
changed: [192.168.101.2]

PLAY RECAP ********************************************************************
192.168.101.2              : ok=3    changed=3    unreachable=0    failed=0

Now at this point you might think: I get it, but these are all pretty simple tasks, it's not special at all. Well, let me show you something that really shows the importance of using this, even on a single machine, but even more when you have a large group of servers to administer.

The next example is a playbook created to apply PSU3 to an Oracle 11.2.0.4 home. It's still quite simple, it just applies PSU3 to the Oracle home, but totally automatically. The point I am trying to make is that this already automates a lot of work for a single home, but across many servers it saves a lot of hours (read: a lot of money), and saves you from human error.

---
- hosts: all
  vars:
    u01_size_gb: 1
    tmp_size_gb: 1
    oracle_base: /u01/app/oracle
    oracle_home: /u01/app/oracle/product/11.2.0.4/dbhome_1
    patch_dir: /u01/install
  remote_user: oracle
  tasks:

  - name: check u01 free disk space
    action: shell df -P /u01 | awk 'END { print $4 }'
    register: u01size
    failed_when: u01size.stdout|int < {{ u01_size_gb }} * 1024 * 1024

  - name: check tmp free disk space
    action: shell df -P /tmp | awk 'END { print $4 }'
    register: tmpsize
    failed_when: tmpsize.stdout|int < {{ tmp_size_gb }} * 1024 * 1024

  - name: create directory for installation files
    action: file dest={{ patch_dir }} state=directory owner=oracle group=oinstall

  - name: copy opatch and psu
    copy: src=files/{{ item }} dest={{ patch_dir }} owner=oracle group=oinstall mode=0644
    with_items:
     - p6880880_112000_Linux-x86-64.zip
     - p18522509_112040_Linux-x86-64.zip
     - ocm.rsp

  - name: install opatch in database home
    action: shell unzip -oq {{ patch_dir }}/p6880880_112000_Linux-x86-64.zip -d {{ oracle_home }}

  - name: unzip psu patch
    action: shell unzip -oq {{ patch_dir }}/p18522509_112040_Linux-x86-64.zip -d {{ patch_dir }}

  - name: patch conflict detection
    action: shell export ORACLE_HOME={{ oracle_home }}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    register: conflict_detection
    failed_when: "'Prereq \"checkConflictAgainstOHWithDetail\" passed.' not in conflict_detection.stdout"

  - name: apply psu
    action: shell export ORACLE_HOME={{ oracle_home}}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch apply -silent -ocmrf {{ patch_dir }}/ocm.rsp
    register: apply_psu
    failed_when: "'Composite patch 18522509 successfully applied.' not in apply_psu.stdout"

  - name: clean up install directory
    file: path={{ patch_dir }} state=absent

Let me run you through this playbook! It starts off with the indication of a YAML document: '---'. Next hosts: all again. I just put all the hosts in the hosts file, I did not create all kinds of groups of hosts (which would be fitting when you use it in a fixed environment, but I use it for various customers). Then vars, with a list of variables. As you can see, I can use the variables, which are shown in the playbook as {{ variable }}. Then remote_user: oracle, and tasks.

The first and second task use variables, and use the argument 'register' to save the response into a named variable. I also use 'failed_when' to make the playbook stop executing when the expression after 'failed_when' is true. The argument of 'failed_when' is the named variable, of which the standard output (.stdout) is used. A filter then casts the output to an integer, which is compared with a calculation based on the variable.

The third task uses the file module to create a directory. The fourth task uses the copy module. The copy module means a file or files (in this case) are copied from the machine from which the playbook is run onto the destination host or hosts. Another trick is used here, to process the task with a list of items: as you can see, the copy line contains a variable {{ item }}, and the task is executed for all the items in the list 'with_items'. I found this fine for smaller files (up to a few hundred megabytes), but too slow for bigger files. I use http (the get_url module) to speed up file transfer for those.
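
To give an impression, such a transfer could also be done as an ad-hoc task with the get_url module (a sketch; the URL and web server are hypothetical, you need to serve the files over http yourself):

$ ansible all -i hosts -u oracle -m get_url -a "url=http://192.168.101.1/p18522509_112040_Linux-x86-64.zip dest=/u01/install mode=0644"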

The fifth and sixth tasks execute a shell command, unzip, to extract the contents of a zip file into a specific place.

The seventh task executes a small list of shell commands, in order to be able to run the conflict detection option of opatch. The same trick as with the first two tasks is used: register a name for the output of the conflict detection. Here I check if the stdout contains what I would check for manually when running it myself. The eighth task is the main task of the whole playbook: the actual patch application. However, it uses the same technique as task seven. The last task simply removes the patch directory, to clean up the files we used for this patch.

Summary
I hope this shows what a tremendous help Ansible can be for a consultant. This kind of tool is simply mandatory if you have an environment with more than approximately ten to twenty servers to administer. Ansible can be used even if the organisation does not want to spend time on the implementation of a configuration tool.

(the details are investigated and specific to Oracle’s database implementation on Linux x86_64)

Exadata IO: This event is not used with Exadata storage; 'cell single block physical read' is used instead.
Parameters:
p1: file#
p2: block#
p3: blocks

Despite p3 listing the number of blocks, I have never seen a db file sequential read event that read more than one block. Of course this could change in a newer release.

Implementation:
One of the important things to realise here is that regardless of the asynchronous IO settings (disk_asynch_io, filesystemio_options), Oracle always uses a pread() system call, so synchronous IO, for reading the blocks covered by this event. If you realise what the purpose of fetching the single block is in most cases, fetching a single database block whose contents are necessary in order to continue processing, it should become apparent that issuing a synchronous IO call makes sense. This is also the reason the V$IOSTAT* views list both SMALL_READ_REQS and SMALL_SYNC_READ_REQS, and SMALL_READ_SERVICETIME and SMALL_SYNC_READ_LATENCY: to make a distinction between sync (pread()) reads and non-sync (thus asynchronous) calls, which use the io_submit()-io_getevents() call combination.

IO done under the event 'db file sequential read' means a single block is read into the buffer cache in the SGA via the system call pread(). Regardless of physical IO speed, this wait is always recorded; in other words, there is a strict relation between the event and the physical IO. Just to be complete: if a needed block is already in the Oracle database buffer cache, no wait event is triggered and the block is read; this is called a logical IO. When the wait event 'db file sequential read' is shown, both a physical and a logical IO are executed.
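
If you want to see this for yourself on Linux x86_64, you can attach strace to a server process doing these waits (a sketch; 12345 is a made-up process id, and the call shows up as pread64 at the system call level). Every 'db file sequential read' should line up with one pread64() call reading db_block_size (8192) bytes:

# strace -e trace=pread64 -p 12345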

This event means a block is not found in the database buffer cache. It does not mean the block is really read from a physical disk. If DIO (direct IO) is not used (filesystemio_options set to 'none' or 'async' when using a filesystem; ASM (alias "Oracle managed raw devices") is inherently direct path IO, except when the ASM "disks" are on a filesystem (when ASM is used with NFS (!), filesystemio_options is obeyed)), the block could very well be coming from the filesystem cache of linux. In fact, without DIO a phenomenon known as 'double buffering' takes place, which means the IO doesn't go to the visible disk devices directly, but takes a mandatory step in between, done at the kernel level, which means the data is put in the filesystem cache of linux too. It should be obvious that this extra work comes at the cost of extra CPU cycles, and is in almost any case unnecessary.

If you take a step back, you should realise this event should take place for a limited number of blocks during execution. Because of the inherent single block IO nature of db file sequential read, every physical read (when it needs to read from a physical disk device) takes the IO latency penalty. Even with solid state disk devices, which have an inherently lower latency because there are no rotating parts and no disk heads to be moved, chopping an operation up in tiny parts when a full table scan or fast full index scan could be done means a lot of CPU time is used whilst it could be done more efficiently.

The time spent on 'db file sequential read' quite accurately times single block IO. This means a direct relationship should exist between 'db file sequential read' timings and counts and operating system measured IO statistics (iostat, sar and more).

Last week I got a question on how storage indexes (SI) behave when the table for which the SI is holding data is changed. Based on logical reasoning, it can be two things: the SI is invalidated because the data it's holding has changed, or the SI is updated to reflect the change. Think about this for yourself, and pick a choice. I would love to hear if you chose the correct one.

First let's take a step back and lay some groundwork. The tests done in this blogpost are done on an actual Exadata (V2 hardware), with Oracle version 11.2.0.4.6 (meaning bundle patch 6). The Exadata "cellos" (Cell O/S) version is 11.2.3.3.1.140529.1 on both the compute nodes and the storage nodes.

A storage index is a memory structure used by the cell daemon, which is the storage server process on the storage layer of Exadata. By default (although I've never seen it different yet) an SI contains minimum and maximum values for up to eight columns of the table it is describing. The memory structure is transient; it describes a region of one megabyte of a table. A storage index is populated during the fetching of data for a smart scan, based on the filter predicates of the query causing the smart scan. In essence, the whole SI management is done automatically by the cell daemon ("in essence" means you can play around with some undocumented settings on the database level, and there are some undocumented settings and events you can set on the cell server level).

Okay, back to the original question. Let's test this. First we need a table large enough to be smart scanned. I've got a table called 'BIGTAB_NOHCC' (as you probably guessed, there is also a table 'BIGTAB' in the schema I am using, which is hybrid columnar compressed). This table consists of 2671736 blocks of 8KB, which means it has a total size of 21G. This is big enough for my database instance ("big enough" is relative to the buffer cache size) to get a smart scan.

For this test, I use the session statistics. In v$sesstat there is a statistic called 'cell physical IO bytes saved by storage index', which tells us how many bytes we saved from being scanned because of the use of storage indexes. I also show some output of a (single) storage server, although a normal Exadata rack will typically have 3, 7 or 14 storage servers in use.

First of all, in order to get an idea and be pretty sure my query will populate storage indexes, I stop and start the cell daemon on the storage servers. I use ‘service celld restart’ for that.
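
On a rack with multiple storage servers this can be done from a compute node in one go with dcli (a sketch; cell_group being the usual file with the storage server names on the compute nodes):

# dcli -g cell_group -l root "service celld restart"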

Next, in order to get storage index information for the table I use, I need some metadata. The metadata I need is:
a) data object id

select data_object_id from dba_objects where owner = 'MARTIN' and object_name = 'BIGTAB_NOHCC';

b) tablespace number

select ts# from v$tablespace t, dba_segments s, dba_objects o where t.name=s.tablespace_name and s.segment_name=o.object_name and o.data_object_id = 18716;

c) database unique id from the x$ksqdn

select ksqdngunid from x$ksqdn;

(thanks to the progeeking website for this information)

Next, I run the query on my BIGTAB_NOHCC table. Please mind it's fundamental to have a filter predicate, and that the execution plan shows we are doing a full table scan. Only with a filter predicate does the storage server have a reason to build a storage index:

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index					     0

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906259;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index					     0

Now let’s see if the storage layer built a storage index:

CellCLI> alter cell events = "immediate cellsrv.cellsrv_storidx('dumpridx','all',18716,5,1126366144)";
Dump sequence #1 has been written to /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_58.trc
Cell enkcel01 successfully altered

This is why it's handy to have the aforementioned data about the table: you can dump the storage indexes for a single table. The three numbers after 'all' are the data_object_id, the tablespace number and the ksqdngunid from the x$ksqdn table.

Let’s look in the file which the cell event produced:

# less /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_58.trc
Trace file /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_58.trc
ORACLE_HOME = /opt/oracle/cell
System name:    Linux
Node name:      enkcel01.enkitec.com
Release:        2.6.39-400.128.17.el5uek
Version:        #1 SMP Tue May 27 13:20:24 PDT 2014
Machine:        x86_64
CELL SW Version:        OSS_11.2.3.3.1_LINUX.X64_140529.1

*** 2014-07-12 13:16:17.030

*** 2014-07-12 13:16:17.030
UserThread: LWPID: 27455 userId: 58 kernelId: 58 pthreadID: 0x7f2863fe7940
2014-07-12 13:21:36.499123 :0005C9DE: $$$ Dumping storage idx summary for griddisk DATA_CD_08_enkcel01:
2014-07-12 13:21:36.499217 :0005C9E0: Dump sequence #1:

*** 2014-07-12 13:21:36.499
2014-07-12 13:21:36.499212 :0005C9DF: 
***************************
2014-07-12 13:21:36.499249 :0005C9E1: Dumping RIDX summary for objd 18716, tsn 5, dbid 1126366144

2014-07-12 13:21:36.499249*: RIDX(0x7f27b4ddab64) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-12 13:21:36.499249*: RIDX: strt 0 end 2048 offset 6215958528 size 1048576 rgnIdx 5928 RgnOffset 0 scn: 0x099a.5cc294b8 hist: 2
2014-07-12 13:21:36.499249*: RIDX validation history: 
2014-07-12 13:21:36.499249*: 0:FullRead 1:Undef 2:Undef 3:Undef 4:Undef 5:Undef 6:Undef 7:Undef 8:Undef 9:Undef
2014-07-12 13:21:36.499249*: Col id [1] numFilt 4 flg 2 (HASNONNULLVALUES): 
2014-07-12 13:21:36.499249*: lo: c3 c 34 11 0 0 0 0
2014-07-12 13:21:36.499249*: hi: c3 5a 5 18 0 0 0 0

Here you see the typical heading of an Oracle trace file, then the announcement of the dump ("Dumping storage idx summary", "Dumping RIDX summary"). The real storage index information starts with "RIDX(0x7f27b4ddab64)". Starting with that and until the "lo" and "hi" values, you are looking at an actual storage index which holds data for a single column. You can see which column by looking at the "Col id" in the square brackets: 1. It's interesting to note that there is an scn (system change number) included. The storage index shows whether there are NULL values in the column (in this case it says HASNONNULLVALUES, so we don't have any NULL values in the 1MB chunk of the column this storage index describes), and, of course, the low and high values in the Oracle internal data format.

So, despite no indication on the database layer, the query built storage indexes! That should mean that executing the same query again will result in actually using the storage indexes which were just built:

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906259;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

Yes! This is using the storage index we caused to be built previously!

With this being established, we can try to get an answer to our question: how does the storage index react to DML on the table it is describing?

We built the storage index, and used it. Now let's update the 'id' field for which the storage index was built, and redo our query test:

MARTIN:dbm011> update bigtab_nohcc set id = id + 1;

16000000 rows updated.

MARTIN:dbm011> commit;

Commit complete.

Okay, now let’s redo the select again, and take stats before and after on storage index usage!

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906260;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

What we see here with the storage index session statistic is that the update statement didn't use the storage index (which is obvious: we updated the data, which happened on the database layer, so we didn't use a smartscan), because this number is the same as the last time we looked at it, before the update statement.
When we executed the select query on the table with a filter on the id column again, there was no storage index usage either, because the storage index session statistic didn't increase.

Actually, this query built new storage indexes. When this query is executed again, we can use these:

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    5746368512

MARTIN:dbm011> select count(*) from bigtab_nohcc where id=906260;

   COUNT(*)
-----------
	 16

MARTIN:dbm011> @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				   14534426624

(actually, it can sometimes take more than one execution for the storage indexes to be created again; my guess would be that some heuristics are used to try to come up with the best candidates for storage indexes)

Let's try another case: update a different column than the one the storage indexes were created on.
First make sure storage indexes are built:

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index					     0

MARTIN@dbm011 > select count(*) from bigtab_nohcc where id=906260;

  COUNT(*)
----------
	16

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    8790155264

For completeness' sake, I verified the storage indexes for this table by using the dump shown previously. A simple grep on the dump shows this query created storage indexes for only column 1, the id column:

grep -e RIDX_VALID -e 'Col id' /opt/oracle/cell/log/diag/asm/cell/enkcel01/trace/svtrc_26565_4.trc
014-07-13 03:34:46.814235*: RIDX(0x7f27b4ddab64) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814235*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814235*: RIDX(0x7f27b4ddac40) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814235*: Col id [1] numFilt 4 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814235*: RIDX(0x7f27b4ddad1c) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814235*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814587*: RIDX(0x7f27b4ddadf8) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814587*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814587*: RIDX(0x7f27b4ddaed4) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814587*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
2014-07-13 03:34:46.814587*: RIDX(0x7f27b4ddb08c) : st 2(RIDX_VALID) validBitMap 0 tabn 0 id {18716 5 1126366144}
2014-07-13 03:34:46.814587*: Col id [1] numFilt 5 flg 2 (HASNONNULLVALUES): 
...etc.

Now let’s update another field:

MARTIN@dbm011 > update bigtab_nohcc set spcol = spcol + 1;

16000000 rows updated.

MARTIN@dbm011 > commit;

Commit complete.

And query the storage index use, and do our query with filter predicate again:

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    8790155264

MARTIN@dbm011 > select count(*) from bigtab_nohcc where id=906260;

  COUNT(*)
----------
	16

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    8790155264

So... this strongly suggests the update invalidated our storage index, even though the updated column was not in the storage index.
Just for completeness' sake, let's issue the select statement again to see if the storage index is built up again:

MARTIN@dbm011 > select count(*) from bigtab_nohcc where id=906260;

  COUNT(*)
----------
	16

MARTIN@dbm011 > @mystats
Enter value for name: %storage%
old   4: and name like nvl('%&name%',name)
new   4: and name like nvl('%%storage%%',name)

NAME										 VALUE
---------------------------------------------------------------------- ---------------
cell physical IO bytes saved by storage index				    9808936960

Yes, there it is again: the previous statement didn't benefit from the storage indexes, but built them, and now we take advantage of them again.

Conclusion.
I ran these tests on a quarter rack without any other usage. The method I used for measuring how storage indexes behave was to execute statements on the database layer, and look at the results on the storage layer and in the database layer session statistics. It's important to realise that despite this being quite strong evidence, it is not definite proof of my conclusion. Also, the behaviour described can change in future storage server versions.

My conclusion is that the storage server uses a system change number to validate whether the storage indexes are valid. If not, the storage indexes are not used. If the blocks covered by a storage index progress their system change number (in other words: are updated), the storage index is invalidated right away (during my tests, when the update started, no or few storage indexes remained).

So, the ones that picked storage indexes being invalidated were right. As far as I could detect, there is no updating of storage indexes.

There’s been some debate about how to get the parameters from a spfile. A spfile is a binary version of the parameter file of the Oracle database.

I added to the debate that in my experience there is some weirdness when using the strings command on the spfile. The discussion was on twitter, so I didn't add that most of the time this meant it cost me more time than I saved by using the "shortcut" of running strings on a spfile.

Let me show you what it means.

I’ve got a database with storage on ASM. Among other options, there are two simple methods to get the spfile from ASM:

You can get the spfile by logging on to the database, creating a pfile from the spfile, and then creating a spfile (on a filesystem) again:

SYS@v11204 AS SYSDBA> show parameter spfile

NAME				     TYPE	 VALUE
------------------------------------ ----------- ------------------------------
spfile				     string	 +DATA/v11204/spfilev11204.ora

Now let’s recreate the spfile on a filesystem:

SYS@v11204 AS SYSDBA> create pfile='/tmp/pfile' from spfile='+DATA/v11204/spfilev11204.ora';

File created.

SYS@v11204 AS SYSDBA> create spfile='/tmp/spfile' from pfile='/tmp/pfile';

File created.

Another option is to copy the spfile out of ASM:
Set the ASM environment and execute asmcmd

[oracle@ol65-oracle11204 [v11204] ~]$ +ASM
The Oracle base remains unchanged with value /u01/app/oracle
[oracle@ol65-oracle11204 [+ASM] ~]$ asmcmd
ASMCMD>

Now go to the DATA disk group, and the directory of the database (my database is called v11204). If you look here, you’ll see a link to the spfile to its true ASM place:

ASMCMD> cd data/v11204
ASMCMD> ls
CONTROLFILE/
DATAFILE/
ONLINELOG/
PARAMETERFILE/
TEMPFILE/
spfilev11204.ora

If you take a long listing, you see the true ASM place:

ASMCMD> ls -l
Type           Redund  Striped  Time             Sys  Name
                                                 Y    CONTROLFILE/
                                                 Y    DATAFILE/
                                                 Y    ONLINELOG/
                                                 Y    PARAMETERFILE/
                                                 Y    TEMPFILE/
                                                 N    spfilev11204.ora => +DATA/V11204/PARAMETERFILE/spfile.265.847477361

In asmcmd, you can just copy the spfile (the real name, not the “spfilev11204.ora” one, which is a kind of symbolic link):

ASMCMD> cp +DATA/V11204/PARAMETERFILE/spfile.265.847477361 /tmp/spfile.asm
copying +DATA/V11204/PARAMETERFILE/spfile.265.847477361 -> /tmp/spfile.asm

Okay, now back to using strings on the spfile. If I issue strings on the spfile, I get what looks like a complete parameter file:

$ strings spfile.asm
v11204.__db_cache_size=692060160
v11204.__java_pool_size=4194304
v11204.__large_pool_size=8388608
v11204.__oracle_base='/u01/app/oracle'#ORACLE_BASE set from environment
v11204.__pga_aggregate_target=524288000
v11204.__sga_target=943718400
v11204.__shared_io_pool_size=0
v11204.__shared_pool_size=226492416
v11204.__streams_pool_size=0
*.audit_file_dest='/u01/app/oracle/admin/v11204/adump'
*.audit_trail='db'
*.compatible='11.2.0.4.0'
*.control_files='+DATA/v11204/controlfile/current.25
6.847475315'
*.db_block_size=8192
*.db_create_file_dest='+DATA'
*.db_domain=''
*.db_file_multiblock_read_count=1024
*.db_name='v11204'
*.diagnostic_dest='/u01/app/oracle'
*.disk_asynch_io=TRUE
*.dispatchers='(PROTOCOL=TCP) (SERVICE=v11204XDB)'
*.open_cursors=300
*.pga_aggregate_target=524288000
*.processes=150
*.remote_login_passwordfile='EXCLUSIVE'
*.sga_target=943718400
*.undo_tablespace='UNDOTBS1'

But is it? If you take a detailed look, you’ll see something which is not alright:

*.compatible='11.2.0.4.0'
*.control_files='+DATA/v11204/controlfile/current.25
6.847475315'
*.db_block_size=8192

This is exactly what I meant: for some reason, the line with 'control_files' seems to have been broken into two pieces. I think I don't need to tell most readers of this blog that tracing this kind of oddity costs a lot of time, especially if you've got a big spfile. And most of the time you are playing around with this, there probably is something wrong already, and you simply don't have the time for this fooling around.

But why is this happening? Let's look at the spfile contents using the 'od' utility (octal dump):

$ strings spfile.asm | od -t d1 -a -A d
0000000  118   49   49   50   48   52   46   95   95  100   98   95   99   97   99  104
           v    1    1    2    0    4    .    _    _    d    b    _    c    a    c    h
0000016  101   95  115  105  122  101   61   54   57   50   48   54   48   49   54   48
           e    _    s    i    z    e    =    6    9    2    0    6    0    1    6    0
0000032   10  118   49   49   50   48   52   46   95   95  106   97  118   97   95  112
          nl    v    1    1    2    0    4    .    _    _    j    a    v    a    _    p
0000048  111  111  108   95  115  105  122  101   61   52   49   57   52   51   48   52
           o    o    l    _    s    i    z    e    =    4    1    9    4    3    0    4

This is the beginning of the spfile, to get an idea of what we are looking at.
The numbers on the left side (0000000, 0000016, etc.) are the position numbers in decimal. This shows there are 16 characters per line. The numbers on the line starting with the position are the ASCII values; the character representation of each ASCII value is beneath it. If you now read the line (looking at the character representation), you see 'v11204.__db_cach' (new line) 'e_size=692060160' (new line) and then ASCII value 10, which is represented with 'nl': newline.

Okay, now that we are used to reading this output, let's look at the problem section with the control_files line:

0000432   46   48   39   10   42   46   99  111  110  116  114  111  108   95  102  105
           .    0    '   nl    *    .    c    o    n    t    r    o    l    _    f    i
0000448  108  101  115   61   39   43   68   65   84   65   47  118   49   49   50   48
           l    e    s    =    '    +    D    A    T    A    /    v    1    1    2    0
0000464   52   47   99  111  110  116  114  111  108  102  105  108  101   47   99  117
           4    /    c    o    n    t    r    o    l    f    i    l    e    /    c    u
0000480  114  114  101  110  116   46   50   53   10   54   46   56   52   55   52   55
           r    r    e    n    t    .    2    5   nl    6    .    8    4    7    4    7
0000496   53   51   49   53   39   10   42   46  100   98   95   98  108  111   99  107
           5    3    1    5    '   nl    *    .    d    b    _    b    l    o    c    k

If we look closely, you can see '*.control_fi' on the first line, after the ASCII value 10, newline.
If we read on, it shows:
*.control_files='+DATA/v11204/controlfile/current.25 (nl) 6.847475315' (nl)
In other words, there is an additional newline. But the position (488) seems strange to me.

But when looking at how I generated this, I executed 'strings'. This means the output is filtered to readable characters. Could there be unreadable characters in a spfile? Let's look!

$ cat spfile.asm | od -t d1 -a -A d
0000000   67   34    0    0    1    0    0    0    0    0    0    0    0    0    1    4
           C    "  nul  nul  soh  nul  nul  nul  nul  nul  nul  nul  nul  nul  soh  eot
0000016   77   25    0    0    0    0    0    0    0    0    0    0    0    0    0    0
           M   em  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul
0000032    0    0    0    0    0    0    0    0    0    0    0    0    5    0    0    0
         nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  enq  nul  nul  nul
0000048    0    2    0    0    0    0    0    0    0    0    0    0    0    0    0    0
         nul  stx  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul
0000064    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0    0
         nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul  nul

Aha! So strings does filter a lot of stuff! Now let’s look at the control_files parameter again:

0000960   46   48   46   52   46   48   39   10   42   46   99  111  110  116  114  111
           .    0    .    4    .    0    '   nl    *    .    c    o    n    t    r    o
0000976  108   95  102  105  108  101  115   61   39   43   68   65   84   65   47  118
           l    _    f    i    l    e    s    =    '    +    D    A    T    A    /    v
0000992   49   49   50   48   52   47   99  111  110  116  114  111  108  102  105  108
           1    1    2    0    4    /    c    o    n    t    r    o    l    f    i    l
0001008  101   47   99  117  114  114  101  110  116   46   50   53    1   67    0    0
           e    /    c    u    r    r    e    n    t    .    2    5  soh    C  nul  nul
0001024   67   34    0    0    3    0    0    0    0    0    0    0    0    0    1    4
           C    "  nul  nul  etx  nul  nul  nul  nul  nul  nul  nul  nul  nul  soh  eot
0001040   34  121    0    0   54   46   56   52   55   52   55   53   51   49   53   39
           "    y  nul  nul    6    .    8    4    7    4    7    5    3    1    5    '

I think this is quite self explanatory to a lot of people. If not, let me help you: all is as we expect up to position 1018, at which there are a few non-readable characters, until position 1043. This means there are 25 character positions which contain something else, after which the parameter file contents continue.

But now look at the position: it's around 1024. My guess is that the spfile uses a block size of 1024 bytes (1KB). In order to check the consistency of the parameter file blocks, Oracle puts some extra (internal) data on the borders of each block so integrity can be checked, much like an Oracle database block.

So, there you have a reason not to use strings on the spfile, unless you like a game of find the random newlines in your new pfile.

How should you create a new pfile then?

The preferred method is using 'create pfile from spfile'. This requires logging on to the instance (nomount is enough).
An alternative is to look at the alert.log file: when an Oracle instance is started, the non-default parameters are printed in the alert.log. This is a very simple, yet useful method of reconstructing the parameter file.
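
A rough way to pull them out is something like this (a sketch, and not as reliable as 'create pfile from spfile': it prints everything between the 'System parameters with non-default values' banner and the 'PMON started' line that follows it, so check the result; the alert.log name and location depend on your diagnostic_dest and instance name):

$ awk '/System parameters with non-default values/,/PMON started/' alert_v11204.log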

Update: I've gotten an email from Bjoern Rost saying that my statement that the instance at least needs to be in the nomount phase is not true, because the 'create pfile from spfile' command can be used with the instance being down.

I decided to take this for an additional test. First of all, there are two methods which can be used for pfile/spfile manipulation via sqlplus (as far as I know): logging on with SYSDBA privileges, and starting sqlplus on the local node without logging on to any instance (/nolog).

a) instance is OPEN (which is the same as in the nomount and mount phases)

With SYSDBA privilege.

$ sqlplus / as sysdba 
...
SYS@v11204 AS SYSDBA> create pfile='/tmp/tt' from spfile;

File created.

With /nolog.

$ sqlplus /nolog
...
@ > create pfile='/tmp/tt' from spfile;
SP2-0640: Not connected

Upon giving this a little thought, it's kind of obvious that just saying 'spfile' can't work with /nolog: there are no settings, we are not connected to any instance. So let's try specifying a full path for both pfile and spfile:

$ sqlplus /nolog
...
@ > create pfile='/tmp/tt' from spfile='+DATA/v11204/spfilev11204.ora';
SP2-0640: Not connected

No. sqlplus /nolog can’t be used in my situation (Oracle 11.2.0.4, Linux X64 OL 6u5, ASM, instance open).

b) instance down

With SYSDBA privilege:

$ sqlplus / as sysdba
...
Connected to an idle instance.

SYS@v11204 AS SYSDBA> create pfile='/tmp/tt' from spfile='+DATA/v11204/spfilev11204.ora';

File created.

Aha! So we can manipulate the pfile and spfile with the instance being down when we log on as SYSDBA!

Let’s look at another case:

$ sqlplus / as sysdba
...
Connected to an idle instance.

SYS@v11204 AS SYSDBA> create pfile='/tmp/tt' from spfile;
create pfile='/tmp/tt' from spfile
*
ERROR at line 1:
ORA-01565: error in identifying file '?/dbs/spfile@.ora'
ORA-27037: unable to obtain file status
Linux-x86_64 Error: 2: No such file or directory
Additional information: 3

What is shown here is that when my instance is down, I can’t use ‘create pfile from spfile’ without a specification for the spfile. Apparently, it looks in the default location ($ORACLE_HOME/dbs) for the default spfile name (spfilev11204.ora in my case): ORA-01565: error in identifying file ‘?/dbs/spfile@.ora’. ‘?’ means $ORACLE_HOME and ‘@’ means the instance name. Let’s see if this works if we create an spfile in that location:

SYS@v11204 AS SYSDBA> create pfile='/tmp/tt' from spfile='+DATA/v11204/spfilev11204.ora';

File created.

SYS@v11204 AS SYSDBA> create spfile from pfile='/tmp/tt';

File created.

The first step was already done, but is redone for completeness’ sake. The second step creates an spfile in the default location from the pfile ‘/tmp/tt’. Now we can ask sqlplus to essentially do the same but in reverse: create ‘/tmp/tt’ from the spfile:

SYS@v11204 AS SYSDBA> create pfile='/tmp/tt' from spfile;

File created.

Yes! Now this works. The reason for this behaviour is that I explicitly do not want my spfile in the default location ($ORACLE_HOME/dbs), because that is on a local filesystem. This is not a problem for a single instance database without any shared storage, but it is impractical in the case of RAC, and for single instance databases on a cluster with shared storage (think Exadata here!), because having the spfile on shared disk means the instance can very easily be started on another node.
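
A common way to combine a non-default spfile location with the default lookup behaviour (a general pattern, not necessarily how every installation is set up) is to leave a one-line ‘pointer’ pfile in $ORACLE_HOME/dbs which refers to the real spfile; a sketch using the ASM path from the examples above:

$ cat $ORACLE_HOME/dbs/initv11204.ora                                # hypothetical pointer pfile
SPFILE='+DATA/v11204/spfilev11204.ora'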

Let’s try sqlplus /nolog again:

$ sqlplus /nolog
...
@ > create pfile='/tmp/tt' from spfile='+DATA/v11204/spfilev11204.ora';
SP2-0640: Not connected

Nope. That doesn’t work. I’ve seen blogposts indicating that /nolog can be used for pfile/spfile manipulation, but this didn’t work in my case. Bjoern was right that pfile/spfile manipulation can be done while the instance is down.

At the Accenture Enkitec Group we have a couple of Exadata racks for Proof of Concepts (PoC), performance validation, research and experimenting. This means the databases on the racks appear and vanish more often than they (should) do on an average customer Exadata rack (to be honest, most people use a fixed few existing databases rather than creating and removing a database for every test).

Nevertheless, we got into a situation where the /etc/oratab file was not in sync with the databases registered in the cluster registry. This situation can happen for a number of reasons. For example, if you clone a database (RMAN duplicate), you end up with a cloned database (I sincerely hope), but this database needs to be manually registered in the cluster registry. The same goes for creating a standby database (for which one of the most used methods is the clone procedure with a couple of changes).

However, the above reasons are quite obvious. But there is another reason which is far less obvious: bug 17172091 (Oracle restart may update oratab incorrectly after a node reboot) can also cause your oratab to get out of sync with the databases/instances in the cluster registry. Additional information: Oracle versions confirmed as affected are 11.2.0.3 and 11.2.0.2. The bug is reported to be fixed with Grid Infra PSU 11.2.0.3.9 and 11.2.0.4.2 (yes, I am aware of the inconsistency between affected versions and fixed versions; this is what the available Oracle bug information says; thanks to Tanel Põder for finding this bug).

In order to recreate the oratab, you need to go through the cluster registry information and compare it with your oratab. Especially if you’ve got a lot of databases, and/or single instance databases on different nodes, this can be quite some work. To relieve that work, I created a little shell script which parses the cluster registry database information, picks up the db_unique_name (which is what the first field in the oratab file actually is, thanks to Randy Johnson) and the oracle home path, and outputs this in “oratab format” (db_unique_name:oracle home path:N). Needless to say, this script just writes to STDOUT. If you want to use this information, redirect it to a file, or copy and paste it into oratab yourself.

# list all database resources that last ran on this node
for resource in $(crsctl status resource -w "((TYPE = ora.database.type) AND (LAST_SERVER = $(hostname -s)))" | grep ^NAME | sed 's/.*=//'); do
	# fetch the full attribute list of this database resource
	full_resource=$(crsctl status resource -w "((NAME = $resource) AND (LAST_SERVER = $(hostname -s)))" -f)
	db_name=$(echo "$full_resource" | grep ^DB_UNIQUE_NAME | awk -F= '{ print $2 }')
	ora_home=$(echo "$full_resource" | grep ^ORACLE_HOME= | awk -F= '{ print $2 }')
	# print in oratab format: db_unique_name:oracle_home:N
	printf "%s:%s:N\n" "$db_name" "$ora_home"
done
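
As an example of how this can be used (assuming the loop above is saved as gen_oratab.sh, which is just a name for this sketch), the generated lines can be compared against the current oratab:

$ ./gen_oratab.sh > /tmp/oratab.crs
$ diff <(sort /tmp/oratab.crs) <(grep -vE '^#|^$' /etc/oratab | sort)    # show entries missing from either side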

This post is aimed at people working with code, scripts and/or any other means of textual files. I try to give my point of view on revision control and git as revision control system in particular.

The first thing you should ask yourself is: why use revision control in the first place? I think that’s a good and fair question. A lot of people I talk to see revision control as something for developers in projects where multiple people work on the same code, to build history and provide a single point of truth. I think revision control in that situation is indeed needed (in fact, I think it is a necessity).

Let’s now look at the situation of a DBA. Most people I work with have built up their own bundle of scripts to avoid constantly redoing stuff they’ve worked out in the past, and/or scripts they’ve gotten or borrowed from other people (in fact, if you do not keep a copy of the excellent script bundles of Tanel Poder and Kerry Osborne, you’re either brilliant enough to do all of this from the top of your head, or stupid enough not to use or know them, with a fair chance of falling into the second category).

Probably most (if not all) people change scripts, create entirely new ones, experiment, etc. How many times were you searching for that specific version/modification you made *somewhere*, but forgot where it was? In my own situation, when doing research, experiments and investigations, I use a lot of virtual machines (with/without ASM, different operating system versions, different Oracle versions; there are huge differences between 11.2.0.1/2/3/4, and Oracle version 12), and I need my scripts there and want to conveniently move changes and newly created scripts among these machines. I don’t think I even have to go into the area of seeing the history of changes to a certain script or the repository as a whole: this is something you will use when using revision control, or sorely miss if your scripts are not in revision control.

Once you or your team is convinced you need revision control, the immediate second question which always pops up (in the situations I worked in; I am aware there are other revision control systems) is: subversion or git? If you look at the title of this blog you know where I end up, but it’s good to give this some thought. Subversion is a decent revision control system, with which I’ve worked with great pleasure in the past. The history I’ve read on subversion is that it was made to be an open source alternative to the CVS revision control system and to overcome some of its problems/limitations. Subversion is a revision control system that works in a client/server way: there is a central repository, and clients check out the source from it. This is no problem when client and server are on the same machine or in the local network, and even less of a problem if there’s only one user.

But what if there are multiple persons working with it? And these people are located at vast distances from each other? And some of them do not have internet access all the time? The problems that arise from that are lots of potential conflicts between concurrent versions, performance that can be very bad because you have to talk to the central repository, and the fact that you need a connection to the repository in the first place to commit your change. From what I’ve read about the git revision control system, these were some of the problems Linus Torvalds wanted to overcome with git.

A git repository always works locally, on the machine you are actually working on. You can link remote git repositories and push your changes to one or more of them, or the other way around, pull changes from a remote repository into your own repository. I guess most people immediately understand why this easily overcomes a lot of the problems that were the result of having a single central repository. You always work in your own repository.

I’ve found this (git versus subversion) to be yet another topic people easily turn into a ‘religious discussion’, which means the discussion isn’t about the actual properties and pros and cons of both revision control systems, but about personal preference and sometimes ego. Another thing which is closely related to this is that there is a learning curve if you need to start using git.

Installation
I work on Mac OSX. In order to get git, just install the Xcode package from the App Store (free). On Oracle Linux, yum install the git package. I guess on RedHat this works the same. On Windows, download the installer from http://msysgit.github.com/ and run it. (please mind the examples in this blog are on OSX, and applicable to linux)
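
As an example, on Oracle Linux (assuming you have root or sudo rights) installing and verifying git looks like this:

$ sudo yum install -y git
$ git --version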

Initial configuration
Once you’ve installed git, the next thing to do is set your credentials:

$ git config --global user.name "John Doe"
$ git config --global user.email johndoe@example.com

I’ve set some aliases (which I’ve found in articles on git), to shorten typing a bit:

$ git config --global alias.co checkout
$ git config --global alias.ci commit
$ git config --global alias.st status
$ git config --global alias.hist "log --pretty=format:'%h %ad | %s%d [%an]' --graph --date=short"
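
If you want to check that the aliases ended up in your global configuration (~/.gitconfig), you can list them, for example:

$ git config --global --get-regexp '^alias\.'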

Okay, at this point you are set, but no repository has been created yet.

Create a repository
The next obvious step is to create a repository:

$ git init

This creates a .git directory in the current working directory, which means you are now in the root directory of the newly created repository. Please mind you can initialise a repository in the root directory of an existing project; the repository itself will still start out empty, and the existing files show up as untracked until you add them.

One of the other ways to start using a git repository, which might have caught your eye a few times, is cloning ($ git clone git://github.com/username/reponame.git); this creates a local, revision-controlled copy of a remote git repository.

Let’s say I have a directory which contains a file ‘a’, and a subdirectory ‘tt’ which contains ‘b’ and ‘c’:

$ find .
.
./a
./tt
./tt/b
./tt/c

Now start a git repository in the root:

$ git init
Initialized empty Git repository in /Users/fritshoogland/t/.git/

Now let’s look at how our repository looks like:

$ git status
On branch master

Initial commit

Untracked files:
  (use "git add ..." to include in what will be committed)

	a
	tt/

nothing added to commit but untracked files present (use "git add" to track)

Here git shows us that nothing has been committed yet, but that it does see our test files, which are currently untracked. Let’s add all the files:

$ git add *

If we look at the status of the repository, we see the files are added, but not yet committed:

$ git status
On branch master

Initial commit

Changes to be committed:
  (use "git rm --cached <file>..." to unstage)

	new file:   a
	new file:   tt/b
	new file:   tt/c

And commit the additions:

$ git commit -m 'initial commit.'
[master (root-commit) 735ed81] initial commit.
 3 files changed, 3 insertions(+)
 create mode 100644 a
 create mode 100644 tt/b
 create mode 100644 tt/c

If you commit something, git wants you to add a remark with this commit. This can be done with the git command by adding ‘-m’ and a comment within quotation marks, or by omitting this, which makes git fire up an editor, in which you can type a comment.
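
Which editor git fires up when you omit ‘-m’ is configurable; as an example (vim is just my preference here, pick your own):

$ git config --global core.editor vim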

Changes
At this point we have our files committed to the repository, and the files and repository are completely in sync:

$ git status
On branch master
nothing to commit, working directory clean

Now let’s change something. My file ‘a’ has got one line in it, with a single ‘a’. I add a second line to the file ‘a’ reading ‘second line’:

$ cat a
a
second line

Now let’s ask git for the status:

$ git status
On branch master
Changes not staged for commit:
  (use "git add <file>..." to update what will be committed)
  (use "git checkout -- <file>..." to discard changes in working directory)

	modified:   a

no changes added to commit (use "git add" and/or "git commit -a")

Here we see that git knows about the modification in file a. The next thing we see is the line ‘Changes not staged for commit’. This is something which is different from subversion. Instead of committing changes directly, git follows a two-stage approach: a changed file needs to be staged for commit first, after which it can be committed. This allows you to group changed files into a commit, instead of committing everything at once.

Now let’s stage the change as indicated by the help provided with ‘git status’, using ‘git add’. Please mind the ‘add’ command is used both for making files version controlled and adding them to the stage list. I issued a subsequent git status to show the new status:

$ git add a
$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   a

We can look at the difference between the version in git and the changes we made by using diff --cached:

$ git diff --cached
diff --git a/a b/a
index 7898192..c4fbe32 100644
--- a/a
+++ b/a
@@ -1 +1,2 @@
 a
+second line

Now let’s commit the change, and look at the status:

$ git commit -m 'added line to a.'
[master 17ec0e8] added line to a.
 1 file changed, 1 insertion(+)
$ git status
On branch master
nothing to commit, working directory clean

Now that we have built a tiny bit of history, we can look at it using git log. This command can be used to look at the changes of the entire repository, or at the history of a specific file. I use the alias ‘hist’ created above:

$ git hist a
* c8ec93c 2014-05-25 | added line to a. (HEAD, master) [FritsHoogland]
* 739f11a 2014-05-25 | initial commit. [FritsHoogland]

One common thing you want to do is see what changes were made to a file. To see the changes between an earlier version and the current version in the repository (the current version is called ‘HEAD’, which means ‘the last commit in the current branch’), use git diff with the hash of the earlier commit (look at the above example of git hist):

$ git diff 739f11a a
diff --git a/a b/a
index 7898192..c4fbe32 100644
--- a/a
+++ b/a
@@ -1 +1,2 @@
 a
+second line

Of course you can also checkout that specific version (that’s the reason for having version control!):

$ git checkout 739f11a a

Now the file a is reverted to the version without the line ‘second line’ in it. Because this changes file a again, git considers it modified; and because checking out a specific version of a file also updates the staging area, the change is already staged:

$ git status
On branch master
Changes to be committed:
  (use "git reset HEAD <file>..." to unstage)

	modified:   a

So after checking out this version you can choose to commit it as a new version, or revert it back to its original version using git reset HEAD (as described in the output of the status command).
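
As a sketch, reverting it back would look like this:

$ git reset HEAD a                                                   # unstage the checked-out old version
$ git checkout -- a                                                  # restore the working copy to the version in HEAD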

Remote repositories
However, everything described up to now is version control done locally. How about collaboration with a team, like subversion offers with the subversion server? Well, this is where git is inherently different. Git works in a peer-to-peer fashion, instead of the client/server way of subversion.

You can link your repository to a remote git repository, and push changes made in your local repository to a remote repository, or pull changes made in a remote repository to your own local repository. Especially if you work with a team, this is how you can centralise the source code in a very organised way (you can add a web interface for example).

This is how I create a remote repository on my synology NAS:

$ ssh Username@nas.local                                             # log on to remote server
$ mkdir burn.git                                                     # create directory for repo
$ cd $_
$ git init --bare                                                    # create empty git repository
$ git update-server-info                                             # not sure if this is needed
$ exit                                                               # log off, back on the local machine
$ git remote add origin Username@nas.local:burn.git                  # in the local repo: add remote
$ git push -u origin master                                          # push local master to origin
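
From another machine (or for a colleague), getting and following that central copy is then a matter of (a sketch, using the same example address as above):

$ git clone Username@nas.local:burn.git                              # one-time: get a local copy of the repository
$ cd burn
$ git pull                                                           # later on: fetch and merge new commits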

One of the things I use tags for is to keep track of the current development version versus the public/client version of scripts. This command tags the last commit with “prod-1.0” (note that git tag names cannot contain spaces):

$ git tag "prod 1.0" HEAD

In order to use tags, you need to see which tags exist. This is very simple with the ‘git tag’ command. To look at the tags of a remote git repository, use:

$ git ls-remote --tags Username@nas.local:gdb_macros.git

If you want to get the whole repository for production usage, without the versioning, use the ‘git archive’ command (remove --remote and its argument to archive from a local repository). This version creates a gzipped tarball:

$ git archive master --remote Username@nas.local:gdb_macros.git --format=tar | gzip > gdb_macros.tgz

You can also archive a tag, instead of the latest version. This version creates the files in the current directory:

$ git archive "prod 1.0" --remote Username@nas.local:gdb_macros.git --format=tar | tar xf -

I hope this blog encouraged you to put your files in a (git) repository.
