Do you crash your database on machine shutdown?

Most DBA’s would say ‘NO’ without hesitation. I dare you to take another look.

What is this blog about?
This blog is about DBA’s who run databases on linux. Linux sysadmins are target audience too; this blog is about the stop/start system on linux, and to be more precise: about the specific implementation of the stop/start system on redhat and it’s derivatives (Oracle’s enterprise linux for example, but also centos)

What about it?
The redhat stop/start system looks and feels like the traditional unix system V stop/start system found on most unixes. Correct usage of the redhat stop/start system is in fact quite different from that.

The traditional unix system V stop/start uses an executable script in the init.d directory (often /etc/init.d or /etc/rc.d/init.d) and uses a symbolic link in the runlevel directories (rc1.d, rc2.d, rc3.d, etc.) from the script in init.d, but prefixing the name it with ‘S’ (for start) or ‘K’ (for kill, thus stop) and a two digit number to indicate the order for starting or stopping.

The redhat stop/start system works with an executable script in the init.d directory (/etc/rc.d/init.d). The correct way to enable stop/start is using the chkconfig command:
chkconfig --add yourscript
This command check if the script yourscript is present in /etc/rc.d/init.d, then reads a line looking like this in the stop/start script:
#chkconfig: 2345 80 05
This tells chkconfig that the script should run in runlevels 2,3,4 and 5 with order number 80 and stopped (killed) with order number 05.
To check if chkconfig did understand your script, use:
chkconfig --list
Further reading about chkconfig:
man chkconfig

Alright, but what about crashing the database?
A lesser known fact is that the redhat stop/start system uses a file with the same name as the stop/start script in /var/lock/subsys to determine if the script is already started (so it doesn’t get started if it’s already started in a previous runlevel) and needs to be created in the start part, but more importantly it determines if the script needs to execute the stop procedure by checking the existence of this file in /var/lock/subsys. As part of the stop procedure the file needs to be removed, so the stop procedure isn’t run again in another runlevel.

In many custom made scripts this file doesn’t get created in /var/lock/subsys in the start procedure, so the stop procedure actually is never executed, because the script is already stopped from the perspective of the stop/start system. This typically means that the oracle stop/start script does a perfect job starting the database, but actually never stops the database, which means that during a shutdown or a reboot, the database actually crashes.

Sadly, I see very much scripts forgetting this (even ones from respectable sources, like redhat; see: Redhat summit beginner’s guide to running oracle on RHEL, for example, there are much more). Check your stop/start script today!

  1. Actually, I got a phone call on the UKOUG monday of a DBA of one of holland’s largest telecom company’s, who did suffer an ASM instance restart as a side effect of this issue (the server went from runlevel 3 to 5 or something, and because of the file missing in /var/lock/subsys, it started again, because they coded a restart in their homegrown script)

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: