Outage

(Warning: no Oracle-related content!)

Some of you may have noticed the outage I had today. Due to an inconsistency on the root filesystem, the filesystem was remounted read-only. This caused all kinds of problems: log messages could not be written, mail could not flow through the server, statistics could not be written, webmail didn’t work, the database (MySQL) could not work, etc.
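A quick way to confirm this state, as a minimal sketch: on Linux, the fourth field of each line in /proc/mounts holds the mount options, and it starts with either ro or rw. The function name root_mount_mode and its optional file argument are my own, added only for illustration.

```shell
#!/bin/sh
# Print "ro" or "rw" for the root filesystem, based on a mounts table.
# The optional argument (a file in /proc/mounts format) is an illustrative
# addition so the function can be tried on sample data; by default it
# reads /proc/mounts.
root_mount_mode() {
    mounts_file="${1:-/proc/mounts}"
    # Fields: device, mount point, fs type, options, dump, pass.
    # The last entry for "/" wins, since later mounts shadow earlier ones.
    awk '$2 == "/" { split($4, opts, ","); mode = opts[1] }
         END { if (mode != "") print mode }' "$mounts_file"
}
```

Calling root_mount_mode with no argument on a system in the state described above should print ro.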

By the way, for the few people who do not read server headers (with nmap, for example) or, less geeky, look at Netcraft (upper left part: “What’s that site running?”): I run Linux. Ubuntu, to be exact.

Because it was the root filesystem (/), the inconsistency could not be fixed online. It is sheer magic that the system stayed up and was (somewhat) able to do most of what was asked of it.

So that needed a reboot. I hate that word. I hate reboot. If you must reboot frequently, there is something seriously wrong with the operating system. The reboot got stuck during init; Linux found an inconsistency that could only be fixed by running fsck manually. To start the manual fsck, init asked for the root password.

That’s a pity! Ubuntu has no root password! This may sound strange to some of you, but it is quite normal: instead of having a root account with a password, the only way to get root access is to be a user who is a member of the admin group and use ‘sudo’. Bummer.

But this didn’t get me anywhere; I still had a system that did not work. What to do?
To be honest, I just googled for ‘linux rescue cd’ and found the PLD Rescue CD.

This CD booted up nicely, gave me a CUI (character user interface), and had just the tools I needed.

LVM2 rescue howto with the PLD Rescue CD
1. Boot the CD
2. Load the device mapper kernel module:
# modprobe dm-mod

This must be done; otherwise you get errors like:
/proc/misc: No entry for device-mapper found
Is device-mapper driver missing from kernel?
Failure to communicate with kernel device-mapper driver.
Incompatible libdevmapper 1.01.00-ioctl (2005-01-17)(compat) and kernel driver

3. Scan the system for volume groups:
# vgscan
(yes, it’s really that simple; vgscan scans all attached devices for LVM2)

4. Activate the volume groups found:
# vgchange -a y

So! That’s it! Now I can run fsck on the logical volume, repair the inconsistencies and get running again. It did work; proof is this blog 🙂
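For reference, the steps above plus the final fsck could be collected into a small script like the sketch below. The logical volume path /dev/vg0/root is a placeholder, not the name from my system (lvscan lists the real ones after activation), and the DRY_RUN switch and the run helper are my own additions so the sequence can be printed before anything is executed.

```shell
#!/bin/sh
# Sketch of the rescue sequence, meant to be run as root from the rescue CD.
# LV is a placeholder; replace it with a device path reported by lvscan.
LV="${LV:-/dev/vg0/root}"
# DRY_RUN=1 (the default here) only prints the commands; DRY_RUN=0 runs them.
DRY_RUN="${DRY_RUN:-1}"

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "# $*"
    else
        "$@"
    fi
}

rescue_lvm() {
    run modprobe dm-mod || return 1   # load the device mapper kernel module
    run vgscan || return 1            # scan attached devices for volume groups
    run vgchange -a y || return 1     # activate the volume groups that were found
    run lvscan || return 1            # list logical volumes and their device paths
    # fsck also exits non-zero when it repaired something (1 = errors
    # corrected), so its status is passed through unchanged.
    run fsck "$LV"
}

rescue_lvm
```

With the defaults this only prints the five commands, each prefixed with "# ". Setting DRY_RUN=0 and LV to the real logical volume runs them, and the volume must not be mounted while fsck works on it.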
