Repairing a Linux host with disk/filesystem issues

This is a writeup of some scenarios for disk issues that you could encounter when running Linux systems. The Linux systems I am talking about are CentOS/Red Hat/Oracle (EL) version 7 systems. This is just a writeup for myself to know how to deal with different scenarios; hopefully other people find it interesting too. I don’t believe there is any difference for the scenarios and resolutions below between running physical/bare metal, virtualised or in the cloud, provided you can get the access you need (like the BIOS POST console). The test configuration uses (legacy) MBR (meaning non-UEFI), grub2 as boot loader, LVM for meta devices (you don’t want to run without LVM!!) and XFS for the filesystems.

I included a way to introduce each corruption yourself, so you can actually rehearse these kinds of corruptions. These self-introduced corruptions are simply parts of the disk overwritten, and can be recovered from. In reality this might be different: if a disk is truly physically broken, there is no way you can make it come to life again with software, and the data on it is simply gone. Also, if the physical disk is not harmed but the filesystem is damaged, I think most of you understand that if the amount of damage is big enough, at a certain point it is beyond repair.

REALLY IMPORTANT NOTICE

Modern filesystems like XFS, EXT3/4 and NTFS are ‘logging filesystems’ (also called journaling filesystems). This means changes to metadata are first written into a log on disk called the ‘intent log’ (or journal), after which the true change is written and the corresponding entry in the intent log is flagged as done. This means that unclean shutdowns or crashes can be recovered from by replaying the intent log to make sure all transactions are fully written.

However…by default these filesystems do not log actual data changes. The protected writes are filesystem structure changes. This is especially important to realise if writes are not done via O_DIRECT, and thus end up in the page cache.

This means that data corruption can still occur, despite having a modern, logging filesystem. Things like databases therefore keep consistency mechanisms in play to validate data consistency.

I do not know enough about BTRFS and ZFS to know if these do things differently from the logging perspective. Not logging all IO for “normal” filesystems is a performance choice.
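To make the page cache point a bit more concrete, here is a minimal sketch using GNU dd on a scratch file. The file comes from mktemp and is purely illustrative. A plain write returns once the data sits in the page cache, while conv=fsync asks dd to flush the output file to disk before it exits. Neither variant makes a write atomic, so this is no substitute for the consistency mechanisms applications such as databases bring themselves.

```shell
#!/bin/sh
# Scratch file for the demonstration (hypothetical path from mktemp).
scratch=$(mktemp)

# Plain buffered write: dd returns once the data is in the page cache.
dd if=/dev/zero of="$scratch" bs=4k count=1 2>/dev/null

# Same write, but conv=fsync makes dd call fsync() on the output file
# before exiting, so the data itself has been handed to the disk.
dd if=/dev/zero of="$scratch" bs=4k count=1 conv=fsync 2>/dev/null

# oflag=direct would open the file with O_DIRECT and bypass the page cache.
# It is left commented out because some filesystems may reject O_DIRECT.
# dd if=/dev/zero of="$scratch" bs=4k count=1 oflag=direct

size=$(wc -c < "$scratch")
size=$((size))
echo "wrote $size bytes"
```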

Let’s start: know your system!

It really helps if you understand what your system looks like. This is what my system looks like:

# parted /dev/sda print
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sda: 68.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  1075MB  1074MB  primary  xfs          boot
 2      1075MB  68.7GB  67.6GB  primary               lvm

The first partition of /dev/sda is set to be the boot partition (flags: boot), and is an XFS filesystem partition. The second partition is an LVM physical volume. ‘lsblk’ shows a bit more about what the partitions are used for, and reveals any other block devices.

# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0   64G  0 disk
├─sda1        8:1    0    1G  0 part /boot
└─sda2        8:2    0   63G  0 part
  ├─ol-root 249:0    0   41G  0 lvm  /
  ├─ol-swap 249:1    0    2G  0 lvm  [SWAP]
  └─ol-home 249:2    0   20G  0 lvm  /home
sr0          11:0    1  540M  0 rom

This shows that /dev/sda1 is mounted as /boot on the linux installation, and the sda2 LVM physical volume contains three logical volumes named root, swap and home, which are mounted on /, used as swap and mounted on /home. There also is a ‘rom’ block device, which is the (oracle) linux boot image. This image can be downloaded from Oracle’s software delivery cloud, is 566MB in size, and contains the rescue boot environment that is used in this blogpost. Any serious sysadmin should have such an image on hand to perform emergency recovery if needed.
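Because several of the recoveries below depend on knowing which device holds what, it can help to record the layout while the system is still healthy. The following is only a sketch: the function name save_layout and the output directory are my own choices, /dev/sda is the disk from the listing above, and the result should of course be kept somewhere other than the host itself.

```shell
#!/bin/sh
# Collect a description of the disk layout into a directory.
# save_layout and the directory name are hypothetical.
save_layout() {
    outdir=$1
    mkdir -p "$outdir" || return 1
    # lsblk -f adds filesystem type, label and UUID columns.
    lsblk -f > "$outdir/lsblk.txt" 2>&1 || true
    # fstab shows which UUID or device belongs to which mount point.
    cp /etc/fstab "$outdir/fstab" 2>/dev/null || true
    # parted and the LVM tools need root; skip them otherwise.
    if [ "$(id -u)" -eq 0 ]; then
        parted -s /dev/sda print > "$outdir/parted.txt" 2>&1 || true
        lvs > "$outdir/lvs.txt" 2>&1 || true
    fi
    echo "layout saved in $outdir"
}

layout_dir=$(mktemp -d)
save_layout "$layout_dir"
```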

Normally, the BIOS tries the local disks first, before other bootable devices such as a CD/DVD disk image. This means that I can leave the linux boot image connected: the system will normally boot from its hard disks and essentially not touch the DVD image. If for some reason the BIOS can’t find a bootable MBR on the local disks, it will look for one on the other devices in the order set in the BIOS, find the linux boot image and boot from that. On a real production system you don’t leave such an image connected; normally it’s attached only when there is a need.

If you want or must use the linux boot image and not the local installation, you have to interrupt the BIOS POST and change the boot order. I am using virtualbox for most of my tests; the way to do that there is: press F12 (Mac: fn-F12), select CD-ROM as boot device and press enter.

To use the rescue mode on the linux boot image, boot it, then select “troubleshooting” (down arrow, enter), and “rescue a Oracle linux system” (down arrow, enter). This will boot linux from the boot image, and run anaconda. Anaconda gives you the following options:

1. Continue

This will scan your available disks, and mount the filesystems it could find under /mnt/sysimage in read-write mode. Not all filesystem issues will allow mounting read-write.

2. Read-only mount

This will scan your available disks, and mount the filesystems it could find under /mnt/sysimage in read-only mode. Obviously, not all issues allow read-only mounting either.

3. Skip to shell

This will do no scanning of disks, so if you want to prevent any access to the local disks, this is the option to choose.

4. Quit (reboot)

This is a safe reboot (stopping all the started units). This is the same that would happen if the shell is exited after choosing option 1, 2 or 3.

Let’s look at a couple of different scenarios:

Corrupted MBR

On an MBR linux system, the BIOS scans the disks it can see in the order that is set in the BIOS, and boots from the first usable MBR found. This will normally be your linux installation on the local disks. If you have a corrupted or otherwise unbootable MBR on your local disks, and the linux boot image attached, it will boot from the linux boot image, unless it is set to explicitly not boot that device.

If no usable MBR is found, it will tell you on the console. On virtualbox, it shows:

FATAL: Could not read from the boot medium! System halted.

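If just the MBR is damaged/corrupted, it can be recovered. If you want to rehearse recovering from a corrupted MBR, overwrite the bootstrap code in the MBR in the following way. The bootstrap code occupies the first 446 bytes of the first sector; the partition table starts right after it, so writing more than 446 bytes damages the first partition entry. Please mind this is machine dependent; for the specifics of your system, refer to the overview above made with parted and lsblk:

# dd if=/dev/zero of=/dev/sda bs=446 count=1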

If you detach any other bootable devices, and reboot your system after writing over the MBR, it should not find anything to boot from, and your system will give an error like seen above, indicating it’s unable to boot.
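The MBR sector is 512 bytes: 446 bytes of bootstrap code, a 64-byte partition table and a 2-byte boot signature. If you prefer to see the effect of such an overwrite before trying it on a disk, the sketch below does the same thing on a scratch file (a hypothetical mktemp file, not a real device). The sector is first filled with 0xff so the zeroed part is easy to tell apart.

```shell
#!/bin/sh
# Rehearse the MBR overwrite on a scratch image file instead of a real disk.
img=$(mktemp)

# Fill one 512-byte sector with 0xff so zeroed bytes are easy to spot.
dd if=/dev/zero bs=512 count=1 2>/dev/null | LC_ALL=C tr '\0' '\377' > "$img"

# Zero only the 446-byte bootstrap area; conv=notrunc keeps the rest of
# the file, just like a block device keeps the rest of the disk.
dd if=/dev/zero of="$img" bs=446 count=1 conv=notrunc 2>/dev/null

# Count the bytes that are still non-zero:
# partition table (64 bytes) + boot signature (2 bytes).
remaining=$(LC_ALL=C tr -d '\0' < "$img" | wc -c)
remaining=$((remaining))
echo "bytes left intact: $remaining"
```

This prints 66 bytes left intact, which is the partition table plus the signature.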

In order to restore the MBR for grub2, perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘1’/continue, and enter to go to the shell.
– Change the root to your local disks: ‘chroot /mnt/sysimage’.
– Next thing is to understand your system. Based on the parted output above, we know the boot flag is set on the first partition of /dev/sda, so /dev/sda is the disk that holds the MBR. This means we have to enter the following in the console:

grub2-install /dev/sda

If your disk is healthy enough to hold the information, and not really broken, this will report:

Installing for i386-pc platform.
Installation finished. No error reported.

– Now the MBR is written, and you can reboot your system by exiting the chroot jail and shell: ‘exit’ and ‘exit’ to boot your system from the local disks again.
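Before rebooting, a quick look at the first sector can confirm that something was written. The sketch below is an assumption-laden helper of my own (mbr_status is not an existing tool): it counts the non-zero bytes in the 446-byte bootstrap area and prints the two signature bytes, which should read 55aa on a bootable MBR. It is demonstrated on a scratch file; on the rescued system you would pass /dev/sda instead, and expect a non-zero bootstrap count after grub2-install.

```shell
#!/bin/sh
# Print a short status of the first sector (the MBR) of a disk or image file.
# mbr_status is a hypothetical helper name.
mbr_status() {
    # Count non-zero bytes in the 446-byte bootstrap area.
    code=$(dd if="$1" bs=446 count=1 2>/dev/null | LC_ALL=C tr -d '\0' | wc -c)
    # Read the two signature bytes at offset 510 as hex.
    sig=$(dd if="$1" bs=1 skip=510 count=2 2>/dev/null | od -An -tx1 | tr -d ' \n')
    echo "bootstrap bytes in use: $((code)), signature: $sig"
}

# Demonstration on a scratch file: an emptied MBR that still has its signature.
img=$(mktemp)
dd if=/dev/zero of="$img" bs=512 count=1 2>/dev/null
# Octal 125 252 is hex 55 aa, written at offset 510.
printf '\125\252' | dd of="$img" bs=1 seek=510 conv=notrunc 2>/dev/null

status=$(mbr_status "$img")
echo "$status"
```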

Corrupted /boot filesystem

If the /boot filesystem, which holds the linux kernel and grub settings, is corrupted, the system will boot, because that is handled by the MBR, but then very quickly tell you it doesn’t recognise the filesystem:

.
error: unknown filesystem.
Entering rescue mode...
grub rescue>

A corrupted /boot filesystem can be recovered from in most cases, obviously depending on what and how much is corrupted in the filesystem, and the underlying device must be able to function after the corruption. This can be rehearsed by overwriting part of the filesystem. In order to be absolutely sure the overwritten data on the disk is not changed by anything else, boot the linux boot image in rescue mode, choose option ‘3’, and then overwrite the first 1 KiB of the filesystem in /dev/sda1 (a ‘skip=’ option only skips blocks of the input, /dev/zero, and does not change where dd writes, so it is left out here):

# dd if=/dev/zero of=/dev/sda1 bs=1k count=1

In order to recover /boot, perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘1’/continue, and enter to go to the shell. If you are rehearsing: the dd command overwrote part of the filesystem header, so the filesystem will not be recognised by lsblk. If you take a look in the system’s fstab (/mnt/sysimage/etc/fstab), you will see /boot is mounted by UUID, but the UUID can’t be read from the corrupted filesystem. The root and home filesystems are found in LVM, and mounted. This means you have to know /dev/sda1 is /boot.
– Repair the filesystem:

# xfs_repair /dev/sda1

Depending on the corruption, this might or might not succeed. If you are rehearsing this using the above dd command, you will see a lot of dots being printed searching for a superblock, which it will find and then recover from.
– Exit the shell to reboot the system: ‘exit’.
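The reason lsblk no longer recognises the filesystem after the rehearsal is that an XFS filesystem starts with its primary superblock, whose first four bytes are the magic string ‘XFSB’, and the dd command wrote zeros over exactly that spot. The sketch below shows this on a scratch file with a faked first sector; has_xfs_magic is a hypothetical helper, and on a real system you would point it at /dev/sda1.

```shell
#!/bin/sh
# Succeeds when $1 starts with the XFS superblock magic "XFSB".
# has_xfs_magic is a hypothetical helper name.
has_xfs_magic() {
    magic=$(dd if="$1" bs=4 count=1 2>/dev/null | tr -d '\0')
    [ "$magic" = "XFSB" ]
}

img=$(mktemp)
# Fake the first sector of a filesystem: the magic followed by zero padding.
printf 'XFSB' > "$img"
dd if=/dev/zero bs=508 count=1 2>/dev/null >> "$img"

has_xfs_magic "$img" && before=present || before=gone

# The rehearsal overwrite: zero the first 1 KiB.
dd if=/dev/zero of="$img" bs=1k count=1 conv=notrunc 2>/dev/null

has_xfs_magic "$img" && after=present || after=gone
echo "XFS magic before: $before, after: $after"
```

xfs_repair gets around the missing primary superblock by scanning the device for one of the secondary superblocks, which is the search that prints the dots.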

Corrupted root filesystem in LVM

When the root filesystem is corrupted, the system will boot as usual, show the different kernels in the grub dialogue, and then when the choice times out, boot the pre-selected kernel and try to boot linux. After the kernel is loaded, the root filesystem will be mounted, and it will fail. This generates the following message:

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt to a USB stick or /boot after mounting them and attach it to a bug report.

In order to rehearse recovering a corrupted root filesystem, boot the linux boot image in rescue mode, and choose option ‘3’. You will find that the volume group is not active; it needs to be activated using:

# vgchange -ay
 3 logical volume(s) in volume group "ol" now active

Then overwrite the filesystem header for the logical volume ‘root’ in the volume group ‘ol’:

# dd if=/dev/zero of=/dev/ol/root bs=1k count=1

A corrupted root filesystem can be recovered from in most cases, obviously depending on what and how much is corrupted in the filesystem, and the underlying device must be able to function after the corruption. In order to recover perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘3’.
– Activate LVM:

# vgchange -ay

– Repair the filesystem in the logical volume:

# xfs_repair /dev/ol/root

– Exit the shell to reboot the system: ‘exit’.
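The steps above can be sketched as a small script. This is an illustration and not a tested recovery tool: run and repair_lv are hypothetical helper names, and by default the script only prints the commands (set DRY_RUN=0 in the rescue shell to really execute them). It adds one step to the list above: xfs_repair -n, the no-modify mode, which only reports what would be repaired and can be a reassuring first look before the real repair.

```shell
#!/bin/sh
# DRY_RUN=1 (the default here) only prints the commands;
# DRY_RUN=0 executes them. run and repair_lv are hypothetical helpers.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

repair_lv() {
    lv=$1
    # Activate all volume groups so the logical volume device exists.
    run vgchange -ay || return 1
    # -n: no-modify mode, only report what would be repaired.
    run xfs_repair -n "$lv"
    # The real repair; the filesystem must not be mounted.
    run xfs_repair "$lv"
}

plan=$(repair_lv /dev/ol/root)
echo "$plan"
```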

Corrupted non root filesystem in LVM

On my system, the only non root filesystem in LVM is /home. When I corrupted this filesystem, the linux system came up without any complaint on the console. To be honest, this was kind of surprising to me.

I needed to dig around in the /var/log/messages file to find an error message:

Oct  2 10:57:11 localhost lvm: 3 logical volume(s) in volume group "ol" now active
Oct  2 10:57:11 localhost systemd: Started LVM2 PV scan on device 8:2.
Oct  2 10:57:11 localhost mount: mount: wrong fs type, bad option, bad superblock on /dev/mapper/ol-home,
Oct  2 10:57:11 localhost mount: missing codepage or helper program, or other error
Oct  2 10:57:11 localhost mount: In some cases useful info is found in syslog - try
Oct  2 10:57:11 localhost mount: dmesg | tail or so.
Oct  2 10:57:11 localhost systemd: Found device /dev/mapper/ol-home.
Oct  2 10:57:11 localhost systemd: Mounting /home...
Oct  2 10:57:11 localhost systemd: Starting LVM2 vgchange on device 8:2...

Based on what I see, I interpret this as systemd activating LVM, and then mounting the filesystem, which gives an error. What I don’t understand is what systemd is doing when it says it found the logical volume device and mounts it.
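Since nothing shows up on the console, searching the log for the mount error text is the quickest way to spot this. A minimal sketch, assuming the messages look like the excerpt above: find_mount_errors is a hypothetical helper, and the sample file only holds a few lines copied from that excerpt so the example runs anywhere. On a live system you would pass /var/log/messages.

```shell
#!/bin/sh
# Print mount failures as logged by the mount command.
# find_mount_errors is a hypothetical helper name.
find_mount_errors() {
    grep -E 'mount: .*(wrong fs type|bad superblock)' "$1"
}

# Sample lines copied from the log excerpt above.
sample=$(mktemp)
cat > "$sample" <<'EOF'
Oct  2 10:57:11 localhost systemd: Started LVM2 PV scan on device 8:2.
Oct  2 10:57:11 localhost mount: mount: wrong fs type, bad option, bad superblock on /dev/mapper/ol-home,
Oct  2 10:57:11 localhost systemd: Mounting /home...
EOF

errors=$(find_mount_errors "$sample")
echo "$errors"
```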

This can be rehearsed by booting into rescue mode, choosing option ‘3’, activating LVM with ‘vgchange -ay’, and then overwriting the filesystem header in the logical volume: ‘dd if=/dev/zero of=/dev/ol/home bs=1k count=1’.

A corrupted non root filesystem in LVM can be recovered from in most cases. There is no need to use the linux boot image for rescue mode. A linux system will come up when the root filesystem is valid and it can load enough from it to boot linux. The following needs to be done:
– Start a shell as root.
– Repair the filesystem in the logical volume: ‘xfs_repair /dev/ol/home’.
– Mount all filesystems: ‘mount -a’. This should mount the just repaired filesystem.

Corrupted swap in LVM

Just like the non root filesystem corruption mentioned above, the inability to activate a swap device is logged in the /var/log/messages file, but there is no explicit message on the console. Outside of logging the inability to activate the swap device, the system will come up happily, without swap if the only swap device is corrupted. These are the messages I could find in the /var/log/messages file:

Oct  2 11:32:39 localhost systemd: Found device /dev/mapper/ol-swap.
Oct  2 11:32:39 localhost systemd: Activating swap /dev/mapper/ol-swap...
Oct  2 11:32:39 localhost kernel: alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
Oct  2 11:32:39 localhost kernel: XFS (sda1): Mounting V4 Filesystem
Oct  2 11:32:39 localhost swapon: swapon: /dev/mapper/ol-swap: swapon failed: Invalid argument
Oct  2 11:32:39 localhost systemd: dev-mapper-ol\x2dswap.swap swap process exited, code=exited status=255
Oct  2 11:32:39 localhost systemd: Failed to activate swap /dev/mapper/ol-swap.
Oct  2 11:32:39 localhost systemd: Dependency failed for Swap.
Oct  2 11:32:39 localhost systemd: Job swap.target/start failed with result 'dependency'.
Oct  2 11:32:39 localhost systemd: Unit dev-mapper-ol\x2dswap.swap entered failed state.
Oct  2 11:32:39 localhost kernel: Unable to handle swap header version 0

It seems systemd understands the swap device is not usable, whereas previously we saw systemd didn’t detect filesystem issues on an LVM device.

This can be rehearsed by booting into rescue mode, choosing option ‘3’, activating LVM with ‘vgchange -ay’, and then overwriting the swap header: ‘dd if=/dev/zero of=/dev/ol/swap bs=1k count=2’.

A corrupted swap device in LVM can be recovered from if the underlying device is able to function.
– Start a shell as root.
– Format the swap device again, in my case the swap device is /dev/ol/swap: ‘mkswap /dev/ol/swap’.
– Activate swap: ‘swapon -a’.
– Validate swap has been activated: ‘swapon -s’.
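If you want to check this from a script rather than by eye, the kernel’s list of active swap devices can be read from /proc/swaps. The sketch below is a hypothetical helper (swap_is_active is my own name). It resolves the device path first, on the assumption that a symlink such as /dev/ol/swap shows up in that list under its resolved name. The demonstration uses a fake device file and a fake table with placeholder values so it can run without touching real swap.

```shell
#!/bin/sh
# Succeeds when device $1 is listed in the swaps table $2 (default /proc/swaps).
# swap_is_active is a hypothetical helper name.
swap_is_active() {
    dev=$(readlink -f "$1") || return 1
    # Skip the header line, then compare the first column with the device.
    awk -v d="$dev" 'NR > 1 && $1 == d { found = 1 } END { exit !found }' \
        "${2:-/proc/swaps}"
}

# Demonstration with a fake device and a fake table (placeholder values).
fakedev=$(mktemp)
table=$(mktemp)
printf 'Filename\tType\tSize\tUsed\tPriority\n' > "$table"

swap_is_active "$fakedev" "$table" && s1=active || s1=inactive

# Add a row for the fake device, as would appear after 'swapon -a'.
printf '%s\tpartition\t2097148\t0\t-1\n' "$(readlink -f "$fakedev")" >> "$table"

swap_is_active "$fakedev" "$table" && s2=active || s2=inactive
echo "before: $s1, after: $s2"
```

On the repaired system the call would simply be ‘swap_is_active /dev/ol/swap’.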
