Archive

Linux

The primary source of information regarding any change or issue on a linux system is the /var/log/messages file. I am often annoyed when a linux system is setup in such a way that certain messages are written to syslog with a high frequency swamping the messages file with information that is not important. The reason for my annoyance is that this makes it very hard to actually spot important information because you have to skip through a lot of lines before you find the important information, especially if you do not know for sure if there a message in the first place.

Please mind this blogpost is created on a Centos 7 server which uses rsyslog.

There are a couple of ways to manage this. The standard syslog way of managing this is the following, which can be found in /etc/rsyslog.conf:

#### RULES ####

# Log all kernel messages to the console.
# Logging much else clutters up the screen.
#kern.*                                                 /dev/console

# Log anything (except mail) of level info or higher.
# Don't log private authentication messages!
*.info;mail.none;authpriv.none;cron.none                /var/log/messages

# The authpriv file has restricted access.
authpriv.*                                              /var/log/secure

# Log all the mail messages in one place.
mail.*                                                  -/var/log/maillog


# Log cron stuff
cron.*                                                  /var/log/cron

# Everybody gets emergency messages
*.emerg                                                 :omusrmsg:*

# Save news errors of level crit and higher in a special file.
uucp,news.crit                                          /var/log/spooler

# Save boot messages also to boot.log
local7.*                                                /var/log/boot.log

What this shows is that certain principal important tasks have their own ‘facility’, which in this example are ‘mail’, ‘cron’, ‘authpriv’, etc. shown above, which are put in their own file.

But how about processes that do not have their own facility, and just write to syslog? One example of these this is dhclient, which produces messages like these:

Jun 30 03:28:32 ip-172-31-12-40 dhclient[19604]: DHCPREQUEST on eth0 to 172.31.0.1 port 67 (xid=0x5dfd88c4)
Jun 30 03:28:32 ip-172-31-12-40 dhclient[19604]: DHCPACK from 172.31.0.1 (xid=0x5dfd88c4)
Jun 30 03:28:35 ip-172-31-12-40 dhclient[19604]: bound to 172.31.12.40 -- renewal in 1726 seconds.
Jun 30 03:57:21 ip-172-31-12-40 dhclient[19604]: DHCPREQUEST on eth0 to 172.31.0.1 port 67 (xid=0x5dfd88c4)
Jun 30 03:57:21 ip-172-31-12-40 dhclient[19604]: DHCPACK from 172.31.0.1 (xid=0x5dfd88c4)
Jun 30 03:57:23 ip-172-31-12-40 dhclient[19604]: bound to 172.31.12.40 -- renewal in 1798 seconds.

Which it does every 5 minutes. This is actually truly annoying…

Luckily, there is a solution: rsyslog has the option to filter based on more properties than the logging facility a process is using. This is done using a script in /etc/rsyslog.d/.

On my server, the majority of the messages are from the dhclient and systemd daemons, and both the messages seem to be informal. In order not to miss anything, I still want that information to be logged, but not in the /var/log/messages file.

This can be actually quite simply be accomplished using the following two scripts in /etc/rsyslog.d/:

dhclient.conf:

if $programname == 'dhclient' then /var/log/dhclient.log
&stop

systemd.conf

if $programname = 'systemd' then /var/log/systemd.log
&stop

Once you created these scripts, you need to make rsyslogd read this new configuration. I thought killall -HUP rsyslogd would accomplish this, but outside of a message in the /var/log/messages file saying it got the HUP signal, it doesn’t execute a new task.

However, executing:

systemctl stop rsyslog.service
systemctl start rsyslog.service

Does make rsyslog read the new configuration, and then both dhclient and systemd log to their own files and do not write to the messages file anymore!

There is one last thing that needs to be done: make sure the newly defined logfiles are cleaned just like all the other files in /var/log. Otherwise these files will endlessly grow, eventually occupying all the space in the filesystem where /var/log is part of.

This too is really easy, the newly defined logfiles can be added to the list of syslog files for logrotate, which is defined in /etc/logrotate.d/syslog:

/var/log/cron
/var/log/maillog
/var/log/messages
/var/log/secure
/var/log/spooler
/var/log/dhclient.log
/var/log/systemd.log
{
    missingok
    sharedscripts
    postrotate
	/bin/kill -HUP `cat /var/run/syslogd.pid 2> /dev/null` 2> /dev/null || true
    endscript
}

As you can see, I simply added /var/log/dhclient.log and /var/log/systemd.log to the list.

Please mind that the filter for dhclient and systemd is the executable name, so even if the severity of the logging message is high, a message from these daemons will still go to the file it is configured to log to.

Advertisements

This blogpost is about the reason and solving getting the following message, or messages alike these when logging i to a linux box using ssh:

-bash: warning: setlocale: LC_CTYPE: cannot change locale (UTF-8): No such file or directory

However, this is a warning. Please mind such an issue might be come up in another way, which can be more disrupting; at least in the past I had issues running perl for the same issue:

[root@dmcel01 ~]# /usr/local/bin/ipconf -verify
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
LANGUAGE = "en_US.UTF-8",
LC_ALL = "UTF-8",
LC_CTYPE = "UTF-8",
LANG = "en_US.UTF-8"
are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").

This is a utility that comes with exadata, and I needed it during patching, but it didn’t execute because of it. This was more than 4 years ago, and I actually created a SR for it. But because I was patching during a downtime window, I had to quickly solve it. I managed to find it, and reported it back to Oracle, which was turned into a MOS note (however, as Oracle seems to have a habit of) without proper attribution to me, and a bit wrong. You can see the note here (MOS account required) Running /Usr/Local/Bin/Ipconf -Verify Results In ‘Perl: Warning: Setting Locale Failed.’ (Doc ID 1393517.1)

Anyway, today I ran into this again. The issue is the CLIENT (the machine from which an ssh session is created) explicitly requested locale settings on the machine it connected to. This is done in the client’s global ssh settings in a file called ssh_config.

I should add I use MacOS/OSX, which is derived from BSD, which is unix/linux like.

The file ssh_config was found in /etc/ssh (/etc/ssh/ssh_config), but at least on 10.14.3 (Mojave) this has changed to /private/etc/ssh/ssh_config. The MOS note describes /etc/ssh_config, but I don’t think ssh_config is present in /etc directly in any OS by default, and probably mostly found in /etc/ssh.

In my case, after updating to OSX Mojave, the ssh_config file is overwritten, giving me back the locale issue. This is how the original default host settings look like:

Host *
SendEnv LANG LC_*

To solve the locale issues (again, for me), change it very simply to:

Host *
SendEnv LANG

This post is about how to manage grub2 in an easy way.

grub1

In the past, which is before linux EL7, the boot loader was grub, the grand unified bootloader (version 1). Things were very simple; if you installed another kernel (using rpm) it would add an entry to grub’s configuration in /boot/grub/menu.lst. If you wanted to change grub to boot that newly installed kernel by default you edit /boot/grub/menu.lst and set ‘default’ to the number, counting from zero, of the newly installed kernel, in the order of the kernels listed. If you wanted a certain option set for booting the kernel, you added it to the kernel line.

grub2

Then came grub2. Reading up on the grub2 homepage it turns out that grub 1 apparently was old code, and it was clunky, hard to maintain and extend with new features. Other reasons that probably contributed to grub2 acceptance where UEFI support as well as limited support for other architectures than x86 and x86_64.

grub2: the mechanics

The heart of the grub2 configuration is by default in /boot/grub2/grub/grub.cfg. The start of the file looks like bash shell script. A little further in the file, the installed kernels and arguments are found:

### BEGIN /etc/grub.d/10_linux ###
menuentry 'Oracle Linux Server (4.1.12-124.18.6.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $me
nuentry_id_option 'gnulinux-4.1.12-124.18.6.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
        load_video
        set gfxpayload=keep
        insmod gzio
        insmod part_msdos
        insmod xfs
        set root='hd0,msdos1'
        if [ x$feature_platform_search_hint = xy ]; then
          search --no-floppy --fs-uuid --set=root --hint-bios=hd0,msdos1 --hint-efi=hd0,msdos1 --hint-baremetal=ahci0,msdos1 --hint='hd0,msdos1'  4fc4063c-861b-4b07-8c28-e847a53
5c6cb
        else
          search --no-floppy --fs-uuid --set=root 4fc4063c-861b-4b07-8c28-e847a535c6cb
        fi
        linux16 /vmlinuz-4.1.12-124.18.6.el7uek.x86_64 root=/dev/mapper/ol-root ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet nu
ma=off transparent_hugepage=never
        initrd16 /initramfs-4.1.12-124.18.6.el7uek.x86_64.img
}
menuentry 'Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.1.12-112.16.4.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {

A knee-jerk reaction would be to see if this file can be edited, but the heading of the file quite clearly says to not do that and that any modification could be overwritten:

#
# DO NOT EDIT THIS FILE
#
# It is automatically generated by grub2-mkconfig using templates
# from /etc/grub.d and settings from /etc/default/grub
#

So, this points us /etc/default/grub. To be sure what actually to do, the documentation/manual makes it clear how to control what kernel and options are chosen in the chapter 6.1 simple configuration handling:

grub-mkconfig does have some limitations. While adding extra custom menu entries to the end of the list can be done by editing /etc/grub.d/40_custom or creating /boot/grub/custom.cfg, changing the order of menu entries or changing their titles may require making complex changes to shell scripts stored in /etc/grub.d/. This may be improved in the future. In the meantime, those who feel that it would be easier to write grub.cfg directly are encouraged to do so (see Booting, and Shell-like scripting), and to disable any system provided by their distribution to automatically run grub-mkconfig.

The file /etc/default/grub controls the operation of grub-mkconfig. It is sourced by a shell script, and so must be valid POSIX shell input; normally, it will just be a sequence of ‘KEY=value’ lines, but if the value contains spaces or other special characters then it must be quoted.

What this means to say is that for basic operation and selection, the /etc/default/grub file must be edited, after which grub-mkconfig must be run to “build” the actual grub settings. That’s not extremely intuitive from my point of view.

Let’s look at the /etc/default/grub on my test system, which is not touched as far as I know:

[root@o184 ~]# cat /etc/default/grub
GRUB_TIMEOUT=5
GRUB_DISTRIBUTOR="$(sed 's, release .*$,,g' /etc/system-release)"
GRUB_DEFAULT=saved
GRUB_DISABLE_SUBMENU=true
GRUB_TERMINAL_OUTPUT="console"
GRUB_CMDLINE_LINUX="net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never"
GRUB_DISABLE_RECOVERY="true"

The first thing to look at is ‘GRUB_DEFAULT’, which is the default grub entry/kernel that is booted. It is set to ‘saved’, which is not really helpful. The utility ‘grub2-editenv’ can be used to list what the saved entry is:

[root@o184 ~]# grub2-editenv list
saved_entry=0

Which means the first (this is counting from zero) entry in /boot/grub2/grub.cfg. I have not come across a utility that is installed by default to list the menu entries with their number in grub.cfg, it’s easy to do with some shell commands, although that feels clumsy to me:

[root@o184 ~]# cat /boot/grub2/grub.cfg | grep ^menuentry | nl -v0
     0	menuentry 'Oracle Linux Server (4.1.12-124.18.6.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.1.12-124.18.6.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
     1	menuentry 'Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-4.1.12-112.16.4.el7uek.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
     2	menuentry 'Oracle Linux Server (3.10.0-862.11.6.el7.x86_64 with Linux) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-3.10.0-862.11.6.el7.x86_64-advanced-64c9f287-84a8-48ec-b048-364295362114' {
     3	menuentry 'Oracle Linux Server (0-rescue-069a6c1ff25b409fa87b8e587a2f8b4d with Linux) 7.5' --class oracle --class gnu-linux --class gnu --class os --unrestricted $menuentry_id_option 'gnulinux-0-rescue-069a6c1ff25b409fa87b8e587a2f8b4d-advanced-64c9f287-84a8-48ec-b048-364295362114' {

So in this case kernel 4.1.12-124.18.6.el7uek.x86_64 is started by default.

In fact, the value of ‘saved’ is stored in /boot/grub2/grubenv:

[root@o184 ~]# cat /boot/grub2/grubenv
# GRUB Environment Block
saved_entry=0
################...

There is more to tell about the GRUB_DEFAULT and especially GRUB_SAVEDEFAULT; if GRUB_SAVEDEFAULT is set and set to ‘true’, and GRUB_DEFAULT is set to ‘saved’, then any kernel that is chosen in grub at boot will be saved as default. However, as far as I can see, this option is not set in a default /etc/default/grub file.

Another caveat is that by setting the number of the menuentry, the order could be changed when installing a new kernel, and the numbering may be different. For that reason, it’s also possible to set either the entry description after menuentry, or a name (“identifier”) set with –id at the menuentry line.

grubby

This is all nice, but it feels really indirect to me, multiple files must be combined to understand the current settings, and multiple commands must be entered to make a change. This also means it’s a little less easy to automate.

Now on to the actual reason for this blogpost. There actually is a utility that can do most of the manipulation using a single command, which is installed by default. However, it is not very well known. Let me introduce ‘grubby’ to you!

List grub entries:

[root@o184 ~]# grubby --info=ALL
index=0
kernel=/boot/vmlinuz-4.1.12-124.18.6.el7uek.x86_64
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never "
root=/dev/mapper/ol-root
initrd=/boot/initramfs-4.1.12-124.18.6.el7uek.x86_64.img
title=Oracle Linux Server (4.1.12-124.18.6.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5
index=1
kernel=/boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never "
root=/dev/mapper/ol-root
initrd=/boot/initramfs-4.1.12-112.16.4.el7uek.x86_64.img
title=Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5
...

List current default kernel:

[root@o184 ~]# grubby --default-kernel
/boot/vmlinuz-4.1.12-124.18.6.el7uek.x86_64

Set another kernel to boot:

[root@o184 ~]# grubby --set-default /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
[root@o184 ~]# grubby --default-kernel
/boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64

List current settings of the grub2 menuentry:

[root@o184 ~]# grubby --info /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
index=1
kernel=/boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet numa=off transparent_hugepage=never"
root=/dev/mapper/ol-root
initrd=/boot/initramfs-4.1.12-112.16.4.el7uek.x86_64.img
title=Oracle Linux Server (4.1.12-112.16.4.el7uek.x86_64 with Unbreakable Enterprise Kernel) 7.5

Grubby also facilitates making changes to the arguments of the kernel.
For example, if you want to change the setting ‘numa=off’ to ‘numa=on’, you can remove the argument:

[root@o184 ~]# grubby --update-kernel /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 --remove-args="numa=off"
[root@o184 ~]# grubby --info /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 | grep ^args
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet transparent_hugepage=never"

And then add it with the correct argument:

[root@o184 ~]# grubby --update-kernel /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 --args="numa=on"
[root@o184 ~]# grubby --info /boot/vmlinuz-4.1.12-112.16.4.el7uek.x86_64 | grep ^args
args="ro net.ifnames=0 biosdevname=0 crashkernel=auto rd.lvm.lv=ol/root rd.lvm.lv=ol/swap rhgb quiet transparent_hugepage=never numa=on"

Grubby makes it easy to list and manage the grub2 boot loader settings, and also easily be used in scripts.

Agenda: https://www.techdaysbelgium.be/?page_id=507

Dates: February 7 and 8, 2019

Location: http://cinemacartoons.be in Antwerp, Belgium

More information soon.

For people from the netherlands: this is easy reachable by car or by train! This is a chance to attend a conference and meet up with a lot of well-known speakers in the Oracle database area without too extensive travelling.

This blogpost is about opatch and how to obtain information about the current oracle home(s), and how to obtain information about the patches to be applied.

Patches that can be applied using opatch are provided by oracle as zip files which have the following naming convention:
p[patchnumber]_[baseversion]_[platform]-[architecture].zip. The patch normally contains an XML file called ‘PatchSearch.xml’ and a directory with the patch number. Inside the patch number directory there is a README.txt which is lame, because it says ‘Refer to README.html’, and a README.html that contains the readme information that is also visible when the [README] button for this patch is selected in MOS.

I spend my time on the CLI exclusively. This is because I spend my time on remote servers all the time, and using the X window system would be unusable. The best part of using the CLI is that when done correctly, it gives you almost infinite control over what you do, while when clicking through an interface toggling selections and filling out fields makes you entering information that you have to copy from a document or make up on the spot, which then is quickly hidden by entering another tab or window. In fact, based on my experience, this is a guaranteed way of generating wrong or inconsistent results.

Because of being on the CLI almost exclusively on servers, I sometimes need to read the README.html. I can do that in MOS, but sometimes you want that information on the spot. You can open up the HTML file in ‘vim’ or ‘less’, but it will show you a lot of HTML making it very hard to read. What I find useful is installing an executable called ‘elinks’ (available on oracle linux via yum), and then read the README.html in this text based browser:

$ elinks README.html

Result:

                                             Oracle® Database Patch % psuBug % - Database Release Update % product_version % (1/12)
                                                       Go to primary content

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

   The script content on this page is for navigation purposes only and does not alter the content in any way.

                                                          Oracle® Database

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   This document is accurate at the time of release. For any changes and additional information regarding Database Release Update
   18.4.0.0.181016, see these related documents that are available at My Oracle Support (http://support.oracle.com/):

     * Document 2433586.1 Oracle DB/GI/OJVM Update/Revision R18 Oct 2018 Known Issues

   This document includes the following sections:

     * Section 1, "Patch Information"

     * Section 2, "Prerequisites"

     * Section 3, "Installation"

     * Section 4, "Deinstallation"

     * Section 5, "Known Issues"

     * Section 6, "References"

     * Section 7, "Bugs Fixed by This Patch"

     * Section 8, "Documentation Accessibility"

1 Patch Information

   Database Release Update 18.4.0.0.181016 patches are cumulative. That is, the content of all previous Database bundles is
   included in the latest Database bundle patch.

   To install the Database Release Update 18.4.0.0.181016 patch, the Oracle home must have the 18.1 Database installed.
http://support.oracle.com/                                                                                                 [------]

The first thing I do, is obtain information about this patch. Based on my experience, very carefully examine the readme. It contains vital information about the patch, but I found inconsistencies in it in the past. I got the impression the readme’s of patches that are created with a certain interval are simply copied from a previous version and then corrected by hand, and things might be entered wrong or are forgotten and not touched, leaving wrong information.

opatch lspatches on an unzipped patch

Luckily, a lot of information in the readme is actually stored in the metadata of the patch. You can query the metadata of the patch to be applied using opatch lspatches by pointing it to the patch directory:
(in this example I am querying patch 28655784, which is RU 18.4, and my database oracle home is version 18.3)

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lspatches 28655784/
patch_id:28655784
unique_patch_id:22509982
date_of_patch:8 Oct 2018, 21:27:28 hrs PST8PDT
patch_description:Database Release Update : 18.4.0.0.181016 (28655784)
component:oracle.rdbms.rsf.ic,18.0.0.0.0,optional; oracle.oracore.rsf,18.0.0.0.0,optional; oracle.ctx.atg,18.0.0.0.0,optional; oracle.rdbms.rman,18.0.0.0.0,optional; oracle.rdbms.rsf,18.0.0.0.0,optional; oracle.sdo.locator.jrf,18.0.0.0.0,optional; ....
platform:226,Linux x86-64
executable:ORACLE_HOME/lib/libclntsh.so.18.1; ORACLE_HOME/lib/libasmclntsh18.so; ORACLE_HOME/lib/libskgxp11.so; ORACLE_HOME/lib/libskgxp18.so; ORACLE_HOME/lib/libsqlplus.so; ORACLE_HOME/bin/oracle; ...
instance_shutdown:true
online_rac_installable:true
patch_type:singleton
product_family:db
auto:false
bug:28571483, TRACKING BUG TO REGRESS ALL BLR/CIS OF 27502420
...

(edited for brevity)
Line 2: patch_id: patch number
Line 5: patch_description: description 🙂
Line 6: component: to be patched objects grouped in a ‘component’
Line 7: platform: the platform the patch is intended for
Line 8: executable: the executables that are patched by this patch
Line 9: instance_shutdown: does an instance have to shutdown when it’s ORACLE_HOME is patched?
Line 10: online_rac_installable: does the database have to go down entirely, or can the patch be applied rolling?
Line 11: patch_type: singleton (a single patch). other types that I know: bundle_member (part of a patch bundle).
Line 12: product_family: for which oracle product is this patch? Grid infra also is family ‘db’.
Line 13: auto: can this patch be applied using ‘opatch auto’?

opatch lspatches can also be used to validate if a patch has been applied to the oracle home:

$ opatch lspatches -verify 28655784/
Inventory check failed: Patch ID 28655784 is NOT registered in Oracle Home "/u01/app/oracle/product/18.3.0.0/dbhome_1" inventory.
Following patches [ 28655784 ] are NOT registered in Oracle Home "/u01/app/oracle/product/18.3.0.0/dbhome_1" inventory or can't load its meta-data

OPatch failed with error code 1

Please mind the argument to -verify is a unzipped patch patch-number directory.

opatch query

This functionality is also available using ‘opatch query’ (executed as opatch query [argument] directory) opatch query is meant to be run on an unzipped patch patch-number directory, not on an ORACLE_HOME. The next overview takes the above opatch lspatches output, and lists for what lines there is an equivalent opatch query argument to retrieve that information:
Line 2: patch_id: is the patch number, which is the directory that is entered as an argument.
Line 5: patch_description: the description cannot be extracted on its own, but is visible with argument -all.
Line 6: component: argument: -get_component
Line 7: platform: argument: -get_os
Line 8: executable: the executables cannot be fetched on its own, but is visible with argument -all.
Line 9: instance_shutdown: argument: -is_online_patch
Line 10: online_rac_installable: argument: -is_rolling_patch
Line 11: patch_type: argument: -get_patch_type
Line 12: product_family: argument: -get_product_family
Line 13: auto: argument: -is_auto_patch

However, opatch query offers a couple of other switches which can come in handy:
-is_system_patch: System patches are patches that contain several sub-patches and must be applied using opatchauto.
-get_patching_model: The way the patch is applied, normal patching seem to have model “one-off”.

opatch lsinventory (or shorthand: lsinv)

Now the we looked at a patch to be applied, we should also look at the metadata of the oracle home to be patched, which is stored in the inventory. The contents of the inventory can be shown using lsinventory or shorthand lsinv.

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lsinv
Oracle Interim Patch Installer version 12.2.0.1.14
Copyright (c) 2018, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/18.3.0.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/18.3.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.14
OUI version       : 12.2.0.4.0
Log file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/opatch2018-11-13_08-18-30AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2018-11-13_08-18-30AM.txt

--------------------------------------------------------------------------------
Local Machine Information::
Hostname: ip-10-0-11-24.local
ARU platform id: 226
ARU platform description:: Linux x86-64

Installed Top-level Products (1):

Oracle Database 18c                                                  18.0.0.0.0
There are 1 products installed in this Oracle Home.


Interim patches (4) :

Patch  27908644     : applied on Wed Jul 18 13:44:11 EDT 2018
Unique Patch ID:  22153180
Patch description:  "UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171"
   Created on 4 May 2018, 01:21:02 hrs PST8PDT
   Bugs fixed:
     27908644

Patch  27923415     : applied on Wed Jul 18 13:41:38 EDT 2018
Unique Patch ID:  22239273
Patch description:  "OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)"
   Created on 15 Jul 2018, 10:33:22 hrs PST8PDT
   Bugs fixed:
     27304131, 27539876, 27952586, 27642235, 27636900, 27461740

Patch  28090553     : applied on Wed Jul 18 13:40:01 EDT 2018
Unique Patch ID:  22256940
Patch description:  "OCW RELEASE UPDATE 18.3.0.0.0 (28090553)"
   Created on 11 Jul 2018, 19:20:31 hrs PST8PDT
   Bugs fixed:
     12816839, 18701017, 22734786, 23698980, 23840305, 25709124, 25724089
     26299684, 26313403, 26433972, 26527054, 26586174, 26587652, 26647619
     26827699, 26860285, 26882126, 26882316, 26943660, 26996813, 27012915
...etc...

This shows:
Line 6: the ORACLE_HOME for which the contents are listed.
Line 7: the central inventory location.
Line 9: the opatch version
Line 10: the oracle universal install version
Line 17: the hostname
Line 18/19: the platform description and identification
Line 20/24: installed ‘top level’ products
Line 27- : applied interim patches

This is not very detailed data (except for the bugs fixed numbers), however, there is an option to get more information: add ‘-detail’ to the lsinv command:

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lsinv -detail
Oracle Interim Patch Installer version 12.2.0.1.14
Copyright (c) 2018, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/18.3.0.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/18.3.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.14
OUI version       : 12.2.0.4.0
Log file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/opatch2018-11-13_10-15-09AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2018-11-13_10-15-09AM.txt

--------------------------------------------------------------------------------
Local Machine Information::
Hostname: ip-10-0-11-24.local
ARU platform id: 226
ARU platform description:: Linux x86-64

Installed Top-level Products (1):

Oracle Database 18c                                                  18.0.0.0.0
There are 1 products installed in this Oracle Home.


Installed Products (125):

Assistant Common Files                                               18.0.0.0.0
BLASLAPACK Component                                                 18.0.0.0.0
Buildtools Common Files                                              18.0.0.0.0
Cluster Verification Utility Common Files                            18.0.0.0.0
Database Configuration and Upgrade Assistants                        18.0.0.0.0
Database Migration Assistant for Unicode                             18.0.0.0.0
...etc...
Patch  27923415     : applied on Wed Jul 18 13:41:38 EDT 2018
Unique Patch ID:  22239273
Patch description:  "OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)"
   Created on 15 Jul 2018, 10:33:22 hrs PST8PDT
   Bugs fixed:
     27304131, 27539876, 27952586, 27642235, 27636900, 27461740
   Files Touched:
     java.security --> ORACLE_HOME/javavm/jdk/jdk8/lib/security/java.security
     cacerts --> ORACLE_HOME/javavm/jdk/jdk8/lib/security/cacerts
     sunjce_provider.jar --> ORACLE_HOME/javavm/jdk/jdk8/lib/sunjce_provider.jar
     jce.jar --> ORACLE_HOME/javavm/jdk/jdk8/lib/jce.jar
     classes.bin --> ORACLE_HOME/javavm/jdk/jdk8/admin/classes.bin
     lfclasses.bin --> ORACLE_HOME/javavm/jdk/jdk8/admin/lfclasses.bin
     jvmpsupii.sql --> ORACLE_HOME/javavm/install/jvmpsupii.sql
     jvmpsupi.sql --> ORACLE_HOME/javavm/install/jvmpsupi.sql
     jvmpsu.sql --> ORACLE_HOME/javavm/install/jvmpsu.sql
     jvmpsui.sql --> ORACLE_HOME/javavm/install/jvmpsui.sql
     jvmpsupdi.sql --> ORACLE_HOME/javavm/install/jvmpsupdi.sql
     jvmpsupdii.sql --> ORACLE_HOME/javavm/install/jvmpsupdii.sql
     libjavavm18.a --> ORACLE_HOME/javavm/jdk/jdk8/lib/libjavavm18.a
     27923415.xml --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415.xml
     27923415_apply.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415_apply.sql
     27923415_rollback.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415_rollback.sql
     jvmpsupdi.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupdi.sql
     jvmpsupdii.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupdii.sql
     jvmpsupii.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupii.sql
     jvmpsu.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsu.sql
     jvmpsui.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsui.sql
     jvmpsupi.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupi.sql
     jox.o --> ORACLE_HOME/rdbms/lib/jox.o
     ins_rdbms.mk --> ORACLE_HOME/rdbms/lib/jox_on
     ins_rdbms.mk --> ORACLE_HOME/rdbms/lib/ioracle
     aurora.zip --> ORACLE_HOME/javavm/lib/aurora.zip
   Patch Location in Inventory:
     /u01/app/oracle/product/18.3.0.0/dbhome_1/inventory/oneoffs/27923415
   Patch Location in Storage area:
     /u01/app/oracle/product/18.3.0.0/dbhome_1/.patch_storage/27923415_Jul_15_2018_10_33_22

Patch  28090553     : applied on Wed Jul 18 13:40:01 EDT 2018
Unique Patch ID:  22256940
Patch description:  "OCW RELEASE UPDATE 18.3.0.0.0 (28090553)"
   Created on 11 Jul 2018, 19:20:31 hrs PST8PDT
   Bugs fixed:
     12816839, 18701017, 22734786, 23698980, 23840305, 25709124, 25724089
...etc...

The start of the output is the same as without ‘-detail’. However, on lines 27-34 you see the products, which as far as I understand it are what is meant with ‘components’ when looking at the patch details using ‘opatch lspatches’ and ‘opatch query’.

If we look at the details of patch 27923415 (line 36), at the end of the ‘Files Touched’ list at lines 69-72 it shows where the metadata is stored of this patch; that’s in the inventory directory in oneoffs/27923415. This directory contains information about the currently applied patch contents in the home.
The storage area contains the previous version of the contents that have been patched in case a rollback is needed.

opatch prereq

Now the we looked at the patch and the ORACLE_HOME to be patched, the next thing we can look at is if the environment is ready for the patch that was inspected to be applied to the ORACLE_HOME that was inspected.

Please mind that normally you would follow the readme, maybe run the prereq CheckConflictAgainstOHWithDetail (I don’t normally, because I check for conflicts before that), and then run opatch apply. And that is perfectly fine! Opatch performs all the checks that are listed below before applying a patch. That is one of the reasons opatch takes such a long time: it does perform a lot of work to be absolutely certain everything is alright.

The reason for listing the below individual checks is first of all to make the reader aware that all the checks opatch apply executes are available individually to be performed prior to patching. Another reason is to document and logically group these checks.

In fact, the PSU patching documentation advises to run the check CheckConflictAgainstOHWithDetail before applying the database PSU. I hope you now understand this is just a check that would have been performed anyway, but by explicitly executing this check before patching, you can validate the patch is apply-able to the oracle home in the first place.

Once opatch fails these individual checks can be used to validate the error and obtain more information:
If you read through them, you’ll see there are checks which are supersets of other checks.

— os checks
– Is there enough disk space available?
opatch prereq CheckSystemSpace -ph 28655784/
– Are all the operating system executables needed for patching available?
opatch prereq CheckSystemCommandAvailable -ph 28655784/
– Is the patch compatible with the OS/platform?
opatch prereq CheckPatchApplicableOnCurrentPlatform -ph 28655784/
– Are the files and directory structure and rights of the unzipped patch sane?
opatch prereq CheckPatchShipHome -ph 28655784/
– Are any files that are touched by the patch still open by processes?
opatch prereq CheckActiveFilesAndExecutables -ph 28655784/
– Is the current user ‘root’? If so, fail the check
opatch prereq CheckUserAdminPrivilege -ph 28655784/

— inventory checks
– Check if the ORACLE_HOME to be patched is available in the central inventory
opatch prereq CheckCentralInventoryForOH -ph 28655784/
– Is the central inventory writeable?
opatch prereq CheckCentralInventoryForRWSession -ph 28655784/
– Is the central inventory sane?
opatch prereq CheckCentralInventoryLocation -ph 28655784/

— oracle home checks
– Are the required components available in the oracle home?
opatch prereq CheckComponents -ph 28655784/
– Are the required components (products) available in the ORACLE_HOME, and are the actions applicable?
opatch prereq CheckApplicable -ph 28655784/
– Are there patches applied to the ORACLE_HOME already that are in the patches to be applied
opatch prereq CheckForIdenticalPatchInOracleHome -ph 28655784/
– Check if the oracle home is locked for patching (which might be by a previous unsuccessful opatch execution)
opatch prereq CheckIfOHLockedForPatching -ph 28655784/
– Check if the oracle home is valid, check for proper inventory files in the oracle home
opatch prereq CheckOracleHome -ph 28655784/
– Check if all the patches that are required by the patch to be applied are present in the oracle home
opatch prereq CheckPatchApplyDependents -ph 28655784/
– Check if the patching model of the patch is compatible with the oracle home
opatch prereq CheckPatchingModel -ph 28655784/

— conflict checks
– Are there conflicts between the patches in the ORACLE_HOME and the patches to be applied?
opatch prereq CheckConflictAgainstOH -ph 28655784/
– Are there conflicts between the patches in the ORACLE_HOME and the patches to be applied,
– and print out detailed conflict information.
opatch prereq CheckConflictAgainstOHWithDetail -ph 28655784/
– Are there conflicts between the patches to be applied?
– (please mind this requires multiple patches, which can be specified using -phBaseDir)
opatch prereq CheckConflictAmongPatches -phBaseDir
– Are there conflicts between the patches to be applied,
– and print out detailed conflict information.
– (please mind this requires multiple patches, which can be specified using -phBaseDir)
opatch prereq CheckConflictAmongPatchesWithDetail -phBaseDir

— RAC checks
– Check if the central inventory has a CRS home if the database home is RAC
opatch prereq CheckForCRSHomeIfRAC -ph 28655784/
– Check if all the nodes in the RAC setup are valid, up and reachable
opatch prereq CheckRACNodeList -ph 28655784/
– Check if commands can be invoked on the remote machines
opatch prereq CheckRemoteCommandInvocable -ph 28655784/
– Check if files can be copied to and removed from the remote machines
opatch prereq CheckRemoteCopyAndRemove -ph 28655784/

— opatch checks
– Check if the input values provided to opatch are sufficient to run the patch action
opatch prereq CheckForInputValues -ph 28655784/
– Is the opatch version high enough as per patch requirement?
opatch prereq CheckMinimumOPatchVersion -ph 28655784/

— rollback checks
– Check if all the patches provided to rollback are present in the given oracle home
opatch prereq CheckInstalledOneOffs -ph 28655784/
– Check if there are patches applied that depend on the patch that is being rolled back
opatch prereq CheckPatchRollbackDependents -ph 28655784/
– Check if the patch can be rolled back from the oracle home
opatch prereq CheckRollbackable -ph 28655784/

— OUI checks
– Is the OUI available in the ORACLE_HOME?
opatch prereq CheckOUILocation -ph 28655784/
– Is the OUI version high enough?
opatch prereq CheckOUIVersionCompatible -ph 28655784/
– Are all OUI libraries present?
opatch prereq CheckRequiredLibs -ph 28655784/

— oraInst.loc check
– Is oraInst.loc file proper?
opatch prereq CheckOraInstLocation -ph 28655784/

A few options of which I currently am not sure what they exactly do:
– CheckApplicableProduct — ?
opatch prereq CheckApplicableProduct -ph 28655784/
– CheckForNoOpPatches — ?
opatch prereq CheckForNoOpPatches -ph 28655784/
– CheckRollbackSid — ?
opatch prereq CheckRollbackSid -ph 28655784/

Conclusion

This is the second post about opatch, and I didn’t even get to the actual patching yet.

This post showed a number of ways to inspect both the patch to be applied (opatch lspatches/query) and the oracle home to apply the patch to (opatch lspatches/lsinventory), and a way to check specific properties before applying a patch and to validate and check things after a patch (opatch prereq).

This is important once you have to apply a patch to validate a machine in general, the central inventory and the oracle home inventory for readiness, and even more important to check an environment in case of patch failure.

I was asked some time ago what the Oracle database event ‘TCP socket (KGAS)’ means. This blogpost is a deep dive into what this event times in Oracle database 12.1.0.2.180717.

This event is not normally seen, only when TCP connections are initiated from the database using packages like UTL_TCP, UTL_SMTP and the one used in this article, UTL_HTTP.

A very basic explanation is this event times the time that a database foreground session spends on TCP connection management and communicating over TCP, excluding client and database link (sqlnet) networking. If you trace the system calls, you see that mostly that is working with a (network) socket. Part of the code in the oracle database that is managing that, sits in the kernel code layer kgas, kernel generic (of which I am quite sure, and then my guess:) asynchronous services, which explains the naming of the event.

This is what the Oracle online manual (https://docs.oracle.com/database/121/REFRN/GUID-203ACA60-9912-4493-9B79-EA4CDE89A78D.htm#REFRN00642 – Oracle is notorious for breaking links) says about ‘TCP socket (KGAS)’:

C.3.157 TCP Socket (KGAS)
A session is waiting for an external host to provide requested data over a network socket. The time that this wait event tracks does not indicate a problem, and even a long wait time is not a reason to contact Oracle Support. It naturally takes time for data to flow between hosts over a network, and for the remote aspect of an application to process any request made to it. An application that communicates with a remote host must wait until the data it will read has arrived. In addition, on Microsoft Windows, a separate thread monitors the arrival of traffic. This thread spends most of its life in waits tracked by the TCP Socket (KGAS) wait event.

Wait Time: The total elapsed time for the network connection to be established or for data to arrive from over the network

Parameter Description
P0 For Oracle internal use only. Values 8, 9, and 10 occur within the special thread present on Microsoft Windows; other P0 values occur in normal user sessions.

P1 For Oracle internal use only

Overall, the basic explanation that Oracle provides is mostly correct. I think the text saying to not contact Oracle support is not relevant, but maybe there is a need to relieve Oracle support. In my tests, I found that the full TCP connection lifecycle (creation, usage and removal) is put under this event, for which the text seems to emphasise on waiting for a remote host, which would be the most logical culprit for wait times, but other issues could lead to wait times additionally. This means the wait event itself is not explaining what it is showing, outside of TCP connection management.

The wait time explanation is nearly complete. If it would say something like ‘all TCP connection management and usage’ it would have fully covered it, it now excludes disconnecting and sending, because it explicitly mentions creating the connecting and receiving (waiting for data).

I do not understand what is meant with P0 and P1. I think it is p1 and p2 of the wait event, but naming it P0 and P1 is weird. When looking at the explanation it reads to me ‘we do not wish to explain anything to you’.

So, that means I am going to find this out myself….

If you are interested in this, or do want to write articles like this too, I urge you to replay this on your own system.

First of all, create a small setup which you can use to actually execute UTL_HTTP. The example essentially is taken from father of code examples, Tim Hall/Oracle base. Thank you Tim!
First setup the database to allow a user (in my case ‘ts’) to create the objects and use the network:

grant create sequence to ts;
grant create procedure to ts;
grant create table to ts;
grant alter session to ts;
begin
  dbms_network_acl_admin.create_acl (
    acl => 'anything.xml',
    description => 'allow anything',
    principal => 'TS',
    is_grant => true,
    privilege => 'connect'
  );
end;
begin
  dbms_network_acl_admin.assign_acl (
    acl => 'anything.xml',
    host => '*'
  );
end;

Then connect as the actual user (ts), and create the objects and the procedure that uses UTL_HTTP:

drop table http_clob_test;
create table http_clob_test (
        id number(10),
        url varchar2(255),
        data clob,
        constraint http_clob_test_pk primary key (id)
);
drop sequence http_clob_test_seq;
create sequence http_clob_test_seq;
CREATE OR REPLACE PROCEDURE load_html_from_url (p_url  IN  VARCHAR2) AS
  -- examples by tim hall
  -- https://oracle-base.com/articles/misc/retrieving-html-and-binaries-into-tables-over-http
  l_http_request   UTL_HTTP.req;
  l_http_response  UTL_HTTP.resp;
  l_clob           CLOB;
  l_text           VARCHAR2(32767);
BEGIN
  DBMS_LOB.createtemporary(l_clob, FALSE);

  -- Make a HTTP request and get the response.
  l_http_request  := UTL_HTTP.begin_request(p_url);
  l_http_response := UTL_HTTP.get_response(l_http_request);

  -- Copy the response into the CLOB.
  BEGIN
    LOOP
      UTL_HTTP.read_text(l_http_response, l_text, 32766);
      DBMS_LOB.writeappend (l_clob, LENGTH(l_text), l_text);
    END LOOP;
  EXCEPTION
    WHEN UTL_HTTP.end_of_body THEN
      UTL_HTTP.end_response(l_http_response);
  END;

  -- Insert the data into the table.
  INSERT INTO http_clob_test (id, url, data)
  VALUES (http_clob_test_seq.NEXTVAL, p_url, l_clob);

  -- Relase the resources associated with the temporary LOB.
  DBMS_LOB.freetemporary(l_clob);
EXCEPTION
  WHEN OTHERS THEN
    UTL_HTTP.end_response(l_http_response);
    DBMS_LOB.freetemporary(l_clob);
    RAISE;
END load_html_from_url;
/

The first thing to do is turn on sql_trace at level 8 to include waits:

set serverout on
alter session set events 'sql_trace level 8';
exec load_html_from_url('http://orafun.info/');
alter session set events 'sql_trace off';

If you look at the relevant piece, which means where it shows the wait events, it looks like this:

WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 128265  =2  =0  =0 obj#=662 tim=86395107497
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 395  =5  =0  =0 obj#=662 tim=86395110191
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 150  =6  =0  =0 obj#=662 tim=86395111115
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 131998  =6  =0  =0 obj#=662 tim=86395243764
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 269  =4  =0  =0 obj#=662 tim=86395245182
WAIT #139864521752120: nam='direct path write temp' ela= 4137 file number=201 first dba=257795 block cnt=1 obj#=662 tim=86395250494
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 352  =3  =2  =0 obj#=662 tim=86395251294

What is shown here is some quite spectacular differences in elapsed time. Also, the only way to understand what is actually done flagged as ‘TCP Socket (KGAS)’ is the value following ‘ela’, which is the event p1 value.
The pattern is:

- 2
- 5
- 6
- 6
- 4
- 3

It’s relatively simple to guess what a few of these are:

- 2 - connect
- 5 - send
- 6 - \
- 6 - |   receiving ?
- 4 - /
- 3 - close

But if you include the timing, there must be more into play:

- 2 - ela= 128265   connect
- 5 - ela= 395      send
- 6 - ela= 150      \
- 6 - ela= 131998   |   receiving ?
- 4 - ela= 14       /
- 3 - ela= 177     close

2/connect: In order to build up a connection, a tcp connection needs to be created and established. That takes some time.
5/send: Sending from the perspective of a userland process is writing into a socket, which will get send by the operating system independently. This means sending from a userland process normally takes relative little time, because it’s not waiting for actually sending it.
6,4/receive: At this time, this doesn’t make sense to me.
3/close: Closing for a userland process is a simple, swift task. The operating system will keep the port open for some time, etc. but this is not visible for the user land application.

Let’s pull an old trick out of the hat: use strace (system call tracing) with an emphasis on writing on an oracle session that has SQL trace with waits enabled set. This will show the system calls executed, and show exactly when the oracle engine ends a wait, so we can reasonably well establish a relation between wait events and system calls. I say “reasonably well”, because we can’t see when Oracle truly started timing the wait event (kslwtbctx), only the output to trace file as part of ending the wait event (kslwtectx).

The way I done it, is using the command ‘strace -e write=all -p 18513 -o utl_http_test.txt’. Obviously 18513 is the process ID of the database foreground process. The results of the strace are in utl_http_test.txt.

Now open utl_http_test.txt and search for KGAS. The full output is way too much text, let me show some of the output which I think is noteworthy. Again: this is selective, partial output.
I do maintain the order in which the calls are visible.

1. TCP Socket (KGAS) p1=2, earlier annotated as ‘connect’

-- try to find a socket that has been created by nscd (name server caching deamon)
-- two times?
--
socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(9)                                = 0
socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(9)                                = 0
--
-- obtain file status of resolv.conf (hostname resolving configuration file)
--
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
--
-- open and read host.conf (another hostname resolving configuration file)
--
open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=9, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "multi on\n", 4096)             = 9
read(9, "", 4096)                       = 0
close(9)                                = 0
--
-- open and read resolv.conf (hostname resolving configuration)
--
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "# Generated by NetworkManager\nse"..., 4096) = 93
read(9, "", 4096)                       = 0
close(9)                                = 0
--
-- open /etc/hosts (ip address to hostname mapping locally)
--
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=200, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "127.0.0.1   localhost localhost."..., 4096) = 200
read(9, "", 4096)                       = 0
close(9)
--
-- at this point two dynamic loadable libraries are read: libnss_dns.so.2 and libresolv.so.2
--
-- this is the DNS lookup of orafun.info
-- again, this is done twice, just like the use of /var/run/nscd/socket above?
--
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, 16) = 0
poll([{fd=9, events=POLLOUT}], 1, 0)    = 1 ([{fd=9, revents=POLLOUT}])
sendto(9, "=#\1\0\0\1\0\0\0\0\0\0\6orafun\4info\0\0\1\0\1", 29, MSG_NOSIGNAL, NULL, 0) = 29
 | 00000  3d 23 01 00 00 01 00 00  00 00 00 00 06 6f 72 61  =#...........ora |
 | 00010  66 75 6e 04 69 6e 66 6f  00 00 01 00 01           fun.info.....    |
poll([{fd=9, events=POLLIN}], 1, 5000)  = 1 ([{fd=9, revents=POLLIN}])
ioctl(9, FIONREAD, [45])                = 0
recvfrom(9, "=#\201\200\0\1\0\1\0\0\0\0\6orafun\4info\0\0\1\0\1\300\f\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, [16]) = 45
close(9)                                = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, 16) = 0
poll([{fd=9, events=POLLOUT}], 1, 4971) = 1 ([{fd=9, revents=POLLOUT}])
sendto(9, "o=\1\0\0\1\0\0\0\0\0\0\6orafun\4info\0\0\34\0\1", 29, MSG_NOSIGNAL, NULL, 0) = 29
 | 00000  6f 3d 01 00 00 01 00 00  00 00 00 00 06 6f 72 61  o=...........ora |
 | 00010  66 75 6e 04 69 6e 66 6f  00 00 1c 00 01           fun.info.....    |
poll([{fd=9, events=POLLIN}], 1, 4970)  = 1 ([{fd=9, revents=POLLIN}])
ioctl(9, FIONREAD, [109])               = 0
recvfrom(9, "o=\201\200\0\1\0\0\0\1\0\0\6orafun\4info\0\0\34\0\1\300\f\0"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, [16]) = 109
close(9)                                = 0
--
-- an epoll is created at file descriptor 9 (epoll: I/O event notification facility)
--
epoll_create(82)                        = 9
fcntl(9, F_SETFD, FD_CLOEXEC)           = 0
--
-- an IPV6 socket is created at file descriptor 11, 
-- bound to the IPV6 equivalent of localhost (::1),
-- destination port 0, source port 63257,
-- and is NOT connected.
--
socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 11
bind(11, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(11, {sa_family=AF_INET6, sin6_port=htons(63257), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
getpeername(11, 0x7ffdea6ba0f8, 0x7ffdea6ba1c8) = -1 ENOTCONN (Transport endpoint is not connected)
getsockopt(11, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
getsockopt(11, SOL_SOCKET, SO_RCVBUF, [262144], [4]) = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
--
-- File descriptor 11 is added to the epoll at file descriptor 9.
--
epoll_ctl(9, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=3110993336, u64=139864426020280}}) = 0
--
-- A connection is created to the true destination (orafun.info/18.218.92.122).
-- This connection gets file descriptor 12.
-- Destination port 80 (http), source port 11751.
--
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 12
fcntl(12, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(12, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("18.218.92.122")}, 16) = -1 EINPROGRESS (Operation now in progress)
times(NULL)                             = 438106227
mmap(NULL, 786432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34b959b000
poll([{fd=12, events=POLLOUT}], 1, 60000) = 1 ([{fd=12, revents=POLLOUT}])
getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl(12, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(12, F_SETFL, O_RDWR)              = 0
getsockname(12, {sa_family=AF_INET, sin_port=htons(11751), sin_addr=inet_addr("10.0.2.15")}, [16]) = 0
getsockopt(12, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
getsockopt(12, SOL_SOCKET, SO_RCVBUF, [374400], [4]) = 0
setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(12, F_SETFD, FD_CLOEXEC)          = 0
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=2
--
write(7, "WAIT #139864521752120: nam='TCP "..., 95) = 95
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 32 38 32  36 35 20 20 3d 32 20 20  la= 128265  =2   |
 | 00040  3d 30 20 20 3d 30 20 6f  62 6a 23 3d 36 36 32 20  =0  =0 obj#=662  |
 | 00050  74 69 6d 3d 38 36 33 39  35 31 30 37 34 39 37     tim=86395107497  |

So yes, I am not sure if all of this is in the wait event, but there is a lot of stuff happening to build a connection to the remote server.

In order to find out why the lookup which is tried via the NSCD socket at the beginning, and later via DNS, is done twice, I ran the same procedure again and used tcpdump to look at the actual network traffic. This explained a lot:

# tcpdump -n host 10.0.2.3 and port 53
09:14:02.923389 IP 10.0.2.15.16819 > 10.0.2.3.domain: 15651+ A? orafun.info. (29)
09:14:02.948791 IP 10.0.2.3.domain > 10.0.2.15.16819: 15651 1/0/0 A 18.218.92.122 (45)
09:14:02.952304 IP 10.0.2.15.54590 > 10.0.2.3.domain: 28477+ AAAA? orafun.info. (29)
09:14:02.979534 IP 10.0.2.3.domain > 10.0.2.15.54590: 28477 0/1/0 (109)

In other words: first a DNS A record is requested (TCPv4 DNS name lookup), which results in the IPv4 ip address, then a DNS AAAA record is requested (TCPv6 DNS name lookup), which resulted in no ip address. In other words: orafun.info only has an IPv4 ip address. So the two lookups actually do have a function.

2. TCP Socket (KGAS) p1=5, earlier annotated as ‘send’
These are the systemcalls that are visible and quite probably related to the send wait event:

--
-- file descriptor 12 holding the connection to the destination server is added to the epoll at file descriptor 9
--
epoll_ctl(9, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=3110998864, u64=139864426025808}}) = 0
--
-- Then the http get request is sent to the destination server at its normal file descriptor, 12.
--
write(12, "GET / HTTP/1.1\r\nHost: orafun.inf"..., 56) = 56
 | 00000  47 45 54 20 2f 20 48 54  54 50 2f 31 2e 31 0d 0a  GET / HTTP/1.1.. |
 | 00010  48 6f 73 74 3a 20 6f 72  61 66 75 6e 2e 69 6e 66  Host: orafun.inf |
 | 00020  6f 0d 0a 43 6f 6e 6e 65  63 74 69 6f 6e 3a 20 63  o..Connection: c |
 | 00030  6c 6f 73 65 0d 0a 0d 0a                           lose....         |
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=5
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 33 39 35 20  20 3d 35 20 20 3d 30 20  la= 395  =5  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 31 31  30 31 39 31              =86395110191     |

3. TCP Socket (KGAS) p1=6, earlier annotated as ‘receive’

--
-- Calling epoll_wait with timeout set to 0, so it doesn't block.
--
epoll_wait(9, [], 82, 0)                = 0
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=6
-- 
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 35 30 20  20 3d 36 20 20 3d 30 20  la= 150  =6  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 31 31  31 31 31 35              =86395111115     |

I have been thinking a lot about this seemingly weird call. It calls epoll_wait, but indicates it doesn’t want to wait (timeout=0), and even if epol_wait would have returned anything, indicated by a return code > 0, the epoll_event pointer is not set (indicated by []). The epoll file descriptor is used, but the only working file descriptor in the epoll is file descriptor 12, which has just been sent a http GET command, so the functionality of epoll is used.

This doesn’t make sense, unless you think about the asynchronous IO implementation of Oracle (see a lot of my earlier investigations), for which (in the case of asynchronous IO) io_getevents was called in a similar matter, timeout set to 0, to be able to do more requests while earlier IO requests are executed by the kernel. So my current theory here is that if multiple requests are happening, this mechanism provides a way to handle them.

If you have a simple single request, like in my case, this systemcall seems redundant. And because it queries the epoll file descriptor right after the request, it returns zero events, because there hardly has been any time after sending the http GET request.

4. Second TCP Socket (KGAS) p1=6, earlier annotated as ‘receive’

--
-- Calling epoll_wait with timeout set to 30000 (milliseconds).
--
epoll_wait(9, [{EPOLLIN, {u32=3110998864, u64=139864426025808}}], 82, 30000) = 1
--
-- And this is the second wait event written by the process: TCP Socket (KGAS), p1=6
-- 
write(7, "WAIT #139864521752120: nam='TCP "..., 95) = 95
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 33 31 39  39 38 20 20 3d 36 20 20  la= 131998  =6   |
 | 00040  3d 30 20 20 3d 30 20 6f  62 6a 23 3d 36 36 32 20  =0  =0 obj#=662  |
 | 00050  74 69 6d 3d 38 36 33 39  35 32 34 33 37 36 34     tim=86395243764  |

This is the second time epoll_wait is called, and this one is blocking, because timeout has been set to 30000 milliseconds. If you look at the ela time, this took some time, and this now makes perfect sense: this system calls waits for an event to become available in the epoll, so it waits for the response of the remote http server. Please mind this call just notifies the userland process that the response is ready, the received data yet has to be read.

5. TCP Socket (KGAS) p1=4, earlier annotated as ‘receive’

--
-- At this point we know there is a response. First the original file descriptor is removed from the epoll:
--
epoll_ctl(9, EPOLL_CTL_DEL, 12, 0x7ffdea6b9710) = 0
--
-- The the response is read from file descriptor 12:
--
read(12, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 4096) = 2687
--
-- Then file descriptor 12 is added to the epoll again.
--
epoll_ctl(9, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=3110998864, u64=139864426025808}}) = 0
--
-- And a wait event written by the process: TCP Socket (KGAS), p1=4
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 32 36 39 20  20 3d 34 20 20 3d 30 20  la= 269  =4  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 32 34  35 31 38 32              =86395245182     |

So, what p1 set to 4 actually means, is that once the connection did return data, which is checked using epoll, and visible with p1 set to 6, it is read into the process. This is also the reason this takes very little time, this is the time to read data from kernelspace to user space, and to manage the connection’s file descriptor. It is taken off the epoll in order not to disturb it, and it is added again because there could be another request.

6. TCP Socket (KGAS) p1=3, earlier annotated as ‘close’

--
-- file descriptor 12 removed from the epoll
--
epoll_ctl(9, EPOLL_CTL_DEL, 12, 0x7ffdea6bac20) = 0
--
-- file descriptor 12 is closed, closing the network connection
--
close(12)                               = 0
--
-- And a wait event written by the process: TCP Socket (KGAS), p1=3
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 33 35 32 20  20 3d 33 20 20 3d 32 20  la= 352  =3  =2  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 32 35  31 32 39 34              =86395251294     |

I don’t think this part holds any surprises. The network file descriptor is first removed from the epoll, and then it is closed, ending the TCP connection that was setup to perform a http request (in my case, I didn’t test, but I believe you will see the same with for example a SMTP connection, or any other type of TCP connection).

Summary

The basic message of this article is not surprising, nor does it conflict with current knowledge. Whenever you see a wait event ‘TCP Socket (KGAS)’, it means a foreground process is performing TCP networking via PLSQL. This wait event is a single event for creating, sending, receiving and closing a connection.

The true information of this article is how you can use the p1 value of the event to learn what actually the foreground is doing, and thus should give you more information to troubleshoot in the case of long waiting times.

TCP Socket (KGAS) p1 values:
1 - ?
2 - perform DNS lookup and create connection
3 - close connection
4 - copy TCP response into process space
5 - send request
6 - wait for TCP response to become available
7 - ?
8 - ? \
9 - ? | According to documentation, windows only in a 'special thread'.
10- ? /

This is a writeup on some scenario’s for disk issues that you could encounter when running linux systems. The linux systems I am talking about are centos/redhat/oracle (EL) version 7 systems. This is just a writeup for myself to know how to deal with different scenario’s, hopefully other people find this interesting too. I don’t believe there is any difference for the below scenario’s and resolutions between running physical/bare metal, virtualised or in the cloud, provided you can get access the you need (like the BIOS POST console). The test configuration uses (legacy) MBR (meaning non UEFI), grub2 as boot loader, LVM for meta devices (you don’t want to run without LVM!!) and XFS for the filesystems.

I included a way to introduce each corruption yourself, so you can actually rehearse these kind of corruptions. These self introduced corruptions are simply parts overwritten and can be recovered from. In reality this might be different: if a disk is truly physically broken, there is no way you can make it come to live again with software, and the data on it is simply gone. Also, if the physical disk is not harmed, but the filesystem is damaged, I think most of you understand that if the amount of damage is big enough, at a certain point it is beyond repair.

REALLY IMPORTANT NOTICE

Modern filesystems like XFS, EXT3/4, NTFS are ‘logging filesystems’. This means changes to metadata are written into a log on disk called the ‘intent log’, after which the true change is written and then the corresponding log in the intent log is flagged as done. This means that unclean mounts or crashes can be recovered from by replaying the intent log to make sure all transactions are fully written.

However…by default these filesystems do not log actual data changes. The protected writes are filesystem structure changes. This is especially important to realise if writes are not done via O_DIRECT, and thus end up in the page cache.

This means that data corruption can still occur, despite having a modern, logging filesystem. Things like databases therefore keep consistency mechanisms in play to validate data consistency.

I do not know enough about BTRFS and ZFS to know if these do things differently from the logging perspective. Not logging all IO for “normal” filesystems is a performance choice.

Let’s start: know your system!

It really helps if you understand how your system looks like. This is how my system looks like:

# parted /dev/sda print
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sda: 68.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  1075MB  1074MB  primary  xfs          boot
 2      1075MB  68.7GB  67.6GB  primary               lvm

The first partition of /dev/sda is set to be the boot partition (flags: boot), and is an XFS filesystem partition. The second partition is an LVM physical volume. ‘lsblk’ can look a bit more into the functions of the partitions, and reveals any other blockdevices.

# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0   64G  0 disk
├─sda1        8:1    0    1G  0 part /boot
└─sda2        8:2    0   63G  0 part
  ├─ol-root 249:0    0   41G  0 lvm  /
  ├─ol-swap 249:1    0    2G  0 lvm  [SWAP]
  └─ol-home 249:2    0   20G  0 lvm  /home
sr0          11:0    1  540M  0 rom

This shows that /dev/sda1 is mounted as /boot on the linux installation, and the sda2 LVM volume contains three logical volumes named: root, swap and home, which are mounted on /, as swap and on /home. There also is a ‘rom’ block device, which is the (oracle) linux boot image. This image can be downloaded from Oracle’s software delivery cloud, and is 566MB, and contains the rescue boot environment that is used in this blogpost. Any serious sysadmin should have such an image on hand to perform emergency recovery if needed.

Normally, the local disks are the first ones to boot from before other bootable devices such as a CD/DVD disk image in the BIOS. This means that I can leave the linux boot image connected, the system will normally boot from its hard disks and essentially not touch the DVD image. If for some reason the BIOS can’t find the MBR on the local disks, it will scan for MBR records on other disks in the order as set in the BIOS, and find the linux boot image and boot from that. On a real production system you don’t leave such an image connected, normally it’s attached when there is a need.

If you want or must use the linux boot image and not the local installation, you have to interfere the BIOS POST, and change the boot order. I am using virtualbox for most of my tests, the way to do that is: press F12 (Mac: fn-F12) and select CD-ROM as boot device and press enter.

To use the rescue mode on the linux boot image, boot it, then select “troubleshooting” (down arrow, enter), and “rescue a Oracle linux system” (down arrow, enter). This will boot linux from the boot image, and run anaconda. Anaconda gives you the following options:

1. Continue

This will scan your available disks, and mount the filesystems it could find under /mnt/sysimage in read-write mode. Not all filesystem issues will allow mounting read-write.

2. Read-only mount

This will scan your available disks, and mount the filesystems it could find under /mnt/sysimage in read-only mode. Obviously, not all issues allow read only mounting too.

3. Skip to shell

This will do no scanning of disks, so if you want to prevent any access to the local disks, this is the options to choose.

4. Quit (reboot)

This is a safe reboot (stopping all the started units). This is the same that would happen if the shell is exited from using option 1, 2 and 3.

Let’s look at a couple of different scenario’s:

Corrupted MBR

A MBR linux system scans the disks it can see in the order that is set in the BIOS, and boots from the first MBR found. This will normally be your linux installation on the local disks. If you have a corrupted or otherwise unbootable MBR on your local disks, and the linux boot image attached, it will boot from the linux boot image, unless it is set to explicitly not boot that device.

If no usable MBR is found, it will tell you on the console. On virtualbox, it shows:

FATAL: Could not read from the boot medium! System halted.

If just the MBR is damaged/corrupted, it can be recovered. If you want to rehearse recovering from a corrupted MBR, overwrite the MBR in the following way. Please mind this is machine dependent, look for specifics about the system to the overview above with parted and lsblk:

# dd if=/dev/zero of=/dev/sda bs=448 count=1

If you detach any other bootable devices, and reboot your system after writing over the MBR, it should not find anything to boot from, and your system will give an error like seen above, indicating it’s unable to boot.

In order to restore the MBR for grub2, perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘1’/continue, and enter to go to the shell.
– Change the root to your local disks: ‘chroot /mnt/sysimage’.
– Next thing is to understand your system. Based on the parted executable information above, we know the boot flag is set on device /dev/sda. This means we have to enter the following in the console:

grub2-install /dev/sda

This will report, if your disk is well enough to hold the information, and not really broken:

Installing for i386-pc platform.
Installation finished. No error reported.

– Now the MBR is written, and you can reboot your system by exiting the chroot jail and shell: ‘exit’ and ‘exit’ to boot your system from the local disks again.

Corrupted /boot filesystem

If the /boot filesystem, which holds the linux kernel and grub settings, is corrupted, the system will boot, because that is handled by the MBR, but then very quickly tell you it doesn’t recognise the filesystem:

.
error: unknown filesystem.
Entering rescue mode...
grub rescue>

A corrupted /boot filesystem can be recovered from in most cases, obviously depending on what and how much is corrupted in the filesystem, and the underlying device must be a able to function after the corruption. This can be rehearsed by overwriting part of the filesystem. In order to be absolutely sure the overwritten data on the disk is not changed by anything else, boot the linux boot image in rescue mode, and choose option ‘3’, and then overwrite part of the filesystem in /dev/sda1:

# dd if=/dev/zero of=/dev/sda1 bs=1k skip=2 count=1

In order to recover /boot, perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘1’/continue, and enter to go to the shell. If you are rehearsing: the dd command overwritten part of the filesystem header, it will not be recognised by lsblk, and if you take a look in the systems fstab (/mnt/sysimage/etc/fstab), it is mounted by UUID, but the UUID can’t be taken from the corrupted filesystem. The root and home filesystems are found in LVM, and mounted. This means you have to know /dev/sda1 is /boot.
– Repair the filesystem:

# xfs_repair /dev/sda1

Depending on the corruption, this might or might not succeed. If you are rehearsing this using the above dd command, you will see a lot of dots being printed searching for a superblock, which it will find and then recover from.
– Exit the shell to reboot the system: ‘exit’.

Corrupted root filesystem in LVM

When the root filesystem is corrupted, the system will boot as usual, show the different kernels in the grub dialogue, and then when the choice times out, boot the pre-selected kernel and try to boot linux. After the kernel is loaded, the root filesystem will be mounted, and it will fail. This generates the following message:

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt to a USB stick or /boot after mounting them and attach it to a bug report.

In order to rehearse recovering a corrupted root filesystem, boot the linux boot image in rescue mode, and choose option ‘3’. You will find that the volume group is not active, it needs to be activated using:

# vgchange -ay
 3 logical volume(s) in volume group "ol" now active

Then overwrite the filesystem header for the logical volume ‘root’ in the volume group ‘ol’:

# dd if=/dev/zero of=/dev/ol/root bs=1k skip=2 count=1

A corrupted root filesystem can be recovered from in most cases, obviously depending on what and how much is corrupted in the filesystem, and the underlying device must be able to function after the corruption. In order to recover perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘3’.
– Activate LVM:

# vgchange -ay

– Repair the filesystem in the logical volume

# xfs_repair /dev/ol/root

– Exit the shell to reboot the system: ‘exit’.

Corrupted non root filesystem in LVM

On my system, I only got /home as a non root filesystem in LVM. When I corrupted this filesystem, the linux system will came up without any complaining on the console. To be honest, this was kind of surprising to me.

I needed to dig around in the /var/log/messages file to find an error message:

Oct  2 10:57:11 localhost lvm: 3 logical volume(s) in volume group "ol" now active
Oct  2 10:57:11 localhost systemd: Started LVM2 PV scan on device 8:2.
Oct  2 10:57:11 localhost mount: mount: wrong fs type, bad option, bad superblock on /dev/mapper/ol-home,
Oct  2 10:57:11 localhost mount: missing codepage or helper program, or other error
Oct  2 10:57:11 localhost mount: In some cases useful info is found in syslog - try
Oct  2 10:57:11 localhost mount: dmesg | tail or so.
Oct  2 10:57:11 localhost systemd: Found device /dev/mapper/ol-home.
Oct  2 10:57:11 localhost systemd: Mounting /home...
Oct  2 10:57:11 localhost systemd: Starting LVM2 vgchange on device 8:2...

Based on what I see, I interpret this as systemd activating LVM, and then mounting the filesystem, which gives an error. What I don’t understand is what systemd is doing when it says it found the logical volume device and mounts it.

This can be rehearsed by booting into rescue mode, choose option ‘3’, activate LVM ‘vgchange -ay’, and then overwrite the filesystem header in the logical volume: ‘dd if=/dev/zero of=/dev/ol/home bs=1k skip=2 count=1’.

A corrupted non root filesystem in LVM can recovered from in most cases. There is no need to use the linux boot image for rescue mode. A linux system will come up when the root filesystem is valid and it can load enough from it to boot linux. The following needs to be done:
– Start a shell as root.
– Repair the filesystem in the logical volume: ‘xfs_repair /dev/ol/home’.
– Mount all filesystems: ‘mount -a’. This should mount the just repaired filesystem.

Corrupted swap in LVM

Just like a non root filesystem corruption mentioned above, the inability to activate a swap device is logged in the /var/log/messages file, but there is no explicit message on the console. Outside of logging the inability to active the swap device the system will come up happily, without swap if the only swap device is corrupted. These are the messages I could find in the /var/log/messages file:

Oct  2 11:32:39 localhost systemd: Found device /dev/mapper/ol-swap.
Oct  2 11:32:39 localhost systemd: Activating swap /dev/mapper/ol-swap...
Oct  2 11:32:39 localhost kernel: alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
Oct  2 11:32:39 localhost kernel: XFS (sda1): Mounting V4 Filesystem
Oct  2 11:32:39 localhost swapon: swapon: /dev/mapper/ol-swap: swapon failed: Invalid argument
Oct  2 11:32:39 localhost systemd: dev-mapper-ol\x2dswap.swap swap process exited, code=exited status=255
Oct  2 11:32:39 localhost systemd: Failed to activate swap /dev/mapper/ol-swap.
Oct  2 11:32:39 localhost systemd: Dependency failed for Swap.
Oct  2 11:32:39 localhost systemd: Job swap.target/start failed with result 'dependency'.
Oct  2 11:32:39 localhost systemd: Unit dev-mapper-ol\x2dswap.swap entered failed state.
Oct  2 11:32:39 localhost kernel: Unable to handle swap header version 0

It seems systemd understands the swap device is not usable, previously we saw systemd didn’t detect filesystem issues on an LVM device.

This can be rehearsed by booting into rescue mode, choose option ‘3’, activate LVM ‘vgchange -ay’, and then overwrite the swap header: ‘dd if=/dev/zero of=/dev/ol/swap bs=1k count=2’.

A corrupted swap device in LVM can be recovered from if the underlying device is able to function.
– Start a shell as root.
– Format the swap device again, in my case the swap device is /dev/ol/swap: ‘mkswap /dev/ol/swap’.
– Activate swap: ‘swapon -a’.
– Validate swap has been activated: ‘swapon -s’.

%d bloggers like this: