This blogpost is about opatch: how to obtain information about the current oracle home(s), and about the patches to be applied.

Patches that can be applied using opatch are provided by oracle as zip files which have the following naming convention:
p[patchnumber]_[baseversion]_[platform]-[architecture].zip. The patch normally contains an XML file called ‘PatchSearch.xml’ and a directory named after the patch number. Inside the patch number directory there is a README.txt, which is lame because it only says ‘Refer to README.html’, and a README.html that contains the readme information that is also visible when the [README] button for this patch is selected in MOS.
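
To make this concrete, this is roughly what unzipping such a patch looks like (a sketch; the exact zip name and the contents of the patch directory differ per patch):

$ unzip -q p28655784_180000_Linux-x86-64.zip
$ ls
28655784  PatchSearch.xml  p28655784_180000_Linux-x86-64.zip
$ ls 28655784/
README.html  README.txt  ...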

I spend my time on the CLI exclusively. This is because I work on remote servers all the time, where using the X window system would be impractical. The best part of using the CLI is that, when done correctly, it gives you almost infinite control over what you do. Clicking through an interface, toggling selections and filling out fields, on the other hand, makes you enter information that you have to copy from a document or make up on the spot, and which is then quickly hidden when you move to another tab or window. In fact, based on my experience, this is a guaranteed way of generating wrong or inconsistent results.

Because I am almost exclusively on the CLI on servers, I sometimes need to read the README.html there. I can do that in MOS, but sometimes you want that information on the spot. You can open up the HTML file in ‘vim’ or ‘less’, but it will show you a lot of HTML markup, making it very hard to read. What I find useful is installing an executable called ‘elinks’ (available on oracle linux via yum), and reading the README.html in this text-based browser:

$ elinks README.html

Result:

                                             Oracle® Database Patch % psuBug % - Database Release Update % product_version % (1/12)
                                                       Go to primary content

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

   The script content on this page is for navigation purposes only and does not alter the content in any way.

                                                          Oracle® Database

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   This document is accurate at the time of release. For any changes and additional information regarding Database Release Update
   18.4.0.0.181016, see these related documents that are available at My Oracle Support (http://support.oracle.com/):

     * Document 2433586.1 Oracle DB/GI/OJVM Update/Revision R18 Oct 2018 Known Issues

   This document includes the following sections:

     * Section 1, "Patch Information"

     * Section 2, "Prerequisites"

     * Section 3, "Installation"

     * Section 4, "Deinstallation"

     * Section 5, "Known Issues"

     * Section 6, "References"

     * Section 7, "Bugs Fixed by This Patch"

     * Section 8, "Documentation Accessibility"

1 Patch Information

   Database Release Update 18.4.0.0.181016 patches are cumulative. That is, the content of all previous Database bundles is
   included in the latest Database bundle patch.

   To install the Database Release Update 18.4.0.0.181016 patch, the Oracle home must have the 18.1 Database installed.
http://support.oracle.com/                                                                                                 [------]

The first thing I do is obtain information about the patch. Examine the readme very carefully: it contains vital information about the patch, but I have found inconsistencies in it in the past. I got the impression that the readmes of patches that are created at a certain interval are simply copied from a previous version and then corrected by hand, so things might be entered wrongly, or forgotten and left untouched, leaving wrong information.

opatch lspatches on an unzipped patch

Luckily, a lot of the information in the readme is actually stored in the metadata of the patch. You can query the metadata of the patch to be applied using opatch lspatches by pointing it at the patch directory (in this example I am querying patch 28655784, which is RU 18.4, while my database oracle home is version 18.3):

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lspatches 28655784/
patch_id:28655784
unique_patch_id:22509982
date_of_patch:8 Oct 2018, 21:27:28 hrs PST8PDT
patch_description:Database Release Update : 18.4.0.0.181016 (28655784)
component:oracle.rdbms.rsf.ic,18.0.0.0.0,optional; oracle.oracore.rsf,18.0.0.0.0,optional; oracle.ctx.atg,18.0.0.0.0,optional; oracle.rdbms.rman,18.0.0.0.0,optional; oracle.rdbms.rsf,18.0.0.0.0,optional; oracle.sdo.locator.jrf,18.0.0.0.0,optional; ....
platform:226,Linux x86-64
executable:ORACLE_HOME/lib/libclntsh.so.18.1; ORACLE_HOME/lib/libasmclntsh18.so; ORACLE_HOME/lib/libskgxp11.so; ORACLE_HOME/lib/libskgxp18.so; ORACLE_HOME/lib/libsqlplus.so; ORACLE_HOME/bin/oracle; ...
instance_shutdown:true
online_rac_installable:true
patch_type:singleton
product_family:db
auto:false
bug:28571483, TRACKING BUG TO REGRESS ALL BLR/CIS OF 27502420
...

(edited for brevity)
Line 2: patch_id: patch number
Line 5: patch_description: description 🙂
Line 6: component: the objects to be patched, grouped into ‘components’
Line 7: platform: the platform the patch is intended for
Line 8: executable: the executables that are patched by this patch
Line 9: instance_shutdown: does an instance have to shut down when its ORACLE_HOME is patched?
Line 10: online_rac_installable: does the database have to go down entirely, or can the patch be applied rolling?
Line 11: patch_type: singleton (a single patch). other types that I know of: bundle_member (part of a patch bundle).
Line 12: product_family: for which oracle product is this patch? Grid infrastructure is also family ‘db’.
Line 13: auto: can this patch be applied using ‘opatch auto’?

opatch lspatches can also be used to validate if a patch has been applied to the oracle home:

$ opatch lspatches -verify 28655784/
Inventory check failed: Patch ID 28655784 is NOT registered in Oracle Home "/u01/app/oracle/product/18.3.0.0/dbhome_1" inventory.
Following patches [ 28655784 ] are NOT registered in Oracle Home "/u01/app/oracle/product/18.3.0.0/dbhome_1" inventory or can't load its meta-data

OPatch failed with error code 1

Please mind the argument to -verify is an unzipped patch’s patch-number directory.

opatch query

This functionality is also available using ‘opatch query’ (executed as: opatch query [argument] directory). opatch query is meant to be run against an unzipped patch’s patch-number directory, not against an ORACLE_HOME. The next overview takes the above opatch lspatches output, and lists for which lines there is an equivalent opatch query argument to retrieve that information:
Line 2: patch_id: is the patch number, which is the directory that is entered as an argument.
Line 5: patch_description: the description cannot be extracted on its own, but is visible with argument -all.
Line 6: component: argument: -get_component
Line 7: platform: argument: -get_os
Line 8: executable: the executables cannot be fetched on their own, but are visible with argument -all.
Line 9: instance_shutdown: argument: -is_online_patch
Line 10: online_rac_installable: argument: -is_rolling_patch
Line 11: patch_type: argument: -get_patch_type
Line 12: product_family: argument: -get_product_family
Line 13: auto: argument: -is_auto_patch

However, opatch query offers a couple of other switches which can come in handy:
-is_system_patch: System patches are patches that contain several sub-patches and must be applied using opatchauto.
-get_patching_model: The way the patch is applied; normal patches seem to have model “one-off”.
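
For example, to check whether the unzipped patch requires an instance shutdown, and whether it can be applied rolling (using the same directories as earlier in this post):

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch query -is_online_patch 28655784/
$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch query -is_rolling_patch 28655784/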

opatch lsinventory (or shorthand: lsinv)

Now that we have looked at a patch to be applied, we should also look at the metadata of the oracle home to be patched, which is stored in the inventory. The contents of the inventory can be shown using lsinventory, or its shorthand lsinv.

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lsinv
Oracle Interim Patch Installer version 12.2.0.1.14
Copyright (c) 2018, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/18.3.0.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/18.3.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.14
OUI version       : 12.2.0.4.0
Log file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/opatch2018-11-13_08-18-30AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2018-11-13_08-18-30AM.txt

--------------------------------------------------------------------------------
Local Machine Information::
Hostname: ip-10-0-11-24.local
ARU platform id: 226
ARU platform description:: Linux x86-64

Installed Top-level Products (1):

Oracle Database 18c                                                  18.0.0.0.0
There are 1 products installed in this Oracle Home.


Interim patches (4) :

Patch  27908644     : applied on Wed Jul 18 13:44:11 EDT 2018
Unique Patch ID:  22153180
Patch description:  "UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171"
   Created on 4 May 2018, 01:21:02 hrs PST8PDT
   Bugs fixed:
     27908644

Patch  27923415     : applied on Wed Jul 18 13:41:38 EDT 2018
Unique Patch ID:  22239273
Patch description:  "OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)"
   Created on 15 Jul 2018, 10:33:22 hrs PST8PDT
   Bugs fixed:
     27304131, 27539876, 27952586, 27642235, 27636900, 27461740

Patch  28090553     : applied on Wed Jul 18 13:40:01 EDT 2018
Unique Patch ID:  22256940
Patch description:  "OCW RELEASE UPDATE 18.3.0.0.0 (28090553)"
   Created on 11 Jul 2018, 19:20:31 hrs PST8PDT
   Bugs fixed:
     12816839, 18701017, 22734786, 23698980, 23840305, 25709124, 25724089
     26299684, 26313403, 26433972, 26527054, 26586174, 26587652, 26647619
     26827699, 26860285, 26882126, 26882316, 26943660, 26996813, 27012915
...etc...

This shows:
Line 6: the ORACLE_HOME for which the contents are listed.
Line 7: the central inventory location.
Line 9: the opatch version
Line 10: the oracle universal installer (OUI) version
Line 17: the hostname
Line 18/19: the platform description and identification
Line 20/24: installed ‘top level’ products
Line 27- : applied interim patches

This is not very detailed data (except for the bugs fixed numbers). However, there is an option to get more information: add ‘-detail’ to the lsinv command:

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lsinv -detail
Oracle Interim Patch Installer version 12.2.0.1.14
Copyright (c) 2018, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/18.3.0.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/18.3.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.14
OUI version       : 12.2.0.4.0
Log file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/opatch2018-11-13_10-15-09AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2018-11-13_10-15-09AM.txt

--------------------------------------------------------------------------------
Local Machine Information::
Hostname: ip-10-0-11-24.local
ARU platform id: 226
ARU platform description:: Linux x86-64

Installed Top-level Products (1):

Oracle Database 18c                                                  18.0.0.0.0
There are 1 products installed in this Oracle Home.


Installed Products (125):

Assistant Common Files                                               18.0.0.0.0
BLASLAPACK Component                                                 18.0.0.0.0
Buildtools Common Files                                              18.0.0.0.0
Cluster Verification Utility Common Files                            18.0.0.0.0
Database Configuration and Upgrade Assistants                        18.0.0.0.0
Database Migration Assistant for Unicode                             18.0.0.0.0
...etc...
Patch  27923415     : applied on Wed Jul 18 13:41:38 EDT 2018
Unique Patch ID:  22239273
Patch description:  "OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)"
   Created on 15 Jul 2018, 10:33:22 hrs PST8PDT
   Bugs fixed:
     27304131, 27539876, 27952586, 27642235, 27636900, 27461740
   Files Touched:
     java.security --> ORACLE_HOME/javavm/jdk/jdk8/lib/security/java.security
     cacerts --> ORACLE_HOME/javavm/jdk/jdk8/lib/security/cacerts
     sunjce_provider.jar --> ORACLE_HOME/javavm/jdk/jdk8/lib/sunjce_provider.jar
     jce.jar --> ORACLE_HOME/javavm/jdk/jdk8/lib/jce.jar
     classes.bin --> ORACLE_HOME/javavm/jdk/jdk8/admin/classes.bin
     lfclasses.bin --> ORACLE_HOME/javavm/jdk/jdk8/admin/lfclasses.bin
     jvmpsupii.sql --> ORACLE_HOME/javavm/install/jvmpsupii.sql
     jvmpsupi.sql --> ORACLE_HOME/javavm/install/jvmpsupi.sql
     jvmpsu.sql --> ORACLE_HOME/javavm/install/jvmpsu.sql
     jvmpsui.sql --> ORACLE_HOME/javavm/install/jvmpsui.sql
     jvmpsupdi.sql --> ORACLE_HOME/javavm/install/jvmpsupdi.sql
     jvmpsupdii.sql --> ORACLE_HOME/javavm/install/jvmpsupdii.sql
     libjavavm18.a --> ORACLE_HOME/javavm/jdk/jdk8/lib/libjavavm18.a
     27923415.xml --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415.xml
     27923415_apply.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415_apply.sql
     27923415_rollback.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415_rollback.sql
     jvmpsupdi.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupdi.sql
     jvmpsupdii.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupdii.sql
     jvmpsupii.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupii.sql
     jvmpsu.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsu.sql
     jvmpsui.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsui.sql
     jvmpsupi.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupi.sql
     jox.o --> ORACLE_HOME/rdbms/lib/jox.o
     ins_rdbms.mk --> ORACLE_HOME/rdbms/lib/jox_on
     ins_rdbms.mk --> ORACLE_HOME/rdbms/lib/ioracle
     aurora.zip --> ORACLE_HOME/javavm/lib/aurora.zip
   Patch Location in Inventory:
     /u01/app/oracle/product/18.3.0.0/dbhome_1/inventory/oneoffs/27923415
   Patch Location in Storage area:
     /u01/app/oracle/product/18.3.0.0/dbhome_1/.patch_storage/27923415_Jul_15_2018_10_33_22

Patch  28090553     : applied on Wed Jul 18 13:40:01 EDT 2018
Unique Patch ID:  22256940
Patch description:  "OCW RELEASE UPDATE 18.3.0.0.0 (28090553)"
   Created on 11 Jul 2018, 19:20:31 hrs PST8PDT
   Bugs fixed:
     12816839, 18701017, 22734786, 23698980, 23840305, 25709124, 25724089
...etc...

The start of the output is the same as without ‘-detail’. However, on lines 27-34 you see the installed products, which, as far as I understand it, are what is meant by ‘components’ when looking at the patch details using ‘opatch lspatches’ and ‘opatch query’.

If we look at the details of patch 27923415 (line 36), the end of the ‘Files Touched’ list (lines 69-72) shows where the metadata of this patch is stored: in the inventory directory, in oneoffs/27923415. This directory contains information about the patch contents currently applied in the home.
The storage area contains the previous version of the contents that have been patched, in case a rollback is needed.
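
Both locations are plain directories you can inspect; for example (paths taken from the lsinventory output above):

$ ls /u01/app/oracle/product/18.3.0.0/dbhome_1/inventory/oneoffs/
$ ls /u01/app/oracle/product/18.3.0.0/dbhome_1/.patch_storage/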

opatch prereq

Now that we have looked at the patch and at the ORACLE_HOME to be patched, the next thing we can look at is whether the environment is ready for applying the inspected patch to the inspected ORACLE_HOME.

Please mind that normally you would follow the readme, maybe run the prereq CheckConflictAgainstOHWithDetail (I normally don’t, because I check for conflicts before that), and then run opatch apply. And that is perfectly fine! Opatch performs all the checks that are listed below before applying a patch. That is one of the reasons opatch takes such a long time: it performs a lot of work to be absolutely certain everything is alright.

The reason for listing the individual checks below is first of all to make the reader aware that all the checks opatch apply executes are also available individually, to be performed prior to patching. Another reason is to document and logically group these checks.

In fact, the PSU patching documentation advises running the check CheckConflictAgainstOHWithDetail before applying the database PSU. I hope you now understand this is just a check that would have been performed anyway, but by explicitly executing this check before patching, you can validate that the patch is applicable to the oracle home in the first place.

Once opatch fails, these individual checks can be used to investigate the error and obtain more information (a way to run several of them in one go is shown after the lists).
If you read through them, you’ll see there are checks which are supersets of other checks.

— os checks
– Is there enough disk space available?
opatch prereq CheckSystemSpace -ph 28655784/
– Are all the operating system executables needed for patching available?
opatch prereq CheckSystemCommandAvailable -ph 28655784/
– Is the patch compatible with the OS/platform?
opatch prereq CheckPatchApplicableOnCurrentPlatform -ph 28655784/
– Are the files and directory structure and rights of the unzipped patch sane?
opatch prereq CheckPatchShipHome -ph 28655784/
– Are any files that are touched by the patch still open by processes?
opatch prereq CheckActiveFilesAndExecutables -ph 28655784/
– Is the current user ‘root’? If so, fail the check
opatch prereq CheckUserAdminPrivilege -ph 28655784/

— inventory checks
– Check if the ORACLE_HOME to be patched is available in the central inventory
opatch prereq CheckCentralInventoryForOH -ph 28655784/
– Is the central inventory writeable?
opatch prereq CheckCentralInventoryForRWSession -ph 28655784/
– Is the central inventory sane?
opatch prereq CheckCentralInventoryLocation -ph 28655784/

— oracle home checks
– Are the required components available in the oracle home?
opatch prereq CheckComponents -ph 28655784/
– Are the required components (products) available in the ORACLE_HOME, and are the actions applicable?
opatch prereq CheckApplicable -ph 28655784/
– Are there patches already applied to the ORACLE_HOME that are among the patches to be applied?
opatch prereq CheckForIdenticalPatchInOracleHome -ph 28655784/
– Check if the oracle home is locked for patching (which might be left over from a previous unsuccessful opatch execution)
opatch prereq CheckIfOHLockedForPatching -ph 28655784/
– Check if the oracle home is valid, check for proper inventory files in the oracle home
opatch prereq CheckOracleHome -ph 28655784/
– Check if all the patches that are required by the patch to be applied are present in the oracle home
opatch prereq CheckPatchApplyDependents -ph 28655784/
– Check if the patching model of the patch is compatible with the oracle home
opatch prereq CheckPatchingModel -ph 28655784/

— conflict checks
– Are there conflicts between the patches in the ORACLE_HOME and the patches to be applied?
opatch prereq CheckConflictAgainstOH -ph 28655784/
– Are there conflicts between the patches in the ORACLE_HOME and the patches to be applied,
– and print out detailed conflict information.
opatch prereq CheckConflictAgainstOHWithDetail -ph 28655784/
– Are there conflicts between the patches to be applied?
– (please mind this requires multiple patches, which can be specified using -phBaseDir)
opatch prereq CheckConflictAmongPatches -phBaseDir
– Are there conflicts between the patches to be applied,
– and print out detailed conflict information.
– (please mind this requires multiple patches, which can be specified using -phBaseDir)
opatch prereq CheckConflictAmongPatchesWithDetail -phBaseDir

— RAC checks
– Check if the central inventory has a CRS home if the database home is RAC
opatch prereq CheckForCRSHomeIfRAC -ph 28655784/
– Check if all the nodes in the RAC setup are valid, up and reachable
opatch prereq CheckRACNodeList -ph 28655784/
– Check if commands can be invoked on the remote machines
opatch prereq CheckRemoteCommandInvocable -ph 28655784/
– Check if files can be copied to and removed from the remote machines
opatch prereq CheckRemoteCopyAndRemove -ph 28655784/

— opatch checks
– Check if the input values provided to opatch are sufficient to run the patch action
opatch prereq CheckForInputValues -ph 28655784/
– Is the opatch version high enough as per patch requirement?
opatch prereq CheckMinimumOPatchVersion -ph 28655784/

— rollback checks
– Check if all the patches provided to rollback are present in the given oracle home
opatch prereq CheckInstalledOneOffs -ph 28655784/
– Check if there are patches applied that depend on the patch that is being rolled back
opatch prereq CheckPatchRollbackDependents -ph 28655784/
– Check if the patch can be rolled back from the oracle home
opatch prereq CheckRollbackable -ph 28655784/

— OUI checks
– Is the OUI available in the ORACLE_HOME?
opatch prereq CheckOUILocation -ph 28655784/
– Is the OUI version high enough?
opatch prereq CheckOUIVersionCompatible -ph 28655784/
– Are all OUI libraries present?
opatch prereq CheckRequiredLibs -ph 28655784/

— oraInst.loc check
– Is the oraInst.loc file proper?
opatch prereq CheckOraInstLocation -ph 28655784/

A few options for which I am currently not sure what they do exactly:
– CheckApplicableProduct — ?
opatch prereq CheckApplicableProduct -ph 28655784/
– CheckForNoOpPatches — ?
opatch prereq CheckForNoOpPatches -ph 28655784/
– CheckRollbackSid — ?
opatch prereq CheckRollbackSid -ph 28655784/
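
If you want to run a series of these checks in one go before patching, a simple shell loop will do. A minimal sketch, assuming the unzipped patch directory 28655784/ in the current directory, and using the opatch executable of the home to be patched:

$ OPATCH=/u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch
$ for check in CheckSystemSpace CheckActiveFilesAndExecutables CheckMinimumOPatchVersion CheckConflictAgainstOHWithDetail; do
>   echo "=== ${check}"
>   ${OPATCH} prereq ${check} -ph 28655784/
> done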

Conclusion

This is the second post about opatch, and I didn’t even get to the actual patching yet.

This post showed a number of ways to inspect both the patch to be applied (opatch lspatches/query) and the oracle home to apply the patch to (opatch lspatches/lsinventory), and a way to check specific properties before applying a patch and to validate and check things after a patch (opatch prereq).

This is important once you have to apply a patch, to validate a machine in general and the central inventory and oracle home inventory for readiness, and even more important to check an environment in case of patch failure.

This blogpost is about oracle’s patching tool for the database and grid infrastructure: opatch. I personally have a love/hate relationship with opatch. In essence, opatch automates a lot of things that would be very error prone if done by hand, which is a good thing. However, there are a lot of things surrounding opatch that I despise. An example of that is the version numbering of opatch itself.

Versions and more versions

To jump into the matter directly: versions. I don’t understand why this has to be this complicated. I would even go as far as saying that somebody needs to step in and clean this up.

All opatch versions are identified by patch number 6880880. Anyone working with patching oracle probably knows this patch number by heart. You can get to this patch very quickly in MOS by going to the search box at the right side of the screen, at the same height as the tabs, and typing ‘patch 6880880’. A green rectangle will say ‘You have been directed to this PATCH based on an ID match’… So far the easy part.

On the right side below that, there is a rectangle whose first dropdown box is ‘release’. The release here means the release for which you want to download opatch. There are a great number of releases here:
– OPatch 12.2.0.1.0
– OPatch 13.9.0.0.0                           – EMCC
– OPatch 18.0.0.0.0
– OPatch for FMW 12c (OUI 13.9.x)   – Obsolete FMW(?)
– OUI NextGen 13.1                           – FMW
– OUI NextGen 13.2                           – FMW
– Oracle 10.1.0.0.0
– Oracle 10.1.0.3                                – Obsolete
– Oracle 10.2.0.0.0
– Oracle 11.1.0.0.0
– Oracle 11.2.0.0.0
– Oracle 12.1.0.1.0
– Oracle 12.1.0.1.1                             – Obsolete
– Oracle Database 12.2.0.1                – Obsolete

The entries marked ‘Obsolete’ describe the patch being obsolete, and are not downloadable. IMO, they should not be there.
The entries marked EMCC and FMW are non-database opatch releases. I would beg oracle to publish non-database opatch releases under their own patch number.

This eliminates 7 of the 14 choices. The leftover entries show a list of versions that seems to make sense: these seem to be the database versions for which you can download a version of opatch. Let’s look at the patch descriptions of the leftover versions:

– OPatch 12.2.0.1.0    – Patch 6880880: OPatch 12.2.0.1.14 for DB 12.2 releases (JUL 2018)
– OPatch 18.0.0.0.0    – Patch 6880880: OPatch 12.2.0.1.14 for DB 18.x releases (JUL 2018)
– Oracle 10.1.0.0.0     – Patch 6880880: OPatch 9i, 10.1
– Oracle 10.2.0.0.0     – Patch 6880880: OPatch 10.2
– Oracle 11.1.0.0.0     – Patch 6880880: OPatch patch of version 11.1.0.12.9 for Oracle software releases 11.1.0.x (OCT 2015)
– Oracle 11.2.0.0.0     – Patch 6880880: OPatch patch of version 11.2.0.3.19 for Oracle software releases 11.2.0.x and 18.x (APR 2018)
– Oracle 12.1.0.1.0     – Patch 6880880: OPatch 12.2.0.1.14 for DB 12.x and 18.x releases (JUL 2018)

If we follow the versions from the lowest to the highest versions of Opatch:
– 10.1.0.0.0     -> ‘OPatch 9i, 10.1’ This is opatch for database versions 9i and 10.1.
– 10.2.0.0.0     -> ‘OPatch 10.2’ This is opatch for database version 10.2.
– 11.1.0.0.0     -> ‘OPatch patch of version 11.1.0.12.9 for Oracle software releases 11.1.0.x (OCT 2015)’ This is opatch for database version 11.1.0.x.
– 11.2.0.0.0     -> ‘OPatch patch of version 11.2.0.3.19 for Oracle software releases 11.2.0.x and 18.x (APR 2018)’ This is opatch for database version 11.2.0.x. I have no idea what ‘and 18.x’ means.
– 12.1.0.1.0     -> ‘OPatch 12.2.0.1.14 for DB 12.x and 18.x releases (JUL 2018)’ This is opatch for database version 12.1.x. NOT 12.x as the patch text indicates, it’s 12.1.x, and again I have no idea what ‘and 18.x’ means. I also don’t understand why the 4th number is getting used all of a sudden.
– 12.2.0.1.0    -> ‘OPatch 12.2.0.1.14 for DB 12.2 releases (JUL 2018)’ This is opatch for database version 12.2. The description now aptly describes the database version and does not confuse.
– 18.0.0.0.0    -> ‘OPatch 12.2.0.1.14 for DB 18.x releases (JUL 2018)’ This is opatch for database version 18. The description again now aptly describes the version.

We are not there yet. Now that the patch release that should be used for every version of the database has been identified, let me throw in another version: the actual version that opatch reports when it is queried with ‘opatch version’. Let me list the opatch versions:
– 11.1.0.0.0 -> opatch version 11.1.0.12.9
– 11.2.0.0.0 -> opatch version 11.2.0.3.19
– 12.1.0.1.0 -> opatch version 12.2.0.1.14
– 12.2.0.1.0 -> opatch version 12.2.0.1.14
– 18.0.0.0.0 -> opatch version 12.2.0.1.14
Yes, this is again some weirdness. The opatch version seems to have followed the database version, but -apparently- from version 12.1 up to version 18 there is one opatch version: 12.2.

That begs the question whether the 12.1.0.1.0, 12.2.0.1.0 and 18.0.0.0.0 opatch versions are actually the same opatch version. I downloaded these versions, with the opatch version included in the name:

$ md5sum p6880880*
1ee44d25f5e858eb67424b69b89b8a25  p6880880_121010_Linux-x86-64-12.2.0.1.14.zip
1ee44d25f5e858eb67424b69b89b8a25  p6880880_122010_Linux-x86-64-12.2.0.1.14.zip
1ee44d25f5e858eb67424b69b89b8a25  p6880880_180000_Linux-x86-64-12.2.0.1.14.zip

Okay…so these are actually 100% identical copies, which just have a different name. I have no idea why the exact same file is offered under different names. It does mean that the addition of ‘and 18.x’ in the patch description of a release indicates that that opatch version can also be used with version 18 of the database.

To be honest, the original intention of this blog article was to describe some specific usage of opatch, but I now already have enough content for a post, and will keep the originally intended content for a next blogpost.

The question you might be having at this point is: which version should I download now? For databases up to version 11.2 it is simple: there is actually only one choice for each version. For database versions starting from version 12.1 you -currently- have a choice. I would advise simply downloading the “correct” opatch version for every database version, which means the exact oracle version as indicated in the release pulldown menu on the download page (‘Oracle 12.1.0.1.0’, ‘OPatch 12.2.0.1.0’ or ‘OPatch 18.0.0.0.0’), despite the fact that these are *currently* the same opatch version.

I suspect that once 12.1 goes into extended support, the opatch version will freeze, whilst the opatch version for version 12.2 and 18 will be improved and increase in version. So the only way to download the correct opatch version is still by choosing the actual database version it is intended for.

This brings me to another important thing to realise: the actual opatch version that is downloaded. At the time of writing, the current and only available opatch release for Oracle database versions 12.1 to 18 is 12.2.0.1.14. Once oracle brings out a newer opatch version, the previous version will not be available anywhere anymore (as far as I know). To me this means that if you patch databases per tier (from test, development and so on up to production), you have to stage opatch in order to be able to use the same opatch version consistently in the future. Of course Oracle advises using the latest opatch version, but the patch only checks for a minimal opatch version, and if you have tested your database software version, opatch version and patch to be working correctly together, in my opinion the most important thing is consistency.
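
Updating a staged opatch into an oracle home then simply means replacing the OPatch directory. A minimal sketch, where /stage is a hypothetical directory holding the validated opatch zip per database version (the zip contains an OPatch directory at its top level):

$ cd /u01/app/oracle/product/18.3.0.0/dbhome_1
$ mv OPatch OPatch.$(date +%Y%m%d)
$ unzip -q /stage/p6880880_180000_Linux-x86-64.zip
$ OPatch/opatch version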

I was asked some time ago what the Oracle database event ‘TCP socket (KGAS)’ means. This blogpost is a deep dive into what this event times in Oracle database 12.1.0.2.180717.

This event is not normally seen; it only appears when TCP connections are initiated from the database using packages like UTL_TCP, UTL_SMTP and the one used in this article, UTL_HTTP.

A very basic explanation is that this event times the time a database foreground session spends on TCP connection management and communicating over TCP, excluding client and database link (sqlnet) networking. If you trace the system calls, you see that this mostly means working with a (network) socket. The part of the code in the oracle database that manages this sits in the kernel code layer kgas: kernel generic (of which I am quite sure) asynchronous services (that part is my guess), which explains the naming of the event.

This is what the Oracle online manual (https://docs.oracle.com/database/121/REFRN/GUID-203ACA60-9912-4493-9B79-EA4CDE89A78D.htm#REFRN00642 – Oracle is notorious for breaking links) says about ‘TCP socket (KGAS)’:

C.3.157 TCP Socket (KGAS)
A session is waiting for an external host to provide requested data over a network socket. The time that this wait event tracks does not indicate a problem, and even a long wait time is not a reason to contact Oracle Support. It naturally takes time for data to flow between hosts over a network, and for the remote aspect of an application to process any request made to it. An application that communicates with a remote host must wait until the data it will read has arrived. In addition, on Microsoft Windows, a separate thread monitors the arrival of traffic. This thread spends most of its life in waits tracked by the TCP Socket (KGAS) wait event.

Wait Time: The total elapsed time for the network connection to be established or for data to arrive from over the network

Parameter Description
P0 For Oracle internal use only. Values 8, 9, and 10 occur within the special thread present on Microsoft Windows; other P0 values occur in normal user sessions.

P1 For Oracle internal use only

Overall, the basic explanation that Oracle provides is mostly correct. I think the text saying not to contact Oracle support is not relevant, but maybe there is a need to relieve Oracle support. In my tests, I found that the full TCP connection lifecycle (creation, usage and removal) is put under this event, whereas the text seems to emphasise waiting for a remote host, which would be the most logical culprit for wait times; but other issues can lead to wait time too. This means the wait event itself does not explain what it is showing, beyond TCP connection management.

The wait time explanation is nearly complete. If it said something like ‘all TCP connection management and usage’ it would have covered it fully; it now excludes disconnecting and sending, because it explicitly mentions only creating the connection and receiving (waiting for data).

I do not understand what is meant by P0 and P1. I think these are p1 and p2 of the wait event, but naming them P0 and P1 is weird. The explanation reads to me as ‘we do not wish to explain anything to you’.

So, that means I am going to find this out myself….

If you are interested in this, or want to write articles like this yourself, I urge you to replay this on your own system.

First of all, create a small setup which you can use to actually execute UTL_HTTP. The example is essentially taken from the father of code examples, Tim Hall/Oracle-base. Thank you Tim!
First, set up the database to allow a user (in my case ‘ts’) to create the objects and use the network:

grant create sequence to ts;
grant create procedure to ts;
grant create table to ts;
grant alter session to ts;
begin
  dbms_network_acl_admin.create_acl (
    acl => 'anything.xml',
    description => 'allow anything',
    principal => 'TS',
    is_grant => true,
    privilege => 'connect'
  );
end;
/
begin
  dbms_network_acl_admin.assign_acl (
    acl => 'anything.xml',
    host => '*'
  );
end;
/

Then connect as the actual user (ts), and create the objects and the procedure that uses UTL_HTTP:

drop table http_clob_test;
create table http_clob_test (
        id number(10),
        url varchar2(255),
        data clob,
        constraint http_clob_test_pk primary key (id)
);
drop sequence http_clob_test_seq;
create sequence http_clob_test_seq;
CREATE OR REPLACE PROCEDURE load_html_from_url (p_url  IN  VARCHAR2) AS
  -- examples by tim hall
  -- https://oracle-base.com/articles/misc/retrieving-html-and-binaries-into-tables-over-http
  l_http_request   UTL_HTTP.req;
  l_http_response  UTL_HTTP.resp;
  l_clob           CLOB;
  l_text           VARCHAR2(32767);
BEGIN
  DBMS_LOB.createtemporary(l_clob, FALSE);

  -- Make a HTTP request and get the response.
  l_http_request  := UTL_HTTP.begin_request(p_url);
  l_http_response := UTL_HTTP.get_response(l_http_request);

  -- Copy the response into the CLOB.
  BEGIN
    LOOP
      UTL_HTTP.read_text(l_http_response, l_text, 32766);
      DBMS_LOB.writeappend (l_clob, LENGTH(l_text), l_text);
    END LOOP;
  EXCEPTION
    WHEN UTL_HTTP.end_of_body THEN
      UTL_HTTP.end_response(l_http_response);
  END;

  -- Insert the data into the table.
  INSERT INTO http_clob_test (id, url, data)
  VALUES (http_clob_test_seq.NEXTVAL, p_url, l_clob);

  -- Release the resources associated with the temporary LOB.
  DBMS_LOB.freetemporary(l_clob);
EXCEPTION
  WHEN OTHERS THEN
    UTL_HTTP.end_response(l_http_response);
    DBMS_LOB.freetemporary(l_clob);
    RAISE;
END load_html_from_url;
/

The first thing to do is turn on sql_trace at level 8 to include waits:

set serverout on
alter session set events 'sql_trace level 8';
exec load_html_from_url('http://orafun.info/');
alter session set events 'sql_trace off';
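
To find the trace file this session just wrote, you can ask the session itself; a quick sketch using v$diag_info, run in the same session:

select value from v$diag_info where name = 'Default Trace File';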

If you look at the relevant piece of the trace file, namely where it shows the wait events, it looks like this:

WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 128265  =2  =0  =0 obj#=662 tim=86395107497
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 395  =5  =0  =0 obj#=662 tim=86395110191
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 150  =6  =0  =0 obj#=662 tim=86395111115
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 131998  =6  =0  =0 obj#=662 tim=86395243764
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 269  =4  =0  =0 obj#=662 tim=86395245182
WAIT #139864521752120: nam='direct path write temp' ela= 4137 file number=201 first dba=257795 block cnt=1 obj#=662 tim=86395250494
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 352  =3  =2  =0 obj#=662 tim=86395251294

What is shown here are some quite spectacular differences in elapsed time. Also, the only way to understand what is actually being done under ‘TCP Socket (KGAS)’ is the value following the ‘ela’ time (shown as ‘=2’, ‘=5’ and so on), which is the event’s p1 value.
The pattern is:

- 2
- 5
- 6
- 6
- 4
- 3

It’s relatively simple to guess what a few of these are:

- 2 - connect
- 5 - send
- 6 - \
- 6 - |   receiving ?
- 4 - /
- 3 - close

But if you include the timing, there must be more in play:

- 2 - ela= 128265   connect
- 5 - ela= 395      send
- 6 - ela= 150      \
- 6 - ela= 131998   |   receiving ?
- 4 - ela= 269      /
- 3 - ela= 352      close

2/connect: In order to build up a connection, a tcp connection needs to be created and established. That takes some time.
5/send: Sending, from the perspective of a userland process, is writing into a socket; the operating system then sends it independently. This means sending from a userland process normally takes relatively little time, because the process is not waiting for the data to actually be sent.
6,4/receive: At this time, this doesn’t make sense to me.
3/close: Closing is a simple, swift task for a userland process. The operating system will keep the port in a wait state for some time, etc., but this is not visible to the userland application.

Let’s pull an old trick out of the hat: use strace (system call tracing) with an emphasis on showing writes, on an oracle session that has SQL trace with waits enabled. This will show the system calls executed, and show exactly when the oracle engine ends a wait, so we can reasonably well establish a relation between wait events and system calls. I say “reasonably well”, because we can’t see when Oracle truly started timing the wait event (kslwtbctx), only the write to the trace file as part of ending the wait event (kslwtectx).

The way I did it is using the command ‘strace -e write=all -p 18513 -o utl_http_test.txt’. Obviously, 18513 is the process ID of the database foreground process, and the results of the strace are written to utl_http_test.txt.
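
To find the PID of the foreground process serving your session, so you know what to attach strace to, you can ask the database itself; a sketch (v$process.spid is the operating system process id of the server process):

select p.spid
from v$process p, v$session s
where s.paddr = p.addr
and s.sid = sys_context('userenv','sid');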

Now open utl_http_test.txt and search for KGAS. The full output is way too much text; let me show some of the output which I think is noteworthy. Again: this is selective, partial output.
I do maintain the order in which the calls are visible.

1. TCP Socket (KGAS) p1=2, earlier annotated as ‘connect’

-- try to find a socket that has been created by nscd (name service cache daemon)
-- two times?
--
socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(9)                                = 0
socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(9)                                = 0
--
-- obtain file status of resolv.conf (hostname resolving configuration file)
--
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
--
-- open and read host.conf (another hostname resolving configuration file)
--
open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=9, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "multi on\n", 4096)             = 9
read(9, "", 4096)                       = 0
close(9)                                = 0
--
-- open and read resolv.conf (hostname resolving configuration)
--
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "# Generated by NetworkManager\nse"..., 4096) = 93
read(9, "", 4096)                       = 0
close(9)                                = 0
--
-- open /etc/hosts (ip address to hostname mapping locally)
--
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=200, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "127.0.0.1   localhost localhost."..., 4096) = 200
read(9, "", 4096)                       = 0
close(9)
--
-- at this point two dynamic loadable libraries are read: libnss_dns.so.2 and libresolv.so.2
--
-- this is the DNS lookup of orafun.info
-- again, this is done twice, just like the use of /var/run/nscd/socket above?
--
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, 16) = 0
poll([{fd=9, events=POLLOUT}], 1, 0)    = 1 ([{fd=9, revents=POLLOUT}])
sendto(9, "=#\1\0\0\1\0\0\0\0\0\0\6orafun\4info\0\0\1\0\1", 29, MSG_NOSIGNAL, NULL, 0) = 29
 | 00000  3d 23 01 00 00 01 00 00  00 00 00 00 06 6f 72 61  =#...........ora |
 | 00010  66 75 6e 04 69 6e 66 6f  00 00 01 00 01           fun.info.....    |
poll([{fd=9, events=POLLIN}], 1, 5000)  = 1 ([{fd=9, revents=POLLIN}])
ioctl(9, FIONREAD, [45])                = 0
recvfrom(9, "=#\201\200\0\1\0\1\0\0\0\0\6orafun\4info\0\0\1\0\1\300\f\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, [16]) = 45
close(9)                                = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, 16) = 0
poll([{fd=9, events=POLLOUT}], 1, 4971) = 1 ([{fd=9, revents=POLLOUT}])
sendto(9, "o=\1\0\0\1\0\0\0\0\0\0\6orafun\4info\0\0\34\0\1", 29, MSG_NOSIGNAL, NULL, 0) = 29
 | 00000  6f 3d 01 00 00 01 00 00  00 00 00 00 06 6f 72 61  o=...........ora |
 | 00010  66 75 6e 04 69 6e 66 6f  00 00 1c 00 01           fun.info.....    |
poll([{fd=9, events=POLLIN}], 1, 4970)  = 1 ([{fd=9, revents=POLLIN}])
ioctl(9, FIONREAD, [109])               = 0
recvfrom(9, "o=\201\200\0\1\0\0\0\1\0\0\6orafun\4info\0\0\34\0\1\300\f\0"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, [16]) = 109
close(9)                                = 0
--
-- an epoll is created at file descriptor 9 (epoll: I/O event notification facility)
--
epoll_create(82)                        = 9
fcntl(9, F_SETFD, FD_CLOEXEC)           = 0
--
-- an IPV6 socket is created at file descriptor 11, 
-- bound to the IPV6 equivalent of localhost (::1),
-- destination port 0, source port 63257,
-- and is NOT connected.
--
socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 11
bind(11, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(11, {sa_family=AF_INET6, sin6_port=htons(63257), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
getpeername(11, 0x7ffdea6ba0f8, 0x7ffdea6ba1c8) = -1 ENOTCONN (Transport endpoint is not connected)
getsockopt(11, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
getsockopt(11, SOL_SOCKET, SO_RCVBUF, [262144], [4]) = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
--
-- File descriptor 11 is added to the epoll at file descriptor 9.
--
epoll_ctl(9, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=3110993336, u64=139864426020280}}) = 0
--
-- A connection is created to the true destination (orafun.info/18.218.92.122).
-- This connection gets file descriptor 12.
-- Destination port 80 (http), source port 11751.
--
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 12
fcntl(12, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(12, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("18.218.92.122")}, 16) = -1 EINPROGRESS (Operation now in progress)
times(NULL)                             = 438106227
mmap(NULL, 786432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34b959b000
poll([{fd=12, events=POLLOUT}], 1, 60000) = 1 ([{fd=12, revents=POLLOUT}])
getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl(12, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(12, F_SETFL, O_RDWR)              = 0
getsockname(12, {sa_family=AF_INET, sin_port=htons(11751), sin_addr=inet_addr("10.0.2.15")}, [16]) = 0
getsockopt(12, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
getsockopt(12, SOL_SOCKET, SO_RCVBUF, [374400], [4]) = 0
setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(12, F_SETFD, FD_CLOEXEC)          = 0
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=2
--
write(7, "WAIT #139864521752120: nam='TCP "..., 95) = 95
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 32 38 32  36 35 20 20 3d 32 20 20  la= 128265  =2   |
 | 00040  3d 30 20 20 3d 30 20 6f  62 6a 23 3d 36 36 32 20  =0  =0 obj#=662  |
 | 00050  74 69 6d 3d 38 36 33 39  35 31 30 37 34 39 37     tim=86395107497  |

So yes, I am not sure if all of this is inside the wait event, but there is a lot of stuff happening to build a connection to the remote server.

In order to find out why the lookup, which is first tried via the NSCD socket and later via DNS, is done twice, I ran the same procedure again and used tcpdump to look at the actual network traffic. This explained a lot:

# tcpdump -n host 10.0.2.3 and port 53
09:14:02.923389 IP 10.0.2.15.16819 > 10.0.2.3.domain: 15651+ A? orafun.info. (29)
09:14:02.948791 IP 10.0.2.3.domain > 10.0.2.15.16819: 15651 1/0/0 A 18.218.92.122 (45)
09:14:02.952304 IP 10.0.2.15.54590 > 10.0.2.3.domain: 28477+ AAAA? orafun.info. (29)
09:14:02.979534 IP 10.0.2.3.domain > 10.0.2.15.54590: 28477 0/1/0 (109)

In other words: first a DNS A record is requested (the IPv4 name lookup), which returns the IPv4 address; then a DNS AAAA record is requested (the IPv6 name lookup), which returns no address. In other words: orafun.info only has an IPv4 address. So the two lookups actually do have a function.
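
You can reproduce both lookups outside the database with dig, assuming it is installed; the A query returns the IPv4 address also seen in the strace output, the AAAA query returns nothing:

$ dig +short A orafun.info
18.218.92.122
$ dig +short AAAA orafun.info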

2. TCP Socket (KGAS) p1=5, earlier annotated as ‘send’
These are the system calls that are visible and quite probably related to the send wait event:

--
-- file descriptor 12 holding the connection to the destination server is added to the epoll at file descriptor 9
--
epoll_ctl(9, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=3110998864, u64=139864426025808}}) = 0
--
-- Then the http get request is sent to the destination server at its normal file descriptor, 12.
--
write(12, "GET / HTTP/1.1\r\nHost: orafun.inf"..., 56) = 56
 | 00000  47 45 54 20 2f 20 48 54  54 50 2f 31 2e 31 0d 0a  GET / HTTP/1.1.. |
 | 00010  48 6f 73 74 3a 20 6f 72  61 66 75 6e 2e 69 6e 66  Host: orafun.inf |
 | 00020  6f 0d 0a 43 6f 6e 6e 65  63 74 69 6f 6e 3a 20 63  o..Connection: c |
 | 00030  6c 6f 73 65 0d 0a 0d 0a                           lose....         |
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=5
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 33 39 35 20  20 3d 35 20 20 3d 30 20  la= 395  =5  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 31 31  30 31 39 31              =86395110191     |

3. TCP Socket (KGAS) p1=6, earlier annotated as ‘receive’

--
-- Calling epoll_wait with timeout set to 0, so it doesn't block.
--
epoll_wait(9, [], 82, 0)                = 0
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=6
-- 
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 35 30 20  20 3d 36 20 20 3d 30 20  la= 150  =6  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 31 31  31 31 31 35              =86395111115     |

I have been thinking a lot about this seemingly weird call. It calls epoll_wait, but indicates it doesn’t want to wait (timeout=0), and even if epoll_wait had returned anything, indicated by a return code > 0, the epoll_event pointer is not set (indicated by []). The epoll file descriptor is used, and the only working file descriptor in the epoll is file descriptor 12, which has just been sent an http GET command, so the functionality of epoll is used.

This doesn’t make sense, unless you think about the asynchronous IO implementation of Oracle (see a lot of my earlier investigations), where (in the case of asynchronous IO) io_getevents is called in a similar manner, timeout set to 0, to be able to do more requests while earlier IO requests are executed by the kernel. So my current theory here is that if multiple requests are happening, this mechanism provides a way to handle them.

If you have a simple single request, like in my case, this system call seems redundant. And because it queries the epoll file descriptor right after the request, it returns zero events, because hardly any time has passed since sending the http GET request.

4. Second TCP Socket (KGAS) p1=6, earlier annotated as ‘receive’

--
-- Calling epoll_wait with timeout set to 30000 (milliseconds).
--
epoll_wait(9, [{EPOLLIN, {u32=3110998864, u64=139864426025808}}], 82, 30000) = 1
--
-- And this is the second wait event written by the process: TCP Socket (KGAS), p1=6
-- 
write(7, "WAIT #139864521752120: nam='TCP "..., 95) = 95
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 33 31 39  39 38 20 20 3d 36 20 20  la= 131998  =6   |
 | 00040  3d 30 20 20 3d 30 20 6f  62 6a 23 3d 36 36 32 20  =0  =0 obj#=662  |
 | 00050  74 69 6d 3d 38 36 33 39  35 32 34 33 37 36 34     tim=86395243764  |

This is the second time epoll_wait is called, and this one is blocking, because the timeout has been set to 30000 milliseconds. If you look at the ela time, this took some time, and that now makes perfect sense: this system call waits for an event to become available in the epoll, so it waits for the response of the remote http server. Please mind this call just notifies the userland process that the response is ready; the received data still has to be read.

5. TCP Socket (KGAS) p1=4, earlier annotated as ‘receive’

--
-- At this point we know there is a response. First the original file descriptor is removed from the epoll:
--
epoll_ctl(9, EPOLL_CTL_DEL, 12, 0x7ffdea6b9710) = 0
--
-- Then the response is read from file descriptor 12:
--
read(12, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 4096) = 2687
--
-- Then file descriptor 12 is added to the epoll again.
--
epoll_ctl(9, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=3110998864, u64=139864426025808}}) = 0
--
-- And a wait event written by the process: TCP Socket (KGAS), p1=4
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 32 36 39 20  20 3d 34 20 20 3d 30 20  la= 269  =4  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 32 34  35 31 38 32              =86395245182     |

So, what p1 set to 4 actually means is that once the connection has returned data, which is checked using epoll and visible with p1 set to 6, the data is read into the process. This is also the reason this takes very little time: this is the time to read data from kernel space into user space, and to manage the connection’s file descriptor. It is taken off the epoll in order not to disturb it, and it is added again because there could be another request.

6. TCP Socket (KGAS) p1=3, earlier annotated as ‘close’

--
-- file descriptor 12 removed from the epoll
--
epoll_ctl(9, EPOLL_CTL_DEL, 12, 0x7ffdea6bac20) = 0
--
-- file descriptor 12 is closed, closing the network connection
--
close(12)                               = 0
--
-- And a wait event written by the process: TCP Socket (KGAS), p1=3
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 33 35 32 20  20 3d 33 20 20 3d 32 20  la= 352  =3  =2  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 32 35  31 32 39 34              =86395251294     |

I don’t think this part holds any surprises. The network file descriptor is first removed from the epoll, and then it is closed, ending the TCP connection that was set up to perform an http request (in my case; I didn’t test it, but I believe you will see the same with for example an SMTP connection, or any other type of TCP connection).

Summary

The basic message of this article is not surprising, nor does it conflict with current knowledge. Whenever you see the wait event ‘TCP Socket (KGAS)’, it means a foreground process is performing TCP networking via PL/SQL. This wait event is a single event for creating, sending over, receiving from and closing a connection.

The true information of this article is how you can use the p1 value of the event to learn what the foreground is actually doing, which should give you more information to troubleshoot long waiting times.

TCP Socket (KGAS) p1 values:
1 - ?
2 - perform DNS lookup and create connection
3 - close connection
4 - copy TCP response into process space
5 - send request
6 - wait for TCP response to become available
7 - ?
8 - ? \
9 - ? | According to documentation, windows only in a 'special thread'.
10- ? /
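
If a session is waiting on this event right now, its current p1 value is directly visible in v$session, so you can map it to a phase using the list above; a quick sketch:

select sid, event, p1, seconds_in_wait
from v$session
where event = 'TCP Socket (KGAS)';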

This is a writeup on some scenarios for disk issues that you could encounter when running linux systems. The linux systems I am talking about are centos/redhat/oracle (EL) version 7 systems. This is mostly a writeup for myself, to know how to deal with different scenarios; hopefully other people find it interesting too. I don’t believe there is any difference for the below scenarios and resolutions between running physical/bare metal, virtualised or in the cloud, provided you can get the access you need (like the BIOS POST console). The test configuration uses (legacy) MBR (meaning non-UEFI), grub2 as boot loader, LVM for meta devices (you don’t want to run without LVM!!) and XFS for the filesystems.

I included a way to introduce each corruption yourself, so you can actually rehearse these kinds of corruptions. These self-introduced corruptions are simply parts that are overwritten, and can be recovered from. In reality this might be different: if a disk is truly physically broken, there is no way software can make it come to life again, and the data on it is simply gone. Also, if the physical disk is not harmed but the filesystem is damaged, I think most of you understand that if the amount of damage is big enough, at a certain point it is beyond repair.

REALLY IMPORTANT NOTICE

Modern filesystems like XFS, EXT3/4 and NTFS are ‘logging filesystems’. This means changes to metadata are written into a log on disk called the ‘intent log’, after which the actual change is written, and then the corresponding entry in the intent log is flagged as done. This means that unclean mounts or crashes can be recovered from by replaying the intent log, making sure all transactions are fully written.

However… by default these filesystems do not log actual data changes; the protected writes are filesystem structure changes. This is especially important to realise when writes are not done via O_DIRECT, and thus end up in the page cache first.

This means that data corruption can still occur, despite having a modern, logging filesystem. Things like databases therefore keep their own consistency mechanisms in place to validate data consistency.

I do not know enough about BTRFS and ZFS to know if these do things differently from the logging perspective. Not logging all IO for “normal” filesystems is a performance choice.
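
If you are curious where the intent log of an XFS filesystem lives and how big it is, the xfs_info executable shows this for a mounted filesystem; the line starting with ‘log =internal’ describes the intent log location and size (shown here for the root filesystem, output omitted):

# xfs_info /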

Let’s start: know your system!

It really helps if you understand what your system looks like. This is what my system looks like:

# parted /dev/sda print
Model: ATA VBOX HARDDISK (scsi)
Disk /dev/sda: 68.7GB
Sector size (logical/physical): 512B/512B
Partition Table: msdos
Disk Flags:

Number  Start   End     Size    Type     File system  Flags
 1      1049kB  1075MB  1074MB  primary  xfs          boot
 2      1075MB  68.7GB  67.6GB  primary               lvm

The first partition of /dev/sda is set to be the boot partition (flags: boot), and contains an XFS filesystem. The second partition is an LVM physical volume. ‘lsblk’ gives a bit more insight into the function of the partitions, and reveals any other block devices.

# lsblk
NAME        MAJ:MIN RM  SIZE RO TYPE MOUNTPOINT
sda           8:0    0   64G  0 disk
├─sda1        8:1    0    1G  0 part /boot
└─sda2        8:2    0   63G  0 part
  ├─ol-root 249:0    0   41G  0 lvm  /
  ├─ol-swap 249:1    0    2G  0 lvm  [SWAP]
  └─ol-home 249:2    0   20G  0 lvm  /home
sr0          11:0    1  540M  0 rom

This shows that /dev/sda1 is mounted as /boot on the linux installation, and that the sda2 LVM physical volume contains three logical volumes named root, swap and home, which are mounted on /, as swap and on /home. There is also a ‘rom’ block device, which is the (oracle) linux boot image. This image can be downloaded from Oracle’s software delivery cloud, is 566MB in size, and contains the rescue boot environment that is used in this blogpost. Any serious sysadmin should have such an image on hand to perform emergency recovery if needed.

Normally, the BIOS boot order puts the local disks before other bootable devices such as a CD/DVD image. This means that I can leave the linux boot image connected: the system will normally boot from its hard disks and essentially not touch the DVD image. If for some reason the BIOS can’t find the MBR on the local disks, it will scan for MBR records on the other disks in the order set in the BIOS, find the linux boot image and boot from that. On a real production system you don’t leave such an image connected; normally it’s attached only when there is a need.

If you want or must use the linux boot image and not the local installation, you have to interrupt the BIOS POST and change the boot order. I am using virtualbox for most of my tests; there, the way to do that is to press F12 (Mac: fn-F12), select CD-ROM as boot device and press enter.

To use the rescue mode on the linux boot image, boot it, then select “troubleshooting” (down arrow, enter), and “rescue a Oracle linux system” (down arrow, enter). This boots linux from the boot image and runs anaconda, which gives you the following options:

1. Continue

This will scan your available disks, and mount the filesystems it could find under /mnt/sysimage in read-write mode. Not all filesystem issues will allow mounting read-write.

2. Read-only mount

This will scan your available disks, and mount the filesystems it could find under /mnt/sysimage in read-only mode. Obviously, not all issues allow read-only mounting either.

3. Skip to shell

This will do no scanning of the disks, so if you want to prevent any access to the local disks, this is the option to choose.

4. Quit (reboot)

This is a safe reboot (stopping all the started units). It is the same as what happens when the shell is exited after using option 1, 2 or 3.

Let’s look at a couple of different scenarios:

Corrupted MBR

On an MBR (non-UEFI) system, the BIOS scans the disks it can see in the order set in the BIOS, and boots from the first MBR found. This will normally be your linux installation on the local disks. If you have a corrupted or otherwise unbootable MBR on your local disks and the linux boot image attached, it will boot from the linux boot image, unless the BIOS is set to explicitly not boot from that device.

If no usable MBR is found, you will be told so on the console. On virtualbox, it shows:

FATAL: Could not read from the boot medium! System halted.

If just the MBR is damaged/corrupted, it can be recovered. If you want to rehearse recovering from a corrupted MBR, overwrite the MBR in the following way. Please mind this is machine dependent; refer to the overview above (parted and lsblk) for the specifics of the system. The boot code occupies the first 446 bytes of the MBR, and the partition table starts directly after it, so take care not to overwrite more than that:

# dd if=/dev/zero of=/dev/sda bs=446 count=1
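
If you rehearse this, it is sensible to keep a copy of the original first sector (the 446 bytes of boot code, the 64 bytes partition table and the 2 byte signature) before overwriting it, so you can always put it back; the backup location here is just an example:

# dd if=/dev/sda of=/root/sda-mbr.backup bs=512 count=1

Restoring it later (for example from the rescue environment) is the reverse:

# dd if=/root/sda-mbr.backup of=/dev/sda bs=512 count=1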

If you detach any other bootable devices and reboot your system after overwriting the MBR, the BIOS should not find anything to boot from, and your system will give an error like the one seen above, indicating it’s unable to boot.

In order to restore the MBR for grub2, perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘1’/continue, and enter to go to the shell.
– Change the root to your local disks: ‘chroot /mnt/sysimage’.
– The next thing is to understand your system. Based on the parted information above, we know the boot flag is set on the first partition of /dev/sda, so /dev/sda is the disk holding the MBR to restore. This means we have to enter the following in the console:

grub2-install /dev/sda

This will report the following, provided your disk is in a good enough state to hold the information, and not really broken:

Installing for i386-pc platform.
Installation finished. No error reported.

– Now the MBR is written, and you can boot your system from the local disks again by exiting the chroot jail and the shell: ‘exit’ and ‘exit’.

Corrupted /boot filesystem

If the /boot filesystem, which holds the linux kernel and the grub settings, is corrupted, the system will start to boot, because that part is handled by the MBR, but will then very quickly tell you it doesn’t recognise the filesystem:

.
error: unknown filesystem.
Entering rescue mode...
grub rescue>

A corrupted /boot filesystem can be recovered from in most cases, obviously depending on what and how much is corrupted in the filesystem, and provided the underlying device is able to function after the corruption. This can be rehearsed by overwriting part of the filesystem. In order to be absolutely sure the overwritten data on the disk is not changed by anything else, boot the linux boot image in rescue mode, choose option ‘3’, and then overwrite the first kilobyte of the filesystem in /dev/sda1, which holds the primary superblock:

# dd if=/dev/zero of=/dev/sda1 bs=1k count=1

In order to recover /boot, perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘1’/continue, and enter to go to the shell. If you are rehearsing: the dd command overwrote part of the filesystem header, so the filesystem will not be recognised by lsblk, and if you take a look in the system’s fstab (/mnt/sysimage/etc/fstab), you will see /boot is mounted by UUID, but the UUID can no longer be read from the corrupted filesystem. The root and home filesystems are found in LVM, and mounted. This means you have to know that /dev/sda1 is /boot.
– Repair the filesystem:

# xfs_repair /dev/sda1

Depending on the corruption, this might or might not succeed. If you are rehearsing this using the above dd command, you will see a lot of dots being printed while xfs_repair searches for a secondary superblock, which it will find and then recover from.
– Exit the shell to reboot the system: ‘exit’.
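
After the repair, and before exiting the shell, you can optionally verify that the filesystem UUID is readable again and matches the UUID in fstab; xfs_admin is available in the rescue environment:

# xfs_admin -u /dev/sda1
# grep boot /mnt/sysimage/etc/fstab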

Corrupted root filesystem in LVM

When the root filesystem is corrupted, the system will boot as usual, show the different kernels in the grub menu, and, when the choice times out, boot the pre-selected kernel and try to start linux. After the kernel is loaded, the root filesystem must be mounted, which will fail. This generates the following messages:

Generating "/run/initramfs/rdsosreport.txt"

Entering emergency mode. Exit the shell to continue.
Type "journalctl" to view system logs.
You might want to save "/run/initramfs/rdsosreport.txt" to a USB stick or /boot after mounting them and attach it to a bug report.

In order to rehearse recovering a corrupted root filesystem, boot the linux boot image in rescue mode, and choose option ‘3’. You will find that the volume group is not active; it needs to be activated using:

# vgchange -ay
 3 logical volume(s) in volume group "ol" now active
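
You can verify the state of the logical volumes with ‘lvs’; the fifth character of the Attr column being ‘a’ means the logical volume is active. On my system this looks something like:

# lvs
  LV   VG Attr       LSize
  home ol -wi-a----- 20.00g
  root ol -wi-a----- 41.00g
  swap ol -wi-a-----  2.00g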

Then overwrite the filesystem header for the logical volume ‘root’ in the volume group ‘ol’:

# dd if=/dev/zero of=/dev/ol/root bs=1k count=1

A corrupted root filesystem can be recovered from in most cases, obviously depending on what and how much is corrupted in the filesystem, and provided the underlying device is able to function after the corruption. In order to recover, perform the following tasks:
– Boot from the linux boot image.
– Start rescue mode.
– Choose ‘3’.
– Activate LVM:

# vgchange -ay

– Repair the filesystem in the logical volume:

# xfs_repair /dev/ol/root

– Exit the shell to reboot the system: ‘exit’.

Corrupted non root filesystem in LVM

On my system, /home is the only non root filesystem in LVM. When I corrupted this filesystem, the linux system came up without any complaints on the console. To be honest, this was kind of surprising to me.

I needed to dig around in the /var/log/messages file to find an error message:

Oct  2 10:57:11 localhost lvm: 3 logical volume(s) in volume group "ol" now active
Oct  2 10:57:11 localhost systemd: Started LVM2 PV scan on device 8:2.
Oct  2 10:57:11 localhost mount: mount: wrong fs type, bad option, bad superblock on /dev/mapper/ol-home,
Oct  2 10:57:11 localhost mount: missing codepage or helper program, or other error
Oct  2 10:57:11 localhost mount: In some cases useful info is found in syslog - try
Oct  2 10:57:11 localhost mount: dmesg | tail or so.
Oct  2 10:57:11 localhost systemd: Found device /dev/mapper/ol-home.
Oct  2 10:57:11 localhost systemd: Mounting /home...
Oct  2 10:57:11 localhost systemd: Starting LVM2 vgchange on device 8:2...

Based on what I see, I interpret this as systemd activating LVM, and then mounting the filesystem, which gives an error. What I don’t understand is why systemd still reports finding the logical volume device and mounting it after the mount error has already been logged.

This can be rehearsed by booting into rescue mode, choosing option ‘3’, activating LVM with ‘vgchange -ay’, and then overwriting the filesystem header in the logical volume: ‘dd if=/dev/zero of=/dev/ol/home bs=1k count=1’.

A corrupted non root filesystem in LVM can be recovered from in most cases. There is no need to use the linux boot image for rescue mode: a linux system will come up as long as the root filesystem is valid and it can load enough from it to boot linux. The following needs to be done:
– Start a shell as root.
– Repair the filesystem in the logical volume: ‘xfs_repair /dev/ol/home’.
– Mount all filesystems: ‘mount -a’. This should mount the just repaired filesystem.
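
If you want to verify the mount from systemd’s point of view, you can query the mount unit for /home (the unit name ‘home.mount’ follows systemd’s naming convention for mount points), and look at the journal of the current boot:

# systemctl status home.mount
# journalctl -b -u home.mount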

Corrupted swap in LVM

Just like with the non root filesystem corruption mentioned above, the inability to activate a swap device is logged in the /var/log/messages file, but there is no explicit message on the console. Apart from logging the inability to activate the swap device, the system will come up happily, albeit without swap if the only swap device is corrupted. These are the messages I could find in the /var/log/messages file:

Oct  2 11:32:39 localhost systemd: Found device /dev/mapper/ol-swap.
Oct  2 11:32:39 localhost systemd: Activating swap /dev/mapper/ol-swap...
Oct  2 11:32:39 localhost kernel: alg: No test for __gcm-aes-aesni (__driver-gcm-aes-aesni)
Oct  2 11:32:39 localhost kernel: XFS (sda1): Mounting V4 Filesystem
Oct  2 11:32:39 localhost swapon: swapon: /dev/mapper/ol-swap: swapon failed: Invalid argument
Oct  2 11:32:39 localhost systemd: dev-mapper-ol\x2dswap.swap swap process exited, code=exited status=255
Oct  2 11:32:39 localhost systemd: Failed to activate swap /dev/mapper/ol-swap.
Oct  2 11:32:39 localhost systemd: Dependency failed for Swap.
Oct  2 11:32:39 localhost systemd: Job swap.target/start failed with result 'dependency'.
Oct  2 11:32:39 localhost systemd: Unit dev-mapper-ol\x2dswap.swap entered failed state.
Oct  2 11:32:39 localhost kernel: Unable to handle swap header version 0

It seems systemd understands the swap device is not usable, whereas previously we saw that systemd didn’t detect the filesystem issue on an LVM device.

This can be rehearsed by booting into rescue mode, choosing option ‘3’, activating LVM with ‘vgchange -ay’, and then overwriting the swap header: ‘dd if=/dev/zero of=/dev/ol/swap bs=1k count=2’.

A corrupted swap device in LVM can be recovered from if the underlying device is able to function.
– Start a shell as root.
– Format the swap device again, in my case the swap device is /dev/ol/swap: ‘mkswap /dev/ol/swap’.
– Activate swap: ‘swapon -a’.
– Validate swap has been activated: ‘swapon -s’.

This post was triggered by rereading a blogpost by Mike Dietrich called databases need patched minimum april 2019. Mike’s blogpost makes it clear this is about databases that are connected using database links, and that:
– Newer databases do not need additional patching for this issue (11.2.0.4, 12.1.0.2, 12.2 and newer).
– Recent PSU patches contain a fix for certain older versions (11.1.0.7, 11.2.0.3 and 12.1.0.1).
– This means versions 11.2.0.2 and earlier 11.2 versions, 11.1.0.6 and earlier, and anything at version 10 or earlier can not be fixed and thus are affected.

But what is the actual issue?

The first link in the article: Recommended patches and actions for Oracle databases versions 12.1.0.1, 11.2.0.3 and earlier – before June 2019 (Doc ID 2361478.1) provides essentially the same information as Mike’s blogpost, however it additionally mentions that the interoperability of database clients with database servers is not affected.

Mike’s article mentions the following:
The alert refers to an SCN issue which came up a while ago. The system change number (SCN) is a logical, internal timestamp used by the Oracle Database. SCNs order events that occur within the database, which is necessary to satisfy the ACID properties of a transaction. The database uses SCNs to query and track changes.

So I guess it has something to do with SCNs. Most of the links are about SCNs. The MOS article that is most descriptive is: System Change Number (SCN), Headroom, Security and Patch Information (Doc ID 1376995.1).

This article describes a lot of details:
– SCNs are necessary for the database to keep changes organised. There is an article that explains that SCNs are not unique to a transaction, but are “just” granular enough to keep changes organised.
– SCNs are an ever increasing number. SCNs are never decreased!
– SCNs have a hard limit, which is version specific, and is based on the number of bits available for the number:
– – The general limit is 2^48.
– – From version 12.2 on, with compatibility set to 12.2 or higher, the limit is 2^63.
– SCNs have a per second increasing soft limit, called ‘RSL’ or ‘reasonable SCN limit’, which is version specific (a quick calculation follows after this list):
– – The general soft limit is 16384 (16k) SCNs per second, calculated as the number of seconds since January 1st, 1988 times 16384.
– – From version 12.2 on, with compatibility set to 12.2 or higher, the limit is 98304 (96k) SCNs per second, calculated as the number of seconds since January 1st, 2008 times 98304.
– The RSL can not be exceeded; if a database session tries to go beyond the soft limit, an ORA-600 [2252] is raised and the transaction is rolled back. This will not corrupt data (but obviously the data in the transaction is not applied).
– The difference between the current SCN and the RSL SCN is known as ‘SCN headroom’.
– There have been bugs that could increase SCNs at a higher rate, decreasing the SCN headroom or even reaching the soft limit, but these have all been solved in the January 2012 CPU/PSU/patch bundles.
– When databases communicate with each other via a database link, the SCNs of both databases are synchronised by picking the highest of the two.
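
To get a feel for these numbers: the scheme 1 soft limit is in the order of 1.6×10^13 in 2019, while the hard limit of 2^48 is roughly 2.8×10^14, so the soft limit sits at only a few percent of the hard limit. A quick calculation (purely an illustration; this uses exact day counts, while Oracle’s own calculation, shown later in this article, uses a convention of 12 months of 31 days):

select round((sysdate - date '1988-01-01') * 86400) * 16384 "approx. RSL scheme 1",
       power(2,48) "hard limit",
       round(round((sysdate - date '1988-01-01') * 86400) * 16384 / power(2,48) * 100, 1) "% of hard limit"
from dual;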

But it doesn’t really explain why patches must be applied before June 2019. However, another article is more concrete on that: Recommended patching and actions for Oracle database versions 12.1.0.1, 11.2.0.3 and earlier – before June 2019 (Doc ID 2335265.1). The first interesting thing mentioned is:

3. What is the change introduced by the patches listed above?
These patches increase the database’s current maximum SCN (system change number) limit.
At any point in time, the Oracle Database calculates a “not to exceed” limit for the number of SCNs a database can have used, based on the number of seconds elapsed since 1988.

So, this means the patched databases have a change (increase) in the RSL.

These recommended patches enable the databases to allow for a higher current maximum SCN limit. The rate at which this limit is calculated can be referred to as the “SCN rate” and these patches help allow higher SCN rates to enable databases to support many times higher transaction rates than earlier releases.

And this means the RSL is increased from 16k per second since January 1988 to 96k per second since January 2008.

Please note that the patches only increase the max limit but the current SCN is not impacted. So, if all your databases don’t have any major change in transaction rate, the current SCN would still remain below the current maximum SCN limit and database links between newer (or patched) and unpatched databases would continue to work. The patches provide the safety measure to ensure that you don’t have any issue with dblinks independent of any possible future change in your transaction rate.

With the patches applied, this change in current maximum SCN limit will happen automatically starting 23rd June 2019.

That is important information! So, with the patch applied to some databases and not to others, and with database links between patched and non-patched databases, everything should remain working. This is true for any database at this moment, because the change of the limit will happen on the 23rd of June 2019, which at the time of writing is in the future.

Once the change of limit has happened on the 23rd of June 2019, database links between older, non-patched versions of the database and newer or patched versions could be affected. If, after the 23rd of June 2019, the SCN rate increases in a newer or patched database and its SCN runs beyond the headroom available at the 16k rate, the earlier mentioned SCN synchronisation will fail, because the SCN is beyond the headroom of the 16k rate database, meaning it can not synchronise with the newer database.

So the problem we are talking about here is two databases talking with each other over a database link while having a different RSL. This could lead to a situation where one database is at an SCN that is too high for the other, older or non-patched database, meaning the communication will fail. This persists until the older or non-patched database is able to reach the SCN of the newer database over time, if that is possible at all.

A thing that is not clear to me at this point: the documentation seems to suggest that Oracle version 12.2 with compatibility set to 12.2 or higher already allows the higher number of SCNs per second (the above point: from version 12.2 on, with compatibility set to 12.2 or higher, the limit is 98304 (96k) SCNs per second, calculated as the number of seconds since January 1st, 2008). If that is true, the issue that is warned for could potentially already happen!

Luckily, there are ways to investigate this:

The reported “newer” versions and the versions that are patched for the rate change have an SGA variable “kcmscnc_” that lists the current SCN scheme of the database. There are 3 values that I have seen for “kcmscnc_”:
1: This is the rate of 16k per second since January 1st, 1988.
2: ?
3: This is the rate of 96k per second since January 1st, 2008.

SQL> select ksmfsnam, ksmfsadr, ksmfssiz from x$ksmfsv where ksmfsnam like 'kcmscnc_';
KSMFSNAM							 KSMFSADR	    KSMFSSIZ
---------------------------------------------------------------- ---------------- ----------
kcmscnc_							 000000006001579C	   4
SQL> oradebug peek 0x6001579C 4
[06001579C, 0600157A0) = 00000001

So, this database is capable of switching (because kcmscnc_ exists), and is currently running with the 16k per second threshold.

In fact, I tested this on 11.2.0.4, 12.1.0.2 and 18.3.0.0, and all these versions currently (currently being before June 2019) report “1”, or “scheme 1”. So the above mentioned rate of 96k for 12.2 and above does currently not apply for the soft limit or SCN headroom calculation in any database, including 12.2 and 18.3.

There is a less “hardcore” way to obtain this information, via the DBMS_SCN package. This package, just like the “kcmscnc_” variable, only exists if the database is of a version or patch level recent enough to be able to switch:

declare
  v_rsl number;
  v_headroom_in_scn number;
  v_headroom_in_sec number;
  v_cur_scn_compat number;
  v_max_scn_compat number;
begin
  dbms_scn.getcurrentscnparams(v_rsl, v_headroom_in_scn, v_headroom_in_sec, v_cur_scn_compat, v_max_scn_compat);
  dbms_output.put_line('reasonable scn limit (soft limit): '||to_char(v_rsl,'999,999,999,999,999,999'));
  dbms_output.put_line('headroom in scn                  : '||to_char(v_headroom_in_scn,'999,999,999,999,999,999'));
  dbms_output.put_line('headroom in sec                  : '||v_headroom_in_sec);
  dbms_output.put_line('current scn compatibility scheme : '||v_cur_scn_compat);
  dbms_output.put_line('max scn compatibility scheme     : '||v_max_scn_compat);
end;
/

SQL> /
reasonable scn limit (soft limit):	 16,104,861,483,008
headroom in scn 		 :	 16,104,860,802,471
headroom in sec 		 : 982962695
current scn compatibility scheme : 1
max scn compatibility scheme	 : 3

PL/SQL procedure successfully completed.

This was executed in a version 18.3.0.0 database. So the conclusion here is that currently all versions up to 18.3.0.0 are still compatible, because they all use the same SCN limit per second, which is referred to as ‘scheme 1’. However, on June 23, 2019, newer versions of the database will switch to a new scheme, referred to as ‘scheme 3’, which allows more headroom.

First of all, I hope a lot of databases out there have enough headroom in the first place, and use SCNs at a modest rate, which means they don’t advance towards the RSL. In such a case, when you have older versions that can not switch to scheme 3 and newer versions that will, I can see no reason to worry at all.

The second case is when your database is currently close to running out of headroom in scheme 1, and the SCN rate in the database is also close to the limit. Then you truly should worry: when your database switches to scheme 3, it might actually run over the 16k per second limit, and when it does that long enough to run over the RSL of scheme 1, communication over a database link between the now scheme 3 database and a scheme 1 database will be disrupted until the SCN of the scheme 3 database is low enough again to fit the scheme 1 RSL. If the SCN rate in the scheme 3 database persists, communication between the scheme 1 and the scheme 3 database will be impossible.

So, at this point it should be clear that the most important statistic for determining issues between scheme 1 databases and, after June 2019, scheme 3 databases is the current headroom in your databases. For all of the databases that will have a database link between a scheme 1 and a scheme 3 database, you should investigate their SCN number and SCN rate. If some of these databases have run into the soft limit ORA-600, ora-600 [2252], these are potential candidates for running over the soft limit when they switch to scheme 3.
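
One way to check for past occurrences is searching the alert logs for this ORA-600. This is a sketch assuming the standard diag destination; adjust the base path for your environment:

$ grep -l 'ORA-00600.*\[2252\]' /u01/app/oracle/diag/rdbms/*/*/trace/alert_*.log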

However, the most important check is to see how close the current SCN is to the scheme 1 soft limit. This can be done using the following SQL (this SQL does not need a newer version of the database, and is tested with version 11.2.0.2 and higher):

col "RSL scheme 1" format 9,999,999,999,999,999
col "current value" format 9,999,999,999,999,999
select dbms_flashback.get_system_change_number "current value",
       ((((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
       ((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
       (((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
       (to_number(to_char(sysdate,'HH24'))*60*60) +
       (to_number(to_char(sysdate,'MI'))*60) +
       (to_number(to_char(sysdate,'SS')))) * (16*1024)) "RSL scheme 1",
       round(dbms_flashback.get_system_change_number/((((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
       ((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
       (((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
       (to_number(to_char(sysdate,'HH24'))*60*60) +
       (to_number(to_char(sysdate,'MI'))*60) +
       (to_number(to_char(sysdate,'SS')))) * (16*1024))*100,5) "% to RSL scheme 1"
from dual;

If a database is close to the RSL (roughly speaking beyond 90%-95%), the next thing to do is measure whether the database keeps on using SCNs at that rate and keeps on being close to the RSL. If that is true, an additional increase in SCN usage could, in the current situation using scheme 1, lead to an ora-600 [2252]. But once that database has switched to scheme 3 after June 2019, there will not be anything keeping it from going beyond an SCN that exceeds the RSL of scheme 1, which will then cause issues if that database has a database link with a scheme 1 database.

Is there anything you can do if you suspect or know a database will go over the scheme 1 RSL? Purely for this issue, the obvious solution would be to make sure you are on a version that will switch to scheme 3 in June 2019, so that at least after June 2019 it will not run into ora-600 [2252].

However, if such a scheme 3 database needs to connect to an older scheme 1 database, you have two choices:
1. Potentially run over the scheme 1 limit and disrupt the database link communication.
2. Stop the newer database from switching to scheme 3. This potentially disrupts changes in that database, but it guarantees the database link will always keep working.

To look into the switch to scheme 3, which oracle calls ‘auto rollover’, the following SQL can be used:

declare
  v_autorollover_date date;
  v_target_compat number;
  v_is_enabled boolean;
begin
  dbms_scn.getscnautorolloverparams(v_autorollover_date, v_target_compat, v_is_enabled);
  dbms_output.put_line('auto rollover date      : '||to_char(v_autorollover_date,'YYYY-MM-DD'));
  dbms_output.put_line('target scheme	        : '||v_target_compat);
  dbms_output.put_line('rollover enabled (1=yes): '||sys.diutil.bool_to_int(v_is_enabled));
end;
/
SQL> /
auto rollover date	: 2019-06-23
target scheme		: 3
rollover enabled (1=yes): 1

PL/SQL procedure successfully completed.

If you want to prevent a database from rolling over to scheme 3, the procedure dbms_scn.disableautorollover can be used:

exec dbms_scn.disableautorollover;

Obviously, the procedure enableautorollover does the opposite. Please mind to contact Oracle support before doing this to your production database; this is an undocumented procedure at this time.

Also mind that if you create a new database after June 23, 2019 with a new or patched version that can switch to scheme 3, it will probably be running scheme 3 by default. If you want to be absolutely sure it will not exceed the scheme 1 limit, you can revert it to scheme 1 manually using the ‘alter database set scn compatibility N’ command in mount mode:

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1048574496 bytes
Fixed Size		    8665632 bytes
Variable Size		  281018368 bytes
Database Buffers	  616562688 bytes
Redo Buffers		  142327808 bytes
Database mounted.
SQL> alter database set scn compatibility 1;

Database altered.

SQL> alter database open;

Database altered.

For this too I would strongly advise contacting Oracle support first. The purpose of this blogpost is to define the problem, show all the technical details that have to do with it, and show all the tools that are part of it. There is nothing in this article that tells you what you should do; it just shows everything that surrounds the switch to scheme 3 in June 2019.

Another view that might be beneficial is x$kcmscn. This view seems to have been created to help checking whether a scheme 3 database can still connect to a scheme 1 database:

col cur_max_scn format 999,999,999,999,999
col pre_11_2_0_2_cur_max_scn format 999,999,999,999,999
select * from x$kcmscn;
ADDR		       INDX    INST_ID	   CON_ID CUR_MAX_RATE	  CUR_SCN
---------------- ---------- ---------- ---------- ------------ ----------
	 CUR_MAX_SCN DIFF_IN_DAYS DIFF_IN_HOURS PRE_11_2_0_2_CUR_MAX_SCN
-------------------- ------------ ------------- ------------------------
PRE_11_2_0_2_DIFF_IN_DAYS
-------------------------
00007F773DEDAE10	  0	     1		0	     0	   800918
  16,108,830,064,640	    11379	 273112       16,108,830,064,640
		186444783

Because currently (before June 2019) every database by default is in scheme 1, the cur_max_scn and pre_11_2_0_2_cur_max_scn are identical. I even believe the column naming is wrong: the first version that can switch, if it is patched to a high enough PSU version, is 11.2.0.3, so the column name suggests scheme 1 databases are databases of a version lower than 11.2.0.2, while 11.2.0.2 itself can not switch either.

Conclusion.
I think there’s been a lot of fuss about something that in most cases is not an issue. This article is supposed to give you all the knowledge and the tools to determine how this looks in your situation.

This might be an issue if you happen to have one or more databases that are high in SCN numbering, continue to use a lot of SCNs, will be converted to a scheme 3 database on June 23, 2019, are suspected to start using even more SCNs for whatever reason, AND have a database link to a scheme 1 database that remains scheme 1. That’s a lot of ifs.

On the other hand, you only need one database that is high in SCN numbering and keeps on using a lot of SCNs, staying close to the soft limit, which will propagate its SCN to the other databases it is linked with; the required properties of the problem can also be spread out over multiple linked databases.

Again, I do not advise anything in this article; the purpose here is to provide all the details that surround this issue, so you can make the best decision for yourself.

This blogpost is a follow-up to this blogpost, with the difference that the install method in this blogpost is way easier: it uses an Ansible playbook to do most of the installation.

1. Install git and ansible via EPEL:

# yum -y localinstall https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
# yum -y install ansible git

2. Clone my ‘prometheus_node_exp_grafana_install’ repository:

# git clone https://gitlab.com/FritsHoogland/prometheus_node_exp_grafana_install.git

3. Run the prometheus.yml playbook to install prometheus, node_exporter and grafana:

# cd prometheus_node_exp_grafana_install
# ansible-playbook prometheus.yml

Now all you need to do is set up grafana (at port 3000) to talk to prometheus to fetch data.
If you are interested, the memory usage dashboard can be imported; its dashboard id is 2747.
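
To quickly verify the three services are up after the playbook has run, you can probe their HTTP endpoints; this assumes the playbook keeps the default ports (prometheus 9090, node_exporter 9100, grafana 3000):

# curl -s http://localhost:9090/-/healthy
# curl -s http://localhost:9100/metrics | head -3
# curl -s http://localhost:3000/api/health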

In fact, it seems node_exporter has renamed the operating system statistics, so the old Linux memory dashboard didn’t work anymore. I updated it, and uploaded a new version of the memory usage dashboard.

This blogpost is a look into a bug in the wait interface that I have reported to Oracle a few times. I verified all versions from Oracle 11.2 up to 18.2.0.0.180417 on Linux x86_64, and the bug is present in all of them. The bug is that the wait event ‘db file async I/O submit’ does not time anything when using ASM; only when using a filesystem does this wait event essentially time the io_submit system call. All tests are done on Linux x86_64, Oracle Linux 7.4, with database and grid version 18.2.0.0.180417.

So what?
You might not have seen this wait event before; that’s perfectly possible, because this wait event is unique to the database writer. So does this wait event matter?

When the Oracle database engine is set to use asynchronous I/O, and when it makes sense to use asynchronous I/O (!), the engine uses the combination of io_submit() to issue I/O requests to the operating system and, when it needs to, io_getevents() to fetch the results. In general (so not consistently), the engine does not time io_submit, which is a non-blocking call; it only times waiting for I/O requests using io_getevents(), which is reported as a wait event in an IO wait event class. A lot of ‘%parallel%’ IO related wait events time asynchronous IO calls.

So why would the engine then time io_submit() for the database writer?
Well, io_submit() is not a blocking call, UNLESS the device queue to which the requests are submitted is full. This is why the developers of the database writer code decided to implement a wait event around io_submit, which is not the case for any other process.
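
On Linux, the size of the block device request queue that io_submit() can run into is visible (and tunable) via sysfs. This is just an illustration; the device name is system specific, and my test system shows the default of 128:

# cat /sys/block/sda/queue/nr_requests
128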

To understand why this makes sense, a little knowledge of database writer internals is necessary. When blocks are dirtied in the cache and these blocks are checkpointed later on, they must be written to disk. The number of blocks to be written, and therefore the number of writes, can get high very quickly. The way this is processed is quite interesting (simplified obviously, and when using a filesystem):

a) the database writer picks up a batch of blocks needing writing, for up to 128 IO requests.
b) that batch is submitted, timed by ‘db file async I/O submit’
c) a blocking io_getevents call is issued, timed by ‘db file parallel write’, to wait for IOs to finish. The interesting thing specifically for the database writer is that the minimal number of IOs to wait for is very low (a few IOs, up to 25-75% of the IOs when the amount gets bigger). Any finished IO will be picked up here, however it’s perfectly possible that IOs are still outstanding after this step. In fact, I think it’s deliberately made that way.
d) if any IO requests are still pending, a nonblocking, non-wait event timed io_getevents call is issued to pick up any finished IOs.
e) if any blocks still need writing for which no IO requests have been submitted, go to a).
f) if at this point IO requests are still pending, go to c).

This means that the database writer can submit huge amounts of IO requests and keep on doing that, much more than any other process, because it doesn’t need to wait for all IOs to finish. So if there is one process that is likely to run into a blocking io_submit call, it’s the database writer.

When using a database without ASM, the above wait timing is exactly what happens. A function call graph of io_submit for the database writer when the database uses a filesystem looks like this:

 | | | | | > kslwtbctx(0x7ffc55eb3e60, 0x8b4, ...)
 | | | | | | > sltrgftime64(0x6c2f4288, 0x6bbe3ca0, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffc55eb3400, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffc55eb3400, ...)
 | | | | | | | | < clock_gettime+0x00000000005d returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | > kslwait_timeout_centi_to_micro(0x7fffffff, 0x19183e92, ...)
 | | | | | | > kskthbwt(0x19c37b0d8, 0xb3, ...)
 | | | | | | > kslwt_start_snapshot(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | < kslwt_start_snapshot+0x0000000000d0 returns: 0x6c2f4ae8
 | | | | | > ksfdgo(0x800, 0, ...)
 | | | | | | > ksfd_skgfqio(0x7fc304483f78, 0x9, ...)
 | | | | | | | > skgfqio(0x7fc3091fddc0, 0x7fc304483f78, ...)
 | | | | | | | | > skgfrvldtrq(0x7fc304483f78, 0x9, ...)
 | | | | | | | | > sltrgftime64(0x2000, 0x7fc3043772b0, ...)
 | | | | | | | | | > clock_gettime@plt(0x1, 0x7ffc55eae3b0, ...)
 | | | | | | | | | | > clock_gettime(0x1, 0x7ffc55eae3b0, ...)
 | | | | | | | | | | < clock_gettime+0x00000000005d returns: 0
 | | | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | | | > skgfr_lio_listio64(0x7fc3091fddc0, 0x1, ...)
 | | | | | | | | | > io_submit@plt(0x7fc302992000, 0x115, ...)
 | | | | | | | | | < io_submit+0x000000000007 returns: 0x115
 | | | | | | | | < skgfr_lio_listio64+0x000000000131 returns: 0
 | | | | | | | < skgfqio+0x00000000035e returns: 0
 | | | | | | < ksfd_skgfqio+0x0000000001f5 returns: 0
 | | | | | > kslwtectx(0x7ffc55eb3e60, 0x7fc304483f78, ...)
 | | | | | | > sltrgftime64(0x7ffc55eb3e60, 0x7fc304483f78, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffc55eb33e0, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffc55eb33e0, ...)
 | | | | | | | | < clock_gettime+0x00000000005d returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | > kslwt_end_snapshot(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | | > kslwh_enter_waithist_int(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | | > kslwtrk_enter_wait_int(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | | < kslwtrk_enter_wait_int+0x000000000019 returns: 0x6bcaa180
 | | | | | | > kslwt_update_stats_int(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | | > kews_update_wait_time(0x9, 0x8f54, ...)
 | | | | | | | > ksucpu_wait_update(0x9, 0x8f54, ...)
 | | | | | | | < ksucpu_wait_update+0x000000000036 returns: 0x6bd658b0
 | | | | | | > kskthewt(0x19c38402c, 0xb3, ...)
 | | | | | | < kskthewt+0x0000000005b1 returns: 0x30

The second argument of kskthbwt and kskthewt, 0xb3, is the wait event number. Looking it up shows it is indeed ‘db file async I/O submit’:

SQL> select event#, name from v$event_name where event# = to_number('b3','xx');

    EVENT# NAME
---------- ----------------------------------------------------------------
       179 db file async I/O submit

Now on to the actual purpose of this blog post: the same situation, but now when ASM is used. When ASM is used, there is a significant increase in the call stack. This means more code is executed. This may sound strange at first, but it’s very logical if you give it some thought: when using ASM, the Oracle database is talking to raw devices. This means that any of the functionality a filesystem normally performs, which is implemented in ASM, must in some way still be performed. This is done in several additional layers in the database code.

Let’s look at a backtrace of io_submit of the database writer when using a filesystem:

#0  0x00007f22bdb36690 in io_submit () from /lib64/libaio.so.1
#1  0x0000000004832ef0 in skgfr_lio_listio64 ()
#2  0x000000001238b7ce in skgfqio ()
#3  0x0000000011d5c3ad in ksfd_skgfqio ()
#4  0x0000000011d57fce in ksfdgo ()
#5  0x0000000000d9f21c in ksfdaio ()
#6  0x00000000039c4a5e in kcfisio ()
#7  0x0000000001d836ec in kcbbdrv ()
#8  0x000000001222fac5 in ksb_act_run_int ()
#9  0x000000001222e792 in ksb_act_run ()
#10 0x0000000003b8b9ce in ksbabs ()
#11 0x0000000003baa161 in ksbrdp ()
#12 0x0000000003fbaed7 in opirip ()
#13 0x00000000026ecaa0 in opidrv ()
#14 0x00000000032904cf in sou2o ()
#15 0x0000000000d681cd in opimai_real ()
#16 0x000000000329d2a1 in ssthrdmain ()
#17 0x0000000000d680d3 in main ()

If you want to follow the call sequence, a backtrace/stacktrace must be read from the bottom up.
ksb = kernel service background processes
kcf = kernel cache file management
ksfd = kernel service functions disk IO
skgf = o/s dependent kernel generic file
I hope you recognise the logical layers that are necessary for doing the I/O.

Now look at a backtrace of io_submit of the database writer when using ASM:

#0  0x00007f22bdb36690 in io_submit () from /lib64/libaio.so.1
#1  0x0000000004832ef0 in skgfr_lio_listio64 ()
#2  0x000000001238b7ce in skgfqio ()
#3  0x0000000011d5c3ad in ksfd_skgfqio ()
#4  0x0000000011d57fce in ksfdgo ()
#5  0x0000000000d9f21c in ksfdaio ()
#6  0x000000000755c1a8 in kfk_ufs_async_io ()
#7  0x0000000001455fb2 in kfk_submit_io ()
#8  0x00000000014551a8 in kfk_io1 ()
#9  0x0000000001450b3e in kfk_transitIO ()
#10 0x000000000143c450 in kfioSubmitIO ()
#11 0x000000000143bbaa in kfioRequestPriv ()
#12 0x000000000143b160 in kfioRequest ()
#13 0x000000000136f6bd in ksfdafRequest ()
#14 0x000000000137311a in ksfdafGo ()
#15 0x0000000011d58179 in ksfdgo ()
#16 0x0000000000d9f269 in ksfdaio ()
#17 0x00000000039c4a5e in kcfisio ()
#18 0x0000000001d836ec in kcbbdrv ()
#19 0x000000001222fac5 in ksb_act_run_int ()
#20 0x000000001222e792 in ksb_act_run ()
#21 0x0000000003b8b9ce in ksbabs ()
#22 0x0000000003baa161 in ksbrdp ()
#23 0x0000000003fbaed7 in opirip ()
#24 0x00000000026ecaa0 in opidrv ()
#25 0x00000000032904cf in sou2o ()
#26 0x0000000000d681cd in opimai_real ()
#27 0x000000000329d2a1 in ssthrdmain ()
#28 0x0000000000d680d3 in main ()

Essentially, a couple of layers are added to facilitate ASM: ksfdaf, kfio and kfk.
So the logical sequence becomes:
ksb = kernel service background processes
kcf = kernel cache file management
ksfd = kernel service functions disk IO
ksfdaf = kernel service functions disk IO ASM files
kfio = kernel automatic storage management translation I/O layer
kfk = kernel automatic storage management KFK

ksfd = kernel service functions disk IO
skgf = o/s dependent kernel generic file

Now, to give an overview of the function call sequence, I simply have to cut out a lot of functions, because otherwise it would be unreadable.

 | | | | | > ksfdgo(0x806, 0x35b4, ...)
 | | | | | | > ksfdafGo(0x806, 0x35b4, ...)
 | | | | | | | > ksfdafRequest(0x7ffcc7d845a0, 0x10f, ...)
 | | | | | | | | > kfioRequest(0x7ffcc7d845a0, 0x10f, ...)
 | | | | | | | | | > _setjmp@plt(0x7ffcc7d821d8, 0x10f, ...)
 | | | | | | | | | > __sigsetjmp(0x7ffcc7d821d8, 0, ...)
 | | | | | | | | | > __sigjmp_save(0x7ffcc7d821d8, 0, ...)
 | | | | | | | | | > kfioRequestPriv(0x7ffcc7d845a0, 0x10f, ...)
...
 | | | | | | | | | | | | | | | | > ksfdgo(0x188, 0x35c5, ...)
 | | | | | | | | | | | | | | | | | > ksfd_skgfqio(0x7f4232709f78, 0x9, ...)
 | | | | | | | | | | | | | | | | | | > skgfqio(0x7f4237483dc0, 0x7f4232709f78, ...)
 | | | | | | | | | | | | | | | | | | | > skgfrvldtrq(0x7f4232709f78, 0x9, ...)
 | | | | | | | | | | | | | | | | | | | > sltrgftime64(0x2000, 0x7f4230b61c98, ...)
 | | | | | | | | | | | | | | | | | | | | > clock_gettime@plt(0x1, 0x7ffcc7d7bb10, ...)
 | | | | | | | | | | | | | | | | | | | | | > clock_gettime(0x1, 0x7ffcc7d7bb10, ...)
 | | | | | | | | | | | | | | | | | | | | | < clock_gettime+0x000000000059 returns: 0
 | | | | | | | | | | | | | | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | | | | | | | | | | | | | | > skgfr_lio_listio64(0x7f4237483dc0, 0x1, ...)
 | | | | | | | | | | | | | | | | | | | | > io_submit@plt(0x7f4230ad8000, 0x112, ...)
 | | | | | | | | | | | | | | | | | | | | < io_submit+0x000000000007 returns: 0x112
 | | | | | | | | | | | | | | | | | | | < skgfr_lio_listio64+0x000000000131 returns: 0
 | | | | | | | | | | | | | | | | | | < skgfqio+0x00000000035e returns: 0
 | | | | | | | | | | | | | | | | | < ksfd_skgfqio+0x0000000001f5 returns: 0
 | | | | | | | | | | | | | | | | < ksfdgo+0x000000000135 returns: 0
...
 | | | | | | | | | < kfioRequestPriv+0x000000000224 returns: 0
 | | | | | | | | < kfioRequest+0x000000000251 returns: 0
 | | | | | | | < ksfdafRequest+0x0000000003c8 returns: 0
 | | | | | | < ksfdafGo+0x000000000081 returns: 0x1
 | | | | | > kslwtbctx(0x7ffcc7d86f60, 0x7f4232709f38, ...)
 | | | | | | > sltrgftime64(0x6da39e68, 0x6d2f5bc0, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffcc7d86500, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffcc7d86500, ...)
 | | | | | | | | < clock_gettime+0x000000000059 returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | > kslwait_timeout_centi_to_micro(0x7fffffff, 0x14cb3fcb, ...)
 | | | | | | > kskthbwt(0x2b0f3fe06, 0xb3, ...)
 | | | | | | > kslwt_start_snapshot(0x6da3b118, 0x6da3b118, ...)
 | | | | | | < kslwt_start_snapshot+0x0000000000d0 returns: 0x6da3a6c8
 | | | | | > ksfdgo(0x808, 0, ...)
 | | | | | > kslwtectx(0x7ffcc7d86f60, 0x9, ...)
 | | | | | | > sltrgftime64(0x7ffcc7d86f60, 0x9, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffcc7d864e0, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffcc7d864e0, ...)
 | | | | | | | | < clock_gettime+0x000000000059 returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | > kslwt_end_snapshot(0x6da3b118, 0x6da3b118, ...)
 | | | | | | | > kslwh_enter_waithist_int(0x6da3b118, 0x6da3b118, ...)
 | | | | | | | > kslwtrk_enter_wait_int(0x6da3b118, 0x6da3b118, ...)
 | | | | | | | < kslwtrk_enter_wait_int+0x000000000019 returns: 0x6dacf1e8
 | | | | | | > kslwt_update_stats_int(0x6da3b118, 0x6da3b118, ...)
 | | | | | | | > kews_update_wait_time(0x9, 0xd02, ...)
 | | | | | | | > ksucpu_wait_update(0x9, 0xd02, ...)
 | | | | | | | < ksucpu_wait_update+0x000000000036 returns: 0x6db40f70
 | | | | | | > kskthewt(0x2b0f40b08, 0xb3, ...)
 | | | | | | < kskthewt+0x0000000005b1 returns: 0x30
 | | | | | > ksfdafCopyWaitCtx(0x7ffcc7d86f60, 0xb3, ...)
 | | | | | | > _intel_fast_memcpy(0x7ffcc7d86f60, 0x7f423270a848, ...)
 | | | | | | > _intel_fast_memcpy.P(0x7ffcc7d86f60, 0x7f423270a848, ...)
 | | | | | | > __intel_ssse3_rep_memcpy(0x7ffcc7d86f60, 0x7f423270a848, ...)
 | | | | | | < __intel_ssse3_rep_memcpy+0x00000000242e returns: 0x7ffcc7d86f60
 | | | | | < ksfdafCopyWaitCtx+0x000000000038 returns: 0x7ffcc7d86f60
 | | | | < ksfdaio+0x00000000055f returns: 0x7ffcc7d86f60

What stands out is that the IO submission via the ASM layers (the first ksfdgo call, with argument 0x806) takes place before the wait event is started by kslwtbctx, and that the ksfdgo call inside the wait event (argument 0x808) does not perform any further calls. To verify that the wait event indeed does not time anything, enable sql_trace with wait events for the database writer via oradebug:

SQL> oradebug setorapname dbw0
Oracle pid: 18, Unix process pid: 3617, image: oracle@o182-fs.local (DBW0)
SQL> oradebug event sql_trace wait=true
Statement processed.

Then go to the trace directory, and tail the database writer trace file.
Next, attach to the database writer with gdb, break on the io_submit call, and have the breakpoint perform a ‘sleep 1’ (sleep for one second). This should add 1000000 microseconds to the waiting time if the wait event times the function we break on.

(gdb) break io_submit
Breakpoint 1 at 0x7f336b986690
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>shell sleep 1
>c
>end

Now continue the database writer, and execute a checkpoint (alter system checkpoint), and look at the wait events:

WAIT #0: nam='db file async I/O submit' ela= 2 requests=11 interrupt=0 timeout=0 obj#=-1 tim=15801301770
WAIT #0: nam='db file parallel write' ela= 5077 requests=1 interrupt=0 timeout=2147483647 obj#=-1 tim=15801306930

Well, it’s clear nothing timed the one second we added, right? (the time in the wait event is ‘ela’, which is in microseconds)

For the sake of completeness, and to validate this test method, let’s add the sleep to io_getevents (io_getevents_0_4) to see if ‘db file parallel write’ does show the extra time we added to the system call, because ‘db file parallel write’ is supposed to time io_getevents():

(gdb) dis 1
(gdb) break io_getevents_0_4
Breakpoint 2 at 0x7f336b986650
(gdb) commands
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>shell sleep 1
>c
>end

Continue the database writer again, and execute a checkpoint:

WAIT #0: nam='db file async I/O submit' ela= 1 requests=22 interrupt=0 timeout=0 obj#=-1 tim=15983030322
WAIT #0: nam='db file parallel write' ela= 1003978 requests=2 interrupt=0 timeout=2147483647 obj#=-1 tim=15984034336

Yay! There we got the artificial waiting time!

Based on this, I can only come to the conclusion that the wait event ‘db file async I/O submit’ does not perform any actual timing of the io_submit system call when ASM is used with the Oracle database.
