Archive

Oracle EE

This blogpost is about opatch and how to obtain information about the current oracle home(s), and how to obtain information about the patches to be applied.

Patches that can be applied using opatch are provided by oracle as zip files which have the following naming convention:
p[patchnumber]_[baseversion]_[platform]-[architecture].zip. The patch normally contains an XML file called ‘PatchSearch.xml’ and a directory with the patch number. Inside the patch number directory there is a README.txt which is lame, because it says ‘Refer to README.html’, and a README.html that contains the readme information that is also visible when the [README] button for this patch is selected in MOS.

I spend my time on the CLI exclusively. This is because I spend my time on remote servers all the time, and using the X window system would be unusable. The best part of using the CLI is that when done correctly, it gives you almost infinite control over what you do, while when clicking through an interface toggling selections and filling out fields makes you entering information that you have to copy from a document or make up on the spot, which then is quickly hidden by entering another tab or window. In fact, based on my experience, this is a guaranteed way of generating wrong or inconsistent results.

Because of being on the CLI almost exclusively on servers, I sometimes need to read the README.html. I can do that in MOS, but sometimes you want that information on the spot. You can open up the HTML file in ‘vim’ or ‘less’, but it will show you a lot of HTML making it very hard to read. What I find useful is installing an executable called ‘elinks’ (available on oracle linux via yum), and then read the README.html in this text based browser:

$ elinks README.html

Result:

                                             Oracle® Database Patch % psuBug % - Database Release Update % product_version % (1/12)
                                                       Go to primary content

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────

   The script content on this page is for navigation purposes only and does not alter the content in any way.

                                                          Oracle® Database

   Patch 28655784 - Database Release Update 18.4.0.0.181016

   This document is accurate at the time of release. For any changes and additional information regarding Database Release Update
   18.4.0.0.181016, see these related documents that are available at My Oracle Support (http://support.oracle.com/):

     * Document 2433586.1 Oracle DB/GI/OJVM Update/Revision R18 Oct 2018 Known Issues

   This document includes the following sections:

     * Section 1, "Patch Information"

     * Section 2, "Prerequisites"

     * Section 3, "Installation"

     * Section 4, "Deinstallation"

     * Section 5, "Known Issues"

     * Section 6, "References"

     * Section 7, "Bugs Fixed by This Patch"

     * Section 8, "Documentation Accessibility"

1 Patch Information

   Database Release Update 18.4.0.0.181016 patches are cumulative. That is, the content of all previous Database bundles is
   included in the latest Database bundle patch.

   To install the Database Release Update 18.4.0.0.181016 patch, the Oracle home must have the 18.1 Database installed.
http://support.oracle.com/                                                                                                 [------]

The first thing I do, is obtain information about this patch. Based on my experience, very carefully examine the readme. It contains vital information about the patch, but I found inconsistencies in it in the past. I got the impression the readme’s of patches that are created with a certain interval are simply copied from a previous version and then corrected by hand, and things might be entered wrong or are forgotten and not touched, leaving wrong information.

opatch lspatches on an unzipped patch

Luckily, a lot of information in the readme is actually stored in the metadata of the patch. You can query the metadata of the patch to be applied using opatch lspatches by pointing it to the patch directory:
(in this example I am querying patch 28655784, which is RU 18.4, and my database oracle home is version 18.3)

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lspatches 28655784/
patch_id:28655784
unique_patch_id:22509982
date_of_patch:8 Oct 2018, 21:27:28 hrs PST8PDT
patch_description:Database Release Update : 18.4.0.0.181016 (28655784)
component:oracle.rdbms.rsf.ic,18.0.0.0.0,optional; oracle.oracore.rsf,18.0.0.0.0,optional; oracle.ctx.atg,18.0.0.0.0,optional; oracle.rdbms.rman,18.0.0.0.0,optional; oracle.rdbms.rsf,18.0.0.0.0,optional; oracle.sdo.locator.jrf,18.0.0.0.0,optional; ....
platform:226,Linux x86-64
executable:ORACLE_HOME/lib/libclntsh.so.18.1; ORACLE_HOME/lib/libasmclntsh18.so; ORACLE_HOME/lib/libskgxp11.so; ORACLE_HOME/lib/libskgxp18.so; ORACLE_HOME/lib/libsqlplus.so; ORACLE_HOME/bin/oracle; ...
instance_shutdown:true
online_rac_installable:true
patch_type:singleton
product_family:db
auto:false
bug:28571483, TRACKING BUG TO REGRESS ALL BLR/CIS OF 27502420
...

(edited for brevity)
Line 2: patch_id: patch number
Line 5: patch_description: description 🙂
Line 6: component: to be patched objects grouped in a ‘component’
Line 7: platform: the platform the patch is intended for
Line 8: executable: the executables that are patched by this patch
Line 9: instance_shutdown: does an instance have to shutdown when it’s ORACLE_HOME is patched?
Line 10: online_rac_installable: does the database have to go down entirely, or can the patch be applied rolling?
Line 11: patch_type: singleton (a single patch). other types that I know: bundle_member (part of a patch bundle).
Line 12: product_family: for which oracle product is this patch? Grid infra also is family ‘db’.
Line 13: auto: can this patch be applied using ‘opatch auto’?

opatch lspatches can also be used to validate if a patch has been applied to the oracle home:

$ opatch lspatches -verify 28655784/
Inventory check failed: Patch ID 28655784 is NOT registered in Oracle Home "/u01/app/oracle/product/18.3.0.0/dbhome_1" inventory.
Following patches [ 28655784 ] are NOT registered in Oracle Home "/u01/app/oracle/product/18.3.0.0/dbhome_1" inventory or can't load its meta-data

OPatch failed with error code 1

Please mind the argument to -verify is a unzipped patch patch-number directory.

opatch query

This functionality is also available using ‘opatch query’ (executed as opatch query [argument] directory) opatch query is meant to be run on an unzipped patch patch-number directory, not on an ORACLE_HOME. The next overview takes the above opatch lspatches output, and lists for what lines there is an equivalent opatch query argument to retrieve that information:
Line 2: patch_id: is the patch number, which is the directory that is entered as an argument.
Line 5: patch_description: the description cannot be extracted on its own, but is visible with argument -all.
Line 6: component: argument: -get_component
Line 7: platform: argument: -get_os
Line 8: executable: the executables cannot be fetched on its own, but is visible with argument -all.
Line 9: instance_shutdown: argument: -is_online_patch
Line 10: online_rac_installable: argument: -is_rolling_patch
Line 11: patch_type: argument: -get_patch_type
Line 12: product_family: argument: -get_product_family
Line 13: auto: argument: -is_auto_patch

However, opatch query offers a couple of other switches which can come in handy:
-is_system_patch: System patches are patches that contain several sub-patches and must be applied using opatchauto.
-get_patching_model: The way the patch is applied, normal patching seem to have model “one-off”.

opatch lsinventory (or shorthand: lsinv)

Now the we looked at a patch to be applied, we should also look at the metadata of the oracle home to be patched, which is stored in the inventory. The contents of the inventory can be shown using lsinventory or shorthand lsinv.

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lsinv
Oracle Interim Patch Installer version 12.2.0.1.14
Copyright (c) 2018, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/18.3.0.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/18.3.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.14
OUI version       : 12.2.0.4.0
Log file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/opatch2018-11-13_08-18-30AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2018-11-13_08-18-30AM.txt

--------------------------------------------------------------------------------
Local Machine Information::
Hostname: ip-10-0-11-24.local
ARU platform id: 226
ARU platform description:: Linux x86-64

Installed Top-level Products (1):

Oracle Database 18c                                                  18.0.0.0.0
There are 1 products installed in this Oracle Home.


Interim patches (4) :

Patch  27908644     : applied on Wed Jul 18 13:44:11 EDT 2018
Unique Patch ID:  22153180
Patch description:  "UPDATE 18.3 DATABASE CLIENT JDK IN ORACLE HOME TO JDK8U171"
   Created on 4 May 2018, 01:21:02 hrs PST8PDT
   Bugs fixed:
     27908644

Patch  27923415     : applied on Wed Jul 18 13:41:38 EDT 2018
Unique Patch ID:  22239273
Patch description:  "OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)"
   Created on 15 Jul 2018, 10:33:22 hrs PST8PDT
   Bugs fixed:
     27304131, 27539876, 27952586, 27642235, 27636900, 27461740

Patch  28090553     : applied on Wed Jul 18 13:40:01 EDT 2018
Unique Patch ID:  22256940
Patch description:  "OCW RELEASE UPDATE 18.3.0.0.0 (28090553)"
   Created on 11 Jul 2018, 19:20:31 hrs PST8PDT
   Bugs fixed:
     12816839, 18701017, 22734786, 23698980, 23840305, 25709124, 25724089
     26299684, 26313403, 26433972, 26527054, 26586174, 26587652, 26647619
     26827699, 26860285, 26882126, 26882316, 26943660, 26996813, 27012915
...etc...

This shows:
Line 6: the ORACLE_HOME for which the contents are listed.
Line 7: the central inventory location.
Line 9: the opatch version
Line 10: the oracle universal install version
Line 17: the hostname
Line 18/19: the platform description and identification
Line 20/24: installed ‘top level’ products
Line 27- : applied interim patches

This is not very detailed data (except for the bugs fixed numbers), however, there is an option to get more information: add ‘-detail’ to the lsinv command:

$ /u01/app/oracle/product/18.3.0.0/dbhome_1/OPatch/opatch lsinv -detail
Oracle Interim Patch Installer version 12.2.0.1.14
Copyright (c) 2018, Oracle Corporation.  All rights reserved.


Oracle Home       : /u01/app/oracle/product/18.3.0.0/dbhome_1
Central Inventory : /u01/app/oraInventory
   from           : /u01/app/oracle/product/18.3.0.0/dbhome_1/oraInst.loc
OPatch version    : 12.2.0.1.14
OUI version       : 12.2.0.4.0
Log file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/opatch2018-11-13_10-15-09AM_1.log

Lsinventory Output file location : /u01/app/oracle/product/18.3.0.0/dbhome_1/cfgtoollogs/opatch/lsinv/lsinventory2018-11-13_10-15-09AM.txt

--------------------------------------------------------------------------------
Local Machine Information::
Hostname: ip-10-0-11-24.local
ARU platform id: 226
ARU platform description:: Linux x86-64

Installed Top-level Products (1):

Oracle Database 18c                                                  18.0.0.0.0
There are 1 products installed in this Oracle Home.


Installed Products (125):

Assistant Common Files                                               18.0.0.0.0
BLASLAPACK Component                                                 18.0.0.0.0
Buildtools Common Files                                              18.0.0.0.0
Cluster Verification Utility Common Files                            18.0.0.0.0
Database Configuration and Upgrade Assistants                        18.0.0.0.0
Database Migration Assistant for Unicode                             18.0.0.0.0
...etc...
Patch  27923415     : applied on Wed Jul 18 13:41:38 EDT 2018
Unique Patch ID:  22239273
Patch description:  "OJVM RELEASE UPDATE: 18.3.0.0.180717 (27923415)"
   Created on 15 Jul 2018, 10:33:22 hrs PST8PDT
   Bugs fixed:
     27304131, 27539876, 27952586, 27642235, 27636900, 27461740
   Files Touched:
     java.security --> ORACLE_HOME/javavm/jdk/jdk8/lib/security/java.security
     cacerts --> ORACLE_HOME/javavm/jdk/jdk8/lib/security/cacerts
     sunjce_provider.jar --> ORACLE_HOME/javavm/jdk/jdk8/lib/sunjce_provider.jar
     jce.jar --> ORACLE_HOME/javavm/jdk/jdk8/lib/jce.jar
     classes.bin --> ORACLE_HOME/javavm/jdk/jdk8/admin/classes.bin
     lfclasses.bin --> ORACLE_HOME/javavm/jdk/jdk8/admin/lfclasses.bin
     jvmpsupii.sql --> ORACLE_HOME/javavm/install/jvmpsupii.sql
     jvmpsupi.sql --> ORACLE_HOME/javavm/install/jvmpsupi.sql
     jvmpsu.sql --> ORACLE_HOME/javavm/install/jvmpsu.sql
     jvmpsui.sql --> ORACLE_HOME/javavm/install/jvmpsui.sql
     jvmpsupdi.sql --> ORACLE_HOME/javavm/install/jvmpsupdi.sql
     jvmpsupdii.sql --> ORACLE_HOME/javavm/install/jvmpsupdii.sql
     libjavavm18.a --> ORACLE_HOME/javavm/jdk/jdk8/lib/libjavavm18.a
     27923415.xml --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415.xml
     27923415_apply.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415_apply.sql
     27923415_rollback.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/27923415_rollback.sql
     jvmpsupdi.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupdi.sql
     jvmpsupdii.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupdii.sql
     jvmpsupii.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupii.sql
     jvmpsu.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsu.sql
     jvmpsui.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsui.sql
     jvmpsupi.sql --> ORACLE_HOME/sqlpatch/27923415/22239273/rollback_files/18.1.0.0.0/javavm/install/jvmpsupi.sql
     jox.o --> ORACLE_HOME/rdbms/lib/jox.o
     ins_rdbms.mk --> ORACLE_HOME/rdbms/lib/jox_on
     ins_rdbms.mk --> ORACLE_HOME/rdbms/lib/ioracle
     aurora.zip --> ORACLE_HOME/javavm/lib/aurora.zip
   Patch Location in Inventory:
     /u01/app/oracle/product/18.3.0.0/dbhome_1/inventory/oneoffs/27923415
   Patch Location in Storage area:
     /u01/app/oracle/product/18.3.0.0/dbhome_1/.patch_storage/27923415_Jul_15_2018_10_33_22

Patch  28090553     : applied on Wed Jul 18 13:40:01 EDT 2018
Unique Patch ID:  22256940
Patch description:  "OCW RELEASE UPDATE 18.3.0.0.0 (28090553)"
   Created on 11 Jul 2018, 19:20:31 hrs PST8PDT
   Bugs fixed:
     12816839, 18701017, 22734786, 23698980, 23840305, 25709124, 25724089
...etc...

The start of the output is the same as without ‘-detail’. However, on lines 27-34 you see the products, which as far as I understand it are what is meant with ‘components’ when looking at the patch details using ‘opatch lspatches’ and ‘opatch query’.

If we look at the details of patch 27923415 (line 36), at the end of the ‘Files Touched’ list at lines 69-72 it shows where the metadata is stored of this patch; that’s in the inventory directory in oneoffs/27923415. This directory contains information about the currently applied patch contents in the home.
The storage area contains the previous version of the contents that have been patched in case a rollback is needed.

opatch prereq

Now the we looked at the patch and the ORACLE_HOME to be patched, the next thing we can look at is if the environment is ready for the patch that was inspected to be applied to the ORACLE_HOME that was inspected.

Please mind that normally you would follow the readme, maybe run the prereq CheckConflictAgainstOHWithDetail (I don’t normally, because I check for conflicts before that), and then run opatch apply. And that is perfectly fine! Opatch performs all the checks that are listed below before applying a patch. That is one of the reasons opatch takes such a long time: it does perform a lot of work to be absolutely certain everything is alright.

The reason for listing the below individual checks is first of all to make the reader aware that all the checks opatch apply executes are available individually to be performed prior to patching. Another reason is to document and logically group these checks.

In fact, the PSU patching documentation advises to run the check CheckConflictAgainstOHWithDetail before applying the database PSU. I hope you now understand this is just a check that would have been performed anyway, but by explicitly executing this check before patching, you can validate the patch is apply-able to the oracle home in the first place.

Once opatch fails these individual checks can be used to validate the error and obtain more information:
If you read through them, you’ll see there are checks which are supersets of other checks.

— os checks
– Is there enough disk space available?
opatch prereq CheckSystemSpace -ph 28655784/
– Are all the operating system executables needed for patching available?
opatch prereq CheckSystemCommandAvailable -ph 28655784/
– Is the patch compatible with the OS/platform?
opatch prereq CheckPatchApplicableOnCurrentPlatform -ph 28655784/
– Are the files and directory structure and rights of the unzipped patch sane?
opatch prereq CheckPatchShipHome -ph 28655784/
– Are any files that are touched by the patch still open by processes?
opatch prereq CheckActiveFilesAndExecutables -ph 28655784/
– Is the current user ‘root’? If so, fail the check
opatch prereq CheckUserAdminPrivilege -ph 28655784/

— inventory checks
– Check if the ORACLE_HOME to be patched is available in the central inventory
opatch prereq CheckCentralInventoryForOH -ph 28655784/
– Is the central inventory writeable?
opatch prereq CheckCentralInventoryForRWSession -ph 28655784/
– Is the central inventory sane?
opatch prereq CheckCentralInventoryLocation -ph 28655784/

— oracle home checks
– Are the required components available in the oracle home?
opatch prereq CheckComponents -ph 28655784/
– Are the required components (products) available in the ORACLE_HOME, and are the actions applicable?
opatch prereq CheckApplicable -ph 28655784/
– Are there patches applied to the ORACLE_HOME already that are in the patches to be applied
opatch prereq CheckForIdenticalPatchInOracleHome -ph 28655784/
– Check if the oracle home is locked for patching (which might be by a previous unsuccessful opatch execution)
opatch prereq CheckIfOHLockedForPatching -ph 28655784/
– Check if the oracle home is valid, check for proper inventory files in the oracle home
opatch prereq CheckOracleHome -ph 28655784/
– Check if all the patches that are required by the patch to be applied are present in the oracle home
opatch prereq CheckPatchApplyDependents -ph 28655784/
– Check if the patching model of the patch is compatible with the oracle home
opatch prereq CheckPatchingModel -ph 28655784/

— conflict checks
– Are there conflicts between the patches in the ORACLE_HOME and the patches to be applied?
opatch prereq CheckConflictAgainstOH -ph 28655784/
– Are there conflicts between the patches in the ORACLE_HOME and the patches to be applied,
– and print out detailed conflict information.
opatch prereq CheckConflictAgainstOHWithDetail -ph 28655784/
– Are there conflicts between the patches to be applied?
– (please mind this requires multiple patches, which can be specified using -phBaseDir)
opatch prereq CheckConflictAmongPatches -phBaseDir
– Are there conflicts between the patches to be applied,
– and print out detailed conflict information.
– (please mind this requires multiple patches, which can be specified using -phBaseDir)
opatch prereq CheckConflictAmongPatchesWithDetail -phBaseDir

— RAC checks
– Check if the central inventory has a CRS home if the database home is RAC
opatch prereq CheckForCRSHomeIfRAC -ph 28655784/
– Check if all the nodes in the RAC setup are valid, up and reachable
opatch prereq CheckRACNodeList -ph 28655784/
– Check if commands can be invoked on the remote machines
opatch prereq CheckRemoteCommandInvocable -ph 28655784/
– Check if files can be copied to and removed from the remote machines
opatch prereq CheckRemoteCopyAndRemove -ph 28655784/

— opatch checks
– Check if the input values provided to opatch are sufficient to run the patch action
opatch prereq CheckForInputValues -ph 28655784/
– Is the opatch version high enough as per patch requirement?
opatch prereq CheckMinimumOPatchVersion -ph 28655784/

— rollback checks
– Check if all the patches provided to rollback are present in the given oracle home
opatch prereq CheckInstalledOneOffs -ph 28655784/
– Check if there are patches applied that depend on the patch that is being rolled back
opatch prereq CheckPatchRollbackDependents -ph 28655784/
– Check if the patch can be rolled back from the oracle home
opatch prereq CheckRollbackable -ph 28655784/

— OUI checks
– Is the OUI available in the ORACLE_HOME?
opatch prereq CheckOUILocation -ph 28655784/
– Is the OUI version high enough?
opatch prereq CheckOUIVersionCompatible -ph 28655784/
– Are all OUI libraries present?
opatch prereq CheckRequiredLibs -ph 28655784/

— oraInst.loc check
– Is oraInst.loc file proper?
opatch prereq CheckOraInstLocation -ph 28655784/

A few options of which I currently am not sure what they exactly do:
– CheckApplicableProduct — ?
opatch prereq CheckApplicableProduct -ph 28655784/
– CheckForNoOpPatches — ?
opatch prereq CheckForNoOpPatches -ph 28655784/
– CheckRollbackSid — ?
opatch prereq CheckRollbackSid -ph 28655784/

Conclusion

This is the second post about opatch, and I didn’t even get to the actual patching yet.

This post showed a number of ways to inspect both the patch to be applied (opatch lspatches/query) and the oracle home to apply the patch to (opatch lspatches/lsinventory), and a way to check specific properties before applying a patch and to validate and check things after a patch (opatch prereq).

This is important once you have to apply a patch to validate a machine in general, the central inventory and the oracle home inventory for readiness, and even more important to check an environment in case of patch failure.

Advertisements

This blogpost is about oracle’s patching tool for the database and grid infrastructure: opatch. I personally have a love/hate relationship with opatch. In essence, opatch automates a lot of things that would be very error prone if it were to be done by hand, which is a good thing. However, there are a lot of things surrounding opatch that I despise. An example of that is the version numbering of opatch itself.

Versions and more versions

To jump into the matter directly: versions. I don’t understand why this has to be this complicated. I would even go as far as saying that somebody needs to step in and clean this up.

All opatch versions are identified by patch number 6880880. Anyone working with patching oracle probably knows this patch number by heart. You can go to this patch very quickly in MOS if you go to the search box which sits at the right side of the screen on the same height as the tabs, and type ‘patch 6880880’. A green rectangle will say ‘You have been directed to this PATCH based on an ID match’… So far the easy part.

On the right side below that, there is a rectangle, for which the first dropdown box is ‘release’. The release here means the release for which you want to download opatch. There are a great number of releases here:
– OPatch 12.2.0.1.0
– OPatch 13.9.0.0.0                           – EMCC
– OPatch 18.0.0.0.0
– OPatch for FMW 12c (OUI 13.9.x)   – Obsolete FMW(?)
– OUI NextGen 13.1                           – FMW
– OUI NextGen 13.2                           – FMW
– Oracle 10.1.0.0.0
– Oracle 10.1.0.3                                – Obsolete
– Oracle 10.2.0.0.0
– Oracle 11.1.0.0.0
– Oracle 11.2.0.0.0
– Oracle 12.1.0.1.0
– Oracle 12.1.0.1.1                             – Obsolete
– Oracle Database 12.2.0.1                – Obsolete

The red entries describe the patch being obsolete, and not downloadable. IMO, they should not be there.
The blue entries are non-database opatch releases. I would beg oracle to publish non-database opatch releases under their own patch number.

This eliminates 7 of the 14 choices. The left over entries show a list of versions that seems to make sense, these seem to show the database versions for which you can download a version of opatch. Let’s look at the patch descriptions of the leftover versions:

– OPatch 12.2.0.1.0    – Patch 6880880: OPatch 12.2.0.1.14 for DB 12.2 releases (JUL 2018)
– OPatch 18.0.0.0.0    – Patch 6880880: OPatch 12.2.0.1.14 for DB 18.x releases (JUL 2018)
– Oracle 10.1.0.0.0     – Patch 6880880: OPatch 9i, 10.1
– Oracle 10.2.0.0.0     – Patch 6880880: OPatch 10.2
– Oracle 11.1.0.0.0     – Patch 6880880: OPatch patch of version 11.1.0.12.9 for Oracle software releases 11.1.0.x (OCT 2015)
– Oracle 11.2.0.0.0     – Patch 6880880: OPatch patch of version 11.2.0.3.19 for Oracle software releases 11.2.0.x and 18.x (APR 2018)
– Oracle 12.1.0.1.0     – Patch 6880880: OPatch 12.2.0.1.14 for DB 12.x and 18.x releases (JUL 2018)

If we follow the versions from the lowest to the highest versions of Opatch:
– 10.1.0.0.0     -> ‘OPatch 9i, 10.1’ This is opatch for database versions 9i and 10.1.
– 10.2.0.0.0     -> ‘OPatch 10.2’ This is opatch for database version 10.2.
– 11.1.0.0.0     -> ‘OPatch patch of version 11.1.0.12.9 for Oracle software releases 11.1.0.x (OCT 2015)’ This is opatch for database version 11.1.0.x.
– 11.2.0.0.0     -> ‘OPatch patch of version 11.2.0.3.19 for Oracle software releases 11.2.0.x and 18.x (APR 2018)’ This is opatch for database version 11.2.0.x. I have no idea what ‘and 18.x’ means.
– 12.1.0.1.0     -> ‘OPatch patch 12.2.0.1.14 for DB 12.x and 18.x releases (JUL 2018)’ This is opatch for database version 12.1.x. NOT 12.x as the patch text indicates, it’s 12.1.x, and again I have no idea what ‘and 18.x’ means. I also don’t understand why the the 4th number is getting used all of a sudden.
– 12.2.0.1.0    -> ‘OPatch 12.2.0.1.14 for DB 12.2 releases (JUL 2018)’ This is opatch for database version 12.2. The description now aptly describes the database version and does not confuse.
– 18.0.0.0.0    -> ‘OPatch 12.2.0.1.14 for DB 18.x releases (JUL 2018)’ This is opatch for database version 18. The description again now aptly describes the version.

We are not there yet. Now that the patch release that should be used for every version of the database has been identified, let me throw in another version. This is the actual version that opatch reports when it is queried with ‘opatch version’. Let me list the opatch versions:
– 11.1.0.0.0 -> opatch version 11.1.0.12.9
– 11.2.0.0.0 -> opatch version 11.2.0.3.19
– 12.1.0.1.0 -> opatch version 12.2.0.1.14
– 12.2.0.1.0 -> opatch version 12.2.0.1.14
– 18.0.0.0.0 -> opatch version 12.2.0.1.14
Yes, this is again some weirdness. The opatch version seemed to have followed the database version, and -apparently- starting from version 12.1 up to version 18 there is one opatch version: 12.2.

That begs the question whether the 12.1.0.1.0, 12.2.0.1.0 and 18.0.0.0.0 opatch versions is actually the same opatch version. I got these versions downloaded, with the opatch version included in the name:

$ md5sum p6880880*
1ee44d25f5e858eb67424b69b89b8a25  p6880880_121010_Linux-x86-64-12.2.0.1.14.zip
1ee44d25f5e858eb67424b69b89b8a25  p6880880_122010_Linux-x86-64-12.2.0.1.14.zip
1ee44d25f5e858eb67424b69b89b8a25  p6880880_180000_Linux-x86-64-12.2.0.1.14.zip

Okay…so these are actually 100% identical copies, which just have a different name. I have no idea why the exact same file is offered with a different name. This means that the addition of ‘and 18.x’ in the patch description of the release means that that opatch version can also be used with version 18 of the database.

To be honest, the original intention of this blog article was to describe the some specific usage of opatch, and I now already got enough content for a post, and will retain the original intended content for a next blogpost.

The question you might be having at this point is: but what version should I download now? For databases up to version 11.2 it is simple, there is actually only one choice for for each version. For database versions starting from version 12.1 you -currently- have a choice. I would advise simply downloading the “correct” opatch version for every database version, which means the exact oracle version as indicated in the release pulldown menu at the download page (‘Oracle 12.1.0.1.0’, ‘OPatch 12.2.0.1.0’ or ‘OPatch 18.0.0.0.0’) despite the fact that these are the same opatch versions *currently*.

I suspect that once 12.1 goes into extended support, the opatch version will freeze, whilst the opatch version for version 12.2 and 18 will be improved and increase in version. So the only way to download the correct opatch version is still by choosing the actual database version it is intended for.

This brings me to another important thing to realise: the actual opatch version that is downloaded. At the time of writing the current and only available opatch release for Oracle database versions 12.1 to 18 is 12.2.0.1.14. Once oracle brings out a newer opatch version, the previous version will not be available anywhere anymore (as far as I know). To me this means that if you patch databases per tier (from test, development and so on up to production), you have to stage opatch in order to be able to use the same opatch version consistently in the future. Of course Oracle advises to use the latest opatch version, but the patch will check for a minimal opatch version, and if you tested your database software version, opatch version and patch to be working correctly together, in my opinion the most important thing is consistency.

I was asked some time ago what the Oracle database event ‘TCP socket (KGAS)’ means. This blogpost is a deep dive into what this event times in Oracle database 12.1.0.2.180717.

This event is not normally seen, only when TCP connections are initiated from the database using packages like UTL_TCP, UTL_SMTP and the one used in this article, UTL_HTTP.

A very basic explanation is this event times the time that a database foreground session spends on TCP connection management and communicating over TCP, excluding client and database link (sqlnet) networking. If you trace the system calls, you see that mostly that is working with a (network) socket. Part of the code in the oracle database that is managing that, sits in the kernel code layer kgas, kernel generic (of which I am quite sure, and then my guess:) asynchronous services, which explains the naming of the event.

This is what the Oracle online manual (https://docs.oracle.com/database/121/REFRN/GUID-203ACA60-9912-4493-9B79-EA4CDE89A78D.htm#REFRN00642 – Oracle is notorious for breaking links) says about ‘TCP socket (KGAS)’:

C.3.157 TCP Socket (KGAS)
A session is waiting for an external host to provide requested data over a network socket. The time that this wait event tracks does not indicate a problem, and even a long wait time is not a reason to contact Oracle Support. It naturally takes time for data to flow between hosts over a network, and for the remote aspect of an application to process any request made to it. An application that communicates with a remote host must wait until the data it will read has arrived. In addition, on Microsoft Windows, a separate thread monitors the arrival of traffic. This thread spends most of its life in waits tracked by the TCP Socket (KGAS) wait event.

Wait Time: The total elapsed time for the network connection to be established or for data to arrive from over the network

Parameter Description
P0 For Oracle internal use only. Values 8, 9, and 10 occur within the special thread present on Microsoft Windows; other P0 values occur in normal user sessions.

P1 For Oracle internal use only

Overall, the basic explanation that Oracle provides is mostly correct. I think the text saying to not contact Oracle support is not relevant, but maybe there is a need to relieve Oracle support. In my tests, I found that the full TCP connection lifecycle (creation, usage and removal) is put under this event, for which the text seems to emphasise on waiting for a remote host, which would be the most logical culprit for wait times, but other issues could lead to wait times additionally. This means the wait event itself is not explaining what it is showing, outside of TCP connection management.

The wait time explanation is nearly complete. If it would say something like ‘all TCP connection management and usage’ it would have fully covered it, it now excludes disconnecting and sending, because it explicitly mentions creating the connecting and receiving (waiting for data).

I do not understand what is meant with P0 and P1. I think it is p1 and p2 of the wait event, but naming it P0 and P1 is weird. When looking at the explanation it reads to me ‘we do not wish to explain anything to you’.

So, that means I am going to find this out myself….

If you are interested in this, or do want to write articles like this too, I urge you to replay this on your own system.

First of all, create a small setup which you can use to actually execute UTL_HTTP. The example essentially is taken from father of code examples, Tim Hall/Oracle base. Thank you Tim!
First setup the database to allow a user (in my case ‘ts’) to create the objects and use the network:

grant create sequence to ts;
grant create procedure to ts;
grant create table to ts;
grant alter session to ts;
begin
  dbms_network_acl_admin.create_acl (
    acl => 'anything.xml',
    description => 'allow anything',
    principal => 'TS',
    is_grant => true,
    privilege => 'connect'
  );
end;
begin
  dbms_network_acl_admin.assign_acl (
    acl => 'anything.xml',
    host => '*'
  );
end;

Then connect as the actual user (ts), and create the objects and the procedure that uses UTL_HTTP:

drop table http_clob_test;
create table http_clob_test (
        id number(10),
        url varchar2(255),
        data clob,
        constraint http_clob_test_pk primary key (id)
);
drop sequence http_clob_test_seq;
create sequence http_clob_test_seq;
CREATE OR REPLACE PROCEDURE load_html_from_url (p_url  IN  VARCHAR2) AS
  -- examples by tim hall
  -- https://oracle-base.com/articles/misc/retrieving-html-and-binaries-into-tables-over-http
  l_http_request   UTL_HTTP.req;
  l_http_response  UTL_HTTP.resp;
  l_clob           CLOB;
  l_text           VARCHAR2(32767);
BEGIN
  DBMS_LOB.createtemporary(l_clob, FALSE);

  -- Make a HTTP request and get the response.
  l_http_request  := UTL_HTTP.begin_request(p_url);
  l_http_response := UTL_HTTP.get_response(l_http_request);

  -- Copy the response into the CLOB.
  BEGIN
    LOOP
      UTL_HTTP.read_text(l_http_response, l_text, 32766);
      DBMS_LOB.writeappend (l_clob, LENGTH(l_text), l_text);
    END LOOP;
  EXCEPTION
    WHEN UTL_HTTP.end_of_body THEN
      UTL_HTTP.end_response(l_http_response);
  END;

  -- Insert the data into the table.
  INSERT INTO http_clob_test (id, url, data)
  VALUES (http_clob_test_seq.NEXTVAL, p_url, l_clob);

  -- Relase the resources associated with the temporary LOB.
  DBMS_LOB.freetemporary(l_clob);
EXCEPTION
  WHEN OTHERS THEN
    UTL_HTTP.end_response(l_http_response);
    DBMS_LOB.freetemporary(l_clob);
    RAISE;
END load_html_from_url;
/

The first thing to do is turn on sql_trace at level 8 to include waits:

set serverout on
alter session set events 'sql_trace level 8';
exec load_html_from_url('http://orafun.info/');
alter session set events 'sql_trace off';

If you look at the relevant piece, which means where it shows the wait events, it looks like this:

WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 128265  =2  =0  =0 obj#=662 tim=86395107497
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 395  =5  =0  =0 obj#=662 tim=86395110191
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 150  =6  =0  =0 obj#=662 tim=86395111115
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 131998  =6  =0  =0 obj#=662 tim=86395243764
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 269  =4  =0  =0 obj#=662 tim=86395245182
WAIT #139864521752120: nam='direct path write temp' ela= 4137 file number=201 first dba=257795 block cnt=1 obj#=662 tim=86395250494
WAIT #139864521752120: nam='TCP Socket (KGAS)' ela= 352  =3  =2  =0 obj#=662 tim=86395251294

What is shown here is some quite spectacular differences in elapsed time. Also, the only way to understand what is actually done flagged as ‘TCP Socket (KGAS)’ is the value following ‘ela’, which is the event p1 value.
The pattern is:

- 2
- 5
- 6
- 6
- 4
- 3

It’s relatively simple to guess what a few of these are:

- 2 - connect
- 5 - send
- 6 - \
- 6 - |   receiving ?
- 4 - /
- 3 - close

But if you include the timing, there must be more into play:

- 2 - ela= 128265   connect
- 5 - ela= 395      send
- 6 - ela= 150      \
- 6 - ela= 131998   |   receiving ?
- 4 - ela= 14       /
- 3 - ela= 177     close

2/connect: In order to build up a connection, a tcp connection needs to be created and established. That takes some time.
5/send: Sending from the perspective of a userland process is writing into a socket, which will get send by the operating system independently. This means sending from a userland process normally takes relative little time, because it’s not waiting for actually sending it.
6,4/receive: At this time, this doesn’t make sense to me.
3/close: Closing for a userland process is a simple, swift task. The operating system will keep the port open for some time, etc. but this is not visible for the user land application.

Let’s pull an old trick out of the hat: use strace (system call tracing) with an emphasis on writing on an oracle session that has SQL trace with waits enabled set. This will show the system calls executed, and show exactly when the oracle engine ends a wait, so we can reasonably well establish a relation between wait events and system calls. I say “reasonably well”, because we can’t see when Oracle truly started timing the wait event (kslwtbctx), only the output to trace file as part of ending the wait event (kslwtectx).

The way I done it, is using the command ‘strace -e write=all -p 18513 -o utl_http_test.txt’. Obviously 18513 is the process ID of the database foreground process. The results of the strace are in utl_http_test.txt.

Now open utl_http_test.txt and search for KGAS. The full output is way too much text, let me show some of the output which I think is noteworthy. Again: this is selective, partial output.
I do maintain the order in which the calls are visible.

1. TCP Socket (KGAS) p1=2, earlier annotated as ‘connect’

-- try to find a socket that has been created by nscd (name server caching deamon)
-- two times?
--
socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(9)                                = 0
socket(AF_LOCAL, SOCK_STREAM|SOCK_CLOEXEC|SOCK_NONBLOCK, 0) = 9
connect(9, {sa_family=AF_LOCAL, sun_path="/var/run/nscd/socket"}, 110) = -1 ENOENT (No such file or directory)
close(9)                                = 0
--
-- obtain file status of resolv.conf (hostname resolving configuration file)
--
stat("/etc/resolv.conf", {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
--
-- open and read host.conf (another hostname resolving configuration file)
--
open("/etc/host.conf", O_RDONLY|O_CLOEXEC) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=9, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "multi on\n", 4096)             = 9
read(9, "", 4096)                       = 0
close(9)                                = 0
--
-- open and read resolv.conf (hostname resolving configuration)
--
open("/etc/resolv.conf", O_RDONLY|O_CLOEXEC) = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=93, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "# Generated by NetworkManager\nse"..., 4096) = 93
read(9, "", 4096)                       = 0
close(9)                                = 0
--
-- open /etc/hosts (ip address to hostname mapping locally)
--
open("/etc/hosts", O_RDONLY|O_CLOEXEC)  = 9
fstat(9, {st_mode=S_IFREG|0644, st_size=200, ...}) = 0
mmap(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34bf377000
read(9, "127.0.0.1   localhost localhost."..., 4096) = 200
read(9, "", 4096)                       = 0
close(9)
--
-- at this point two dynamic loadable libraries are read: libnss_dns.so.2 and libresolv.so.2
--
-- this is the DNS lookup of orafun.info
-- again, this is done twice, just like the use of /var/run/nscd/socket above?
--
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, 16) = 0
poll([{fd=9, events=POLLOUT}], 1, 0)    = 1 ([{fd=9, revents=POLLOUT}])
sendto(9, "=#\1\0\0\1\0\0\0\0\0\0\6orafun\4info\0\0\1\0\1", 29, MSG_NOSIGNAL, NULL, 0) = 29
 | 00000  3d 23 01 00 00 01 00 00  00 00 00 00 06 6f 72 61  =#...........ora |
 | 00010  66 75 6e 04 69 6e 66 6f  00 00 01 00 01           fun.info.....    |
poll([{fd=9, events=POLLIN}], 1, 5000)  = 1 ([{fd=9, revents=POLLIN}])
ioctl(9, FIONREAD, [45])                = 0
recvfrom(9, "=#\201\200\0\1\0\1\0\0\0\0\6orafun\4info\0\0\1\0\1\300\f\0"..., 2048, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, [16]) = 45
close(9)                                = 0
socket(AF_INET, SOCK_DGRAM|SOCK_CLOEXEC|SOCK_NONBLOCK, IPPROTO_IP) = 9
connect(9, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, 16) = 0
poll([{fd=9, events=POLLOUT}], 1, 4971) = 1 ([{fd=9, revents=POLLOUT}])
sendto(9, "o=\1\0\0\1\0\0\0\0\0\0\6orafun\4info\0\0\34\0\1", 29, MSG_NOSIGNAL, NULL, 0) = 29
 | 00000  6f 3d 01 00 00 01 00 00  00 00 00 00 06 6f 72 61  o=...........ora |
 | 00010  66 75 6e 04 69 6e 66 6f  00 00 1c 00 01           fun.info.....    |
poll([{fd=9, events=POLLIN}], 1, 4970)  = 1 ([{fd=9, revents=POLLIN}])
ioctl(9, FIONREAD, [109])               = 0
recvfrom(9, "o=\201\200\0\1\0\0\0\1\0\0\6orafun\4info\0\0\34\0\1\300\f\0"..., 65536, 0, {sa_family=AF_INET, sin_port=htons(53), sin_addr=inet_addr("10.0.2.3")}, [16]) = 109
close(9)                                = 0
--
-- an epoll is created at file descriptor 9 (epoll: I/O event notification facility)
--
epoll_create(82)                        = 9
fcntl(9, F_SETFD, FD_CLOEXEC)           = 0
--
-- an IPV6 socket is created at file descriptor 11, 
-- bound to the IPV6 equivalent of localhost (::1),
-- destination port 0, source port 63257,
-- and is NOT connected.
--
socket(AF_INET6, SOCK_DGRAM, IPPROTO_IP) = 11
bind(11, {sa_family=AF_INET6, sin6_port=htons(0), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, 28) = 0
getsockname(11, {sa_family=AF_INET6, sin6_port=htons(63257), inet_pton(AF_INET6, "::1", &sin6_addr), sin6_flowinfo=0, sin6_scope_id=0}, [28]) = 0
getpeername(11, 0x7ffdea6ba0f8, 0x7ffdea6ba1c8) = -1 ENOTCONN (Transport endpoint is not connected)
getsockopt(11, SOL_SOCKET, SO_SNDBUF, [262144], [4]) = 0
getsockopt(11, SOL_SOCKET, SO_RCVBUF, [262144], [4]) = 0
fcntl(11, F_SETFD, FD_CLOEXEC)          = 0
fcntl(11, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
--
-- File descriptor 11 is added to the epoll at file descriptor 9.
--
epoll_ctl(9, EPOLL_CTL_ADD, 11, {EPOLLIN, {u32=3110993336, u64=139864426020280}}) = 0
--
-- A connection is created to the true destination (orafun.info/18.218.92.122).
-- This connection gets file descriptor 12.
-- Destination port 80 (http), source port 11751.
--
socket(AF_INET, SOCK_STREAM, IPPROTO_IP) = 12
fcntl(12, F_SETFL, O_RDONLY|O_NONBLOCK) = 0
connect(12, {sa_family=AF_INET, sin_port=htons(80), sin_addr=inet_addr("18.218.92.122")}, 16) = -1 EINPROGRESS (Operation now in progress)
times(NULL)                             = 438106227
mmap(NULL, 786432, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f34b959b000
poll([{fd=12, events=POLLOUT}], 1, 60000) = 1 ([{fd=12, revents=POLLOUT}])
getsockopt(12, SOL_SOCKET, SO_ERROR, [0], [4]) = 0
fcntl(12, F_GETFL)                      = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(12, F_SETFL, O_RDWR)              = 0
getsockname(12, {sa_family=AF_INET, sin_port=htons(11751), sin_addr=inet_addr("10.0.2.15")}, [16]) = 0
getsockopt(12, SOL_SOCKET, SO_SNDBUF, [87040], [4]) = 0
getsockopt(12, SOL_SOCKET, SO_RCVBUF, [374400], [4]) = 0
setsockopt(12, SOL_TCP, TCP_NODELAY, [1], 4) = 0
fcntl(12, F_SETFD, FD_CLOEXEC)          = 0
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=2
--
write(7, "WAIT #139864521752120: nam='TCP "..., 95) = 95
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 32 38 32  36 35 20 20 3d 32 20 20  la= 128265  =2   |
 | 00040  3d 30 20 20 3d 30 20 6f  62 6a 23 3d 36 36 32 20  =0  =0 obj#=662  |
 | 00050  74 69 6d 3d 38 36 33 39  35 31 30 37 34 39 37     tim=86395107497  |

So yes, I am not sure if all of this is in the wait event, but there is a lot of stuff happening to build a connection to the remote server.

In order to find out why the lookup which is tried via the NSCD socket at the beginning, and later via DNS, is done twice, I ran the same procedure again and used tcpdump to look at the actual network traffic. This explained a lot:

# tcpdump -n host 10.0.2.3 and port 53
09:14:02.923389 IP 10.0.2.15.16819 > 10.0.2.3.domain: 15651+ A? orafun.info. (29)
09:14:02.948791 IP 10.0.2.3.domain > 10.0.2.15.16819: 15651 1/0/0 A 18.218.92.122 (45)
09:14:02.952304 IP 10.0.2.15.54590 > 10.0.2.3.domain: 28477+ AAAA? orafun.info. (29)
09:14:02.979534 IP 10.0.2.3.domain > 10.0.2.15.54590: 28477 0/1/0 (109)

In other words: first a DNS A record is requested (TCPv4 DNS name lookup), which results in the IPv4 ip address, then a DNS AAAA record is requested (TCPv6 DNS name lookup), which resulted in no ip address. In other words: orafun.info only has an IPv4 ip address. So the two lookups actually do have a function.

2. TCP Socket (KGAS) p1=5, earlier annotated as ‘send’
These are the systemcalls that are visible and quite probably related to the send wait event:

--
-- file descriptor 12 holding the connection to the destination server is added to the epoll at file descriptor 9
--
epoll_ctl(9, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=3110998864, u64=139864426025808}}) = 0
--
-- Then the http get request is sent to the destination server at its normal file descriptor, 12.
--
write(12, "GET / HTTP/1.1\r\nHost: orafun.inf"..., 56) = 56
 | 00000  47 45 54 20 2f 20 48 54  54 50 2f 31 2e 31 0d 0a  GET / HTTP/1.1.. |
 | 00010  48 6f 73 74 3a 20 6f 72  61 66 75 6e 2e 69 6e 66  Host: orafun.inf |
 | 00020  6f 0d 0a 43 6f 6e 6e 65  63 74 69 6f 6e 3a 20 63  o..Connection: c |
 | 00030  6c 6f 73 65 0d 0a 0d 0a                           lose....         |
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=5
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 33 39 35 20  20 3d 35 20 20 3d 30 20  la= 395  =5  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 31 31  30 31 39 31              =86395110191     |

3. TCP Socket (KGAS) p1=6, earlier annotated as ‘receive’

--
-- Calling epoll_wait with timeout set to 0, so it doesn't block.
--
epoll_wait(9, [], 82, 0)                = 0
--
-- And this is the wait event written by the process: TCP Socket (KGAS), p1=6
-- 
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 35 30 20  20 3d 36 20 20 3d 30 20  la= 150  =6  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 31 31  31 31 31 35              =86395111115     |

I have been thinking a lot about this seemingly weird call. It calls epoll_wait, but indicates it doesn’t want to wait (timeout=0), and even if epol_wait would have returned anything, indicated by a return code > 0, the epoll_event pointer is not set (indicated by []). The epoll file descriptor is used, but the only working file descriptor in the epoll is file descriptor 12, which has just been sent a http GET command, so the functionality of epoll is used.

This doesn’t make sense, unless you think about the asynchronous IO implementation of Oracle (see a lot of my earlier investigations), for which (in the case of asynchronous IO) io_getevents was called in a similar matter, timeout set to 0, to be able to do more requests while earlier IO requests are executed by the kernel. So my current theory here is that if multiple requests are happening, this mechanism provides a way to handle them.

If you have a simple single request, like in my case, this systemcall seems redundant. And because it queries the epoll file descriptor right after the request, it returns zero events, because there hardly has been any time after sending the http GET request.

4. Second TCP Socket (KGAS) p1=6, earlier annotated as ‘receive’

--
-- Calling epoll_wait with timeout set to 30000 (milliseconds).
--
epoll_wait(9, [{EPOLLIN, {u32=3110998864, u64=139864426025808}}], 82, 30000) = 1
--
-- And this is the second wait event written by the process: TCP Socket (KGAS), p1=6
-- 
write(7, "WAIT #139864521752120: nam='TCP "..., 95) = 95
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 31 33 31 39  39 38 20 20 3d 36 20 20  la= 131998  =6   |
 | 00040  3d 30 20 20 3d 30 20 6f  62 6a 23 3d 36 36 32 20  =0  =0 obj#=662  |
 | 00050  74 69 6d 3d 38 36 33 39  35 32 34 33 37 36 34     tim=86395243764  |

This is the second time epoll_wait is called, and this one is blocking, because timeout has been set to 30000 milliseconds. If you look at the ela time, this took some time, and this now makes perfect sense: this system calls waits for an event to become available in the epoll, so it waits for the response of the remote http server. Please mind this call just notifies the userland process that the response is ready, the received data yet has to be read.

5. TCP Socket (KGAS) p1=4, earlier annotated as ‘receive’

--
-- At this point we know there is a response. First the original file descriptor is removed from the epoll:
--
epoll_ctl(9, EPOLL_CTL_DEL, 12, 0x7ffdea6b9710) = 0
--
-- The the response is read from file descriptor 12:
--
read(12, "HTTP/1.1 200 OK\r\nServer: nginx/1"..., 4096) = 2687
--
-- Then file descriptor 12 is added to the epoll again.
--
epoll_ctl(9, EPOLL_CTL_ADD, 12, {EPOLLIN, {u32=3110998864, u64=139864426025808}}) = 0
--
-- And a wait event written by the process: TCP Socket (KGAS), p1=4
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 32 36 39 20  20 3d 34 20 20 3d 30 20  la= 269  =4  =0  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 32 34  35 31 38 32              =86395245182     |

So, what p1 set to 4 actually means, is that once the connection did return data, which is checked using epoll, and visible with p1 set to 6, it is read into the process. This is also the reason this takes very little time, this is the time to read data from kernelspace to user space, and to manage the connection’s file descriptor. It is taken off the epoll in order not to disturb it, and it is added again because there could be another request.

6. TCP Socket (KGAS) p1=3, earlier annotated as ‘close’

--
-- file descriptor 12 removed from the epoll
--
epoll_ctl(9, EPOLL_CTL_DEL, 12, 0x7ffdea6bac20) = 0
--
-- file descriptor 12 is closed, closing the network connection
--
close(12)                               = 0
--
-- And a wait event written by the process: TCP Socket (KGAS), p1=3
--
write(7, "WAIT #139864521752120: nam='TCP "..., 92) = 92
 | 00000  57 41 49 54 20 23 31 33  39 38 36 34 35 32 31 37  WAIT #1398645217 |
 | 00010  35 32 31 32 30 3a 20 6e  61 6d 3d 27 54 43 50 20  52120: nam='TCP  |
 | 00020  53 6f 63 6b 65 74 20 28  4b 47 41 53 29 27 20 65  Socket (KGAS)' e |
 | 00030  6c 61 3d 20 33 35 32 20  20 3d 33 20 20 3d 32 20  la= 352  =3  =2  |
 | 00040  20 3d 30 20 6f 62 6a 23  3d 36 36 32 20 74 69 6d   =0 obj#=662 tim |
 | 00050  3d 38 36 33 39 35 32 35  31 32 39 34              =86395251294     |

I don’t think this part holds any surprises. The network file descriptor is first removed from the epoll, and then it is closed, ending the TCP connection that was setup to perform a http request (in my case, I didn’t test, but I believe you will see the same with for example a SMTP connection, or any other type of TCP connection).

Summary

The basic message of this article is not surprising, nor does it conflict with current knowledge. Whenever you see a wait event ‘TCP Socket (KGAS)’, it means a foreground process is performing TCP networking via PLSQL. This wait event is a single event for creating, sending, receiving and closing a connection.

The true information of this article is how you can use the p1 value of the event to learn what actually the foreground is doing, and thus should give you more information to troubleshoot in the case of long waiting times.

TCP Socket (KGAS) p1 values:
1 - ?
2 - perform DNS lookup and create connection
3 - close connection
4 - copy TCP response into process space
5 - send request
6 - wait for TCP response to become available
7 - ?
8 - ? \
9 - ? | According to documentation, windows only in a 'special thread'.
10- ? /

This post was triggered upon rereading a blogpost by Mike Dietrich called databases need patched minimum april 2019. Mike’s blogpost makes it clear this is about databases that are connected using database links, and that:
– Newer databases do not need additional patching for this issue (11.2.0.4, 12.1.0.2, 12.2 and newer).
– Recent PSU patches contain a fix for certain older versions (11.1.0.7, 11.2.0.3 and 12.1.0.1).
– This means versions 11.2.0.2 and earlier 11.2 versions, 11.1.0.6 and earlier and anything at version 10 or earlier can not be fixed and thus are affected.

But what is the actual issue?

The first link in the article: Recommended patches and actions for Oracle databases versions 12.1.0.1, 11.2.0.3 and earlier – before June 2019 (Doc ID 2361478.1) provides essentially the same information as Mike’s blogpost, however it additionaly mentions that the interoperability of database clients with database servers is not affected.

Mike’s article mentions the following:
The alert refers to an SCN issue which came up a while ago. The system change number (SCN) is a logical, internal timestamp used by the Oracle Database. SCNs order events that occur within the database, which is necessary to satisfy the ACID properties of a transaction. The database uses SCNs to query and track changes.

So I guess it has something to do with SCNs. Most of the links are about SCNs. The MOS article that is most descriptive is: System Change Number (SCN), Headroom, Security and Patch Information (Doc ID 1376995.1).

This article describes a lot of details:
– SCNs are necessary for the database to keep changes organised. I got an article that explains that SCNs are not unique to a transaction, but are “just” granular enough to keep changes organised.
– SCNs are an ever increasing number. SCNs are never decreased!
– SCNs have a hard limit, which version specific, and is based on the number of bits for the number:
– – The general limit is 2^48.
– – From version 12.2 on, with compatibility set to 12.2 or higher, the limit is 2^63.
– SCNs have a per second increasing soft limit, called ‘RSL’ or ‘reasonable SCN limit’, which is version specific:
– – The general soft limit is 16384 (16k) SCNs per second calculated by the number of second from Januari 1st, 1988 times 16384.
– – From version 12.2 on, with compatibility set to 12.2 or higher, the limit is 98304 (96k) SCNs per second calculated by the number of seconds from Januari 1st, 2008 times 98304.
– The RSL can not be exceeded, if a database session tries to go beyond the soft limit, an ORA 600 [2252] is raised and the transaction is rolled back. This will not corrupt data (but obviously the data in the transaction is not applied).
– The difference between the current SCN and the RSL SCN is known as ‘SCN headroom’.
– There have been bugs that can increase SCNs at a higher rate, decreasing the SCN headroom or even reaching the soft limit, but these have all been solved in the Januari 2012 CPU/PSU/patch bundles.
– When databases communicate which each other via a database link, the SCNs of both databases are synchronised by picking the highest of the two.

But it doesn’t really explain why patches must be applied before June 2019. However, another article is more concrete on that: Recommended patching and actions for Oracle database versions 12.1.0.1, 11.2.0.3 and earlier – before June 2019 (Doc ID 2335265.1). The first interesting thing mentioned is:

3. What is the change introduced by the patches listed above?
These patches increase the database’s current maximum SCN (system change number) limit.
At any point in time, the Oracle Database calculates a “not to exceed” limit for the number of SCNs a database can have used, based on the number of seconds elapsed since 1988.

So, this means the patched database have a change (increase) in the RSL.

These recommended patches enable the databases to allow for a higher current maximum SCN limit. The rate at which this limit is calculated can be referred to as the “SCN rate” and these patches help allow higher SCN rates to enable databases to support many times higher transaction rates than earlier releases.

And this means the RSL is increased from the 16k per second since Januari 1988 to the 96k per second since Januari 2008.

Please note that the patches only increase the max limit but the current SCN is not impacted. So, if all your databases don’t have any major change in transaction rate, the current SCN would still remain below the current maximum SCN limit and database links between newer (or patched) and unpatched databases would continue to work. The patches provide the safety measure to ensure that you don’t have any issue with dblinks independent of any possible future change in your transaction rate.

With the patches applied, this change in current maximum SCN limit will happen automatically starting 23rd June 2019.

That is important information! So with the patch applied to some databases and not applied to some other databases and patched and non-patched databases have database links between them, everything should remain working. This is true for any database at this moment, because the change of the limit will happen on the 23rd of June 2019, which at the time of writing is in the future.

Once the change of limit has happened at the 23rd of June 2019, database links between older, non-patched versions of the database and newer or patched versions of the database could be affected if after the 23rd of June 2019 the SCN rate is increased in a newer or patched database and it runs beyond the headroom available in databases with the 16k rate, because the earlier mentioned SCN synchronisation then will fail because it’s beyond the 16k rate database headroom meaning it can not synchronise with the newer dataebase.

So the problem we are talking about here is two databases talking with each other over a database link, which have a different RSL, which could lead to a situation that one database is at an SCN number which is too high for the other older or non-patched database, meaning the communication will fail, which will persist until the older or non-patched databases is able to reach the SCN number of the newer database over time, if that is possible at all.

A thing that is not clear to me at this point: the documentation to me seems to suggest that Oracle version 12.2 with compatibility set to 12.2 or higher versions already allows the higher number of SCNs per second. (the above point: – – From version 12.2 on, with compatibility set to 12.2 or higher, the limit is 98304 (96k) SCNs per second calculated by the number of seconds from Januari 1st, 2008) If that is true, the issue that is warned for could potentially already happen!

Luckily, there is are ways to investigate this:

The reported “newer” versions and the versions that are patched for the rate change have an SGA variable “kcmscnc_” that lists the current SCN rate of the database. There are 3 values that I have seen for “kcmscnc_”:
1: This is the rate of 16k per second since Januari 1st 1988.
2: ?
3: This is the rate of 96k per second since Januari 1st 2008.

SQL> select ksmfsnam, ksmfsadr, ksmfssiz from x$ksmfsv where ksmfsnam like 'kcmscnc_';
KSMFSNAM							 KSMFSADR	    KSMFSSIZ
---------------------------------------------------------------- ---------------- ----------
kcmscnc_							 000000006001579C	   4
SQL> oradebug peek 0x6001579C 4
[06001579C, 0600157A0) = 00000001

So, this databases is capable of switching (because kcmscnc_ exists), and is currently running with the 16k per second threshold.

In fact, I tested this on 11.2.0.4, 12.1.0.2 and 18.3.0.0, all these version report currently (currently is before June 2019) “1” or “scheme 1”. So the above mentioned rate of 96k for 12.2 and above does currently not apply for the soft limit or SCN headroom calculation for any database, including 12.2 and 18.3.

There is a less “hardcore” way to obtain this information, via the DBMS_SCN. This package equally to the “kcmscnc_” variable only exists if the database is of a version or patch version high enough to switch:

declare
  v_rsl number;
  v_headroom_in_scn number;
  v_headroom_in_sec number;
  v_cur_scn_compat number;
  v_max_scn_compat number;
begin
  dbms_scn.getcurrentscnparams(v_rsl, v_headroom_in_scn, v_headroom_in_sec, v_cur_scn_compat, v_max_scn_compat);
  dbms_output.put_line('reasonable scn limit (soft limit): '||to_char(v_rsl,'999,999,999,999,999,999'));
  dbms_output.put_line('headroom in scn                  : '||to_char(v_headroom_in_scn,'999,999,999,999,999,999'));
  dbms_output.put_line('headroom in sec                  : '||v_headroom_in_sec);
  dbms_output.put_line('current scn compatibility scheme : '||v_cur_scn_compat);
  dbms_output.put_line('max scn compatibility scheme     : '||v_max_scn_compat);
end;
/

SQL> /
reasonable scn limit (soft limit):	 16,104,861,483,008
headroom in scn 		 :	 16,104,860,802,471
headroom in sec 		 : 982962695
current scn compatibility scheme : 1
max scn compatibility scheme	 : 3

PL/SQL procedure successfully completed.

This is executed in a version 18.3.0.0 database. So the conclusion here is that currently all versions up to 18.3.0.0 are still compatible, because they all use the same SCN limit per second, which is referred to as ‘scheme 1’. However, on June 23, 2019 newer versions of the database will switch to a new scheme, which is referred to as ‘scheme 3’, which allows an more headroom.

First of all, I hope a lot of databases out there have enough headroom in the first place, and a modest rate of SCNs it is using per second, which means it doesn’t advance into the RSL. In such a case, when you got older versions that can not switch to scheme 3 and newer versions that will, I can see no reason to worry at all.

The second case is when your database is close to running out of headroom currently in scheme 1, and the SCN rate in the database is also close the limit, so you truly should worry when your database switches to scheme 3, it might actually run over the 16k per second limit, and when it does that long enough to run over RSL of scheme 1, communication over a database link between the current scheme 3 database with a scheme 1 database will be disrupted until the scheme 3 database SCN is low enough again to fit the scheme 1 RSL. If the SCN rate persists in the scheme 3 database, communication will be impossible between the scheme 1 and a scheme 3 database.

So, at this point it should be clear that the most important statistic for determining issues between scheme 1 and after June 2019 scheme 3 databases is the current headroom in your databases. For all of the databases involved that will have a database link between a scheme 1 and a scheme 3 database, you should investigate their SCN number and SCN rate. If some of these databases have run into the soft limit ora-600, ora 600 [2252], these are potential candidates for running over the soft limit when they switch to scheme 3.

However, the most important statistic is to see how close the current SCN is to the scheme 1 soft limit. This can be done using the following SQL (this SQL does not need a newer version of the database, and is tested with version 11.2.0.2 and higher):

col "RSL scheme 1" format 9,999,999,999,999,999
col "current value" format 9,999,999,999,999,999
select dbms_flashback.get_system_change_number "current value",
       ((((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
       ((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
       (((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
       (to_number(to_char(sysdate,'HH24'))*60*60) +
       (to_number(to_char(sysdate,'MI'))*60) +
       (to_number(to_char(sysdate,'SS')))) * (16*1024)) "RSL scheme 1",
       round(dbms_flashback.get_system_change_number/((((to_number(to_char(sysdate,'YYYY'))-1988)*12*31*24*60*60) +
       ((to_number(to_char(sysdate,'MM'))-1)*31*24*60*60) +
       (((to_number(to_char(sysdate,'DD'))-1))*24*60*60) +
       (to_number(to_char(sysdate,'HH24'))*60*60) +
       (to_number(to_char(sysdate,'MI'))*60) +
       (to_number(to_char(sysdate,'SS')))) * (16*1024))*100,5) "% to RSL scheme 1"
from dual;

If a database is close the RSL (roughly speaking beyond 90%-95%), the next thing to do is measure if the database keeps on using SCNs and keeps on being close to the RSL. If that is true, an additional increase in SCN usage could in the current situation using scheme 1 lead to an ora-600 [2252], but if that database has switched to scheme 3 after June 2019, there will not be anything keeping that database from going beyond an SCN number that will exceed the RSL of scheme 1, which will then cause issues if that database has a database link with a scheme 1 database.

Is there anything you can do if you suspect or know a database will go over the scheme 1 RSL? Purely for this issue, the obvious solution would be to make sure you are on a version that will switch to scheme 3 on June 2019, so at least after June 2019 it will not run into ora-600 [2252].

However, if such a scheme 3 database needs to connect to an older scheme 1 database, you have two choices:
1. Potentially run over the the scheme 1 limit and disrupt the database link communication.
2. Stop a newer database from switching to scheme 3, potentially disrupt changes in the current database, but it guarantees the database link will always work.

To look into the switch to scheme 3, which oracle calls ‘auto rollover’, the following SQL can be used:

declare
  v_autorollover_date date;
  v_target_compat number;
  v_is_enabled boolean;
begin
  dbms_scn.getscnautorolloverparams(v_autorollover_date, v_target_compat, v_is_enabled);
  dbms_output.put_line('auto rollover date      : '||to_char(v_autorollover_date,'YYYY-MM-DD'));
  dbms_output.put_line('target scheme	        : '||v_target_compat);
  dbms_output.put_line('rollover enabled (1=yes): '||sys.diutil.bool_to_int(v_is_enabled));
end;
/
SQL> /
auto rollover date	: 2019-06-23
target scheme		: 3
rollover enabled (1=yes): 1

PL/SQL procedure successfully completed.

If you want to prevent a database from rolling over to scheme 3, the procedure dbms_scn.disableautorollover can be used:

exec dbms_scn.disableautorollover;

Obviously, the procedure enableautorollover does the opposite. Please mind to contact Oracle support before doing this with your production database, this is an undocumented procedure at this time.

Also mind that if you create a new database after June 23, 2019, with a new or patched version that can switch to scheme 3, it will probably be running scheme 3 by default. If you want to be absolutely sure it will not exceed the scheme 1 limit, you can revert it to scheme 1 manually using the alter database set scn compatibility N command in mount mode:

SQL> startup mount;
ORACLE instance started.

Total System Global Area 1048574496 bytes
Fixed Size		    8665632 bytes
Variable Size		  281018368 bytes
Database Buffers	  616562688 bytes
Redo Buffers		  142327808 bytes
Database mounted.
SQL> alter database set scn compatibility 1;

Database altered.

SQL> alter database open;

Database altered.

For this too I would strongly advise to contact Oracle support first. The purpose of this blogpost is to define the problem, show all the technical details that have to do with it, and show all the tools that are part of it. There is in no way anything in this article to tell you what you should do, it just shows everything that surrounds the switch to scheme 3 in June 2019.

Another view that might be beneficial is x$kcmscn. This view seems to be created to help looking if a scheme 3 database can connect to a scheme 1 database:

col cur_max_scn format 999,999,999,999,999
col pre_11_2_0_2_cur_max_scn format 999,999,999,999,999
select * from x$ksmscn;
ADDR		       INDX    INST_ID	   CON_ID CUR_MAX_RATE	  CUR_SCN
---------------- ---------- ---------- ---------- ------------ ----------
	 CUR_MAX_SCN DIFF_IN_DAYS DIFF_IN_HOURS PRE_11_2_0_2_CUR_MAX_SCN
-------------------- ------------ ------------- ------------------------
PRE_11_2_0_2_DIFF_IN_DAYS
-------------------------
00007F773DEDAE10	  0	     1		0	     0	   800918
  16,108,830,064,640	    11379	 273112       16,108,830,064,640
		186444783

Because currently (before June 2019) every database by default will be in scheme 1, the cur_max_scn and pre_11_2_0_2_cur_max_scn are identical. I even believe the column naming is wrong, the first version that can switch if it is patched to a high enough PSU version is 11.2.0.3, I do believe the column name is suggesting scheme 1 databases are databases of a version lower than 11.2.0.2, not including 11.2.0.2.

Conclusion.
I think there’s been a lot of fuzz for something that in most cases is not an issue. This article is supposed to give you all the knowledge and the tools to determine how it looks like in your situation.

This might be an issue if you happen to have one or more databases that are high on SCN numbering, and continues to take a lot of SCN numbers, and will be converted to a scheme 3 database on June 29, 2019 and is suspected to increase on taking SCN numbers for whatever reason AND it has a database link to a scheme 1 database that remains scheme 1. That’s a lot of ifs.

On the other hand you only need one database to be high in SCN numbering which continues to take a lot of SCNs keeping it close to the soft limit, which will propagate its SCN to other databases if it is linked, or the required properties of the problem spread out over multiple linked databases.

Again, I do not advise anything in this article, the purpose here is to provide all the details that surround it so you can make the best decision for yourself.

This blogpost is a look into a bug in the wait interface that has been reported by me to Oracle a few times. I verified all versions from Oracle 11.2 version up to 18.2.0.0.180417 on Linux x86_64, in all these versions this bug is present. The bug is that the wait event ‘db file async I/O submit’ does not time anything when using ASM, only when using a filesystem, where this wait event essentially times the time the system call io_submit takes. All tests are done on Linux x86_64, Oracle Linux 7.4 with database and grid version 18.2.0.0.180417

So what?
You might have not seen this wait event before; that’s perfectly possible, because this wait event is unique to the database writer. So does this wait event matter?

When the Oracle datebase engine is set to using asynchronous I/O, and when it makes sense to use asynchronous I/O (!), the engine will use the combination of io_submit() to issue I/O requests to the operating system, and when needs to, fetch the I/O requests using io_getevents(). In general (so not consistently), the engine does not time io_submit, which is a non-blocking call, it only times when it needs to wait for I/O requests using io_getevents(), which is reported as a wait event in an IO wait event class. A lot of ‘%parallel%’ IO related wait events can time asynchronous IO calls.

So why would the engine then time io_submit() for the database writer?
Well, io_submit() is not a blocking call, UNLESS the device queue to which the requests are submitted is full. This means that the developers of the database writer code decided to implement a wait event for io_submit, which is not the case for any other process.

To understand why this makes sense, a little knowledge about database writer internals is necessary. When blocks are dirtied in the cache and these blocks are checkpointed later on, these must be written to disk. The amount of blocks to be written and therefore the number of writes can get high very quickly. The way this is processed is quite interesting (simplified obviously; and when using a filesystem):

a) the database writer picks up a batch of blocks needing writing, for up to 128 IO requests.
b) that batch is submitted, timed by ‘db file async I/O submit’
c) a blocking io_getevents call is issued, timed by ‘db file parallel write’, to wait for the IOs to finish. The interesting thing specifically for the database writer is that the minimal number of IOs ready to wait for is very low (a few IOs to 25-75% of the IOs if the amount gets bigger). Any finished IO will be picked up here, however it’s perfectly possible IOs are still active after this step. In fact, I think it’s deliberately made that way.
d) if any IO requests are still pending, a nonblocking, non-wait event timed io_getevents call is issued to pick up any finished IOs.
e) if any blocks still need writing for which no IO request have been submitted, go to a).
f) if at this point IO requests are still pending, to to c).

This means that the database writer can submit huge amounts of IO requests, and keep on doing that, much more than any other process, because it doesn’t need to wait for all IOs to finish. So, this means that if there is a process that is likely to run into a blocking io_submit call, it’s the database writer.

When using a database without ASM, the above wait timing is exactly what happens. A function call graph of io_submit for the database writer when the database uses a filesystem looks like this:

 | | | | | > kslwtbctx(0x7ffc55eb3e60, 0x8b4, ...)
 | | | | | | > sltrgftime64(0x6c2f4288, 0x6bbe3ca0, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffc55eb3400, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffc55eb3400, ...)
 | | | | | | | | < clock_gettime+0x00000000005d returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | |  kslwait_timeout_centi_to_micro(0x7fffffff, 0x19183e92, ...)
 | | | | | |  kskthbwt(0x19c37b0d8, 0xb3, ...)
 | | | | | |  kslwt_start_snapshot(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | < kslwt_start_snapshot+0x0000000000d0 returns: 0x6c2f4ae8
 | | | | |  ksfdgo(0x800, 0, ...)
 | | | | | | > ksfd_skgfqio(0x7fc304483f78, 0x9, ...)
 | | | | | | | > skgfqio(0x7fc3091fddc0, 0x7fc304483f78, ...)
 | | | | | | | | > skgfrvldtrq(0x7fc304483f78, 0x9, ...)
 | | | | | | | |  sltrgftime64(0x2000, 0x7fc3043772b0, ...)
 | | | | | | | | | > clock_gettime@plt(0x1, 0x7ffc55eae3b0, ...)
 | | | | | | | | | | > clock_gettime(0x1, 0x7ffc55eae3b0, ...)
 | | | | | | | | | | < clock_gettime+0x00000000005d returns: 0
 | | | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | | |  skgfr_lio_listio64(0x7fc3091fddc0, 0x1, ...)
 | | | | | | | | | > io_submit@plt(0x7fc302992000, 0x115, ...)
 | | | | | | | | | < io_submit+0x000000000007 returns: 0x115
 | | | | | | | | < skgfr_lio_listio64+0x000000000131 returns: 0
 | | | | | | | < skgfqio+0x00000000035e returns: 0
 | | | | | | < ksfd_skgfqio+0x0000000001f5 returns: 0
 | | | | |  kslwtectx(0x7ffc55eb3e60, 0x7fc304483f78, ...)
 | | | | | | > sltrgftime64(0x7ffc55eb3e60, 0x7fc304483f78, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffc55eb33e0, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffc55eb33e0, ...)
 | | | | | | | | < clock_gettime+0x00000000005d returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | |  kslwt_end_snapshot(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | | > kslwh_enter_waithist_int(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | |  kslwtrk_enter_wait_int(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | | < kslwtrk_enter_wait_int+0x000000000019 returns: 0x6bcaa180
 | | | | | |  kslwt_update_stats_int(0x6c2f5538, 0x6c2f5538, ...)
 | | | | | | | > kews_update_wait_time(0x9, 0x8f54, ...)
 | | | | | | |  ksucpu_wait_update(0x9, 0x8f54, ...)
 | | | | | | | < ksucpu_wait_update+0x000000000036 returns: 0x6bd658b0
 | | | | | |  kskthewt(0x19c38402c, 0xb3, ...)
 | | | | | | < kskthewt+0x0000000005b1 returns: 0x30
 | | | | |  select event#, name from v$event_name where event# = to_number('b3','xx');
    EVENT# NAME
---------- ----------------------------------------------------------------
       179 db file async I/O submit

Now on to the actual purpose of this blog post, the same situation, but now when ASM is used. When ASM is used, there is a significant increase in the call stack. This means more code is executed. This may sound strange at first, but it’s very logical if you give it some thought: when using ASM, the Oracle database is talking to raw devices. This means that any of the functionality a filesystem performs, which is implemented in ASM must in some way be performed. This is done in several additional layers in the database code.

Let’s look at a backtrace of io_submit of the database writer when using a filesystem:

#0  0x00007f22bdb36690 in io_submit () from /lib64/libaio.so.1
#1  0x0000000004832ef0 in skgfr_lio_listio64 ()
#2  0x000000001238b7ce in skgfqio ()
#3  0x0000000011d5c3ad in ksfd_skgfqio ()
#4  0x0000000011d57fce in ksfdgo ()
#5  0x0000000000d9f21c in ksfdaio ()
#6  0x00000000039c4a5e in kcfisio ()
#7  0x0000000001d836ec in kcbbdrv ()
#8  0x000000001222fac5 in ksb_act_run_int ()
#9  0x000000001222e792 in ksb_act_run ()
#10 0x0000000003b8b9ce in ksbabs ()
#11 0x0000000003baa161 in ksbrdp ()
#12 0x0000000003fbaed7 in opirip ()
#13 0x00000000026ecaa0 in opidrv ()
#14 0x00000000032904cf in sou2o ()
#15 0x0000000000d681cd in opimai_real ()
#16 0x000000000329d2a1 in ssthrdmain ()
#17 0x0000000000d680d3 in main ()

If you want to follow the call sequence, a backtrace/stacktrace must be read from the bottom up.
ksb = kernel service background processes
kcf = kernel cache file management
ksfd = kernel service functions disk IO
skgf = o/s dependent kernel generic fiile
I hope you recognise the logical layers that are necessary for doing the I/O.

Now look at a backtrace of io_submit of the database writer when using ASM:

#0  0x00007f22bdb36690 in io_submit () from /lib64/libaio.so.1
#1  0x0000000004832ef0 in skgfr_lio_listio64 ()
#2  0x000000001238b7ce in skgfqio ()
#3  0x0000000011d5c3ad in ksfd_skgfqio ()
#4  0x0000000011d57fce in ksfdgo ()
#5  0x0000000000d9f21c in ksfdaio ()
#6  0x000000000755c1a8 in kfk_ufs_async_io ()
#7  0x0000000001455fb2 in kfk_submit_io ()
#8  0x00000000014551a8 in kfk_io1 ()
#9  0x0000000001450b3e in kfk_transitIO ()
#10 0x000000000143c450 in kfioSubmitIO ()
#11 0x000000000143bbaa in kfioRequestPriv ()
#12 0x000000000143b160 in kfioRequest ()
#13 0x000000000136f6bd in ksfdafRequest ()
#14 0x000000000137311a in ksfdafGo ()
#15 0x0000000011d58179 in ksfdgo ()
#16 0x0000000000d9f269 in ksfdaio ()
#17 0x00000000039c4a5e in kcfisio ()
#18 0x0000000001d836ec in kcbbdrv ()
#19 0x000000001222fac5 in ksb_act_run_int ()
#20 0x000000001222e792 in ksb_act_run ()
#21 0x0000000003b8b9ce in ksbabs ()
#22 0x0000000003baa161 in ksbrdp ()
#23 0x0000000003fbaed7 in opirip ()
#24 0x00000000026ecaa0 in opidrv ()
#25 0x00000000032904cf in sou2o ()
#26 0x0000000000d681cd in opimai_real ()
#27 0x000000000329d2a1 in ssthrdmain ()
#28 0x0000000000d680d3 in main ()

Essentially, a couple of layers are added to facilitate ASM; ksfdaf, kfio, kfk.
So the logical sequence becomes:
ksb = kernel service background processes
kcf = kernel cache file management
ksfd = kernel service functions disk IO
ksfdaf = kernel service functions disk IO ASM files
kfio = kernel automatic storage management translation I/O layer
kfk = kernel automatic storage management KFK

ksfd = kernel service functions disk IO
skgf = o/s dependent kernel generic file

Now to give an overview of the function call sequence, I simply need to cut out a lot of functions because otherwise it would be unreadable.

 | | | | | > ksfdgo(0x806, 0x35b4, ...)
 | | | | | | > ksfdafGo(0x806, 0x35b4, ...)
 | | | | | | | > ksfdafRequest(0x7ffcc7d845a0, 0x10f, ...)
 | | | | | | | | > kfioRequest(0x7ffcc7d845a0, 0x10f, ...)
 | | | | | | | | | > _setjmp@plt(0x7ffcc7d821d8, 0x10f, ...)
 | | | | | | | | |  __sigsetjmp(0x7ffcc7d821d8, 0, ...)
 | | | | | | | | |  __sigjmp_save(0x7ffcc7d821d8, 0, ...)
 | | | | | | | | |  kfioRequestPriv(0x7ffcc7d845a0, 0x10f, ...)
...
 | | | | | | | | | | | | | | | | > ksfdgo(0x188, 0x35c5, ...)
 | | | | | | | | | | | | | | | | | > ksfd_skgfqio(0x7f4232709f78, 0x9, ...)
 | | | | | | | | | | | | | | | | | | > skgfqio(0x7f4237483dc0, 0x7f4232709f78, ...)
 | | | | | | | | | | | | | | | | | | | > skgfrvldtrq(0x7f4232709f78, 0x9, ...)
 | | | | | | | | | | | | | | | | | | |  sltrgftime64(0x2000, 0x7f4230b61c98, ...)
 | | | | | | | | | | | | | | | | | | | | > clock_gettime@plt(0x1, 0x7ffcc7d7bb10, ...)
 | | | | | | | | | | | | | | | | | | | | | > clock_gettime(0x1, 0x7ffcc7d7bb10, ...)
 | | | | | | | | | | | | | | | | | | | | | < clock_gettime+0x000000000059 returns: 0
 | | | | | | | | | | | | | | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | | | | | | | | | | | | | |  skgfr_lio_listio64(0x7f4237483dc0, 0x1, ...)
 | | | | | | | | | | | | | | | | | | | | > io_submit@plt(0x7f4230ad8000, 0x112, ...)
 | | | | | | | | | | | | | | | | | | | | < io_submit+0x000000000007 returns: 0x112
 | | | | | | | | | | | | | | | | | | | < skgfr_lio_listio64+0x000000000131 returns: 0
 | | | | | | | | | | | | | | | | | | < skgfqio+0x00000000035e returns: 0
 | | | | | | | | | | | | | | | | | < ksfd_skgfqio+0x0000000001f5 returns: 0
 | | | | | | | | | | | | | | | | < ksfdgo+0x000000000135 returns: 0
...
 | | | | | | | | | < kfioRequestPriv+0x000000000224 returns: 0
 | | | | | | | | < kfioRequest+0x000000000251 returns: 0
 | | | | | | | < ksfdafRequest+0x0000000003c8 returns: 0
 | | | | | | < ksfdafGo+0x000000000081 returns: 0x1
 | | | | |  kslwtbctx(0x7ffcc7d86f60, 0x7f4232709f38, ...)
 | | | | | | > sltrgftime64(0x6da39e68, 0x6d2f5bc0, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffcc7d86500, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffcc7d86500, ...)
 | | | | | | | | < clock_gettime+0x000000000059 returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | |  kslwait_timeout_centi_to_micro(0x7fffffff, 0x14cb3fcb, ...)
 | | | | | |  kskthbwt(0x2b0f3fe06, 0xb3, ...)
 | | | | | |  kslwt_start_snapshot(0x6da3b118, 0x6da3b118, ...)
 | | | | | | < kslwt_start_snapshot+0x0000000000d0 returns: 0x6da3a6c8
 | | | | |  ksfdgo(0x808, 0, ...)
 | | | | |  kslwtectx(0x7ffcc7d86f60, 0x9, ...)
 | | | | | | > sltrgftime64(0x7ffcc7d86f60, 0x9, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffcc7d864e0, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffcc7d864e0, ...)
 | | | | | | | | < clock_gettime+0x000000000059 returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | |  kslwt_end_snapshot(0x6da3b118, 0x6da3b118, ...)
 | | | | | | | > kslwh_enter_waithist_int(0x6da3b118, 0x6da3b118, ...)
 | | | | | | |  kslwtrk_enter_wait_int(0x6da3b118, 0x6da3b118, ...)
 | | | | | | | < kslwtrk_enter_wait_int+0x000000000019 returns: 0x6dacf1e8
 | | | | | |  kslwt_update_stats_int(0x6da3b118, 0x6da3b118, ...)
 | | | | | | | > kews_update_wait_time(0x9, 0xd02, ...)
 | | | | | | |  ksucpu_wait_update(0x9, 0xd02, ...)
 | | | | | | | < ksucpu_wait_update+0x000000000036 returns: 0x6db40f70
 | | | | | |  kskthewt(0x2b0f40b08, 0xb3, ...)
 | | | | | | < kskthewt+0x0000000005b1 returns: 0x30
 | | | | |  ksfdafCopyWaitCtx(0x7ffcc7d86f60, 0xb3, ...)
 | | | | | | > _intel_fast_memcpy(0x7ffcc7d86f60, 0x7f423270a848, ...)
 | | | | | |  _intel_fast_memcpy.P(0x7ffcc7d86f60, 0x7f423270a848, ...)
 | | | | | |  __intel_ssse3_rep_memcpy(0x7ffcc7d86f60, 0x7f423270a848, ...)
 | | | | | | < __intel_ssse3_rep_memcpy+0x00000000242e returns: 0x7ffcc7d86f60
 | | | | | < ksfdafCopyWaitCtx+0x000000000038 returns: 0x7ffcc7d86f60
 | | | | < ksfdaio+0x00000000055f returns: 0x7ffcc7d86f60
 | | |  oradebug setorapname dbw0
Oracle pid: 18, Unix process pid: 3617, image: oracle@o182-fs.local (DBW0)
SQL> oradebug event sql_trace wait=true
Statement processed.

Then go to the trace directory, and tail the database writer trace file.
Next, attach to the database writer with gdb, and break on the io_submit call and perform a sleep 1 (sleep for 1 second). This should add 1000000 microseconds to the waiting time, if the wait event includes the function we put the break on.

(gdb) break io_submit
Breakpoint 1 at 0x7f336b986690
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>shell sleep 1
>c
>end

Now continue the database writer, and execute a checkpoint (alter system checkpoint), and look at the wait events:

WAIT #0: nam='db file async I/O submit' ela= 2 requests=11 interrupt=0 timeout=0 obj#=-1 tim=15801301770
WAIT #0: nam='db file parallel write' ela= 5077 requests=1 interrupt=0 timeout=2147483647 obj#=-1 tim=15801306930

Well, it’s clear nothing has timed the one second we added, right? (the time in the wait event is at ‘ela’, which is in microseconds)

For the sake of completeness, and to validate this test method, let’s add the sleep to io_getevents (io_getevents_0_4) to see if ‘db file parallel write’ does show the extra time we added in the system call, because ‘db file parallel write’ is supposed to time io_getevents():

(gdb) dis 1
(gdb) break io_getevents_0_4
Breakpoint 2 at 0x7f336b986650
(gdb) commands
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>shell sleep 1
>c
>end

Continue the database writer again, and execute a checkpoint:

WAIT #0: nam='db file async I/O submit' ela= 1 requests=22 interrupt=0 timeout=0 obj#=-1 tim=15983030322
WAIT #0: nam='db file parallel write' ela= 1003978 requests=2 interrupt=0 timeout=2147483647 obj#=-1 tim=15984034336

Yay! There we got the artificial waiting time!

Based on this, I can only come the conclusion that the wait event ‘db file async I/O submit’ does not perform any actual timing of the io_submit system call when ASM is used with the Oracle database.

In a blogpost introducing the vagrant builder suite I explained what the suite could do, and the principal use, to automate the installation of the Oracle database software and the creation of a database on a virtual machine using vagrant together with ansible and virtual box.

This blogpost shows how to use that suite for automating the installation of the Oracle database software and the creation of a database on a linux server directly, with only the use of ansible without vagrant and virtualbox.

The suite has been updated with all the PSU’s up to current (180417; april 2018), for 11.2.0.4, 12.1.0.2 and 12.2.0.1, and now includes Oracle 18. Please mind for Oracle version 12.2 and 18 you need to provide the installation media, because these are not downloadable as patch. I really hate that Oracle is not providing the installation media for these downloadable in an automated way. This is a good opportunity for oracle to show it’s listening and provide a solution, especially because it wants to be a player in the developer/devops field.

The regular use if this suite with vagrant/ansible/virtualbox would be to clone the vagrant-builder repository, set the variables in the Vagrantfile, and issue ‘vagrant up’, which will fetch an O/S image from the vagrant cloud, create an extra disk for Oracle, create a VM specification in virtual box, boot up the VM, and then run ansible to do the full configuration of Linux for installing the Oracle database, install and patch the Oracle database software and create a database. Without any human intervention.

Currently, the local/direct install scripts support redhat (actually, a redhat version that manifests itself as ‘Red Hat Enterprise Linux’ or ‘RedHat’ in the ansible fact ‘ansible_distribution’) or oracle linux (‘OracleLinux’), version 7 only. You can get all the facts that ansible gathers on the current host by executing ‘ansible localhost -m setup’.

To use it natively on a linux system, you must first make sure the operating system provides a /u01 directory with enough space for the Oracle software install and the database. A rough indication would be more than 20G. I regularly use 40G, which allows me to use the database and create tablespaces. It doesn’t care if it’s only a directory, or if it’s a mount point. Actually the only thing that local_install.yml is doing with it, is changing the ownership of the /u01 directory to oracle once the user is created.

The first thing to do is install ansible and git. Ansible is using python, and python comes installed with all recent RHEL compatible distributions (as an example, ‘yum’ is using python too). In my opinion, installing additional python packages should not be a problem. If your environment is highly standardised and these kind of installs are not allowed, you should use the scripts in the “regular” ansible way, which I will detail at the end.

1. Install git
Oracle linux 7 has git in the base repository. I assume this means this is the same for RHEL 7. That means that when the server has the base repository or the latest repository or a company repository (for version controlling the packages), it can be simply done using yum install:

# yum install git

2. Install ansible
One way of installing ansible is using easy_install and pip. This will get you a recent ansible version. The simplest way would be using an EPEL repository, however, packages in the linux distribution repositories are not updated very frequently in my experience.

# yum install python-setuptools
# easy_install pip
# pip install ansible

3. Clone the vagrant-builder suite
This CLI example clones the vagrant-builder suite into the builder directory. You can name the directory any way you want. The best way of doing this is using a normal (meaning non-root) user, that has password-less unlimited sudo rights. Cloud environments come with such a user by default, for the oracle cloud this is ‘opc’, for amazon this is ‘ec2-user’ (and for vagrant boxes this is ‘vagrant’). The ansible script uses sudo to execute as root, creates the oracle user and then uses sudo to execute as oracle.

$ git clone https://gitlab.com/FritsHoogland/vagrant-builder.git builder

4. Run local_install.yml
Now the automatic installation components are all setup. The next thing to do is go into the builder/ansible directory, and edit the local_install.yml file:

---
- hosts: localhost
  become: true
  vars:
  - mosuser:
  - mospass:
  - oracle_base: /u01/app/oracle
  - database_name:
  - global_password: oracle
  - db_create_file_dest: /u01/app/oracle/oradata
  - database_version:
  - asm_version: ""
  - stage_directory: /u01/stage
...

Line 5/6: fill out MOS details. This is needed to download patches or installation media in patches.
Line 7: this is the default value for the ORACLE_BASE.
Line 8: you need a database name if you want to create a database. If you set it to empty (“”) no database will be created.
Line 9: this sets this password for all oracle database accounts.
Line 10: this variable sets the place for all the database files, db_create_file_dest.
Line 11: this sets the database software version to install. Ansible will determine what files to download. The Oracle 12.2 installation media must be placed in the builder/ansible/files directory. Look in the Vagrantfile for specifications and versions available.
Line 12: the grid software is currently not installed. The variable needs to be specified in order to have the facts setup. Facts are ansible variables.
Line 13: this is the directory in which all files are staged. It will be created at the beginning of a role, and removed at the end.

After this has been filled out, run it with a user that is allowed to execute sudo without specifying a password:

$ ansible-playbook local_install.yml

Ansible will read the playbook, and see that no inventory (list of hosts) is specified, and the hosts specifier is set to localhost, and then run locally.

Just like with its original use with vagrant, this will download all the necessary software directly from Oracle using the MOS credentials, with the exception of the installation media of Oracle 12.2.0.1 and Oracle 18.0.0.0, because these can not be downloaded via the CLI as far as I know, and therefore have to be provided in the ansible/files directory.

If you want to speed up the build, or do not have a good enough network, you can put the installation media in the files directory anyway, the ansible script will look in the files directory for it. The patches for performing patching (all MOS downloads are patches, of which some are actually the installation media) are always downloaded from MOS.

Using ansible non-local.
If you are not allowed to install additional software on the oracle server, there is another way. However, when you want to use this, it means you are asked to essentially manually install the oracle software. Regardless of what is allowed, you have to make changes to the operating system to facilitate running oracle anyway, and apparently the environment is not that automated, so that claim would be a bit silly.

The non-local way is running ansible in the way it is normally used, which is having a server with ansible installed, from which it uses ssh to run the ansible playbooks on (one or more) remote servers (which then only needs python installed on the remote servers, which EL6 and EL7 have by default). The local_install.yml script can be modified very easily to be used in this way: the hosts specification must be changed to ‘all’ or to a name given to a group of servers in the inventory file. This however is beyond the scope of the article.

However, if you need to perform database software installations and database creations regularly in an environment, it would make sense to use a centralised server to perform these actions, instead of setting it up on every distinct server.

Bonus material.
I added another script in the ansible directory, db_management.yml, which performs database creation and removal (and the install of slob). It follows the same pattern of local_install.yml, fill out the variables, and, unique for this script, uncomment the action you want it to perform, and run it.

Starting from Oracle 12, in a default configured database, there are more log writer processes than the well known ‘LGWR’ process itself, which are the ‘LGnn’ processes:

$ ps -ef | grep test | grep lg
oracle   18048     1  0 12:50 ?        00:00:13 ora_lgwr_test
oracle   18052     1  0 12:50 ?        00:00:06 ora_lg00_test
oracle   18056     1  0 12:50 ?        00:00:00 ora_lg01_test

These are the log writer worker processes, for which the minimal amount is equal to the amount public redo strands. Worker processes are assigned to a group, and the group is assigned to a public redo strand. The amount of worker processes in the group is dependent on the undocumented parameter “_max_log_write_parallelism”, which is one by default.

The actual usage of the worker processes is dependent in the first place on the value of the undocumented parameter “_use_single_log_writer”, for which the default value is ‘ADAPTIVE’, which means it’s switching automatically between ‘single log writer mode’, which is the traditional way of the LGWR process handling everything that the log writer functionality needs to do, and the ‘scalable log writer mode’, which means the log writer functionality is presumably using the log writer worker processes.

Other values for “_use_single_log_writer” are ‘TRUE’ to set ‘single log writer mode’, or ‘FALSE’ to set ‘scalable log writer mode’ fixed.

I assume most readers of this blog will know that the master log writer idle work cycle is sleeping on a semaphore (semtimedop()) under the wait event ‘rdbms ipc message’ for 3 seconds, then performs some “housekeeping”, after which it’ll sleep again repeating the small cycle of sleeping and housekeeping. For the log writer worker processes, this looks different if you look at the wait event information of the log writer worker processes:

135,59779,@1    14346                    DEDICATED oracle@memory-presentation.local (LGWR)	    time:1909.44ms,event:rdbms ipc message,seq#:292
48,34282,@1     14350                    DEDICATED oracle@memory-presentation.local (LG00)	    time:57561.85ms,event:LGWR worker group idle,seq#:150
136,24935,@1    14354                    DEDICATED oracle@memory-presentation.local (LG01)	    time:112785.66ms,event:LGWR worker group idle,seq#:74

The master log writer process (LGWR) has been sleeping for 1.9s when I queried the database, and it will sleep for 3 seconds, and then do some work and sleep again. However, the log writer worker processes have been sleeping for much longer: LG00 for 57.6s and LG01 for 112.8s, and the event is different: ‘LGWR worker group idle’. How is this implemented? Let’s look!

$ strace -p $(pgrep lg01)
strace: Process 14354 attached
semtimedop(360448, [{27, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)
semtimedop(360448, [{27, -1, 0}], 1, {3, 0}) = -1 EAGAIN (Resource temporarily unavailable)

I used strace on the LG01 process, and it’s still doing the same as most idle background processes are doing: sleeping on a semaphore for 3 seconds. But, it does not end its wait like LGWR does, the event the log writer worker processes are waiting in keeps on being timed.

Using a pin tools debugtrace shows the following:

 | | < semtimedop+0x000000000023 returns: 0xffffffffffffffff
 | | > __errno_location(0x38000, 0x7ffce278c328, ...)
 | | | > fthread_self(0x38000, 0x7ffce278c328, ...)
 | | | < fthread_self+0x000000000024 returns: 0
 | | < __errno_location+0x000000000010 returns: 0x7f7e930a26a0
 | < sskgpwwait+0x00000000014e returns: 0
 < skgpwwait+0x0000000000e0 returns: 0
 > ksuSdiInProgress(0x19e80, 0x19e80, ...)
 < ksuSdiInProgress+0x000000000035 returns: 0
 > sltrgftime64(0x19e80, 0x19e80, ...)
 | > clock_gettime@plt(0x1, 0x7ffce278c3a0, ...)
 | | > clock_gettime(0x1, 0x7ffce278c3a0, ...)
 | | < clock_gettime+0x000000000069 returns: 0
 | < clock_gettime+0x00000000003a returns: 0
 < sltrgftime64+0x00000000004c returns: 0x19c253f3ff
 > kslwo_getcbk(0xa2, 0xd80fa62, ...)
 < kslwo_getcbk+0x000000000017 returns: 0
 > kgslwait_last_waitctx_time_waited_usecs(0x7f7e930a29a0, 0x6dfd01c0, ...)
 < kgslwait_last_waitctx_time_waited_usecs+0x000000000045 returns: 0x25e5e80
 > kskiorm(0x6d1854a8, 0, ...)
 < kskiorm+0x00000000001e returns: 0
 > kfias_iswtgon_ksfd(0x6d1854a8, 0, ...)
 < kfias_iswtgon_ksfd+0x00000000002b returns: 0
 > kxdbio_has_work(0x7ffce278c3c4, 0x6003d010, ...)
 < kxdbio_has_work+0x000000000027 returns: 0
 > skgpwwait(0x7ffce278c630, 0x7f7e930a7ca0, ...)
 | > kslwait_conv_wait_time(0x2dc6c0, 0x7f7e930a7ca0, ...)
 | < kslwait_conv_wait_time+0x000000000027 returns: 0x2dc6c0
 | > sskgpwwait(0x7ffce278c630, 0x7f7e930a7ca0, ...)
 | | > semtimedop(0x38000, 0x7ffce278c328, ...)
 | | < semtimedop+0x000000000023 returns: 0xffffffffffffffff

And a full stack trace of a log writer worker look like this:

$ pstack $(pgrep lg01)
#0  0x00007feda8eaebda in semtimedop () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000010f9cca6 in sskgpwwait ()
#2  0x0000000010f9a2e8 in skgpwwait ()
#3  0x0000000010a66995 in ksliwat ()
#4  0x0000000010a65d25 in kslwaitctx ()
#5  0x00000000031fb4d0 in kcrfw_slave_queue_remove ()
#6  0x00000000031fad2a in kcrfw_slave_group_main ()
#7  0x00000000012160fa in ksvrdp_int ()
#8  0x000000000370d99a in opirip ()
#9  0x0000000001eb034a in opidrv ()
#10 0x0000000002afedf1 in sou2o ()
#11 0x0000000000d0547a in opimai_real ()
#12 0x0000000002b09b31 in ssthrdmain ()
#13 0x0000000000d05386 in main ()

If you combine the pstack backtrace and the debugtrace information, you see that the idle cycle does not leave the ‘ksliwat’ function, so the wait event is not finished. Quickly looking at the other functions, it’s easy to spot it reads the system clock (sltrgftime64), updates some information (kgslwait_last_waitctx_time_waited_usecs) and then performs some proactive IO checks (kskiorm, kfias_iswtgon_ksfd, kxdbio_has_work) after which it calls the post/wait based functions to setup the semaphore again.

Conclusion so far is the log writer workers do perform a 3 second sleep just like the master log writer, however the wait event ‘LGWR worker group idle’ is not interrupted like ‘rdbms ipc message’ is for the master log writer. This means the wait time for the event for each worker process indicates the last time the worker process actually performed something. A next logical question then is: but what do the log writer worker processes perform? Do they entirely take over the master log writer functionality, or do they work together with the master log writer?

In order to fully understand the next part, it is very beneficial to read up on how the log writer works in ‘single log writer’ mode, where the master log writer handling the idle and work cycle itself:
https://fritshoogland.wordpress.com/2018/02/20/a-look-into-into-oracle-redo-part-4-the-log-writer-null-write/
https://fritshoogland.wordpress.com/2018/02/27/a-look-into-oracle-redo-part-5-the-log-writer-writing/

If you want to perform this investigation yourself, make sure the database is in ‘scalable log writer’ mode, by setting “_use_single_log_writer” to FALSE. This is exactly what I did in order to make sure a log write is done in ‘scalable log writer’ mode.

Now let’s first apply some logic. Above the idle cycle of a log writer worker process is shown. Based on the ‘log writer null write’ blog post, we know that the log writer does advance the LWN and On-disk SCN every 3 seconds. Clearly, the log writer worker process does not do that. So that must mean the master log writer is still performing that function. It would also make very much sense, because it doesn’t matter for scalability if the master log writer performs the function of advancing the LWN and On-disk SCN or a worker process, nothing is waiting on it. Plus, if the master log writer performs most of its functions just like in ‘single log writer’ mode, the change to scalable mode would mean no change for client processes, any committing process must semop() the log writer to start writing.

Let’s look at the relevant debugtrace output of the master log writer in scalable log writer mode:

 | > kcrfw_redo_write_driver(0, 0, ...)
 | | > kcrfw_handle_member_write_errors(0, 0, ...)
 | | < kcrfw_handle_member_write_errors+0x000000000020 returns: 0x600161a0
 | | > kcmgtsf(0, 0, ...)
 | | | > sltrgatime64(0, 0, ...)
 | | | | > sltrgftime64(0, 0, ...)
 | | | | | > clock_gettime@plt(0x1, 0x7fff1fe13010, ...)
 | | | | | | > clock_gettime(0x1, 0x7fff1fe13010, ...)
 | | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | < sltrgftime64+0x00000000004c returns: 0x53747fe42
 | | | < sltrgatime64+0x00000000003e returns: 0x155d4fd
 | | < kcmgtsf+0x00000000032f returns: 0x3a182314
 | | > kcrfw_slave_adaptive_updatemode(0, 0x600161a0, ...)
 | | < kcrfw_slave_adaptive_updatemode+0x000000000080 returns: 0x7efe34d1f760
 | | > kcrfw_defer_write(0, 0x600161a0, ...)
 | | < kcrfw_defer_write+0x000000000038 returns: 0x7efe34d1f760
 | | > kcrfw_slave_queue_find(0, 0x600161a0, ...)
 | | < kcrfw_slave_queue_find+0x0000000000f1 returns: 0
 | | > kcrfw_slave_queue_setpreparing(0, 0x1, ...)
 | | < kcrfw_slave_queue_setpreparing+0x000000000021 returns: 0
 | | > kcrfw_slave_group_switchpic(0, 0x1, ...)
 | | < kcrfw_slave_group_switchpic+0x000000000050 returns: 0x699b4508
 | | > skgstmGetEpochTs(0, 0x1, ...)
 | | | > gettimeofday@plt(0x7fff1fe13070, 0, ...)
 | | | < __vdso_gettimeofday+0x0000000000fe returns: 0
 | | < skgstmGetEpochTs+0x000000000049 returns: 0x20debfd6192e5
 | | > kcsnew3(0x600113b8, 0x7fff1fe13228, ...)
 | | | > kcsnew8(0x600113b8, 0x7fff1fe13070, ...)
 | | | | > kslgetl(0x60049800, 0x1, ...)
 | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | > kslfre(0x60049800, 0x1, ...)
 | | | | < kslfre+0x0000000001e2 returns: 0
 | | | < kcsnew8+0x000000000117 returns: 0
 | | | > ub8_to_kscn_impl(0x66c3c7, 0x7fff1fe13228, ...)
 | | | < ub8_to_kscn_impl+0x000000000031 returns: 0
 | | < kcsnew3+0x00000000006f returns: 0x8000
 | | > ktfwtsm(0x3a182314, 0x7fff1fe13228, ...)
 | | | > kcmgtsf(0x2, 0x7fff1fe13228, ...)
 | | | | > sltrgatime64(0x2, 0x7fff1fe13228, ...)
 | | | | | > sltrgftime64(0x2, 0x7fff1fe13228, ...)
 | | | | | | > clock_gettime@plt(0x1, 0x7fff1fe12fe0, ...)
 | | | | | | | > clock_gettime(0x1, 0x7fff1fe12fe0, ...)
 | | | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | < sltrgftime64+0x00000000004c returns: 0x537484a6d
 | | | | < sltrgatime64+0x00000000003e returns: 0x155d511
 | | | < kcmgtsf+0x0000000001b2 returns: 0x3a182314
 | | | > kcmtdif(0x3a182314, 0x3a182314, ...)
 | | | < kcmtdif+0x00000000001b returns: 0
 | | | > ksl_get_shared_latch_int(0x60050340, 0x6ddb1408, ...)
 | | | < ksl_get_shared_latch_int+0x00000000016b returns: 0x1
 | | <> kslfre(0x60050340, 0x66c3c7, ...)
 | | < kslfre+0x0000000001e2 returns: 0
 | | > kcn_stm_write(0x7fff1fe13228, 0x66c3c7, ...)
 | | | > kstmgetsectick(0x7fff1fe13228, 0x66c3c7, ...)
 | | | < kstmgetsectick+0x00000000003a returns: 0x5ae4c494
 | | | > ksl_get_shared_latch_int(0x6004ee40, 0x6ddb1408, ...)
 | | | < ksl_get_shared_latch_int+0x00000000016b returns: 0x1
 | | <> kslfre(0x6004ee40, 0x2244, ...)
 | | < kslfre+0x0000000001e2 returns: 0
 | | > kcrfw_redo_write_initpic(0x699b4508, 0x7fff1fe13228, ...)
 | | | > kscn_to_ub8_impl(0x7fff1fe13228, 0x7fff1fe13228, ...)
 | | | < kscn_to_ub8_impl+0x00000000003e returns: 0x66c3c7
 | | < kcrfw_redo_write_initpic+0x0000000000dc returns: 0x3a182314
 | | > kscn_to_ub8_impl(0x7fff1fe13228, 0, ...)
 | | < kscn_to_ub8_impl+0x00000000003e returns: 0x66c3c7
 | | > kcrfw_gather_lwn(0x7fff1fe13268, 0x699b4508, ...)
 | | | > kslgetl(0x6abe4538, 0x1, ...)
 | | | < kslgetl+0x00000000012f returns: 0x1
 | | | > kcrfw_gather_strand(0x7fff1fe13268, 0, ...)
 | | | < kcrfw_gather_strand+0x0000000000c2 returns: 0
 | | | > kslfre(0x6abe4538, 0x17d5f, ...)
 | | | < kslfre+0x0000000001e2 returns: 0
 | | | > kslgetl(0x6abe45d8, 0x1, ...)
 | | | < kslgetl+0x00000000012f returns: 0x1
 | | | > kcrfw_gather_strand(0x7fff1fe13268, 0x1, ...)
 | | | < kcrfw_gather_strand+0x0000000000c2 returns: 0
 | | | > kslfre(0x6abe45d8, 0x137, ...)
 | | | < kslfre+0x0000000001e2 returns: 0
 | | < kcrfw_gather_lwn+0x00000000065c returns: 0xffffffff
 | | > krsh_trace(0x1000, 0x200, ...)
 | | < krsh_trace+0x00000000005d returns: 0
 | | > kspgip(0x71e, 0x1, ...)
 | | < kspgip+0x00000000023f returns: 0
 | | > kcrfw_slave_queue_setpreparing(0, 0, ...)
 | | < kcrfw_slave_queue_setpreparing+0x000000000021 returns: 0
 | | > kcrfw_slave_queue_flush_internal(0x1, 0, ...)
 | | < kcrfw_slave_queue_flush_internal+0x0000000000d7 returns: 0x1
 | | > kcrfw_do_null_write(0, 0, ...)
 | | | > kcrfw_slave_phase_batchdo(0, 0, ...)
 | | | | > kcrfw_slave_phase_enter(0, 0x9b, ...)
 | | | | < kcrfw_slave_phase_enter+0x000000000449 returns: 0
 | | | <> kcrfw_slave_phase_exit(0, 0x9b, ...)
 | | | < kcrfw_slave_phase_exit+0x00000000035a returns: 0
 | | | > kcrfw_post(0, 0, ...)
 | | | | > kcrfw_slave_single_getactivegroup(0, 0, ...)
 | | | | < kcrfw_slave_single_getactivegroup+0x000000000047 returns: 0x6a9a0718
 | | | | > kspGetInstType(0x1, 0x1, ...)
 | | | | | > vsnffe_internal(0x19, 0x1, ...)
 | | | | | | > vsnfprd(0x19, 0x1, ...)
 | | | | | | < vsnfprd+0x00000000000f returns: 0x8
 | | | | | | > kfIsASMOn(0x19, 0x1, ...)
 | | | | | | <> kfOsmInstanceSafe(0x19, 0x1, ...)
 | | | | | | < kfOsmInstanceSafe+0x000000000031 returns: 0
 | | | | | < vsnffe_internal+0x0000000000a7 returns: 0
 | | | | | > kspges(0x115, 0x1, ...)
 | | | | | < kspges+0x00000000010f returns: 0
 | | | | < kspGetInstType+0x0000000000b1 returns: 0x1
 | | | | > kcrfw_slave_phase_enter(0x1, 0x9b, ...)
 | | | | < kcrfw_slave_phase_enter+0x00000000006f returns: 0x9b
 | | | | > kcscu8(0x60016290, 0x7fff1fe12f98, ...)
 | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | > kcsaj8(0x60016290, 0x7fff1fe12f38, ...)
 | | | | < kcsaj8+0x0000000000dc returns: 0x1
 | | | | > kcrfw_slave_phase_exit(0x1, 0x9b, ...)
 | | | | < kcrfw_slave_phase_exit+0x00000000008e returns: 0
 | | | | > kslpsemf(0x97, 0, ...)
 | | | | | > ksl_postm_init(0x7fff1fe0ac30, 0x7fff1fe12c50, ...)
 | | | | | < ksl_postm_init+0x00000000002b returns: 0
 | | | | < kslpsemf+0x0000000006b5 returns: 0x1f
 | | | | > kcrfw_slave_barrier_nonmasterwait(0x6a9a0720, 0x4, ...)
 | | | | < kcrfw_slave_barrier_nonmasterwait+0x000000000035 returns: 0x600161a0
 | | | < kcrfw_post+0x000000000c1c returns: 0xd3
 | | < kcrfw_do_null_write+0x0000000000b2 returns: 0xd3
 | < kcrfw_redo_write_driver+0x000000000535 returns: 0xd3

The highlighted functions are extra functions executed when the instance is set to scalable log writer mode, or when adaptive mode has set the instance to scalable log writer mode. This means that the changes between the modes is minimal when there’s no writes, and outside of a few extra functions, the log writer does exactly the same.

The absence of any spectacular changes in the behaviour of the log writer when in scalable log writer mode when there are no writes does hint what the actual changes will be of the scalable mode, which is how writing is handled. In single log writer mode, the most time the log writer is process is likely to spend on is writing the change vectors into the online redologfiles, and maybe, if you have a bad application (!) semop()-ing foreground sessions will be second, if there are a large number of processes committing, because every process needs to be semop()-ed individually. These two functions, along with some other functionality are exactly what the log writer worker processes are doing.

This means that foreground processes do nothing different in scalable log writer mode, they signal (semop) the master log writer, which will investigate the public redo strands, and if the master log writer finds change vectors to write, it will assign log writer worker processes to perform the write, and the log writer worker process will semop() the foreground sessions to indicate the redo has been written when the instance is in post/wait mode, or do not semop() when the instance is in polling mode.

This is the entire function flow of a write when the instance is in scalable log writer mode:

 | > kcrfw_slave_queue_insert(0, 0xd3, ...)
 | | > kcrfw_slave_group_setcurrsize(0, 0, ...)
 | | < kcrfw_slave_group_setcurrsize+0x0000000001d1 returns: 0x1
 | | > _intel_fast_memcpy(0x6a9a05f8, 0x7ffdae335fa0, ...)
 | | <> _intel_fast_memcpy.P(0x6a9a05f8, 0x7ffdae335fa0, ...)
 | | <> __intel_ssse3_rep_memcpy(0x6a9a05f8, 0x7ffdae335fa0, ...)
 | | < __intel_ssse3_rep_memcpy+0x000000002798 returns: 0x6a9a05f8
 | | > kcrfw_slave_group_postall(0, 0xf0, ...)
 | | | > ksvgcls(0, 0xf0, ...)
 | | | < ksvgcls+0x000000000021 returns: 0
 | | | > ksl_post_proc(0x6ddb32f0, 0, ...)
 | | | <> kskpthr(0x6ddb32f0, 0, ...)
 | | | <> kslpsprns(0x6ddb32f0, 0, ...)
 | | | | > ksl_update_post_stats(0x6ddb32f0, 0, ...)
 | | | | | > dbgtTrcData_int(0x7f464c0676c0, 0x2050031, ...)
 | | | | | | > dbgtBucketRedirect(0x7f464c0676c0, 0x7ffdae335338, ...)
 | | | | | | < dbgtBucketRedirect+0x000000000050 returns: 0x1
 | | | | | | > dbgtIncInMemTrcRedirect(0x7f464c0676c0, 0x6fa, ...)
 | | | | | | < dbgtIncInMemTrcRedirect+0x000000000035 returns: 0x1
 | | | | | | > skgstmGetEpochTs(0x7f464c0676c0, 0x6fa, ...)
 | | | | | | | > gettimeofday@plt(0x7ffdae334e40, 0, ...)
 | | | | | | | < __vdso_gettimeofday+0x0000000000fe returns: 0
 | | | | | | < skgstmGetEpochTs+0x000000000049 returns: 0x20e067375b55d
 | | | | | | > dbgtrRecAllocate(0x7f464c0676c0, 0x7ffdae3352e0, ...)
 | | | | | | | > dbgtrPrepareWrite(0x7f464c0676c0, 0x65accba0, ...)
 | | | | | | | < dbgtrPrepareWrite+0x00000000011c returns: 0x4
 | | | | | | < dbgtrRecAllocate+0x000000000144 returns: 0x1
 | | | | | | > _intel_fast_memcpy(0x65acda30, 0x7ffdae3353d8, ...)
 | | | | | | <> _intel_fast_memcpy.P(0x65acda30, 0x7ffdae3353d8, ...)
 | | | | | | <> __intel_ssse3_rep_memcpy(0x65acda30, 0x7ffdae3353d8, ...)
 | | | | | | < __intel_ssse3_rep_memcpy+0x000000002030 returns: 0x65acda30
 | | | | | | > dbgtrRecEndSegment(0x7f464c0676c0, 0x7ffdae3352e0, ...)
 | | | | | | < dbgtrRecEndSegment+0x00000000011c returns: 0x77c000a4
 | | | | | < dbgtTrcData_int+0x000000000323 returns: 0x77c000a4
 | | | | < ksl_update_post_stats+0x00000000024f returns: 0x77c000a4
 | | | | > skgpwpost(0x7ffdae335480, 0x7f464c0acca0, ...)
 | | | | <> sskgpwpost(0x7ffdae335480, 0x7f464c0acca0, ...)
 | | | | | > semop@plt(0xc0000, 0x7ffdae335410, ...)
 | | | | | < semop+0x00000000000f returns: 0
 | | | | < sskgpwpost+0x00000000009a returns: 0x1
 | | | < kslpsprns+0x0000000001c3 returns: 0
 | | < kcrfw_slave_group_postall+0x0000000000a8 returns: 0
 | < kcrfw_slave_queue_insert+0x0000000001b6 returns: 0x667bc540

After the instance has established there are change vectors in kcrfw_gather_lwn, in single log writer mode, the function kcrfw_redo_write is called, which will call kcrfw_do_write which handles the writing, and kslpslf to semop any waiting processes among other things. Now in scalable log writer mode, kcrfw_slave_queue_insert is called which assigns work to worker processes, and then kcrfw_slave_group_postall is called to semop one or more worker processes.

The worker processes are sleeping on a semaphore, and if a process gets signalled, it exits the kcrfw_slave_queue_remove function, ends the wait event, and calls kcrfw_redo_write, just like the master log writer process would call in single log writer mode, which includes doing the write (kcrfw_do_write) and posting the foregrounds (kslpslf), exactly all the functions.

Conclusion.
The adaptive scalable log writer processes function has been silently introduced with Oracle 12, although a lot of the used functionality has been available more or less in earlier versions. It is a fully automatic feature which will turn itself on and off based on heuristics. The purpose of this article is to explain how it works and what it is doing. Essentially, all the functionality that surrounds a log writer write has been moved to a worker process, which means the work can be done in parallel with multiple processes, whilst all the work outside of the work around the write, which is not performance critical, is left with the master log writer.

%d bloggers like this: