
Recently I was discussing some IO-related waits with friends. The wait in question was ‘kfk: async disk IO’. This wait was always visible in Oracle version 11.2.0.1 and seems to be gone in version 11.2.0.2 and above. Here is the result of some investigation into that.

First: the wait event is not gone in version 11.2.0.2 and above, which is simple to prove (this is a version 11.2.0.3 database):

SYS@v11203 AS SYSDBA> select name, parameter1, parameter2, parameter3, wait_class from v$event_name where name like 'kfk: async disk IO';

NAME		     PARAMETER1 PARAMETER2 PARAMETER3 WAIT_CLASS
-------------------- ---------- ---------- ---------- ----------------------------------------------------------------
kfk: async disk IO   count	intr	   timeout    System I/O

What is interesting is that the wait class is ‘System I/O’. I don’t know the official definition of the ‘System I/O’ wait class, but to me it suggests something a background process waits for, not my foreground process. But I could be wrong…

Let’s look at an excerpt of a tracefile of an asynchronous full segment scan in Oracle version 11.2.0.1. This host is running in a VM on OL6.3 x64:

PARSING IN CURSOR #3 len=23 dep=0 uid=85 oct=3 lid=85 tim=1356620409310181 hv=1020534364 ad='7f1b14a0' sqlid='94dwfa8yd87kw'
select count(*) from t2
END OF STMT
PARSE #3:c=0,e=120,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356620409310180
EXEC #3:c=0,e=63,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356620409310297
WAIT #3: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=-1 tim=1356620409310351
WAIT #3: nam='Disk file operations I/O' ela= 9 FileOperation=2 fileno=5 filetype=2 obj#=-1 tim=1356620409311203
WAIT #3: nam='Disk file operations I/O' ela= 238 FileOperation=2 fileno=0 filetype=15 obj#=73426 tim=1356620409312218
WAIT #3: nam='kfk: async disk IO' ela= 14 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409313703
WAIT #3: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409314449
WAIT #3: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409315169
WAIT #3: nam='kfk: async disk IO' ela= 5 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409315850
WAIT #3: nam='Disk file operations I/O' ela= 42 FileOperation=2 fileno=0 filetype=15 obj#=73426 tim=1356620409316451
WAIT #3: nam='kfk: async disk IO' ela= 33 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409317903
WAIT #3: nam='kfk: async disk IO' ela= 403 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409320529
WAIT #3: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409321950
WAIT #3: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409323627
WAIT #3: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409325424
WAIT #3: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409327121
WAIT #3: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409329153
WAIT #3: nam='kfk: async disk IO' ela= 7 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409330861
WAIT #3: nam='kfk: async disk IO' ela= 8 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409332534
WAIT #3: nam='kfk: async disk IO' ela= 6 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356620409334179

What is quite clear to see is that the database is doing some work which is instrumented in the wait interface (‘Disk file operations I/O’), but also that the event ‘kfk: async disk IO’ occurs very frequently. It does NOT take an enormous amount of time (ela < 10 microseconds most of the time).

Let's dig a little deeper. Most wait events are instrumentation of calls to the operating system to do something. Let's use the Linux 'strace' utility to list the waits together with the system calls. This is done with 'strace -e write=all -p PID' (the write=all option makes strace dump the buffer contents of the write() calls, so the wait lines written to the trace file are visible); here is a fragment:

io_submit(139892671860736, 1, {{0x7f3b4b5851e0, 0, 0, 0, 11}}) = 1
io_submit(139892671860736, 1, {{0x7f3b4b585420, 0, 0, 0, 12}}) = 1
io_getevents(139892671860736, 2, 128, {{0x7f3b4b5851e0, 0x7f3b4b5851e0, 122880, 0}}, {0, 0}) = 1
write(9, "WAIT #3: nam='kfk: async disk IO"..., 107) = 107
 | 00000  57 41 49 54 20 23 33 3a  20 6e 61 6d 3d 27 6b 66  WAIT #3:  nam='kf |
 | 00010  6b 3a 20 61 73 79 6e 63  20 64 69 73 6b 20 49 4f  k: async  disk IO |
 | 00020  27 20 65 6c 61 3d 20 35  33 20 63 6f 75 6e 74 3d  ' ela= 5 3 count= |
 | 00030  31 20 69 6e 74 72 3d 30  20 74 69 6d 65 6f 75 74  1 intr=0  timeout |
 | 00040  3d 34 32 39 34 39 36 37  32 39 35 20 6f 62 6a 23  =4294967 295 obj# |
 | 00050  3d 37 33 34 32 36 20 74  69 6d 3d 31 33 35 36 36  =73426 t im=13566 |
 | 00060  32 31 31 31 30 32 35 38  30 31 34                 21110258 014      |
write(9, "\n", 1)                       = 1
 | 00000  0a                                                .                 |
io_getevents(139892671860736, 1, 128, {{0x7f3b4b585420, 0x7f3b4b585420, 1032192, 0}}, {0, 0}) = 1
io_submit(139892671860736, 1, {{0x7f3b4b5851e0, 0, 0, 0, 11}}) = 1
write(9, "WAIT #3: nam='kfk: async disk IO"..., 106) = 106
 | 00000  57 41 49 54 20 23 33 3a  20 6e 61 6d 3d 27 6b 66  WAIT #3:  nam='kf |
 | 00010  6b 3a 20 61 73 79 6e 63  20 64 69 73 6b 20 49 4f  k: async  disk IO |
 | 00020  27 20 65 6c 61 3d 20 36  20 63 6f 75 6e 74 3d 31  ' ela= 6  count=1 |
 | 00030  20 69 6e 74 72 3d 30 20  74 69 6d 65 6f 75 74 3d   intr=0  timeout= |
 | 00040  34 32 39 34 39 36 37 32  39 35 20 6f 62 6a 23 3d  42949672 95 obj#= |
 | 00050  37 33 34 32 36 20 74 69  6d 3d 31 33 35 36 36 32  73426 ti m=135662 |
 | 00060  31 31 31 30 32 35 39 37  35 39                    11102597 59       |
write(9, "\n", 1)                       = 1
 | 00000  0a                                                .                 |
io_getevents(139892671860736, 1, 128, {{0x7f3b4b5851e0, 0x7f3b4b5851e0, 1032192, 0}}, {0, 0}) = 1
io_submit(139892671860736, 1, {{0x7f3b4b585420, 0, 0, 0, 12}}) = 1
write(9, "WAIT #3: nam='kfk: async disk IO"..., 106) = 106
 | 00000  57 41 49 54 20 23 33 3a  20 6e 61 6d 3d 27 6b 66  WAIT #3:  nam='kf |
 | 00010  6b 3a 20 61 73 79 6e 63  20 64 69 73 6b 20 49 4f  k: async  disk IO |
 | 00020  27 20 65 6c 61 3d 20 39  20 63 6f 75 6e 74 3d 31  ' ela= 9  count=1 |
 | 00030  20 69 6e 74 72 3d 30 20  74 69 6d 65 6f 75 74 3d   intr=0  timeout= |
 | 00040  34 32 39 34 39 36 37 32  39 35 20 6f 62 6a 23 3d  42949672 95 obj#= |
 | 00050  37 33 34 32 36 20 74 69  6d 3d 31 33 35 36 36 32  73426 ti m=135662 |
 | 00060  31 31 31 30 32 36 32 31  37 38                    11102621 78       |

What is interesting here is that the ‘kfk: async disk IO’ wait line is written after both the io_getevents() and the io_submit() calls. (With the io_submit() call one or more IO requests are dispatched to the operating system; with the io_getevents() call the operating system’s AIO completion queue is examined to see whether any requests are ready.) So we can conclude that ‘System I/O’ is not appropriate (by my definition): this is something my foreground process apparently is waiting for. We can also conclude that this event is related to IO. But what does it mean?
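
For reference, this is roughly how these two libaio calls are used from C. This is a minimal sketch of my own, not Oracle code: the file name, buffer size and offset are arbitrary, error handling is kept to a minimum, and a real direct path read would typically add O_DIRECT and aligned buffers, which I left out. Compile with ‘gcc -o aio_peek aio_peek.c -laio’:

#include <libaio.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

int main(int argc, char **argv)
{
	io_context_t ctx;
	struct iocb cb, *cbs[1] = { &cb };
	struct io_event ev;
	struct timespec zero = { 0, 0 };   /* timeout {0,0}: do not block */
	char *buf = malloc(4096);
	int fd, n;

	memset(&ctx, 0, sizeof(ctx));
	if (io_setup(128, &ctx) != 0) { fprintf(stderr, "io_setup failed\n"); return 1; }

	fd = open(argc > 1 ? argv[1] : "/etc/hosts", O_RDONLY);
	if (fd < 0) { perror("open"); return 1; }

	/* prepare a read of 4096 bytes at offset 0 */
	io_prep_pread(&cb, fd, buf, 4096, 0);

	/* io_submit(): hand the request to the kernel and return immediately */
	if (io_submit(ctx, 1, cbs) != 1) { fprintf(stderr, "io_submit failed\n"); return 1; }

	/* io_getevents() with timeout {0,0}: peek at the completion queue and
	   return right away, whether or not the request has completed */
	n = io_getevents(ctx, 0, 1, &ev, &zero);
	printf("non-blocking peek reaped %d event(s)\n", n);

	/* io_getevents() with min_nr=1 and timeout=NULL: block until at least
	   one request has completed */
	if (n == 0) {
		n = io_getevents(ctx, 1, 1, &ev, NULL);
		printf("blocking call reaped %d event(s)\n", n);
	}

	io_destroy(ctx);
	return 0;
}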

In order to dig deeper, we need to know what the database is doing internally. The most obvious way is to read the source. But it’s not possible to get the Oracle database source code if you’re not working for Oracle (and probably have explicitly been granted access). There are a number of other ways to see more. I have written about ‘perf’ before, but gdb (the GNU debugger) can also be used.

In order to debug a running Oracle database process, attach to it with the debugger (as root) using ‘gdb -p PID’. An equivalent of the strace session above, looking for io_submit() and io_getevents() calls, is:

# gdb -p 14122
GNU gdb (GDB) Red Hat Enterprise Linux (7.2-56.el6)
...
0x0000003f38a0e530 in __read_nocancel () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)
(gdb) 

Now enter the following commands (the ‘(gdb)’ prompt and the ‘>’ continuation prompt show which lines are the commands):

(gdb) set pagination off
(gdb) break io_submit
Breakpoint 1 at 0x3f38200660: file io_submit.c, line 23.
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>silent
>frame
>continue
>end
(gdb) break 'io_getevents@plt'
Breakpoint 2 at 0x9d1a58
(gdb) commands
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>silent
>frame
>continue
>end
(gdb) continue
Continuing.

And run the full segment scan again:

#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x00000000009d1a58 in io_getevents@plt ()

Now we need to know which functions Oracle uses for registering waits; for that we can use the tracefile generated by SQL trace earlier. If we issue the full scan again, the first line written to the tracefile of our current session will be the ending of the wait ‘SQL*Net message from client’, and the system call Oracle uses to write this line is write(). Let’s clean up the current breakpoints and set a breakpoint on the write call (stop the debugger’s current action using CTRL-C):

(gdb) delete
Delete all breakpoints? (y or n) y
(gdb) break write
Breakpoint 3 at 0x3f38a0e4c0: file ../sysdeps/unix/syscall-template.S, line 82. (3 locations)
(gdb) c
Continuing.

(‘c’ is an abbreviation of ‘continue’)
And run the scan again.
Because of the breakpoint set, gdb will break execution when the write() call is issued:

Breakpoint 3, write () at ../sysdeps/unix/syscall-template.S:82
82	T_PSEUDO (SYSCALL_SYMBOL, SYSCALL_NAME, SYSCALL_NARGS)

And generate a backtrace (a list of all the functions which were called to reach the write() call), with the ‘backtrace’ command (‘bt’):

(gdb) bt
#0  write () at ../sysdeps/unix/syscall-template.S:82
#1  0x00000000086014dd in sdbgrfuwf_write_file ()
#2  0x0000000008601426 in sdbgrfwf_write_file ()
#3  0x00000000085caafb in dbgtfdFileWrite ()
#4  0x00000000085ca685 in dbgtfdFileAccessCbk ()
#5  0x00000000085c9d6d in dbgtfPutStr ()
#6  0x0000000001a8d93a in dbktWriteTimestamp ()
#7  0x0000000001a8d6b5 in __PGOSF61_dbktWriteNfy ()
#8  0x00000000085caa06 in dbgtfdFileWrite ()
#9  0x00000000085ca685 in dbgtfdFileAccessCbk ()
#10 0x00000000085c9d6d in dbgtfPutStr ()
#11 0x0000000001a8e291 in dbktPri ()
#12 0x00000000009f143e in ksdwrf ()
#13 0x00000000009f1d8f in ksdwrfn ()
#14 0x0000000005abf0e1 in kxstTraceWait ()
#15 0x000000000821e88d in kslwtectx ()
#16 0x00000000083c2a79 in __PGOSF24_opikndf2 ()
#17 0x000000000143b790 in opitsk ()
#18 0x00000000014406da in opiino ()
#19 0x00000000083c54fd in opiodr ()
#20 0x0000000001437b60 in opidrv ()
#21 0x00000000018aac97 in sou2o ()
#22 0x00000000009d3ef1 in opimai_real ()
#23 0x00000000018aff26 in ssthrdmain ()
#24 0x00000000009d3e5d in main ()

What is interesting to see here is the function at frame 15, kslwtectx(). We know this write reports the end of waiting on ‘SQL*Net message from client’. This function name probably stands for ‘Kernel Service Layer WaiT End ConTeXt’ (an educated guess; I’ve had a little help from Tanel Poder in finding the Oracle wait interface functions). The function which marks the start of a wait is called ‘kslwtbctx’, and the function which ends a wait is called ‘kslwtectx’. Let’s put a break with an automatic continue on io_submit, io_getevents, kslwtbctx and kslwtectx:

(gdb) delete
Delete all breakpoints? (y or n) y
(gdb) break io_submit
Breakpoint 4 at 0x3f38200660
(gdb) commands
Type commands for breakpoint(s) 4, one per line.
End with a line saying just "end".
>silent
>f
>c
>end
(gdb) break 'io_getevents@plt'
Breakpoint 5 at 0x9d1a58
(gdb) commands
Type commands for breakpoint(s) 5, one per line.
End with a line saying just "end".
>silent
>f
>c
>end
(gdb) break kslwtbctx
Breakpoint 6 at 0x8217fda
(gdb) commands
Type commands for breakpoint(s) 6, one per line.
End with a line saying just "end".
>silent
>f
>c
>end
(gdb) break kslwtectx
Breakpoint 7 at 0x821e3d8
(gdb) commands
Type commands for breakpoint(s) 7, one per line.
End with a line saying just "end".
>silent
>f
>c
>end
(gdb) c
Continuing.

Now run the full scan again; this is what is seen in the debugger:

#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()

This goes on until the IO calls end; the foreground then does some other work, finishing with the start of the wait on ‘SQL*Net message from client’:

#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()

Okay, back to ‘kfk: async disk IO’. We see that the io_getevents() call ALWAYS has a kslwtbctx() call before it, and always a kslwtectx() call after it. Please mind we are on 11.2.0.1. This strongly suggests that ‘kfk: async disk IO’ is the instrumentation of the io_getevents() call, or at least that io_getevents() is part of this wait event.

Let’s make the io_getevents() call take longer and see if that is reflected in the wait time, to test the hypothesis made above. This can be done using the debugger’s ‘catch’ command:

(gdb) delete
Delete all breakpoints? (y or n) y
(gdb) catch syscall io_getevents
Catchpoint 8 (syscall 'io_getevents' [208])
(gdb) commands
Type commands for breakpoint(s) 8, one per line.
End with a line saying just "end".
>silent
>shell sleep 0.01
>c
>end
(gdb) c
Continuing.

Please mind it’s ‘io_getevents’ here instead of ‘io_getevents@plt’, because the catch command looks the syscall up in a header file, instead of in the symbol table of the executable.

Now run the full scan again, with SQL trace turned on at level 8 to show the waits. Previously we saw waits with an elapsed time mostly less than 10 microseconds. Here is a snippet from the trace:

PARSING IN CURSOR #3 len=23 dep=0 uid=85 oct=3 lid=85 tim=1356626115353145 hv=1020534364 ad='7f1b14a0' sqlid='94dwfa8yd87kw'
select count(*) from t2
END OF STMT
PARSE #3:c=0,e=203,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356626115353116
EXEC #3:c=1000,e=80,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356626115355176
WAIT #3: nam='SQL*Net message to client' ela= 30 driver id=1413697536 #bytes=1 p3=0 obj#=73426 tim=1356626115355832
WAIT #3: nam='kfk: async disk IO' ela= 26404 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115384177
WAIT #3: nam='kfk: async disk IO' ela= 25050 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115411677
WAIT #3: nam='kfk: async disk IO' ela= 24885 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115437990
WAIT #3: nam='kfk: async disk IO' ela= 25002 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115464244

*** 2012-12-27 17:35:15.492
WAIT #3: nam='kfk: async disk IO' ela= 25192 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115492891
WAIT #3: nam='kfk: async disk IO' ela= 25175 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115521419
WAIT #3: nam='kfk: async disk IO' ela= 24804 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115549124
WAIT #3: nam='kfk: async disk IO' ela= 24980 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115576955
WAIT #3: nam='kfk: async disk IO' ela= 24988 count=1 intr=0 timeout=4294967295 obj#=73426 tim=1356626115604879

A little explanation here: the debugger triggers the syscall catchpoint both on entering and on leaving the syscall, so the 0.01 second sleep runs twice per call. The elapsed times now being somewhat more than 20ms (20,000 microseconds), instead of a few microseconds, gives a fair indication the wait is related to io_getevents(). We can also check in the debugger:

(gdb) info break
Num     Type           Disp Enb Address            What
8       catchpoint     keep y                      syscall "io_getevents" 
	catchpoint already hit 174 times
        silent
        shell sleep 0.01
        c

Yes, it has been hit 174 times. So ‘kfk: async disk IO’ is the instrumentation of the io_getevents() call, or of a larger set of actions of which io_getevents() is a part.

Okay, now that we know that, let’s switch to Oracle version 11.2.0.3…
This is what I find in the tracefile when doing EXACTLY the same (full scan):

PARSING IN CURSOR #140382932318408 len=23 dep=0 uid=84 oct=3 lid=84 tim=1356626776003874 hv=1020534364 ad='7f491298' sqlid='94dwfa8yd87kw'
select count(*) from t2
END OF STMT
PARSE #140382932318408:c=0,e=41,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356626776003873
EXEC #140382932318408:c=0,e=31,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356626776003962
WAIT #140382932318408: nam='SQL*Net message to client' ela= 2 driver id=1413697536 #bytes=1 p3=0 obj#=75579 tim=1356626776004009
WAIT #140382932318408: nam='direct path read' ela= 340 file number=5 first dba=28418 block cnt=126 obj#=75579 tim=1356626776045733
FETCH #140382932318408:c=164975,e=188719,p=20941,cr=20944,cu=0,mis=0,r=1,dep=0,og=1,plh=3321871023,tim=1356626776192752
STAT #140382932318408 id=1 cnt=1 pid=0 pos=1 obj=0 op='SORT AGGREGATE (cr=20944 pr=20941 pw=0 time=188708 us)'
STAT #140382932318408 id=2 cnt=1000000 pid=1 pos=1 obj=75579 op='TABLE ACCESS FULL T2 (cr=20944 pr=20941 pw=0 time=68726 us cost=5738 siz=0 card=1000000)'

Okay. The first thing to notice is that the ‘kfk: async disk IO’ events are gone. We see the PARSE and EXEC lines, the SQL*Net message, a single ‘direct path read’ wait, and then the FETCH line. This is completely different behavior from version 11.2.0.1!
What is happening? I think the only way to understand more about this is to run the debugger again with notification of io_submit, io_getevents, kslwtbctx and kslwtectx:

#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()

That’s odd! There are no calls to the code locations kslwtbctx and kslwtectx during IO processing! So, despite ‘kfk: async disk IO’ still existing as a wait event, it doesn’t seem to instrument io_getevents() anymore. What if we make io_getevents() take longer (using the same syscall catchpoint as before, with a 0.01 second sleep)?

PARSE #140382932318408:c=0,e=168,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356627973832834
EXEC #140382932318408:c=0,e=68,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356627973833385
WAIT #140382932318408: nam='SQL*Net message to client' ela= 671 driver id=1413697536 #bytes=1 p3=0 obj#=75579 tim=1356627973835367
WAIT #140382932318408: nam='asynch descriptor resize' ela= 680 outstanding #aio=0 current aio limit=152 new aio limit=182 obj#=75579 tim=1356627973837361

*** 2012-12-27 18:06:16.440
FETCH #140382932318408:c=185972,e=2604507,p=20941,cr=20944,cu=0,mis=0,r=1,dep=0,og=1,plh=3321871023,tim=1356627976440376

No: the time in the FETCH line increases (elapsed e=2604507 versus cpu c=185972, so 2418535 microseconds are not spent on CPU), but hardly any of that time is instrumented by wait events. A reasonable conclusion is that the ‘kfk: async disk IO’ wait event behaves differently in this version (11.2.0.3). I also tried slowing down the io_submit() call in the same way, with the same result: the time in the FETCH line increases, but the database does not write a wait line for it.

What if I slow down the disk? This can be done using cgroups, as described in this post. I’ve set both my ASM disks to 1 IO per second. That should result in waits! This is the resulting trace output:

PARSE #140382932318408:c=0,e=20,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356633964787836
EXEC #140382932318408:c=0,e=64,p=0,cr=0,cu=0,mis=0,r=0,dep=0,og=1,plh=3321871023,tim=1356633964788256
WAIT #140382932318408: nam='SQL*Net message to client' ela= 703 driver id=1413697536 #bytes=1 p3=0 obj#=75579 tim=1356633964790122
WAIT #140382932318408: nam='asynch descriptor resize' ela= 699 outstanding #aio=0 current aio limit=238 new aio limit=264 obj#=75579 tim=1356633964792014
WAIT #140382932318408: nam='direct path read' ela= 497938 file number=5 first dba=23939 block cnt=13 obj#=75579 tim=1356633965295483

*** 2012-12-27 19:46:05.795
WAIT #140382932318408: nam='direct path read' ela= 495498 file number=5 first dba=23953 block cnt=15 obj#=75579 tim=1356633965795382

*** 2012-12-27 19:46:06.295
WAIT #140382932318408: nam='direct path read' ela= 495890 file number=5 first dba=23969 block cnt=15 obj#=75579 tim=1356633966295208
WAIT #140382932318408: nam='direct path read' ela= 495889 file number=5 first dba=23985 block cnt=15 obj#=75579 tim=1356633966795127

So, if I slow the IO down, I get ‘direct path read’ wait events. How does that look when I use the break/continue technique on this configuration?

#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000000a09030 in io_getevents@plt ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()
#0  0x0000000008fa1334 in kslwtectx ()
#0  0x0000000008f9a652 in kslwtbctx ()

The interesting part is the io_getevents() calls: consistently 4 calls are done before kslwtbctx is called, and another one or two calls are done before kslwtectx is called and wait time is registered (see above). Interesting, but no ‘kfk: async disk IO’.

Now let’s look at the same output when Oracle 11.2.0.1 is throttled:

#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000003f38200660 in io_submit () from /lib64/libaio.so.1
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x00000000009d1a58 in io_getevents@plt ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()
#0  0x000000000821e3d8 in kslwtectx ()
#0  0x0000000008217fda in kslwtbctx ()

This looks remarkably similar to the 11.2.0.3 version!

Conclusion

It seems the database issues a number of io_getevents() calls after asynchronous IO requests are submitted with io_submit(); these calls return immediately (they are ‘non-blocking’). They just peek at the completion queue: if an IO is available in the completion queue, the process processes that IO. The number of non-blocking calls is 3 with version 11.2.0.1 and 4 with 11.2.0.3 (please mind the ‘slot’ mechanism, which controls the number of simultaneous AIO requests, is probably in play here too). If these calls do not return any completed IO request, the wait interface is called (kslwtbctx()) and a blocking io_getevents() call is issued (one which waits for at least 1 request to be returned), for which the wait time is visible as the event ‘direct path read’.

It also seems that, with Oracle 11.2.0.1, the first non-blocking io_getevents() call is instrumented by the ‘kfk: async disk IO’ event, for reasons unknown to me. This instrumentation is not present in version 11.2.0.3.
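
To summarize the pattern described above, here is a schematic C fragment as I reconstruct it from the gdb and strace output; it is not Oracle source. The functions begin_wait() and end_wait() are hypothetical stand-ins for what kslwtbctx() and kslwtectx() appear to do (start and stop timing a wait), and the peek counts and timeouts are only what these tests suggest. It compiles against libaio (gcc -c), but it is meant to be read rather than run:

#include <libaio.h>
#include <stdio.h>
#include <time.h>

/* hypothetical stand-ins for kslwtbctx()/kslwtectx() */
static void begin_wait(const char *event) { printf("begin wait: %s\n", event); }
static void end_wait(const char *event)   { printf("end wait:   %s\n", event); }

int reap_after_submit(io_context_t ctx, struct io_event *events, long slots)
{
	struct timespec zero = { 0, 0 };
	int i, n = 0;

	/* A few non-blocking peeks at the completion queue: timeout {0,0} means
	   each call returns immediately, whether or not an IO has completed.
	   These tests suggest 3 peeks on 11.2.0.1 and 4 on 11.2.0.3; on 11.2.0.1
	   the first peek is itself bracketed by kslwtbctx()/kslwtectx() and shows
	   up as 'kfk: async disk IO', on 11.2.0.3 none of them are instrumented. */
	for (i = 0; i < 4 && n == 0; i++)
		n = io_getevents(ctx, 0, slots, events, &zero);

	if (n == 0) {
		/* Nothing completed yet: only now is the wait interface entered and a
		   blocking io_getevents() issued (shown here with no timeout for
		   simplicity); the elapsed time becomes the 'direct path read' wait. */
		begin_wait("direct path read");                  /* kslwtbctx() */
		n = io_getevents(ctx, 1, slots, events, NULL);
		end_wait("direct path read");                    /* kslwtectx() */
	}
	return n;
}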