Doing synchronous IO on ASM
One of the advantages of ASM is doing non (operating system) buffered IO (also known as ‘DIO’ or Direct IO), and doing asynchronous IO (also known as ‘AIO’ or ‘asynch IO’).
This is an excerpt from “ASM 10gr2 Best Practices“:
The database file level access (read/write) of ASM files is similar to pre-10g, except that any database file name that begins with a “+”, will automatically be handled and managed using the ASM code path. However, with ASM files, the database file access inherently has the characteristics of raw devices; i.e., un- buffered (direct IO) with kernelized asynchronous IO (KAIO).
What does this quote say?
- If a database file name begins with a “+”, it will be automatically be handled and managed using the ASM code path.
I think this is true.
- With ASM files (I read that as ‘database files in ASM’) the database file access inherently has the characteristics of raw device; i.e., un-buffered (direct IO)
I am confident this is true too: there is no filesystem involved during the reading of ASM file, so it can’t be buffered in a filesystem cache.
- with kernelized asynchronous IO (KAIO)
This is mostly true. If asynchronous IO is setup correctly, most IO’s are done asynchronous, but some are synchronous.
How do you measure if asynchronous IO is being used?
The note on MOS (My Oracle Support) on asynchronous IO (‘How To Check if Asynchronous I/O is Working On Linux, Doc ID 237299.1) has changed. It used to point to the linux slabinfo (visible in /proc/slabinfo; these are kernel “slabs”), to look for allocations of “kiocb” (kernel IO callback) and “kioctx” (kernel IO context) which would allow someone to see if *some* process has initiated the usage of these slabs, which indicates asynchronous IO is initialised. Mind “some process”: this doesn’t have to be the database. Not an extremely reliable way of telling asynchronous IO is used.
Now, the note also includes more accurate ways to detect the kind of IO’s done:
-An example of a ‘strace’ of a process (the databasewriter, dbwr) doing asynchronous IO, which is (easy) visible by the asynchronous IO system calls: io_submit(), which issues an IO request, and io_getevents(), which reads the completion queue to verify the status of submitted IO’s.
-An example of a ‘strace’ of the same process doing the same calls using synchronous IO, which also is easy visible by the synchronous write call: pwrite().
When ASMLib is used, it gets a little more difficult: processes use synchronous IO (read()) to issue calls to ASMLib meta-devices, which do asynchronous IO, depending on Oracle database settings, on the behalf of the calling process. Please mind I haven’t investigated this more in-depth, this is what the metalink document says!
So: the IO method is visible by tracing the systemcalls of a process:
io_submit, io_getevents = asynchronous IO
pread, pwrite = synchronous IO
disk_asynch_io
The usage of asynchronous IO depends on the database parameter ‘disk_asynch_io’. If set to ‘false’, the database uses synchronous IO, even when ASM is used. So the above mentioned note is NOT true. With ASM and ‘disk_asynch_io’ set to ‘false’, it means the IO calls will be unbuffered, but synchronous. I have no databases on raw devices, but I am confident it will behave the same.
But whilst disk_asynch_io=false always means the IO is synchronous, when set to ‘true’ it does not mean asynchronous IO is always used.
filesystemio_options
By default, the parameter ‘filesystemio_options’ is empty after a database is created. On linux (2.6 kernel) both 32 and 64 bit, a database (I’ve tested 11.2.0.1, but assume it’s the same for lower versions) is doing synchronous IO if it’s using the ‘ext3’ filesystem. In order to activate unbuffered (direct) IO, the parameter filesystemio_options needs to be set to ‘directio’, in order to use asynchronous IO it needs to be set to ‘asynch’, in order to use both use ‘setall’. To list all options: set it to ‘none’ to disable both DIO and AIO.
If a database is created which uses ASM for storage, asynchronous IO is done (and direct IO, by design) whilst the parameter ‘filesystemio_options’ is empty.
Mostly asynchronous
I’ve mentioned ‘mostly asynchronous’ a number of times. What does that mean? In the current version (11.2.0.1) of the database, physical reads of data dictionary objects are ALWAYS done synchronously, regardless of options set and storage types.
Below samples done on a VM in VMWare fushion, OEL5u5 x64, database 11.2.0.1.
1. Database on ext3 filesystem, filesystemio_options=none (or empty)
1.a. logon to the database, list the PID of the server process
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jul 21 16:24:26 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select spid from v$session a, v$process b where a.paddr=b.addr and a.sid = (select distinct sid from v$mystat);
SPID
------------------------
6775
1.b. open a second session (on the database server), and start a strace:
$ strace -cp 6775
Process 6775 attached - interrupt to quit
1.c. switch back to the sqlplus session, and execute ‘select count(*) from dba_extents’ and exit:
SQL> select count(*) from dba_extents;
COUNT(*)
----------
7963
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
1.d. now switch to the strace session, it will summarise the systemcalls:
Process 7165 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
94.38 1.550058 304 5091 pread
2.57 0.042190 0 295641 getrusage
2.52 0.041446 364 114 munmap
0.45 0.007470 22 346 mmap
0.07 0.001130 0 10408 times
It tells us we spend 1.55 seconds on pread systemcalls. This is expected. But: this is synchronous IO. I guess most Oracle databases on linux just run synchronous and buffered.
2. Database on ext3 filesystem, filesystemio_options=setall
2.a logon to the database, list the PID of the server process
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jul 21 16:24:26 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> select spid from v$session a, v$process b where a.paddr=b.addr and a.sid = (select distinct sid from v$mystat);
SPID
------------------------
7586
2.b. open a second session (on the database server), and start a strace:
$ strace -cp 7586
Process 7586 attached - interrupt to quit
2.c. switch back to the sqlplus session, and execute ‘select count(*) from dba_extents’ and exit:
SQL> select count(*) from dba_extents;
COUNT(*)
----------
7963
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
2.d. now switch to the strace session for the summary:
Process 7586 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
93.42 1.412650 278 5089 pread
3.26 0.049300 0 295641 getrusage
2.99 0.045252 393 115 munmap
0.23 0.003473 10 344 mmap
0.10 0.001463 0 10408 times
This tells us we still did pread(), alias synchronous IO calls, despite filesystemio_options being set to ‘setall’.
3. Reading a regular table.
3.a. login to the database and create a table:
$ sqlplus / as sysdba
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jul 21 16:48:54 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
SQL> create table test as select * from dba_source;
Table created.
3.b. list PID
SQL> select spid from v$session a, v$process b where a.paddr=b.addr and a.sid = (select distinct sid from v$mystat);
SPID
------------------------
15716
3.c. start a strace session in another window:
$ strace -cp 15716
Process 15716 attached - interrupt to quit
3.d. issue ‘select count(*) from test’ in the sqlplus session, and exit:
SQL> select count(*) from test;
COUNT(*)
----------
623146
SQL> exit
Disconnected from Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, OLAP, Data Mining and Real Application Testing options
3.e. now look at the strace session for the summary:
Process 14284 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
57.39 0.023942 73 326 io_submit
35.43 0.014781 336 44 pread
7.18 0.002997 9 317 io_getevents
0.00 0.000000 0 2 read
0.00 0.000000 0 2 write
Ah, now we have done asynchronous IO (io_submit, io_getevents), but there’s also ‘pread’, thus synchronous IO?
The pread() is the lookup in the data-dictionary the session needed to do to be able to get all the needed metadata for reading the ‘TEST’ table.
Let’s look into a database using ASM:
4.a. logon, verify ASM usage:
$ sq
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jul 21 17:12:50 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
SQL> select name from v$datafile;
NAME
--------------------------------------------------------------------------------
+DATA/bioinfo1/datafile/system.256.721389685
+DATA/bioinfo1/datafile/sysaux.257.721389685
+DATA/bioinfo1/datafile/undotbs1.258.721389685
+DATA/bioinfo1/datafile/users.259.721389685
+DATA/bioinfo1/datafile/original.266.721394641
+DATA/bioinfo1/datafile/dbfs.267.724598427
6 rows selected.
SQL>
This database is using ASM!
4.b. start a ‘strace -cp ‘ in another session.
4.c. execute ‘select count(*) from dba_extents’; exit
4.d. investigate summary of strace:
Process 5269 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
94.23 1.559458 279 5599 pread
3.89 0.064443 0 323685 getrusage
1.54 0.025487 167 153 munmap
0.28 0.004667 23 207 mmap
0.06 0.000955 0 11394 times
So, we just done synchronous IO, despite being on ASM.
Lets create the table from dba_source again, and look if that will invoke AIO:
5.a. login, create table
$ sq
SQL*Plus: Release 11.2.0.1.0 Production on Wed Jul 21 17:18:38 2010
Copyright (c) 1982, 2009, Oracle. All rights reserved.
Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.1.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options
SQL> create table test as select * from dba_source;
Table created.
5.b. set ‘strace -cp ‘ again in another session
5.c. execute ‘select count(*) from test;’, then exit
5.d. investigate strace summary:
Process 5742 detached
% time seconds usecs/call calls errors syscall
------ ----------- ----------- --------- --------- ----------------
44.59 0.023152 71 327 io_submit
26.93 0.013983 215 65 pread
18.16 0.009431 63 149 munmap
9.62 0.004994 17 292 io_getevents
0.55 0.000283 283 1 mmap
Ah, now we are mostly doing asynchronous IO, again not for the data-dictionary lookups, but we do for the data itself!
You know what I find mesmerizing? How complicated it is to verify true asynch IO in Linux!
After all these years, one would have thought the penny would have dropped with the Linux community that asynch IO is ESSENTIAL for database servers.
And if they want their OS to be used as such, then they better make it EASY to set AIO and make it work. But apparently, it’s still a major mental as well as physical hurdle…
With Oracle taking over a Linux distro, let’s hope this will eventually change. About time, too!
I don’t think this is a linux issue. In my opinion, AIO has been implemented in linux in a decent way.
The points I try to make in this article are:
->AIO isn’t used with an out-of-the-box database creation on linux on a filesystem
(which is an Oracle issue, if the default settings of ‘filesystemio_options’ on linux would be ‘setall’ AIO+DIO would be enabled by default. On windows AIO+DIO are used by default)
->If everything is set properly for AIO, Oracle still does synchronised IO’s for querying data dictionary tables.
(This is what kept me puzzling for a while, and almost got me to believe AIO didn’t work. Still, this is how Oracle implemented doing IO’s to the data dictionary)
Pingback: Blogroll Report 16/07/2010 – 23/07/2010 « Coskan’s Approach to Oracle
How does this all work in 11g. It is quite a bit different. I have read you should bypass the slab kio altogether. In other words the grep kio slabinfo should return
kioctx 0 0
kiocb 0 0
I can get kiocb to return 0 0 , but not kioctx. Any idea why??