IBM AIX Concurrent IO howto

I recently got some questions about IBM AIX and Concurrent IO (CIO).

First of all, there’s Direct IO (DIO) and Concurrent IO (CIO). The advantage of both is that data being read or written is not buffered at the operating system level. This avoids ‘double buffering’ (once in the database buffer cache and once in the operating system’s file cache), which saves CPU cycles. However, if the database relies on operating system buffering (for example when there’s plenty of memory on the system for the filesystem cache), performance can degrade when DIO or CIO is used, because all IOs are then physical from the operating system’s perspective (they could still be buffered at a lower layer, such as a SAN or NAS).

The difference between DIO and CIO is the way write concurrency is handled (this is a bit simplified). DIO serialises writes to a file to prevent them from corrupting each other, whilst CIO does not serialise, because it assumes the writer has done proper serialisation itself. The Oracle database does this (remember, Oracle can work without a filesystem too).
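To make ‘the writer has done proper serialisation’ a bit more concrete, here is a minimal Python sketch. It does not switch on CIO or DIO (it uses ordinary buffered IO); it only shows the writer-side discipline that CIO relies on: several threads write to the same file, but each one writes its own block at its own offset, so no two writes overlap. The 8kB block size, the file name and the function names are assumptions made for this illustration.

```python
import os
import tempfile
import threading

BLOCK_SIZE = 8192  # assumed database block size, for illustration only


def write_block(fd: int, block_no: int, payload: bytes) -> None:
    """Write one block at its own offset; pwrite does not use a shared file position."""
    os.pwrite(fd, payload, block_no * BLOCK_SIZE)


def concurrent_block_writes(path: str, n_blocks: int) -> None:
    """Start one thread per block; every thread writes a disjoint region of the same file."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        threads = [
            threading.Thread(
                target=write_block,
                args=(fd, i, bytes([i % 256]) * BLOCK_SIZE),
            )
            for i in range(n_blocks)
        ]
        for t in threads:
            t.start()
        for t in threads:
            t.join()
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "datafile.dbf")  # hypothetical file name
        concurrent_block_writes(demo_path, 4)
        print(os.path.getsize(demo_path))
```

Whether the filesystem underneath really executes these writes in parallel, or still queues them behind a per-file lock, is exactly the part that DIO versus CIO decides.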

The IBM JFS filesystem has DIO implemented, the IBM JFS2 filesystem has CIO implemented.

There are two ways to enable CIO:
- Use a database parameter: set FILESYSTEMIO_OPTIONS to ‘setall’ (which also enables asynchronous IO) or to ‘directIO’. Either value opens files in DIO mode on JFS and in CIO mode on JFS2.
- Use a mount option (‘cio’) to make all IO to the filesystem behave as CIO.
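The two routes above can be summarised as a small decision function. This is only a sketch of the rules as described in this post, not of anything Oracle or AIX exposes: the function name, the `mounted_with_cio` argument and the string results are made up for the illustration.

```python
VALID_OPTIONS = {"none", "asynch", "directio", "setall"}


def file_open_mode(filesystemio_options: str, filesystem: str,
                   mounted_with_cio: bool = False) -> str:
    """Return 'CIO', 'DIO' or 'buffered' for a datafile, per the rules in this post."""
    option = filesystemio_options.lower()
    fs = filesystem.lower()
    if option not in VALID_OPTIONS:
        raise ValueError(f"unknown FILESYSTEMIO_OPTIONS value: {filesystemio_options}")
    if fs not in {"jfs", "jfs2"}:
        raise ValueError(f"this sketch only covers JFS and JFS2, not {filesystem}")
    # Route 2: the 'cio' mount option makes all IO on a JFS2 filesystem CIO.
    if mounted_with_cio and fs == "jfs2":
        return "CIO"
    # Route 1: the database asks for direct IO itself.
    if option in {"directio", "setall"}:
        return "CIO" if fs == "jfs2" else "DIO"
    return "buffered"


if __name__ == "__main__":
    print(file_open_mode("setall", "jfs2"))
    print(file_open_mode("directIO", "jfs"))
    print(file_open_mode("none", "jfs2", mounted_with_cio=True))
```

Note that the last call is the case where the mount option, and not the database, is responsible for CIO.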

A note on Oracle’s Metalink (‘Direct I/O or Concurrent I/O on AIX 5L’; ML 272520.1) describes this behaviour. The note also states that BOTH should be enabled. I don’t think that is the best way to make a database do CIO: setting FILESYSTEMIO_OPTIONS properly (‘setall’ or ‘directIO’) is enough. Mounting the filesystem with the ‘cio’ option adds nothing for the database files, and it makes all other IO on that filesystem CIO as well, including IO that does not benefit from it, for example when the database software and the dump destinations are on the same filesystem.

  1. edda kang said:

    Hi! I’m a student in Seoul, Korea. I referred to “install RAC 11g with ASM on vmware”, which presumably you wrote, and have some questions after following the instructions. At step 19, I was shown a message that it could not identify the OS. Could you help me? Please email me and I’ll let you know more about my environment and the problem. Hope you read this. Thanks.

  2. lonnyniederstadt said:

    Hello! Do buffered filesystems (I’m specifically interested in buffered filesystem access vs CIO/DIO on IBM JFS2) influence effective MBRC behavior? Trying to work out why I’m seeing an average read size just above 8k on an IBM system with a ton of large tables, a disk max transfer size of 1 MB and an adapter max transfer size of 1 MB.

    • Hi Lonny!

      The behaviour of multiblock reads depends on the setting of the parameter db_file_multiblock_read_count (set in blocks). Oracle will not read more blocks than this parameter allows, unless it is set to 0 or not set at all. In those cases, Oracle sets the value dynamically. Charles Hooper wrote about this, and so did Christian Antognini in his book ‘Troubleshooting Oracle Performance’.

      The maximum number of blocks a multiblock read (full table scan, fast full index scan, etc.) can request is limited by the maximum IO size, provided Oracle is able to construct a large IO request in the first place. For buffered multiblock reads (reads that go into the Oracle buffer cache, visible via the event ‘db file scattered read’) the maximum IO size is 1MB on most platforms. For serial direct path reads the limit seems to be 1024 Oracle blocks (block size 8kB, Oracle on Linux x64). A buffered read (db file scattered read) can only cover adjacent data blocks that are not yet in the cache, and it stops at extent borders and at non-data blocks. Serial direct path reads only require the data blocks to be adjacent.
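The rules for buffered multiblock reads can be sketched as a small planner. This is a simplification for illustration and not how Oracle implements it: the function name and the data layout are made up, extents are given as (first block, number of blocks), and non-data blocks are left out.

```python
from typing import List, Set, Tuple


def plan_scattered_reads(extents: List[Tuple[int, int]], cached: Set[int], mbrc: int,
                         block_size: int = 8192,
                         max_io_bytes: int = 1024 * 1024) -> List[Tuple[int, int]]:
    """Return (first_block, n_blocks) read requests for a scan over the given extents."""
    # A single request is capped by MBRC and by the maximum IO size (1MB assumed here).
    max_blocks = max(1, min(mbrc, max_io_bytes // block_size))
    requests: List[Tuple[int, int]] = []
    for first, count in extents:
        run_start, run_len = first, 0
        for block in range(first, first + count):
            if block in cached:
                # A block that is already cached ends the current run of adjacent blocks.
                if run_len:
                    requests.append((run_start, run_len))
                    run_len = 0
                continue
            if run_len == 0:
                run_start = block
            run_len += 1
            if run_len == max_blocks:
                requests.append((run_start, run_len))
                run_len = 0
        # A request never crosses an extent border, so flush at the end of each extent.
        if run_len:
            requests.append((run_start, run_len))
    return requests


if __name__ == "__main__":
    print(plan_scattered_reads([(0, 10)], cached={4}, mbrc=4))
```

With small extents, or with cached blocks scattered through a table, the requests in such a model shrink well below MBRC, even when the disk and adapter would accept 1MB.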

      This is all at the Oracle level, with a little influence from the O/S (maximum IO size). As far as I know, the filesystem (or whether it’s a filesystem or ASM) and the use of AIO and/or DIO do not matter for the decision how much to read from the operating system; once the database needs certain data, the IO is constructed using the rules above and then issued to the operating system.

      As far as I know the ASM/KFK layer does not change the IO request.

  3. Amir Hameed said:

    Hi Frits,
    Does Linux kernel 2.6 support Concurrent IO? If not, then I am assuming that when filesystemio_options is set to SETALL, the database enables DIO only.


    • Hi Amir, thanks for reading, and I think that’s a good question to ask.

      Let’s start off with Oracle first. FILESYSTEMIO_OPTIONS controls how Oracle does IO on a filesystem (as the parameter name indicates). This means Oracle does not use it to determine how to do IO when ASM is in use. For ASM, I think the only parameter that can influence IO behaviour is DISK_ASYNCH_IO, which turns on asynchronous IO (where it makes sense; the database doesn’t do everything with asynchronous IO calls if it’s set to true). DISK_ASYNCH_IO also functions as a ‘master switch’: if asynchronous IO is turned off with it, that overrides FILESYSTEMIO_OPTIONS.

      So, when on a filesystem, you can influence the calls and flags that Oracle uses with FILESYSTEMIO_OPTIONS. Essentially, Oracle allows you to set two distinct things: asynchronous IO (asynch and setall) and direct IO (directio and setall). You can’t set concurrent IO at the Oracle level.
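As a minimal sketch of how the two parameters combine on a filesystem, following the description above: the function and type names are made up for the illustration, and this models only the behaviour described here, not Oracle internals.

```python
from typing import NamedTuple


class IoFlags(NamedTuple):
    asynchronous: bool
    direct: bool


def filesystem_io_flags(filesystemio_options: str, disk_asynch_io: bool = True) -> IoFlags:
    """Map FILESYSTEMIO_OPTIONS and DISK_ASYNCH_IO to the IO behaviour on a filesystem."""
    option = filesystemio_options.lower()
    if option not in {"none", "asynch", "directio", "setall"}:
        raise ValueError(f"unknown FILESYSTEMIO_OPTIONS value: {filesystemio_options}")
    wants_async = option in {"asynch", "setall"}
    wants_direct = option in {"directio", "setall"}
    # DISK_ASYNCH_IO is the master switch: when false, asynchronous IO stays off.
    return IoFlags(asynchronous=wants_async and disk_asynch_io, direct=wants_direct)


if __name__ == "__main__":
    print(filesystem_io_flags("setall"))
    print(filesystem_io_flags("setall", disk_asynch_io=False))
```

There is no value in this table that asks for concurrent IO; the closest the database can get is asking for direct IO.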

      At the Linux level, there isn’t a concurrent IO switch like there is on AIX. It’s useful to understand what concurrent IO actually means. Concurrent IO for database performance is the ability to write concurrently to the same file (this means I silently assume concurrent reads can always happen, which I haven’t verified!). While this ability may sound obvious, it really isn’t: traditionally, writes are serialised at the inode (file) level. Also, concurrent IO needs direct IO. So at the Oracle level, FILESYSTEMIO_OPTIONS needs to be set to directio or setall to allow concurrent IO at the operating system level.

      Now, to finally answer your question whether Linux kernel 2.6 supports concurrent IO: concurrent IO is not a Linux kernel feature. It is a filesystem feature. So the next thing to ask is: which filesystems on Linux support concurrent IO? You might expect that a modern operating system under active development has solved this issue for all of them. That is not the case!

      The Linux ext filesystems (ext2, ext3 and ext4) do NOT allow concurrent IO. This means write IOs are serialised. Yes, this could badly affect performance for databases that need a lot of concurrent write IOs (think database writer!). However, that is a specific scenario.
      The Linux XFS filesystem DOES support concurrent IO. This means I always choose XFS if I need to use Oracle on a filesystem. To repeat what I said: not a lot of the databases I have seen would get massive problems if no concurrent IO was available. But if you can choose, it makes sense to choose XFS for the ability to do concurrent write IO in case it might be needed; there is no harm if it isn’t needed.
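      If you want to check which filesystem a datafile lives on, a minimal Python sketch could look like this. It parses text in the format of /proc/mounts and looks the filesystem type up in a table that only records my statements above (XFS yes, ext2/ext3/ext4 no). The function names and the example paths are made up, and mount points containing escaped characters (such as spaces) are not handled.

```python
import os
from typing import Optional

# Only the statements made in this reply; anything else is unknown (None).
CONCURRENT_WRITE_CLAIMS = {"xfs": True, "ext2": False, "ext3": False, "ext4": False}


def filesystem_type_for(path: str, mounts_text: str) -> Optional[str]:
    """Return the filesystem type of the longest mount point containing path."""
    target = os.path.abspath(path)
    best_mount, best_type = "", None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 3:
            continue
        mount_point, fs_type = fields[1], fields[2]
        prefix = mount_point.rstrip("/") + "/"
        # Later entries win on equal length, which matches over-mounted directories.
        if (target + "/").startswith(prefix) and len(mount_point) >= len(best_mount):
            best_mount, best_type = mount_point, fs_type
    return best_type


def claims_concurrent_writes(fs_type: Optional[str]) -> Optional[bool]:
    """True/False as stated in this reply, None when the filesystem is not covered."""
    if fs_type is None:
        return None
    return CONCURRENT_WRITE_CLAIMS.get(fs_type)


if __name__ == "__main__":
    sample = "/dev/sda1 / ext4 rw 0 0\n/dev/sdb1 /u01 xfs rw,noatime 0 0\n"
    fs = filesystem_type_for("/u01/oradata/orcl/system01.dbf", sample)
    print(fs, claims_concurrent_writes(fs))
```

      On a real system you would pass in the contents of /proc/mounts instead of the sample string.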

      If you like to read more about ext versus XFS, read up on it on the blogs of two friends of mine: Kevin Closson and Martin Bach.

      Please mind that ASM (when used directly with block devices!) is inherently concurrent IO: there is no filesystem layer that could lock an inode, because there are no inodes. The database engine issues IOs directly to the block devices.
      I don’t know about other common filesystems like NFS (probably does), ZFS (probably doesn’t, because there is no directio) or btrfs. Please mind these are really just guesses.
