Oracle IO wait events: db file sequential read

(the details are investigated and specific to Oracle’s database implementation on Linux x86_64)

Exadata IO: This event is not used with Exadata storage, ‘cell single block physical read’ is used instead.
p1: file#
p2: block#
p3: blocks

Despite p3 listing the number of blocks, I haven’t seen a db file sequential read event that read more than one block ever. Of course this could change in a newer release.

One of the important things to realise here is that regardless of asynchronous IO settings (disk_asynch_io, filesystemio_options), Oracle always uses a pread() systemcall, so synchronous IO for reading blocks which are covered with this event. If you realise what the purpose of fetching the single block is in most cases: fetching a single database block which contents are necessary in order to continue processing, it should become apparent that issuing a synchronous IO call makes sense. This is also the reason the V$IOSTAT* view lists both SMALL_READ_REQS, SMALL_SYNC_READ_REQS and SMALL_READ_SERVICETIME, SMALL_SYNC_READ_LATENCY, to make a distinction between SYNC (pread()) reads and non-sync (thus asynchronous) calls, using the io_submit()-io_getevents() call combination.

IO done under the event ‘db file sequential read’ means a single block is read into the buffer cache in the SGA via the system call pread(). Regardless of physical IO speed, this wait always is recorded, in other words: there is a strict relation between the event and the physical IO. Just to be complete: if a block needed is already in the Oracle database buffer cache, no wait event is triggered and the block is read. This is called a logical IO. When the wait event ‘db file sequential read’ is shown, both a physical and a logical IO are executed.

This event means a block is not found in the database buffer cache. It does not mean the block is really read from a physical disk. If DIO (direct IO) is not used (filesystemio_options is set to ‘none’ or ‘async’ when using a filesystem, ASM (alias “Oracle managed raw devices”) is inherently direct path IO, except when the ASM “disks” are on a filesystem (when ASM is used with NFS (!), then filesystemio_options is obeyed)), the block could very well be coming from the filesystem cache of linux. In fact, without DIO a phenomenon known as ‘double buffering’ takes place, which means the IO doesn’t happen to it’s visible disk devices directly, but it needs to take a mandatory step in between, done at the kernel level, which means the data is put in the filesystem cache of linux too. It should be obvious that this extra work comes at the cost of extra CPU cycles being used, and is in almost any case unnecessary.

If you take a step back you should realise this event should take place for a limited amount of blocks during execution. Because of the inherent single block IO nature of db file sequential read, every physical read (when it needs to read from a physical disk device) takes the IO latency penalty. Even with solid state disk devices, which have an inherently lower latency time because there are no rotating parts and disk heads to be moved, chopping up an operation in tiny parts when a full table scan or fast full index scan could be done means a lot of CPU time is used whilst it could be done more efficient.

The time spend on ‘db file sequential read’ quite accurately times single block IO. This means a direct relationship between ‘db file sequential read’ timings and amount should exist with operating system measured IO statistics (iostat, sar and more).


Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: