How tuning can work out the wrong way

(…and always check all direct involved components yourself)

Recently I’ve been asked to look into a performance issue where an import already was taking a week.

Oracle version: 64-bit EE
Operating system: SunOS 5.8 Generic_117350-26 sun4u sparc SUNW, Sun-Fire-880
The hardware has 8 CPU’s, 32G RAM and a sun T2000 storage cabinet attached.

There already was a database present on the machine, which was a testdatabase which wasn’t doing anything. Also, it was sized rather small for this kind of machine (25M buffercache; 25M pga (automatic)).

The import took place on a newly created database. Because this database instance configuration was also tremendously small (25M buffercache; 25M pga (automatic)), I decided to kill the current import and set sensible values for some key memory parts to take advantage of the hardware (10G buffercache; 5G pga). Also I asked the unix administrator to mount the oracle data filesystem (one for both databases) with the “forcedirectio” and “noatime” options.

The “forcedirectio” option enables direct IO on the filesystem, to avoid double buffering (rough and short: meaning reads are not buffered by the operating system buffercache), “noatime” disables updating the Inodes on the filesystem on access.

So, with a decent configured database and filesystem configured for optimal IO, we started the import again.

Next, it appeared the testdatabase wasn’t really doing nothing. In fact, it was continuously used and because of the tuning actions, it was performing worse. A look in the performance repository revealed a big fall back in physical IO:
Physical IO
(rightclick, “view image”)

Also, the import still performed the same as before. Taking an enormous time…

What has happened, and what went wrong?

Always check all the compontents involved in a tuning action. The first thing which did go wrong, was a database which was reported to nothing, did approximately 1500 physical IO’s per second (from the database point of view).

Because the testdatabase (which was presumed not to do anything) was sized very small, the oracle buffercache couldn’t cache all the data of its transactions, but it’s IO was buffered on the operating system level (the operating system had plenty of space in it’s memory for that). That means that the IO seen as physical by the database, was in fact all buffered on the operating system level.

By forcing IO to go direct, it removed the ability to cache on the operating system level, which means the IO was really done by the storage cabinet now. Apparently (when looking at the graph), the maximum speed of the IO done by this cabinet was approximately 550 IO’s per second.

I didn’t check what the import was waiting for. A look at the waits of the import session revealed mainly ‘direct path write’ waits, and the ‘seconds_in_wait’ column showed waittimes going as high as 500 (seconds!). Also, the alertlog showed: ‘WARNING: aiowait timed out 1 times’.

Both the ‘direct path write’ wait and the warning in the alertlog point us to IO contention. Some investigation more revealed that this behavior is known on sun (and is in fact CPU contention; we do not get enough priority to be able to write) and solved by both patch 112254-01 and setting the operating system kernel parameter ts_sleep_promote=1 (default 0). In short, this patch alters the behavior of the operating system which could prevent certain processes from getting runnable (and thus waiting for a very long time, making the import last longer than a week).

Okay, how to solve this issues?

1. To reduce the number of IO’s of the testdatabase, scale up the buffercache. Plenty of space is available.
2. Because the number of IO’s is reduced, and the priorisation of runnable processes is altered (ts_sleep_promote) both are able to run perfectly alongside each other.

So, lesson learned: always check all the components involved.

  1. The ‘direct path write’ if seen with ‘direct path read’ could also mean that the import was attempting to do a sort. Check the detail of the wait to find out where it was trying to write, a write to temp it’s a likely cause. For large imports an alternative is to sidestep the 5% allocation of pga_aggregate_area to your session, by setting your workarea to manual for the session and reverting to modifying the earlier sort_area_size and sort_area_retained_size etc.


  2. Administrator said:

    The direct path write obviously is the index creation, which is requiring a sort. In many situations the amount of data for the sort can exceed the process’ sort area. This means that I think IO to the temporary tablespace is unavoidable in many cases. I agree that for import speed it is wise/advisable to use manual workarea, and specify a large sort area. (because a single process can not exceed 5% of the pga aggregate size by default)

    Thanks mat!


  3. Michel said:

    SunOS 5.8 huh?

    I wonder what filesystem was being used? Since Oracle uses async read/writes on SunOS by default since version 8i and UFS on SunOS 5.8 doesn’t support kernalized async IO. Unfortunately, the syscall isn’t failing but falling back to emulated async IO using LWP’s, which can degrade read/write performance. At least this is what I’ve seen happening on SunOS 5.8 using truss on DBWR. Removing “asynch” from filesystem_iooptions would have helped then as well.

    Another thing: filesystem_iooptions=directio already enables direct I/O (avoiding usage of filesystem buffer cache) on a file level, so you don’t have to specify the mount options. This way Oracle opens the datafiles using the open() using the O_DIRECT option. This helps if you have export dumpfiles on those filesystems as well… since (at least, in my experience) exp/imp benefits from the filesystem buffer cache 😉

Leave a Reply

Fill in your details below or click an icon to log in: Logo

You are commenting using your account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: