Extra huge database IOs

It’s been a while since I presented the first incarnation of my ‘about multiblock reads’ presentation. When I did this at the UKOUG TEBS conference in Birmingham in 2011, Christian Antognini chaired my presentation. After my presentation Christian showed me it’s possible to set the parameter ‘db_file_multiblock_read_count’ higher than 1MB/db_block_size (which is 128 if your blocksize is 8kB), and you could benefit from it if your hardware is sufficient. In fact, Christian showed me AWR reports (could also be statspack reports, not sure) which showed the benefit.

My understanding of the parameter db_file_multiblock_read_count at the time was:

The maximum value is the operating system’s maximum I/O size expressed as Oracle blocks ((max I/O size)/DB_BLOCK_SIZE). If you set this parameter to a value greater than the maximum, Oracle uses the maximum.

Which is taken from the Reference guide, in which this text is there for both 10.2 and 11.2, and even for 12.1.

This is interesting, because the statement from the Oracle reference guide conflicts with Christians statement and measurements!

My test environment: Oracle 11.2.0.3 on Linux OL6u3 X64 with ASM, on my Macbook with SSD.

I’ve got a table created in a bigfile ASSM tablespace, system allocation type. This means objects get extents allocated which are increased progressively in size in steps:

 EXTENT_ID     BLOCKS
---------- ----------
	 0	    8
...
	15	    8
	16	  128
...
	78	  128
	79	 1024
...
	91	 1024

First let’s take a look at a full scan done “buffered”, alias blocks read to the buffercache, which is visible via the “db file scattered read” wait event. This is the “traditional” multiblock read.

I’ve flushed the buffer cache, set db_file_multiblock_read_count to 128, set event 10949 at level 1 and enabled SQL trace at level 8.
Event 10949 is needed to get a buffered multiblock read of an table which the Oracle database deems ‘big’ (to avoid non-buffered alias direct path reads).
Another thing is both ‘db file sequential read’ and ‘db file scattered read’ will always show up (even if the action (the physical read) is very fast), so there’s no need the change things (slow down I/O) to be sure we get these waits.

The statement executed is ‘select count(*) from t2’.

The first section of extent size 8, which shows the typical alteration of 8 and 7 blocks being read (after the initial smaller sizes) which is because of the space bitmap block every other extent:

WAIT #140150651756760: nam='db file sequential read' ela= 272 file#=5 block#=23938 blocks=1 obj#=75579 tim=1372970940124856
WAIT #140150651756760: nam='db file scattered read' ela= 2162 file#=5 block#=23939 blocks=5 obj#=75579 tim=1372970940127442
WAIT #140150651756760: nam='db file scattered read' ela= 725 file#=5 block#=23944 blocks=8 obj#=75579 tim=1372970940128478
WAIT #140150651756760: nam='db file scattered read' ela= 144 file#=5 block#=23953 blocks=7 obj#=75579 tim=1372970940128908
WAIT #140150651756760: nam='db file scattered read' ela= 253 file#=5 block#=23960 blocks=8 obj#=75579 tim=1372970940129636

Then the extents of 128 blocks, which show 126 blocks being read because of 2 space bitmap blocks at the beginning of every extent of 128 blocks:

WAIT #140150651756760: nam='db file scattered read' ela= 3741 file#=5 block#=24066 blocks=126 obj#=75579 tim=1372970940139996
WAIT #140150651756760: nam='db file scattered read' ela= 955 file#=5 block#=24194 blocks=126 obj#=75579 tim=1372970940142894
WAIT #140150651756760: nam='db file scattered read' ela= 2916 file#=5 block#=24322 blocks=126 obj#=75579 tim=1372970940148287
WAIT #140150651756760: nam='db file scattered read' ela= 2111 file#=5 block#=24450 blocks=126 obj#=75579 tim=1372970940152527
WAIT #140150651756760: nam='db file scattered read' ela= 2952 file#=5 block#=24578 blocks=126 obj#=75579 tim=1372970940157492

Then the extents of 1024 blocks, which start off with 4 space bitmap blocks at the beginning of every extents:

WAIT #140150651756760: nam='db file scattered read' ela= 3079 file#=5 block#=36228 blocks=128 obj#=75579 tim=1372970941036955
WAIT #140150651756760: nam='db file scattered read' ela= 2737 file#=5 block#=36356 blocks=128 obj#=75579 tim=1372970941048199
WAIT #140150651756760: nam='db file scattered read' ela= 2531 file#=5 block#=36484 blocks=128 obj#=75579 tim=1372970941059473
WAIT #140150651756760: nam='db file scattered read' ela= 2912 file#=5 block#=36612 blocks=128 obj#=75579 tim=1372970941071015
WAIT #140150651756760: nam='db file scattered read' ela= 2768 file#=5 block#=36740 blocks=128 obj#=75579 tim=1372970941082467
WAIT #140150651756760: nam='db file scattered read' ela= 287 file#=5 block#=36868 blocks=128 obj#=75579 tim=1372970941091378
WAIT #140150651756760: nam='db file scattered read' ela= 2831 file#=5 block#=36996 blocks=128 obj#=75579 tim=1372970941102969
WAIT #140150651756760: nam='db file scattered read' ela= 536 file#=5 block#=37124 blocks=124 obj#=75579 tim=1372970941111811

We see 7 lines of 128 blocks read (7*128=896) and one line of 124 blocks (896+124=1020) which is 4 blocks short of the extent size of 1024, which are the space bitmap blocks.

Okay, nothing new here.

If I raise the parameter db_file_multiblock_read_count to 2000, and reissue the same ‘select count(*) from t2’, I get exactly the same waits. In other words: nothing different.

So the conclusion is: when using buffered reads, the statement about 1MB being the limit of the size of a single multiblock read indeed is true.

Now let’s move on to direct path reads. In short, when a table is deemed ‘big’ by the database, meaning the number of blocks occupied by the table is more than five times _small_table_threshold (which is 2% of the buffercache by default) then the database foregrounds will not use the buffercache to read the table blocks to, but read them to the process’ PGA to avoid discarding usefull blocks in the cache, and avoid the overhead of concurrency control (latching). There are more criteria, but that’s outside the scope of this blogpost.

First let’s trace a session doing direct path reads. This is easier than the above situation where I forced buffered reads, because I can do the same as above, but skip setting event 10949. On the other hand it’s harder, because in order to see all the IOs using tracing (which means ‘get a wait for all IOs’), I need reduce the number of IOs (I use 1 IOPS), because otherwise some IOs will not get a wait, so not all IOs will be visible in the tracefile (look at this article for learning how to reduce IOPS with Linux, go see me present my About multiblock reads presentation to learn why some waits for direct path read IOs could not show up)

So, here are snippets from the tracefile of a ‘select count(*) from t2’, with db_file_multiblock_read_count set to 128 and SQL trace at level 8:

First we see the reads in the small extents (8 blocks) from BMB to BMB, not stopping at the extent border:

WAIT #140402935204600: nam='direct path read' ela= 996117 file number=5 first dba=23939 block cnt=13 obj#=75579 tim=1373055770533891
WAIT #140402935204600: nam='direct path read' ela= 997014 file number=5 first dba=23953 block cnt=15 obj#=75579 tim=1373055771531222
WAIT #140402935204600: nam='direct path read' ela= 1000275 file number=5 first dba=23969 block cnt=15 obj#=75579 tim=1373055772531716
WAIT #140402935204600: nam='direct path read' ela= 999111 file number=5 first dba=23985 block cnt=15 obj#=75579 tim=1373055773531249
WAIT #140402935204600: nam='direct path read' ela= 999628 file number=5 first dba=24001 block cnt=15 obj#=75579 tim=1373055774531093

Then we see the reads of 126 blocks, of the extents of 128 blocks:

WAIT #140402935204600: nam='direct path read' ela= 1004881 file number=5 first dba=24066 block cnt=126 obj#=75579 tim=1373055778541646
WAIT #140402935204600: nam='direct path read' ela= 989042 file number=5 first dba=24194 block cnt=126 obj#=75579 tim=1373055779531980
WAIT #140402935204600: nam='direct path read' ela= 1002835 file number=5 first dba=24322 block cnt=126 obj#=75579 tim=1373055780541509
WAIT #140402935204600: nam='direct path read' ela= 989545 file number=5 first dba=24450 block cnt=126 obj#=75579 tim=1373055781531621
WAIT #140402935204600: nam='direct path read' ela= 1007730 file number=5 first dba=24578 block cnt=126 obj#=75579 tim=1373055782539986

And then the reads of the extents of the size of 1024 blocks:

WAIT #140402935204600: nam='direct path read' ela= 2000086 file number=5 first dba=36228 block cnt=124 obj#=75579 tim=1373055875542615
WAIT #140402935204600: nam='direct path read' ela= 1995550 file number=5 first dba=36480 block cnt=128 obj#=75579 tim=1373055877539193
WAIT #140402935204600: nam='direct path read' ela= 1999327 file number=5 first dba=36736 block cnt=128 obj#=75579 tim=1373055879539584
WAIT #140402935204600: nam='direct path read' ela= 1998883 file number=5 first dba=36992 block cnt=128 obj#=75579 tim=1373055881539584
WAIT #140402935204600: nam='direct path read' ela= 1998753 file number=5 first dba=37252 block cnt=124 obj#=75579 tim=1373055883539416

If you take a close look, you’ll see IO’s are still missing (just add cnt to dba and see if it fits the next dba, with the exception of BMB’s, which are not read too, so will leave a gap). Nevertheless: this is as we expect.

Now let’s do the same, but set db_file_multiblock_read_count to 2000! In order to reduce length: the IO’s of the 8 blocks and 128 blocks extents are identical. Let’s look at the 1024 blocks extent IOs:

WAIT #140550163651400: nam='direct path read' ela= 7862301 file number=5 first dba=32132 block cnt=1020 obj#=75579 tim=1373057289713781
WAIT #140550163651400: nam='direct path read' ela= 7996018 file number=5 first dba=33156 block cnt=1020 obj#=75579 tim=1373057297713687
WAIT #140550163651400: nam='direct path read' ela= 7996272 file number=5 first dba=34180 block cnt=1020 obj#=75579 tim=1373057305713868
WAIT #140550163651400: nam='direct path read' ela= 7995719 file number=5 first dba=35204 block cnt=1020 obj#=75579 tim=1373057313713547
WAIT #140550163651400: nam='direct path read' ela= 7996310 file number=5 first dba=36228 block cnt=1020 obj#=75579 tim=1373057321713593

Wow! These are HUGE IOs! The entire 1024 block extent is read in one (oracle) IO excluding the 4 BMB’s.

So this means Christian was right, and the Oracle documentation is wrong about the maximum size of db_file_multiblock_read_count for direct path reads.

There’s much more to be said about huge IOs, but I’ll leave that for another blogpost.

2 comments

Leave a comment

This site uses Akismet to reduce spam. Learn how your comment data is processed.