Oracle 12c in-memory option and IO

This article is about the Oracle 12c in-memory option, and specifically looks at how the background worker processes do IO to populate the in-memory column store.

Hardware: Apple Macbook with VMWare Fusion 7.1.3.
Operating system: Oracle Linux 6.7, kernel: 3.8.13-118.el6uek.x86_64.
Database version: Oracle 12.1.0.2
Patch: opatch lspatches
19392604;OCW PATCH SET UPDATE : 12.1.0.2.1 (19392604)
19303936;Database Patch Set Update : 12.1.0.2.1 (19303936)

But first things first, let’s setup the in-memory option first with a test table. The first thing to consider is to create the in-memory area to store the objects. I only want a single table stored in the in-memory area, so I can very simply look at the size of object:

TS@fv12102 > select segment_name, round(bytes/power(1024,2)) "MB" from user_segments where segment_name = 'T2';

SEGMENT_NAME                                     MB
---------------------------------------- ----------
T2                                              168

So, that means that an in-memory area of 200MB should suffice, let’s set it and bounce the instance:

SYS@fv12102 AS SYSDBA> alter system set inmemory_size = 200m scope=spfile sid='*';

System altered.

SYS@fv12102 AS SYSDBA> startup force;
ORACLE instance started.

Total System Global Area 1048576000 bytes
Fixed Size		    2932336 bytes
Variable Size		  226492816 bytes
Database Buffers	  469762048 bytes
Redo Buffers		  139673600 bytes
In-Memory Area		  209715200 bytes
Database mounted.
Database opened.

Line 13 shows the in-memory area is created!

Now I need to set the inmemory attribute for my table T2. This code example shows the inmemory attribute for my T2 table without the in-memory set, then alters the table to belong in the in-memory area, and shows the inmemory attribute again:

TS@fv12102 > select inmemory from user_tables where table_name = 'T2';

INMEMORY
--------
DISABLED

TS@fv12102 > alter table t2 inmemory;

Table altered.

TS@fv12102 > select inmemory from user_tables where table_name = 'T2';

INMEMORY
--------
ENABLED

If we now look at the V$IM_SEGMENTS, the view is empty:

SYS@fv12102 AS SYSDBA> select owner, segment_name, bytes normal_size, bytes_not_populated, inmemory_size, populate_status from v$im_segments;

no rows selected

This is because the inmemory_priority has not been specified when we altered the table to inmemory, which means the default priority NONE is given, which means the table will be loaded into memory once it is touched.

Now, in order to understand how the in-memory area is populated we need to execute a sql trace including waits on the Wnnn process or processes that populate the in-memory area. However, that is not as straightforward as it seems: there are a couple of these (background) processes already in my instance, but there can be created more if it necessary. Luckily, we can use Oracle’s improvements to the diagnosability infrastructure and set sql trace and filter on the process name:

SYS@fv12102 AS SYSDBA> alter system set events 'sql_trace {process: pname = w} level 8';

System altered.

Now that we enabled sql_trace for all Wnnn processes, let’s touch the table, and see what is in the trace!

First touch the table:

TS@fv12102 > select count(*) from t2;

  COUNT(*)
----------
   1000000

That should result in the T2 table being read in the in-memory area:

SYS@fv12102 AS SYSDBA> select owner, segment_name, bytes normal_size, bytes_not_populated, inmemory_size, populate_status from v$im_segments;

OWNER  SEGMENT_NAME		      NORMAL_SIZE BYTES_NOT_POPULATED INMEMORY_SIZE POPULATE_
------ ------------------------------ ----------- ------------------- ------------- ---------
TS     T2				176160768		    0	   53673984 COMPLETED

Bingo!

Now go to the trace directory, and look for w processes:

[oracle@bigmachine [fv12102] trace]$ ls -l *_w*
-rw-r----- 1 oracle oinstall   2145 Nov 14 08:52 fv12102_w000_30938.trc
-rw-r----- 1 oracle oinstall    166 Nov 14 08:52 fv12102_w000_30938.trm
-rw-r----- 1 oracle oinstall   8616 Nov 14 09:10 fv12102_w000_32380.trc
-rw-r----- 1 oracle oinstall    752 Nov 14 09:10 fv12102_w000_32380.trm
-rw-r----- 1 oracle oinstall  35401 Nov 14 09:10 fv12102_w002_31964.trc
-rw-r----- 1 oracle oinstall   2612 Nov 14 09:10 fv12102_w002_31964.trm
-rw-r----- 1 oracle oinstall 232444 Nov 14 09:10 fv12102_w001_30940.trc
-rw-r----- 1 oracle oinstall   7120 Nov 14 09:10 fv12102_w001_30940.trm

Because of the size of the file, it was my guess that file would contain the trace output of the reading of the T2 table into the in-memory area. That is correct.

If a worker process was up before doing an action, we can see it sleeps for 5 seconds in the wait event ‘Space Manager: slave idle wait’, after which it does some (internal) housekeeping, which means reading its CPU usage using getrusage(), after which it will sleep on semtimedop() for 5 seconds again. This is how that looks like when traced with waits:

WAIT #0: nam='Space Manager: slave idle wait' ela= 5000827 Slave ID=1 p2=0 p3=0 obj#=10986 tim=7189216230

*** 2015-11-14 08:52:32.981
WAIT #0: nam='Space Manager: slave idle wait' ela= 5000350 Slave ID=1 p2=0 p3=0 obj#=10986 tim=7194217089

*** 2015-11-14 08:52:37.982
WAIT #0: nam='Space Manager: slave idle wait' ela= 5001004 Slave ID=1 p2=0 p3=0 obj#=10986 tim=7199218359

*** 2015-11-14 08:52:42.982
WAIT #0: nam='Space Manager: slave idle wait' ela= 5000000 Slave ID=1 p2=0 p3=0 obj#=10986 tim=7204218487

When the worker starts working on populating the in-memory cache, it first reads some dictionary views (partobj$, compression$ and hist_head$), after which it begins to read the table into memory:

WAIT #140530969221688: nam='Disk file operations I/O' ela= 21 FileOperation=2 fileno=4 filetype=2 obj#=20434 tim=7207107727
WAIT #140530969221688: nam='direct path read' ela= 78 file number=4 first dba=22416 block cnt=8 obj#=20434 tim=7207125617
WAIT #140530969221688: nam='direct path read' ela= 113 file number=4 first dba=22425 block cnt=7 obj#=20434 tim=7207126486
WAIT #140530969221688: nam='direct path read' ela= 136 file number=4 first dba=22432 block cnt=8 obj#=20434 tim=7207127263
WAIT #140530969221688: nam='direct path read' ela= 60 file number=4 first dba=22441 block cnt=7 obj#=20434 tim=7207128054
WAIT #140530969221688: nam='direct path read' ela= 68 file number=4 first dba=22448 block cnt=8 obj#=20434 tim=7207128718
WAIT #140530969221688: nam='direct path read' ela= 114 file number=4 first dba=22457 block cnt=7 obj#=20434 tim=7207129417
WAIT #140530969221688: nam='direct path read' ela= 106 file number=4 first dba=22464 block cnt=8 obj#=20434 tim=7207130248
WAIT #140530969221688: nam='direct path read' ela= 100 file number=4 first dba=22473 block cnt=7 obj#=20434 tim=7207130878
WAIT #140530969221688: nam='direct path read' ela= 202 file number=4 first dba=22480 block cnt=8 obj#=20434 tim=7207132096
WAIT #140530969221688: nam='direct path read' ela= 173 file number=4 first dba=22489 block cnt=7 obj#=20434 tim=7207132857
WAIT #140530969221688: nam='direct path read' ela= 293 file number=4 first dba=22496 block cnt=8 obj#=20434 tim=7207134040

The event ‘Disk file operations I/O’ is the instrumentation of meta-operations which need to happen before a process can work on a file. In our case, FileOperation=2 means opening a file.

Then we see the event ‘direct path read’! That means that the worker uses the modern reading method of reading data, instead of the older method indicated by ‘db file scattered read’.

The reason this is important, is the ‘scattered read’ method does not scale very well. The basic setup of the ‘scattered read’ method is:
1. Determine chunk to read (limited by DB_FILE_MULTIBLOCK_READ, extent border, space management blocks and blocks already in the cache).
2. Read chunk.
3. Repeat until high water mark is reached.
Even if this is done via asynchronous IO, there is never read more than one chunk at a time.

If we look at the ‘direct path read’ method, then it does this in a simplified way:
1. Determine number of IOs allowable in flight (default/minimum two!)
2. Submit reads for chunks of the segment to scan, amount of IOs is determined at 1.
3. Wait for minimal 1 IO to return.
4. Repeat until high water mark is reached.

Essentially, using the ‘direct path read’ method a single process can take advantage of the available bandwidth of modern storage and asynchronous IO and issue multiple IOs.

During tracing I noticed the worker processes writing to the database’s diagnostics destination, however in a directory ‘log’, to a file ‘imdb_ORACLE_SID.log’ which seem to contain information about chunks read by worker processes:

Sat Nov 14 09:51:10 2015
c:0o:20434d:20434/St.RDBA: 0xac00 No.blks: 896
c:0o:20434d:20434/St.RDBA: 0xaf84 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xb384 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xb784 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xbb84 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xbf84 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xc384 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xc784 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xcb84 No.blks: 1020
c:0o:20434d:20434/St.RDBA: 0xcf84 No.blks: 644
c:0o:20434d:20434/rows buff: 460730

And in a directory ‘imdb’ inside the ‘log’ directory, which looks like the same data in an XML format:

<msg time='2015-11-14T08:52:45.889+01:00' org_id='oracle' comp_id='rdbms'
 msg_id='kdmlScanAndCreateCU:1537:858585840' type='UNKNOWN' group='IM_populate'
 level='16' host_id='bigmachine.local' host_addr='127.0.0.1'>
 <txt>c:0o:20434d:20434/St.RDBA: 0x5790 No.blks: 8
 </txt>
</msg>

It reveals a code location kdmlScanAndCreateCU, which probably means Kernel Data in-Memory for the first letters.

In most circumstances, and in my opinion, this information should be printed when some kind of debug flag is set, not ‘just because’, because writing these statistics costs time. I also don’t know if these files are maintained by ADR. If not, doing a lot of in-memory transactions can fill it up rather quickly.

Advertisements
2 comments
  1. Hi Frits,

    It seems you can also use
    alter session set events ‘trace[RDBMS.IM_populate]’;
    alter session set events ‘trace[RDBMS.IM_background]’;
    to reveal some detaila of th immemory population.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: