Every DBA working with the Oracle database must have seen memory dumps in tracefiles. It is present in ORA-600 (internal error) ORA-7445 (operating system error), system state dumps, process state dumps and a lot of other dumps.

This is how it looks likes:

Dump of memory from 0x00007F06BF9A9E00 to 0x00007F06BF9ADE00
7F06BF9A9E00 0000C215 0000001F 00000CC1 0401FFFF  [................]
7F06BF9A9E10 000032F3 00010003 00000002 442B0000  [.2............+D]
7F06BF9A9E20 2F415441 31323156 4F2F3230 4E494C4E  [ATA/V12102/ONLIN]
7F06BF9A9E30 474F4C45 6F72672F 315F7075 3735322E  [ELOG/group_1.257]
7F06BF9A9E40 3336382E 36313435 00003338 00000000  [.863541683......]
7F06BF9A9E50 00000000 00000000 00000000 00000000  [................]

The first column is the memory location in hexadecimal.
The second to fifth columns represent the actual memory values in hexadecimal.
The sixth column shows an ASCII representation of the memory contents. If a position does not represent an ASCII character, a dot (“.”) is printed.

Actually, the values in the second to fifth column are grouped in four columns. This is how the values in a column look like:
{hex val}{hex val}{hex val}{hex val}, for example: 00010203 means: 0, 1, 2, 3.

In the ASCII representation (sixth column) the spaces after every four values are not put in.

However, look at the following line:

7F06BF9A9E10 000032F3 00010003 00000002 442B0000  [.2............+D]

And focus on the last four characters:
“..+D” (two non-printables, plus, D)
Now look at the corresponding memory contents from the dump:
“442B0000″ This is: “44 2B 00 00″, which should correspond to “. . + D”.
There is something the matter here: the plus and the D seem to be represented by “00”. That’s not correct.

Let’s see what “442B0000″ actually represents in ASCI:

$ echo -e "\x44\x2B\x00\x00"

Ah! That looks backwards! Let’s take a full line and see what that gives:
(This is the line with memory address 0x7F06BF9A9E20)

$ echo -e "\x2F\x41\x54\x41 \x31\x32\x31\x56 \x4F\x2F\x32\x30 \x4E\x49\x4C\x4E"
/ATA 121V O/20 NILN

So if you want to look at the actual memory contents, you need to start with the column on the left side, read the values from right to left, then go the next column, etc.

Actual, I asked my friend Philippe Fierens for a trace file from a SPARC (big endian) platform, to see if the endianness of the platform was causing this. I test my stuff on Linux, which is little endian.

Here’s a little snippet:

Dump of memory from 0xFFFFFFFF7D977E00 to 0xFFFFFFFF7D97BE00
FFFFFFFF7D977E00 15C20000 00000001 00000000 00000104  [................]
FFFFFFFF7D977E10 F4250000 00000000 0B200400 E2EB8A3D  [.%....... .....=]
FFFFFFFF7D977E20 44475445 53540000 32F6D98B 00000590  [DGTEST..2.......]
FFFFFFFF7D977E30 00004000 00000001 00000000 00000000  [..@.............]
FFFFFFFF7D977E40 00000000 00000000 00000000 00000000  [................]

Let’s test the line from address 0xFFFFFFFF7D977E20:

[oracle@bigmachine [v12102] trace]$ echo -e "\x44\x47\x54\x45 \x53\x54\x00\x00 \x32\xF6\xD9\x8B \x00\x00\x05\x90"
DGTE ST 2� �

So, the endianness determines how the raw memory contents should be read.

This is the 4th post in a series of posts on PGA behaviour of Oracle. Earlier posts are: here (PGA limiting for Oracle 12), here (PGA limiting for Oracle 11.2) and the quiz on using PGA with AMM, into which this blogpost dives deeper.

As laid out in the quiz blogpost, I have a database with the following specifics:
-Oracle Linux x86_64 6u6.
-Oracle database PSU 4
-Oracle database (single instance) with the following parameter set: memory_target=1G. No other memory related parameters set.

In this setup, I run the pga_filler script (source code here), which creates a collection until the session statistic ‘session pga memory’ exceeds the grow_until variable, which for this case I set to 2100000000 (approximately 2.1G).

So: the instance is set to have AMM (memory_target) with a size of 1GB, which is supposed to be the total amount memory which this instance uses, and a session runs a PL/SQL procedure which only stops if it has allocated 2.1GB, which is clearly more than configured with the memory_target parameter. Please mind a collection, which the anonymous procedure uses to allocate memory, is outside of the memory areas for which Oracle can move data to the assigned temporary tablespace (sort, hash and bitmap memory areas).

After startup of the instance with only memory_target set to 1G, the memory partitioning looks like this:

SYS@v11204 AS SYSDBA> select component, current_size/power(1024,2), last_oper_type from v$memory_dynamic_components where current_size != 0;

---------------------------------------------------------------- -------------------------- -------------
shared pool										168 STATIC
large pool										  4 STATIC
java pool										  4 STATIC
SGA Target										612 STATIC
DEFAULT buffer cache									424 INITIALIZING
PGA Target										412 STATIC

This is how v$pgastat looks like:

SYS@v11204 AS SYSDBA> select * from v$pgastat;

NAME								      VALUE UNIT
---------------------------------------------------------------- ---------- ------------
aggregate PGA target parameter					  432013312 bytes
aggregate PGA auto target					  318200832 bytes
global memory bound						   86402048 bytes
total PGA inuse 						   78572544 bytes
total PGA allocated						   90871808 bytes
maximum PGA allocated						   93495296 bytes
total freeable PGA memory					    2818048 bytes
process count								 57
max processes count							 58
PGA memory freed back to OS					    3211264 bytes
total PGA used for auto workareas					  0 bytes
maximum PGA used for auto workareas					  0 bytes
total PGA used for manual workareas					  0 bytes
maximum PGA used for manual workareas					  0 bytes
over allocation count							  0
bytes processed 						    8479744 bytes
extra bytes read/written						  0 bytes
cache hit percentage							100 percent
recompute count (total) 						 18

SYS@v11204 AS SYSDBA> show parameter pga

------------------------------------ ----------- ------------------------------
pga_aggregate_target		     big integer 0

Okay, so far so good. v$memory_dynamic_components shows the PGA Target being 412M, and v$pgastat shows the aggregate PGA target setting being 412M too. I haven’t set pga_aggregate_target (as shown with ‘show parameter pga’), because I am using memory_target/AMM for the argument I hear the most in favour of it: one knob to tune.

Next up, I start the pga_filler script, which means the session starts to allocate PGA.

I keep a close watch using v$pgastat:

SYS@v11204 AS SYSDBA> select * from v$pgastat;

NAME								      VALUE UNIT
---------------------------------------------------------------- ---------- ------------
aggregate PGA target parameter					  432013312 bytes
aggregate PGA auto target					  124443648 bytes
global memory bound						   86402048 bytes
total PGA inuse 						  296896512 bytes
total PGA allocated						  313212928 bytes
maximum PGA allocated						  313212928 bytes

This shows the pga_filler script in progress by looking at v$pgastat from another session. The total amount of PGA allocated has grown to 313212928 (298M) here.

A little while later, the amount of PGA taken has grown beyond the PGA target (only relevant rows):

total PGA inuse 						  628974592 bytes
total PGA allocated						  645480448 bytes
maximum PGA allocated						  645480448 bytes

However, when looking at the memory components using v$memory_dynamic_components, it gives the impression PGA memory is still 412M:

SYS@v11204 AS SYSDBA> select component, current_size/power(1024,2), last_oper_type from v$memory_dynamic_components where current_size != 0;

---------------------------------------------------------------- -------------------------- -------------
shared pool										168 STATIC
large pool										  4 STATIC
java pool										  4 STATIC
SGA Target										612 STATIC
DEFAULT buffer cache									424 INITIALIZING
PGA Target										412 STATIC

You could argue PGA is explicitly mentioned as ‘PGA Target’, but then: the total of the memory area’s (PGA Target+SGA Target) do show a size that roughly sums up to be equal to the memory_target.

A little while later, this is what v$pgastat is showing:

total PGA inuse 						  991568896 bytes
total PGA allocated						 1008303104 bytes
maximum PGA allocated						 1008303104 bytes

Another glimpse at v$memory_dynamic_components shows the same output as above, PGA Target at 412M. This is the point where it get’s a bit weird: the total amount of PGA memory (according to v$pgastat) shows it’s almost 1G, memory_target is set at 1G, and yet v$memory_dynamic_components show no change at all.

Again a little further in time:

total PGA inuse 						 1325501440 bytes
total PGA allocated						 1342077952 bytes
maximum PGA allocated						 1342077952 bytes

Okay, here it get’s really strange: there’s more memory allocated for PGA memory alone than has been set with memory_target for both PGA and SGA memory structures. Also, v$memory_dynamic_components shows no change in SGA memory structures or exchange of memory from SGA to PGA memory.

If v$pgastat is correct, and memory_target actively limits the total amount of both SGA and PGA, then the session must allocate memory out of thin air! But I guess you already came to the conclusion too that either v$pgastat is incorrect, or memory_target does not limit memory allocations (as at least I think it would do).

Let’s dump the PGA heap of the active process to see the real memory allocations of this process:

SYS@v11204 AS SYSDBA> oradebug setospid 9041
Oracle pid: 58, Unix process pid: 9041, image: oracle@bigmachine.local (TNS V1-V3)
SYS@v11204 AS SYSDBA> oradebug unlimit
Statement processed.
SYS@v11204 AS SYSDBA> oradebug dump heapdump 1
Statement processed.

(9041 is the PID of the process running PL/SQL)

Now look into (the relevant) data of the PGA heap dump:

[oracle@bigmachine [v11204] trace]$ grep Total\ heap\ size v11204_ora_9041.trc
Total heap size    =1494712248
Total heap size    =    65512
Total heap size    =  1638184

Okay, this is clear: the process actually took 1494712248 (=1425M) plus a little more memory. So, memory_target isn’t that much of a hard setting after all.

But where does this memory come from? There ought to be a sort of combined memory effort together with the SGA for memory, right? That was the memory_target promise!

Let’s take a look at the actual memory allocations of a new foreground process in /proc/PID/maps:

[oracle@bigmachine [v11204] trace]$ less /proc/11405/maps
00400000-0bcf3000 r-xp 00000000 fc:02 405855559                          /u01/app/oracle/product/
0bef2000-0c0eb000 rw-p 0b8f2000 fc:02 405855559                          /u01/app/oracle/product/
0c0eb000-0c142000 rw-p 00000000 00:00 0
0c962000-0c9c6000 rw-p 00000000 00:00 0                                  [heap]
60000000-60001000 r--s 00000000 00:10 351997                             /dev/shm/ora_v11204_232652803_0
60001000-60400000 rw-s 00001000 00:10 351997                             /dev/shm/ora_v11204_232652803_0
9fc00000-a0000000 rw-s 00000000 00:10 352255                             /dev/shm/ora_v11204_232685572_252
a0000000-a0400000 rw-s 00000000 00:10 354306                             /dev/shm/ora_v11204_232718341_0
3bb3000000-3bb3020000 r-xp 00000000 fc:00 134595                         /lib64/
3bb321f000-3bb3220000 r--p 0001f000 fc:00 134595                         /lib64/
3bb3220000-3bb3221000 rw-p 00020000 fc:00 134595                         /lib64/
3bb3221000-3bb3222000 rw-p 00000000 00:00 0
3bb3400000-3bb3401000 r-xp 00000000 fc:00 146311                         /lib64/
3bb5e16000-3bb5e17000 rw-p 00016000 fc:00 150740                         /lib64/
3bb5e17000-3bb5e19000 rw-p 00000000 00:00 0
7f018415a000-7f018416a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018416a000-7f018417a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018417a000-7f018418a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018418a000-7f018419a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018419a000-7f01841aa000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841aa000-7f01841ba000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841ba000-7f01841ca000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841ca000-7f01841da000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841da000-7f01841ea000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841ea000-7f01841fa000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841fa000-7f018420a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018420a000-7f018421a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018421a000-7f018422a000 rw-p 00000000 00:05 1030                       /dev/zero
7f68d497b000-7f68d4985000 r-xp 00000000 fc:02 268585089                  /u01/app/oracle/product/

When I run the pga_filler anonymous PL/SQL block, and strace (system call trace) utility, I see (snippet):

mmap(0x7f0194f7a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194f7a000
mmap(0x7f0194f8a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194f8a000
mmap(0x7f0194f9a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194f9a000
mmap(0x7f0194faa000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194faa000
mmap(0x7f0194fba000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194fba000
mmap(0x7f0194fca000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194fca000
mmap(0x7f0194fda000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194fda000
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, 6, 0xea000) = 0x7f0194e6a000
mmap(0x7f0194e6a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194e6a000
mmap(0x7f0194e7a000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194e7a000
mmap(0x7f0194e9a000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194e9a000
mmap(0x7f0194eba000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194eba000

So, when looking back, it’s very easy to spot the SGA memory, which resides in /dev/shm in my case, and looks like ‘/dev/shm/ora_v11204_232652803_0′ in the above /proc/PID/maps snippet.
This means that the mmap() calls are simply, as anyone would have guessed by now, the PGA memory allocations. In the maps snippet these are visible as being mapped to /dev/zero.
When looking at the mmap() call, at the 5th argument, which is the number 6, we look at a file descriptor. In /proc/PID/fd the file descriptors can be seen, and file descriptor 6 is /dev/zero, as you probably suspected. This way the allocated memory is initial set to zero.

By now, the pga_filler script finishes:

TS@v11204 > @pga_filler
begin pga size : 3908792
last  pga size : 2100012216
begin uga size : 1607440
last  uga size : 2000368
parameter pat  : 0

Taking the entire 2.1G I made the collection to grow to. With memory_target set to 1G.

The first conclusion I made is that PGA memory is very much different than SGA/shared memory. Anyone with a background in Oracle operating-system troubleshooting will find this quite logical. However, the “promise” AMM/memory_target made, in my interpretation, is that the memory would be used seamless. This is simply not the case. Shared memory is in /dev/shm, and PGA is mmaped/allocated as private memory.

Still, this wouldn’t be that much of an issue if memory_target would limit memory in a rigid way, and memory could, and actually would, very easily float between PGA and SGA. It simply doesn’t.

Why don’t we see Oracle trying to reallocate memory? This is the point where I can only guess.

– Probably, Oracle would try to grow the shared pool if it has problems allocating memory for SQL, library cache, etc. This probably hasn’t happened in my test.
– Probably, Oracle would try to grow the buffer cache if it can calculate a certain benefit from enlarging it. This probably hasn’t happened in my test.
– The other SGA area’s (large and java pool) probably are grown if these are used, and need more space for allocations. This probably didn’t happen in my test.
– For the PGA, a wild guess is the memory manager calculates using the workarea sizes (sort, hash and bitmap areas), which are not noticeably used in my test.

Another conclusion and opinion is AMM/memory_target is not a set once and forget option. In fact, it isn’t that much of a difference from using ASMM from a DBA perspective: you carefully need to understand the SGA size, and you carefully need to (try to) manage the PGA memory. Or reasoned the other way around: the only way you can sensibly set memory_target is if you know the correct SGA size and the PGA usage. Also having Oracle manage the memory area’s automatically is not unique to AMM: Oracle will reallocate (inside the SGA) if it finds it necessary, with AMM, ASMM and even manual set memory area’s. But the big dis-advantage of AMM (at least on linux, not sure about other operating systems) is that huge pages can’t be used, which has a severe impact on “real life” databases, in my experience. (Solaris CAN use huge pages with AMM(!)).

A final word: of course I tested a very specific situation. In most real-life cases there will be multiple sessions, and the PGA manageable memory areas will be used. However, the point I try to make is memory_target is simply not a way to very easily make your database be hard limited to the value set. Probably, in real life, the real amount of memory used by the instance will in the area of the value set with memory_target, but this will be subject to what memory areas you are exactly using. Of course it can differ in a spectaculair way if collections or alike structures are used by a large number of sessions.

This is a series of blogposts on how the Oracle database makes use of PGA. Earlier posts can be found here (PGA limiting for Oracle 12) and here (PGA limiting for Oracle 11.2).

Today a little wednesday fun: a quiz.

What do you think will happen in the following situation (leave a response as comment please!):

-Oracle Linux x86_64 6u6.
-Oracle database PSU 4
-Oracle database (single instance) with the following parameter set: memory_target=1G. No other memory related parameters set.

Run the pga_filler script (which can be found here (PGA limiting for Oracle 12)), with grow_until set to 2100000000 (approximately 2.1G).

I’ll try to create a blogpost on the outcome and an explanation on short notice!

This is the second part of a series of blogpost on Oracle database PGA usage. See the first part here. The first part described SGA and PGA usage, their distinction (SGA being static, PGA being variable), the problem (no limitation for PGA allocations outside of sort, hash and bitmap memory), a resolution for Oracle 12 (PGA_AGGREGATE_LIMIT), and some specifics about that (it doesn’t look like a very hard limit).

But this leaves out Oracle version 11.2. In reality, the vast majority of the database that I deal with at the time of writing is at version 11.2, and my guess is that this is not just the databases I deal with, but a general tendency. This could change in the coming time with the desupport of Oracle 11.2, however I suspect the installed base of Oracle version 12 to increase gradually and smoothly instead of in a big bang.

With version 11.2 there’s no PGA_AGGREGATE_LIMIT. This simply means there is no official way to limit the PGA. Full stop. However, there is an undocumented event to limit PGA usage: event 10261. This means that if you want to use this in a production database, you should ask Oracle support to bless the usage of it. On the other hand, Oracle corporation made this event public in an official white paper: Exadata consolidation best practices.

Let’s test event 10261! I’ve got the same table (T2) setup, a description how to set this up, and the anonymous PL/SQL code to allocate PGA using a collection is in the first part. I am using a database version with PSU 4 applied. The reason for choosing this version is that if you run a serious business on Oracle 11.2, THAT should be the version you should be running on!
(disclaimer: everything shown in this blogpost is purely for educational purposes. Do test everything thoroughly before applying this to a production system. Behaviour can or may be different in your specific situation)
The reason for this disclaimer: Bernhard (@bdcbuning_gridit) tweeted that he was warned that when setting it at the instance level, it could crash the instance. I am not sure if this means setting it at runtime, this event is always evaluated at the instance level.

Okay, let’s replicate more or less the test done to Oracle version in the first part. In this database PGA_AGGREGATE_SIZE is set to 500M, now let’s try to set the event to 600M, which means we set the PGA limit to 600M:
This is setting the event on runtime:

SYS@v11204 AS SYSDBA> alter system set events = '10261 trace name context forever, level 600000';

System altered.

This is setting the event in the spfile (which means you need a restart of the instance to activate this event, or the above syntax to set it on runtime):

SYS@v11204 AS SYSDBA> alter system set event = '10261 trace name context forever, level 600000' scope=spfile;

System altered.

The level is the amount of memory to which the PGA must be limited, in kilobytes.

Now start the anonymous PL/SQL block to fill up the PGA with a collection, again set to 900M:

TS@v11204 > @pga_filler
ERROR at line 1:
ORA-10260: limit size (600000) of the PGA heap set by event 10261 exceeded
ORA-06512: at line 20

That’s nice! There’s actually a meaningful, describing error message which explains why this PL/SQL block ended!

Let’s look at the actual PGA memory used, as reported by v$pgastat:

SYS@v11204 AS SYSDBA> select value/power(1024,2) from v$pgastat where name = 'maximum PGA allocated';


This is different than setting PGA_AGGREGATE_LIMIT, however there’s still more memory allocated than set as the limit (600000KB), but lesser (676M in versus 1041M in The outside visibility of the limiting happening is different too: there is NO notice of a process hitting the PGA limit set in the alert.log file nor the process’ trace file(!). Another difference is even SYS is limited, a test with the procedure running as SYS gotten me the ORA-10260 too, PGA_AGGREGATE_LIMIT does not limit SYS.

Event 10261 has got the same description to at least as low as version Here’s a test with with the event 10261 set at version to 600M:

TS@v11203 > @pga_filler
ERROR at line 1:
ORA-00600: internal error code, arguments: [723], [123552], [top uga heap], [], [], [], [], [], [], [], [], []
ORA-06512: at line 20

As has been detailed in the Oracle white paper, prior to version, an ORA-600 [723] is signalled when event 10261 is set, and more PGA memory is allocated as has been specified as limit. The amount of total allocated PGA is 677M, so roughly the same as with version

Because this is a genuine ORA-600 (internal error, ‘OERI’), this gives messages in the alert.log file:

Tue Dec 16 10:40:09 2014
Errors in file /u01/app/oracle/diag/rdbms/v11203/v11203/trace/v11203_ora_8963.trc  (incident=9279):
ORA-00600: internal error code, arguments: [723], [123552], [top uga heap], [], [], [], [], [], [], [], [], []
Incident details in: /u01/app/oracle/diag/rdbms/v11203/v11203/incident/incdir_9279/v11203_ora_8963_i9279.trc
Use ADRCI or Support Workbench to package the incident.
See Note 411.1 at My Oracle Support for error and packaging details.

The process’ trace file in the trace directory only points to the incident file, no further details are available there.
The incident trace file contains a complete diagnostics dump.

The behaviour is identical with Oracle

The limiting of the total amount of PGA memory used must be done using an undocumented event prior to Oracle version 12. The event is 10261. The event is made known in an official white paper. Still I would open a service request with Oracle to ask blessing for setting this. This does not mean this functionality is not needed, I would deem it highly important in almost any environment, even when running a single database: this setting, when done appropriately, protects your system from over allocating memory, which could mean entering the swapping death-spiral. The protection means a process gets an ORA message, and the PGA allocation aborted and deallocated.

With version hitting the limit as set with event 10261 is not published, outside of the process getting the ORA-10260.

With versions prior to ( and verified) processes do get an ORA-600 [723], which is also visible in the alert.log, and incidents are created accordingly.

When a limit has been set using event 10261, it still means more memory is allocated than set as limit (approximately 677M when 600M is set), but this is way less than with the PGA_AGGREGATE_LIMIT (1041M when 600M is set) in my specific situation. Test this in your own environment when you start using this.

Important addendum:
A very good comment to emphasise on the behaviour of using/setting event 10261 by Alexander Sidorov: this event sets a limit per process, not for the entire instance!! (tested with and

This post is about memory management on the operating system level of an Oracle database. The first question that might pop in your head is: isn’t this a solved problem? The answer is: yes, if you use Oracle’s AMM (Automatic Memory Management) feature, which let’s you set a limit for the Oracle datababase’s two main memory area’s: SGA and PGA. But in my opinion any serious, real life, usage of an Oracle database on Linux will be (severely) constrained in performance because of the lack of huge pages with AMM, and I personally witnessed very strange behaviour and process deaths with the AMM feature and high demand for memory.

This means that I strongly advise customers to use Oracle’s ASMM (Automatic Shared Memory Management) feature. In the newer versions of 11.2 I found this to be working very well. Earlier versions like 10.2 could suffer from an ever growing shared pool (which also means an ever shrinking buffer cache), especially when bind variables weren’t used. This still could happen, but it seems the SGA memory management feature in 11.2 handles this well in most cases. The ASMM feature means a fixed memory area is allocated for the SGA. SGA allocation has always been fixed outside of the AMM feature, as far as I know.

When ASMM doesn’t work, meaning the memory areas are getting sized wrong and performance is influenced by that, the last option is to size the memory area’s yourself. However, since version Oracle will resize when the memory manager thinks it’s feasible. See Kurt van Meerbeek’s article about that.

That leaves the PGA (Process Global Area) as a memory area on itself. Most databases are using the automatic PGA memory management, which is enabled once the PGA_AGGREGATE_TARGET parameter is set to a non zero value. A common misunderstanding is this setting is actually limiting the overall PGA usage of an instance. The truth is automatic PGA memory management will make attempts to adhere to the PGA_AGGREGATE_TARGET value. These are the actual words in the official Oracle documentation: ‘attempts to adhere’!

This means sort memory, hash memory and bitmap memory will be actively limited in size per process by automatic PGA memory management, any attempt to allocate more than automatic PGA memory management allows will result in moving some contents of these memory areas to the assigned temporary tablespace of the database user, to make room for new data.

However, there are more memory area’s allocatable per process, which are never swapped to disk, thus always will stay in memory, and these could not be limited in an officially supported way prior to Oracle version 12. Two structures which are allocated in PGA and never swapped to disk are PL/SQL collections and PL/SQL tables. Creating and filling these requires the usage of PL/SQL (hence their names); the reason for mentioning this is that if your database is not used by PL/SQL but only SQL, you almost certainly will not run into the problem I describe below.

You might be thinking: wait a minute! Does this mean a developer can just create such a structure, and allocate whatever he/she likes, with all the consequences that it can have, like the operating system starting to swap, and can do that for every single process? Yes, this is what this means. This is why Oracle introduced a parameter called PGA_AGGREGATE_LIMIT with Oracle 12, to effectively limit the overall PGA heap size.

In case you wonder what this means, or even doubting my words, I have written a little program to demonstrate this behaviour.

This is the source code to create my test table T2:

exec dbms_random.seed('abracadabra');
create table t2
with generator as (
    select      rownum      id
    from        dual
    connect by
                rownum <= 1000
    rownum                                                id,
    trunc((rownum-1)/50)                            clustered,
    mod(rownum,20000)                               scattered,
    trunc(dbms_random.value(0,20000))               randomized,
    trunc(sysdate) + dbms_random.value(-180, 180)   random_date,
    dbms_random.string('l',6)                       random_string,
    lpad(rownum,10,0)                               vc_small,
    rpad('x',100,'x')                               vc_padding
    generator   g1,
    generator   g2
    rownum <= 1000000
exec dbms_stats.gather_table_stats(null,'T2');

This is a very smart way to generate a table. I actually borrowed this from Jonathan Lewis.

Next up, I created a small anonymous PL/SQL block to take the contents from the T2 table, and store them in a collection until I hit the limit in the variable ‘grow_until’.

	type sourcetab is table of t2%ROWTYPE;
	c_tmp		sourcetab;
	c_def		sourcetab	:= sourcetab();
	v_b_p		number		:= 0;
	v_c_p		number		:= 0;
	v_b_u		number		:= 0;
	v_c_u		number		:= 0;
	grow_until	number		:= 700000000;
	p_a_t		number;
	select value into v_b_p from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session pga memory max';
	select value into v_b_u from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session uga memory max';
	select value into p_a_t from v$parameter where name = 'pga_aggregate_target';
	select * bulk collect into c_tmp from t2;
	while v_c_p < grow_until loop
		for c in c_tmp.first .. c_tmp.last loop
			c_def(c_def.last) := c_tmp(c);
			select value into v_c_p from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session pga memory max';
			select value into v_c_u from v$mystat m, v$statname n where m.statistic#=n.statistic# and name = 'session uga memory max';
			if v_c_p >= grow_until then
			end if;
		end loop;
	end loop;
	dbms_output.put_line('vbp : '||v_b_p);
	dbms_output.put_line('vcp : '||v_c_p);
	dbms_output.put_line('vbu : '||v_b_u);
	dbms_output.put_line('vcu : '||v_c_u);
	dbms_output.put_line('pat : '||p_a_t);

Please mind the session needs to have create table, create session granted, enough quota in the default tablespace and select on v_$mystat, v_$parameter and v_$statname granted.

This is run on an Oracle database:

TS@v12102 > @pga_filler
vbp : 3535368
vcp : 700051976
vbu : 1103192
vcu : 4755704
pat : 524288000

PL/SQL procedure successfully completed.

The begin sizes of the UGA (vbu) and PGA (vbp) are 1’103’192 and 3’535’368. The PGA_AGGREGATE_TARGET size is set to 524’288’000 (500MB). I did set the grow_until variable to 700’000’000 (roughly 700MB), which is more than PGA_AGGREGATE_TARGET. After running this, it’s easy to spot the values of vcu (UGA allocation) and vcp (PGA allocation). vcu grew to 4’755’704 during the run, however vcp grew to 700’051’976, a little more than 700MB! This shows that the collection is stored in the PGA, and that the collection grew beyond the value set with PGA_AGGREGATE_TARGET.

This behaviour is consistent in versions,,, and

Let me emphasise once again that the above proof of concept code managed to allocate more memory than was set for the overall PGA usage of the entire instance. This can have an enormous, devastating impact on a consolidated database setup (meaning having multiple instances running on a single machine). Typically, once memory consumption of all the processes exceeds physically available memory, the operating system tries to use the swap device, to which it will swap memory pages in and out depending on memory usage of active (=on CPU) processes. Mild swapping shows as severely slowed-down processing (because a number of memory pages for processing need to be read from the swap device and placed in memory, from which the former contents need to be written to the swap device), heavy swapping shows as the machine coming down to a standstill.

Please mind that a diagnosis on the state of memory usage (alias swapping), just by looking at the amount of used swap (as can be seen in the ‘top’ output, or ‘swapon -s’) could be misleading. It’s also important to look at actual swapping in and out, as can be seen with ‘vmstat 1′ (si/so columns) or swap -W. I’ve found several systems which had been running for some time (approximately longer than a month) that had swap usage, sometimes up to 40%, while no ‘active swapping’, so memory pages being transfered to and from the swap device, was happening.

Luckily, starting with Oracle 12 you can actually limit overall PGA usage using the parameter PGA_AGGREGATE_LIMIT. The default value is the greater of (list from Oracle documentation):
a) 2GB
b) 200% of PGA_AGGREGATE_TARGET parameter (or lower if 200% > (90% of physical memory – total SGA size) but not below 100%)
c) 3MB * PROCESSES parameter
The parameter can not set below it’s default value, except when set in a pfile or spfile.

Let’s set the PGA_AGGREGATE_LIMIT to 600MB and see what happens when we start doing a large allocation again:

SQL> alter system set pga_aggregate_limit=600m scope=spfile;

System altered.

SQL> startup force;

Okay, let’s run the pga_filler.sql script again, and try to allocate 900MB. This means the “grow_until” variable must be set to 900000000.
PLEASE MIND this is done as a regular user, the SYS user and background processes other than job queue processes are not subject to the limiting.

TS@v12102 > @pga_filler
ERROR at line 1:
ORA-01423: error encountered while checking for extra rows in exact fetch
ORA-00039: error during periodic action
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
ORA-06512: at line 21

Great! Exactly like we expect, right?
Well…yes, but let’s look at the alert.log

Sat Dec 13 15:08:57 2014
Errors in file /u01/app/oracle/diag/rdbms/v12102/v12102/trace/v12102_ora_4147.trc  (incident=46599):
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT
Incident details in: /u01/app/oracle/diag/rdbms/v12102/v12102/incident/incdir_46599/v12102_ora_4147_i46599.trc
Sat Dec 13 15:09:07 2014
Dumping diagnostic data in directory=[cdmp_20141213150907], requested by (instance=1, osid=4147), summary=[incident=46599].
Sat Dec 13 15:09:09 2014
Sweep [inc][46599]: completed
Sweep [inc2][46599]: completed

Okay, essentially, this tells us nothing interesting, except for the tracefile. Let’s look in/u01/app/oracle/diag/rdbms/v12102/v12102/trace/v12102_ora_4147.trc, being the tracefile as indicated in the above alert.log snippet:

*** 2014-12-13 15:08:57.351
Process may have gone over pga_aggregate_limit
Just allocated 65536 bytes
Dumping short stack in preparation for potential ORA-4036
----- Abridged Call Stack Trace -----
----- End of Abridged Call Stack Trace -----
781 MB total:
   781 MB commented, 646 KB permanent
   208 KB free (0 KB in empty extents),
     779 MB,   2 heaps:   "koh-kghu call  "            57 KB free held
Summary of subheaps at depth 1
779 MB total:
   778 MB commented, 110 KB permanent
    63 KB free (0 KB in empty extents),
     667 MB, 42786 chunks:  "pmuccst: adt/record       "
      83 MB, 5333 chunks:  "pl/sql vc2                "

Actually, this is the end of the tracefile. It seems that the pga limit dump (the text in between “Process may have gone over pga_aggregate_limit” to the private memory summary heap dumps) occurs several times before an actual ORA-4036 is triggered. In my private test instance, where I am obviously the only user process doing something, I get a pga limit dump approximately 20 times before the ORA-4036 is actually triggered:

sending 4036 interrupt
Incident 46599 created, dump file: /u01/app/oracle/diag/rdbms/v12102/v12102/incident/incdir_46599/v12102_ora_4147_i46599.trc
ORA-04036: PGA memory used by the instance exceeds PGA_AGGREGATE_LIMIT

Did you actually spot the oddity here?

Remember the PGA_AGGREGATE_LIMIT was set to 600M. Now look at the process’ PGA/Private heap summary dump above: it says 781M. Please mind the 781M is the PGA heap of a SINGLE process! When looking at the total PGA allocated for the entire instance, it’s even more:

SYS@v12102 AS SYSDBA> select value/power(1024,2) "MB" from v$pgastat where name = 'maximum PGA allocated';


So…despite PGA_AGGREGATE_LIMIT set to 600M, according to the v$pgastat view, there’s 1041MB allocated for PGA. Please mind I haven’t looked into how accurate v$pgastat is, but I tend to believe this.

I’ve seen PGA_AGGREGATE_TARGET being used as a calculation value for actual PGA usage of an instance. This is simply wrong. The actual amount of PGA memory allocated by the instance is highly depended on what is done, and can be less than PGA_AGGREGATE_TARGET, or more. Automatic PGA can control three per process memory area’s: the sort, hash and bitmap memory area’s. These are sized based on the setting of PGA_AGGREGATE_TARGET and the actual PGA memory usage instance wide. If more memory is needed for sort, hash or bitmap memory than is made available by the memory manager, excess memory needed is allocated in the temporary tablespace. Any other PGA memory allocation is always done, regardless of the setting of PGA_AGGREGATE_TARGET.

Starting with Oracle 12, it seems the actual PGA allocation now can actually be limited with the new parameter PGA_AGGREGATE_LIMIT. However, during some simple testing it shows that actually more memory is allocated than set with PGA_AGGREGATE_LIMIT as limit. I haven’t tested it in more situations, this post is meant to grow awareness that the actual limit as set by PGA_AGGREGATE_LIMIT might not be that hard as you would expect.

Please mind, PGA_AGGREGATE_LIMIT seems to truly limit PGA usage instance wide, not limit the PGA heap per process, as event 10251 (PGA usage limiting way for Oracle 11.2) does. However, once again: PGA_AGGREGATE_LIMIT seems to try to be smart and actually does not limit at the exact size set, but beyond that.

The next post will introduce a way to limit PGA usage in Oracle 11.2. Stay tuned!

I posted a fair amount of stuff on how Oracle is generating IOs, and especially large IOs, meaning more than one Oracle block, so > 8KB. This is typically what is happening when the Oracle database is executing a row source which does a full segment scan. Let’s start off with a quiz: what you think Oracle is the maximum IO size the Oracle engine is capable of requesting of the Operating System (so the IO size as can be seen at the SCI (system call interface) layer? If you made up your answer, remember it, and read on!

The real intention of this blogpost is to describe what is going on in the Oracle database kernel, but also what is being done in the Linux kernel. Being a performance specialised Oracle DBA means you have to understand what the operating system does. I often see that it’s of the utmost importance to understand how an IO ends up as a request at the NAS or SAN head, so you understand what a storage admin is talking about.

Many people (including myself in the past) would state that the maximum IO size on Linux is 1MB. For the Linux 2.6 kernel and higher this statement is incorrect because there is no such thing as a single maximum IO size on Linux. There used to be one in the Linux 2.4 era, which was set with the maxphys parameter, but that time is long gone. In order to find out what it is now, let’s test and see!

First let’s get a Linux system and a big table!
The system I got is a VMWare Fusion VM, running Linux 3.8.13-44.1.5.el6uek.x86_64 (UEK3) on Oracle Linux 6u6. On top of that I am using the Oracle database and grid infrastructure version This system is using udev for providing disk access (as opposed to asmlib)
The redundancy mode of ASM is external, although for reading (what I will be covering) this doesn’t matter.

Now let’s get a normal database session, and use a combination of sql_trace with waits (10046/8) and strace to see how the Oracle database interfaces with the kernel. Please mind I’ve prepared a heap table with no indexes on it, so a count(*) on it always will result in a full table scan. Also, the buffercache is sized small enough (or the table is created large enough, it depends on how you look at it) to have the session make the decision to do a direct path read, instead of a buffered read. If you don’t know what that means: please search this blog on direct path reads, or even better, download my presentation ‘about multiblock reads’.

The direct path read decision is visible via the ‘direct path read’ wait event. If you get a full table scan operation and see ‘db file scattered read’ waits, you are doing a buffered scan.

In the most cases, you will get a maximum value of 1MB if possible, which seems to support the generally assumed 1MB maximum operating system induced IO size. Why? Well, because you probably set the DB_FILE_MULTIBLOCK_READ_COUNT parameter to 128, which means you have explicitly set the Oracle process not to do IO with a size more than 1MB (8192*128). In this blogpost, I explain that Oracle can request IOs bigger than 1MB.

In the blogpost series on extra huge database IOs, I show that Oracle can do huge (1MB+) IOs, but the physical request size (what Oracle actually requests at the SCI layer, visible with the pread/pwrite/io_submit/io_getevents functions) still is 1MB. This limit is imposed by the physical storage structure which the database uses with ASM, called allocation unit (often called ‘AU’). The default size of an allocation unit is 1MB. The allocation unit can be seen in both the database and the ASM instance with the following query:

SYS@+ASM AS SYSASM> select name, allocation_unit_size from v$asm_diskgroup;

------------------------------ --------------------
DATA					    1048576

How about doing an unbuffered read on a filesystem? I’ve created a database on an (XFS, but this doesn’t matter AFAIK) filesystem, and tried to set the maximum value to DB_FILE_MULTIBLOCK_READ_COUNT. I’ve done this by setting DB_FILE_MULTIBLOCK_READ_COUNT to 10000 (ten thousand), and then bounce the database to see what the number has become. In my case, the value became 4096. I think this is the limit for Oracle on Linux x86_64, but love to hear if you have gotten different results:

I set 10000:

SYS@fv12102 AS SYSDBA> select name, value from v$spparameter where name like 'db_file_multiblock%';

NAME						   VALUE
-------------------------------------------------- ----------------------------------------------------------------------
db_file_multiblock_read_count			   10000

But Oracle limits this to 4096:

SYS@fv12102 AS SYSDBA> select name, value from v$parameter where name like 'db_file_multiblock%';

NAME						   VALUE
-------------------------------------------------- ----------------------------------------------------------------------
db_file_multiblock_read_count			   4096

Okay. Let’s start our investigation at that point: a database which is set up with a DB_FILE_MULTIBLOCK_READ_COUNT set to 4096, alias 32MB (with a block size of 8KB), and a table which got extents large enough to accommodate huge (32MB) IOs.

Fire up a session regular database session, and enable sql trace at level 8:

$ sqlplus ts/ts@//localhost/v11204
SQL> alter session set events 'sql_trace level 8';

Now start another session as root on the database server, and find the PID of the server process of the sqlplus process we just created above. Issue strace with verbose writing setting:

# strace -e write=all -e all -p PID
Process PID attached - interrupt to quit

Okay, we are setup and ready to go, but there is one additional thing: the way direct path reads work, they would probably give little waits with fast IO capabilities. One way to get the waits back, is to limit the IO capabilities of the process. Doing so is documented in this article.

Now issue the full table scan on a large table in sqlplus while strace is attached:

SQL> select count(*) from bigtab;

Now take a peek at the strace output!
The output first shows IOs as we expect:

io_getevents(139717184229376, 1, 128, {{0x7f126dd3d780, 0x7f126dd3d780, 33554432, 0}}, {600, 0}) = 1
times(NULL)                             = 431386800
write(7, "\n*** 2014-11-24 13:09:28.028\n", 29) = 29
 | 00000  0a 2a 2a 2a 20 32 30 31  34 2d 31 31 2d 32 34 20  .*** 201 4-11-24  |
 | 00010  31 33 3a 30 39 3a 32 38  2e 30 32 38 0a           13:09:28 .028.    |
lseek(7, 0, SEEK_CUR)                   = 31181
write(7, "WAIT #139717129509840: nam='dire"..., 130) = 130
 | 00000  57 41 49 54 20 23 31 33  39 37 31 37 31 32 39 35  WAIT #13 97171295 |
 | 00010  30 39 38 34 30 3a 20 6e  61 6d 3d 27 64 69 72 65  09840: n am='dire |
 | 00020  63 74 20 70 61 74 68 20  72 65 61 64 27 20 65 6c  ct path  read' el |
 | 00030  61 3d 20 33 39 30 37 33  30 20 66 69 6c 65 20 6e  a= 39073 0 file n |
 | 00040  75 6d 62 65 72 3d 34 20  66 69 72 73 74 20 64 62  umber=4  first db |
 | 00050  61 3d 37 34 31 33 37 36  20 62 6c 6f 63 6b 20 63  a=741376  block c |
 | 00060  6e 74 3d 34 30 39 36 20  6f 62 6a 23 3d 32 30 34  nt=4096  obj#=204 |
 | 00070  37 34 20 74 69 6d 3d 31  39 32 30 30 37 31 30 31  74 tim=1 92007101 |
 | 00080  39 39                                             99                |

What is visible here, is first the reap of an I/O request (with asynchronous IO on Linux this is typically the io_getevents() call). If you take a close look at the arguments of the io_getevents() call (taken from the manpage of io_getevents):

int io_getevents(aio_context_t ctx_id, long min_nr, long nr, struct io_event *events, struct timespec *timeout);

And then focus on the struct io_event:

struct io_event {
         __u64           data;           /* the data field from the iocb */
         __u64           obj;            /* what iocb this event came from */
         __s64           res;            /* result code for this event */
         __s64           res2;           /* secondary result */

The above description is taken from the annotated Linux kernel source, as available here: I use this site for navigating the Linux kernel source. What is important, is that the third field (io_event.res) contains the size of the IO request. Having learned this, now look again in the io_getevents call. The size of the IO reaped above is 33554432, which is 33554432/1024/1024=32 MB. Yes, that’s a single IO of 32MB! Also, this is consistent with the wait line a little lower:

 | 00050  61 3d 37 34 31 33 37 36  20 62 6c 6f 63 6b 20 63  a=741376  block c |
 | 00060  6e 74 3d 34 30 39 36 20  6f 62 6a 23 3d 32 30 34  nt=4096  obj#=204 |

Block count = 4096 * 8192 (block size) = 33554432

So, I wonder what you thought was possible, the correct answer on my operating system (Linux x86_64) with Oracle is 32MB. It turned out the big IOs in the ASM case were limited by the allocation unit size of 1MB.

The next thing I’ve wondered is how this matches with the maximum IO size of the disk devices as visible by the Operating System. You can request 32MB, but a normal SCSI disk doesn’t do 32MB IOs. Of course in my case the SCSI disk really is a VMWare virtual disk device.

Let’s keep the 32MB IO in mind, now dive from the top layer, the SCI (system call interface) where an IO enters the kernel to the bottom of the kernel from an IO perspective, to the block device. The block device settings are found in /sys/block/DEVICE/queue. The maximum IO size the device is capable of is found in max_hw_sectors_kb. This is in kilobytes, and read only (can’t change hardware, right?). In my case this is:

[root@bigmachine queue]# cat max_hw_sectors_kb

My disk supports a maximum of 4M for an IO size! But this is not what is used, the actual setting is in max_sectors_kb:

[root@bigmachine queue]# cat max_sectors_kb

That’s half a megabyte!

So…we got (up to) 32MB sized IO requests coming in, and a device that is set to 512KB IOs. This means that somewhere between the SCI and the device, there is a mechanism to scatter the request size to the device’s maximum IO size, and once the IO requests are done, going back to gather the IO results to the original request.

There are a couple of layers in the Linux kernel through which the call travels (including common functions):

-SCI/system call interface: system_call, sys_io_submit…. (io_submit, do_io_submit, io_submit_one; these seem to be in the VFS layer)
-VFS/virtual filesystem: aio_run_iocb, do_aio_read, xfs_file_read_iter, generic_file_read_iter, xfs_vm_direct_IO, bio_*, kiocb_batch_refill
-Block layer: blk_finish_plug, blk_flush_plug_list, queue_unplugged, __blk_run_queue, blk_run_queue
-SCSI layer: scsi_*
-Device driver: mptspi_qcmd, mptscsih_qcmd, mpt_put_msg_frame

(note: there seems to be consensus the above mentioned layers exist, although there is different wording and different numbers by different sources. Also, there doesn’t seem to be a very clear description of what is done by which layer, and what typically defines a kernel layer. For some functions it is clear they belong to a certain layer (for example aio_run_iocb in Linux/fs/aio.c, bulk_finish_plug in Linux/block/blk-core.c, etc.), for some layers, like the SCI layer, it seems there isn’t a clear layer definition by looking at where the function is defined. Also please mind the SCSI layer is implemented as a driver, just like the actual device driver for the hardware. This is very understandable, but makes it a bit harder to see it in a layered way)

System Call Interface (SCI)
The request enters kernel space via the SCI. The function of the SCI is to elevate a process to system priority to perform a kernel mode task, like (but not limited to) doing I/O. The system call implementation on Linux makes use of a wrapper function in glibc, which executes the system call on behalf of the user systemcall request. The reason for mentioning this, is that sometimes the glibc wrapper “hides” the real system call, for example calling the semtimedop() function:

(gdb) break semtimedop
Breakpoint 1 at 0x3bb38eb090: file ../sysdeps/unix/syscall-template.S, line 82.
(gdb) c

Breakpoint 1, semtimedop () at ../sysdeps/unix/syscall-template.S:82

Above is a gdb (GNU debugger) session which attaches to an Oracle background process, which I know is sleeping in the system call semtimedop() when idle, A breakpoint is set on the semtimedop function, and the execution of the attached process is resumed. It then breaks on the function, showing the source code at which the break happened. Instead of showing the actual semtimedop function, it shows the pseudo function in glibc which wraps this system call. This hides the arguments of calling the semtimedop() function. My current workaround is to read the kernel registers which “carry” the arguments (RDI, RSI, RDX, RCX, R8, R9 for the first 6 arguments in most cases).

Virtual File System (VFS)
The next layer is virtual filesystem. Here we see functions specific to asynchronous IO or synchronous IO, and doing direct IO or not, and also actual filesystem specific functions (in my case xfs, when ext4 is used, you will see specific functions for that. I highly recommend XFS!). This layer also uses a structure called ‘request_queue’, which keeps track of the actual IO requests for a block device, of which each individual request is a struct ‘request’, which contains one or more structs called ‘bio’ which contains a description of the request, which points to structure called ‘bio_vec’, which points to pages for storing the disk request contents. This is all setup and created in kernel memory by the user process in system mode. It’s my assumption that the properties of the disk device (=maximum advertised IO size) are taken into account when the VFS filesystem implementation creates requests and all necessary structs and memory area’s. Please mind it’s important that enough memory is available to setup the necessary structures, and enough CPU to make this happen. Also some of the crucial structures for doing IO (request, bio, bio_vec) seem to be setup in this layer. An IO can’t be done without a memory area for the IO request to hold the data for sending it to the device (alias a write), or a memory area for the IO request to hold the data which is fetched from the device (alias a read).

The funny thing is that when you use ASM (the simple version 11.2 ASM with a local ASM instance and local disk devices), you will still see some functions of the VFS layer, because you use a disk device which is opened using the local filesystem. Examples of these functions are: aio_run_iocb, do_aio_read.

Block Layer
The next layer is the block layer. Here the request queue is handled, and I/O scheduling is done. Oracle advises the deadline scheduler in all cases. The scheduler works by plugging a request queue, much like a plug in your bathtub, letting the requests enter the queue. Having multiple requests in a queue means it can be optimised by reordering the requests, and merging adjacent requests up to the device’s advertised maximum IO size. Once a request’s timeout expires, or the requesting process finishes submitting IO, the queue is unplugged.

SCSI layer
The SCSI layer is responsible for communicating with SCSI devices to do IOs.

Device driver
The device driver layer is the layer that truly physically communicates with a device, and implements the device specific communication. In my case the functions start with mpt, which is the driver for LSI PCI adapters.

To see how the flow of IO going through the block layer, there is a tool called blktrace. Actually this is a mini-suite of tools consisting of blktrace (tracing the IO requests through the block layer), blkparse (parsing the output of blktrace to make it human readable), btrace (script to combine blktrace and blkparse, and btt (a blktrace output post processing tool)), among others.

In order to use blktrace, the debug file system of the Linux kernel needs to be mounted. Here is how that is done:

# mount -t debugfs debugfs /sys/kernel/debug

If the kernel debugfs is not mounted, you get the following message:

[root@bigmachine ~]# btrace /dev/oracleasm/disk1
Invalid debug path /sys/kernel/debug: 0/Success

I use blktrace in this article for looking at the IO requests to understand what is going on. The workflow for this use of blktrace is:
– create a trace file of the block flow using blktrace
– make the trace file human readable via blkparse or analyse via btt (block trace times)

Actually, you can parse the output of blktrace directly via blkparse using ‘blktrace -d DEVICE – | blkparse -i -‘. To make that even simpler, the script ‘btrace’ is created, to do exactly that.

Here’s how that looks like (depending on the number of processes using it, the output can be huge, this is only a snippet):

[root@bigmachine ~]# btrace /dev/oracleasm/disk1
  8,16   0       57     0.260669503  2421  Q  WS 4088 + 8 [asm_gmon_+asm]
  8,16   0       58     0.260672502  2421  G  WS 4088 + 8 [asm_gmon_+asm]
  8,16   0       59     0.260673231  2421  P   N [asm_gmon_+asm]
  8,16   0       60     0.260674895  2421  I  WS 4088 + 8 [asm_gmon_+asm]
  8,16   0       61     0.260675745  2421  U   N [asm_gmon_+asm] 1
  8,16   0       62     0.260677119  2421  D  WS 4088 + 8 [asm_gmon_+asm]
  8,16   0       63     0.260882884     0  C  WS 4088 + 8 [0]

What is shown here, is the typical flow of an IO in the block layer:
Q – Queue. A request starts off sending a notification on the intent to queue at the given location.
G – Get request. A struct request is allocated.
P – Plug. When the block device queue is empty, the queue is plugged in order to receive further IOs and have the ability to optimise (merge and/or reorder) them before the data is sent to the device.
I – Insert. A request is sent to the IO scheduler for addition to the internal queue and later service by the driver. The request is fully allocated at this time.
U – Unplug. The start of sending requests to the driver.
D – Driver. A request has been sent to the driver and removed from the queue.
C – Complete. A previously issued request to the driver has been completed.

The main point is here, that you can truly see how the IO requests flow through the block layer and are issued to the storage device, in other words, you can see how the block layer receives the IOs, and what is exactly submitted to the driver as request for the physical storage layer.

This is a microscopic view of the disk IOs. In most cases, when you want to gain information on block layer IO processing, another view on it is provided by processing blktrace output with btt. This is an example output of btt:

First capture IO events using blktrace:

[root@bigmachine ~]# blktrace -w 60 -d /dev/oracleasm/disk1 -o - | blkparse -d sdb.blkparse -i -

In this example I captured IOs for 60 seconds. You can exclude ‘-w 60′, and press interrupt (ctrl-c) when you deem IO recording is enough. This produces a binary file ‘sdb.blkparse’, which can be used btt:

This is the first part, the flow through the block layer until IO completion:

==================== All Devices ====================

            ALL           MIN           AVG           MAX           N
--------------- ------------- ------------- ------------- -----------

Q2Q               0.000000001   0.239795347   3.002829973         238
Q2G               0.000000001   0.159337842   3.011192142         264
G2I               0.000000679   0.000001724   0.000011618         264
I2D               0.000000764   0.000007633   0.000153436         264
D2C               0.000000001   0.103328167   3.012509148         233
Q2C               0.000000001   0.270961298   3.012516496         233

Note: time is in milli seconds.
Q2Q – Time between IO requests.
Q2G – Time it takes for a request struct to be allocated.
G2I – Time it takes for the request to be inserted in the device’s queue.
I2D – Time spend in the device queue waiting to be issued to the driver.
D2C – Time spend between issuing to the driver and completion of the request. This includes controller, storage. This is the same figure as the ‘svctm’ column with iostat -x.
Q2C – Total time spend in block layer and physical IO. This is the same figure as the ‘await’ column with iostat -x.

The second part is the device overhead section:

==================== Device Overhead ====================

       DEV |       Q2G       G2I       Q2M       I2D       D2C
---------- | --------- --------- --------- --------- ---------
 (  8, 16) |  66.6284%   0.0007%   0.0000%   0.0032%  38.1339%
---------- | --------- --------- --------- --------- ---------
   Overall |  66.6284%   0.0007%   0.0000%   0.0032%  38.1339%

This is partly the same as the IO flow table above. This is expressed as a percentage of where the total time of the IO is spend.
Q2G – Request struct allocation.
G2I – Insertion in the device queue.
Q2M – Total time until merge.
I2D – Time spend in the queue until it was dispatched to the driver.
D2C – Time spend on doing the IO after submitting the request to the driver.

This post looks like I am jumping on the bandwagon of IT orchestration like a lot of people are doing. Maybe I should say ‘except for (die hard) Oracle DBA’s’. Or maybe not, it up to you to decide.

Most people who are interested in IT in general will have noticed IT orchestration has gotten attention, especially in the form of Puppet and/or Chef. I _think_ IT orchestration has gotten important with the rise of “web scale” (scaling up and down applications by adding virtual machines to horizontal scale resource intensive tasks), in order to provision/configure the newly added machines without manual intervention, and people start picking it up now to use it for more tasks than provisioning of virtual machines for web applications.

I am surprised by that. I am not surprised that people want boring tasks like making settings in configuration files and restarting daemons, installing software with all the correct options, etc. being automated. Instead, I am surprised that people are now picking this up after it has been around for so long.

A little history.
As far as I know, IT orchestration started with cfengine, which was really a configuration engine (hence the name). Despite having a little different purpose (configuration vs. orchestration), this tool is the parent of all the orchestration/configuration tools which exist nowaday. cfengine started off as a study in 1993 by Mark Burgess at the university of Oslo, with the creation of the cfengine software as a result. As far as I can see, it has been available as open source software since the beginning.

Now back to what I am surprised at: with cfengine, there has been a way to configure linux/unix systems in a structured way, and install and configure software on systems since the mid-nineties. Yet, this wasn’t picked up (of course with a few exceptions). Fast forward to today, we see it is being adopted. And that is a good thing.

I created a setup with cfengine for a client a long time ago, which had the ability to install the Oracle software, different PSU’s in different home’s, and remove it by adding or removing machines to groups in a cfengine configuration file. It wasn’t picked up by the client, it’s way more fun running X to install the software, and make the choices by hand, and redo this over and over on every machine, right?

I almost forgotten about my work with cfengine, until I spoke with Alex Gorbatchev at a conference, at which he pointed me to Ansible. At first I didn’t do a lot with it, but lately I’ve given it a go, and I am very happy with it.

Another redo of cfengine?
From what I read, most of the configuration/orchestration engines created after cfengine are created to circumvent all kinds of difficulties with cfengine. I can understand that. It took me a while to learn cfengine, and indeed it forces you to think in a different way.

The Ansible project decided to radically do it different than all the other engines. It is different in the sense that it advertises itself as simple, agentless and powerful.

Simple is a terrific goal. For those of you that have worked with configuration/orchestration engines, there is a steep learning curve. It is just hard to get the basic principles in your head. To be honest, also Ansible took me a while too, to grasp the basic principles, and get the picture correctly in my head. Yet, having worked with cfengine comparing it with Ansible’s playbooks, which are the scripts to do things on the targets, it is a breath of fresh air. Playbooks are so clean they (almost) can be read and understood as plain english.

This is where Ansible is truly different than any of the other configuration/orchestration tools. Ansible does not require any agent installation on the targets. The obvious next question then is: how can this work? Well, quite simple: Ansible uses ssh to connect to the host, and executes commands via the shell. Having that said, it requires a little more detail; Ansible uses python on the remote host for it’s normal execution. However, you can use it without python, for example to setup the host up for the Ansible normal usage mode Which requires python and the simple-json module.

This is truly important, and makes it an excellent fit for my daily work as an IT consultant.

Ansible is powerful in the way that you can do the configuration and orchestration tasks in a simple clean way.

Summary on the introduction.
Above was a brief personal history, and some of the “marketed” features of Ansible. I think being agentless is the true “killer feature” here. All the other configuration/orchestration engines require you to setup and configure a fixed client-server connection, and install a deamon and a central server process. In case you wondered, yes, authentication is important, and it’s simply brilliant that the ssh password authentication or public key infrastructure can be used.

Because there’s no daemon to install, you can run your created play books everywhere. So instead of a fixed client configuration, you can create play books to do routine tasks, and repeat it at multiple sites.

Okay, how does this work?

Installation: add EPEL and install ansible.
If you are on one of the clones of RedHat Enterprise Linux (I use Oracle Linux), you simply need to add the EPEL repository to your yum source list, and run:

# yum install ansible

First steps.
One of the first things I do, is create a directory for a typical ansible ‘project’. Project means a set of tasks you want to do to a set of hosts here. Next, I create a file called ‘hosts’ which is the list of hosts you want to use for executing tasks on. By default, Ansible looks in /etc/ansible/hosts. In this case, I put a single machine in it (a test VM), but it can be a list of machines (ip addresses or hostnames).

$ cat hosts

In fact, you can create groups in the hosts file in the “ini style”. But I just put one host in for this example.
The next thing is to check if Ansible reads the file correctly. This is done in the following way:

$ ansible all -i hosts --list-hosts

Okay, this means Ansible will operate on this one host if invoked. The next logical thing (typically done when you are in a new client environment to check if you can reach the hosts):

$ ansible all -i hosts -m ping | FAILED => FAILED: Authentication failed.

Ping might be a bit misleading for some people. What ping does here (-m means module), is trying to connect to the host over ssh, and log in. Because I didn’t specify a user, it used the username of the current user on the machine, which is ‘ansible’. A user ‘ansible’ typically doesn’t exist on a normal server (and is not necessary or should be created), and also not on my test server. So it failed, as the message said, on authentication.

My test VM is a basic installed (OL) linux 6 server. This means there’s only one user: root.

So, let’s specify the user root as user:

$ ansible all -i hosts -m ping -u root | FAILED => FAILED: Authentication failed.

The authentication failed again. And it should! What this is doing, is trying to log on as root, and we haven’t given any password, nor have I put my local user’s public key in the remote authorised_keys file. So there is no way this could work. This is typically also the state when you want to do stuff with a “fresh” client system. Let’s add the ‘-k’ option (ask ssh password), and run again:

$ ansible all -i hosts -m ping -u root -k
SSH password: | success >> {
    "changed": false,
    "ping": "pong"

To walk you through the output: It now asks for a password, which I’ve filled out, then lists the host and the state: success. During this execution, there was nothing changed on the remote host, and the ping command resulted in a pong (alike the ICMP ping response).

With what we have learned now, we can do things like this:

$ ansible all -i hosts -u root -k -a "ifconfig"
SSH password: | success | rc=0 >>
eth0      Link encap:Ethernet  HWaddr 00:0C:29:14:65:ED
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::20c:29ff:fe14:65ed/64 Scope:Link
          RX packets:47 errors:0 dropped:0 overruns:0 frame:0
          TX packets:25 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:6293 (6.1 KiB)  TX bytes:2594 (2.5 KiB)

eth1      Link encap:Ethernet  HWaddr 00:0C:29:14:65:F7
          inet addr:  Bcast:  Mask:
          inet6 addr: fe80::20c:29ff:fe14:65f7/64 Scope:Link
          RX packets:188 errors:0 dropped:0 overruns:0 frame:0
          TX packets:112 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:142146 (138.8 KiB)  TX bytes:15545 (15.1 KiB)

lo        Link encap:Local Loopback
          inet addr:  Mask:
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:65536  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Does this look familiar for you Exadata DBA’s? Yes, this replicates some of the functionality of dcli (although dcli is aimed at executing simple tasks to a group of hosts, whilst Ansible is aimed at enterprise configuration and orchestration).

One step beyond! Playbooks.
Now let’s progress to playbooks. An Ansible playbook is where the true strength lies of Ansible. It allows you to specify tasks to execute on the remote hosts, and create sequences of tasks and make decisions based on the outcome of a tasks for further execution. Let me show you a simple playbook, and guide you through it:

- hosts: all
  gather_facts: no
  remote_user: root

  - name: upgrade all packages
    yum: name=* state=latest

  - name: install python-selinux
    yum: name=libselinux-python state=installed

  - name: add public key to authorized_key file of root
    authorized_key: user=root state=present key="{{ lookup('file','/home/ansible/.ssh/') }}"

As you can see, this is a playbook with three tasks: upgrade all packages, install libselinux-python and adding my (local) public key to the authorised key file of root (to allow passwordless access).

Line 1 shows three dashes, which means the start of a YAML document.
Line 2 starts with a single dash, which indicates a list. There is one dash at this indention level, so it’s a list of one. The fields of this member are hosts, gather_facts and tasks. Tasks got his own list (mind the indention level, that is important). The fields are key/value pairs, with the separation indicated by the colon (:). The first field is ‘hosts’, with the value ‘all’. This means that all hosts in the hosts file are used for this playbook. I don’t think it’s hard to imagine how useful it can be to specify a group/kind of servers the playbook can run on. The next one is ‘gather_facts’. A normal playbook execution first gathers a lot of information from all the hosts it is going to run on before execution. These can be used during playbook execution. Next ‘remote_user’. This indicates with which user ansible is going to logon, so we don’t have to specify it on the command line. Then we see ‘tasks’ to indicate the list of tasks to be executed on the hosts.

It’s easy to spot we got three tasks. What is extremely important, is the indention of this list (it’s got a dash, so it’s a list!). Name is not mandatory, but it makes it easy to read if you give the tasks useful names and these will be shown when the playbook is executed. The first task has the name ‘upgrade all packages’. The next field shows the key is ‘yum’ indicating it is making use of the yum module. This key got two values: name=*, which means all ‘all packages’, and state=latest, which means we want all packages to be at the latest version. This means this command is the equivalent of ‘yum update’.

The second task is called ‘install python-selinux’. It makes use of the yum module again, and is self explanatory, it installs the libselinux-python package. This packages is necessary to work on a host which has selinux enabled on things that are protected by selinux.

The next task is called ‘add public key to authorised_key file of root’. It is making use of the authorized_key module. This module requires a parameter ‘key’, for which we use the lookup function to look up the local (!) public key, of the user with which I execute ansible, which is ‘ansible’. ‘state=present’ means we want this key to be present; ‘present’ is the default value, so it wasn’t necessary to put this in. Next ‘user=root': we want the public key to be added to the authorized_keys file of the user root.

Of course these tasks could be executed using the ‘ansible’ executable as single tasks. To show the importance of the installation of the libselinux-python module on a host with selinux enabled (which is the state of selinux on a fresh installed Oracle Linux machine), let’s execute the task using the authorized_key module:

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/') }}\""
SSH password: | FAILED >> {
    "failed": true,
    "msg": "Aborting, target uses selinux but python bindings (libselinux-python) aren't installed!"

Clear, right? The host is selinux protected. Now, let’s execute the installation of the libselinux package as single task, and then add our public key to the authorized_key file of root:

$ ansible all -i hosts -k -u root -m yum -a "name=libselinux-python state=installed"
SSH password: | success >> {
    "changed": true,
    "msg": "",
    "rc": 0,
    "results": [
        "Loaded plugins: security\nSetting up Install Process\nResolving Dependencies\n--> Running transaction check\n---> Package libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1 will be installed\n--> Finished Dependency Resolution\n\nDependencies Resolved\n\n================================================================================\n Package             Arch     Version                 Repository           Size\n================================================================================\nInstalling:\n libselinux-python   x86_64   2.0.94-5.3.el6_4.1      public_ol6_latest   201 k\n\nTransaction Summary\n================================================================================\nInstall       1 Package(s)\n\nTotal download size: 201 k\nInstalled size: 653 k\nDownloading Packages:\nRunning rpm_check_debug\nRunning Transaction Test\nTransaction Test Succeeded\nRunning Transaction\n\r  Installing : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\r  Verifying  : libselinux-python-2.0.94-5.3.el6_4.1.x86_64                  1/1 \n\nInstalled:\n  libselinux-python.x86_64 0:2.0.94-5.3.el6_4.1                                 \n\nComplete!\n"

$ ansible all -i hosts -k -u root -m authorized_key -a "user=root state=present key=\"{{ lookup('file','/home/ansible/.ssh/') }}\""
SSH password: | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "present",
    "unique": false,
    "user": "root"

Maybe your customer doesn’t want you to store your keys in their servers. It’s easy to do the reverse, and remove your key from the authorized_key file:

$ ansible all -i hosts -u root -m authorized_key -a "user=root state=absent key=\"{{ lookup('file','/home/ansible/.ssh/') }}\"" | success >> {
    "changed": true,
    "key": "ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEAliR905hxLnsOCRlOGnmN0H9dGH4NPV88ySC6GMv0KNnU7FfCXYE51Bkk97p2IWFsPhYO9qJDyAFxRm/lia1IZRDpCFcKKKMh5eXmEJC5XSrHWFdmGZRlFcS3VQ3rpCIyU3qFM6xMazh3JHKKEtE1J6nvw/hW3slY9G/6VoJ8CzpfeQMLDOdVXUIcZXqtCPuIEDBQ7yjfMzTGz+hEmz7ImbLaUyB4MDGrDnl33L8mkBEVYu8RrwgBcagDQSiQKnIca/EL45eX/74NG1e/6vxZkHZJz/W0ak4KD+o9vF4ikz0bdrGPMZ5gRYXWoSSHrVA+Rqk8A93qBXNKUUkzGoQYTQ== ansible@ansiblevm.local",
    "key_options": null,
    "keyfile": "/root/.ssh/authorized_keys",
    "manage_dir": true,
    "path": null,
    "state": "absent",
    "unique": false,
    "user": "root"

Please mind I didn’t specify ‘-k’ on the command line to send a password: in the previous step we added our key, so we can access our host using our public key. Another extremely important thing is ‘changed’. ‘changed’ indicates if the task did actually change something on the destination server.

I have ran single task until now, I changed the state of my test VM back to it’s state before I started changing it with ansible (by removing the libselinux package using ‘ansible all -i hosts -k -u root -m yum -a “name=libselinux-python state=absent”‘

Let’s run the above described playbook:

$ ansible-playbook -i hosts -k linux_setup_example.yml
 [WARNING]: The version of gmp you have installed has a known issue regarding
timing vulnerabilities when used with pycrypto. If possible, you should update
it (ie. yum update gmp).

SSH password:

PLAY [all] ********************************************************************

TASK: [upgrade all packages] **************************************************
changed: []

TASK: [install python-selinux] ************************************************
changed: []

TASK: [add public key to authorized_key file of root] *************************
changed: []

PLAY RECAP ********************************************************************              : ok=3    changed=3    unreachable=0    failed=0

Now at this point you might think: I get it, but these are all pretty simple tasks, it’s not special at all. Well, let me show you an actual thing which totally shows what the importance of using this is, even on a single machine, but even more when you got a large group of servers you have to administer.

The next example is a playbook created to apply PSU3 to an Oracle home. It’s still quite simple, it just applies PSU3 to the Oracle home. But totally automatic. The point I am trying to make is that this is already nice to have automated a lot of work for a single home, but it saves a lot of hours (read: a lot of money), and saves you from human error.

- hosts: all
    u01_size_gb: 1
    tmp_size_gb: 1
    oracle_base: /u01/app/oracle
    oracle_home: /u01/app/oracle/product/
    patch_dir: /u01/install
  remote_user: oracle

  - name: check u01 free disk space
    action: shell df -P /u01 | awk 'END { print $4 }'
    register: u01size
    failed_when: u01size.stdout|int < {{ u01_size_gb }} * 1024 * 1024

  - name: check tmp free disk space
    action: shell df -P /tmp | awk 'END { print $4 }'
    register: tmpsize
    failed_when: tmpsize.stdout|int < {{ tmp_size_gb }} * 1024 * 1024

  - name: create directory for installation files
    action: file dest={{ patch_dir }} state=directory owner=oracle group=oinstall

  - name: copy opatch and psu
    copy: src=files/{{ item }} dest={{ patch_dir }} owner=oracle group=oinstall mode=0644
     - ocm.rsp

  - name: install opatch in database home
    action: shell unzip -oq {{ patch_dir }}/ -d {{ oracle_home }}

  - name: unzip psu patch
    action: shell unzip -oq {{ patch_dir }}/ -d {{ patch_dir }}

  - name: patch conflict detection
    action: shell export ORACLE_HOME={{ oracle_home }}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch prereq CheckConflictAgainstOHWithDetail -ph ./
    register: conflict_detection
    failed_when: "'Prereq \"checkConflictAgainstOHWithDetail\" passed.' not in conflict_detection.stdout"

  - name: apply psu
    action: shell export ORACLE_HOME={{ oracle_home}}; cd {{ patch_dir }}/18522509; $ORACLE_HOME/OPatch/opatch apply -silent -ocmrf {{ patch_dir }}/ocm.rsp
    register: apply_psu
    failed_when: "'Composite patch 18522509 successfully applied.' not in apply_psu.stdout"

  - name: clean up install directory
    file: path={{ patch_dir }} state=absent

Let me run you through this playbook! It starts off with the indication of a YAML document: ‘—‘. Next hosts: all again. I just put all the hosts in the hosts file, I did not create all kinds of groups of hosts (which would be fitting when you use it at a fixed environment, but I use it for various customers). Then vars, with a list of variables. As you can see, I can use the variables, which are shown in the playbook as {{ variable }}. Then remote_user: oracle and tasks.

The first and second task use variables, and use the argument ‘register’ to save all response into a named variable. I also use ‘failed_when’ to make the playbook stop executing when the argument after ‘failed_when’ is true. Arguments of ‘failed_when’ is the named variable, for which the output of the standard out is used (.stdout). Then a filter is used to cast the output to integer, and is compared with a calculation of the variable.

The third task is using the files module to create a directory. The fourth task is using the copy module. The copy module means a file or files (in this case) are copied from the machine from which the playbook is run, onto the destination host or hosts. Here is also another trick used, to process the task with a list of items. As you can see, the copy line contains a variable {{ items }}, and the task is executed for all the items in the list ‘with_items’. I found this is fine for smaller files (up to a few hundred of megabytes), but too slow for bigger files. I use http (the get_url module) to speed up file transfer.

The fifth and sixth tasks execute a shell command, unzip, to extract the contents of a zip file into a specific place.

The seventh task is executing a small list of shell commands, in order to be able to run the conflict detection option of opatch. The same trick as with the first two tasks is used, register a name for the output of the conflict detection. Here I check if the stdout contains what I would manually check for when I would run it. The eighth task is the main task of the whole playbook: the actual patch. However, it uses the same technique as task seven. The last task simply removes a directory, in order to remove the files we used for this patch.

I hope this shows what a tremendous help Ansible can be for a consultant. This kind of tool is simply mandatory if you got an environment with more than approximately ten to twenty servers to administer. Ansible can be used even if the organisation does not want to spend time on the implementation of a configuration tool.


Get every new post delivered to your Inbox.

Join 2,350 other followers

%d bloggers like this: