In my previous article I started exploring the memory usage of a process on a recent linux kernel (2.6.39-400.243.1 (UEK2)), recent means “recent for the Enterprise Linux distributions” in this context, linux kernel developers would point out that the kernel itself is at version 3.19 (“stable version” at the time of writing of this blogpost).

The previous article showed that every process has its own address space, and that different allocations exists for execution. These allocations can be seen in the proc pseudo filesystem, in a process specific file called ‘maps’. The process itself needs some administration area’s which are anonymous allocations which are marked with [heap] and [stack], and some a few others called [vdso] and [vsyscall]. Every process executes something (not all people realise that: a process can wait for something, but essentially is always executing). So there always will be an executable in a (regular) process’ address space. In a lot of cases, the executable uses shared libraries. In that case, the libraries are loaded in the address space of the process too, alongside the executable.

The executable contains (at least) two sections; the code segment, which is readonly and potentially shared, and the data segment, which is read write and gets truly allocated instead of used shared with the parent if the process needs to write. In a lot of cases, the executable uses shared libraries, which means it uses functions which are stored in that library. A library also needs to be loaded, and also contains multiple sections, which can be read only or read write, and are shared unless the process needs to write to that segment.

For completeness, here’s the complete maps output of a process executing the ‘cat’ executable again:

$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 fc:00 2605084                            /bin/cat
0060a000-0060b000 rw-p 0000a000 fc:00 2605084                            /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0139d000-013be000 rw-p 00000000 00:00 0                                  [heap]
7f444468d000-7f444a51e000 r--p 00000000 fc:00 821535                     /usr/lib/locale/locale-archive
7f444a51e000-7f444a6a8000 r-xp 00000000 fc:00 3801096                    /lib64/
7f444a6a8000-7f444a8a8000 ---p 0018a000 fc:00 3801096                    /lib64/
7f444a8a8000-7f444a8ac000 r--p 0018a000 fc:00 3801096                    /lib64/
7f444a8ac000-7f444a8ad000 rw-p 0018e000 fc:00 3801096                    /lib64/
7f444a8ad000-7f444a8b2000 rw-p 00000000 00:00 0
7f444a8b2000-7f444a8d2000 r-xp 00000000 fc:00 3801089                    /lib64/
7f444aacd000-7f444aad1000 rw-p 00000000 00:00 0
7f444aad1000-7f444aad2000 r--p 0001f000 fc:00 3801089                    /lib64/
7f444aad2000-7f444aad3000 rw-p 00020000 fc:00 3801089                    /lib64/
7f444aad3000-7f444aad4000 rw-p 00000000 00:00 0
7fff51980000-7fff519a1000 rw-p 00000000 00:00 0                          [stack]
7fff519ff000-7fff51a00000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

If you look closely, you will see that I didn’t explain one type of allocation yet: the anonymous allocations. The anonymous allocations are visible in lines 4, 11, 13 and 16. The anonymous mappings directly following the data segment of either an executable or a library mapped into a process address space is called the BSS. The data segment and the BSS store static variables, however the data segment stores initialised variables, the BSS stores uninitialised variables. The anonymous mapping for the BSS section might exist for a library, as seen above, or might not exist; not all Oracle database executable libraries use an anonymous memory mapping for example. Actually, there is one other memory allocation visible, /usr/lib/locale/locale-archive, which is a file for locale (multi-language support) functions in the C library, which is out of scope for this article.

When a process requests memory to store something, the system call malloc() (memory allocation) can be called. This system call inspects the size of the allocation, and will allocate memory from either the process’ heap (the memory mapping with [heap], using the system call brk()) or it will allocate space using a new anonymous memory segment, using the system call mmap(). If you follow the link with malloc(), you can read the source code of the malloc() call. There are different malloc()’s, which fulfil different purposes (embedded devices have different requirements than huge servers), the implementation that Enterprise Linuxes use is one called ptmalloc2, which is based on a version written bij Doug Lea. If you read the comments in the source code, specifically at ‘Why use this malloc?’, you will see that it tries to be smart with requests up to 128KB (for memory re-usability, to avoid fragmentation and memory wastage), which are allocated from the heap. If an allocation is larger than 128KB, it will use the system memory mapping facilities.

Okay, this brings us back at the original question: how much memory does this process take? I hope you recall from the first blogpost that Linux tries to share as much memory as possible, and when a new process is created, the allocations in the address space of this new process are pointers to the memory areas of the parent process. Let’s first use a utility a lot of people are using: top.

top - 10:53:40 up 9 days, 14:11,  2 users,  load average: 1.34, 1.36, 1.37
Tasks: 1124 total,   1 running, 1123 sleeping,   0 stopped,   0 zombie
Cpu(s):  0.9%us,  0.8%sy,  0.1%ni, 98.2%id,  0.0%wa,  0.0%hi,  0.0%si,  0.0%st
Mem:  98807316k total, 97411444k used,  1395872k free,   400288k buffers
Swap: 25165820k total,  3560852k used, 21604968k free, 27573200k cached

 16391 oracle    -2   0 2369m 4960 4812 S  0.7  0.0  91:41.37 asm_vktm_+asm1
 16407 oracle    20   0 2388m  20m 8580 S  0.7  0.0  46:16.40 asm_dia0_+asm1

I edited the output a bit to show only two process from an ASM instance.
The columns that show information on memory are VIRT, RES, SHR, %MEM.

VIRT is described in the man-page of top as ‘The total amount of virtual memory used by the task’. This means it’s ALL the memory visible in the addressing space of the process. A useful utility to get the contents of the virtual memory allocations for a process is pmap, let’s use it for process 16391, which is asm_vktm_+asm1:

$ pmap -x 16391
16391:   asm_vktm_+ASM1
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000000400000  246644    2520       0 r-x--  oracle
000000000f6dd000    1952      44      20 rw---  oracle
000000000f8c5000     140       0       0 rw---    [ anon ]
000000001042c000     356       0       0 rw---    [ anon ]
0000000060000000    4096       0       0 rw-s-  SYSV00000000 (deleted)
0000000061000000 2080768       0       0 rw-s-  SYSV00000000 (deleted)
00000000e0000000      12       4       4 rw-s-    [ shmid=0x4e78003 ]
00007fa275589000      72      12       0 r-x--
00007fa27559b000    2044       0       0 -----
00007fa27579a000       8       0       0 rw---
00007fa27579c000    1604       0       0 r-x--
00007fa27592d000    2044       0       0 -----
00007fa275b2c000      72       0       0 rw---
00007fa275b3e000      20       4       0 r-x--
00007fa275b43000    2044       0       0 -----
00007fa275d42000       4       0       0 rw---
00007fa27abd5000       8       0       0 r-x--
00007fa27abd7000    2044       0       0 -----
00007fa27add6000       4       0       0 rw---
00007fa27add7000     128     116       0 r-x--
00007fa27adf8000     512      12      12 rw---    [ anon ]
00007fa27ae78000     212       4       4 r--s-  passwd
00007fa27aead000    1260      68      68 rw---    [ anon ]
00007fa27aff3000       4       4       0 rw-s-  hc_+ASM1.dat
00007fa27aff4000       8       0       0 rw---    [ anon ]
00007fa27aff6000       4       0       0 r----
00007fa27aff7000       4       0       0 rw---
00007fa27aff8000       4       0       0 rw---    [ anon ]
00007fff7b116000     132       8       8 rw---    [ stack ]
00007fff7b1ff000       4       4       0 r-x--    [ anon ]
ffffffffff600000       4       0       0 r-x--    [ anon ]
----------------  ------  ------  ------
total kB         2426668    5056     156

The column ‘Kbytes’ represents the full size of the executable, libraries, shared memory, anonymous mappings and other mappings of this process. For completeness sake: 2426668/1024=2369.79, which matches the 2369 in the top output. This is all the memory this process can see, and could use. Does this tell us anything on what memory the process 16391 actually takes? No. (parts of) the Oracle executable’s allocations are potentially shared, the shared memory (SYSV00000000 (deleted) and [ shmid=0x4e78003 ], which represent the Oracle SGA) is shared, the memory allocations for the libraries are potentially shared. The anonymous memory mappings have been defined, but the actual allocation is not visible in the Kbytes column. What this column in top does for me, is tell the approximate SGA size, especially if the SGA is larger (meaning multiple gigabytes).

The second memory column in top is RES. RES is described as: ‘The non-swapped physical memory a task is using’. RES is sometimes referred to as RSS, and called ‘resident set size’. As we can see from the total, the RSS is way lesser than the virtual memory size. One important thing in the RES description of top is that it described that swapped memory pages are not counted for the RES/RSS value. RES/RSS corresponds to the actual used (“touched”) memory by a process, and is directly usable. If you look back to the RSS column of pmap above, you see the oracle executable’s two mappings, one has a RSS size of 2520, and one has a RSS size of 44. But…if you remember that the code/readonly segment is potentially shared with other process, and then look at the 2520 value (which is of the oracle memory segment with the rights r-x–, which means the code segment), I hope you understand this just means this process (vktm) read a subset of the entire executable, and more importantly: the RSS size does not reflect physical memory uniquely allocated by this process.

If we look at the shared memory segments, it’s interesting to see what happens during normal life of a database session. I think it is needless to say that you should calculate shared memory outside of process memory usage, since it’s a distinct memory set that is truly shared by all the processes that are created for the instance.

This is a session which has just started:

$ pmap -x 43853
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000060000000       4       0       0 r--s-    [ shmid=0xb87801d ]
0000000060001000    2860     348     348 rw-s-    [ shmid=0xb87801d ]
0000000061000000 4046848    2316    2316 rw-s-    [ shmid=0xb88001e ]
0000000158000000  144592       0       0 rw-s-    [ shmid=0xb88801f ]
0000000161000000      20       4       4 rw-s-    [ shmid=0xb890020 ]

This instance has a SGA set to 4G. Because the session just started, it only touched 2316(KB) of the SGA. Next, I do a big (buffered!) full table scan, requiring the session to put a lot of blocks into the buffercache. After the scan, look at the shared memory segment using pmap again:

$ pmap -x 43853
Address           Kbytes     RSS   Dirty Mode   Mapping
0000000060000000       4       0       0 r--s-    [ shmid=0xb87801d ]
0000000060001000    2860     384     384 rw-s-    [ shmid=0xb87801d ]
0000000061000000 4046848 2279040 2279040 rw-s-    [ shmid=0xb88001e ]
0000000158000000  144592   66564   66564 rw-s-    [ shmid=0xb88801f ]
0000000161000000      20       4       4 rw-s-    [ shmid=0xb890020 ]

The session has touched half of the SGA shared memory segment (visible in the RSS column of the 6th line). This is logical if you understand what is going on: the process does a buffered table scan, which means the blocks read from disk need to be stored in the buffer cache, which is one of the memory structures in the Linux shared memory segments. However, if you look strictly at the top utility output of a database that has just started up, you see the RSS size of the all the processes that work with the buffercache growing. This phenomenon has lead to a repetitive question on the Oracle Linux forums if Oracle database processes are leaking memory. Of course the correct answer is that the RSS size just grows because the process just touches more of the shared memory (=SGA) that has been mapped into its address space. It will stop increasing once it touched all the memory it could touch.

%MEM is the RES/RSS size expressed as a percentage of the total physical memory.

SHR is the amount of shared memory. The manpage of top says ‘It simply reflects memory that could be potentially shared with other processes’. Do not confuse this with shared memory segments mapped into the process’ address space. Empirical tests show the SHR value always seems to be lower than the RSS size, which means it seems to track the RSS value of memory, and shows RSS (touched) memory that could be shared (which seems to contain both touched memory from the shared memory segments, as well as privately mapped executable and libraries). At least from the perspective of an Oracle performance and troubleshooting perspective I can’t see any benefit from using this value.

The conclusion of this part on memory usage is that both the virtual set size (VIRT/VSZ) and resident set size (RES/RSS) are no figures you can add up to indicate physical memory usage of a single process or a group of processes.

The virtual set size gives you the total amount of virtual memory available to a single process, which includes the executable, and potentially shared libraries, anonymous memory mappings and files mapped into the address space and shared memory. In a lot of cases, you get an indication of the total SGA size of the instance, because the shared memory segments which contain the SGA are entirely mapped into the address space of the process.

The resident set size shows you how much of the memory and files mapped into the address space are actually “touched”, and directly usable. Some of the memory usage result in memory pages private to the process, because of writing to the memory and the Copy On Write mechanism of the Linux memory manager. A lot of other memory mappings can be used, increasing the resident set size, while these are shared with other processes. A third potential component is the actual usage of memory as a result of anonymous memory mappings (versus the total allocation, which can be much more), which are private to the process.

This blogpost is about finding the actual amount of memory a process is taking. In order to do so, this post dives into the memory mechanisms of Linux. The examples in this article are taken from an Oracle Linux version 6.6 server, with kernel 2.6.39-400.243.1 (UEK2). This is written with the Oracle database processes in mind, but actually uses examples of a processes running ‘cat’, which means the contents of this post are absolutely not limited to Oracle database processes.

Let’s start off with a simple example. Let’s look at our own memory map. In order to do so, I use the ‘cat’ executable and the ‘maps’ entry in the proc pseudo-filesystem. This is how that is done, including the result:

$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 fc:00 2605084                            /bin/cat
0060a000-0060b000 rw-p 0000a000 fc:00 2605084                            /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0139d000-013be000 rw-p 00000000 00:00 0                                  [heap]
7f444468d000-7f444a51e000 r--p 00000000 fc:00 821535                     /usr/lib/locale/locale-archive
7f444a51e000-7f444a6a8000 r-xp 00000000 fc:00 3801096                    /lib64/
7f444a6a8000-7f444a8a8000 ---p 0018a000 fc:00 3801096                    /lib64/
7f444a8a8000-7f444a8ac000 r--p 0018a000 fc:00 3801096                    /lib64/
7f444a8ac000-7f444a8ad000 rw-p 0018e000 fc:00 3801096                    /lib64/
7f444a8ad000-7f444a8b2000 rw-p 00000000 00:00 0
7f444a8b2000-7f444a8d2000 r-xp 00000000 fc:00 3801089                    /lib64/
7f444aacd000-7f444aad1000 rw-p 00000000 00:00 0
7f444aad1000-7f444aad2000 r--p 0001f000 fc:00 3801089                    /lib64/
7f444aad2000-7f444aad3000 rw-p 00020000 fc:00 3801089                    /lib64/
7f444aad3000-7f444aad4000 rw-p 00000000 00:00 0
7fff51980000-7fff519a1000 rw-p 00000000 00:00 0                          [stack]
7fff519ff000-7fff51a00000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

Most people know that in order to execute ‘cat’, the shell forks and executes the cat command in a new process.

What we see, is the executable (/bin/cat), the heap ([heap]), two dynamic libraries (/lib64/ and /lib64/, the stack ([stack]), two entries called [vdso] (virtual dynamically linked shared objects) and [vsyscall] (virtual syscall), and anonymous memory allocations (00000000 00:00 0 allocations without a marker to indicate a process function).

In order to understand the libraries, we first need to know something about the executable itself, using the command ‘file’:

$ file /bin/cat
/bin/cat: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.18, stripped

There actually is a great deal of information to be seen from this one line:
This is an ELF format executable. Another executable format is COFF, which is used on the Windows platform.
The executable is 64 bits (x86-64).
The most important part for this article: the executable is dynamically linked, and uses shared libraries.
The last word ‘stripped’ deserves some explanation too: the executables is stripped, which means a lot of symbolic information (function names for example) are removed from the executable. One reason for doing so is to make the executable smaller. If you look at the oracle executable, you will see it’s not stripped. Except for the Oracle XE executable, which is stripped.

We now established this is a dynamically linked executable. The next step is to see the libraries it is using. This is done with the ‘ldd’ (loader dependencies) executable:

$ ldd /bin/cat =>  (0x00007fffa9388000) => /lib64/ (0x00007f6ffa3dd000)
	/lib64/ (0x00007f6ffa772000)

Please mind these are the dependencies for executing /bin/cat, which does not necessarily means you see all those in the address space of the process executing the /bin/cat executable. Two you don’t see is, which is used for virtual dynamic shared objects, which essentially means from the oracle perspective that some system calls can be executed fully in userspace, most notably for Oracle engineers: gettimeofday(). Fellow Oaky James Morle wrote a nice article explaining this.
The other one is /lib64/, which is the dynamic loader needed for executing the executable. The last one is, which “truly” is a library that is dynamically loaded for use during execution. The other ones discussed earlier are necessary for the instantiating the execution, not so much during the execution.

If we now look back to the maps output, we see the library, and we see a library. The library is the dynamic loader library, to provide dynamic loading function on runtime.

Okay, we now gained a some understanding on executables and the libraries. Now let’s look a bit more in detail to the executable in maps:

00400000-0040b000 r-xp 00000000 fc:00 2605084                            /bin/cat
0060a000-0060b000 rw-p 0000a000 fc:00 2605084                            /bin/cat

Why is the executable mentioned twice?
The answer is this is because of the way an ELF object is built up. In order to look deeper into this ELF binary, we can use the ‘readelf’ executable:

$ readelf -l /bin/cat

Elf file type is EXEC (Executable file)
Entry point 0x401850
There are 8 program headers, starting at offset 64

Program Headers:
  Type           Offset             VirtAddr           PhysAddr
                 FileSiz            MemSiz              Flags  Align
  PHDR           0x0000000000000040 0x0000000000400040 0x0000000000400040
                 0x00000000000001c0 0x00000000000001c0  R E    8
  INTERP         0x0000000000000200 0x0000000000400200 0x0000000000400200
                 0x000000000000001c 0x000000000000001c  R      1
      [Requesting program interpreter: /lib64/]
  LOAD           0x0000000000000000 0x0000000000400000 0x0000000000400000
                 0x000000000000a204 0x000000000000a204  R E    200000
  LOAD           0x000000000000a208 0x000000000060a208 0x000000000060a208
                 0x0000000000000648 0x0000000000001000  RW     200000
  DYNAMIC        0x000000000000a3e8 0x000000000060a3e8 0x000000000060a3e8
                 0x0000000000000190 0x0000000000000190  RW     8
  NOTE           0x000000000000021c 0x000000000040021c 0x000000000040021c
                 0x0000000000000044 0x0000000000000044  R      4
  GNU_EH_FRAME   0x0000000000009414 0x0000000000409414 0x0000000000409414
                 0x000000000000027c 0x000000000000027c  R      4
  GNU_STACK      0x0000000000000000 0x0000000000000000 0x0000000000000000
                 0x0000000000000000 0x0000000000000000  RW     8

 Section to Segment mapping:
  Segment Sections...
   01     .interp
   02     .interp .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .init .plt .text .fini .rodata .eh_frame_hdr .eh_frame
   03     .ctors .dtors .jcr .dynamic .got .got.plt .data .bss
   04     .dynamic
   05     .note.ABI-tag
   06     .eh_frame_hdr

When ‘readelf’ is invoked with the -l option, readelf displays the information that is contained in the ELF segment headers of the executable. This is a lot of detail into which I don’t go too deep.

The biggest distinction between the two entries of the executable in maps is the first entry is readonly executable (r-xp), and the second entry is read-write (rw-p). If we now look at the readelf -l output, we see the same flags in the FLAGS column, now indicated by R, W and E for Read, Write and Execute. If we look in the segment sections part, we see section 02 containing a lot of entries, and if we look up at the headers, and we count from zero, we see “LOAD”, and the entry being flagged as ‘R E’. This memory area often is referred to as the code segment.

The some of the types of memory allocations in section 02 are:
– Dynamic linking information (.interp, .gnu.hash, .dynsyms, dynstr)
– C runtime code (.init, .fini)
– Relocation information (.rela.dyn, .rela.plt)
– String constants (.rodata)
– Machine instructions (.text)
– Procedure Linkage Table (.plt)

The next section that deserves attention is section 03. If we look in the headers, count from zero, we see this section is a “LOAD” section too, but the entry being flagged as ‘RW ‘. This memory area often is referred to as the data segment.

The some of the types of memory allocations in section 03 are:
– Dynamic linking into (.dynamic)
– Relocated pointer values to external symbols (.got)
– Procedure linkage table of the global offset table (.got.plt)
– C runtime data (.ctors, .dtors)

As you can see with these two, section 02 contains information which can actually be readonly (the executable itself is never changed on runtime), which is why it’s readonly. Section 03 contains information which by the nature of dynamic linking, really can’t be readonly, because we don’t know how the library looks like which the executable needs to dynamically load via the dynamic loader; the only thing that is required from the library is that it contains the functions the executable needs from that library.

At this point I think it should be clear why there are two sections for a single executable. But how about the libraries?

Actually, simplified, a library is an executable with only functions, and not a program to run any of these functions. Libraries also can use other shared libraries for its functions. This means you can use the above mentioned executables (file, ldd, readelf) to examine libraries, and get quite similar results. The programmer can choose how the memory segments look like and how these are divided, which is the reason the dynamic loader library ( has 3 entries, and the libc library ( has 4. Although 4 or 3 sections seems to be extremely common for libraries, as well as 2 for executables.

Okay, now that we gained some understanding on the memory segments for a process, let’s continue on actual memory usage. When a process is forked, Linux tries to do the least possible and save as much memory as possible. To do so, the process gets its own virtual memory address space which is a duplicate of the process which called fork, including its memory. However, what really happens is the memory areas are the ones from the forking process, linked via pointers. Only when a change is needed, the process starts truly allocating its own memory, of course not using the memory from the forking process. This technique is often referred to as COW (Copy On Write). The ‘pointer trick’ is made possible by the kernel, and the use of virtual memory: every process allocates at the same (virtual) memory addresses, which are kept separate from other processes by translating them to different physical locations, or the same locations for copy on write. Of course the read-only sections as we saw in maps will never change, and as a result only will be in memory at a single place, and potentially referenced/used by a lot of processes. But even the writable sections are only physically allocated as soon as there truly is a write action.

The ‘just in time’ principle of memory allocation is also used for anonymous memory allocations via mmap(), which are most of the Oracle database processes PGA allocations. When a process allocates memory via mmap(), it results in an anonymous memory section, and is visible in maps. However, it isn’t allocated yet, only when the memory truly is getting used it is physically allocated.

I hope you start to see at this point that you can see the memory areas a process is using, but you can not tell how much of that is really, uniquely allocated for that process. Also, and maybe even more importantly, just adding memory figures of Oracle databases processes will do (a lot of) double counting. In a future blogpost or posts I will dive into how bits and pieces of the working can actually be seen from for example the /proc/PID/smaps meta-file, and how this knowledge can be applied for sizing Oracle databases.

I was testing Oracle Goldengate on a non-clustered Oracle database with ASM. With ASM, you need to have the grid infrastructure installed. The cluster ware for the single node install is called ‘oracle restart’.

The most convenient way to have Goldengate running at startup that I could find, was using the Oracle Grid Infrastructure Agents. These agents are not installed by default, you need to download these from the Oracle Technology Network. The download is with the grid infrastructure downloads in the database section.

The installation is very simple: unzip the file, and run the script with the arguments ‘–install’ and ‘–directory’ arguments. The directory argument needs to point to a directory outside of the grid infrastructure home. My first choice would be to have it in the infrastructure home, but the install script does not allow that.

$ ./ --install --directory /u01/app/oracle/product/xag
Installing Oracle Grid Infrastructure Agents on: ogg-dest

The control utility, ‘agctl’ can be run without any environment variables set to point to the directory where the utility is installed, or the grid infrastructure home.

$ /u01/app/oracle/product/xag/bin/agctl query releaseversion
The Oracle Grid Infrastructure Agents release version is 6.1.1

To add a golden gate resource, add it via the agctl utility:

$ /u01/app/oracle/product/xag/bin/agctl add goldengate gg1 --gg_home /u01/app/oracle/product/12.1.2/oggcore_1 --instance_type target --oracle_home /u01/app/oracle/product/ --databases ora.dest.db

This is a very simple example, where a cluster resource called ‘gg1’ is created, for which we point out the golden gate home, the database ORACLE_HOME and the database cluster resource (not the database name as some documentation from Oracle says).

In my case, golden gate was not running. After the golden gate cluster resource was added, it’s offline:

      1        OFFLINE OFFLINE                               STABLE

Now start the resource using the ‘agctl’ utility:

/u01/app/oracle/product/xag/bin/agctl start goldengate gg1

(please mind you could do exactly the same with crsctl: crsctl start res xag.gg1.goldengate)
Starting does not result in any message. Let’s look at the golden gate cluster resource:

      1        ONLINE  ONLINE       ogg-dest                 STABLE

What it does, is start the golden gate manager (mgr) process:

$ /u01/app/oracle/product/12.1.2/oggcore_1/ggsci << H
> info all
> H

Oracle GoldenGate Command Interpreter for Oracle
Version OGGCORE_12.
Linux, x64, 64bit (optimized), Oracle 12c on Aug  7 2014 10:21:34
Operating system character set identified as UTF-8.

Copyright (C) 1995, 2014, Oracle and/or its affiliates. All rights reserved.

GGSCI (ogg-dest.local) 1>
Program     Status      Group       Lag at Chkpt  Time Since Chkpt

REPLICAT    ABENDED     REP1        00:00:00      17:55:53

Let’s add monitoring of the replicat. In real life scenario’s it’s the extracts or replicats that are important, these perform the actual work golden gate is supposed to do!

/u01/app/oracle/product/xag/bin/agctl modify goldengate gg1 --monitor_replicats rep1

The cluster ware checks the golden gate resource type status every 30 seconds. So if you check the cluster immediately after modifying the golden gate resource, it could not have performed the check with the added check for the replicat.

      1        ONLINE  INTERMEDIATE ogg-dest                 ER(s) not running :

It now neatly reports the replicat group rep1 not running! I guess ‘ER’ means Extract and Replicat.
Now start the replicat, and check the status again:

      1        ONLINE  ONLINE       ogg-dest                 STABLE

A few final notes. This is the simplest possible setup. I couldn’t find a lot of information on the infrastructure agents except for some Oracle provided ones, which is why I created this blog. If you apply this on a cluster, I would urge you to look into the oracle provided ones, and create the full setup with golden gate writing its files on a cluster filesystem, using an application VIP, etcetera, so golden gate could be started or failed over to another host.

Goldengate is not the only thing this agents can incorporate into the cluster ware. The current version (6.1.1) supports, besides golden gate: tomcat, apache, JDE, mysql, various types of peoplesoft servers, siebel and web logic.

Deinstalling is very simple, but might put you on your wrong foot. The agents are deinstalled by using the setup script ( with the ‘–deinstall’ argument. This of course is quite normal, but it deinstalls the agents from the location where you call This means that if you run from the location where you unzipped it for installation, it will ‘deinstall’, which means clean out, from that directory, leaving the true installation alone. The documentation states that upgrades to the agents are done by removing them and installing the new version.

This article is written with examples taken from an (virtualised) Oracle Linux 6u6 X86_64 operating system, and Oracle database version However, I think the same behaviour is true for Oracle 11 and 10 and earlier versions.

Probably most readers of this blog are aware that a “map” of mapped memory for a process exists for every process in /proc, in a pseudo file called “maps”. If I want to look at my current process’ mappings, I can simply issue:

$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 fc:00 786125                             /bin/cat
0060a000-0060b000 rw-p 0000a000 fc:00 786125                             /bin/cat
0060b000-0060c000 rw-p 00000000 00:00 0
0080a000-0080b000 rw-p 0000a000 fc:00 786125                             /bin/cat
01243000-01264000 rw-p 00000000 00:00 0                                  [heap]
345b000000-345b020000 r-xp 00000000 fc:00 276143                         /lib64/
345b21f000-345b220000 r--p 0001f000 fc:00 276143                         /lib64/
345b220000-345b221000 rw-p 00020000 fc:00 276143                         /lib64/
345b221000-345b222000 rw-p 00000000 00:00 0
345b800000-345b98a000 r-xp 00000000 fc:00 276144                         /lib64/
345b98a000-345bb8a000 ---p 0018a000 fc:00 276144                         /lib64/
345bb8a000-345bb8e000 r--p 0018a000 fc:00 276144                         /lib64/
345bb8e000-345bb8f000 rw-p 0018e000 fc:00 276144                         /lib64/
345bb8f000-345bb94000 rw-p 00000000 00:00 0
7f8f69686000-7f8f6f517000 r--p 00000000 fc:00 396081                     /usr/lib/locale/locale-archive
7f8f6f517000-7f8f6f51a000 rw-p 00000000 00:00 0
7f8f6f524000-7f8f6f525000 rw-p 00000000 00:00 0
7fff2b5a5000-7fff2b5c6000 rw-p 00000000 00:00 0                          [stack]
7fff2b5fe000-7fff2b600000 r-xp 00000000 00:00 0                          [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0                  [vsyscall]

What we see, is the start and end address, the rights (rwx), absence of rights is shown with a ‘-‘, and an indication of the mapped memory region is (p)rivate or (s)hared. In this example, there are no shared memory regions. Then an offset of the mapped file, then the device (major and minor device number). In our case sometimes this is ‘fc:00’. If you wonder what device this might be:

$ echo "ibase=16; FC" | bc
$ ls -l /dev | egrep 252,\ *0
brw-rw---- 1 root disk    252,   0 Mar 23 14:19 dm-0
$ sudo dmsetup info /dev/dm-0
Name:              vg_oggdest-lv_root
State:             ACTIVE
Read Ahead:        256
Tables present:    LIVE
Open count:        1
Event number:      0
Major, minor:      252, 0
Number of targets: 2
UUID: LVM-q4nr4HQXgotaaJFaGF1nzd4eZPPTohndgz553dw6O5pTlvM0SQGLFsdp170pgHuw

So, this is a logical volume lv_root (in the volume group vg_oggdest).

Then the inode number (if a file was mapped, if anonymous memory was mapped the number 0 is shown), and then the path if a file was mapped. This is empty for anonymous mapped memory (which is memory which is added to a process using the mmap() call). Please mind there are also special regions like: [heap],[stack],[vdso] and [vsyscall].

Okay, so far I’ve shown there is a pseudo file called ‘maps’ which shows mapped memory and told a bit about the fields in the file. Now let’s move on to the actual topic of this blog: the Oracle database SGA memory, and the indicator this is deleted!

In this example I pick the maps file of the PMON process of an Oracle database. Of course the database must use system V shared memory, not shared memory in /dev/shm (which is typically what you see when Oracle’s automatic memory (AMM) feature is used). This is a snippet from the maps file of the pmon process on my server:

 cat /proc/2895/maps
00400000-1093f000 r-xp 00000000 fc:00 1326518                            /u01/app/oracle/product/
10b3e000-10dbf000 rw-p 1053e000 fc:00 1326518                            /u01/app/oracle/product/
10dbf000-10df0000 rw-p 00000000 00:00 0
12844000-1289d000 rw-p 00000000 00:00 0                                  [heap]
60000000-60001000 r--s 00000000 00:04 111902723                          /SYSV00000000 (deleted)
60001000-602cc000 rw-s 00001000 00:04 111902723                          /SYSV00000000 (deleted)
60400000-96400000 rw-s 00000000 00:04 111935492                          /SYSV00000000 (deleted)
96400000-9e934000 rw-s 00000000 00:04 111968261                          /SYSV00000000 (deleted)
9ec00000-9ec05000 rw-s 00000000 00:04 112001030                          /SYSV6ce0e164 (deleted)
345b000000-345b020000 r-xp 00000000 fc:00 276143                         /lib64/
345b21f000-345b220000 r--p 0001f000 fc:00 276143                         /lib64/

If you look closely, you see the oracle executable first, with two entries, one being readonly (r-xp), the other being read-write (rw-p). The first entry is readonly because it is shared with other processes, which means that there is no need for all the processes to load the Oracle database executable in memory, it shares the executable with other process. There’s much to say about that too, which should be done in another blogpost.

After the executable there are two anonymous memory mappings, of which one is the process’ heap memory.

Then we see what this blogpost is about: there are 5 mappings which are shared (r–s and rw-s). These are the shared memory regions of the Oracle database SGA. What is very odd, is that at the end of the lines it says “(deleted)”.

Of course we all know what “deleted” means. But what does it mean in this context? Did somebody delete the memory segments? Which actually can be done with the ‘ipcrm’ command…

If you go look at the maps of other Oracle processes and other databases you will see that every database’s shared memory segment are indicated as ‘(deleted)’.

Word of warning: only execute the steps below on a test environment, do NOT do this in a production situation.

In order to understand this, the best way to see what actually is happening, is starting up the Oracle database with a process which is traced with the ‘strace’ utility with the ‘-f’ option set (follow). Together with the ‘-o’ option this will produce a (long) file with all the system calls and the arguments of the calls which happened during startup:

$ strace -f -o /tmp/oracle_startup.txt sqlplus / as sysdba

Now start up the database. Depending on your system you will notice the instance startup takes longer. This is because for every system call, strace needs to write a line in the file /tmp/oracle_start.txt. Because of this setup, stop the database as soon as it has started, on order to stop the tracing from crippling the database performance.

Now open the resulting trace file (/tmp/oracle_startup.txt) and filter it for the system calls that are relevant (calls with ‘shm’ in their name):

$ grep shm /tmp/oracle_startup.txt | less

Scroll through the output until you see a line alike ‘shmget(IPC_PRIVATE, 4096, 0600) = 130777091’:

4545  shmget(IPC_PRIVATE, 4096, 0600)   = 130777091
4545  shmat(130777091, 0, 0)            = ?
4545  shmctl(130777091, IPC_STAT, 0x7fff9eb9da30) = 0
4545  shmdt(0x7f406f2ba000)             = 0
4545  shmctl(130777091, IPC_RMID, 0)    = 0
4545  shmget(IPC_PRIVATE, 4096, 0600)   = 130809859
4545  shmat(130809859, 0, 0)            = ?
4545  shmctl(130809859, IPC_STAT, 0x7fff9eb9da30) = 0
4545  shmdt(0x7f406f2ba000)             = 0
4545  shmctl(130809859, IPC_RMID, 0)    = 0

What we see here is a (filtered) sequence of systems calls that could explain the status deleted of the shared memory segments. If you look up what process id is in front of these shm system calls, you will see it’s the foreground process starting up the instance. If you look closely, you’ll that there is a sequence which is repeated often:

1. shmget(IPC_PRIVATE, 4096, 0600) = 130777091
The system call shmget allocates a shared memory segment of 4 kilobyte, rights set to 600. The return value is the shared memory identifier of the requested shared memory segment.

2. shmat(130777091, 0, 0) = ?
The system call shmat attaches the a shared memory segment to the process’ address space. The first argument is the shared memory identifier, the second argument is the address to attach the segment to. If the argument is zero, like in the call above, it means the operating system is tasked with finding a suitable (non used) address. The third argument is for flags, the value zero here means no flags are used. The returncode (here indicated with a question mark) is the address at which the segment is attached. This being a question mark means strace is not able to read the address, which is a shame, because we can’t be 100% certain at which memory address this shared memory segment is mapped.

3. shmctl(130777091, IPC_STAT, 0x7fff9eb9da30) = 0
The system call shmctl with the argument IPC_STAT has the function to read the (kernel) shared memory information of the shared memory identifier indicated by the first argument, and write it at the memory location in the third argument in a struct called shmid_ds.

4. shmdt(0x7f406f2ba000) = 0
With this system call, the shared memory segment is detached from the process’ address space. For the sake of the investigation, I assumed that the address in this call is the address which is returned by the shmat() call earlier.

5. shmctl(130777091, IPC_RMID, 0) = 0
This is another shared memory control system call, concerning our just created shared memory segment (shared memory identifier 130777091), with the command ‘IPC_RMID’. This is what the manpage says about IPC_RMID:

       IPC_RMID  Mark the segment to be destroyed.  The segment will only  actually  be  destroyed
                 after the last process detaches it (i.e., when the shm_nattch member of the asso-
                 ciated structure shmid_ds is zero).  The caller must be the owner or creator,  or
                 be privileged.  If a segment has been marked for destruction, then the (non-stan-
                 dard) SHM_DEST flag of the shm_perm.mode field in the associated  data  structure
                 retrieved by IPC_STAT will be set.

What I thought this means was:
It looked like to me the database instance starts building up its shared memory segments per 4096 page. Because IPC_RMID only marks the segment to be destroyed, and because it will only be truly destroyed when there are no processes attached to the shared memory segment, it looked like to me the background processes were pointed to the shared memory segment which was marked destroyed (in some way I hadn’t discovered yet), which meant the shared memory segment would actually survive and all database processes can use it. If ALL the database processes would be killed for any reason, for example with a shutdown abort, the processes would stop being connected to the shared memory segment, which would mean the shared memory segment would vanish automatically, because it was marked for destruction.
Sounds compelling, right?

Well…I was wrong! The sequence of creating and destroying small shared memory segments is done, but it turns out these are truly destroyed with the shmctl(…,IPC_RMID,…) call. I don’t know why the sequence of creating shared memory segments is happening.

I started looking for the actual calls that create the final, usable shared memory segments in the /tmp/oracle_startup.txt file. This is actually quite easy to do; first look up the shared memory segment identifiers using the sysresv utility (make sure the database’s ORACLE_HOME and ORACLE_SID are set):

$ sysresv
...a lot of other output...
Shared Memory:
197394436	0x00000000
197427205	0x00000000
197361667	0x00000000
197459974	0x6ce0e164
1015811 	0xd5cdbca4
Oracle Instance alive for sid "dest"

Actually the ‘sysresv’ utility (system remove system V memory I think is what the name means) has the task of removing memory segments if there is no instance left to use them. It will not remove the memory segments if it finds the instance alive. It prints out a lot of information as a bonus.

Now that we got the shared memory identifiers, simply search in the trace file generated by strace, and search for the creation of the memory segment with the identifiers: (please mind searching with ‘less’ is done with the forward slash)

$ less /tmp/oracle_startup.txt
9492  shmget(IPC_PRIVATE, 905969664, IPC_CREAT|IPC_EXCL|0640) = 197394436
9492  shmat(197394436, 0x60400000, 0)   = ?
9492  times(NULL)                       = 430497743
9492  write(4, " Shared memory segment allocated"..., 109) = 109
9492  write(4, "\n", 1)                 = 1

Aha! here we see shmget() again, but now with a size (905969664) that looks much more like a real shared memory segment size used by the database! After the shared memory identifier is created, the process attaches it to its addressing space with shmat() to a specific memory address: 0x60400000.

The next thing to do, is to look for any shmctl() call for this identifier. Oracle could still do the trick of marking the segment for destruction…
…But…there are no shmctl() calls for this identifier, nor for any of the other identifiers shown with the sysresv utility. This is rather odd, because Linux shows them as “(deleted)”. There ARE dozens of shmat() calls, of the other (background) processes forked from the starting process when they attach to the shared memory segments.

So, conclusion at this point is Linux shows the shared memory segments as deleted in ‘maps’, but the Oracle database does not mark the segments for destruction after creation. This means that either Linux is lying, or something mysterious is happening in the Oracle executable which I didn’t discover yet.

I could only think of one way to verify what is truly happening here. That is to create a program myself that uses shared memory, so I have 100% full control over what is happening, and can control every distinct step.

This is what I came up with:

#include <stdio.h>
#include <sys/shm.h>
#include <sys/stat.h>

int main ()
  int segment_id;
  char* shared_memory;
  struct shmid_ds shmbuffer;
  int segment_size;
  const int shared_segment_size = 0x6400;

  /* Allocate a shared memory segment.  */
  segment_id = shmget (IPC_PRIVATE, shared_segment_size,
                     IPC_CREAT | IPC_EXCL | S_IRUSR | S_IWUSR);
  printf ("1.shmget done\n");
  /* Attach the shared memory segment.  */
  shared_memory = (char*) shmat (segment_id, 0, 0);
  printf ("shared memory attached at address %p\n", shared_memory);
  printf ("2.shmat done\n");
  /* Determine the segment's size. */
  shmctl (segment_id, IPC_STAT, &shmbuffer);
  segment_size  =               shmbuffer.shm_segsz;
  printf ("segment size: %d\n", segment_size);
  printf ("3.shmctl done\n");
  /* Write a string to the shared memory segment.  */
  sprintf (shared_memory, "Hello, world.");
  /* Detach the shared memory segment.  */
  shmdt (shared_memory);
  printf ("4.shmdt done\n");

  /* Deallocate the shared memory segment.  */
  shmctl (segment_id, IPC_RMID, 0);
  printf ("5.shmctl ipc_rmid done\n");

  return 0;

(I took the code from this site, and modified it a bit for my purposes)
If you’ve got a linux system which is setup with the preinstall rpm, you should be able to copy this in a file on your (TEST!) linux database server, in let’s say ‘shm.c’, and compile it using ‘cc shm.c -o smh’. This will create an executable ‘shm’ from this c file.

This program does more or less the same sequence we saw earlier:
1. Create a shared memory identifier.
2. Attach to the shared memory identifier.
3. Get information on the shared memory segment in a shmid_ds struct.
4. Detach the shared memory segment.
5. Destroy it using shmctl(IPC_RMID).

What I did was have two terminals open, one to run the shm program, and one to look for the results of the steps.

Step 1. (shmget)

$ ./shm
1. shmget done

When looking with ipcs, you can see the shared memory segment which is created because of the shmget() call:

$ ipcs -m

------ Shared Memory Segments --------
0x00000000 451608583  oracle     600        25600      0

when looking in the address space of the process running the shm program, the shared memory segment is not found. This is exactly what I expect, because it’s only created, not attached yet.

Step 2. (shmat)

shared memory attached at address 0x7f3c4aa6e000
2.shmat done

Of course the shared memory segment is still visible with ipcs:

0x00000000 451608583  oracle     600        25600      1

And we can see from ipcs in the last column (‘1’) that one process attached to the segment. Of course exactly what we suspected.
But now that we attached the shared memory to the addressing space, it should be visible in maps:

7f3c4aa6e000-7f3c4aa75000 rw-s 00000000 00:04 451608583                  /SYSV00000000 (deleted)

Bingo! The shared memory segment is visible, as it should be, because we just attached it with shmat(). But look: it’s deleted already according to Linux!

However I am pretty sure, as in 100% sure, that I did not do any attempts to mark the shared memory segment destroyed or do anything else to make it appear to be deleted. So, this means maps lies to us.

So, the conclusion is the shared memory Oracle uses is not deleted, it’s something that Linux shows us, and is wrong. When looking at the maps output again, we can see the shared memory identifier is put at the place of the inode number. This is handy, because it allows you to take the identifier, and look with ipcs for shared memory segments and understand which specific shared memory segment a process is using. It probably means that maps tries to look up the identifier number as inode number, which it will not be able to find, and then comes to the conclusion that it’s deleted.

However, this is speculation. Anyone with more or better insight is welcome to react on this article.

Every DBA working with the Oracle database must have seen memory dumps in tracefiles. It is present in ORA-600 (internal error) ORA-7445 (operating system error), system state dumps, process state dumps and a lot of other dumps.

This is how it looks likes:

Dump of memory from 0x00007F06BF9A9E00 to 0x00007F06BF9ADE00
7F06BF9A9E00 0000C215 0000001F 00000CC1 0401FFFF  [................]
7F06BF9A9E10 000032F3 00010003 00000002 442B0000  [.2............+D]
7F06BF9A9E20 2F415441 31323156 4F2F3230 4E494C4E  [ATA/V12102/ONLIN]
7F06BF9A9E30 474F4C45 6F72672F 315F7075 3735322E  [ELOG/group_1.257]
7F06BF9A9E40 3336382E 36313435 00003338 00000000  [.863541683......]
7F06BF9A9E50 00000000 00000000 00000000 00000000  [................]

The first column is the memory location in hexadecimal.
The second to fifth columns represent the actual memory values in hexadecimal.
The sixth column shows an ASCII representation of the memory contents. If a position does not represent an ASCII character, a dot (“.”) is printed.

Actually, the values in the second to fifth column are grouped in four columns. This is how the values in a column look like:
{hex val}{hex val}{hex val}{hex val}, for example: 00010203 means: 0, 1, 2, 3.

In the ASCII representation (sixth column) the spaces after every four values are not put in.

However, look at the following line:

7F06BF9A9E10 000032F3 00010003 00000002 442B0000  [.2............+D]

And focus on the last four characters:
“..+D” (two non-printables, plus, D)
Now look at the corresponding memory contents from the dump:
“442B0000” This is: “44 2B 00 00”, which should correspond to “. . + D”.
There is something the matter here: the plus and the D seem to be represented by “00”. That’s not correct.

Let’s see what “442B0000” actually represents in ASCI:

$ echo -e "\x44\x2B\x00\x00"

Ah! That looks backwards! Let’s take a full line and see what that gives:
(This is the line with memory address 0x7F06BF9A9E20)

$ echo -e "\x2F\x41\x54\x41 \x31\x32\x31\x56 \x4F\x2F\x32\x30 \x4E\x49\x4C\x4E"
/ATA 121V O/20 NILN

So if you want to look at the actual memory contents, you need to start with the column on the left side, read the values from right to left, then go the next column, etc.

Actual, I asked my friend Philippe Fierens for a trace file from a SPARC (big endian) platform, to see if the endianness of the platform was causing this. I test my stuff on Linux, which is little endian.

Here’s a little snippet:

Dump of memory from 0xFFFFFFFF7D977E00 to 0xFFFFFFFF7D97BE00
FFFFFFFF7D977E00 15C20000 00000001 00000000 00000104  [................]
FFFFFFFF7D977E10 F4250000 00000000 0B200400 E2EB8A3D  [.%....... .....=]
FFFFFFFF7D977E20 44475445 53540000 32F6D98B 00000590  [DGTEST..2.......]
FFFFFFFF7D977E30 00004000 00000001 00000000 00000000  [..@.............]
FFFFFFFF7D977E40 00000000 00000000 00000000 00000000  [................]

Let’s test the line from address 0xFFFFFFFF7D977E20:

[oracle@bigmachine [v12102] trace]$ echo -e "\x44\x47\x54\x45 \x53\x54\x00\x00 \x32\xF6\xD9\x8B \x00\x00\x05\x90"
DGTE ST 2� �

So, the endianness determines how the raw memory contents should be read.

This is the 4th post in a series of posts on PGA behaviour of Oracle. Earlier posts are: here (PGA limiting for Oracle 12), here (PGA limiting for Oracle 11.2) and the quiz on using PGA with AMM, into which this blogpost dives deeper.

As laid out in the quiz blogpost, I have a database with the following specifics:
-Oracle Linux x86_64 6u6.
-Oracle database PSU 4
-Oracle database (single instance) with the following parameter set: memory_target=1G. No other memory related parameters set.

In this setup, I run the pga_filler script (source code here), which creates a collection until the session statistic ‘session pga memory’ exceeds the grow_until variable, which for this case I set to 2100000000 (approximately 2.1G).

So: the instance is set to have AMM (memory_target) with a size of 1GB, which is supposed to be the total amount memory which this instance uses, and a session runs a PL/SQL procedure which only stops if it has allocated 2.1GB, which is clearly more than configured with the memory_target parameter. Please mind a collection, which the anonymous procedure uses to allocate memory, is outside of the memory areas for which Oracle can move data to the assigned temporary tablespace (sort, hash and bitmap memory areas).

After startup of the instance with only memory_target set to 1G, the memory partitioning looks like this:

SYS@v11204 AS SYSDBA> select component, current_size/power(1024,2), last_oper_type from v$memory_dynamic_components where current_size != 0;

---------------------------------------------------------------- -------------------------- -------------
shared pool										168 STATIC
large pool										  4 STATIC
java pool										  4 STATIC
SGA Target										612 STATIC
DEFAULT buffer cache									424 INITIALIZING
PGA Target										412 STATIC

This is how v$pgastat looks like:

SYS@v11204 AS SYSDBA> select * from v$pgastat;

NAME								      VALUE UNIT
---------------------------------------------------------------- ---------- ------------
aggregate PGA target parameter					  432013312 bytes
aggregate PGA auto target					  318200832 bytes
global memory bound						   86402048 bytes
total PGA inuse 						   78572544 bytes
total PGA allocated						   90871808 bytes
maximum PGA allocated						   93495296 bytes
total freeable PGA memory					    2818048 bytes
process count								 57
max processes count							 58
PGA memory freed back to OS					    3211264 bytes
total PGA used for auto workareas					  0 bytes
maximum PGA used for auto workareas					  0 bytes
total PGA used for manual workareas					  0 bytes
maximum PGA used for manual workareas					  0 bytes
over allocation count							  0
bytes processed 						    8479744 bytes
extra bytes read/written						  0 bytes
cache hit percentage							100 percent
recompute count (total) 						 18

SYS@v11204 AS SYSDBA> show parameter pga

------------------------------------ ----------- ------------------------------
pga_aggregate_target		     big integer 0

Okay, so far so good. v$memory_dynamic_components shows the PGA Target being 412M, and v$pgastat shows the aggregate PGA target setting being 412M too. I haven’t set pga_aggregate_target (as shown with ‘show parameter pga’), because I am using memory_target/AMM for the argument I hear the most in favour of it: one knob to tune.

Next up, I start the pga_filler script, which means the session starts to allocate PGA.

I keep a close watch using v$pgastat:

SYS@v11204 AS SYSDBA> select * from v$pgastat;

NAME								      VALUE UNIT
---------------------------------------------------------------- ---------- ------------
aggregate PGA target parameter					  432013312 bytes
aggregate PGA auto target					  124443648 bytes
global memory bound						   86402048 bytes
total PGA inuse 						  296896512 bytes
total PGA allocated						  313212928 bytes
maximum PGA allocated						  313212928 bytes

This shows the pga_filler script in progress by looking at v$pgastat from another session. The total amount of PGA allocated has grown to 313212928 (298M) here.

A little while later, the amount of PGA taken has grown beyond the PGA target (only relevant rows):

total PGA inuse 						  628974592 bytes
total PGA allocated						  645480448 bytes
maximum PGA allocated						  645480448 bytes

However, when looking at the memory components using v$memory_dynamic_components, it gives the impression PGA memory is still 412M:

SYS@v11204 AS SYSDBA> select component, current_size/power(1024,2), last_oper_type from v$memory_dynamic_components where current_size != 0;

---------------------------------------------------------------- -------------------------- -------------
shared pool										168 STATIC
large pool										  4 STATIC
java pool										  4 STATIC
SGA Target										612 STATIC
DEFAULT buffer cache									424 INITIALIZING
PGA Target										412 STATIC

You could argue PGA is explicitly mentioned as ‘PGA Target’, but then: the total of the memory area’s (PGA Target+SGA Target) do show a size that roughly sums up to be equal to the memory_target.

A little while later, this is what v$pgastat is showing:

total PGA inuse 						  991568896 bytes
total PGA allocated						 1008303104 bytes
maximum PGA allocated						 1008303104 bytes

Another glimpse at v$memory_dynamic_components shows the same output as above, PGA Target at 412M. This is the point where it get’s a bit weird: the total amount of PGA memory (according to v$pgastat) shows it’s almost 1G, memory_target is set at 1G, and yet v$memory_dynamic_components show no change at all.

Again a little further in time:

total PGA inuse 						 1325501440 bytes
total PGA allocated						 1342077952 bytes
maximum PGA allocated						 1342077952 bytes

Okay, here it get’s really strange: there’s more memory allocated for PGA memory alone than has been set with memory_target for both PGA and SGA memory structures. Also, v$memory_dynamic_components shows no change in SGA memory structures or exchange of memory from SGA to PGA memory.

If v$pgastat is correct, and memory_target actively limits the total amount of both SGA and PGA, then the session must allocate memory out of thin air! But I guess you already came to the conclusion too that either v$pgastat is incorrect, or memory_target does not limit memory allocations (as at least I think it would do).

Let’s dump the PGA heap of the active process to see the real memory allocations of this process:

SYS@v11204 AS SYSDBA> oradebug setospid 9041
Oracle pid: 58, Unix process pid: 9041, image: oracle@bigmachine.local (TNS V1-V3)
SYS@v11204 AS SYSDBA> oradebug unlimit
Statement processed.
SYS@v11204 AS SYSDBA> oradebug dump heapdump 1
Statement processed.

(9041 is the PID of the process running PL/SQL)

Now look into (the relevant) data of the PGA heap dump:

[oracle@bigmachine [v11204] trace]$ grep Total\ heap\ size v11204_ora_9041.trc
Total heap size    =1494712248
Total heap size    =    65512
Total heap size    =  1638184

Okay, this is clear: the process actually took 1494712248 (=1425M) plus a little more memory. So, memory_target isn’t that much of a hard setting after all.

But where does this memory come from? There ought to be a sort of combined memory effort together with the SGA for memory, right? That was the memory_target promise!

Let’s take a look at the actual memory allocations of a new foreground process in /proc/PID/maps:

[oracle@bigmachine [v11204] trace]$ less /proc/11405/maps
00400000-0bcf3000 r-xp 00000000 fc:02 405855559                          /u01/app/oracle/product/
0bef2000-0c0eb000 rw-p 0b8f2000 fc:02 405855559                          /u01/app/oracle/product/
0c0eb000-0c142000 rw-p 00000000 00:00 0
0c962000-0c9c6000 rw-p 00000000 00:00 0                                  [heap]
60000000-60001000 r--s 00000000 00:10 351997                             /dev/shm/ora_v11204_232652803_0
60001000-60400000 rw-s 00001000 00:10 351997                             /dev/shm/ora_v11204_232652803_0
9fc00000-a0000000 rw-s 00000000 00:10 352255                             /dev/shm/ora_v11204_232685572_252
a0000000-a0400000 rw-s 00000000 00:10 354306                             /dev/shm/ora_v11204_232718341_0
3bb3000000-3bb3020000 r-xp 00000000 fc:00 134595                         /lib64/
3bb321f000-3bb3220000 r--p 0001f000 fc:00 134595                         /lib64/
3bb3220000-3bb3221000 rw-p 00020000 fc:00 134595                         /lib64/
3bb3221000-3bb3222000 rw-p 00000000 00:00 0
3bb3400000-3bb3401000 r-xp 00000000 fc:00 146311                         /lib64/
3bb5e16000-3bb5e17000 rw-p 00016000 fc:00 150740                         /lib64/
3bb5e17000-3bb5e19000 rw-p 00000000 00:00 0
7f018415a000-7f018416a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018416a000-7f018417a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018417a000-7f018418a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018418a000-7f018419a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018419a000-7f01841aa000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841aa000-7f01841ba000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841ba000-7f01841ca000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841ca000-7f01841da000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841da000-7f01841ea000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841ea000-7f01841fa000 rw-p 00000000 00:05 1030                       /dev/zero
7f01841fa000-7f018420a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018420a000-7f018421a000 rw-p 00000000 00:05 1030                       /dev/zero
7f018421a000-7f018422a000 rw-p 00000000 00:05 1030                       /dev/zero
7f68d497b000-7f68d4985000 r-xp 00000000 fc:02 268585089                  /u01/app/oracle/product/

When I run the pga_filler anonymous PL/SQL block, and strace (system call trace) utility, I see (snippet):

mmap(0x7f0194f7a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194f7a000
mmap(0x7f0194f8a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194f8a000
mmap(0x7f0194f9a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194f9a000
mmap(0x7f0194faa000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194faa000
mmap(0x7f0194fba000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194fba000
mmap(0x7f0194fca000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194fca000
mmap(0x7f0194fda000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194fda000
mmap(NULL, 1048576, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_NORESERVE, 6, 0xea000) = 0x7f0194e6a000
mmap(0x7f0194e6a000, 65536, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194e6a000
mmap(0x7f0194e7a000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194e7a000
mmap(0x7f0194e9a000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194e9a000
mmap(0x7f0194eba000, 131072, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_FIXED, 6, 0) = 0x7f0194eba000

So, when looking back, it’s very easy to spot the SGA memory, which resides in /dev/shm in my case, and looks like ‘/dev/shm/ora_v11204_232652803_0’ in the above /proc/PID/maps snippet.
This means that the mmap() calls are simply, as anyone would have guessed by now, the PGA memory allocations. In the maps snippet these are visible as being mapped to /dev/zero.
When looking at the mmap() call, at the 5th argument, which is the number 6, we look at a file descriptor. In /proc/PID/fd the file descriptors can be seen, and file descriptor 6 is /dev/zero, as you probably suspected. This way the allocated memory is initial set to zero.

By now, the pga_filler script finishes:

TS@v11204 > @pga_filler
begin pga size : 3908792
last  pga size : 2100012216
begin uga size : 1607440
last  uga size : 2000368
parameter pat  : 0

Taking the entire 2.1G I made the collection to grow to. With memory_target set to 1G.

The first conclusion I made is that PGA memory is very much different than SGA/shared memory. Anyone with a background in Oracle operating-system troubleshooting will find this quite logical. However, the “promise” AMM/memory_target made, in my interpretation, is that the memory would be used seamless. This is simply not the case. Shared memory is in /dev/shm, and PGA is mmaped/allocated as private memory.

Still, this wouldn’t be that much of an issue if memory_target would limit memory in a rigid way, and memory could, and actually would, very easily float between PGA and SGA. It simply doesn’t.

Why don’t we see Oracle trying to reallocate memory? This is the point where I can only guess.

– Probably, Oracle would try to grow the shared pool if it has problems allocating memory for SQL, library cache, etc. This probably hasn’t happened in my test.
– Probably, Oracle would try to grow the buffer cache if it can calculate a certain benefit from enlarging it. This probably hasn’t happened in my test.
– The other SGA area’s (large and java pool) probably are grown if these are used, and need more space for allocations. This probably didn’t happen in my test.
– For the PGA, a wild guess is the memory manager calculates using the workarea sizes (sort, hash and bitmap areas), which are not noticeably used in my test.

Another conclusion and opinion is AMM/memory_target is not a set once and forget option. In fact, it isn’t that much of a difference from using ASMM from a DBA perspective: you carefully need to understand the SGA size, and you carefully need to (try to) manage the PGA memory. Or reasoned the other way around: the only way you can sensibly set memory_target is if you know the correct SGA size and the PGA usage. Also having Oracle manage the memory area’s automatically is not unique to AMM: Oracle will reallocate (inside the SGA) if it finds it necessary, with AMM, ASMM and even manual set memory area’s. But the big dis-advantage of AMM (at least on linux, not sure about other operating systems) is that huge pages can’t be used, which has a severe impact on “real life” databases, in my experience. (Solaris CAN use huge pages with AMM(!)).

A final word: of course I tested a very specific situation. In most real-life cases there will be multiple sessions, and the PGA manageable memory areas will be used. However, the point I try to make is memory_target is simply not a way to very easily make your database be hard limited to the value set. Probably, in real life, the real amount of memory used by the instance will in the area of the value set with memory_target, but this will be subject to what memory areas you are exactly using. Of course it can differ in a spectaculair way if collections or alike structures are used by a large number of sessions.

This is a series of blogposts on how the Oracle database makes use of PGA. Earlier posts can be found here (PGA limiting for Oracle 12) and here (PGA limiting for Oracle 11.2).

Today a little wednesday fun: a quiz.

What do you think will happen in the following situation (leave a response as comment please!):

-Oracle Linux x86_64 6u6.
-Oracle database PSU 4
-Oracle database (single instance) with the following parameter set: memory_target=1G. No other memory related parameters set.

Run the pga_filler script (which can be found here (PGA limiting for Oracle 12)), with grow_until set to 2100000000 (approximately 2.1G).

I’ll try to create a blogpost on the outcome and an explanation on short notice!


Get every new post delivered to your Inbox.

Join 2,650 other followers

%d bloggers like this: