There are many posts about the amount of memory that is taken by the Oracle database executables and the database SGA and PGA. The reason for adding yet another one on this topic is a question I recently got, and the complexity that surrounds memory usage on modern systems. The intention of this blogpost is to show a tiny bit about how linux shares private pages, then move on to shared pages, and discuss what page allocation looks like with Oracle ASMM (sga_target) or manual memory management.
The version of linux in this blogpost is Oracle Linux 7.2, using kernel: 4.1.12-37.6.3.el7uek.x86_64 (UEK4)
The version of the Oracle database software is 12.1.0.2.160719 (July 2016).
Memory usage of virtual memory systems is complicated. For that reason I see a lot of people getting very confused about this topic. Let me state a very simple rule: the memory actively being used on a system should fit in physical memory. Swap (a file or partition) increases total virtual memory, but really only is a safety net for saving your system from an out of memory situation, at the cost of moving pages from and to disk. Because modern linux kernels have swappiness (willingness to swap) set to a non-zero value, it’s not uncommon to have some swap being used, despite physical memory not being oversubscribed. A system stops performing as soon as active paging in and out starts to occur, and for that reason this should not happen.
1. Private pages for linux executables
When an executable is executed on linux from the shell, the shell executes a fork() call to create a new process, which is implemented as a clone() system call on linux. Using the clone() system call, the virtual memory space of the newly created process is shared (readonly) with its parent. This includes the private allocations! Once the child process needs to write in its memory space, it will page fault and create its own version of the page, abandoning the version of its parent.
Can we actually prove this is happening? Yes, the /proc/ filesystem gives an insight into a process’ virtual memory space.
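The visible effect of this copy-on-write behaviour can be sketched in a few lines of Python. This is only an illustration, not something from the tests in this post: the function name `fork_and_modify` is made up, and the Python interpreter itself touches plenty of pages, so the sketch only shows that a write in the child does not change what the parent sees.

```python
import os


def fork_and_modify():
    """Fork, let the child overwrite a buffer, and return what each side sees."""
    data = bytearray(b"parent")      # private, writable memory in the parent
    read_fd, write_fd = os.pipe()
    pid = os.fork()                  # child starts with the same pages as the parent
    if pid == 0:
        # Child: the write below lands on the child's own copy of the page.
        os.close(read_fd)
        data[:] = b"child!"
        os.write(write_fd, bytes(data))
        os.close(write_fd)
        os._exit(0)                  # exit immediately, skip interpreter cleanup
    # Parent: read what the child reported, then reap it.
    os.close(write_fd)
    child_view = os.read(read_fd, 64)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return bytes(data), child_view


parent_view, child_view = fork_and_modify()
print("parent still sees:", parent_view)
print("child saw:", child_view)
```

The parent's buffer keeps its original content even though the child wrote to the same virtual address.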
Let’s start off with a very simple example: we execute ‘cat /proc/self/maps’ to see our own address space:
[oracle@oracle-linux ~]$ cat /proc/self/maps
00400000-0040b000 r-xp 00000000 fb:00 201666243 /usr/bin/cat
0060b000-0060c000 r--p 0000b000 fb:00 201666243 /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 fb:00 201666243 /usr/bin/cat
00e41000-00e62000 rw-p 00000000 00:00 0 [heap]
7f69729be000-7f6978ee5000 r--p 00000000 fb:00 576065 /usr/lib/locale/locale-archive
7f6978ee5000-7f6979099000 r-xp 00000000 fb:00 522359 /usr/lib64/libc-2.17.so
7f6979099000-7f6979298000 ---p 001b4000 fb:00 522359 /usr/lib64/libc-2.17.so
7f6979298000-7f697929c000 r--p 001b3000 fb:00 522359 /usr/lib64/libc-2.17.so
7f697929c000-7f697929e000 rw-p 001b7000 fb:00 522359 /usr/lib64/libc-2.17.so
7f697929e000-7f69792a3000 rw-p 00000000 00:00 0
7f69792a3000-7f69792c4000 r-xp 00000000 fb:00 522352 /usr/lib64/ld-2.17.so
7f69794b9000-7f69794bc000 rw-p 00000000 00:00 0
7f69794c3000-7f69794c4000 rw-p 00000000 00:00 0
7f69794c4000-7f69794c5000 r--p 00021000 fb:00 522352 /usr/lib64/ld-2.17.so
7f69794c5000-7f69794c6000 rw-p 00022000 fb:00 522352 /usr/lib64/ld-2.17.so
7f69794c6000-7f69794c7000 rw-p 00000000 00:00 0
7ffdab1c7000-7ffdab1e8000 rw-p 00000000 00:00 0 [stack]
7ffdab1ea000-7ffdab1ec000 r--p 00000000 00:00 0 [vvar]
7ffdab1ec000-7ffdab1ee000 r-xp 00000000 00:00 0 [vdso]
ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]
There’s a lot to see here, but we see the cat executable at 0x00400000. The reason for the three memory allocations is that (linux/ELF) executables use different sections with specific functions. A full overview of these can be obtained using the readelf executable. A simpler overview of an executable, which matches the above three memory allocations for the cat executable, can be obtained using ‘size -B’ (the size executable, -B means ‘berkeley style’, which is the default):
[oracle@oracle-linux ~]$ size -B /usr/bin/cat
   text    data     bss     dec     hex filename
  43905    1712    2440   48057    bbb9 /usr/bin/cat
This describes the three memory sections a linux executable can have: text (the machine instructions, alias ‘the program’), data (all initialised variables declared in the program) and BSS (uninitialised data).
The first section always is the text allocation (not sure if it’s impossible to have the text section not being the first allocation, I have never seen it different). If you look at the memory flags, ‘r-xp’, this totally makes sense: ‘r-’ means read-only (a ‘-’ where a ‘w’ would be), ‘x’ means executable and ‘p’ means this is a private allocation. The next allocation is the data section. We don’t execute variables, we read them, which is reflected in the flags: ‘r--p’. But what if we change the value of a variable? That is what the third section is for: changed values of initialised variables. This can be seen from the flags of this section: ‘rw-p’, read, write and private. The fourth allocation lists [heap]. This is a mandatory allocation in every process’ memory space, which holds (small) memory allocations; this is NOT the BSS section. In this case, the BSS section does not seem to be allocated.
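To make the layout of a maps line concrete, here is a minimal Python sketch that splits a line into its fields and spells out the flags. The names `Mapping`, `parse_maps_line` and `describe_perms` are made up for this illustration; the sample lines are taken from the cat output above.

```python
from typing import NamedTuple


class Mapping(NamedTuple):
    start: int
    end: int
    perms: str
    offset: int
    dev: str
    inode: int
    path: str

    @property
    def size_kb(self) -> int:
        """Size of the address range in kB."""
        return (self.end - self.start) // 1024


def parse_maps_line(line: str) -> Mapping:
    """Split one /proc/<pid>/maps line into its fields."""
    fields = line.split(None, 5)     # the path is optional
    addr, perms, offset, dev, inode = fields[:5]
    path = fields[5].strip() if len(fields) > 5 else ""
    start, end = (int(part, 16) for part in addr.split("-"))
    return Mapping(start, end, perms, int(offset, 16), dev, int(inode), path)


def describe_perms(perms: str) -> str:
    """Translate a flag string such as 'r-xp' into words."""
    names = []
    if perms[0] == "r":
        names.append("read")
    if perms[1] == "w":
        names.append("write")
    if perms[2] == "x":
        names.append("execute")
    names.append("shared" if perms[3] == "s" else "private")
    return ", ".join(names)


SAMPLE = """\
00400000-0040b000 r-xp 00000000 fb:00 201666243 /usr/bin/cat
0060b000-0060c000 r--p 0000b000 fb:00 201666243 /usr/bin/cat
0060c000-0060d000 rw-p 0000c000 fb:00 201666243 /usr/bin/cat
00e41000-00e62000 rw-p 00000000 00:00 0 [heap]
"""

for line in SAMPLE.splitlines():
    m = parse_maps_line(line)
    print(f"{m.size_kb:>5} kB  {describe_perms(m.perms):<24} {m.path or '(anonymous)'}")
```

The size is simply the end address minus the start address, so the text mapping of cat comes out at 0xb000 bytes.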
By having memory allocations for /usr/lib64/ld-2.17.so we can see this is a dynamically linked executable. You can also see this by executing ‘file’ on the executable:
[oracle@oracle-linux ~]$ file /usr/bin/cat
/usr/bin/cat: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked (uses shared libs), for GNU/Linux 2.6.32, BuildID[sha1]=3207edc47638918ceaeede21947a20a4a496cf63, stripped
If a linux executable is dynamically linked, you can see the libraries that are loaded by the dynamic linker/loader using the ldd utility:
[oracle@oracle-linux ~]$ ldd /usr/bin/cat
        linux-vdso.so.1 =>  (0x00007ffceb3e4000)
        libc.so.6 => /lib64/libc.so.6 (0x00007fd46fb7e000)
        /lib64/ld-linux-x86-64.so.2 (0x000055d5253c9000)
This output shows the dynamic loader (/lib64/ld-linux-x86-64.so.2), and two libraries the dynamic loader loads: libc.so.6 and linux-vdso.so.1. The first one, libc, is the standard C library. The second one, linux-vdso, stands for ‘virtual dynamic shared object’, which is an optimisation that allows certain system calls to be executed in user space (notably gettimeofday()).
The other allocations that exist in our example are anonymous mappings (usually done by programs using the mmap() call):
7f69794c6000-7f69794c7000 rw-p 00000000 00:00 0
And there are some allocations for system purposes, like stack, vvar, vdso and vsyscall.
Now that you have become familiar with some basic linux memory address space specifics, let’s take it a little further. It’s possible to see more about the memory segments using the proc filesystem smaps file:
[oracle@oracle-linux ~]$ cat /proc/self/smaps
00400000-0040b000 r-xp 00000000 fb:00 201666243 /usr/bin/cat
Size: 44 kB
Rss: 44 kB
Pss: 44 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 44 kB
Private_Dirty: 0 kB
Referenced: 44 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd ex mr mw me dw sd
0060b000-0060c000 r--p 0000b000 fb:00 201666243 /usr/bin/cat
Size: 4 kB
Rss: 4 kB
...etc...
Per allocation there are a lot of properties to be seen. ‘Size’ is the full size, ‘Rss’ is the resident set size, alias the amount of data of this segment that is truly resident for this process in its address space. ‘Pss’ is fairly unknown, and is the proportional size of this segment. The way it is proportional is that if pages in this allocation are shared with other processes, the size of these pages is divided by the number of processes they are shared with. In this case, we have loaded the text segment of the cat executable into the process’ address space, which is all resident (size and rss are the same) and it’s not shared with any process (rss equals pss). There are many more properties, but these are out of scope for this blogpost.
Now let’s move on to Oracle. If you look at the maps output of the pmon process for example, you’ll see:
[oracle@oracle-linux 14153]$ cat maps
00400000-1096e000 r-xp 00000000 fb:03 67209358 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
10b6d000-10b8f000 r--p 1056d000 fb:03 67209358 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
10b8f000-10de8000 rw-p 1058f000 fb:03 67209358 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
10de8000-10e19000 rw-p 00000000 00:00 0
1190f000-11930000 rw-p 00000000 00:00 0 [heap]
...
Here we see the Oracle executable, with a text segment, a readonly data segment and a read/write data segment, and we see an anonymous mapping directly following the data segments. That’s the BSS segment!
However, what is more interesting to see are the properties of the distinct memory allocations in smaps:
[oracle@oracle-linux 14153]$ cat smaps
00400000-1096e000 r-xp 00000000 fb:03 67209358 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
Size: 267704 kB
Rss: 40584 kB
Pss: 819 kB
Shared_Clean: 40584 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 40584 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd ex mr mw me dw sd
10b6d000-10b8f000 r--p 1056d000 fb:03 67209358 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
Size: 136 kB
Rss: 124 kB
...
If we look at the text segment for the oracle binary, we see the total text size is 267704 kB (size), but resident (truly available for this process in its address space) is only 40584 kB (rss), and because the oracle executable’s text segment is shared with a lot of processes, the proportional size is only 819 kB (pss).
If you want to understand how much memory is taken in the system, the size tells you the total size of the segment, but it doesn’t say anything about true memory usage. The rss size tells you how many pages of the segment are paged in to the address space of a process, and can be (and for oracle is) different for every process. The pss size is the proportional size for every process. Probably the only way to tell the true amount of memory taken by executables and libraries is to add up all the pss sizes. Any other value only tells something about the process’ point of view on memory usage, not the overall, truly consumed space, because adding those up would count shared pages more than once.
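Adding up the per-mapping values is easy to script. The following Python sketch totals every ‘<name>: <n> kB’ field in smaps text; the function names `sum_smaps` and `pss_kb_for_pid` are made up for this illustration, and the sample is the (truncated) pmon output from above, so the second mapping contributes no Pss value.

```python
import re
from collections import Counter

# Matches lines such as "Rss:   40584 kB"; mapping headers and VmFlags do not match.
FIELD = re.compile(r"^(\w+):\s+(\d+) kB$")


def sum_smaps(text):
    """Return the per-field totals (in kB) over all mappings in smaps text."""
    totals = Counter()
    for line in text.splitlines():
        match = FIELD.match(line.strip())
        if match:
            totals[match.group(1)] += int(match.group(2))
    return totals


def pss_kb_for_pid(pid):
    """Total Pss of one process; needs permission to read /proc/<pid>/smaps."""
    with open(f"/proc/{pid}/smaps") as smaps:
        return sum_smaps(smaps.read())["Pss"]


SAMPLE = """\
00400000-1096e000 r-xp 00000000 fb:03 67209358 /u01/app/oracle/.../bin/oracle
Size: 267704 kB
Rss: 40584 kB
Pss: 819 kB
Shared_Clean: 40584 kB
VmFlags: rd ex mr mw me dw sd
10b6d000-10b8f000 r--p 1056d000 fb:03 67209358 /u01/app/oracle/.../bin/oracle
Size: 136 kB
Rss: 124 kB
"""

totals = sum_smaps(SAMPLE)
print("Size:", totals["Size"], "kB  Rss:", totals["Rss"], "kB  Pss:", totals["Pss"], "kB")
```

Summing `pss_kb_for_pid()` over all processes of interest then gives an estimate that counts every shared page once instead of once per process.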
This is different for anonymous allocations. Since anonymous allocations are created when a process is run, I’ve only seen them initialised purely private. For that reason rss and pss sizes are equal, because every process initialises it strictly for itself. This too works in a lazy allocation way. When memory is allocated, the size is defined, but is only really allocated once it’s truly used, which is expressed by a difference between size and rss.
2. Shared pages
The Oracle database relies on shared caches and data structures, which are put into what is called the SGA, the system global area. The main components of the SGA are the shared pool (shared structures), log buffer (change vectors to be written to disk to persist changes) and the buffer cache, amongst others. With any memory management option (manual management, ASMM (automatic shared memory management, sga_target) and AMM (automatic memory management, memory_target)) there is an SGA. Depending on the memory option, it is visible in a different way.
When manual memory or ASMM is used, shared memory is allocated as system V shared memory. The ‘classic’ way of looking at system V shared memory is using ipcs -m (m is for shared memory, you can also use s for semaphores and q for message queues):
[oracle@oracle-linux ~]$ ipcs -m

------ Shared Memory Segments --------
key        shmid      owner      perms      bytes      nattch     status
0x00000000 655360     oracle     600        2932736    124
0x00000000 688129     oracle     600        905969664  62
0x00000000 720898     oracle     600        139673600  62
0x5f921964 753667     oracle     600        20480      62
Please mind that if you have more than one instance active, or an ASM instance active, you will see more shared memory allocations.
Apparently, the oracle database allocates a couple of shared memory segments. If you want to understand what these memory allocations are for, you can use the oradebug ipc command to see what their functions are:
SQL> oradebug setmypid
Statement processed.
SQL> oradebug ipc
IPC information written to the trace file
This generates a trace file in the ‘trace’ directory in the diagnostics destination. Here is what this looks like (partial output with content of interest to this blogpost):
Area #0 `Fixed Size' containing Subareas 2-2
  Total size 00000000002cbe70 Minimum Subarea size 00000000
   Area  Subarea    Shmid    Segment Addr     Stable Addr      Actual Addr
      0        2   655360 0x00000060000000 0x00000060000000 0x00000060000000
               Subarea size     Segment size   Req_Protect  Cur_protect
           00000000002cc000 00000000002cc000       default    readwrite
Area #1 `Variable Size' containing Subareas 0-0
  Total size 0000000036000000 Minimum Subarea size 00400000
   Area  Subarea    Shmid    Segment Addr     Stable Addr      Actual Addr
      1        0   688129 0x00000060400000 0x00000060400000 0x00000060400000
               Subarea size     Segment size   Req_Protect  Cur_protect
           0000000036000000 0000000036000000       default    readwrite
Area #2 `Redo Buffers' containing Subareas 1-1
  Total size 0000000008534000 Minimum Subarea size 00001000
   Area  Subarea    Shmid    Segment Addr     Stable Addr      Actual Addr
      2        1   720898 0x00000096400000 0x00000096400000 0x00000096400000
               Subarea size     Segment size   Req_Protect  Cur_protect
           0000000008534000 0000000008534000       default    readwrite
Area #3 `skgm overhead' containing Subareas 3-3
  Total size 0000000000005000 Minimum Subarea size 00000000
   Area  Subarea    Shmid    Segment Addr     Stable Addr      Actual Addr
      3        3   753667 0x0000009ec00000 0x0000009ec00000 0x0000009ec00000
               Subarea size     Segment size   Req_Protect  Cur_protect
           0000000000005000 0000000000005000       default    readwrite
The first allocation is ‘fixed size’, alias the fixed SGA, the second allocation is the ‘variable size’, which contains the shared pool and the buffercache, the third allocation is the ‘redo buffers’ and the fourth is the ‘skgm overhead’, alias the index into the shared memory structures for this instance.
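The segment sizes in the trace are hexadecimal, the sizes in ipcs -m are decimal bytes. A small Python sketch, using only the numbers shown above, confirms that the shmids in both outputs describe the same segments; the variable and function names are made up for this illustration.

```python
# shmid -> (area name, "Segment size" as printed in the oradebug ipc trace)
ORADEBUG_SEGMENTS = {
    655360: ("Fixed Size", "00000000002cc000"),
    688129: ("Variable Size", "0000000036000000"),
    720898: ("Redo Buffers", "0000000008534000"),
    753667: ("skgm overhead", "0000000000005000"),
}

# shmid -> "bytes" column of ipcs -m
IPCS_BYTES = {
    655360: 2932736,
    688129: 905969664,
    720898: 139673600,
    753667: 20480,
}


def segment_bytes(hex_size):
    """Convert a hexadecimal size string from the trace to bytes."""
    return int(hex_size, 16)


for shmid, (area, hex_size) in ORADEBUG_SEGMENTS.items():
    size = segment_bytes(hex_size)
    status = "matches ipcs" if size == IPCS_BYTES[shmid] else "DIFFERS from ipcs"
    print(f"{shmid}  {area:<14} {size:>10} bytes  {size / 2**20:8.2f} MB  {status}")
```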
Because any memory allocation is visible in maps and smaps, this method can be used for shared memory too, to see how the shared memory segments are mapped into the process address space. All oracle database server processes have the shared memory segments for the instance mapped into their address space. The usage is different per process, so the amount of shared memory paged into the address space will be different:
...
12bcd000-12bee000 rw-p 00000000 00:00 0 [heap]
60000000-60001000 r--s 00000000 00:05 655360 /SYSV00000000 (deleted)
60001000-602cc000 rw-s 00001000 00:05 655360 /SYSV00000000 (deleted)
60400000-96400000 rw-s 00000000 00:05 688129 /SYSV00000000 (deleted)
96400000-9e934000 rw-s 00000000 00:05 720898 /SYSV00000000 (deleted)
9ec00000-9ec05000 rw-s 00000000 00:05 753667 /SYSV5f921964 (deleted)
7f473004e000-7f47301d4000 r-xp 00000000 fb:02 212635773 /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libshpkavx12.so
...
Shared memory is easily identified by the ‘s’, where “normal” private memory mappings have ‘p’. If you want to know more about the process’ perspective of the shared memory, we can use smaps, just like with private memory mappings (virtual memory space of pmon):
60000000-60001000 r--s 00000000 00:05 655360 /SYSV00000000 (deleted)
Size: 4 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd sh mr mw me ms sd
60001000-602cc000 rw-s 00001000 00:05 655360 /SYSV00000000 (deleted)
Size: 2860 kB
Rss: 392 kB
Pss: 36 kB
Shared_Clean: 0 kB
Shared_Dirty: 372 kB
Private_Clean: 0 kB
Private_Dirty: 20 kB
Referenced: 392 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 4 kB
MMUPageSize: 4 kB
Locked: 0 kB
VmFlags: rd wr sh mr mw me ms sd
These two mappings belong to the fixed SGA. The reason for two mappings is that the first page (0x1000 equals 4096, alias a single linux page) is readonly (r--s). The other fixed SGA mapping is read write (rw-s). Here we see that from the process’ perspective it really doesn’t matter much if a piece of mapped memory is shared or private; it’s handled in exactly the same way, which means the full segment is mapped into the process’ virtual memory space, but only once pages are touched (alias truly used) does the process register the address in its pagetable, and do the pages become resident (as can be seen in the difference between the total size and the rss). The sole purpose of shared memory is that it is shared between processes. That the pages are shared is very well visible in the difference between the rss and pss sizes. It’s also easy to spot that this shared memory segment is created from small pages; MMUPageSize and KernelPageSize are 4 kB.
However, this yields an interesting question: shared memory does not belong to any single process. Does that mean that if a shared memory segment is created, it is truly allocated, or can shared memory be lazily allocated as well? Please mind that the above statistics are the process’ perspective, not the kernel’s perspective.
One way to see the state of shared memory system wide, is using the ‘-u’ flag with the ipcs command:
[oracle@oracle-linux [testdb] ~]$ ipcs -mu

------ Shared Memory Status --------
segments allocated 4
pages allocated 256005
pages resident  255684
pages swapped   0
Swap performance: 0 attempts     0 successes
This is a really useful view! What we can see from the output of this command is that nearly all pages allocated as shared memory are resident. The fact that there are separate statistics for shared memory pages allocated and resident suggests that shared memory too can be allocated lazily, alias on demand. Also, there is a difference between resident and allocated, which indicates lazy allocation too.
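A short Python sketch shows how these numbers relate to each other and to the ipcs -m output from earlier. The function name `parse_ipcs_mu` is made up for this illustration, and the 4 kB page size is an assumption that holds for the regular pages on this x86-64 system.

```python
import re

PAGE_BYTES = 4096  # assumption: regular 4 kB pages


def parse_ipcs_mu(text):
    """Extract the counters from 'ipcs -mu' output into a dict."""
    stats = {}
    for key in ("segments allocated", "pages allocated", "pages resident", "pages swapped"):
        match = re.search(rf"{key}\s+(\d+)", text)
        if match:
            stats[key] = int(match.group(1))
    return stats


SAMPLE = """\
------ Shared Memory Status --------
segments allocated 4
pages allocated 256005
pages resident  255684
pages swapped   0
Swap performance: 0 attempts     0 successes
"""

stats = parse_ipcs_mu(SAMPLE)
untouched = stats["pages allocated"] - stats["pages resident"]
resident_pct = 100 * stats["pages resident"] / stats["pages allocated"]
print(f"resident: {resident_pct:.2f}%, not yet touched: {untouched} pages")

# Cross-check: the 'bytes' column of ipcs -m adds up to the same number of pages.
segment_bytes = [2932736, 905969664, 139673600, 20480]
print("pages according to ipcs -m:", sum(segment_bytes) // PAGE_BYTES)
```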
Inside the database I am aware of two parameters that could influence shared pages usage; pre_page_sga and _touch_sga_pages_during_allocation, see my article on these. However, what is interesting, is that these parameters are different for the instance I am testing with for this blogpost, which is running on a VM:
SYS@testdb AS SYSDBA> @parms
Enter value for parameter: page
old  20: where name like nvl('%&parameter%',name)
new  20: where name like nvl('%page%',name)
Enter value for isset:
old  21: and upper(isset) like upper(nvl('%&isset%',isset))
new  21: and upper(isset) like upper(nvl('%%',isset))
Enter value for show_hidden: Y
old  22: and flag not in (decode('&show_hidden','Y',3,2))
new  22: and flag not in (decode('Y','Y',3,2))

NAME                                 VALUE    ISDEFAUL ISMODIFIED ISSET
------------------------------------ -------- -------- ---------- ----------
olap_page_pool_size                  0        TRUE     FALSE      FALSE
pre_page_sga                         TRUE     TRUE     FALSE      FALSE
use_large_pages                      TRUE     TRUE     FALSE      FALSE
_max_largepage_alloc_time_secs       10       TRUE     FALSE      FALSE
_olap_page_pool_expand_rate          20       TRUE     FALSE      FALSE
_olap_page_pool_hi                   50       TRUE     FALSE      FALSE
_olap_page_pool_hit_target           100      TRUE     FALSE      FALSE
_olap_page_pool_low                  262144   TRUE     FALSE      FALSE
_olap_page_pool_pressure             90       TRUE     FALSE      FALSE
_olap_page_pool_shrink_rate          50       TRUE     FALSE      FALSE
_realfree_heap_pagesize              65536    TRUE     FALSE      FALSE
_realfree_pq_heap_pagesize           65536    TRUE     FALSE      FALSE
_session_page_extent                 2048     TRUE     FALSE      FALSE
_touch_sga_pages_during_allocation   FALSE    TRUE     FALSE      FALSE

14 rows selected.
In the database I created on my VM, pre_page_sga equals TRUE and _touch_sga_pages_during_allocation equals FALSE, which is the exact inverse of the settings of a database (PSU 160419) on a huge machine. Perhaps these parameters are dynamically set based on the size of the SGA and some logic (if _touch_sga_pages_during_allocation is TRUE, it makes sense to set pre_page_sga to FALSE, as its function has already been performed by the bequeathing session).
However, with pre_page_sga set to TRUE it makes sense that almost all SGA (shared) pages are allocated. At least in Oracle 12, pre_page_sga spawns a background process (sa00) that scans the SGA pages, which means it pages them in, resulting in the actual allocation. (I am not sure about earlier versions, because the Oracle description of this parameter is different from what happens in Oracle 12.) Let’s test this by setting pre_page_sga to false. It should lead to far fewer shared memory pages being allocated, which will eventually be allocated as database processes page them in:
SQL> alter system set pre_page_sga=false scope=spfile;
SQL> startup force;
And then look at ipcs -mu again:
[oracle@oracle-linux [testdb] ~]$ ipcs -mu

------ Shared Memory Status --------
segments allocated 4
pages allocated 256005
pages resident  92696
pages swapped   0
Swap performance: 0 attempts     0 successes
As expected, only the bare necessary pages are resident after startup force. All the other shared pages will slowly be paged in as foreground and background processes touch SGA pages during execution.
How would that work when we set sga_max_size to a different value than sga_target? If the pages beyond the sga_target are never allocated, you could control the amount of SGA pages used by setting sga_target, but ‘reserve’ extra memory to use by setting sga_max_size higher, which is never allocated, so it is not wasted. Let’s set up the instance:
SQL> alter system set pre_page_sga=true scope=spfile;
SQL> show spparameter sga_target

SID      NAME                          TYPE        VALUE
-------- ----------------------------- ----------- ----------------------------
*        sga_target                    big integer 1000M
SQL> ! ipcs -mu

------ Shared Memory Status --------
segments allocated 4
pages allocated 256005
pages resident  102512
pages swapped   0
Swap performance: 0 attempts     0 successes
This sets the pre_page_sga parameter from the spfile, which means the instance will spawn a process to touch SGA pages on next startup.
Currently, the sga_target for sizing the SGA is set to 1000M in the spfile.
ipcs tells us 256005 pages are allocated, which makes sense: 256005*4=1024020k, which is slightly more than the 1000M that was set, which means pages allocated essentially equals sga_target.
SQL> alter system set sga_max_size=2g scope=spfile;
SQL> startup force;
ORACLE instance started.

Total System Global Area 2147483648 bytes
Fixed Size                  2926472 bytes
Variable Size            1358956664 bytes
Database Buffers          637534208 bytes
Redo Buffers              148066304 bytes
Database mounted.
Database opened.
This sets sga_max_size to double the amount of sga_target, and ‘startup force’ bounces the instance.
SQL> show parameter sga_target

NAME                                 TYPE        VALUE
------------------------------------ ----------- ------------------------------
sga_target                           big integer 1008M
Here we see the actual parameter in the database is set to 1008M. Now let’s look at the ipcs -mu values again:
> !ipcs -mu

------ Shared Memory Status --------
segments allocated 4
pages allocated 524291
pages resident  521923
pages swapped   0
Swap performance: 0 attempts     0 successes
521923*4=2087692. So (almost) all the memory set for sga_max_size is allocated. In fact, if you look at the values reported at instance startup above, you see ‘Total System Global Area’ showing the 2G: it’s all SGA, so it’s all touched because pre_page_sga is set to TRUE. So the next test is to set pre_page_sga to FALSE:
SQL> alter system set pre_page_sga=false scope=spfile;
SQL> startup force
ORACLE instance started.

Total System Global Area 2147483648 bytes
Fixed Size                  2926472 bytes
Variable Size            1358956664 bytes
Database Buffers          637534208 bytes
Redo Buffers              148066304 bytes
Database mounted.
Database opened.
All memory is still declared SGA, as we can see. However, by having _touch_sga_pages_during_allocation set to FALSE and pre_page_sga set to FALSE, we should see only the actual used SGA pages being allocated:
SQL> !ipcs -mu

------ Shared Memory Status --------
segments allocated 4
pages allocated 524291
pages resident  91692
pages swapped   0
Swap performance: 0 attempts     0 successes
The above output shows the shared memory status directly after I restarted my instance, so this is not only less than sga_max_size, it is even less than sga_target (91692*4=366768 kB, ~358M). This will grow up to sga_target, because these pages will get paged in by the database processes.
What does this look like when we add in huge pages? In Oracle 12.1.0.2.160719 in my instance the parameter to tell oracle to allocate huge pages if there are any (‘use_large_pages’) is set to TRUE. This will make Oracle use large pages if any are available. This is true even if there are not enough huge pages to satisfy the entire SGA; Oracle will just allocate all that can be allocated, and create a new shared memory segment using small pages for the remainder of the needed shared memory.
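The page arithmetic for the two sga_max_size tests can be laid side by side with a few lines of Python. This is just a worked calculation over the ipcs -mu numbers shown above, assuming 4 kB pages; the labels and function names are made up for this illustration.

```python
PAGE_KB = 4            # assumption: regular 4 kB pages
SGA_TARGET_MB = 1008   # value reported by 'show parameter sga_target'
SGA_MAX_SIZE_MB = 2048 # sga_max_size=2g


def pages_to_kb(pages):
    """Convert a page count as reported by ipcs -mu to kB."""
    return pages * PAGE_KB


def pages_to_mb(pages):
    """Convert a page count as reported by ipcs -mu to MB (1024 kB)."""
    return pages_to_kb(pages) / 1024


# (label, pages allocated, pages resident) taken from the ipcs -mu outputs
MEASUREMENTS = [
    ("pre_page_sga=true", 524291, 521923),
    ("pre_page_sga=false", 524291, 91692),
]

for label, allocated, resident in MEASUREMENTS:
    print(f"{label:<20} allocated {pages_to_mb(allocated):8.1f} MB, "
          f"resident {pages_to_kb(resident):>8} kB ({pages_to_mb(resident):7.1f} MB)")
```

With pre_page_sga set to false, the resident amount stays well below both sga_target and sga_max_size directly after startup.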
Sadly, it seems per memory segment statistics like rss, pss, shared and private clean and dirty, etc. are not implemented for huge pages:
[oracle@oracle-linux [testdb] ~]$ cat /proc/$(pgrep pmon)/smaps
...
61000000-d8000000 rw-s 00000000 00:0e 688129 /SYSV00000000 (deleted)
Size: 1949696 kB
Rss: 0 kB
Pss: 0 kB
Shared_Clean: 0 kB
Shared_Dirty: 0 kB
Private_Clean: 0 kB
Private_Dirty: 0 kB
Referenced: 0 kB
Anonymous: 0 kB
AnonHugePages: 0 kB
Swap: 0 kB
KernelPageSize: 2048 kB
MMUPageSize: 2048 kB
Locked: 0 kB
VmFlags: rd wr sh mr mw me ms de ht sd
...
This is the main shared memory segment, allocated from huge pages (as can be seen with KernelPageSize and MMUPageSize), which means it’s the segment holding the shared pool and buffercache. This can also be seen by the size: 1949696 kB, which is nearly the 2G of sga_max_size.
However, we can just use the global information on system V shared memory (ipcs -mu) and we can use the huge page information in /proc/meminfo:
[oracle@oracle-linux [testdb] ~]$ grep -i huge /proc/meminfo
AnonHugePages:         0 kB
HugePages_Total:    1100
HugePages_Free:      880
HugePages_Rsvd:      805
HugePages_Surp:        0
Hugepagesize:       2048 kB
The statistics of interest are:
HugePages_Total: the total number of huge pages allocated. Warning: huge pages memory allocated by the kernel is NOT available for allocation of regular sized pages (which means you can starve your processes and the kernel of normal pages by setting the number of huge pages too high).
HugePages_Free: the number of huge pages which are not used currently. Warning: this includes allocated but not yet initialised pages, which HugePages_Rsvd shows.
HugePages_Rsvd: the number of huge pages allocated but not yet initialised.
HugePages_Surp: the number of huge pages allocated (truly allocated and not yet initialised) beyond the total number of huge pages set. This value can be greater than zero if the kernel setting vm.nr_overcommit_hugepages is greater than zero. The value of this setting is zero by default, and at least for usage with the Oracle database, this value should remain zero.
The same information can be obtained using ipcs -mu, but with a twist:
[oracle@oracle-linux [testdb] ~]$ ipcs -mu

------ Shared Memory Status --------
segments allocated 4
pages allocated 524803
pages resident  122881
pages swapped   0
Swap performance: 0 attempts     0 successes
Some of you might spot the twist by looking at the numbers.
It turns out ipcs has no facility for huge pages, it just reports the number of pages as if these were 4 kB.
524803*4 (kB) / 1024 (to make it MB) = 2050.
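The huge page counters from /proc/meminfo tell the same story as the ipcs number. The Python sketch below is a worked calculation on the values shown above; the function name `parse_hugepages` is made up, and treating ‘total - free + reserved’ as the amount of huge page memory claimed by the instance is my reading of the counters as described above, on a system where only this instance uses huge pages.

```python
import re


def parse_hugepages(meminfo_text):
    """Pick the HugePages_* counters and Hugepagesize out of /proc/meminfo text."""
    pairs = re.findall(r"(HugePages_\w+|Hugepagesize):\s+(\d+)", meminfo_text)
    return {name: int(value) for name, value in pairs}


SAMPLE = """\
AnonHugePages:         0 kB
HugePages_Total:    1100
HugePages_Free:      880
HugePages_Rsvd:      805
HugePages_Surp:        0
Hugepagesize:       2048 kB
"""

huge = parse_hugepages(SAMPLE)
touched = huge["HugePages_Total"] - huge["HugePages_Free"]  # pages truly in use
claimed = touched + huge["HugePages_Rsvd"]                  # in use plus reserved
claimed_mb = claimed * huge["Hugepagesize"] // 1024

ipcs_pages_allocated = 524803                               # counted by ipcs as 4 kB pages
ipcs_mb = ipcs_pages_allocated * 4 // 1024

print(f"huge pages touched: {touched}, reserved: {huge['HugePages_Rsvd']}, claimed: {claimed}")
print(f"claimed via huge pages: {claimed_mb} MB, allocated according to ipcs: {ipcs_mb} MB")
```

Both views end up at roughly the same amount of memory, which matches the sga_max_size of 2G plus the smaller segments.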
Now going back to the goal of looking into this: I stated that shared memory is allocated and paged at startup time when _touch_sga_pages_during_allocation is set to TRUE (it is set to false by default in my current database), and that it can be explicitly paged by the background process sa00 after startup of the instance when pre_page_sga is set to TRUE. When both are set to false, shared memory allocated from default sized 4kB pages is allocated only when it’s used. In the above examples with huge pages, the tests were done with pre_page_sga set to false. This shows exactly the same ‘lazy allocation’ behaviour as with 4kB pages.
When ‘extra’ memory is reserved from the operating system by setting sga_max_size to a higher value than sga_target, this will all be allocated and paged if either _touch_sga_pages_during_allocation or pre_page_sga is set to TRUE, which doesn’t make sense; if the memory is taken, you might as well use it. However, this is different if both _touch_sga_pages_during_allocation and pre_page_sga are set to false. All memory beyond sga_target up to sga_max_size is allocated, but never touched, and thus never paged in, so never truly allocated. Please mind that linux itself understands this perfectly (I am referring to huge pages and ‘reserved’ pages), however the system V ipc kernel settings do not; you need to set the shared memory values high enough to facilitate the total sum of sga_max_size values, not the truly used sizes as indicated by the sum of sga_target values.
The inspiration for this investigation came from a question on my blog. However, the question was about memory_target and memory_max_target and AIX. I do not have an AIX system at hand. I did not investigate the implementation of memory_target and memory_max_target on AIX. So I can’t comment on that. What I can say, is that on Linux, you really, really should use automatic shared memory management (ASMM) alias setting sga_target or setting it manually (and set huge pages!). If you are used to these memory management settings on databases not on AIX, it probably makes sense to use that on AIX too, even if the automatic memory management (AMM) alias setting memory_target is implemented brilliantly on AIX, for the sake of predictability and standardisation.