Oracle memory troubleshooting using analysis on heapdumps
This blogpost is about analysing Oracle heap dumps. It is an extension to earlier work, Tanel Poder’s heap dump analyzer. So hat tip to Tanel, he’s done the hard work, I merely increased the presentation options. The heap analyser script that I wrote analyses Oracle heapdumps from the trace file that the dump was written to by the Oracle database. Because the heap dump representation is the same between PGA and SGA memory, it can work on both. The reason for this is that memory management is done by the same memory manager, and is commonly called ‘kgh’ (kernel generic heap) managed memory.
Please mind that for PGA analysis, not all memory is managed by the kgh memory manager. For example memory used for networking (sqlnet) is allocated totally outside of the kgh memory manager.
Let’s take the output of a PGA heap dump at level 29 (PGA, UGA, CGA, top call heaps, call heaps, session heap; executed via ‘alter session set events ‘immediate trace name heapdump level 29”):
$ ./heap_analyze.awk /u01/app/oracle/diag/rdbms/tt/tt/trace/tt_ora_6515.trc Heap dump analyzer v0.1 by Frits Hoogland heap size totals (all sizes in bytes) ============================================================================================== heap name total size ---------------------------------------------------------------------------------------------- top call heap 393096 callheap 36200 pga heap 1016816 top uga heap 262048 session heap 261744 ------------------------------------------ total 1969904 top 5 allocation by total size per alloc reason per heap ============================================================================================== heap alloc reason #chunks total size ---------------------------------------------------------------------------------------------- top call heap 5 354456 top call heap callheap 11 37616 top call heap perm 2 1024 heap alloc reason #chunks total size ---------------------------------------------------------------------------------------------- callheap 10 35920 callheap ctxtcds: kksQui 1 160 callheap perm 1 120 heap alloc reason #chunks total size ---------------------------------------------------------------------------------------------- pga heap perm 44 290256 pga heap Fixed UGA heap 1 226232 pga heap diag pga 21 214048 pga heap 7 62616 pga heap kfkio bucket 1 40984 heap alloc reason #chunks total size ---------------------------------------------------------------------------------------------- top uga heap session heap 4 261928 top uga heap perm 1 120 heap alloc reason #chunks total size ---------------------------------------------------------------------------------------------- session heap perm 3 63608 session heap 8 63528 session heap koh-kghu sessi 5 28288 session heap kxsFrame4kPage 6 24912 session heap kxsFrame8kPage 3 24744 top 5 allocations by total size per alloc reason, type, chunk size per heap ============================================================================================== heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- top call heap 1 free 131048 131048 top call heap 2 free 65512 131024 top call heap 1 free 63496 63496 top call heap 1 free 28888 28888 top call heap callheap 6 freeable 4224 25344 heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- callheap 6 free 4184 25104 callheap 2 free 4192 8384 callheap 1 free 1784 1784 callheap 1 free 648 648 callheap ctxtcds: kksQui 1 freeable 160 160 heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- pga heap Fixed UGA heap 1 recreate 226232 226232 pga heap perm 1 perm 71720 71720 pga heap diag pga 1 freeable 55048 55048 pga heap diag pga 2 freeable 27456 54912 pga heap 1 free 47296 47296 heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- session heap perm 1 perm 43584 43584 session heap 1 free 36544 36544 session heap kxsFrame4kPage 6 freeable 4152 24912 session heap kxsFrame8kPage 3 freeable 8248 24744 session heap 1 free 20352 20352 heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- top uga heap session heap 3 freeable 65512 196536 top uga heap session heap 1 recreate 65392 65392 top uga heap perm 1 perm 120 120 top 5 allocations by total size per heap, alloc reason, type, chunk size ============================================================================================== heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- pga heap Fixed UGA heap 1 recreate 226232 226232 top uga heap session heap 3 freeable 65512 196536 top call heap 1 free 131048 131048 top call heap 2 free 65512 131024 pga heap perm 1 perm 71720 71720
I figured that the first thing you want to see, are the heaps that are in the dump, and their sizes. That’s what is visible in rows 2-12.
I summed the heap sizes, which might make sense, or it might not.
In this case, with a heap dump that includes PGA, UGA and CGA plus session and call heaps, it means there are heaps in the dump that are part of another heap that is in the dump. So the total size here is bogus. This means that you need to have an understanding of what is actually dumped.
The next section, top 5 allocations by total size per alloc reason per dump, shows a per-heap summary by the allocation reason in the dump. If there’s no alloc reason, it’s free memory. Because this is a heap dump of a session that has done nothing (I just started sqlplus and ran the dump of its own PGA memory), you see that a lot of memory chunks are free memory. If you look closely to the allocation reasons, you can see that the ‘top call heap’ has a memory section that is called ‘callheap’ which is slightly larger than the ‘callheap’ section in the heap totals, and the ‘top uga heap’ section has a memory section called ‘session heap’ that is slightly larger than the ‘session heap’ section in the heap totals. In this case, it means that you can actually see the subheads in the parent heap allocation totals. The subheap size must be slightly larger in the parent heap because of headers in the memory allocations which are needed memory management. Please mind that this is based on my knowledge of how process memory is created, the only way to be absolutely sure is that a heap is part of another heap is to look at the memory addresses. This output only shows the heap names, not the addresses. The purpose of this section is to have an understanding of where memory is allocated to in a heap.
The following section, top 5 allocations by total size per alloc reason, type, chunk size per heap shows a per-heap summary by reason, type and chunksize, so you can investigate if specific types and sizes of chunks are causing issues or weird behaviour.
The last section is the same as the previous section, but doesn’t do it per heap. This is identical to what Tanel’s heapdump_analyzer shows.
You can find the heap_analyze.awk script here: https://gitlab.com/FritsHoogland/oracle_memory_analyze/blob/master/heap_analyze.awk
This is the output of a dump of the SGA of my small test database (oradebug dump heapdump 2):
$ ./heap_analyze.awk /u01/app/oracle/diag/rdbms/tt/tt/trace/tt_ora_9461.trc Heap dump analyzer v0.1 by Frits Hoogland heap size totals (all sizes in bytes) ============================================================================================== heap name total size ---------------------------------------------------------------------------------------------- sga heap(1,0) 285210904 sga heap(1,3) 83885560 ------------------------------------------ total 369096464 top 5 allocation by total size per alloc reason per heap ============================================================================================== heap alloc reason #chunks total size ---------------------------------------------------------------------------------------------- sga heap(1,0) perm 22 188068904 sga heap(1,0) 141 17975320 sga heap(1,0) SO private sga 17 14268008 sga heap(1,0) KQR PO 8271 7568912 sga heap(1,0) KGLHD 9381 7023960 heap alloc reason #chunks total size ---------------------------------------------------------------------------------------------- sga heap(1,3) 51 14120288 sga heap(1,3) SQLA^6d9b8a7e 337 1380352 sga heap(1,3) SQLA^31cc505b 167 684032 sga heap(1,3) SQLA^aab93e92 162 663552 sga heap(1,3) PLDIA^191e0a8d 155 634880 top 5 allocations by total size per alloc reason, type, chunk size per heap ============================================================================================== heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- sga heap(1,0) perm 1 perm 15937496 15937496 sga heap(1,0) perm 1 perm 15931312 15931312 sga heap(1,0) perm 1 perm 15811464 15811464 sga heap(1,0) perm 1 perm 15741952 15741952 sga heap(1,0) perm 1 perm 15723584 15723584 heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- sga heap(1,3) 1 free 9883208 9883208 sga heap(1,3) 4 R-free 839480 3357920 sga heap(1,3) SQLA^6d9b8a7e 336 freeableU 4096 1376256 sga heap(1,3) 1 R-free 839360 839360 sga heap(1,3) SQLA^31cc505b 166 freeableU 4096 679936 top 5 allocations by total size per heap, alloc reason, type, chunk size ============================================================================================== heap alloc reason #chunks type chunk size total size ---------------------------------------------------------------------------------------------- sga heap(1,0) perm 1 perm 15937496 15937496 sga heap(1,0) perm 1 perm 15931312 15931312 sga heap(1,0) perm 1 perm 15811464 15811464 sga heap(1,0) perm 1 perm 15741952 15741952 sga heap(1,0) perm 1 perm 15723584 15723584
It’s interesting to see only subpool 1 sub-sub pool 1 and 3 are used. Subpool 1,1 contains a lot of permanent allocations and a lot of allocations that might have a more permanent nature, like KQR (dictionary cache) allocations, Subpool 1,3 seems to have allocations that are deemed more transient in nature, like SQLA (sql area) allocations. This might be wildly different in databases that are actually heavily used, this is an idle lab database.
Please mind it’s important to understand that dumping the shared pool requires obtaining the respective latches, so doing that on a live production system might lead to (severe) issues. Only do this if you know what you are doing. For the PGA there can only be one using process by definition, but be careful there too, if you request a PGA dump you are interacting with memory that is deemed private by the process that is using that.
If you require more rows to be shown than the 5 that are shown, set the ‘group_sum_nr’ variable to the amount you need on row 4 of the script.
Pingback: Oracle memory troubleshooting using analysis on heapdumps, part 2 | Frits Hoogland Weblog