Archive

Tag Archives: pin

This post is an introduction to pinatrace annotate version 2, which is a tool to annotate the output of the Intel Pin tools ‘pinatrace’ tool.

The pinatrace tool generates a file with every single memory access of a process. Please realise what this means: this is every single read from main memory or write to main memory from the CPU. This allows you to get an understanding what happens within a C function. This means you can determine what information or data is accessed in what function. Needless to say this is a tool for internals investigations and research, not something for normal daily database maintenance and support. Also, the performance of the process that you attached to is severely impacted, and it can only be turned off by stopping the process. Do not use this on a production database, use this at your own risk for research and investigational purposes only.

The general way to use the pin tool is to run something together with pin, for example ‘ls’:

$ ./pin -t source/tools/SimpleExamples/obj-intel64/pinatrace.so -- ls

This generates an output file that looks like this:

#
# Memory Access Trace Generated By Pin
#
0x00007f49de60f173: W 0x00007ffd160c46f8  8     0x7f49de60f178
0x00007f49de60f840: W 0x00007ffd160c46f0  8                  0
0x00007f49de60f844: W 0x00007ffd160c46e8  8                  0
0x00007f49de60f846: W 0x00007ffd160c46e0  8                  0
0x00007f49de60f848: W 0x00007ffd160c46d8  8                  0
0x00007f49de60f84a: W 0x00007ffd160c46d0  8                  0
0x00007f49de60f84f: W 0x00007ffd160c46c8  8                  0
0x00007f49de60f85f: R 0x00007f49de82fe00  8                0xe
0x00007f49de60f866: W 0x00007f49de82fbd8  8      0x5f25f53b3dc
0x00007f49de60f877: R 0x00007f49de82ff90  8           0x221e00
0x00007f49de60f881: W 0x00007f49de8309a8  8     0x7f49de82fe00
0x00007f49de60f888: W 0x00007f49de830998  8     0x7f49de60e000

This on itself is not very helpful. Let me explain what you are seeing: the first column is the instruction pointer address, in other words: the place in the executable in which the memory transfer was initiated, ‘W’ and ‘R’ are self explanatory, the third column is the memory address to which the read or write is addressed, the fourth column is the size of the memory IO, and the fifth is the value.

Luckily, there is another way to run pin, which is attaching to an already running process. This is what is helpful in using the memory access trace with an Oracle process. The way this is done, is quite simple; first get the linux process number, then run pin specifying the process number:

$ ps -ef | grep [L]OC
oracle    7250  7249  0 09:42 ?        00:00:00 oracletest (DESCRIPTION=(LOCAL=YES)(ADDRESS=(PROTOCOL=beq)))
$ ./pin -pid 7250 -t source/tools/SimpleExamples/obj-intel64/pinatrace.so

There is an important caveat here: because pin inserts itself into the running process, the pin library (source/tools/SimpleExamples/obj-intel64/pinatrace.so in this case) must be available to this process. So if you downloaded, extracted and linked pin tools as root, and then as root try to attach to an oracle process, it will not work because the oracle process can not reach the library. So: extract and link the pin tools as oracle.

The output of the pinatrace (a file called pinatrace.out by default) is in $ORACLE_HOME/dbs when you attach to an Oracle database process.

Back to the pinatrace output: what if we could see the C function that was at the instruction pointer address, the name of the Oracle memory area for the memory address and for the fifth column the name of the Oracle memory area again in case the value is actually a pointer? That is exactly what my pinatrace annotate oracle tool does!

How to use it: first clone the pinatrace_annotate repository:

git clone https://gitlab.com/FritsHoogland/pinatrace_annotate.git

Then set the O_BINARY to your oracle binary at line 6 of pinatrace_annotate_oracle.sh.

The next thing is to generate the oracle memory area CSV files. I added a script to generate the memory ranges for the PGA too, however you should be aware this is very fluent, and it requires an execution of the oracle PID you point it to before the v$process_memory_detail gets populated. Sometimes I skip the PGA detail step because it doesn’t add too much value.
To run the scripts for generating the memory area CSV files:

$ sqlplus / as sysdba
...
SQL> @get_memory_ranges
...
	76 14,20,@1	   8645 		    SYS 			   sqlplus@memory-presentation.local (TNS V1-V3)
...
enter pid number to get pga details: 76
old   1: alter session set events 'immediate trace name pga_detail_get level &pid'
new   1: alter session set events 'immediate trace name pga_detail_get level 76'

Session altered.

execute something (eg. select * from dual) in the target process, then press enter

Now head over to the session you want the PGA memory details from, and execute something like ‘select * from dual’, and press enter in the sqlplus session in which you ran get_memory_ranges.sql. This will then run for a while, it generates pipe-delimited files that sqlite can parse.

After the CSV files have been generated, create the sqlite database for use by the tool:

$ ./pinatrace_annotate_oracle.sh -c
generate symbols from /u01/app/oracle/product/12.2.0.1/dbhome_1/bin/oracle.
create and populate table symbol_ranges with symbols.
create index.
create table memory_ranges.
import memory_ranges.csv
import memory_ranges_xtables.csv
import memory_ranges_pga.csv
create index.

Now the pinatrace_annotate_oracle.sh tool is all set for usage!

Example usage
Let’s use an actual example to see where the memory trace provides benefit: latches. Nowadays, latches are not the issue these were in the oracle 8 days. In these days, the latches protecting the database buffers double linked lists, the ‘cache buffers chains’ latches, were not gotten in shared mode. That meant especially index root blocks could get latch contention if there was much concurrency.

Some great work has been done on the internals of latches by Andrey Nikolaev; see his latch internals presentation. His work is done on Solaris with Dtrace, I did some of that on linux (part 1, part 2 and part 3).

Let’s replay a willing to wait latch get and see how that looks like in the annotated pinatrace output! Setup the host with the pin tools (link it to get the pinatrace.so library), and clone pinatrace_annotate and configure it to point to the oracle binary you are using.

1. session 1:

$ sqlplus / as sysdba
SQL> select distinct sid from v$mystat;

2. session 2:

$ sqlplus / as sysdba
SQL> @get_memory_ranges

identify session 1, and fill out the pid.
take note of the O/S pid.
it will ask you to execute something in the session for which the pid was entered.

3. session 1:

SQL> select * from dual;

4. session 2:
press enter, and wait for @get_memory_ranges to finish.

SQL> exit
$ ./pinatrace_annotate_oracle.sh -c
$ sqlplus / as sysdba
SQL> select addr from v$latch where name = 'cache table scan latch';

note the latch address, it’s 0x6004A400 in my case.

SQL> oradebug setmypid
SQL> oradebug call kslgetl 0x6004A400 0 0 2442

result: ‘Function returned 1’ indicating a successful latch get.

5. session 1:

SQL> oradebug setmypid

6. session 3:
go to the directory where pin is extracted and linked, and go to the directory where the pin executable is located. Execute the following and change the pid for the O/S pid obtained in step 2.

$ ./pin -pid 8645 -t source/tools/SimpleExamples/obj-intel64/pinatrace.so -o pina_latch.txt

now the memory trace is enabled for pid 8645

7. session 1:

SQL> oradebug call kslgetl 0x6004A400 1 0 2442

this will hang, it is waiting for the latch which is taken by session 2.

8. session 2:

SQL> oradebug call kslfre 0x6004A400

this will return ‘Function returned 0’ indicating it freed the latch.

9. session 1:
the oradebug call will now return ‘Function returned 1’ indicating it has gotten the latch.

10. session 3:

$ cp /u01/app/oracle/product/12.2.0.1/dbhome_1/dbs/pina_latch.txt ~/pinatrace_annotate/pina_latch.txt

as indicated, the pinatrace output file will be in $ORACLE_HOME/dbs, and with this I copy the current state of the trace to the pinatrace_annotate directory where my pinatrace annotate tool is cloned.

$ cd ~/pinatrace_annotate
$ ./pinatrace_annotate_oracle.sh pina_latch.txt > pina_latch_annotated.txt

This will run for a while…

Once pinatrace_annotate_oracle.sh is done, open the pina_latch_annotated.txt file. Short guide on how this looks like; the first two lines in my file are:

function+offset:memory address(annotation+offset):Read/Write:size:value hex/dec(pointer annotation+offset)
??:0x00007fff224e6498():R:8:0x11060785/285607813()

The first line is the header, which is displayed once, and essentially explains it all. The second line starts with ‘??’. As indicated by the header, the first field is the function. Any call to a function (‘symbol’) that is not in the oracle executable, like functions in shared libraries, do not have an address in the executable. In fact, the place at which the shared libraries are loaded, and thus their functions are located is randomised by default, which explains why the executable does not have the address.

The second column is the address memory is read from or written into. In the second line, there is nothing in between the parentheses, meaning this address is not in a known memory range.

The ‘R’ or ‘W’ is obvious, and the number is the amount of memory read or written too.

The fifth field shows the content of the read or write in hexadecimal and in decimal. Here again there are parenthesis which are empty. I look up the value in the memory area’s, and put the memory area, if the value is in a known memory area, as annotation including offset. What you should realise is that I don’t know if a value is in fact a memory area and thus a pointer, or that the value coincidentally the same as a known memory area. I do think understanding if the value is a known memory area is handy and useful, in the C language pointers are heavily used.

Another example:

nsbasic_brc+548:0x00007f33d3026690(pga|Other, Fixed Uga, Fixed UGA heap+88712 ):R:8:0x7f33d3019600/139860560156160(pga|Other, Fixed Uga, Fixed UGA heap+3532
0 )

Here we see an example where the function is a function in the oracle executable, nsbasic_brc, which at offset 548 of the function read 8 bytes from memory address 0x00007f33d3026690, which was a known memory area at the time of running 0_get_pga_detail.sql when ‘select * from dual’ was run, which was in the PGA, in the Fixed UGA heap at offset 88712, and the value it read was 0x7f33d3019600, which is a memory address in the same chunk of memory at offset 3532, and therefore likely to be a pointer.

Now let’s look further into the annotated pinatracefile. If you search for ‘kslgetl’, you will find the actual latch get executed with oradebug. This is how that looks like on my system:

kslgetl+2:0x00007fff224e4ec0():W:8:0x7fff224e4ff0/140733768945648()
kslgetl+13:0x00007fff224e4ea0():W:8:0x7fff224e5870/140733768947824()
kslgetl+17:0x00007f33d8095f40(pga|Other, free memory, top call heap+243776 ):R:8:0x7f33d8095f40/139860644552512(pga|Other, free memory, top call heap+243776 )
kslgetl+26:0x00007fff224e4e98():W:8:0x1/1()
kslgetl+33:0x00007fff224e4e90():W:8:0/0()
kslgetl+37:0x00007fff224e4ea8():W:8:0x4/4()
kslgetl+44:0x00007f33d8087e78(pga|Other, free memory, top call heap+186232 ):R:8:0x7a63d268/2053362280(shared pool|permanent memor,duration 1,cls perm+5702248 )
kslgetl+58:0x00007fff224e4eb0():W:8:0x7fff224e57d0/140733768947664()
kslgetl+65:0x00007fff224e4e60():W:8:0/0()
kslgetl+69:0x000000006004a40f(fixed sga|(parent)latch:cache table scan latch+15 fixed sga|var:kcb_table_scan_latch_+15 ):R:1:0x37/55()
kslgetl+80:0x000000007a63d5c0(shared pool|X$KSUPR.KSLLALAQ+0 shared pool|permanent memor,duration 1,cls perm+5703104 ):R:8:0/0()
kslgetl+96:0x000000006004a410(fixed sga|(parent)latch:cache table scan latch+16 fixed sga|var:kcb_table_scan_latch_+16 ):R:4:0x1/1()
kslgetl+103:0x000000007a63d5f0(shared pool|X$KSUPR.KSLLALOW+0 shared pool|permanent memor,duration 1,cls perm+5703152 ):R:4:0/0()
kslgetl+109:0x0000000011141524():R:4:0x1cffe/118782()
kslgetl+122:0x000000007a63d5c0(shared pool|X$KSUPR.KSLLALAQ+0 shared pool|permanent memor,duration 1,cls perm+5703104 ):W:8:0x6004a400/1610916864(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 )
kslgetl+131:0x00007f33d8087ea8(pga|Other, free memory, top call heap+186280 ):R:8:0x4b/75()
kslgetl+138:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()
kslgetl+138:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):W:8:0x22/34()
kslgetl+1898:0x000000007a63d5c0(shared pool|X$KSUPR.KSLLALAQ+0 shared pool|permanent memor,duration 1,cls perm+5703104 ):W:8:0/0()
kslgetl+1909:0x00000000600119b4(fixed sga|var:kcbccc_+0 ):R:1:0/0()
kslgetl+1938:0x00007fff224e4e60():R:8:0/0()
kslgetl+1942:0x00007fff224e4e38():W:8:0x10a6084b/279316555()

The two highlighted lines show that the function read the beginning of the latch memory and found a number, indicating the oracle process id that has the latch taken already. In other words, this means the process found a latch that has already been taken by another process, oracle pid 34.

If you scroll down further you see kslgetl called kslges, which is how a willing to wait latch works if the latch it wants to take is already taken, and then further down, you will see this pattern:

kslges+930:0x00007fff224e4d58():W:4:0/0()
kslges+937:0x00007fff224e4d8a():W:1:0/0()
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()
kslges+930:0x00007fff224e4d58():W:4:0/0()
kslges+937:0x00007fff224e4d8a():W:1:0/0()
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()
kslges+930:0x00007fff224e4d58():W:4:0/0()
kslges+937:0x00007fff224e4d8a():W:1:0/0()
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()
kslges+930:0x00007fff224e4d58():W:4:0/0()
kslges+937:0x00007fff224e4d8a():W:1:0/0()
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()
kslges+930:0x00007fff224e4d58():W:4:0/0()
kslges+937:0x00007fff224e4d8a():W:1:0/0()
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()

This is what is the spinning for the latch looks like from a memory access perspective. So that means that spinning on a latch does not mean the entire latching function (kslgetl or kslges) is called to try to take the latch, but rather when kslgetl finds the latch to be taken, it calls kslges, which does some housekeeping, and then in a really tight loop reads the latch’ memory at offset 0 to detect when it is freed. We can see it is a loop by looking at the offset numbers of the function, these are 930, 937 and 944, which are the offsets we see over and over.

In fact, if I grep for that line and count the number of lines, we can see this version of oracle (12.2.0.1) still does spin 20’000 times for a latch in willing to wait mode:

grep 'kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch' pina_latch_annotated.txt | wc -l
20001

Yes, this shows 20001 executions, remember we freed the latch in the other session?

$ grep 'kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch' pina_latch_annotated.txt | tail -3
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0x22/34()
kslges+944:0x000000006004a400(fixed sga|(parent)latch:cache table scan latch+0 fixed sga|var:kcb_table_scan_latch_+0 ):R:8:0/0()

The last line shows kslges finding the latch in a free state, which is after it has been woken from sleeping on a semaphore.

A modern version of Oracle doesn’t sleep for an arbitrary time and then spin again, it goes to sleep on a semaphore after it spun 20’000 times and then waits until it gets posted. That can be seen in the pinatrace output too, but if you want to look at the function flow, it might be better to generate a debugtrace trace, the pinatrace trace shows what information is used within a function, and might be too verbose. In fact, the function flow towards the semaphore call is kslges > skgpwwait > sskgpwwait.

Conclusion
pinatrace_annotate_oracle.sh is a tool to annotate a tracefile generated by the pin tool ‘pinatrace’ when running the oracle binary. It shows information about any memory access done by a process, which allows you to understand what information is obtained in what function of Oracle. This is a tool only for research and debug purposes.

In my previous post, I introduced Intel Pin. If you are new to pin, please follow this link to my previous post on how to set it up and how to run it.

One of the things you can do with Pin, is profile memory access. Profiling memory access using the pin tool ‘pinatrace’ is done in the following way:

$ cd ~/pin/pin-3.0-76991-gcc-linux
$ ./pin -pid 12284 -t source/tools/SimpleExamples/obj-intel64/pinatrace.so

The pid is a pid of an oracle database foreground process. Now execute something in the session you attached pin to and you find the ‘pinatrace’ output in $ORACLE_HOME/dbs:

$ ls -l $ORACLE_HOME/dbs
total 94064
-rw-rw----. 1 oracle oinstall     1544 Nov 16 09:40 hc_testdb.dat
-rw-r--r--. 1 oracle oinstall     2992 Feb  3  2012 init.ora
-rw-r-----. 1 oracle oinstall       57 Nov  5 09:42 inittestdb.ora
-rw-r-----. 1 oracle oinstall       24 Nov  5 09:32 lkTESTDB
-rw-r-----. 1 oracle oinstall     7680 Nov  5 09:41 orapwtestdb
-rw-r--r--  1 oracle oinstall 10552584 Nov 17 06:36 pinatrace.out

Please mind memory access generates A LOT of information! The above 11MB is what a ‘select * from dual’ generates (!)

This is how the file looks like:

$ head pinatrace.out
#
# Memory Access Trace Generated By Pin
#
0x00007f85c63fe218: R 0x00007fff6fd2c4c8  8          0xcefb615
0x000000000cefb61e: W 0x00007fff6fd2c4f8  8              0x12c
0x000000000cefb621: R 0x00007fff6fd2c4d0  8     0x7f85c5bebd96
0x000000000cefb625: R 0x00007fff6fd2c4d8  8     0x7f85c5bebd96
0x000000000cefb62c: R 0x00007fff6fd2c4e0  8     0x7fff6fd2c570
0x000000000cefb62d: R 0x00007fff6fd2c4e8  8          0xcefb54e

The first field is the function location, the second field is R or W (reading or writing obviously), the third field is the memory location read or written the fourth field is the amount of bits read and the fifth field is prefetched memory.

The function that is used can be looked up using the addr2line linux utility:

$ addr2line -p -f -e /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle 0x000000000cefb61e
sntpread at ??:?

I looked up the second address from the pinatrace.out file above, and that address belongs to the function sntpread. There is no additional information available for this function (‘at ??:?’). If the address is not available in the oracle executable, a ‘??’ is displayed:

$ addr2line -p -f -e /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle 0x00007f85c63fe218
?? ??:0

The pinatrace.out file is usable if you know the exact instruction pointer address or the memory location. However, that usage is fairly limited. An example of that is Mahmoud Hatem’s blog on tracing access to a memory location. Wouldn’t it be nice if we can change the functions addresses to function names, and the memory addresses to named memory locations whenever possible?

That’s where I created the pinatrace annotate oracle tool for. This is a little scriptset that contains scripts to generate memory information from the instance, after which the instruction pointer addresses and memory locations of a pinatrace.out file generated by pinatrace are translated to function names and memory area names. Let’s have a look what that means. This is a snippet of a pinatrace.out file:

0x000000000c967e46: R 0x0000000095f69910  8         0x95fcf6b0
0x000000000c967e4d: W 0x00007fff6fd2b2b8  8          0xc967e52
0x000000000c937b32: W 0x00007fff6fd2b2b0  8     0x7fff6fd2bdb0
0x000000000c937b3a: W 0x00007fff6fd2b278  8                0xe
0x000000000c937b41: W 0x00007fff6fd2b298  8         0x95f68ea8
0x000000000c937b45: W 0x00007fff6fd2b270  8                0x1
0x000000000c937b49: W 0x00007fff6fd2b280  8     0x7f85ca1db280
0x000000000c937b4d: R 0x0000000095fcf6bc  2               0x12
0x000000000c937b52: W 0x00007fff6fd2b288  8              0x2c4
0x000000000c937b59: W 0x00007fff6fd2b290  8          0xd8f898c
0x000000000c937b60: W 0x00007fff6fd2b2a0  4               0x73
0x000000000c937b6b: W 0x00007fff6fd2b2a8  4                0x1
0x000000000c937b6e: R 0x00007f85ca1db280  8     0x7f85ca1db280
0x000000000c937b77: R 0x000000000d0a40e4  4                0x1
0x000000000c937b84: R 0x00007f85ca1d43c8  8         0x95dc0e20
0x000000000c937b92: R 0x0000000095dc10b0  8                  0
0x000000000c937ba2: R 0x0000000095fcf6c0  4                0x1
0x000000000c937ba9: R 0x0000000095dc10e0  4                  0
0x000000000c937baf: R 0x000000000cfbe644  4            0x1cffe
0x000000000c937bbc: W 0x0000000095dc10b0  8         0x95fcf6b0
0x000000000c937bc5: R 0x0000000095fcf6b0  8                  0
0x000000000c937bc5: W 0x0000000095fcf6b0  8                0x1
0x000000000c937bca: W 0x00007fff6fd2b260  8                  0
0x000000000c937be1: R 0x00007f85ca1d4290  8     0x7f85ca1a9ca0
0x000000000c937bec: R 0x00007f85ca1ab1c0  4                0x3
0x000000000c937bf3: W 0x0000000095dc0faa  2                0x3
0x000000000c937bf9: R 0x00007f85ca1d43e0  8         0x95f68ea8
0x000000000c937c09: R 0x0000000095f69470  2                  0
0x000000000c937c16: W 0x0000000095dc0fac  2                  0
0x000000000c937c1e: R 0x0000000095dc10e0  4                  0
0x000000000c937c1e: W 0x0000000095dc10e0  4                0x2
0x000000000c937c24: W 0x0000000095dc0fa0  8         0x95fcf6b0
0x000000000c937c28: W 0x0000000095dc0fa8  2                0x8
0x000000000c937c2e: R 0x000000006000a9d8  4                0x1
0x000000000c937c3b: R 0x00007fff6fd2b298  8         0x95f68ea8
0x000000000c937c3f: R 0x00007fff6fd2b2a0  4               0x73
0x000000000c937c42: W 0x0000000095fcf6c8  8         0x95f68ea8
0x000000000c937c46: W 0x0000000095fcf6c4  4               0x73
0x000000000c937c4a: R 0x00007fff6fd2b2a8  4                0x1
0x000000000c937c50: R 0x0000000095fcf6b8  4              0x83e
0x000000000c937c50: W 0x0000000095fcf6b8  4              0x83f
0x000000000c937c5a: W 0x0000000095dc10b0  8                  0
0x000000000c937c65: R 0x00007f85ca1d71d6  1                  0
0x000000000c937c76: R 0x00007fff6fd2b270  8                0x1
0x000000000c937c7a: R 0x00007fff6fd2b290  8          0xd8f898c
0x000000000c937c7e: R 0x00007fff6fd2b288  8              0x2c4
0x000000000c937c82: R 0x00007fff6fd2b280  8     0x7f85ca1db280
0x000000000c937c86: R 0x00007fff6fd2b278  8                0xe
0x000000000c937c8d: R 0x00007fff6fd2b2b0  8     0x7fff6fd2bdb0
0x000000000c937c8e: R 0x00007fff6fd2b2b8  8          0xc967e52

The usefulness of this is limited in this form. The only thing I could derive is that big numbers in the memory access column (‘0x00007fff6fd2ac60’) are probably PGA related, and the numbers between roughly 0x000000006000000 and 0x0000000095dc0fd0 are probably SGA related. After running the annotate tool, it looks like this:

ksl_get_shared_latch:W:0x00007fff6fd2b2b0():8
ksl_get_shared_latch:W:0x00007fff6fd2b278():8
ksl_get_shared_latch:W:0x00007fff6fd2b298():8
ksl_get_shared_latch:W:0x00007fff6fd2b270():8
ksl_get_shared_latch:W:0x00007fff6fd2b280():8
ksl_get_shared_latch:R:0x0000000095fcf6bc(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):2
ksl_get_shared_latch:W:0x00007fff6fd2b288():8
ksl_get_shared_latch:W:0x00007fff6fd2b290():8
ksl_get_shared_latch:W:0x00007fff6fd2b2a0():4
ksl_get_shared_latch:W:0x00007fff6fd2b2a8():4
ksl_get_shared_latch:R:0x00007f85ca1db280(pga|Other, pga heap, permanent memory pga|Other, top call heap, free memory):8
ksl_get_shared_latch:R:0x000000000d0a40e4():4
ksl_get_shared_latch:R:0x00007f85ca1d43c8(pga|Other, pga heap, permanent memory pga|Other, top call heap, free memory):8
ksl_get_shared_latch:R:0x0000000095dc10b0(shared pool|permanent memor,duration 1,cls perm shared pool|X$KSUPR.KSLLALAQ):8
ksl_get_shared_latch:R:0x0000000095fcf6c0(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):4
ksl_get_shared_latch:R:0x0000000095dc10e0(shared pool|permanent memor,duration 1,cls perm shared pool|X$KSUPR.KSLLALOW):4
ksl_get_shared_latch:R:0x000000000cfbe644():4
ksl_get_shared_latch:W:0x0000000095dc10b0(shared pool|permanent memor,duration 1,cls perm shared pool|X$KSUPR.KSLLALAQ):8
ksl_get_shared_latch:R:0x0000000095fcf6b0(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):8
ksl_get_shared_latch:W:0x0000000095fcf6b0(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):8
ksl_get_shared_latch:W:0x00007fff6fd2b260():8
ksl_get_shared_latch:R:0x00007f85ca1d4290(pga|Other, pga heap, permanent memory pga|Other, top call heap, free memory):8
ksl_get_shared_latch:R:0x00007f85ca1ab1c0(pga|Other, pga heap, kgh stack pga|Other, pga heap, free memory pga|Other, pga heap, permanent memory):4
ksl_get_shared_latch:W:0x0000000095dc0faa(shared pool|permanent memor,duration 1,cls perm):2
ksl_get_shared_latch:R:0x00007f85ca1d43e0(pga|Other, pga heap, permanent memory pga|Other, top call heap, free memory):8
ksl_get_shared_latch:R:0x0000000095f69470(shared pool|permanent memor,duration 1,cls perm):2
ksl_get_shared_latch:W:0x0000000095dc0fac(shared pool|permanent memor,duration 1,cls perm):2
ksl_get_shared_latch:R:0x0000000095dc10e0(shared pool|permanent memor,duration 1,cls perm shared pool|X$KSUPR.KSLLALOW):4
ksl_get_shared_latch:W:0x0000000095dc10e0(shared pool|permanent memor,duration 1,cls perm shared pool|X$KSUPR.KSLLALOW):4
ksl_get_shared_latch:W:0x0000000095dc0fa0(shared pool|permanent memor,duration 1,cls perm):8
ksl_get_shared_latch:W:0x0000000095dc0fa8(shared pool|permanent memor,duration 1,cls perm):2
ksl_get_shared_latch:R:0x000000006000a9d8(fixed sga|var:kslf_stats_):4
ksl_get_shared_latch:R:0x00007fff6fd2b298():8
ksl_get_shared_latch:R:0x00007fff6fd2b2a0():4
ksl_get_shared_latch:W:0x0000000095fcf6c8(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):8
ksl_get_shared_latch:W:0x0000000095fcf6c4(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):4
ksl_get_shared_latch:R:0x00007fff6fd2b2a8():4
ksl_get_shared_latch:R:0x0000000095fcf6b8(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):4
ksl_get_shared_latch:W:0x0000000095fcf6b8(shared pool|permanent memor,duration 1,cls perm shared pool|(child)latch:session idle bit):4
ksl_get_shared_latch:W:0x0000000095dc10b0(shared pool|permanent memor,duration 1,cls perm shared pool|X$KSUPR.KSLLALAQ):8
ksl_get_shared_latch:R:0x00007f85ca1d71d6(pga|Other, pga heap, permanent memory pga|Other, top call heap, free memory):1
ksl_get_shared_latch:R:0x00007fff6fd2b270():8
ksl_get_shared_latch:R:0x00007fff6fd2b290():8
ksl_get_shared_latch:R:0x00007fff6fd2b288():8
ksl_get_shared_latch:R:0x00007fff6fd2b280():8
ksl_get_shared_latch:R:0x00007fff6fd2b278():8
ksl_get_shared_latch:R:0x00007fff6fd2b2b0():8
ksl_get_shared_latch:R:0x00007fff6fd2b2b8():8

So, now you can see the reason I picked a seemingly arbitrary range of lines actually was because that range is the memory accesses of the ksl_get_shared_latch function. This annotated version show a shared latch get for the ‘session idle bit’ latch. It’s also visible the function uses PGA memory, some of it annotated, some of it not, and that most of the shared pool access is for the latch (a latch essentially is a memory range with the function of serialising access to a resource), which is in the shared pool because it’s a child latch. It’s also visible memory belonging to X$KSUPR is read and written (X$KSUPR is the table responsible for V$PROCESS, the fields KSLLALAQ and KSLLALOW are not externalised in V$PROCESS).

Why are a lot of the assumed PGA addresses (the ones like 0x00007fff6fd2b2b8) not annotated? Well, PGA memory allocations are very transient of nature. Because a PGA memory snapshot is made at a certain point in time, this snapshot represents the memory layout of that moment, which has a high probability of having memory deallocated and freed to the operating system. A lot of the SGA/shared pool allocations on the other hand have the intention of re-usability, and thus are not freed immediately after usage, which gives the SGA memory snapshot a good chance of capturing a lot of the memory allocations.

Get the pinatrace oracle annotate tool via github: git clone https://github.com/FritsHoogland/pinatrace_annotate_oracle.git

Please mind this tool uses the bash shell, it might not work in other shells like ksh.

How to use the tool?
– Use pin with the pinatrace.so tool, as described above. Move the the pinatrace.out file from $ORACLE_HOME/dbs to the directory with the pinatrace_annotate_oracle.sh script.
Immediately after the trace has been generated (!), execute the following scripts using sqlplus as SYSDBA:
– 0_get_pga_detail.sql (this lists the sessions in the database and requires you to specify the oracle PID of the session)
– 1_generate_memory_ranges.sql
– 2_generate_memory_ranges_xtables.sql
– 3_generate_memory_ranges_pga.sql
This results in the following files: memory_ranges.csv, memory_ranges_pga.csv and memory_ranges_xtables.csv.
Now execute the annotate script:
– ./pinatrace_annotate_oracle.sh pinatrace.out
The script outputs to STDOUT, so if you want to save the annotation, redirect it to a file (> file.txt) or if you want to look and redirect to a file: | tee file.txt.

I hope this tool is useful for your research. If you know a memory area described in the data dictionary that is not included, please drop me a message with the script, then I’ll include it.

This blogpost is an introduction to Intel’s Pin dynamic instrumentation framework. Pin and the pintools were brought to my attention by Mahmoud Hatem in his blogpost Tracing Memory access of an oracle process: Intel PinTools. The Pin framework provides an API that abstracts instruction-set specifics (on the CPU layer). Because this is a dynamic binary instrumentation tool, it requires no recompiling of source code. This means we can use it with programs like the Oracle database executable.
The Pin framework download comes with a set of pre-created tools called ‘Pintools’. Some of these tools are really useful for Oracle investigation and research.

Pin works in a very sophisticated way. The description in the Pin manualis to think of Pin as a JIT (just in time) compiler, where the compiler does not take byte code (as JIT compilation does with Java), but the executable of the process pin is executed against. This means pin inserts itself into the process’ execution. This can be seen when looking at the memory map of such a process:

$ cat /proc/29595/maps | grep -e pin-3.0 -e oracle
00400000-1098a000 r-xp 00000000 fb:02 68469986                           /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
10b8a000-10bac000 r--p 1058a000 fb:02 68469986                           /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
10bac000-10e05000 rw-p 105ac000 fb:02 68469986                           /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle
7fa01e04c000-7fa01e31d000 r-xp 00000000 fb:04 67152845                   /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/lib-ext/libpin3dwarf.so
7fa01e51c000-7fa01e530000 r--p 002d0000 fb:04 67152845                   /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/lib-ext/libpin3dwarf.so
7fa01e530000-7fa01e531000 rw-p 002e4000 fb:04 67152845                   /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/lib-ext/libpin3dwarf.so
7fa01e533000-7fa01e71a000 r-xp 00000000 fb:04 954267                     /home/oracle/pin/pin-3.0-76991-gcc-linux/source/tools/SimpleExamples/obj-intel64/calltrace.so
7fa01e91a000-7fa01e91c000 r--p 001e7000 fb:04 954267                     /home/oracle/pin/pin-3.0-76991-gcc-linux/source/tools/SimpleExamples/obj-intel64/calltrace.so
7fa01e91c000-7fa01e91e000 rw-p 001e9000 fb:04 954267                     /home/oracle/pin/pin-3.0-76991-gcc-linux/source/tools/SimpleExamples/obj-intel64/calltrace.so
7fa01e946000-7fa01e9c9000 r-xp 00000000 fb:04 136702                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libc-dynamic.so
7fa01e9ca000-7fa01e9cc000 r--p 00083000 fb:04 136702                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libc-dynamic.so
7fa01e9cc000-7fa01e9ce000 rw-p 00085000 fb:04 136702                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libc-dynamic.so
7fa01e9d6000-7fa01ea04000 r-xp 00000000 fb:04 136694                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libm-dynamic.so
7fa01ea04000-7fa01ea05000 r--p 0002d000 fb:04 136694                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libm-dynamic.so
7fa01ea05000-7fa01ea06000 rw-p 0002e000 fb:04 136694                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libm-dynamic.so
7fa01ea06000-7fa01eac4000 r-xp 00000000 fb:04 136696                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libstlport-dynamic.so
7fa01eac5000-7fa01eac8000 r--p 000be000 fb:04 136696                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libstlport-dynamic.so
7fa01eac8000-7fa01eaca000 rw-p 000c1000 fb:04 136696                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/runtime/pincrt/libstlport-dynamic.so
7fa01eacb000-7fa01edb7000 r-xp 00000000 fb:04 100663633                  /home/oracle/pin/pin-3.0-76991-gcc-linux/extras/xed-intel64/lib/libxed.so
7fa01edb8000-7fa01ee24000 r--p 002ec000 fb:04 100663633                  /home/oracle/pin/pin-3.0-76991-gcc-linux/extras/xed-intel64/lib/libxed.so
7fa01ee24000-7fa01ee25000 rw-p 00358000 fb:04 100663633                  /home/oracle/pin/pin-3.0-76991-gcc-linux/extras/xed-intel64/lib/libxed.so
7fa01ee80000-7fa01f385000 r-xp 00000000 fb:04 136689                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/bin/pinbin
7fa01f385000-7fa01f38d000 r--p 00504000 fb:04 136689                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/bin/pinbin
7fa01f38d000-7fa01f394000 rw-p 0050c000 fb:04 136689                     /home/oracle/pin/pin-3.0-76991-gcc-linux/intel64/bin/pinbin
7fa01f54e000-7fa01f6d4000 r-xp 00000000 fb:02 212053053                  /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libshpkavx12.so
7fa01f6d4000-7fa01f8d3000 ---p 00186000 fb:02 212053053                  /u01/app/oracle/product/12.1.0.2/dbhome_1/lib/libshpkavx12.so

Here you see this process an oracle foreground (rows 1-3), after which we see a lot of stuff that pin pushed into the process’ address space.

For this reason, it’s vitally important to use and run pin as the same user as the process you want to run pin against. The way pin works is that, upon execution of pin, the pin executable inserts itself into the process’ address space, gains control and then tries to load necessary libraries. If it can’t find these libraries, it will send a SIGKILL to the process, effectively stopping it!

In order to obtain pin, go to the pin homepage, downloads, linux and select the ‘gcc compiler kit’. This is a zipped tarball. Upload that to a linux server as the database owner, usually ‘oracle’. I created a directory ‘pin’ in which I put the tarball, and extracted it (tar xzf). The next step is to compile the tools that come with pin:

$ cd pin-3.0-76991-gcc-linux/source/tools
$ make

This will output a lot of stuff on your screen and compiles everything in the tools directory.

Now that we have seen an introduction, let’s use a Pin tool to do something useful! One such useful is the tool ‘DebugTrace’. First obtain the process id (pid) from an Oracle server foreground process. Then execute the pin tool against this process:

$ pwd
/home/oracle/pin/pin-3.0-76991-gcc-linux
$ ./pin -pid 2407 -t source/tools/DebugTrace/obj-intel64/debugtrace.so

Upon execution, nothing is returned, because the majority of the things are happening in the process pin is run against (2407 in this case). If no parameters are specified, the pintool will create a file in the current working directory of the process it is run against. For an oracle foreground process, the current working directory is $ORACLE_HOME/dbs:

$ ls -l /proc/2407/cwd
lrwxrwxrwx 1 oracle oinstall 0 Nov 17 01:45 /proc/2407/cwd -> /u01/app/oracle/product/12.1.0.2/dbhome_1/dbs

For the sake of testing this, I ran ‘select * from dual’ in the oracle database foreground process.

Let’s look what this produces! This is how my dbs directory looks like:

$ ls -ltr
total 2332
-rw-r--r--. 1 oracle oinstall    2992 Feb  3  2012 init.ora
-rw-r-----. 1 oracle oinstall      24 Nov  5 09:32 lkTESTDB
-rw-r-----. 1 oracle oinstall    7680 Nov  5 09:41 orapwtestdb
-rw-r-----. 1 oracle oinstall      57 Nov  5 09:42 inittestdb.ora
-rw-rw----. 1 oracle oinstall    1544 Nov 16 09:40 hc_testdb.dat
-rw-r--r--  1 oracle oinstall 2361460 Nov 17 01:49 debugtrace.out

You can see the debugtrace.out file is what is produced by the pin tool, and it look rather big (remember I only ran ‘select * from dual’)!

Let’s look into the file:

$ less debugtrace.out
         3 @@@ return underflow
Return 0x00007f706b664218 /lib64/libpthread.so.0:__read_nocancel+0x00000000000f returns: 0x12c
        14 @@@ return underflow
Return 0x000000000cefb62d /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:sntpread+0x00000000003d returns: 0
        31 @@@ return underflow
Return 0x000000000cefb580 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:ntpfprd+0x0000000000c0 returns: 0
       119 @@@ return underflow
Return 0x000000000cedcc9d /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nsbasic_brc+0x00000000032d returns: 0
       122 @@@ return underflow
Return 0x000000000cedc90a /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nsbrecv+0x00000000005a returns: 0
       152 Call 0x000000000ceeb746 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nioqrc+0x000000000276 -> 0x0000000005b170d0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy(0x7ffeef95c305, 0x7f706ae51da0, ...)
       156 Tailcall 0x0000000005b170de /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy+0x00000000000e -> 0x0000000005b170c0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy.P(0x7ffeef95c305, 0x7f706ae51da0, ...)
       158 | Call 0x0000000005b170c1 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy.P+0x000000000001 -> 0x0000000005b1dcb0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:__intel_ssse3_rep_memcpy(0x7ffeef95c305, 0x7f706ae51da0, ...)
       172 | Return 0x0000000005b20097 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:__intel_ssse3_rep_memcpy+0x0000000023e7 returns: 0x7ffeef95c305
       174 Return 0x0000000005b170c7 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:_intel_fast_memcpy.P+0x000000000007 returns: 0x7ffeef95c305
       197 @@@ return underflow
Return 0x000000000ceeb7ad /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:nioqrc+0x0000000002dd returns: 0
       213 Call 0x000000000cb5d8b0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:opikndf2+0x000000000410 -> 0x000000000c940560 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:kslwtectx(0x7ffeef95c040, 0x7f706ae51da1, ...)
       268 | Call 0x000000000c94066e /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:kslwtectx+0x00000000010e -> 0x000000000cd585c0 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:sltrgftime64(0x7ffeef95c040, 0x7f706ae51da1, ...)
       278 | | Call 0x000000000cd5863b /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:sltrgftime64+0x00000000007b -> 0x000000000b13ef90 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:sltrgftime64_cgt(0x7ffeef95c040, 0x7f706ae51da1, ...)
       286 | | | Call 0x000000000b13efa3 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:sltrgftime64_cgt+0x000000000013 -> 0x0000000000ba6860 /u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle:.plt+0x000000001360(0x1, 0x7ffeef95beb0, ...)
       293 | | | | Call 0x00007f706b188dcb /lib64/libc.so.6:__clock_gettime+0x00000000001b -> 0x00007ffeef9ab8a0 clock_gettime(0x1, 0x7ffeef95beb0, ...)
...etc...

I think this is rather exciting! This is a FULL function call trace, indented by call depth!
The first lines in the trace are returns, and these returns produce a ‘return underflow’, which is because these functions have been called before the trace was put on the process.
This trace shows:
Calling: the address and function where the function call is made, the address and function that is called and first two arguments of the function.
Returning: the address and function where the return is executed, and the returncode.
Tailcall: this is a function that is called as the last instruction in the current function. This means it returns as part of the current function, which is shown by not indenting this call further (call, tailcall and return are all indented at the same level).

Please mind currently I am not aware of a way to remove the pin framework and tool from a process address space. Also mind that because of how this works (a lot of intercepting in the execution flow), it will slow down the process significantly. Use this tool and the above described techniques at your own risk.