This is the fourth blogpost in a series of blogposts about how the Oracle database handles redo. The previous blogpost talked about the work cycle of the log writer: https://fritshoogland.wordpress.com/2018/02/12/a-look-into-oracle-redo-part-3-the-log-writer-work-cycle-overview/. This post looks into the execution of the kcrfw_redo_write_driver function executed in ksbcti.

Now that we are familiar with how the logwriter works in general, we need to take a closer look at the kcrfw_redo_write_driver function. First let me be clear: the kcrfw_redo_write_driver function inside ksbcti is called every time the logwriter process times out on its semaphore, which is every 3 seconds, so it is also called when nothing has been written to the public redo strands. In fact, when the log writer times out on semtimedop (which means it has not (yet) been semctl'ed or semop'ed), it doesn't know whether there are entries in the public redo strands at that point in time.

There are a lot of things going on in the kcrfw_redo_write_driver function. The following traces are small snippets of execution from the function, not every detail of it.

The first thing that is of interest is the logwriter obtaining an LWN (log write number) and an SCN.

 | | > kcsnew3(0x600113b8, 0x7ffc3dd10d28, ...)
 | | | > kcsnew8(0x600113b8, 0x7ffc3dd10b70, ...)
 | | | | > kslgetl(0x60049800, 0x1, ...)
 | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | > kslfre(0x60049800, 0x1, ...)
 | | | | < kslfre+0x0000000001e2 returns: 0
 | | | < kcsnew8+0x000000000117 returns: 0

Please mind at this time it’s unknown if there is anything to write in the public redo strands.

So the kcsnew3 function is executed, with 0x600113b8 as first argument. That is the address of kcsgscn_, the instance global SCN. The kcsnew3 function calls the kcsnew8 function, which calls kslgetl for latch address 0x60049800. That address belongs to the ‘lgwr LWN SCN’ latch in my instance. In the function call overview there is nothing more to see. If we look at the pinatrace memory trace, we see a lot more; this is what is executed in between kslgetl and kslfre:

kcsnew8+107:0x0000000060016274(fixed sga|var:kcrfsg_+76 ):W:4:0x1/1()
kcsnew8+118:0x0000000060016270(fixed sga|var:kcrfsg_+72 ):R:4:0xc5/197()
kcsnew8+118:0x0000000060016270(fixed sga|var:kcrfsg_+72 ):W:4:0xc6/198()
kcsnew8+125:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):R:8:0x762b7e/7744382()
kcsnew8+131:0x00007fffa1071ae0():W:4:0/0()
kcsnew8+140:0x00007fffa1071b12():W:1:0/0()
kcsnew8+149:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):R:8:0x762b7e/7744382()
kcsnew8+149:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):W:8:0x762b7f/7744383()
kcsnew8+165:0x00007fffa1071ae0():R:4:0/0()
kcsnew8+184:0x00007f520ba8be08():R:8:0x600113b8/1610683320(fixed sga|var:kcsgscn_+0 )
kcsnew8+193:0x00007f520ba8ae90():R:8:0x7a823620/2055353888(shared pool|permanent memor,duration 1,cls perm+769 )
kcsnew8+212:0x000000007a823e40(shared pool|permanent memor,duration 1,cls perm+7695936 ):R:8:0x719beed8/190604(shared pool|ksu:stats_freel,duration 1,cls freeabl+24 )
kcsnew8+219:0x00000000719bf790(shared pool|ksu:stats_freel,duration 1,cls freeabl+2256 ):R:8:0xc6/198()
kcsnew8+219:0x00000000719bf790(shared pool|ksu:stats_freel,duration 1,cls freeabl+2256 ):W:8:0xc7/199()
kcsnew8+223:0x00007fffa1071bf0():W:8:0x762b7f/7744383()
kcsnew8+232:0x0000000060016260(fixed sga|var:kcrfsg_+56 shared pool|x_kcrfws.lwn_scn+0 ):W:8:0x762b7f/7744383()
kcsnew8+238:0x0000000060016274(fixed sga|var:kcrfsg_+76 ):W:4:0/0()
kcsnew8+246:0x0000000060016288(fixed sga|var:kcrfsg_+96 shared pool|pointer:lgwr lwn scn latch+0 ):R:8:0x60049800/1610913792(fixed sga|(parent)latch:lgwr LWN SCN+0 fixed sga|var:lwn_scn_lat_+0 )
kcsnew8+250:0x00007fffa1071ac8():W:8:0x10e3ed3f/283372863()

What is visible here is that at kcrfsg_ offset 76 a ‘1’ is written. This is quite probably a flag indicating the LWN related entries are being modified (this is a guess; of course the LWN SCN latch essentially performs the same function), especially since it is set back to ‘0’ at the end of the function (kcsnew8+238). At kcrfsg_ offset 72 a number is read and increased by 1. This is most likely the LWN number. If you take a look at the number of gets for the ‘lgwr LWN SCN’ latch you will see the number of gets is very close (albeit not equal) to the LWN number.
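
Since this is a parent latch, the gets are directly visible in V$LATCH; a quick way to check the latch side of that comparison:

SQL> select name, gets from v$latch where name = 'lgwr LWN SCN';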

Then the LWN SCN is fetched from kcsgscn_: it is first read, then increased by one to 7744383 and written back into kcsgscn_ (get and advance). A little further, the SCN from kcsgscn_ is also written into the kcrfsg_ struct at offset 56, after which the earlier set ‘1’ is reset to ‘0’ at offset 76. This SCN is visible in x$kcrfws as LWN SCN.

The next thing to look at is the function kcrfw_gather_lwn a little further:

 | | > kcrfw_gather_lwn(0x7ffc3dd10d68, 0x77eced90, ...)
 | | | > kslgetl(0x76fe0c10, 0x1, ...)
 | | | < kslgetl+0x00000000012f returns: 0x1
 | | | > kcrfw_gather_strand(0x7ffc3dd10d68, 0, ...)
 | | | < kcrfw_gather_strand+0x0000000000c2 returns: 0
 | | | > kslfre(0x76fe0c10, 0x118b2, ...)
 | | | < kslfre+0x0000000001e2 returns: 0
 | | | > kslgetl(0x76fe0cb0, 0x1, ...)
 | | | < kslgetl+0x00000000012f returns: 0x1
 | | | > kcrfw_gather_strand(0x7ffc3dd10d68, 0x1, ...)
 | | | < kcrfw_gather_strand+0x0000000000c2 returns: 0
 | | | > kslfre(0x76fe0cb0, 0x19, ...)
 | | | < kslfre+0x0000000001e2 returns: 0
 | | < kcrfw_gather_lwn+0x00000000065c returns: 0xffffffff

In the kcrfw_gather_lwn function, first latch address 0x76fe0c10 is taken in immediate mode, which is redo allocation child #1, and then the function kcrfw_gather_strand is called, which returns 0, after which the latch is freed. This pattern repeats itself with a different latch address, 0x76fe0cb0, which is redo allocation child #2. This hints that two public strands are active. Looking at the memory access trace again, a fixed SGA variable is read to find out how many strands are active:

7038:kcrfw_gather_lwn+76:0x00000000600169b8(Fixed Size(pgsz:2048k)|+92600 fixed sga|var:kcrf_max_strands_+0 ):R:4:0x2/2()
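
The ‘fixed sga|var:’ annotations come from the fixed SGA variables table, which is externalised in x$ksmfsv; if you want to verify the address and size of kcrf_max_strands_ yourself, a query along these lines (a sketch, run as sysdba) should show it:

SQL> select ksmfsnam, ksmfsadr, ksmfssiz
  from x$ksmfsv
 where ksmfsnam like 'kcrf_max_strands_%';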

A little further, information is obtained from kcrfsg_ about the redo allocation latch and the KCRFA structure address:

7052:kcrfw_gather_lwn+223:0x0000000060016570(fixed sga|var:kcrfsg_+840 ):R:8:0x76fe0c10/1996360720(shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )
7053:kcrfw_gather_lwn+230:0x0000000060016228(fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )

Then the redo allocation latch child #1 is obtained via kslgetl, after which kcrfw_gather_lwn calls kcrfw_gather_strand:

kcrfw_gather_strand+2:0x00007fffa1071b00():W:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+15:0x0000000060016228(fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+22:0x00007fffa1071af8():W:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+29:0x00007fffa1071af0():W:8:0x76fdf2b0/1996354224(shared pool|permanent memor,duration 1,cls perm+16114352 )
kcrfw_gather_strand+36:0x0000000076fdf458(shared pool|KCRFA+208 shared pool|LAST_BUF_WRITTEN+0 shared pool|permanent memor,duration 1,cls perm+16114776 ):R:4:0x65e/1630()
kcrfw_gather_strand+53:0x0000000076fdf480(shared pool|KCRFA+248 shared pool|TOTAL_BUFS_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114816 ):R:4:0x20000/131072()
kcrfw_gather_strand+70:0x00007fffa1071de8():R:8:0x762b7f/7744383()
kcrfw_gather_strand+79:0x0000000076fdf390(shared pool|KCRFA+8 shared pool|permanent memor,duration 1,cls perm+16114576 ):R:8:0x762b7c/7744380()
kcrfw_gather_strand+86:0x0000000076fdf3a8(shared pool|KCRFA+32 shared pool|NEXT_BUF_NUM_KCRFA_CLN+0 shared pool|permanent memor,duration 1,cls perm+16114600 ):R:4:0x65e/1630()
kcrfw_gather_strand+121:0x0000000076fdf45c(shared pool|KCRFA+212 shared pool|LAST_BUF_GATHERED_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114780 ):R:4:0x65e/1630()
kcrfw_gather_strand+169:0x00007fffa1071bf8():W:4:0x65e/1630()
kcrfw_gather_strand+176:0x00007fffa1071bfc():W:4:0/0()
kcrfw_gather_strand+182:0x00007fffa1071af0():R:8:0x76fdf2b0/1996354224(shared pool|permanent memor,duration 1,cls perm+16114352 )
kcrfw_gather_strand+186:0x00007fffa1071af8():R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+193:0x00007fffa1071b00():R:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+194:0x00007fffa1071b08():R:8:0x31f0f00/52367104()

In kcrfw_gather_strand, we see the sequence we have seen before: at offset 0 of kcrfsg_, the KCRFA struct address is looked up, after which the function investigates addresses in the KCRFA struct for the first public redo strand only. I’ve annotated these in the pinatrace snippet above. At offset 36 in the function, the last buffer written to the online redo log file is obtained from offset 208 in KCRFA, which is public strand buffer 1630. A little further, at offsets 86 and 121 in the kcrfw_gather_strand function, the most recently used public strand buffer that has been written to is obtained at KCRFA offset 32 (NEXT_BUF_NUM_KCRFA_CLN), and the highest buffer number that has been investigated (gathered) by the logwriter process at KCRFA offset 212 (LAST_BUF_GATHERED_KCRFA). In all these cases, the public redo strand buffer number is 1630. This means no redo has been written into the first public redo strand.
After reading the KCRFA information for the first public redo strand, the latch for this struct is freed.
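
These per strand buffer positions are externalised in x$kcrfstrand (the column names can be found via x$kqfco, as shown in part 2 of this series); a quick sketch to see whether the public strands contain anything left to write:

SQL> select indx, next_buf_num_kcrfa_cln, last_buf_written_kcrfa, last_buf_gathered_kcrfa
  from x$kcrfstrand
 where last_buf_kcrfa != '00';

If the three numbers are equal for a strand, just like the value 1630 in the trace above, there is most probably nothing outstanding in that strand.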

A little further, the latch is obtained that protects the second public strand, and kcrfw_gather_strand is executed again to read the KCRFA information for this strand:

kcrfw_gather_strand+2:0x00007fffa1071b00():W:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+15:0x0000000060016228(fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+22:0x00007fffa1071af8():W:8:0x76fdf4b0/1996354736(shared pool|KCRFA+296 shared pool|permanent memor,duration 1,cls perm+16114864 )
kcrfw_gather_strand+29:0x00007fffa1071af0():W:8:0x76fdf1f0/1996354032(shared pool|permanent memor,duration 1,cls perm+16114160 )
kcrfw_gather_strand+36:0x0000000076fdf580(shared pool|KCRFA+504 shared pool|permanent memor,duration 1,cls perm+16115072 ):R:4:0x1d/29()
kcrfw_gather_strand+53:0x0000000076fdf5a8(shared pool|KCRFA+544 shared pool|permanent memor,duration 1,cls perm+16115112 ):R:4:0x20000/131072()
kcrfw_gather_strand+70:0x00007fffa1071de8():R:8:0x762b7f/7744383()
kcrfw_gather_strand+79:0x0000000076fdf4b8(shared pool|KCRFA+304 shared pool|permanent memor,duration 1,cls perm+16114872 ):R:8:0x762ae7/7744231()
kcrfw_gather_strand+86:0x0000000076fdf4d0(shared pool|KCRFA+328 shared pool|permanent memor,duration 1,cls perm+16114896 ):R:4:0x1d/29()
kcrfw_gather_strand+121:0x0000000076fdf584(shared pool|KCRFA+508 shared pool|permanent memor,duration 1,cls perm+16115076 ):R:4:0x1d/29()
kcrfw_gather_strand+169:0x00007fffa1071bf8():W:4:0x1d/29()
kcrfw_gather_strand+176:0x00007fffa1071bfc():W:4:0/0()
kcrfw_gather_strand+182:0x00007fffa1071af0():R:8:0x76fdf1f0/1996354032(shared pool|permanent memor,duration 1,cls perm+16114160 )
kcrfw_gather_strand+186:0x00007fffa1071af8():R:8:0x76fdf4b0/1996354736(shared pool|KCRFA+296 shared pool|permanent memor,duration 1,cls perm+16114864 )
kcrfw_gather_strand+193:0x00007fffa1071b00():R:8:0x7fffa1071c00/140735894985728()
kcrfw_gather_strand+194:0x00007fffa1071b08():R:8:0x31f0f00/52367104()

This is exactly the same function, just reading the same information for a different strand, which means at a different offset in KCRFA. If you subtract the size of the information for a single strand, which we calculated to be 296 (see blogpost number 2), from the KCRFA offsets, you get the same numbers as above (504-296=208=LAST_BUF_WRITTEN). Once you understand what is read, it should be fairly easy to see that nothing has been written into the second strand either. So at this point, the logwriter knows there is nothing to write.

What does the logwriter do when there is nothing to write? It executes a ‘null write’:

 | | > kcrfw_do_null_write(0, 0, ...)
 | | | > kcrfw_slave_phase_batchdo(0, 0, ...)
 | | | | > kcrfw_slave_phase_enter(0, 0x40, ...)
 | | | | < kcrfw_slave_phase_enter+0x000000000449 returns: 0
 | | | <> kcrfw_slave_phase_exit(0, 0x40, ...)
 | | | < kcrfw_slave_phase_exit+0x00000000035a returns: 0
 | | | > kcrfw_post(0, 0, ...)
 | | | | > kcrfw_slave_single_getactivegroup(0, 0, ...)
 | | | | < kcrfw_slave_single_getactivegroup+0x000000000047 returns: 0x769a5620
 | | | | > kspGetInstType(0x1, 0x1, ...)
 | | | | | > vsnffe_internal(0x19, 0x1, ...)
 | | | | | | > vsnfprd(0x19, 0x1, ...)
 | | | | | | < vsnfprd+0x00000000000f returns: 0x8
 | | | | | | > kfIsASMOn(0x19, 0x1, ...)
 | | | | | | <> kfOsmInstanceSafe(0x19, 0x1, ...)
 | | | | | | < kfOsmInstanceSafe+0x000000000031 returns: 0
 | | | | | < vsnffe_internal+0x0000000000a7 returns: 0
 | | | | | > kspges(0x115, 0x1, ...)
 | | | | | < kspges+0x00000000010f returns: 0
 | | | | < kspGetInstType+0x0000000000b1 returns: 0x1
 | | | | > kcrfw_slave_phase_enter(0x1, 0x40, ...)
 | | | | < kcrfw_slave_phase_enter+0x00000000006f returns: 0x40
 | | | | > kcscu8(0x60016290, 0x7ffc3dd10a98, ...)
 | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | > kcsaj8(0x60016290, 0x7ffc3dd10a38, ...)
 | | | | < kcsaj8+0x0000000000dc returns: 0x1
 | | | | > kcrfw_slave_phase_exit(0x1, 0x40, ...)
 | | | | < kcrfw_slave_phase_exit+0x00000000008e returns: 0
 | | | | > kslpsemf(0x97, 0, ...)
 | | | | | > ksl_postm_init(0x7ffc3dd08730, 0x7ffc3dd10750, ...)
 | | | | | < ksl_postm_init+0x00000000002b returns: 0
 | | | | < kslpsemf+0x0000000006b5 returns: 0x1f
 | | | | > kcrfw_slave_barrier_nonmasterwait(0x769a5628, 0x4, ...)
 | | | | < kcrfw_slave_barrier_nonmasterwait+0x000000000035 returns: 0x600161a0
 | | | < kcrfw_post+0x000000000c1c returns: 0xd3
 | | < kcrfw_do_null_write+0x0000000000b2 returns: 0xd3

This means there is housekeeping to do, despite no public redo strand buffers needing to be written. The most crucial things it does, as far as I can detect, are updating the on disk SCN in the kcrfsg_ struct with the already set LWN SCN, and posting any outstanding processes that are sleeping on a semaphore via post/wait. These are done in the kcrfw_post function, which probably means (kernel cache redo file write) ‘post write’; however, there was nothing to write (null write), since there was nothing in the public redo strands. If you look at the above function trace, you see that kcrfw_post is called inside kcrfw_do_null_write.

Inside kcrfw_post, kcscu8 (kernel cache service get current SCN) is called with 0x60016290 as first argument, which is the memory address of offset 104 in the kcrfsg_ struct in the fixed SGA, externalised as ON_DISK_SCN in X$KCRFWS. This is done to read the current value of the on disk SCN:

kcscu8+2:0x00007fffa10718a0():W:8:0x7fffa1071bb0/140735894985648()
kcscu8+13:0x00007fffa1071898():W:8:0x77eced90/2012016016(Variable Size(pgsz:2048k)|+384626064 shared pool|permanent memor,duration 1,cls perm+2451496 )
kcscu8+20:0x00007fffa1071890():W:8:0x7f520ba98f40/139990359707456()
kcscu8+27:0x00000000600162a0(Fixed Size(pgsz:2048k)|+90784 fixed sga|var:kcrfsg_+120 ):R:4:0/0()
kcscu8+31:0x00000000600162a4(Fixed Size(pgsz:2048k)|+90788 fixed sga|var:kcrfsg_+124 ):R:4:0/0()
kcscu8+39:0x0000000060016290(Fixed Size(pgsz:2048k)|+90768 fixed sga|var:kcrfsg_+104 shared pool|x_kcrfws.on_disk_scn+0 ):R:8:0x762b7e/7744382()
kcscu8+43:0x00000000600162a0(Fixed Size(pgsz:2048k)|+90784 fixed sga|var:kcrfsg_+120 ):R:4:0/0()
kcscu8+56:0x00007fffa1071b18():W:8:0x762b7e/7744382()
kcscu8+59:0x00007fffa1071898():R:8:0x77eced90/2012016016(Variable Size(pgsz:2048k)|+384626064 shared pool|permanent memor,duration 1,cls perm+2451496 )
kcscu8+63:0x00007fffa1071890():R:8:0x7f520ba98f40/139990359707456()
kcscu8+70:0x00007fffa10718a0():R:8:0x7fffa1071bb0/140735894985648()
kcscu8+71:0x00007fffa10718a8():R:8:0x3202150/52437328()

The next function that is called is kcsaj8 (kernel cache service adjust SCN), with 0x60016290 as first argument, indicating it is going to change the on disk SCN:

kcsaj8+105:0x0000000060016290(Fixed Size(pgsz:2048k)|+90768 fixed sga|var:kcrfsg_+104 shared pool|x_kcrfws.on_disk_scn+0 ):R:8:0x762b7e/7744382()
kcsaj8+105:0x0000000060016290(Fixed Size(pgsz:2048k)|+90768 fixed sga|var:kcrfsg_+104 shared pool|x_kcrfws.on_disk_scn+0 ):W:8:0x762b7f/7744383()
(these are only the relevant lines from the memory trace for the function kcsaj8)

What this means is that the LWN SCN that was obtained at the beginning of the write cycle using the kcsnew3 function is now written into the on disk SCN too, despite nothing actually being written to disk. The obvious question at this point is: if the on disk SCN is increased without anything actually being written, is there an SCN value that tracks the SCN of truly written (“on disk”) redo? To spoil the next blogpost about actual redo writing: yes, this is administered in an SCN value called the ‘real redo SCN’.
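
This can be observed from SQL by querying x$kcrfws on an otherwise idle instance; assuming the column names match the pinatrace annotations above (LWN_SCN and ON_DISK_SCN), a sketch:

SQL> select lwn_scn, on_disk_scn from x$kcrfws;

Repeat the query a few times: even with no redo being generated, both values should keep advancing roughly every three seconds, because of the null writes described above.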

After kcrfw_post returns, the execution returns from kcrfw_do_null_write and kcrfw_redo_write_driver, ending the ‘null write’.

This is the third part of a series of blogposts on how the Oracle database handles redo. The previous part talked about the memory area that stores redo strand information: https://fritshoogland.wordpress.com/2018/02/05/a-look-into-oracle-redo-part-2-the-discovery-of-the-kcrfa-structure/.

The single most important process in the Oracle database for handling redo is the log writer, whose primary task is flushing the redo information that other Oracle database processes put in the public redo strands to disk. Now that we have investigated the public redo strands and concurrency (first part) and kcrfsg_ and the KCRFA structure (second part), it seems logical to me to look at the log writer.

Simply because we have seen a foreground process take the redo allocation latch, ‘fiddle around’ in the kcrfsg_ and KCRFA structures and release the latch, it is a pretty safe guess that the logwriter uses these structures to determine if it needs to write. It looks like the kcrfsg_ structure contains global redo information, like redo log file information, and the KCRFA structure contains strand-specific information.

First we need to get an understanding of what the logwriter process is actually doing. The investigation below is done to see how the idle and write work cycles work; I explicitly excluded additional tasks like redo log file switches.

First of all, quite surprisingly actually, there is little information about what the log writer is really doing. The only well known features of the logwriter are that it writes the log buffer, instrumented by the ‘log file parallel write’ wait event, and that it sleeps on a semaphore in the wait event ‘rdbms ipc message’. In order to learn more, I’ve run a debugtrace on the logwriter process running idle, stripped excess information, and then captured only the calls one level from the stack depth the logwriter process is on in between the work cycles:

$ awk -n '$0 ~ /^\ \x3e\ /' lgwr_full_cycle_stripped.txt
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)               <<<<
 > ksl_exit_main_loop_wait(0x9d64fb00c, 0x7, ...)
 > ksbcti(0x12b84c18, 0x7f2c1eb804b0, ...)          (nr 1)
 > dbktFlush(0, 0, ...)
 > sltrgatime64(0x7f2c23b9e9a0, 0x7f2c23b9ebe8, ...)
 > ksbcti(0x12b84be0, 0x7f2c1eb80208, ...)          (nr 2)
 > ksbcti(0x12b84bfc, 0x7f2c1eb804b0, ...)          (nr 3)
 > sltrgatime64(0, 0, ...)
 > ksl_enter_main_loop_wait(0x1, 0x4ccef9975, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)               <<<<
 > ksl_exit_main_loop_wait(0x9d67d8b01, 0x7, ...)
 > ksbcti(0x12b84c18, 0x7f2c1eb804b0, ...)          (nr 1)
 > dbktFlush(0, 0, ...)
 > sltrgatime64(0x7f2c23b9e9a0, 0x7f2c23b9ebe8, ...)
 > ksbcti(0x12b84be0, 0x7f2c1eb80208, ...)          (nr 2)
 > ksbcti(0x12b84bfc, 0x7f2c1eb804b0, ...)          (nr 3)
 > sltrgatime64(0, 0, ...)
 > ksl_enter_main_loop_wait(0x1, 0x4cd05fb07, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)               <<<<

ksarcv means kernel service asynchronous receive message; this is the function that sets up the semaphore and puts the logwriter process to sleep. This also explains the ksl_exit_main_loop_wait and ksl_enter_main_loop_wait functions: these are called exactly before (enter) and after (exit) the ksarcv function. The dbktFlush function manages trace files; this function is used to write messages to the logwriter tracefile. The sltrgatime64 function calls clock_gettime. That leaves one function that can be seen in my example: ksbcti. ksbcti means kernel service background processes call timeout/interrupts. This is a function that performs several different tasks based on the first argument.
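
The semaphore sleep in ksarcv is what shows up on the SQL level as the logwriter waiting on ‘rdbms ipc message’; a simple way to see it (a sketch):

SQL> select program, event, state, seconds_in_wait
  from v$session
 where program like '%(LGWR)%';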

The first ksbcti call following ksl_exit_main_loop_wait (with first argument 0x12b84c18) seems to be the one that administers the log writer process resource usage details in the SGA, among other things:

 | | > kews_timeout(0x7f2c1eb93ec0, 0, ...)
 | | | > sltrgftime64(0x7f2c1eb93ec0, 0, ...)
 | | | | > clock_gettime@plt(0x1, 0x7ffc3dd10ad0, ...)
 | | | | | > clock_gettime(0x1, 0x7ffc3dd10ad0, ...)
 | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | < clock_gettime+0x00000000003a returns: 0
 | | | < sltrgftime64+0x00000000004c returns: 0x9cb6163fa
 | | | > slcpums(0x7f2c23ba3c58, 0x191ae1d5, ...)
 | | | | > getrusage(0x1, 0x7ffc3dd10a30, ...)
 | | | | | > fthread_self(0x1, 0x7ffc3dd10a30, ...)
 | | | | | < fthread_self+0x000000000024 returns: 0
 | | | | <> getrusage_ext(0x1, 0x7ffc3dd10a30, ...)
 | | | | < getrusage+0x00000000000f returns: 0
 | | | < slcpums+0x00000000008f returns: 0x31d592
 | | | > kews_sqlst_flush(0xc, 0x72fd7d40, ...)
 | | | | > sltrgftime64(0xc, 0x72fd7d40, ...)
 | | | | | > clock_gettime@plt(0x1, 0x7ffc3dd10a60, ...)
 | | | | | | > clock_gettime(0x1, 0x7ffc3dd10a60, ...)
 | | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | < sltrgftime64+0x00000000004c returns: 0x9cb61723f
 | | | | > slcpums(0x7f2c23ba3c58, 0x7ae0ee70, ...)
 | | | | | > getrusage(0x1, 0x7ffc3dd109c0, ...)
 | | | | | | > fthread_self(0x1, 0x7ffc3dd109c0, ...)
 | | | | | | < fthread_self+0x000000000024 returns: 0
 | | | | | <> getrusage_ext(0x1, 0x7ffc3dd109c0, ...)
 | | | | | < getrusage+0x00000000000f returns: 0
 | | | | < slcpums+0x00000000008f returns: 0x31e417
 | | | < kews_sqlst_flush+0x00000000016a returns: 0x7f2c23b9e900
 | | < kews_timeout+0x00000000052c returns: 0x7f2c23b9e900

kews_timeout means kernel event wait statistics, so statistics updated because of the process timing out. sltrgftime64 is an o/s specific call wrapper to read time from the operating system; it calls clock_gettime, which reads the system clock (argument 1 is CLOCK_MONOTONIC, which is a clock that is not affected by ‘discontinuous’ changes to the clock time; think daylight saving time changes for example). Similarly, slcpums is an o/s specific call wrapper to read cpu usage; it calls getrusage to obtain CPU accounting data from the operating system. kews_sqlst_flush seems to be related to automatic tuning features; this *probably* (I haven’t investigated further) flushes status data like time and cpu to a buffer so features like sql monitor and adaptive features can use it.

The second ksbcti function call (with first argument 0x12b84be0) is the one that performs the actual redo write, which is done by the kcrfw_redo_write_driver function:

 > ksbcti(0x12b84be0, 0x7f2c1eb80208, ...)
 | > ksbckbast(0, 0, ...)
 | < ksbckbast+0x000000000061 returns: 0x1
 | > ksumcl(0, 0, ...)
 | < ksumcl+0x000000000045 returns: 0x1540
 | > kcrfw_redo_write_driver(0, 0, ...) 
...

The next blogpost will look into what the kcrfw_redo_write_driver function actually performs. To spoil it a bit: the kcrfw_redo_write_driver function is called every 3 seconds, even if there’s nothing to write!

The third ksbcti function call (with first argument 0x12b84bfc) performs a function or series of functions that are not clear to me, and not a lot is done inside the functions called from ksbcti to give me a hint:

 > ksbcti(0x12b84bfc, 0x7f2c1eb804b0, ...)
 | > ksbmsg(0, 0, ...)
 | < ksbmsg+0x000000000040 returns: 0x60065d90
 | > ksbxiaf(0, 0, ...)
 | < ksbxiaf+0x000000000058 returns: 0x795b5828
 | > kjci_action(0, 0, ...)
 | < kjci_action+0x000000000035 returns: 0x60065d90
 | > kcfsmpoll(0, 0, ...)
 | < kcfsmpoll+0x000000000030 returns: 0x60065d90
 | > ksuwaitsysevent(0, 0, ...)
 | < ksuwaitsysevent+0x0000000001e7 returns: 0x795b4aa8
 | > krdrsb_lgwr_chintr(0, 0, ...)
 | | > krsh_trace(0x2000, 0, ...)
 | | < krsh_trace+0x00000000005d returns: 0
 | < krdrsb_lgwr_chintr+0x00000000008b returns: 0
 | > ksbckbast(0, 0, ...)
 | < ksbckbast+0x000000000061 returns: 0x1
 < ksbcti+0x00000000018f returns: 0x1

Of course there are several things that point to features that are not used in my database; krdrsb points to dataguard and kjci points to RAC.

If you look again at the sequence, you might wonder why the logwriter is not executing the redo write function immediately after waking up from the semaphore in ksarcv, in order to be able to write as soon as a process semctl’s/semop’s the semaphore asking the logwriter to write. Well, actually there is another sequence of function calls that can happen:

 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)
 > ksl_exit_main_loop_wait(0x6003ef40, 0x7af972b8, ...)
 > kcrfw_redo_write_driver(0x7ffc3dd122a0, 0x18, ...)         <<<<<
 > ksbcti(0x12b84c74, 0x7f2c1eb804b0, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)
 > dbktFlush(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)
 > sltrgatime64(0x7f2c23b9e9a0, 0x7f2c23b9ebe8, ...)
 > ksl_enter_main_loop_wait(0x1, 0x7a6032e8, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)

Here you see that kcrfw_redo_write_driver is called immediately after ksarcv and ksl_exit_main_loop_wait, not inside ksbcti. What happened in ksarcv is that the semtimedop call returned 0, indicating the semaphore was semctl’ed or semop’ed, instead of timing out after 3 seconds. If you paid close attention you saw that the first argument of the ksbcti call following kcrfw_redo_write_driver is different. I am not sure why, because the same functions are called as in the call with 0x12b84bfc as first argument (the ones with functions hinting at dataguard and RAC). The ksarcv call that follows does not put the process to sleep on a semaphore; it reads the asynchronous messages structure in the fixed sga (ksasga_), after which it returns immediately. What follows are dbktFlush and sltrgatime64 for administrative purposes (writing to the tracefile if necessary, and updating resource usage information), after which the process prepares the semaphore sleep using ksl_enter_main_loop_wait and then goes to sleep on a semaphore in ksarcv.

Please note this is the general cycle; there are other tasks that the log writer performs. One example of such a task is the log writer checking whether the process monitor (PMON) process is still alive, which is done every minute.

This is the second post in a series of blogposts on Oracle database redo internals. If you landed on this blogpost without having read the first blogpost, here is a link to the first blogpost: https://fritshoogland.wordpress.com/2018/01/29/a-look-into-oracle-redo-part-1-redo-allocation-latches/ The first blogpost contains all the versions used and a synopsis on what the purpose of this series of blogposts is.

In the first part, I showed how the principal access to the public redo strands is controlled by redo allocation latches, and showed a snippet of trace information of memory accesses of a foreground session when using the first public redo strand:

kcrfw_redo_gen_ext+11277:0x0000000076fdf3bc(shared pool|permanent memor,duration 1,cls perm+16114620 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf3d8(shared pool|permanent memor,duration 1,cls perm+16114648 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf430(shared pool|permanent memor,duration 1,cls perm+16114736 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3d0(shared pool|permanent memor,duration 1,cls perm+16114640 ):R:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11312:0x0000000076fdf428(shared pool|permanent memor,duration 1,cls perm+16114728 ):W:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3c8(shared pool|permanent memor,duration 1,cls perm+16114632 ):R:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11322:0x0000000076fdf420(shared pool|permanent memor,duration 1,cls perm+16114720 ):W:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11327:0x0000000076fdf3c0(shared pool|permanent memor,duration 1,cls perm+16114624 ):R:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11332:0x0000000076fdf418(shared pool|permanent memor,duration 1,cls perm+16114712 ):W:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11337:0x0000000076fdf3b8(shared pool|permanent memor,duration 1,cls perm+16114616 ):R:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11342:0x0000000076fdf410(shared pool|permanent memor,duration 1,cls perm+16114704 ):W:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11347:0x0000000076fdf3b0(shared pool|permanent memor,duration 1,cls perm+16114608 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf408(shared pool|permanent memor,duration 1,cls perm+16114696 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:8:0x7270/29296()
kcrfw_redo_gen_ext+11312:0x0000000076fdf400(shared pool|permanent memor,duration 1,cls perm+16114688 ):W:8:0x7270/29296()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3a0(shared pool|permanent memor,duration 1,cls perm+16114592 ):R:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf398(shared pool|permanent memor,duration 1,cls perm+16114584 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf3f0(shared pool|permanent memor,duration 1,cls perm+16114672 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf390(shared pool|permanent memor,duration 1,cls perm+16114576 ):R:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11342:0x0000000076fdf3e8(shared pool|permanent memor,duration 1,cls perm+16114664 ):W:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11347:0x0000000076fdf388(shared pool|permanent memor,duration 1,cls perm+16114568 ):R:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11352:0x0000000076fdf3e0(shared pool|permanent memor,duration 1,cls perm+16114656 ):W:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11366:0x0000000076fdf438(shared pool|permanent memor,duration 1,cls perm+16114744 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:4:0x7270/29296()
kcrfw_redo_gen_ext+11386:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):W:4:0x7271/29297()
kcrfw_redo_gen_ext+11392:0x0000000076fdf468(shared pool|permanent memor,duration 1,cls perm+16114792 ):R:8:0xe0000000/3758096384(Redo Buffers(pgsz:2048k)|+134217728 )

And a snippet of trace information of memory accesses of a foreground session when using the second public redo strand:

kcrfw_redo_gen_ext+11277:0x0000000076fdf4e4(shared pool|permanent memor,duration 1,cls perm+16114916 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf500(shared pool|permanent memor,duration 1,cls perm+16114944 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf558(shared pool|permanent memor,duration 1,cls perm+16115032 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4f8(shared pool|permanent memor,duration 1,cls perm+16114936 ):R:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11312:0x0000000076fdf550(shared pool|permanent memor,duration 1,cls perm+16115024 ):W:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4f0(shared pool|permanent memor,duration 1,cls perm+16114928 ):R:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11322:0x0000000076fdf548(shared pool|permanent memor,duration 1,cls perm+16115016 ):W:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11327:0x0000000076fdf4e8(shared pool|permanent memor,duration 1,cls perm+16114920 ):R:8:0x800/2048()
kcrfw_redo_gen_ext+11332:0x0000000076fdf540(shared pool|permanent memor,duration 1,cls perm+16115008 ):W:8:0x800/2048()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4e0(shared pool|permanent memor,duration 1,cls perm+16114912 ):R:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11342:0x0000000076fdf538(shared pool|permanent memor,duration 1,cls perm+16115000 ):W:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4d8(shared pool|permanent memor,duration 1,cls perm+16114904 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf530(shared pool|permanent memor,duration 1,cls perm+16114992 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:8:0xe6/230()
kcrfw_redo_gen_ext+11312:0x0000000076fdf528(shared pool|permanent memor,duration 1,cls perm+16114984 ):W:8:0xe6/230()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4c8(shared pool|permanent memor,duration 1,cls perm+16114888 ):R:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf520(shared pool|permanent memor,duration 1,cls perm+16114976 ):W:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf4c0(shared pool|permanent memor,duration 1,cls perm+16114880 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf518(shared pool|permanent memor,duration 1,cls perm+16114968 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4b8(shared pool|permanent memor,duration 1,cls perm+16114872 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11342:0x0000000076fdf510(shared pool|permanent memor,duration 1,cls perm+16114960 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4b0(shared pool|permanent memor,duration 1,cls perm+16114864 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11352:0x0000000076fdf508(shared pool|permanent memor,duration 1,cls perm+16114952 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11366:0x0000000076fdf560(shared pool|permanent memor,duration 1,cls perm+16115040 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:4:0xe6/230()
kcrfw_redo_gen_ext+11386:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):W:4:0xe7/231()
kcrfw_redo_gen_ext+11392:0x0000000076fdf590(shared pool|permanent memor,duration 1,cls perm+16115088 ):R:8:0xe0200000/3760193536(Redo Buffers(pgsz:2048k)|+136314880 )

If you pay really close attention, you see the memory locations moved from 0x76fdf3?? / 0x76fdf4?? to 0x76fdf4?? / 0x76fdf5??, all in permanent memory in the shared pool. Would this mean the information for each strand is kept in a struct per strand? Let’s take a memory access at a specific offset in the kcrfw_redo_gen_ext function for the first strand, and the access at exactly the same offset in the function for the second strand (which means it’s probably doing the same thing), and calculate the difference between the memory addresses:

strand 0
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8
strand 1
kcrfw_redo_gen_ext+11322:0x0000000076fdf520(shared pool|permanent memor,duration 1,cls perm+16114976 ):W:8

Now take the memory addresses, and calculate the difference: 0x76fdf520 – 0x76fdf3f8 = 0x128 = 296
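
If you prefer to let the database do the hexadecimal arithmetic, the same calculation can be done in SQL:

SQL> select to_number('76fdf520','xxxxxxxx') - to_number('76fdf3f8','xxxxxxxx') as difference from dual;

DIFFERENCE
----------
       296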

Okay, so would the size of the probable struct per strand be 296 bytes? Another question: where would this potential struct start?

Often, the location of a struct is fetched from another location, just like we saw earlier for the redo allocation latch, especially if it is created more dynamically than the fixed SGA structures, which are not dynamic by nature. Upon close examination of the similarities between both pinatrace snippets, I found usage of a pointer in the kcrfsg_ struct at offset 0:

kcrfw_redo_gen_ext+14362:0x0000000060016228(fixed sga|var:kcrfsg_+0 ):R:8:0x76fdf388/1996354440(shared pool|permanent memor,duration 1,cls perm+16114568 )

The difference between the memory address I took and the pointer address in kcrfsg_ at offset 0 for the first public redo strand is:
0x76fdf3f8 – 0x76fdf388 = 0x70 = 112
The difference between the memory address I took for the second public redo strand and the pointer in kcrfsg_ at offset 0 is:
0x76fdf520 – 0x76fdf388 = 0x198 = 408
Then subtract the calculated difference between the memory locations of the first and second public redo strand:
408 – 296 = 112
Bingo!

It would be really helpful to have some kind of explanation about this structure… I searched through My Oracle Support (very seldom does the regular knowledge base provide internals information; the best chance for internals is the bug database, the rest is… well, let me say: not that helpful. I also hate that other tabs are enabled if nothing is found), oraus.msg, oradebug dumplist and oradebug doc component for any tracing or dumps that could give more information, but nothing seemed to point to any explanation. I dumped the shared pool, but I found the same information (the memory location being permanent memory) as I obtained from x$ksmsp. Also, since the pinatrace annotation takes care of any x$ view that has its source address in the SGA, it seems there is no way other than to puzzle out all the individual locations in what I assume is a structure that holds per-strand metadata.

I decided to go through the x$ views one more time, and found an x$ view that I already knew of, which points to both ‘strands’ and ‘kcrf’: x$kcrfstrand. However, the reason it is not annotated is that this x$ view has its addresses in the PGA:

SQL> select addr from x$kcrfstrand;

ADDR
----------------
00007F14CE31AE00
00007F14CE31AE00

This means that when querying this x$ view, the results are gathered from wherever they come from (the source can be file headers, like the redo log (x$kccle) and controlfile (x$kcccf) for example; in this case, as we will see, the actual source of x$kcrfstrand is the SGA), temporarily stored in the PGA, potentially manipulated, and then shown from the PGA, as we can see from the addresses (addr field in x$kcrfstrand).

However, what I can do to investigate where x$kcrfstrand takes its information from is run the pinatrace (memory access) trace on the execution of ‘select * from x$kcrfstrand’, and try to see where it takes its data from. This is what the relevant part looks like:

qerfxFetch+1563:0x00007ffee590a638():W:8:0x10ea8781/283805569()
kcrf_strand_ftcb+2:0x00007ffee590a630():W:8:0x7ffee590b130/140732749885744()
kcrf_strand_ftcb+10:0x00007ffee590a620():W:8:0x7f7780569f40/140151230996288()
kcrf_strand_ftcb+23:0x00000000600169b8(Fixed Size(pgsz:2048k)|+92600 fixed sga|var:kcrf_max_strands_+0 ):R:4:0x2/2()
kcrf_strand_ftcb+29:0x00000000600169c8(Fixed Size(pgsz:2048k)|+92616 fixed sga|var:kcrf_actv_pvt_strands_+0 ):R:4:0/0()
kcrf_strand_ftcb+35:0x00007f7780448558():R:4:0/0()
kcrf_strand_ftcb+52:0x0000000060016228(Fixed Size(pgsz:2048k)|+90664 fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrf_strand_ftcb+64:0x00007ffee590a618():W:8:0xcf7cb25/217565989()
_intel_fast_memcpy+7:0x00000000157f62a0():R:8:0x137fff/1277951()
__intel_ssse3_rep_memcpy+160:0x00000000157b9488():R:4:0x8000/32768()
__intel_ssse3_rep_memcpy+184:0x0000000076fdf4a0(shared pool|permanent memor,duration 1,cls perm+16114848 ):R:16:0x00000000200040000/8590196736(??)
__intel_ssse3_rep_memcpy+3438:0x00007f7780448670():W:16:0x0000ec00000000000/4151755906482176()
...
__intel_ssse3_rep_memcpy+9254:0x0000000076fdf388(shared pool|permanent memor,duration 1,cls perm+16114568 ):R:16:0xfe56e00000fe56e00000/-2305841916834480128()
__intel_ssse3_rep_memcpy+9258:0x00007f7780448560():W:16:0xfe56e00000fe56e00000/-2305841916834480128()

Going from top to bottom through this snippet:
– qerfxFetch is the query execute rowsource function to query x$ tables; kcrf_strand_ftcb is a helper function (kernel cache recovery file strand fetch copy buffer is my guess) to fetch the kcrf strand data from the shared pool structure; the other functions are memory copy functions.
– Interestingly, the kcrf_strand_ftcb function reads offset 0 of kcrfsg_, just like we have seen before. This is quite logical if you think about it: this struct is dynamically allocated in the shared pool, which means it doesn’t have a fixed address, so it needs to be looked up every time something must be read from it.
– The next functions are memcpy functions. I’ve shown the relevant ones. The first memcpy (_intel_fast_memcpy) probably is a helper function that switches to an intel optimised memcpy, __intel_ssse3_rep_memcpy, which takes advantage of intel’s SIMD optimisations (SSSE3 means Supplemental Streaming SIMD Extensions 3).
– The memcpy functions at lines 11 and 12 show it reading 16 bytes from 0x76fdf4a0, and then writing 16 bytes to 0x7f7780448670.
– 0x76fdf4a0 lies in the struct for the second strand: kcrfsg_+0 = 0x76fdf388; 0x76fdf388 + 296 = 0x76fdf4b0; 0x76fdf4b0 - 16 (the copy size) = 0x76fdf4a0.
– 0x7f7780448670 is quite probably an address in the process’ PGA, which is used for the intermediate representation of the struct’s data for displaying it for the x$ view.

Even more interesting, this confirms that x$kcrfstrand is the externalisation of the struct that is used together with kcrfsg_, and that the calculations made previously are correct: kcrfsg_+0 = 0x76fdf388, and that address with 296 added is 0x76fdf4b0, which is the same as 0x76fdf4a0 + 16 (if you look at the above example, the memcpy function copies 16 bytes at a time).

I think it’s pretty safe to assume this kcrf related struct in the shared pool is commonly referred to as ‘KCRFA’, as seen in x$kcrfstrand. Of course we can only guess what the ‘A’ in addition to kernel cache recovery file means; I would say ‘auxiliary’, but it could also very well be the A in ‘strAnd’. It is a structure that clearly was needed when Oracle changed the concept of a single log buffer into multiple public redo strands, because identical information for multiple strands needs to be stored at the same time, and the information in kcrfsg_ is singular in nature.

Please mind that because of the intermediate representation of the x$ table in the PGA, I could not use the trick I showed in my ‘drilldown into the logwriter inner working and communication’, which is to query a single column in the x$ view and see which memory addresses in the struct are touched; the entire struct is copied into the PGA and processed before it is used to show the contents, even if only a single column of a single row is queried.

Also, consider that since the struct data is copied into the process’ PGA in order to make the information human readable, not all offsets (in x$kqfco) of the PGA representation might be applicable to the SGA struct. In fact, I think I have proven that the KCRFA per-strand metadata width is 296 bytes. Now look at the column offsets from x$kqfco for x$kcrfstrand:

SQL> select c.kqfconam, c.kqfcooff, c.kqfcosiz
from x$kqfco c, x$kqfta t
where t.indx = c.kqfcotab
and t.kqftanam = 'X$KCRFSTRAND'
order by c.kqfcooff;

KQFCONAM                         KQFCOOFF   KQFCOSIZ
------------------------------ ---------- ----------
ADDR                                    0          8
INDX                                    0          4
INST_ID                                 0          4
CON_ID                                  0          2
STRAND_HDR_SCN_KCRFA_CLN                0          8
HIGHEST_SCN_KCRFA_CLN                   0          8
PNEXT_BUF_KCRFA_CLN                    32          8
NEXT_BUF_NUM_KCRFA_CLN                 40          4
BYTES_IN_BUF_KCRFA_CLN                 48          8
AVAILABLE_BUFS_KCRFA_CLN               56          4
MEM_BUFS_AVAILABLE_KCRFA_CLN           60          4
STRAND_HEADER_BNO_KCRFA_CLN            72          4
FLAG_KCRFA_CLN                         76          1
PVT_STRAND_STATE_KCRFA_CLN             77          1
HIGHEST_SUBSCN_KCRFA_CLN               78          2
LWN_BUF_NUM_KCRFA_CLN                  80          4
STRAND_NUM_ORDINAL_KCRFA_CLN           84          4
STRAND_INFO_PTR_KCRFA_CLN              88          8
ACTION_KCRFA                          184          4
LOG_RESIDUE_KCRFA                     188          1
LAST_BUF_WRITTEN_KCRFA                216          4
LAST_BUF_GATHERED_KCRFA               220          4
CONS_LOW_BUFNO_KCRFA                  224          4
FIRST_BUF_KCRFA                       240          8
LAST_BUF_KCRFA                        248          8
TOTAL_BUFS_KCRFA                      256          4
STRAND_SIZE_KCRFA                     260          4
LOWBUF_KCRFA                          272          4
HIBUF_KCRFA                           276          4
WRAPHIBUF_KCRFA                       280          4
LOG_SEQ_KCRFA                         284          4
PTR_KCRF_PVT_STRAND                   304          8
INDEX_KCRF_PVT_STRAND                 312          4
SPACE_KCRF_PVT_STRAND                 316          4
TXN_KCRF_PVT_STRAND                   320          2

This shows that the representation of the KCRFA struct in the PGA apparently has a width of 322 bytes, while we know the KCRFA struct itself is 296 bytes. I am not sure why this is; the most obvious reason would be that some data is already processed in the PGA to make it easier to show. This might also very well be the reason there is a need for copying it to the PGA first, instead of displaying it directly from its shared pool location.
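
The 322 can be derived from the same dictionary data: the highest column offset plus that column’s size gives the width of the PGA representation (320 + 2 for TXN_KCRF_PVT_STRAND):

SQL> select max(c.kqfcooff + c.kqfcosiz) as pga_row_width
  from x$kqfco c, x$kqfta t
 where t.indx = c.kqfcotab
   and t.kqftanam = 'X$KCRFSTRAND';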

This will be a series of posts about Oracle database redo handling. The database in use is Oracle version 12.2.0.1, with PSU 170814 applied. The operating system version is Oracle Linux Server release 7.4. In order to look into the internals of the Oracle database, I use multiple tools; very simple ones like the X$ views and oradebug, but also advanced ones, quite specifically the Intel Pin tools (https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool). One of these tools is ‘debugtrace’, which produces pretty usable output by itself (an indented list of function calls and returns), for which I essentially filter out some data; another one is ‘pinatrace’, which does not produce directly usable output, because it provides instruction pointers and memory addresses. However, the information it provides is extremely useful (and detailed!) once you translate the addresses to Oracle function names and memory region names. I have written a tool to do that: https://gitlab.com/FritsHoogland/pinatrace_annotate.git

These blog posts are meant for investigating Oracle database internals. This is a microscopic view into how the database works. I do believe that by understanding the mechanics, you can very easily and quickly assess and resolve issues, because of a fundamental understanding of how the database works.

First of all, to immediately show the value of looking into the internals, a clear documentation bug: this entry from the online documentation of 12.2: https://docs.oracle.com/en/database/oracle/oracle-database/12.2/cncpt/memory-architecture.html#GUID-C2AD1BF6-A5AE-42E9-9677-0AA08126864B (not sure how long this link will be valid, Oracle is notorious for changing it) states that the log buffer is a single buffer called the log buffer. In fact, the log buffer is not a single buffer, but a buffer that contains at least two public redo strands, which has been the case since at least Oracle 10g. These strands are externalised in x$kcrfstrand:

SQL> select indx,strand_size_kcrfa, first_buf_kcrfa, last_buf_kcrfa from x$kcrfstrand where last_buf_kcrfa != '00';

      INDX STRAND_SIZE_KCRFA FIRST_BUF_KCRFA  LAST_BUF_KCRFA
---------- ----------------- ---------------- ----------------
         0          67108864 00000000D8000000 00000000DBFFFE00
         1          67108864 00000000DC000000 00000000DFFFFE00

The number of public redo buffers depends on the undocumented parameter “_log_parallelism_max”. A quick question out on twitter revealed that systems with a high processor count can have a higher public redo strand count.
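
The current value of this undocumented parameter can be checked as sysdba with the usual x$ksppi/x$ksppcv join:

SQL> select i.ksppinm, v.ksppstvl
  from x$ksppi i, x$ksppcv v
 where i.indx = v.indx
   and i.ksppinm = '_log_parallelism_max';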

The public strands are allocated in the ‘Redo Buffers’ memory allocation:

SQL> select "AREA NAME", "START ADDR", to_char((to_number(rawtohex("START ADDR"),'xxxxxxxxxxxxxxxx')+"SEGMENT SIZE"),'XXXXXXXXXXXXXXXX'),pagesize from x$ksmssinfo; 

AREA NAME                        START ADDR       TO_CHAR((TO_NUMBE   PAGESIZE
-------------------------------- ---------------- ----------------- ----------
Variable Size                    0000000061000000          D8000000    2097152
Redo Buffers                     00000000D8000000          E07C6000    2097152
Fixed Size                       0000000060000000          60A00000    2097152
skgm overhead                    00000000E1000000          E1003000       4096

Each public strand is protected by a redo allocation latch:

SQL> select addr, child#, name, gets, misses, immediate_gets, immediate_misses from v$latch_children where name = 'redo allocation';

ADDR                 CHILD# NAME                       GETS     MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
---------------- ---------- -------------------- ---------- ---------- -------------- ----------------
0000000076FE17F0         20 redo allocation               5          0              0                0
0000000076FE1750         19 redo allocation               5          0              0                0
0000000076FE16B0         18 redo allocation               5          0              0                0
0000000076FE1610         17 redo allocation               5          0              0                0
0000000076FE1570         16 redo allocation               5          0              0                0
0000000076FE14D0         15 redo allocation               5          0              0                0
0000000076FE1430         14 redo allocation               5          0              0                0
0000000076FE1390         13 redo allocation               5          0              0                0
0000000076FE12F0         12 redo allocation               5          0              0                0
0000000076FE1250         11 redo allocation               5          0              0                0
0000000076FE11B0         10 redo allocation               5          0              0                0

ADDR                 CHILD# NAME                       GETS     MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
---------------- ---------- -------------------- ---------- ---------- -------------- ----------------
0000000076FE1110          9 redo allocation               5          0              0                0
0000000076FE1070          8 redo allocation               5          0              0                0
0000000076FE0FD0          7 redo allocation               5          0              0                0
0000000076FE0F30          6 redo allocation               5          0              0                0
0000000076FE0E90          5 redo allocation               5          0              0                0
0000000076FE0DF0          4 redo allocation               5          0              0                0
0000000076FE0D50          3 redo allocation               5          0              0                0
0000000076FE0CB0          2 redo allocation            6168          1           1708                2
0000000076FE0C10          1 redo allocation            7267         20          34804               15

Redo allocation latch child# 1 is the latch protecting allocations in public redo buffer 0; child# 2 is the latch protecting allocations in public redo buffer 1. Two questions that you would probably ask yourself:
1: why are there 20 redo allocation latches whilst there are only two public redo buffers?
2: how can you tell child# 1 is bound to public redo buffer 0, and child# 2 is bound to public redo buffer 1?

Answers to these questions:
1: the 18 other redo allocation latches exist because of a mechanism called ‘private redo strands’ (a quick query comparing the latch and strand counts is shown below this list); see https://blog.tanelpoder.com/files/Oracle_Performance_Scalability_10g_11g.pdf, https://fritshoogland.wordpress.com/2016/11/15/redo-a-blogpost/
2: in order to be sure about what that redo allocation latch protects, you need to look at the actual function calls.
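
To see the counts side by side, the two queries shown in this post can be combined; a sketch:

SQL> select (select count(*) from v$latch_children where name = 'redo allocation') redo_allocation_latches,
       (select count(*) from x$kcrfstrand where last_buf_kcrfa != '00') active_public_strands
  from dual;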

For this, I am using a toolkit from Intel called ‘Pin’, which is a suite of tools, one of which is called ‘debugtrace’: https://software.intel.com/en-us/articles/pin-a-dynamic-binary-instrumentation-tool

First I am going to look into how a latch is chosen.

Attach a debugtrace to a normal foreground session that executes an insert and a commit, and filter the output to make it more readable using a custom filter: https://gitlab.com/snippets/1686145

Now search for the latch address (in lowercase) 0x76fe0c10 (the address of the redo allocation latch, see above query from v$latch_children):

 | | | | | | | | | | | | | | | > kcrfw_redo_gen_ext(0x3, 0x7ffd52f513e0, ...)
 | | | | | | | | | | | | | | | | > kcobrh(0x7ffd52f513e0, 0x3, ...)
 | | | | | | | | | | | | | | | | < kcobrh+0x00000000014b returns: 0x1b4
 | | | | | | | | | | | | | | | | > kcb_get_redo_version(0x3, 0x7ffd52f513a0, ...)
 | | | | | | | | | | | | | | | | < kcb_get_redo_version+0x0000000000c4 returns: 0x1
 | | | | | | | | | | | | | | | | > kcmglscn8(0x7ffd52f508d0, 0, ...)
 | | | | | | | | | | | | | | | | | > kcscu8(0x600113b8, 0x7ffd52f508d0, ...)
 | | | | | | | | | | | | | | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | | | | | | | | | | | | | < kcmglscn8+0x000000000054 returns: 0x756239
 | | | | | | | | | | | | | | | | > kscn_to_ub8_impl(0x7ffd52f50838, 0x7ffd52f508d0, ...)
 | | | | | | | | | | | | | | | | < kscn_to_ub8_impl+0x00000000003e returns: 0
 | | | | | | | | | | | | | | | | > kslgetl(0x76fdebe0, 0, ...)
 | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | | | | | | | | | | | | | > kslgetl(0x76fe0c10, 0, ...)                             <<<<<<<<
 | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | | | | | | | | | | | | | > ksvgcls(0x76fe0c10, 0, ...)
 | | | | | | | | | | | | | | | | < ksvgcls+0x000000000021 returns: 0
 | | | | | | | | | | | | | | | | > kcscu8(0x60016260, 0x7ffd52f508d8, ...)
 | | | | | | | | | | | | | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | | | | | | | | | | | | | > ub8_to_kscn_impl(0x756239, 0xd8eab838, ...)
 | | | | | | | | | | | | | | | | < ub8_to_kscn_impl+0x000000000031 returns: 0
 | | | | | | | | | | | | | | | | > ub8_to_kscn_impl(0x756239, 0xd8eab848, ...)
 | | | | | | | | | | | | | | | | < ub8_to_kscn_impl+0x000000000031 returns: 0
 | | | | | | | | | | | | | | | | > ub8_to_kscn_impl(0x756239, 0x7ffd52f50848, ...)
 | | | | | | | | | | | | | | | | < ub8_to_kscn_impl+0x000000000031 returns: 0
 | | | | | | | | | | | | | | | | > kscnbur3_int_impl(0x756239, 0xd8eab824, ...)
 | | | | | | | | | | | | | | | | < kscnbur3_int_impl+0x000000000030 returns: 0x756239
 | | | | | | | | | | | | | | | | > ub8_to_kscn_impl(0x756231, 0xd8eab840, ...)
 | | | | | | | | | | | | | | | | < ub8_to_kscn_impl+0x000000000031 returns: 0
 | | | | | | | | | | | | | | | | > kslfre(0x76fe0c10, 0xd8eab840, ...)
 | | | | | | | | | | | | | | | | < kslfre+0x0000000001e2 returns: 0

If we look at the work of Andrey Nikolaev (https://andreynikolaev.files.wordpress.com/2016/03/latches_and_mutexes_in_oracle_12c.pdf), we know that the first argument of kslgetl is the latch address, and the second argument is 0 for no wait or 1 for willing to wait mode. What becomes clear is that a foreground process gets the redo allocation latch in nowait mode. Of course this is widely known and documented. The return code of 0x1 means the latch was successfully taken. The latch that is fetched before the redo allocation latch of course is the redo copy latch:

SQL> @flba 76fdebe0
ADDR             NAME
---------------- ------------------------
0000000076FDEBE0 redo copy
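
The flba script itself is not listed in this post; a minimal sketch of such a latch-address lookup, written here for illustration and not necessarily identical to the actual script, could be:

-- flba.sql (illustrative sketch): look up a parent or child latch by (part of) its address
select addr, name from v$latch_parent   where lower(rawtohex(addr)) like '%'||lower('&1')
union all
select addr, name from v$latch_children where lower(rawtohex(addr)) like '%'||lower('&1');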

If we look again at the figures of the two redo allocation latches protecting the public redo strands:

ADDR                 CHILD# NAME                       GETS     MISSES IMMEDIATE_GETS IMMEDIATE_MISSES
---------------- ---------- -------------------- ---------- ---------- -------------- ----------------
0000000076FE0CB0          2 redo allocation            6168          1           1708                2
0000000076FE0C10          1 redo allocation            7267         20          34804               15
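
These figures come straight from v$latch_children; a query along the following lines reproduces them, although the exact query used earlier in this post may differ slightly:

SQL> select addr, child#, name, gets, misses, immediate_gets, immediate_misses
     from v$latch_children where name = 'redo allocation';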

It becomes apparent that child #1 is used far more than child #2, which is visible in the column IMMEDIATE_GETS. How does that work? Let’s manually take the first redo allocation child latch in one session, and then issue an insert again in another session with debugtrace attached to it.

session 1:

SQL> oradebug setmypid
Statement processed.
SQL> oradebug call kslgetl 0x76FE0C10 0 0 2442
Function returned 1

session 2; debugtrace output:

 | | | | | | | | | | | | | | | | > kslgetl(0x76fdebe0, 0, ...)
 | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | | | | | | | | | | | | | > kslgetl(0x76fe0c10, 0, ...)
 | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0
 | | | | | | | | | | | | | | | | > kslgetl(0x76fe0cb0, 0, ...)
 | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0x1

First child #1 is tried, but because that latch is already taken, indicated by return code 0, the session turns to the second child latch.

What will happen when both latches are taken?

session 1:

SQL> oradebug call kslgetl 0x76FE0C10 0 0 2442
Function returned 1
SQL> oradebug call kslgetl 0x76FE0CB0 0 0 2442
Function returned 1

This is what it looks like in the debugtrace output:

 | | | | | | | | | | | | | | | | > kslgetl(0x76fe0cb0, 0, ...)
 | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0
 | | | | | | | | | | | | | | | | > kslgetl(0x76fe0c10, 0, ...)
 | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0
 | | | | | | | | | | | | | | | | > kslgetl(0x76fe0cb0, 0x1, ...)
 | | | | | | | | | | | | | | | | | > kslges(0x76fe0cb0, 0x7a63c410, ...)
 | | | | | | | | | | | | | | | | | | > ksuSdiInProgress(0x2, 0x7a63c410, ...)
 | | | | | | | | | | | | | | | | | | < ksuSdiInProgress+0x000000000035 returns: 0
 | | | | | | | | | | | | | | | | | | > kskthbwt(0, 0, ...)
 | | | | | | | | | | | | | | | | | | < kskthbwt+0x0000000000a1 returns: 0x10
 | | | | | | | | | | | | | | | | | | > kslwlmod(0x7ffc45d6ab00, 0x7a63c410, ...)
 | | | | | | | | | | | | | | | | | | | > kstmgetsectick(0x7ffc45d6ab00, 0x7a63c410, ...)
 | | | | | | | | | | | | | | | | | | | < kstmgetsectick+0x00000000003a returns: 0x5a4bd4b9
 | | | | | | | | | | | | | | | | | | < kslwlmod+0x0000000000cb returns: 0xffffffff
 | | | | | | | | | | | | | | | | | | > kslwtbctx(0x7ffc45d6a988, 0x7a63c588, ...)

What this shows is the foreground session first trying to get one of the redo allocation child latches in no-wait mode (in this trace child #2, 0x76fe0cb0), which failed, then the other child latch, which failed too, after which it had no other option than to get a redo allocation latch in willing-to-wait mode. For that it performed a latch get on 0x76fe0cb0 again, this time with the second argument set to 1, indicating willing-to-wait mode. Because the latch was still taken, kslgetl called kslges in order to spin for it (a series of retries of getting the latch; on Linux the spinning happens inside kslges, so it is not visible in a debugtrace). After it spun 20000 times, the process registers a wait (kslwtbctx), and then goes to sleep on a semaphore inside the wait:

 | | | | | | | | | | | | | | | | | | < kslwtbctx+0x00000000050b returns: 0x77c000a4
 | | | | | | | | | | | | | | | | | | > skgpwwait(0x7ffc45d6abf8, 0x7f910bf92ca0, ...)
 | | | | | | | | | | | | | | | | | | | > kslwait_conv_wait_time(0xffffffff, 0x7f910bf92ca0, ...)
 | | | | | | | | | | | | | | | | | | | < kslwait_conv_wait_time+0x000000000027 returns: 0xffffffff
 | | | | | | | | | | | | | | | | | | | > sskgpwwait(0x7ffc45d6abf8, 0x7f910bf92ca0, ...)
 | | | | | | | | | | | | | | | | | | | | > semop@plt(0x98000, 0x7ffc45d6a888, ...)

This is the modern latch implementation, where a session goes to sleep on a semaphore after spinning for a latch in willing-to-wait mode. In order to learn more about that, look at the work of Andrey Nikolaev. The semaphore implementation means that as soon as the latch becomes available, the session is posted and can do its allocation in the public redo strand.
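
If you want to observe this semaphore sleep without pin, the system calls of the waiting process can also be watched with strace on Linux (the process id is an assumption):

$ strace -p 12345 -e trace=semop,semtimedop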

However, this does not tie a child latch to a specific public strand; it only shows how a session bunny-hops along the public redo allocation latches in no-wait mode until it hits a blind wall (when all of them are taken), after which it starts waiting in willing-to-wait mode, waiting to be posted.

In order to see the connection between the redo allocation latches and the actual public strand memory areas, we need to look one level deeper, which means looking at the (main) memory accesses of a process. This can be done using the pin tools pinatrace tool, and adding annotations to the pinatrace output in order to understand what is going on. For this, I created a tool: https://gitlab.com/FritsHoogland/pinatrace_annotate
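
Roughly, generating and annotating such a memory access trace looks like this; the pinatrace example tool lives under source/tools/SimpleExamples in the pin kit, and the pinatrace_annotate invocation below is an assumption, the actual usage is documented in its repository:

$ # attach the pinatrace memory access tool to the server process (pid 23456 is an assumption)
$ ./pin -pid 23456 -t source/tools/SimpleExamples/obj-intel64/pinatrace.so
$ # run the insert and commit in the traced session, then annotate the resulting trace
$ # (pinatrace.out is the tool's default output file name)
$ ./pinatrace_annotate.sh pinatrace.out > pinatrace_annotated.txt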

First, create a pinatrace of a normal insert on a test database. A word of warning: if you think a function call trace generates a lot of detailed data, the pinatrace, alias memory access trace, generates *WAY* more. Even something modest like ‘select count(*) from dual’ generates heaps of data.

Back to the pinatrace of the normal insert; generate the trace, and then annotate it using the pinatrace_annotate tool. This should show that the insert takes redo allocation latch child #1. To validate that, search for the latch address again:

kcrfw_redo_gen_ext+1479:0x0000000060016570(Fixed Size(pgsz:2048k)|+91504 fixed sga|var:kcrfsg_+840 ):R:8:0x76fe0c10/1996360720(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )
kcrfw_redo_gen_ext+1486:0x00007fff5d88efe8():W:8:0x76fe0c10/1996360720(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )
kcrfw_redo_gen_ext+1493:0x00007fff5d88efc8():R:4:0/0()
kcrfw_redo_gen_ext+1516:0x00007fff5d88efe8():R:8:0x76fe0c10/1996360720(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )

What this shows is the kcrfw_redo_gen_ext function reading address 0x60016570, which is in the fixed SGA and belongs to the fixed SGA variable kcrfsg_ at offset 840. The value read, 0x76fe0c10, we already know is the address of redo allocation latch child #1. The latch address is then written into address 0x7fff5d88efe8, which is in the process’ PGA; we get no annotation for it, because there is nothing between the round brackets. A little later, the latch address in the PGA is read again.

We know the kcrfsg_ variable/struct from my earlier investigation into how the logwriter works. This is a struct in fixed sga that holds data important to kcrf (kernel cache redo file). My current guess is sg means ‘SGA’.

Only a little further in the trace, we see that the kslgetl function is invoked, and indeed it is trying to get this latch:

kslgetl+69:0x0000000076fe0c1f(Variable Size(pgsz:2048k)|+368970783 shared pool|(child)latch#1:redo allocation+15 shared pool|permanent memor,duration 1,cls perm+16120863 ):R:1:0x37/55()

This is what a successful latch get looks like in the trace:

kslgetl+138:0x0000000076fe0c10(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 ):R:8:0/0()
kslgetl+138:0x0000000076fe0c10(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 ):W:8:0x1a/26()

The work of Andrey Nikolaev describes that when a latch is taken, the Oracle PID of the holder is written at offset 0 of the latch. Here we see the kslgetl function read a 0 at offset 0, indicating the latch is free, after which the value 26, the Oracle PID of this process, is written into it, which means this process has now taken the latch.
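
To cross-check which server process an Oracle PID like 26 belongs to, a simple lookup in v$process can be used; this is just a convenience query, not part of the trace session:

SQL> select pid, spid, program from v$process where pid = 26;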

The next step is to see if there is any indication of this pointing to a public redo strand. It should be close to the latch get, because a latch should be held for as little time as possible. Essentially, what I did is search for ‘Redo Buffers’. I found this a little further on in the kcrfw_redo_gen_ext function:

kcrfw_redo_gen_ext+11277:0x0000000076fdf3bc(shared pool|permanent memor,duration 1,cls perm+16114620 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf3d8(shared pool|permanent memor,duration 1,cls perm+16114648 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf430(shared pool|permanent memor,duration 1,cls perm+16114736 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3d0(shared pool|permanent memor,duration 1,cls perm+16114640 ):R:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11312:0x0000000076fdf428(shared pool|permanent memor,duration 1,cls perm+16114728 ):W:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3c8(shared pool|permanent memor,duration 1,cls perm+16114632 ):R:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11322:0x0000000076fdf420(shared pool|permanent memor,duration 1,cls perm+16114720 ):W:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11327:0x0000000076fdf3c0(shared pool|permanent memor,duration 1,cls perm+16114624 ):R:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11332:0x0000000076fdf418(shared pool|permanent memor,duration 1,cls perm+16114712 ):W:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11337:0x0000000076fdf3b8(shared pool|permanent memor,duration 1,cls perm+16114616 ):R:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11342:0x0000000076fdf410(shared pool|permanent memor,duration 1,cls perm+16114704 ):W:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11347:0x0000000076fdf3b0(shared pool|permanent memor,duration 1,cls perm+16114608 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf408(shared pool|permanent memor,duration 1,cls perm+16114696 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:8:0x7270/29296()
kcrfw_redo_gen_ext+11312:0x0000000076fdf400(shared pool|permanent memor,duration 1,cls perm+16114688 ):W:8:0x7270/29296()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3a0(shared pool|permanent memor,duration 1,cls perm+16114592 ):R:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf398(shared pool|permanent memor,duration 1,cls perm+16114584 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf3f0(shared pool|permanent memor,duration 1,cls perm+16114672 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf390(shared pool|permanent memor,duration 1,cls perm+16114576 ):R:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11342:0x0000000076fdf3e8(shared pool|permanent memor,duration 1,cls perm+16114664 ):W:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11347:0x0000000076fdf388(shared pool|permanent memor,duration 1,cls perm+16114568 ):R:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11352:0x0000000076fdf3e0(shared pool|permanent memor,duration 1,cls perm+16114656 ):W:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11366:0x0000000076fdf438(shared pool|permanent memor,duration 1,cls perm+16114744 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:4:0x7270/29296()
kcrfw_redo_gen_ext+11386:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):W:4:0x7271/29297()
kcrfw_redo_gen_ext+11392:0x0000000076fdf468(shared pool|permanent memor,duration 1,cls perm+16114792 ):R:8:0xe0000000/3758096384(Redo Buffers(pgsz:2048k)|+134217728 )

There’s a lot to say about this. The first thing to point out, is that these are all memory locations of a memory segment in the shared pool in permanent memory somewhere around 0x76fdf300 / 0x76fdf400. The first public strand is explicitly mentioned (by my annotation of the memory structures):

kcrfw_redo_gen_ext+11317:0x0000000076fdf3a0(shared pool|permanent memor,duration 1,cls perm+16114592 ):R:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )

As well as the second public strand:

kcrfw_redo_gen_ext+11366:0x0000000076fdf438(shared pool|permanent memor,duration 1,cls perm+16114744 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )

I am confident that the annotation of the second public strand here is just a coincidence: the value written is a well-known ‘not in use’ filler value, as becomes apparent when you look at its hexadecimal representation, 0xdeadbeef, which simply happens to fall inside the address range of the redo buffers.

The second thing to notice is that a lot of information is written really close together (0x76fdf300-0x76fdf400). The only location I know that holds this kind of redo related information is the previously mentioned kcrfsg_ struct, but this seems to be another location that holds critical information about redo, of which all we know is that it sits in the shared pool (not in the fixed SGA), and that it is allocated as permanent memory.

I tried various things to get more information about this memory location, like dumping the SGA heap and inspecting this memory location in the heap dump to see if there is a sub heap that I can dump for more information (there isn’t), and asking on Twitter (which sometimes helps), but I got no information. I also spent a considerable amount of time searching the bug database, and still got no answers….
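
For completeness, the SGA heap dump mentioned here can be produced with oradebug (heapdump level 2 dumps the SGA heap); these are the generic commands, not necessarily the exact ones I used:

SQL> oradebug setmypid
SQL> oradebug dump heapdump 2
SQL> oradebug tracefile_name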

I decided to prove usage of the second public strand by manually obtaining the latch for the first strand, so an insert has to take the second latch, and then checking whether that indeed means the second public redo strand is used. This means taking the first latch using oradebug in one session, then starting another database session, attaching a pinatrace to it, and executing an insert.

Once the pinatrace is annotated, search for 0x76fe0c10 (the first redo allocation latch) again. The start is exactly the same: the latch address is fetched from kcrfsg_ at offset 840.

kcrfw_redo_gen_ext+1479:0x0000000060016570(Fixed Size(pgsz:2048k)|+91504 fixed sga|var:kcrfsg_+840 ):R:8:0x76fe0c10/1996360720(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )
kcrfw_redo_gen_ext+1486:0x00007ffcca22e6e8():W:8:0x76fe0c10/1996360720(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )
kcrfw_redo_gen_ext+1493:0x00007ffcca22e6c8():R:4:0/0()
kcrfw_redo_gen_ext+1516:0x00007ffcca22e6e8():R:8:0x76fe0c10/1996360720(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )

Now look at the same offsets in the function kslgetl (+138) again:

kslgetl+138:0x0000000076fe0c10(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 ):R:8:0x27/39()
kslgetl+138:0x0000000076fe0c10(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 ):W:8:0x27/39()

Now we see that the latch already has a value (39) at offset zero, indicating the latch is taken. Because the session executing the insert found the first redo allocation latch busy, this is what happens first:

kcrfw_redo_gen_ext+14267:0x00000000600169c4(Fixed Size(pgsz:2048k)|+92612 fixed sga|var:kcrf_actv_strands_+0 ):R:4:0x1/1()
kcrfw_redo_gen_ext+14273:0x00007ffcca22e6d8():R:4:0x2/2()
kcrfw_redo_gen_ext+14283:0x00000000600169c4(Fixed Size(pgsz:2048k)|+92612 fixed sga|var:kcrf_actv_strands_+0 ):W:4:0x2/2()

Because the latch for the first public redo strand was taken, an additional strand is activated by setting the fixed SGA variable kcrf_actv_strands_ to 2, indicating that the second strand is now active too.

Then kcrfsg_ at offset 840 is read again, but a higher value is saved in PGA memory. In fact, the latch size is known (160 bytes), so in order to get to the next latch, the session picks up the latch address of child #1 and adds 160 (0xa0) to it: 0x76fe0c10 + 0xa0 = 0x76fe0cb0, which indeed is the address of child #2.

kcrfw_redo_gen_ext+14335:0x0000000060016570(Fixed Size(pgsz:2048k)|+91504 fixed sga|var:kcrfsg_+840 ):R:8:0x76fe0c10/1996360720(Variable Size(pgsz:2048k)|+368970768 shared pool|(child)latch#1:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16120848 )
kcrfw_redo_gen_ext+14342:0x00007ffcca22e6e8():W:8:0x76fe0cb0/1996360880(Variable Size(pgsz:2048k)|+368970928 shared pool|(child)latch#2:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16121008 )

This latch is free, and is taken by this session:

kslgetl+138:0x0000000076fe0cb0(Variable Size(pgsz:2048k)|+368970928 shared pool|(child)latch#2:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16121008 ):R:8:0/0()
kslgetl+138:0x0000000076fe0cb0(Variable Size(pgsz:2048k)|+368970928 shared pool|(child)latch#2:redo allocation+0 shared pool|permanent memor,duration 1,cls perm+16121008 ):W:8:0x1a/26()

Now let’s look at the same function locations with the first redo allocation latch taken, and the second redo allocation latch acquired:

kcrfw_redo_gen_ext+11277:0x0000000076fdf4e4(shared pool|permanent memor,duration 1,cls perm+16114916 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf500(shared pool|permanent memor,duration 1,cls perm+16114944 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf558(shared pool|permanent memor,duration 1,cls perm+16115032 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4f8(shared pool|permanent memor,duration 1,cls perm+16114936 ):R:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11312:0x0000000076fdf550(shared pool|permanent memor,duration 1,cls perm+16115024 ):W:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4f0(shared pool|permanent memor,duration 1,cls perm+16114928 ):R:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11322:0x0000000076fdf548(shared pool|permanent memor,duration 1,cls perm+16115016 ):W:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11327:0x0000000076fdf4e8(shared pool|permanent memor,duration 1,cls perm+16114920 ):R:8:0x800/2048()
kcrfw_redo_gen_ext+11332:0x0000000076fdf540(shared pool|permanent memor,duration 1,cls perm+16115008 ):W:8:0x800/2048()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4e0(shared pool|permanent memor,duration 1,cls perm+16114912 ):R:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11342:0x0000000076fdf538(shared pool|permanent memor,duration 1,cls perm+16115000 ):W:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4d8(shared pool|permanent memor,duration 1,cls perm+16114904 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf530(shared pool|permanent memor,duration 1,cls perm+16114992 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:8:0xe6/230()
kcrfw_redo_gen_ext+11312:0x0000000076fdf528(shared pool|permanent memor,duration 1,cls perm+16114984 ):W:8:0xe6/230()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4c8(shared pool|permanent memor,duration 1,cls perm+16114888 ):R:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf520(shared pool|permanent memor,duration 1,cls perm+16114976 ):W:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf4c0(shared pool|permanent memor,duration 1,cls perm+16114880 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf518(shared pool|permanent memor,duration 1,cls perm+16114968 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4b8(shared pool|permanent memor,duration 1,cls perm+16114872 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11342:0x0000000076fdf510(shared pool|permanent memor,duration 1,cls perm+16114960 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4b0(shared pool|permanent memor,duration 1,cls perm+16114864 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11352:0x0000000076fdf508(shared pool|permanent memor,duration 1,cls perm+16114952 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11366:0x0000000076fdf560(shared pool|permanent memor,duration 1,cls perm+16115040 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:4:0xe6/230()
kcrfw_redo_gen_ext+11386:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):W:4:0xe7/231()
kcrfw_redo_gen_ext+11392:0x0000000076fdf590(shared pool|permanent memor,duration 1,cls perm+16115088 ):R:8:0xe0200000/3760193536(Redo Buffers(pgsz:2048k)|+136314880 )

Here we see reads and writes in the same fashion, including the 0xdeadbeef write, but now the strand address points to PUB_REDO_1, alias the second public redo strand. This proves that the second redo allocation child latch really protects the second public redo strand!
