A look into Oracle redo, part 5: the log writer writing

This the the fifth blog in a series of blogposts about Oracle database redo. The previous blog looked into the ‘null write’ (kcrfw_do_null_write actually) function inside kcrfw_redo_write_driver, which does housekeeping like updating SCNs and posting processes if needed, this blog looks into what happens when the log writer is actually posted by a process or if public redo strand buffers have been written into. In part 3 of this blog series (the log writer working cycle) it can be seen that when a session posts the log writer, it returns from the semaphore related functions, and calls ‘kcrfw_redo_write_driver’ directly, which otherwise is called inside ksbcti.

Inside the kcrfw_redo_write_driver function, the first thing of interest is executed only when the logwriter is posted, and the kcrfw_redo_write_driver function is called directly after returning from ksarcv and ksl_exit_main_loop_wait:

kcrfw_redo_write_driver+145:0x000000006001642c(fixed sga|var:kcrfsg_+516 shared pool ):W:4:0/0()

At offset 516 in kcrfsg_, a 4 byte field is used for indicating that the logwriter is posted. This field is filled out by a process that wants the logwriter to write. The way it works is that a process reads this field, and changes it to 1 if this field contains 0, and then performs a semctl or semop to post the logwriter and get it working. If the field already contains 1, no further action is taken by the a committing process. The logwriter sets this value to 0 as soon as it has been posted and executes the kcrfw_redo_write_driver function. If the kcrfw_redo_write_driver function is called as part of the normal logwriter cycle (which means inside the ksbcti function), it will not read or write this field.

Then a lot of things are executed that have been shown in the previous post. The next thing the kcrfw_redo_write_driver executes is kcsnew3 is to obtain a LWN SCN and set the LWN number protected by the ‘lgwr LWN SCN’ latch:

kcsnew8+107:0x0000000060016274(fixed sga|var:kcrfsg_+76 ):W:4:0x1/1()
kcsnew8+118:0x0000000060016270(fixed sga|var:kcrfsg_+72 ):R:4:0xc6/198()
kcsnew8+118:0x0000000060016270(fixed sga|var:kcrfsg_+72 ):W:4:0xc7/199()
kcsnew8+125:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):R:8:0x762b80/7744384()
kcsnew8+131:0x00007fffa10728f0():W:4:0/0()
kcsnew8+140:0x00007fffa1072922():W:1:0/0()
kcsnew8+149:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):R:8:0x762b80/7744384()
kcsnew8+149:0x00000000600113b8(fixed sga|var:kcsgscn_+0 ):W:8:0x762b81/7744385()
kcsnew8+165:0x00007fffa10728f0():R:4:0/0()
kcsnew8+184:0x00007f520ba8be08():R:8:0x600113b8/1610683320(fixed sga|var:kcsgscn_+0 )
kcsnew8+193:0x00007f520ba8ae90():R:8:0x7a823620/2055353888(shared pool|permanent memor,duration 1,cls perm+7693856 )
kcsnew8+212:0x000000007a823e40(shared pool|permanent memor,duration 1,cls perm+7695936 ):R:8:0x719beed8/1906044632(shared pool|ksu:stats_freel,duration 1,cls freeabl+24 )
kcsnew8+219:0x00000000719bf790(shared pool|ksu:stats_freel,duration 1,cls freeabl+2256 ):R:8:0xc7/199()
kcsnew8+219:0x00000000719bf790(shared pool|ksu:stats_freel,duration 1,cls freeabl+2256 ):W:8:0xc8/200()
kcsnew8+223:0x00007fffa1072a00():W:8:0x762b81/7744385()
kcsnew8+232:0x0000000060016260(fixed sga|var:kcrfsg_+56 shared pool|x_kcrfws.lwn_scn+0 ):W:8:0x762b81/7744385()
kcsnew8+238:0x0000000060016274(fixed sga|var:kcrfsg_+76 ):W:4:0/0()
kcsnew8+246:0x0000000060016288(fixed sga|var:kcrfsg_+96 shared pool|pointer:lgwr lwn scn latch+0 ):R:8:0x60049800/1610913792(fixed sga|(parent)latch:lgwr LWN SCN+0 fixed sga|var:lwn_scn_lat_+0 )
kcsnew8+250:0x00007fffa10728d8():W:8:0x10e3ed3f/283372863()

The LWN is set to 0x762b81 / 7744385, LWN number is 199.

The next thing to look at is the kcrfw_gather_lwn function flow:

 | > kcrfw_gather_lwn(0x7ffc3dd11b78, 0x77eced90, ...)
 | | > kslgetl(0x76fe0c10, 0x1, ...)
 | | < kslgetl+0x00000000012f returns: 0x1
 | | > kcrfw_gather_strand(0x7ffc3dd11b78, 0, ...)
 | | < kcrfw_gather_strand+0x0000000000c2 returns: 0x2
 | | > kslfre(0x76fe0c10, 0x20002, ...)
 | | < kslfre+0x0000000001e2 returns: 0
 | | > kslgetl(0x76fe0cb0, 0x1, ...)
 | | < kslgetl+0x00000000012f returns: 0x1
 | | > kcrfw_gather_strand(0x7ffc3dd11b78, 0x1, ...)
 | | < kcrfw_gather_strand+0x0000000000c2 returns: 0
 | | > kslfre(0x76fe0cb0, 0x19, ...)
 | | < kslfre+0x0000000001e2 returns: 0
 | < kcrfw_gather_lwn+0x00000000065c returns: 0

Here we see exactly the same function sequence in kcrfw_gather_lwn as we saw in the ‘null write’ blogpost, however pay close attention to the return codes of the kcrfw_gather_strand functions. The first call returns 2 instead of 0, indicating two (strand) buffers need writing.

If we zoom in to the first kcrfw_gather_strand call, we see:

kcrfw_gather_strand+2:0x00007fffa1072910():W:8:0x7fffa1072a10/140735894989328()
kcrfw_gather_strand+15:0x0000000060016228(fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+22:0x00007fffa1072908():W:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+29:0x00007fffa1072900():W:8:0x76fdf130/1996353840(shared pool|permanent memor,duration 1,cls perm+16113968 )
kcrfw_gather_strand+36:0x0000000076fdf458(shared pool|KCRFA+208 shared pool|LAST_BUF_WRITTEN+0 shared pool|permanent memor,duration 1,cls perm+16114776 ):R:4:0x65e/1630()
kcrfw_gather_strand+53:0x0000000076fdf480(shared pool|KCRFA+248 shared pool|TOTAL_BUFS_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114816 ):R:4:0x20000/131072()
kcrfw_gather_strand+70:0x00007fffa1072bf8():R:8:0x762b81/7744385()
kcrfw_gather_strand+79:0x0000000076fdf390(shared pool|KCRFA+8 shared pool|permanent memor,duration 1,cls perm+16114576 ):R:8:0x762b7f/7744383()
kcrfw_gather_strand+86:0x0000000076fdf3a8(shared pool|KCRFA+32 shared pool|NEXT_BUF_NUM_KCRFA_CLN+0 shared pool|permanent memor,duration 1,cls perm+16114600 ):R:4:0x660/1632()
kcrfw_gather_strand+121:0x0000000076fdf45c(shared pool|KCRFA+212 shared pool|LAST_BUF_GATHERED_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114780 ):R:4:0x65e/1630()
kcrfw_gather_strand+169:0x00007fffa1072a08():W:4:0x660/1632()
kcrfw_gather_strand+176:0x00007fffa1072a0c():W:4:0/0()
kcrfw_gather_strand+182:0x00007fffa1072900():R:8:0x76fdf130/1996353840(shared pool|permanent memor,duration 1,cls perm+16113968 )
kcrfw_gather_strand+186:0x00007fffa1072908():R:8:0x76fdf388/1996354440(shared pool|KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114568 )
kcrfw_gather_strand+193:0x00007fffa1072910():R:8:0x7fffa1072a10/140735894989328()
kcrfw_gather_strand+194:0x00007fffa1072918():R:8:0x31f0f00/52367104()

What is visible here is roughly identical to the memory access of the kcrfw_gather_strand function we saw in the ‘null write’ blogpost, however there is one striking difference: at offset 86 in the kcrfw_gather_strand function KCRFA at offset 32 is read, which shows a value of 1632. The other locations show a value of 1630, which is an indication some public redo strand buffers have been written. This means there is potentially something to write! Please remember that the redo allocation latch was fetched before calling kcrfw_gather_strand.

Once the kcrfw_gather_strand function has returned to kcrfw_gather_lwn, the actual write is prepared by further manipulating the KCRFA struct for the first strand for the write:

kcrfw_gather_lwn+436:0x0000000076fdf45c(shared pool|KCRFA+212 shared pool|LAST_BUF_GATHERED_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114780 ):W:4:0x660/1632()
kcrfw_gather_lwn+449:0x0000000076fdf3b0(shared pool|BYTES_IN_BUF_KCRFA_CLN+0 shared pool|KCRFA+40 shared pool|permanent memor,duration 1,cls perm+16114608 ):R:8:0x1b8/440()
kcrfw_gather_lwn+453:0x0000000060016a10(fixed sga|var:kcrf_kcrrh_sz_+0 ):R:8:0x18/24()
kcrfw_gather_lwn+462:0x0000000076fdf3a0(shared pool|KCRFA+24 shared pool|PNEXT_BUF_KCRFA_CLN+0 shared pool|permanent memor,duration 1,cls perm+16114592 ):R:8:0xd80cc048/3624714312(Redo Buffers(pgsz:2048k)|+835656 redo|PUB_REDO_0+835656 )
kcrfw_gather_lwn+466:0x00000000d80cc048(redo|PUB_REDO_0+835656 ):W:4:0/0()
kcrfw_gather_lwn+472:0x0000000076fdf3b0(shared pool|BYTES_IN_BUF_KCRFA_CLN+0 shared pool|KCRFA+40 shared pool|permanent memor,duration 1,cls perm+16114608 ):R:8:0x1b8/440()
kcrfw_gather_lwn+476:0x0000000076fdf3a0(shared pool|KCRFA+24 shared pool|PNEXT_BUF_KCRFA_CLN+0 shared pool|permanent memor,duration 1,cls perm+16114592 ):R:8:0xd80cc048/3624714312(Redo Buffers(pgsz:2048k)|+835656 redo|PUB_REDO_0+835656 )
kcrfw_gather_lwn+476:0x0000000076fdf3a0(shared pool|KCRFA+24 shared pool|PNEXT_BUF_KCRFA_CLN+0 shared pool|permanent memor,duration 1,cls perm+16114592 ):W:8:0xd80cc200/3624714752(Redo Buffers(pgsz:2048k)|+836096 redo|PUB_REDO_0+836096 )
kcrfw_gather_lwn+480:0x0000000076fdf3cc(shared pool|KCRFA+68 shared pool|permanent memor,duration 1,cls perm+16114636 ):R:1:0/0()
kcrfw_gather_lwn+480:0x0000000076fdf3cc(shared pool|KCRFA+68 shared pool|permanent memor,duration 1,cls perm+16114636 ):W:1:0x1/1()
kcrfw_gather_lwn+485:0x0000000076fdf3b0(shared pool|BYTES_IN_BUF_KCRFA_CLN+0 shared pool|KCRFA+40 shared pool|permanent memor,duration 1,cls perm+16114608 ):W:8:0/0()

The LAST_BUF_GATHERED_KCRFA field at KCRFA offset 212 is set to the new buffer number 1632. Then the actual amount of data inside the buffer is queried from KCRFA at offset 40 (BYTES_IN_BUF_KCRFA_CLN), and the pointer to the actual location in the first public redo strand is obtained from KCRFA offset 24 (PNEXT_BUF_KCRFA_CLN). The pointer into the first public redo strand then is increased by the amount of redo written, which is 440 in this case, and the BYTES_IN_BUF_KCRFA_CLN is reset to 0.

A little further in kcrfw_gather_lwn, LAST_BUF_WRITTEN in KCRFA at offset 208 is changed to public strand buffer 1632:

kcrfw_gather_lwn+977:0x0000000076fdf458(shared pool|KCRFA+208 shared pool|LAST_BUF_WRITTEN+0 shared pool|permanent memor,duration 1,cls perm+16114776 ):R:4:0x65e/1630()
kcrfw_gather_lwn+987:0x0000000076fdf480(shared pool|KCRFA+248 shared pool|TOTAL_BUFS_KCRFA+0 shared pool|permanent memor,duration 1,cls perm+16114816 ):R:4:0x20000/131072()
kcrfw_gather_lwn+996:0x0000000076fdf3c0(shared pool|KCRFA+56 shared pool|permanent memor,duration 1,cls perm+16114624 ):W:4:0x1000/4096()
kcrfw_gather_lwn+1006:0x0000000076fdf458(shared pool|KCRFA+208 shared pool|LAST_BUF_WRITTEN+0 shared pool|permanent memor,duration 1,cls perm+16114776 ):W:4:0x660/1632()

At this point the redo allocation latch for this strand is released.
After the release of the latch, the kcrfw_gather_lwn function changes a value in the struct kcrfsg_ at offset 8, which is X$KCRFWS.NEXT_BLK, alias the administration of where to write in the online redolog file(s):

kcrfw_gather_lwn+1038:0x0000000060016230(fixed sga|var:kcrfsg_+8 shared pool|x_kcrfws.next_blk+0 ):R:4:0x16a16/92694()
kcrfw_gather_lwn+1044:0x0000000077eceea8(shared pool|permanent memor,duration 1,cls perm+2451776 ):W:4:0x16a16/92694()
kcrfw_gather_lwn+1048:0x0000000060016230(fixed sga|var:kcrfsg_+8 shared pool|x_kcrfws.next_blk+0 ):R:4:0x16a16/92694()
kcrfw_gather_lwn+1048:0x0000000060016230(fixed sga|var:kcrfsg_+8 shared pool|x_kcrfws.next_blk+0 ):W:4:0x16a18/92696()

The second public strand has no buffers written, which we know because the kcrfw_gather_strand function returned 0, so that one is skipped.

At this point, the memory area’s that take part in the writing have been setup.

In order to write, the function kcrfw_redo_write is called to start the actual redo writing. First, the logwriter needs to make sure that the buffers it has marked for write, which is the difference between the original LAST_BUF_GATHERED and LAST_BUF_WRITTEN, are not still written into. This is done in the function kcrfw_wait_for_fg_copy, which is called in kcrfw_redo_write. The way this works, is it looks up the address of the first redo copy latch child which a process that is copying change vectors into the public strand must take which is stored in kcrfsg_ at offset 472 and the address of the redo copy latch support memory area, which is stored in kcrfsg_ at offset 488. Then it looks up the buffer assigned to the first latch in the redo copy latch support memory area and reads 8 bytes to see if it contains the value 0xffffffffffffffff, which is what a writer in a redo log strand must write in the support memory area if it’s done writing to the redo strand. This means that as long as a process writes into the redo log strands, it will have set this memory area to another value indicating it is being written into. Once the value becomes 0xffffffffffffffff, kcrfw_wait_for_fg_copy will continue to see if any other redo copy latches needs checking. It does this by reading the actual total amount of redo copy latches in kcrfsg_ at offset 464 and the number of the current redo copy latch it has checked. If not, it will move to the next redo copy latch and redo copy latch support memory area, checking for 0xffffffffffffffff again, etc. Please mind it does not check or obtain any of the redo copy latches itself, nor prevents any write to the redo strands that happens after the check, which is probably by design:

kcrfw_wait_for_fg_copy+132:0x0000000060016400(fixed sga|var:kcrfsg_+472 shared pool|pointer:redo copy latch+0 ):R:8:0x76fdea30/1996352048(shared pool|(child)latch#1:redo copy+0 shared pool|permanent memor,duration 1,cls perm+16112176 )
kcrfw_wait_for_fg_copy+142:0x0000000060016410(fixed sga|var:kcrfsg_+488 ):R:8:0x76fdf0f0/1996353776(shared pool|permanent memor,duration 1,cls perm+16113904 )
kcrfw_wait_for_fg_copy+149:0x00007fffa1072968():W:1:0/0()
kcrfw_wait_for_fg_copy+153:0x00007fffa1072960():W:8:0x76fdea30/1996352048(shared pool|(child)latch#1:redo copy+0 shared pool|permanent memor,duration 1,cls perm+16112176 )
kcrfw_wait_for_fg_copy+157:0x00007fffa1072958():W:4:0/0()
kcrfw_wait_for_fg_copy+161:0x0000000076fdf0f0(shared pool|permanent memor,duration 1,cls perm+16113904 ):R:4:0xffffffff/4294967295(??)
kcrfw_wait_for_fg_copy+169:0x0000000076fdf0f4(shared pool|permanent memor,duration 1,cls perm+16113908 ):R:4:0xffffffff/4294967295(??)
kcrfw_wait_for_fg_copy+176:0x00007fffa1072968():R:1:0/0()
kcrfw_wait_for_fg_copy+180:0x00007fffa1072958():R:4:0/0()
kcrfw_wait_for_fg_copy+195:0x00000000600163f8(fixed sga|var:kcrfsg_+464 ):R:4:0x8/8()
kcrfw_wait_for_fg_copy+132:0x0000000060016400(fixed sga|var:kcrfsg_+472 shared pool|pointer:redo copy latch+0 ):R:8:0x76fdea30/1996352048(shared pool|(child)latch#1:redo copy+0 shared pool|permanent memor,duration 1,cls perm+16112176 )
kcrfw_wait_for_fg_copy+142:0x0000000060016410(fixed sga|var:kcrfsg_+488 ):R:8:0x76fdf0f0/1996353776(shared pool|permanent memor,duration 1,cls perm+16113904 )
kcrfw_wait_for_fg_copy+149:0x00007fffa1072968():W:1:0/0()
kcrfw_wait_for_fg_copy+153:0x00007fffa1072960():W:8:0x76fdeb08/1996352264(shared pool|(child)latch#2:redo copy+0 shared pool|permanent memor,duration 1,cls perm+16112392 )
kcrfw_wait_for_fg_copy+157:0x00007fffa1072958():W:4:0x1/1()
kcrfw_wait_for_fg_copy+161:0x0000000076fdf0f8(shared pool|permanent memor,duration 1,cls perm+16113912 ):R:4:0xffffffff/4294967295(??)
kcrfw_wait_for_fg_copy+169:0x0000000076fdf0fc(shared pool|permanent memor,duration 1,cls perm+16113916 ):R:4:0xffffffff/4294967295(??)
kcrfw_wait_for_fg_copy+176:0x00007fffa1072968():R:1:0/0()
kcrfw_wait_for_fg_copy+180:0x00007fffa1072958():R:4:0x1/1()
kcrfw_wait_for_fg_copy+195:0x00000000600163f8(fixed sga|var:kcrfsg_+464 ):R:4:0x8/8()

The above memory trace output shows the kcrfw_wait_for_fg_copy function looking up the redo copy latch memory address at kcrfw_wait_for_fg_copy offset 132, looking up the redo copy support memory area at kcrfw_wait_for_fg_copy offset 142, and then peeks into the redo copy helper memory structure in kcrfw_wait_for_fg_copy at offset 161 and 169. This way it does not need to obtain the redo copy latch. At offset 195 of the function it looks up the number of redo copy latches, and then executes the same procedure again, looking up the same pointers in kcrfsg_, but adding an offset for the next latch.

Please mind that there is no need for the log writer to take the redo copy latch, in fact, it doesn’t seem to care if a redo copy latch is taken or not, it waits for a redo strand writer to indicate it is done writing by waiting for the redo copy latch helper memory structure for each strand to be become 0xffffffff, which then automatically means the change vectors in the public strand buffers it is intending to write (which means having a SCN lower than LWN SCN) are not written to anymore, because a process that wants to write into a redo strand must change the redo copy latch helper memory structure to another value than 0xffffffff, and revert it to 0xffffffff once finished.

The next thing the logwriter does when it truly needs to write, is update the real redo SCN. The real redo SCN is the SCN value that contains the highest SCN of the change vectors that are truly written to the online redolog files. Remember, the ‘on disk SCN’ is not the SCN of a change that is truly written. The real redo SCN is another SCN value that is stored in the kcrfsg_ struct, in my database at offset 288. This is done in the function kcrfw_update_and_bcast_real_redo_scn, which uses the kcsaj8 function to execute the the change of the SCN. This does not require a latch get:

kcsaj8+2:0x00007fffa1072950():W:8:0x7fffa1072970/140735894989168()
kcsaj8+13:0x00007fffa1072960():R:8:0x762b80/7744384()
kcsaj8+16:0x0000000060016348(fixed sga|var:kcrfsg_+288 shared pool|x_kcrfws.real_redo_scn+0 ):R:8:0x762b7d/7744381()
kcsaj8+22:0x00007f520ba98f40():R:8:0x7f520ba98f40/139990359707456()
kcsaj8+40:0x00007fffa1072900():W:8:0x7fffa1072960/140735894989152()
kcsaj8+44:0x00007fffa1072908():W:8:0/0()
kcsaj8+48:0x00007fffa1072910():W:8:0/0()
kcsaj8+52:0x00007fffa1072918():W:4:0/0()
kcsaj8+56:0x00007fffa1072938():W:8:0x7f520ba98f40/139990359707456()
kcsaj8+63:0x00007fffa1072930():W:8:0/0()
kcsaj8+70:0x00007fffa1072928():W:8:0/0()
kcsaj8+74:0x00007fffa1072920():W:8:0x77eced90/2012016016(shared pool|permanent memor,duration 1,cls perm+2451496 )
kcsaj8+81:0x00007fffa1072940():W:8:0x77eced90/2012016016(shared pool|permanent memor,duration 1,cls perm+2451496 )
kcsaj8+88:0x00007fffa1072828():W:4:0/0()
kcsaj8+98:0x00007fffa107285a():W:1:0/0()
kcsaj8+105:0x0000000060016348(fixed sga|var:kcrfsg_+288 shared pool|x_kcrfws.real_redo_scn+0 ):R:8:0x762b7d/7744381()
kcsaj8+105:0x0000000060016348(fixed sga|var:kcrfsg_+288 shared pool|x_kcrfws.real_redo_scn+0 ):W:8:0x762b80/7744384()
kcsaj8+120:0x00007fffa1072828():R:4:0/0()
kcsaj8+138:0x00007fffa1072908():R:8:0/0()
kcsaj8+145:0x00007fffa1072900():R:8:0x7fffa1072960/140735894989152()
kcsaj8+152:0x00007fffa1072910():R:8:0/0()
kcsaj8+159:0x00007fffa1072918():R:4:0/0()
kcsaj8+166:0x00007fffa1072938():R:8:0x7f520ba98f40/139990359707456()
kcsaj8+170:0x00007fffa1072930():R:8:0/0()
kcsaj8+174:0x00007fffa1072928():R:8:0/0()
kcsaj8+178:0x00007fffa1072920():R:8:0x77eced90/2012016016(shared pool|permanent memor,duration 1,cls perm+2451496 )
kcsaj8+182:0x00007fffa1072940():R:8:0x77eced90/2012016016(shared pool|permanent memor,duration 1,cls perm+2451496 )
kcsaj8+202:0x00007f520ba8be08():R:8:0x600113b8/1610683320(fixed sga|var:kcsgscn_+0 )
kcsaj8+219:0x00007fffa1072950():R:8:0x7fffa1072970/140735894989168()
kcsaj8+220:0x00007fffa1072958():R:8:0x3208da6/52465062()

The real redo SCN SCN value is the value that was gathered in the kcsnew3 function at the beginning of the function, however, I am not sure here. It might be the highest SCN in the public strands, although it would make sense to limit the current write to the LWN SCN that was obtained at the start, which means it’s the highest SCN in the strand buffers it picked for writing, while additional buffers might have been written into, with newer higher SCNs.

What happens next, is the logwriter needs to prepare the public redo strand buffers to be written to disk:

 | | > kcrfw_update_redo_blk_hdrs(0x77eced90, 0, ...)
 | | | > _intel_fast_memset(0xda31669c, 0, ...)
 | | | <> _intel_fast_memset.J(0xda31669c, 0, ...)
 | | | <> __intel_new_memset(0xda31669c, 0, ...)
 | | | < __intel_new_memset+0x0000000005c0 returns: 0xda31669c
 | | < kcrfw_update_redo_blk_hdrs+0x000000000363 returns: 0x77eced90
 | | > kcrfw_update_strand_hdrs(0x7ffc3dd11b38, 0x77eced90, ...)
 | | < kcrfw_update_strand_hdrs+0x000000000222 returns: 0x77eced90
 | | > kcrfw_cal_target_rba(0x5, 0xda316650, ...)
 | | | > kslgetl(0x6004b780, 0x1, ...)
 | | | < kslgetl+0x00000000012f returns: 0x1
 | | <> kslfre(0x6004b780, 0x1, ...)
 | | < kslfre+0x0000000001e2 returns: 0
 | | > skgstmGetEpochTs(0x6004b780, 0x1, ...)
 | | | > gettimeofday@plt(0x7ffc3dd118e0, 0, ...)
 | | | < __vdso_gettimeofday+0x0000000000fe returns: 0
 | | < skgstmGetEpochTs+0x000000000049 returns: 0x2050c6a4e2e57
 | | > kcrfw_do_write(0x77eced90, 0x1, ...)

The last line, which is a call to the kcrfw_do_write function, is the logwriter function inside kcrfw_redo_write that actually calls the disk IO functions in ksfd. I’ll walk quickly through these because executing IO is not the focus of this blogpost.

At this point the memory area’s are setup, the IO is prepared:

This is probably calculating the request size:

 | | | > ksfdgrsz(0x1, 0, ...)
 | | | | > skgfrgsz(0x3, 0, ...)
 | | | | < skgfrgsz+0x00000000002d returns: 0xa0
 | | | < ksfdgrsz+0x00000000001a returns: 0x288

This is probably setting up the asynchronous IO information and requests:

 | | | > ksfd_aio_setobjinfo(0x71717da0, 0x76b423e8, ...)
 | | | < ksfd_aio_setobjinfo+0x00000000003c returns: 0
 | | | > ksfdrqfill(0x71717da0, 0x76b423e8, ...)
 | | | | > kskiorm(0x71717da0, 0, ...)
 | | | | < kskiorm+0x00000000001e returns: 0
 | | | < ksfdrqfill+0x00000000021b returns: 0x102

The main function for performing IO is ksfdbio (kernel system file disk block IO):

 | | | > ksfdbio(0x7f2c1e502000, 0x1, ...)
 ...
 | | | | > ksfd_update_iostatsbytes(0, 0x76b76930, ...)
 ...
 | | | | > kslwtbctx(0x7ffc3dd11700, 0x42, ...)
 ...
 | | | | < kslwtbctx+0x00000000050b returns: 0x7a823e58
 | | | | > ksfd_sbio(0x7f2c1e502028, 0x1, ...)
 ...
 | | | | < ksfd_sbio+0x00000000035b returns: 0x1
 | | | | > kslwtectx(0x7ffc3dd11700, 0x200, ...)
 ...
 | | | | < kslwtectx+0x0000000003f9 returns: 0xe923
 | | | | > ksfdbio_updatestats(0xb6b1, 0, ...)
 ...
 | | | < ksfdbio+0x000000000507 returns: 0x7f2c23b04040

This is ksfd_update_iostatsbytes for statistics, kslwtbctx for registering a wait, kslwtectx for ending the wait, ksfdbio_updatestats for statistics about the IO just performed, and in between kslwtbctx and kslwtectx ksfd_sbio performs the IO.

If you look in ksfd_sbio, you’ll see the true IO steps:

 | | | | > ksfd_sbio(0x7f2c1e502028, 0x1, ...)
 | | | | | > ksfd_osdrqfil(0x71717da0, 0x76b423e8, ...)
 | | | | | < ksfd_osdrqfil+0x00000000014c returns: 0x1
 | | | | | > ksfd_addtoissueq(0x71717da0, 0x200, ...)
 | | | | | < ksfd_addtoissueq+0x0000000001ac returns: 0x7a6032e8
 | | | | | > ksfdgo(0x500, 0x81e, ...)
 ...
 | | | | | < ksfdgo+0x000000000135 returns: 0
 | | | | | > ksfdwtio(0x1, 0x500, ...)
 ...
 | | | | | < ksfdwtio+0x000000000176 returns: 0x1
 | | | | | > ksfd_chkio(0x71717da0, 0x9, ...)
 | | | | | | > ksfd_update_iostatstime(0x76b76930, 0x200, ...)
 | | | | | | < ksfd_update_iostatstime+0x000000000103 returns: 0x76c70388
 | | | | | < ksfd_chkio+0x000000000126 returns: 0x1
 | | | | < ksfd_sbio+0x00000000035b returns: 0x1

The ksfd_osdrqfil function probably is a function to create IO requests, and ksfd_addtoissueq looks like it is a function to put these requests in a queue. ksfdgo executes the IO requests, which essentially calls io_submit to submit the write requests. ksfdwtio essentially calls io_getevents to look for the IO results.

After kcrfw_do_write is done, kcrfw_post is executed, just like we saw with the kcrfw_do_null_write function.

Also identical to the null write, the current on disk SCN is read from kcrfsg_ using kcscu8.

Then a series of latch gets is executed to protect changes in the SGA to reflect the write that was executed.
First the redo writing latch is taken. After the redo writing latch is obtained, kcrfsg_+12, alias X$KCRFSWS.LAST_BLK is read and updated with the write:

kcrfw_post+715:0x0000000060016234(fixed sga|var:kcrfsg_+12 shared pool|X$KCRFWS.LAST_BLK+0 shared pool|x_kcrfws.last_blk+0 ):R:4:0x16a15/92693()
kcrfw_post+725:0x00000000600163c0(fixed sga|var:kcrfsg_+408 ):R:4:0xf5/245()
kcrfw_post+736:0x00007fffa1072958():R:8:0/0()
kcrfw_post+743:0x00007fffa1072880():W:4:0xf5/245()
kcrfw_post+753:0x00007fffa1072940():R:4:0xdf7/3575()
kcrfw_post+767:0x0000000060016234(fixed sga|var:kcrfsg_+12 shared pool|X$KCRFWS.LAST_BLK+0 shared pool|x_kcrfws.last_blk+0 ):W:4:0x16a17/92695()

While still holding the redo writing latch, the redo allocation latch for the correct public redo strand is taken, and MEM_BUFS_AVAILABLE_KCRFA_CLN and AVAILABLE_BUFS_KCRFA_CLN are updated:

kcrfw_post+803:0x0000000076fdf3bc(shared pool|KCRFA+52 shared pool|permanent memor,duration 1,cls perm+16114620 ):R:4:0xde2f/56879()
kcrfw_post+803:0x0000000076fdf3bc(shared pool|KCRFA+52 shared pool|permanent memor,duration 1,cls perm+16114620 ):W:4:0xde31/56881()
MEM_BUFS_AVAILABLE_KCRFA_CLN
kcrfw_post+3839:0x0000000076fdf3b8(shared pool|KCRFA+48 shared pool|permanent memor,duration 1,cls perm+16114616 ):R:4:0xd7d0/55248()
kcrfw_post+3839:0x0000000076fdf3b8(shared pool|KCRFA+48 shared pool|permanent memor,duration 1,cls perm+16114616 ):W:4:0xd7d0/55248()
AVAILABLE_BUFS_KCFRA_CLN

Then the redo allocation latch is freed, and the redo writing latch is freed.

Then the redo writing latch is taken again, in order to update the last write scn in kcrfsg_:

ub8_to_kscn_impl+21:0x00000000600162c0(fixed sga|var:kcrfsg_+152 shared pool|X$KCRFWS.LAST_WRITE_SCN+0 shared pool|x_kcrfws.last_write_scn+0 ):W:4:0x762b81/7744385()
ub8_to_kscn_impl+37:0x00000000600162c6(fixed sga|var:kcrfsg_+158 shared pool|X$KCRFWS.LAST_WRITE_SCN+6 ):W:2:0/0()
ub8_to_kscn_impl+41:0x00000000600162c4(fixed sga|var:kcrfsg_+156 shared pool|X$KCRFWS.LAST_WRITE_SCN+4 ):W:2:0/0()

And while holding the redo writing latch, the kcsnew_rba function is called to set the redo byte address data in kcrfsg_. In order to do this, the ‘consistent RBA’ latch is obtained first:

kslgetl+138:0x0000000060049c80(fixed sga|(parent)latch:Consistent RBA+0 fixed sga|var:consist_rba_lat_+0 ):R:8:0/0()
kslgetl+138:0x0000000060049c80(fixed sga|(parent)latch:Consistent RBA+0 fixed sga|var:consist_rba_lat_+0 ):W:8:0x13/19()

And then fields in the kcrfsg_ struct are read updated.

kcsnew_rba+47:0x00000000600162d8(fixed sga|var:kcrfsg_+176 ):W:4:0x1/1()  // rba details being modified flag
kcsnew_rba+58:0x00000000600162d4(fixed sga|var:kcrfsg_+172 ):R:4:0x7e/126()
kcsnew_rba+58:0x00000000600162d4(fixed sga|var:kcrfsg_+172 ):W:4:0x7f/127()
kcsnew_rba+65:0x00007fffa1072880():R:8:0x16a18000000f5/398126288470261()
kcsnew_rba+69:0x00000000600162c8(fixed sga|var:kcrfsg_+160 ):W:8:0x16a18000000f5/398126288470261() // 164: rba current log block
kcsnew_rba+72:0x00007fffa1072888():R:4:0x10010/65552()
kcsnew_rba+76:0x00000000600162d0(fixed sga|var:kcrfsg_+168 ):W:4:0x10010/65552()
kcsnew_rba+83:0x00000000600162d8(fixed sga|var:kcrfsg_+176 ):W:4:0/0()  // rba details being modified flag
kcsnew_rba+91:0x00000000600162e0(fixed sga|var:kcrfsg_+184 shared pool|pointer:consistent rba latch+0 ):R:8:0x60049c80/1610914944(fixed sga|(parent)latch:Consistent RBA+0 fixed sga|var:consist_rba_lat_+0 )
kcsnew_rba+95:0x00007fffa1072658():R:8:0x77eced90/2012016016(Variable Size(pgsz:2048k)|+384626064 shared pool|permanent memor,duration 1,cls perm+2451496 )
kcsnew_rba+99:0x00007fffa1072650():R:8:0x7f520ba98f40/139990359707456()
kcsnew_rba+106:0x00007fffa1072660():R:8:0x7fffa1072970/140735894989168()

After the kcrfsg_ RBA update, the consistent RBA latch is freed. While still holding the redo writing latch, the last write scn time is updated in kcrfsg_:

kcrfw_post+1355:0x0000000077eceda0(Variable Size(pgsz:2048k)|+384626080 shared pool|permanent memor,duration 1,cls perm+2451512 ):R:4:0x397df7b0/964556720()
kcrfw_post+1361:0x000000006001623c(Fixed Size(pgsz:2048k)|+90684 fixed sga|var:kcrfsg_+20 shared pool|X$KCRFWS.LAST_WRITE_SCN_TIME+0 ):W:4:0x397df7b0/964556720()

And the redo writing latch is released.

A little further in the function kcrfw_post, the kcsadj8 function for changing the on disk scn to the lwn scn is called, exactly like in the null write function:

 | | | > kcsaj8(0x60016290, 0x7ffc3dd118b8, ...)
 | | | < kcsaj8+0x0000000000dc returns: 0x1

After the adjustment of the on disk SCN, the foregrounds that have registered a wait via the post wait interface need to be posted. This is done in the kslpslf (kernel service lock management post list of foregrounds) function in the kcrfw_post function:

 | | | > kslpslf(0x97, 0x769a5730, ...)
 | | | | > ksl_postm_init(0x7ffc3dd094f0, 0x7ffc3dd11510, ...)
 | | | | < ksl_postm_init+0x00000000002b returns: 0
 | | | | > kslgetl(0x7a5d7388, 0x1, ...)
 | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | > ksl_update_post_stats(0x7a619f48, 0x7ffc3dd11510, ...)
 | | | | | > dbgtTrcData_int(0x7f2c23b5e6c0, 0x2050031, ...)
 | | | | | | > dbgtBucketRedirect(0x7f2c23b5e6c0, 0x7ffc3dd093c8, ...)
 | | | | | | < dbgtBucketRedirect+0x000000000050 returns: 0x1
 | | | | | | > dbgtIncInMemTrcRedirect(0x7f2c23b5e6c0, 0x6fa, ...)
 | | | | | | < dbgtIncInMemTrcRedirect+0x000000000035 returns: 0x1
 | | | | | | > skgstmGetEpochTs(0x7f2c23b5e6c0, 0x6fa, ...)
 | | | | | | | > gettimeofday@plt(0x7ffc3dd08ed0, 0, ...)
 | | | | | | | < __vdso_gettimeofday+0x0000000000fe returns: 0
 | | | | | | < skgstmGetEpochTs+0x000000000049 returns: 0x2050c7178c0c1
 | | | | | | > dbgtrRecAllocate(0x7f2c23b5e6c0, 0x7ffc3dd09370, ...)
 | | | | | | | > dbgtrPrepareWrite(0x7f2c23b5e6c0, 0x71121048, ...)
 | | | | | | | < dbgtrPrepareWrite+0x00000000011c returns: 0x11
 | | | | | | < dbgtrRecAllocate+0x000000000144 returns: 0x1
 | | | | | | > _intel_fast_memcpy(0x71124368, 0x7ffc3dd09468, ...)
 | | | | | | <> _intel_fast_memcpy.P(0x71124368, 0x7ffc3dd09468, ...)
 | | | | | | <> __intel_ssse3_rep_memcpy(0x71124368, 0x7ffc3dd09468, ...)
 | | | | | | < __intel_ssse3_rep_memcpy+0x000000002798 returns: 0x71124368
 | | | | | | > dbgtrRecEndSegment(0x7f2c23b5e6c0, 0x7ffc3dd09370, ...)
 | | | | | | < dbgtrRecEndSegment+0x00000000011c returns: 0x77c000a4
 | | | | | < dbgtTrcData_int+0x000000000323 returns: 0x77c000a4
 | | | | < ksl_update_post_stats+0x00000000024f returns: 0x77c000a4
 | | | | > kslfre(0x7a5d7388, 0, ...)
 | | | | < kslfre+0x0000000001e2 returns: 0
 | | | | > ksl_postm_do_posts(0x7ffc3dd094f0, 0, ...)
 | | | | | > kslpwrp_int(0x7ffc3dd09450, 0x7f2c23b9e9a0, ...)
 | | | | | | > skgpwpost(0x7ffc3dd09350, 0x7f2c23ba3ca0, ...)
 | | | | | | <> sskgpwpost(0x7ffc3dd09350, 0x7f2c23ba3ca0, ...)
 | | | | | | | > semop@plt(0x38000, 0x7ffc3dd07310, ...)
 | | | | | | | < semop+0x00000000000f returns: 0
 | | | | | | < sskgpwpost+0x00000000009a returns: 0x1
 | | | | | < kslpwrp_int+0x000000000069 returns: 0x1
 | | | | | > kgecss(0x7f2c23b9e9a0, 0x7f2c1e54b048, ...)
 | | | | | < kgecss+0x000000000035 returns: 0x7f2c23be0f40
 | | | | < ksl_postm_do_posts+0x000000000144 returns: 0x1
 | | | < kslpslf+0x000000000822 returns: 0x7a5e26c8

Essentially, what happens here is the posting of a message is initialised (ksl_postm_init; kernel service locking post multi init), although most of the initialisation from a memory perspective seems to happen in the kslpslf function. Then the post/wait queue latch is obtained, and the kslpslf function manipulates the post/wait queue (which seems to be multiple memory area’s linked together with pointers) to reflect the process it is going to semctl/semop.
Then in the function ksl_update_post_stats, some housekeeping is done:

ksl_update_post_stats+117:0x000000007a603758(shared pool|permanent memor,duration 1,cls perm+5465944 ):W:8:0x7a612b58/2053188440(Variable Size(pgsz:2048k)|+425798488 shared pool|(indx:34)X$KSUPR+0 shared pool|permanent memor,duration 1,cls perm+5528408 )
ksl_update_post_stats+124:0x000000007a603748(shared pool|(indx:19)X$KSUPR.KSLLAPSC+0 shared pool|permanent memor,duration 1,cls perm+5465928 ):R:4:0x1e/30()
ksl_update_post_stats+134:0x000000007a603748(shared pool|(indx:19)X$KSUPR.KSLLAPSC+0 shared pool|permanent memor,duration 1,cls perm+5465928 ):W:4:0x1f/31()
ksl_update_post_stats+141:0x000000007a603744(shared pool|(indx:19)X$KSUPR.KSLLAPSN+0 shared pool|permanent memor,duration 1,cls perm+5465924 ):W:4:0x3/3()
ksl_update_post_stats+2705:0x00007fffa1072590():R:8:0x97/151()
ksl_update_post_stats+2708:0x000000007a612f80(shared pool|(indx:34)X$KSUPR.KSLLID1R+0 shared pool|permanent memor,duration 1,cls perm+5529472 ):W:8:0x97/151()
ksl_update_post_stats+2716:0x00007fffa1072598():R:8:0/0()
ksl_update_post_stats+2720:0x000000007a612f88(shared pool|(indx:34)X$KSUPR.KSLLID2R+0 shared pool|permanent memor,duration 1,cls perm+5529480 ):W:8:0/0()
ksl_update_post_stats+2728:0x00007fffa10725a0():R:8:0x7fff00005645/140733193410117()
ksl_update_post_stats+2732:0x000000007a612f90(shared pool|(indx:34)X$KSUPR.KSLLRTYP+0 shared pool|permanent memor,duration 1,cls perm+5529488 ):W:8:0x7fff00005645/140733193410117()
ksl_update_post_stats+202:0x000000007a612fc0(shared pool|permanent memor,duration 1,cls perm+5529536 ):W:8:0x7a6032e8/2053124840(Variable Size(pgsz:2048k)|+425734888 shared pool|(indx:19)X$KSUPR+0 shared pool|permanent memor,duration 1,cls perm+5464808 )
ksl_update_post_stats+209:0x000000007a612fbc(shared pool|(indx:34)X$KSUPR.KSLLAPRC+0 shared pool|permanent memor,duration 1,cls perm+5529532 ):R:4:0x2/2()
ksl_update_post_stats+209:0x000000007a612fbc(shared pool|(indx:34)X$KSUPR.KSLLAPRC+0 shared pool|permanent memor,duration 1,cls perm+5529532 ):W:4:0x3/3()
ksl_update_post_stats+216:0x000000007a612fb0(shared pool|(indx:34)X$KSUPR.KSLLAPRV+0 shared pool|permanent memor,duration 1,cls perm+5529520 ):W:4:0x3/3()

Of course x$ksupr is the kernel service user process, which is the base table for v$process. However, there are a lot of columns in x$ksupr that are not externalised in v$process. The indx number of the logwriter is 19, the indx number of the process that is going to be posted is 34. One of these fields that are not externalised is ksllapsc, which is the Posts Send Count. Because the logwriter is going to post the foreground session, it’s posts send count is increased by one. ksllapsn is last Post SeNt location number, which is set to 3.

This number corresponds with x$kslpo:

SQL> select indx, ksllwnam, kslposts from x$kslpo where indx = 3;
      INDX KSLLWNAM               KSLPOSTS
---------- -------------------- ----------
         3 kslpsr                     4356

(kernel service locking post receive (?))

The foreground process statistics in x$ksupr are changed too, ksllaprc Posts Received Count, and is increased by one too, because the logwriter is going to send a message to the foreground process. ksllaprv is last Post ReceiVed location number, which is set to 3 too.

After that some tracing and other housekeeping is taking place in dbgtTrcData_int, after which the post/wait queue latch is freed. The next function is doing the actual semop call, which is ksl_postm_do_posts: kernel service locking, post multi do posts. In ksl_postm_do_posts, the function kslpwrp_int is called, which looks like it’s essentially a function that can call the function skgpwpost (system kernel generic post wait post), but if there are multiple sessions to be posted, it calls the function skgpwvectorpost, which iterates through the list of processes to be posted.

At this point the kcrfw_redo_write_driver function is done.

	Data Virtualizer… on Getting to know Oracle wait ev…
	Frits Hoogland on Oracle 12.2 wait event ‘…
	Andrew Markiewicz on Oracle 12.2 wait event ‘…
	YugabyteDB Off Campu… on My reasons for moving to Yugab…
	Harold Zubieta on Control file inconsistencies d…

—Frits Hoogland Weblog

IT Technology; Yugabyte, Postgres, Oracle, linux, TCP/IP and other stuff I find interesting

A look into Oracle redo, part 5: the log writer writing

Leave a comment Cancel reply

Share this:

Leave a comment Cancel reply