A look into Oracle redo, part 3: log writer work cycle overview

This is the third part of a series of blogposts on how the Oracle database handles redo. The previous part talked about the memory area that stores redo strand information: https://fritshoogland.wordpress.com/2018/02/05/a-look-into-oracle-redo-part-2-the-discovery-of-the-kcrfa-structure/.

The single most important process in the Oracle database for handling redo is the log writer, whose primary task is flushing the redo information that other Oracle database processes put into the public redo strands to disk. Now that we have investigated the public redo strands and concurrency (first part) and kcrfsg_ and the KCRFA structure (second part), it seems logical to me to look at the log writer.

Simply because we have seen the foreground process take the redo allocation latch, ‘fiddle around’ in the kcrfsg_ and KCRFA structures and release the latch, it is a pretty safe guess that the logwriter uses these structures to determine whether it needs to write. It looks like the kcrfsg_ structure contains global redo information, such as redo log file information, while the KCRFA structure contains strand-specific information.

First we need to get an understanding of what the logwriter process is actually doing. The investigation below looks at the idle and write work cycles; I explicitly excluded additional tasks such as redo log file switches.

First of all, quite surprisingly, there is little information about what the log writer actually does. The only well-known facts about the logwriter are that it writes the log buffer, instrumented by the ‘log file parallel write’ wait event, and that it sleeps on a semaphore in the wait event ‘rdbms ipc message’. In order to learn more, I ran a debugtrace on an idle logwriter process, stripped the excess information, and then captured only the calls one level from the stack depth the logwriter process is on in between the work cycles:

$ awk -n '$0 ~ /^\ \x3e\ /' lgwr_full_cycle_stripped.txt
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)               <<<<
 > ksl_exit_main_loop_wait(0x9d64fb00c, 0x7, ...)
 > ksbcti(0x12b84c18, 0x7f2c1eb804b0, ...)          (nr 1)
 > dbktFlush(0, 0, ...)
 > sltrgatime64(0x7f2c23b9e9a0, 0x7f2c23b9ebe8, ...)
 > ksbcti(0x12b84be0, 0x7f2c1eb80208, ...)          (nr 2)
 > ksbcti(0x12b84bfc, 0x7f2c1eb804b0, ...)          (nr 3)
 > sltrgatime64(0, 0, ...)
 > ksl_enter_main_loop_wait(0x1, 0x4ccef9975, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)               <<<<
 > ksl_exit_main_loop_wait(0x9d67d8b01, 0x7, ...)
 > ksbcti(0x12b84c18, 0x7f2c1eb804b0, ...)          (nr 1)
 > dbktFlush(0, 0, ...)
 > sltrgatime64(0x7f2c23b9e9a0, 0x7f2c23b9ebe8, ...)
 > ksbcti(0x12b84be0, 0x7f2c1eb80208, ...)          (nr 2)
 > ksbcti(0x12b84bfc, 0x7f2c1eb804b0, ...)          (nr 3)
 > sltrgatime64(0, 0, ...)
 > ksl_enter_main_loop_wait(0x1, 0x4cd05fb07, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)               <<<<

ksarcv means kernel service asynchronous receive message; this is the function that sets up the semaphore and puts the logwriter process to sleep. This also explains the ksl_exit_main_loop_wait and ksl_enter_main_loop_wait functions, which are called exactly before (enter) and after (exit) the ksarcv function. The dbktFlush function manages trace files; it is used to write messages to the logwriter tracefile. The sltrgatime64 function is a function that calls clock_gettime. That leaves one function in my example: ksbcti. ksbcti means kernel service background processes call timeout/interrupts. This is a function that performs several different tasks based on its first argument.
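Putting the traced calls in order, the idle cycle can be summarized in pseudocode. This is purely my interpretation of the debugtrace output, not Oracle source code:

```
loop forever:
    ksl_enter_main_loop_wait()     -- prepare the semaphore sleep
    ksarcv()                       -- sleep on a semaphore, max. 3 seconds
    ksl_exit_main_loop_wait()
    ksbcti(0x12b84c18)             -- nr 1: process statistics/resource usage
    dbktFlush()                    -- write to the tracefile if necessary
    sltrgatime64()                 -- read the clock
    ksbcti(0x12b84be0)             -- nr 2: the redo write driver
    ksbcti(0x12b84bfc)             -- nr 3: miscellaneous checks
    sltrgatime64()
```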

The first ksbcti call following ksl_exit_main_loop_wait (with first argument 0x12b84c18) seems to be the function that, among other things, administers the log writer process resource usage details in the SGA:

 | | > kews_timeout(0x7f2c1eb93ec0, 0, ...)
 | | | > sltrgftime64(0x7f2c1eb93ec0, 0, ...)
 | | | | > clock_gettime@plt(0x1, 0x7ffc3dd10ad0, ...)
 | | | | | > clock_gettime(0x1, 0x7ffc3dd10ad0, ...)
 | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | < clock_gettime+0x00000000003a returns: 0
 | | | < sltrgftime64+0x00000000004c returns: 0x9cb6163fa
 | | | > slcpums(0x7f2c23ba3c58, 0x191ae1d5, ...)
 | | | | > getrusage(0x1, 0x7ffc3dd10a30, ...)
 | | | | | > fthread_self(0x1, 0x7ffc3dd10a30, ...)
 | | | | | < fthread_self+0x000000000024 returns: 0
 | | | | <> getrusage_ext(0x1, 0x7ffc3dd10a30, ...)
 | | | | < getrusage+0x00000000000f returns: 0
 | | | < slcpums+0x00000000008f returns: 0x31d592
 | | | > kews_sqlst_flush(0xc, 0x72fd7d40, ...)
 | | | | > sltrgftime64(0xc, 0x72fd7d40, ...)
 | | | | | > clock_gettime@plt(0x1, 0x7ffc3dd10a60, ...)
 | | | | | | > clock_gettime(0x1, 0x7ffc3dd10a60, ...)
 | | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | < sltrgftime64+0x00000000004c returns: 0x9cb61723f
 | | | | > slcpums(0x7f2c23ba3c58, 0x7ae0ee70, ...)
 | | | | | > getrusage(0x1, 0x7ffc3dd109c0, ...)
 | | | | | | > fthread_self(0x1, 0x7ffc3dd109c0, ...)
 | | | | | | < fthread_self+0x000000000024 returns: 0
 | | | | | <> getrusage_ext(0x1, 0x7ffc3dd109c0, ...)
 | | | | | < getrusage+0x00000000000f returns: 0
 | | | | < slcpums+0x00000000008f returns: 0x31e417
 | | | < kews_sqlst_flush+0x00000000016a returns: 0x7f2c23b9e900
 | | < kews_timeout+0x00000000052c returns: 0x7f2c23b9e900

kews_timeout means kernel event wait statistics; these are statistics updated because the process timed out. sltrgftime64 is an o/s specific call wrapper to read time from the operating system; it calls clock_gettime, which reads the system clock (argument 1 is CLOCK_MONOTONIC, a clock that is not affected by ‘discontinuous’ changes to the clock time; think of daylight saving time changes, for example). Similarly, slcpums is an o/s specific call wrapper to read cpu usage; it calls getrusage to obtain CPU accounting data from the operating system. kews_sqlst_flush seems to be related to automatic tuning features; this *probably* (I haven’t investigated further) flushes status data like time and cpu to a buffer so features like sql monitor and the adaptive features can use it.

The second ksbcti call (with first argument 0x12b84be0) is the one that performs the actual redo write, which is done by the kcrfw_redo_write_driver function:

 > ksbcti(0x12b84be0, 0x7f2c1eb80208, ...)
 | > ksbckbast(0, 0, ...)
 | < ksbckbast+0x000000000061 returns: 0x1
 | > ksumcl(0, 0, ...)
 | < ksumcl+0x000000000045 returns: 0x1540
 | > kcrfw_redo_write_driver(0, 0, ...) 
...

The next blogpost will look into what the kcrfw_redo_write_driver function actually performs. To spoil it a bit: the kcrfw_redo_write_driver function is called every 3 seconds, even if there’s nothing to write!

The third ksbcti call (with first argument 0x12b84bfc) performs a function or series of functions that are not clear to me, and the functions called inside this ksbcti invocation do too little to give me a hint:

 > ksbcti(0x12b84bfc, 0x7f2c1eb804b0, ...)
 | > ksbmsg(0, 0, ...)
 | < ksbmsg+0x000000000040 returns: 0x60065d90
 | > ksbxiaf(0, 0, ...)
 | < ksbxiaf+0x000000000058 returns: 0x795b5828
 | > kjci_action(0, 0, ...)
 | < kjci_action+0x000000000035 returns: 0x60065d90
 | > kcfsmpoll(0, 0, ...)
 | < kcfsmpoll+0x000000000030 returns: 0x60065d90
 | > ksuwaitsysevent(0, 0, ...)
 | < ksuwaitsysevent+0x0000000001e7 returns: 0x795b4aa8
 | > krdrsb_lgwr_chintr(0, 0, ...)
 | | > krsh_trace(0x2000, 0, ...)
 | | < krsh_trace+0x00000000005d returns: 0
 | < krdrsb_lgwr_chintr+0x00000000008b returns: 0
 | > ksbckbast(0, 0, ...)
 | < ksbckbast+0x000000000061 returns: 0x1
 < ksbcti+0x00000000018f returns: 0x1

Of course, several of these point to features that are not used in my database: krdrsb points to dataguard, and kjci points to RAC.

If you look again at the sequence, you might wonder why the logwriter does not execute the redo write function immediately after waking up from the semaphore in ksarcv, so it can write as soon as a process semctl’s/semop’s the semaphore asking the logwriter to write. Well, actually there is another sequence of function calls that can happen:

 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)
 > ksl_exit_main_loop_wait(0x6003ef40, 0x7af972b8, ...)
 > kcrfw_redo_write_driver(0x7ffc3dd122a0, 0x18, ...)         <<<<<
 > ksbcti(0x12b84c74, 0x7f2c1eb804b0, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)
 > dbktFlush(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)
 > sltrgatime64(0x7f2c23b9e9a0, 0x7f2c23b9ebe8, ...)
 > ksl_enter_main_loop_wait(0x1, 0x7a6032e8, ...)
 > ksarcv(0x7ffc3dd12338, 0x7ffc3dd122a0, ...)

Here you see ksarcv, with kcrfw_redo_write_driver called immediately after ksarcv and ksl_exit_main_loop_wait, not inside ksbcti. What happened in ksarcv is that the semtimedop call returned 0, indicating the semaphore was semctl’ed or semop’ed, instead of timing out after 3 seconds. If you paid close attention, you saw that the first argument of the ksbcti call following kcrfw_redo_write_driver is different (0x12b84c74). I am not sure why, because the same functions are called as in the call with 0x12b84bfc as first argument (the ones with functions hinting at dataguard and RAC). The ksarcv call that follows does not put the process to sleep on a semaphore; it reads the asynchronous messages structure in the fixed SGA (ksasga_), after which it returns immediately. What follows are dbktFlush and sltrgatime64 for administrative purposes (writing to the tracefile if necessary, and updating resource usage information), after which the process prepares the semaphore sleep using ksl_enter_main_loop_wait and then goes to sleep on a semaphore in ksarcv.

Please note this is the general cycle; the log writer performs other tasks too. One example of such a task is checking whether the process monitor (PMON) process is still alive, which the log writer does every minute.
