A look into Oracle redo, part 6: oracle post-wait commit and the on disk SCN

This is the sixth part in a blog series about Oracle database redo. The previous posts provided information about the log writer writing, this post is about the process that is waiting after issuing commit for the log writer to write it’s redo from the public redo strand. When the database is using post/wait for process commits, the committing process follows the following (simplified) procedure:

kcrfw_redo_gen_ext: 
 kcsgbsn8: kcsnew8: commit SCN is fetched from kcsgscn_
 kslgetl: redo copy latch is taken
 kslgetl: redo allocation latch is taken
 : allocation annotations are made for the commit record in kcrfsg_ and KCRFA
 kslfre: redo allocation latch is freed
 kcrfw_copy_cv : commit record is copied into a public redo strand.
 kslfre: redo copy latch is freed
 kcrfw_commit_force_int: second argument 0: post logwriter if not already posted

These are the actual function calls inside kcrf_commit_force_int with second argument 0:

| | | | | | | | | | | | | | | | > kcrf_commit_force_int(0x7fffbea0de50, 0, ...)
| | | | | | | | | | | | | | | | | > kcscu8(0x60016290, 0x7fffbea0dc78, ...)
| | | | | | | | | | | | | | | | | < kcscu8+0x000000000047 returns: 0x1
| | | | | | | | | | | | | | | | | > kcscu8(0x60016260, 0x7fffbea0dbc0, ...)
| | | | | | | | | | | | | | | | | < kcscu8+0x000000000047 returns: 0x1
| | | | | | | | | | | | | | | | | > ksbasend(0x600161a0, 0x60016908, ...)
| | | | | | | | | | | | | | | | | | > ksbpro(0x600161a0, 0x60016908, ...)
| | | | | | | | | | | | | | | | | | < ksbpro+0x000000000063 returns: 0x7a6032e8
| | | | | | | | | | | | | | | | | | > ksbpser(0x600161a0, 0x60016908, ...)
| | | | | | | | | | | | | | | | | | < ksbpser+0x000000000046 returns: 0x1
| | | | | | | | | | | | | | | | | | > ksberr(0x600161a0, 0, ...)
| | | | | | | | | | | | | | | | | | < ksberr+0x000000000045 returns: 0
| | | | | | | | | | | | | | | | | | > ksasnd(0x7a6032e8, 0x1, ...)
| | | | | | | | | | | | | | | | | | | > ksasndint(0x7a6032e8, 0x1, ...)
| | | | | | | | | | | | | | | | | | | | > ksaamb(0x7a63a0b8, 0x7a5d9fe0, ...)
| | | | | | | | | | | | | | | | | | | | | > kslgetl(0x6003ef40, 0x1, ...)
| | | | | | | | | | | | | | | | | | | | | < kslgetl+0x00000000012f returns: 0x1
| | | | | | | | | | | | | | | | | | | | | > _intel_fast_memcpy(0x7af99780, 0x7fffbea0dba8, ...)
| | | | | | | | | | | | | | | | | | | | | <> _intel_fast_memcpy.P(0x7af99780, 0x7fffbea0dba8, ...)
| | | | | | | | | | | | | | | | | | | | | <> __intel_ssse3_rep_memcpy(0x7af99780, 0x7fffbea0dba8, ...)
| | | | | | | | | | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x000000002030 returns: 0x7af99780
| | | | | | | | | | | | | | | | | | | | | > kslfre(0x6003ef40, 0x60009578, ...)
| | | | | | | | | | | | | | | | | | | | | < kslfre+0x0000000001e2 returns: 0
| | | | | | | | | | | | | | | | | | | | < ksaamb+0x0000000001ed returns: 0x1
| | | | | | | | | | | | | | | | | | | < ksasndint+0x000000000120 returns: 0x1
| | | | | | | | | | | | | | | | | | | > ksl_post_proc(0x7a6032e8, 0, ...)
| | | | | | | | | | | | | | | | | | | <> kskpthr(0x7a6032e8, 0, ...)
| | | | | | | | | | | | | | | | | | | <> kslpsprns(0x7a6032e8, 0, ...)
| | | | | | | | | | | | | | | | | | | | > ksl_update_post_stats(0x7a6032e8, 0, ...)
| | | | | | | | | | | | | | | | | | | | | > dbgtTrcData_int(0x7f5b0ca886c0, 0x2050031, ...)
| | | | | | | | | | | | | | | | | | | | | | > dbgtBucketRedirect(0x7f5b0ca886c0, 0x7fffbea0d618, ...)
| | | | | | | | | | | | | | | | | | | | | | < dbgtBucketRedirect+0x000000000050 returns: 0x1
| | | | | | | | | | | | | | | | | | | | | | > dbgtIncInMemTrcRedirect(0x7f5b0ca886c0, 0x6fa, ...)
| | | | | | | | | | | | | | | | | | | | | | < dbgtIncInMemTrcRedirect+0x000000000035 returns: 0x1
| | | | | | | | | | | | | | | | | | | | | | > skgstmGetEpochTs(0x7f5b0ca886c0, 0x6fa, ...)
| | | | | | | | | | | | | | | | | | | | | | | > gettimeofday@plt(0x7fffbea0d120, 0, ...)
| | | | | | | | | | | | | | | | | | | | | | | < __vdso_gettimeofday+0x0000000000fe returns: 0
| | | | | | | | | | | | | | | | | | | | | | < skgstmGetEpochTs+0x000000000049 returns: 0x203895d2b55e6
| | | | | | | | | | | | | | | | | | | | | | > dbgtrRecAllocate(0x7f5b0ca886c0, 0x7fffbea0d5c0, ...)
| | | | | | | | | | | | | | | | | | | | | | | > dbgtrPrepareWrite(0x7f5b0ca886c0, 0x711f5048, ...)
| | | | | | | | | | | | | | | | | | | | | | | < dbgtrPrepareWrite+0x00000000011c returns: 0x1c
| | | | | | | | | | | | | | | | | | | | | | < dbgtrRecAllocate+0x000000000144 returns: 0x1
| | | | | | | | | | | | | | | | | | | | | | > _intel_fast_memcpy(0x711f6ac0, 0x7fffbea0d6b8, ...)
| | | | | | | | | | | | | | | | | | | | | | <> _intel_fast_memcpy.P(0x711f6ac0, 0x7fffbea0d6b8, ...)
| | | | | | | | | | | | | | | | | | | | | | <> __intel_ssse3_rep_memcpy(0x711f6ac0, 0x7fffbea0d6b8, ...)
| | | | | | | | | | | | | | | | | | | | | | < __intel_ssse3_rep_memcpy+0x000000002798 returns: 0x711f6ac0
| | | | | | | | | | | | | | | | | | | | | | > dbgtrRecEndSegment(0x7f5b0ca886c0, 0x7fffbea0d5c0, ...)
| | | | | | | | | | | | | | | | | | | | | | < dbgtrRecEndSegment+0x00000000011c returns: 0x77c000a4
| | | | | | | | | | | | | | | | | | | | | < dbgtTrcData_int+0x000000000323 returns: 0x77c000a4
| | | | | | | | | | | | | | | | | | | | < ksl_update_post_stats+0x00000000024f returns: 0x77c000a4
| | | | | | | | | | | | | | | | | | | | > skgpwpost(0x7fffbea0d760, 0x7f5b0cacdca0, ...)
| | | | | | | | | | | | | | | | | | | | <> sskgpwpost(0x7fffbea0d760, 0x7f5b0cacdca0, ...)
| | | | | | | | | | | | | | | | | | | | | > semop@plt(0x38000, 0x7fffbea0d6f0, ...)
| | | | | | | | | | | | | | | | | | | | | < semop+0x00000000000f returns: 0
| | | | | | | | | | | | | | | | | | | | < sskgpwpost+0x00000000009a returns: 0x1
| | | | | | | | | | | | | | | | | | | < kslpsprns+0x0000000001c3 returns: 0
| | | | | | | | | | | | | | | | | | < ksasnd+0x00000000006d returns: 0x1
| | | | | | | | | | | | | | | | | < ksbasend+0x0000000000f6 returns: 0x1
| | | | | | | | | | | | | | | | < kcrf_commit_force_int+0x000000000500 returns: 0

Basically, what happens is that kcscu8 is called to read 0x60016290 (kcrfsg_.on disk SCN) and kcscu8 is called to read 0x60016260 (kcrfsg_.LWN SCN). Then ksbasend (kernel service background processes asynchronous send message) is called, which essentially performs two functions in ksasnd: write a message in the messages struct (protected by the messages latch) in ksaamb (kernel service asynchronousm messages allocated message buffer) and then post the logwriter using post/wait (ksl_post_proc, actual posting using semop is done in skgpwpost; system kernel generic post wait post).
This also directly ends/returns from the kcrfw_redo_gen_ext function that called kcrf_commit_force_int.

At this point the commit was executed, and the logwriter is notified that it needs to write up to the commit record from the public redo strand.

The next thing the process do is release things like the enqueue(s) it’s holding. After that, the process will wait for the logwriter to execute the write and get posted via post/wait. This is also done in the kcrf_commit_force_int function, however now the second argument is set to ‘1’:

 | | | <> kcrf_commit_force_int(0x7f3011d75e10, 0x1, ...)
 | | | | > skgstmGetEpochTs(0x7f3011d75e10, 0x7, ...)
 | | | | | > gettimeofday@plt(0x7ffda5959810, 0, ...)
 | | | | | < __vdso_gettimeofday+0x0000000000fe returns: 0
 | | | | < skgstmGetEpochTs+0x000000000049 returns: 0x206daa374e6a1
 | | | | > kcscu8(0x60016290, 0x7ffda5959b88, ...)
 | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | > kcscu8(0x60016260, 0x7ffda5959ad0, ...)
 | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | > ksbasend(0x600161a0, 0x60016908, ...)
 | | | | | > ksbpro(0x600161a0, 0x60016908, ...)
 | | | | | < ksbpro+0x000000000063 returns: 0x7a6032e8
 | | | | | > ksbpser(0x600161a0, 0x60016908, ...)
 | | | | | < ksbpser+0x000000000046 returns: 0x1
 | | | | | > ksberr(0x600161a0, 0, ...)
 | | | | | < ksberr+0x000000000045 returns: 0
 | | | | | > ksasnd(0x7a6032e8, 0x1, ...)
 | | | | | | > ksasndint(0x7a6032e8, 0x1, ...)
 | | | | | | | > ksaamb(0x7a6085b8, 0x7a5d9fe0, ...)
 | | | | | | | < ksaamb+0x000000000504 returns: 0
 | | | | | | < ksasndint+0x000000000120 returns: 0x1
 | | | | | < ksasnd+0x00000000006d returns: 0x1

Actually exactly the same is happening here as we saw previously. The way the kcrf_commit_force_int function works is:
– It first reads the on disk SCN (0x60016290) via kcscu8. If the on disk SCN is equal or past the process’ commit SCN, it means the process’ redo is written to disk, and the kcrf_commit_force_int function is quit.
– If the on disk SCN is not progressed far enough, the LWN SCN (0x60016260) is read via kcscu8. If the LWN SCN progressed beyond or is equal to the commit SCN, it means the logwriter is in the progress of writing the process’ redo, so it does not need posting via ksbasend. If the logwriter has not progressed the LWN SCN that far, the process tries posting via ksbasend again.
– In the above function call trace we see ksaamb returning 0, which means it did try to post the logwriter again, but it found the logwriter already posted (by looking at kcrfsg_+516). As a result, it returned directly from the function without creating a message (and thus needing to fetch the messages latch and post the logwriter via semop).

At this point in kcrf_commit_force_int everything is done to make the logwriter write the process’ redo in the public redo strand, so it needs to wait for the logwriter to finish, and the instance is using post wait the notification mechanism for log file sync.

The next thing that happens is a post-wait entry in the post wait queue is setup, and the on disk SCN is checked once more:

 | | | | > kslawe(0x10380, 0x7a5d9fe0, ...)
 | | | | | > kslgetl(0x7a5d72c8, 0, ...)
 | | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | | > kslfre(0x7a5d72c8, 0x7a5d7368, ...)
 | | | | | < kslfre+0x0000000001e2 returns: 0
 | | | | < kslawe+0x00000000027d returns: 0
 | | | | > kcscu8(0x60016290, 0x7ffda5959b88, ...)
 | | | | < kcscu8+0x000000000047 returns: 0x1

Now that the process registered itself in the post-wait queue, a wait is registered via kslwtbctx:

 | | | | > kslwtbctx(0x7ffda5959950, 0x7ffda5959b88, ...)
 | | | | | > sltrgftime64(0x7a6d9580, 0x7ae18b68, ...)
 | | | | | | > clock_gettime@plt(0x1, 0x7ffda5959700, ...)
 | | | | | | | > clock_gettime(0x1, 0x7ffda5959700, ...)
 | | | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | < sltrgftime64+0x00000000004c returns: 0x428fc8dac9
 | | | | | > kslwait_timeout_centi_to_micro(0x7fffffff, 0x443dfe7, ...)
 | | | | | < kslwait_timeout_centi_to_micro+0x000000000021 returns: 0x7fffffffffffffff
 | | | | | > kskthbwt(0, 0x428fc8dac9, ...)
 | | | | | < kskthbwt+0x0000000000a1 returns: 0x10
 | | | | | > kslwt_start_snapshot(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | < kslwt_start_snapshot+0x0000000000dc returns: 0x7a6d9db8
 | | | | < kslwtbctx+0x00000000050b returns: 0x7a6d9db8

Now that the process is in the wait event, the process needs to truly ‘wait’. In post-wait mode this is done on a semaphore, just like the logwriter does when sleeping in the wait event ‘rdbms ipc message’. This is how that looks like:

 | | | | > kslwaitctx(0x7ffda5959950, 0xa, ...)
 | | | | | > ksliwat(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | | > sltrgftime64(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffda5959280, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffda5959280, ...)
 | | | | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | < sltrgftime64+0x00000000004c returns: 0x428fc8e664
 | | | | | | > kslawe(0x10380, 0x459ffef, ...)
 | | | | | | < kslawe+0x000000000076 returns: 0x10380
 | | | | | | > kskiorm(0x7ae0ee70, 0, ...)
 | | | | | | < kskiorm+0x00000000001e returns: 0
 | | | | | | > kfias_iswtgon_ksfd(0x7ae0ee70, 0, ...)
 | | | | | | < kfias_iswtgon_ksfd+0x00000000002b returns: 0
 | | | | | | > kxdbio_has_work(0x7ffda59592a4, 0x6003d010, ...)
 | | | | | | < kxdbio_has_work+0x000000000027 returns: 0
 | | | | | | > skgpwwait(0x7ffda5959510, 0x7f3011d45ca0, ...)
 | | | | | | | > kslwait_conv_wait_time(0x186a0, 0x7f3011d45ca0, ...)
 | | | | | | | < kslwait_conv_wait_time+0x000000000027 returns: 0x186a0
 | | | | | | | > sskgpwwait(0x7ffda5959510, 0x7f3011d45ca0, ...)
 | | | | | | | | > semtimedop(0x58000, 0x7ffda5959208, ...)
 | | | | | | | | < semtimedop+0x000000000023 returns: 0xffffffffffffffff
 | | | | | | | | > __errno_location(0x58000, 0x7ffda5959208, ...)
 | | | | | | | | | > fthread_self(0x58000, 0x7ffda5959208, ...)
 | | | | | | | | | < fthread_self+0x000000000024 returns: 0
 | | | | | | | | < __errno_location+0x000000000010 returns: 0x7f3011d406a0
 | | | | | | | < sskgpwwait+0x00000000014e returns: 0
 | | | | | | < skgpwwait+0x0000000000e0 returns: 0
 | | | | | | > ksuSdiInProgress(0x10003, 0x10003, ...)
 | | | | | | < ksuSdiInProgress+0x000000000035 returns: 0
 | | | | | | > sltrgftime64(0x10003, 0x10003, ...)
 | | | | | | | > clock_gettime@plt(0x1, 0x7ffda5959280, ...)
 | | | | | | | | > clock_gettime(0x1, 0x7ffda5959280, ...)
 | | | | | | | | < clock_gettime+0x000000000069 returns: 0
 | | | | | | | < clock_gettime+0x00000000003a returns: 0
 | | | | | | < sltrgftime64+0x00000000004c returns: 0x428fcad86e
 | | | | | | > kslwo_getcbk(0x9d, 0x80f6298, ...)
 | | | | | | < kslwo_getcbk+0x000000000017 returns: 0
 | | | | | | > kgslwait_last_waitctx_time_waited_usecs(0x7f3011d409a0, 0x7a6da7f0, ...)
 | | | | | | < kgslwait_last_waitctx_time_waited_usecs+0x000000000045 returns: 0x1f20a
 | | | | | | > kslwt_update_stats_int(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | | | > kews_update_wait_time(0x5, 0x1fda5, ...)
 | | | | | | | < kews_update_wait_time+0x00000000002e returns: 0x6
 | | | | | | < kslwt_update_stats_int+0x00000000033e returns: 0x428fcad86e
 | | | | | | > ksliwat_cleanup(0x7ffda5959650, 0x7ffda5959628, ...)
 | | | | | | | > kslgetl(0x7a5d72c8, 0x1, ...)
 | | | | | | | < kslgetl+0x00000000012f returns: 0x1
 | | | | | | | > kslfre(0x7a5d72c8, 0x7a5d7378, ...)
 | | | | | | | < kslfre+0x0000000001e2 returns: 0
 | | | | | | < ksliwat_cleanup+0x0000000000e5 returns: 0
 | | | | | < ksliwat+0x000000003acf returns: 0
 | | | | < kslwaitctx+0x00000000012b returns: 0

The most prominent thing is skgpwwait (system kernel generic post-wait wait), which calls sskgpwwait (skgpwwait with an additional s in front, which often are os dependent calls, like in this call, calling semtimedop). We see semtimedop returning 0xffffffffffffffff (-1), which means semaphore wait timed out, which means the semaphore was not posted using semop before the timeout set (100ms for log file sync) for the semaphore wait.

Interestingly, a little further down se see ksliwat_cleanup. This function clears this process from the post-wait queue, not only indicated by the ‘cleanup’ part in ksliwat_cleanup, but also by the latch get of 0x7a5d72c8, this is the ‘post/wait queue’ latch. This means that when the kslwaitctx function is left, this process is not in the post/wait queue anymore. However, the process is still waiting in the wait event ‘log file sync’.

The next thing that happens is a call to ‘ksu_dispatch_tac’ (kernel service user dispatch timed-out action):

 | | | | > ksu_dispatch_tac(0x7a5d72c8, 0, ...)
 ...
 | | | | < ksu_dispatch_tac+0x0000000003b4 returns: 0

Essentially, this routine validates certain conditions to trigger actions because the wait in kslwaitctx has timed out.

At this point the process repeats the steps of:
– kslawe: registering itself in the post-wait queue
– kcscu8: check on disk SCN
– kslwaitctx, semtimedop: wait to be posted
– (if not posted) ksliwat_cleanup: remove post-wait entry
– (if not posted) kcscu8: check on disk SCN
(the double check of on disk SCN via kcscu8 is not a typo!)

If the semaphore gets posted (visible by return code 0):

 | | | | | | | | > semtimedop(0x58000, 0x7ffda5959208, ...)
 | | | | | | | | < semtimedop+0x000000000012 returns: 0

The post-wait cleanup function (ksliwat_cleanup) does not require a latch when the process has been posted, probably because now that the entry has been used, there is no chance it will be used again, so it’s safe to cleanup without serialising access via a latch. After the ksliwat_cleanup returns, the process also returns from ksliwat and kslwaitctx.

Next kcscu8 is called to read the on disk SCN again, and the wait event is ended with kslwtectx, some statistics about redo writing are written (kcrfw_stats_syncoverhead_record), and the function kcrf_commit_force_int is ended, because when the semaphore is posted, the wait for the logwriter to finish is done, meaning the commit is entirely finished:

 | | | | > kcscu8(0x60016290, 0x7ffda5959b88, ...)
 | | | | < kcscu8+0x000000000047 returns: 0x1
 | | | | > kslwtectx(0x7ffda5959950, 0x7ffda5959b88, ...)
 | | | | | > kslwt_end_snapshot(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | | > kslwh_enter_waithist_int(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | | < kslwh_enter_waithist_int+0x00000000013a returns: 0x7a8378a8
 | | | | | | > kslwtrk_enter_wait_int(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | | < kslwtrk_enter_wait_int+0x000000000019 returns: 0x7a8378a8
 | | | | | < kslwt_end_snapshot+0x0000000000f2 returns: 0x7a6d9580
 | | | | | > kslwt_update_stats_int(0x7a6da7f0, 0x7a6da7f0, ...)
 | | | | | | > kews_update_wait_time(0x5, 0, ...)
 | | | | | | < kews_update_wait_time+0x00000000002e returns: 0x6
 | | | | | < kslwt_update_stats_int+0x00000000033e returns: 0
 | | | | | > kskthewt(0x4290257c02, 0x9d, ...)
 | | | | | < kskthewt+0x0000000000c3 returns: 0xa030
 | | | | < kslwtectx+0x0000000003f9 returns: 0xe923
 | | | | > skgstmGetEpochTs(0x4290257c02, 0x9d, ...)
 | | | | | > gettimeofday@plt(0x7ffda5959810, 0, ...)
 | | | | | < __vdso_gettimeofday+0x0000000000fe returns: 0
 | | | | < skgstmGetEpochTs+0x000000000049 returns: 0x206daa3d1ca1b
 | | | | > kcrfw_stats_syncoverhead_record(0x7f3011d75e10, 0x206daa3d1ca1b, ...)
 | | | | | > kcrfw_stats_writeinfo_find(0x7f3011d75e10, 0x206daa3d1ca1b, ...)
 | | | | | < kcrfw_stats_writeinfo_find+0x0000000000fb returns: 0x206daa3d1a057
 | | | | < kcrfw_stats_syncoverhead_record+0x0000000000a4 returns: 0x29c4
 | | | < kcrf_commit_force_int+0x000000000b9c returns: 0x1

What is clear, is that the process checks the on disk SCN a lot of times. The obvious reason for that would be to stop waiting whenver the on disk SCN indicates the process’ redo information is written. However, in the past I’ve heard about issues with sessions that were in post-wait mode waiting, which were ‘forgotten’ when the database switched to polling (this was Oracle 11.2.0.3, this issue is solved!). That however leads to an interesting question:

1. What happens when a process is posted by the logwriter, but the on disk SCN is NOT progressed beyond the process’ commit SCN?
2. What happens when a process is NOT posted by the logwriter, but the on disk SCN has progressed beyond the process’ commit SCN?

Let’s test that! Let’s test the first situation: a foreground posted, but the on disk SCN did not progress beyond the commit SCN of that process.

First we need to know the commit SCN for the inserting session. We know that the SCN is obtained from kcsgscn_, let’s get the address of it:

SQL> select ksmfsnam, ksmfsadr, ksmfssiz from x$ksmfsv where ksmfsnam = 'kcsgscn_';
KSMFSNAM                       KSMFSADR           KSMFSSIZ
------------------------------ ---------------- ----------
kcsgscn_                       00000000600113B8         48

Now attach to a foreground session with gdb, and put an read/write (acces) watch on 0x600113b8, set pagination off and let gdb continue:

(gdb) awatch *0x600113b8
Hardware access (read/write) watchpoint 1: *0x600113b8
(gdb) commands
Type commands for breakpoint(s) 1, one per line.
End with a line saying just "end".
>c
>end
(gdb) set pagi off
(gdb) c
Continuing.

Now insert something in the foreground session (anything is okay, as long as it’s a regular insert):

Hardware access (read/write) watchpoint 1: *0x600113b8

Old value = 374652
New value = 374701
0x000000000320e5a8 in kcsAdjustUsingForeignSCN ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f12a in kcsgrsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3ee68 in kcsgcsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f12a in kcsgrsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f03b in kcsgrsn8 ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f12a in kcsgrsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f12a in kcsgrsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f03b in kcsgrsn8 ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f12a in kcsgrsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f2d5 in kcsgssn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3e93b in kcscu8 ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3e93b in kcscu8 ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3f12a in kcsgrsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374701
0x0000000010e3ee68 in kcsgcsn ()

Pay attention to the output sequence of a watchpoint:
– ‘Hardware access …’ and watchpoint number and address.
– old and new value or read value.
– instruction pointer address and function.
Also look at the different functions with which an SCN is queried.

Now execute the commit:

Hardware access (read/write) watchpoint 1: *0x600113b8

Old value = 374701
New value = 374767
0x000000000320e5a8 in kcsAdjustUsingForeignSCN ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374767
0x0000000010e3ecc1 in kcsnew8 ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Old value = 374767
New value = 374768
0x0000000010e3ecdb in kcsnew8 ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Old value = 374768
New value = 374769
0x0000000010e3f12a in kcsgrsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374769
0x0000000010e3ee68 in kcsgcsn ()
Hardware access (read/write) watchpoint 1: *0x600113b8

Value = 374769
0x0000000010e3ee68 in kcsgcsn ()

If you take a close look, you see a new SCN for the commit is obtained in the aptly named function kcsnew8 at instruction pointer address 0x10e3ecdb. Now that we know the instruction pointer address, we can filter the watch:
Disable the previous watch, and create a second access watch point:

(gdb) dis 1
(gdb) awatch *0x600113b8 if ( $rip == 0x10e3ecdb )
Hardware access (read/write) watchpoint 2: *0x600113b8
(gdb) commands
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>c
>end
(gdb) c
Continuing.

Now issue another insert into and a commit:

Hardware access (read/write) watchpoint 2: *0x600113b8

Old value = 374908
New value = 374909
0x0000000010e3ecdb in kcsnew8 ()

Voila! commit SCN is 374909 in this case!

Because we know kcscu8 is called to read the on disk SCN from kcrfsg_, we can cheat and modify the on disk SCN in kcrfsg_ ourselves to make this lower than the commit SCN.

=> WARNING! Never EVER do this, or any of these techniques in a database that you care about! This can severely corrupt the database leading to an irrepairable situation! WARNING <=

First modify the watchpoint (2 in the above example) to stop when it is fired:

(gdb) commands 2
Type commands for breakpoint(s) 2, one per line.
End with a line saying just "end".
>end
(gdb) c
Continuing.

Now perform the insert and commit again. This will hang because the watchpoint will break execution:

Hardware access (read/write) watchpoint 2: *0x600113b8

Old value = 620035
New value = 620036
0x0000000010e3ecdb in kcsnew8 ()

Okay, so the commit SCN is 620036. Now break on kcscu8 and simply set the SCN to 620000, which undoubtly is lower than the commit SCN of 620036:

(gdb) break kcscu8
Breakpoint 3 at 0x10e3e910
(gdb) commands
Type commands for breakpoint(s) 3, one per line.
End with a line saying just "end".
>set {int}0x60016290=620000
>c
>end

Also set an access watch on 0x60016290, alias the on disk SCN, to validate what SCN the process reads:

(gdb) awatch *0x60016290
Hardware watchpoint 4: *0x60016290
(gdb) commands
Type commands for breakpoint(s) 4, one per line.
End with a line saying just "end".
>c
>end
(gdb)

Now attach with another gdb session to the logwriter, and keep the logwriter stopped after attaching.
After the logwriter is stopped, continue in the foreground session:

(gdb) c
Continuing.

What I regularly encounter, especially with new(er) versions of the Oracle database, is the following messages:

Program received signal SIGUSR2, User defined signal 2.
0x0000000010e3ecdb in kcsnew8 ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0x0000000010f973a5 in slaac_int ()
(gdb) c
Continuing.

It seems that a signal handler is triggered. In most cases just continuing will make the process running again.

After the signal handler, the following is shown in gdb:

Breakpoint 3, 0x0000000010e3e910 in kcscu8 ()
Hardware access (read/write) watchpoint 4: *0x60016290

Value = 620000
0x0000000010e3e93b in kcscu8 ()

Breakpoint 3, 0x0000000010e3e910 in kcscu8 ()
Hardware access (read/write) watchpoint 4: *0x60016290

Value = 620000
0x0000000010e3e93b in kcscu8 ()

Breakpoint 3, 0x0000000010e3e910 in kcscu8 ()
Hardware access (read/write) watchpoint 4: *0x60016290

Value = 620000
0x0000000010e3e93b in kcscu8 ()

Breakpoint 3, 0x0000000010e3e910 in kcscu8 ()
---Type <return> to continue, or q <return> to quit---

Quit using q and enter, and set pagination off: set pagi off, then continue.

Now let the logwriter continue in its gdb session (continue).

In the foreground debugger session, you'll see that the foreground did not do anything else than it already did:

Hardware access (read/write) watchpoint 4: *0x60016290

Value = 620000
0x0000000010e3e93b in kcscu8 ()

Breakpoint 3, 0x0000000010e3e910 in kcscu8 ()

Stop execution of the committing process, and disable breakpoint 3 (dis 3), which sets the on disk SCN to a value lower than the process' commit SCN, so it will pick up the correct current on disk SCN, which is beyond the process' commit SCN:

(gdb) c
Continuing.
Hardware access (read/write) watchpoint 4: *0x60016290

Old value = 620000
New value = 620865
0x0000000010e3e93b in kcscu8 ()

And the commit finishes! This proves that despite the post/wait posting, the session actually checks the on disk SCN using kcscu8 additional to the semaphore being posted, and only continues if the on disk SCN has progressed up to the commit SCN or further!

We can use these breakpoints to see if we can do the exact opposite: make sure the foreground is not posted, and then progress the on disk SCN beyond the commit SCN. At this time you should have two gdb sessions, one on a foreground process, and one on the logwriter. Both gdb sessions have the process they are attached to running.

Now insert and commit again. Because of the watchpoint on kcsgscn_ with the instruction pointer set to kcsnew8 at breakpoint 2, this will break/stop execution:

Hardware access (read/write) watchpoint 2: *0x600113b8

Old value = 621375
New value = 621376
0x0000000010e3ecdb in kcsnew8 ()

Break execution of the logwriter, so the foreground cannot be posted anymore (simply ctrl-C the gdb session attached to the log writer process).
Now change breakpoint 3 in the gdb session attached to the foreground to set the on disk SCN to 621400, which is higher than the commit SCN of 621376:

(gdb) commands 3
Type commands for breakpoint(s) 3, one per line.
End with a line saying just "end".
>set {int}0x60016290=621400
>c
>end

And enable breakpoint 3 again:

(gdb) enable 3

And then continue the foreground in gdb.

...
Hardware access (read/write) watchpoint 4: *0x60016290

Value = 621400
0x0000000010e3e93b in kcscu8 ()

Breakpoint 3, 0x0000000010e3e910 in kcscu8 ()
...

Repeatedly.

This shows watchpoint 4 begin triggered because of access to 0x60016290 (on disk SCN), and breakpoint 3 setting the on disk SCN to 621400.
Now continue the logwriter. This will finish the commit of the foreground session.

So conclusion here is that despite the on disk SCN having progressed beyond the foreground session's commit SCN value (commit SCN was 621376, on disk SCN was set to 621400), a commit will wait until it is explicitly posted via post-wait.

	Data Virtualizer… on Getting to know Oracle wait ev…
	Frits Hoogland on Oracle 12.2 wait event ‘…
	Andrew Markiewicz on Oracle 12.2 wait event ‘…
	YugabyteDB Off Campu… on My reasons for moving to Yugab…
	Harold Zubieta on Control file inconsistencies d…

—Frits Hoogland Weblog

IT Technology; Yugabyte, Postgres, Oracle, linux, TCP/IP and other stuff I find interesting

A look into Oracle redo, part 6: oracle post-wait commit and the on disk SCN

Leave a comment Cancel reply

Share this:

Leave a comment Cancel reply