A look into oracle redo, part 9: commit
The previous blogpost talked about a simple insert, this blogpost investigates what happens when the DML is committed. Of course this is done with regular commit settings, which means means they are not touched, which means commit_logging is set to immediate and commit_wait is set to wait as far as I know. The documentation says there is no default value, and the settings are empty in all parameter views. In my humble opinion, if you must change the commit settings in order to make your application perform usable with the database, something is severely wrong somewhere.
This blogpost works best if you thoroughly gone through the previous post. I admit it’s a bit dry and theoretical, but you will appreciate the knowledge which you gained there, because it directly applies to a commit.
First let’s look at the flow of functions for the commit:
kpoal8 opiexe kksExecuteCommand xctCommitTxn xctctl k2send k2lcom ktdcmt ktcCommitTxn ktcCommitTxn_new qesrcCM_Enabled qesrcCM_PreCmt_ qesrcTol_New ktuIMTabPresetTxn ktucmt kcbchg1 kcbchg1_main ktiimu_chg kcopcv kcrfw_redo_gen_ext kcrfw_copy_cv kcscur_rba kcbapl kcbhlchk kcoapl kco_issue_callback kturcm kco_blkchk kcoadv_hdr kcbhfix_tail kcrfw_partial_checksum kcrf_commit_force_int kcscu8 kcscu8 ksbasend ktuulc ksqrcl ksqrcli_int ktudnx kssdct kssdch_int kssdel kss_del_cb ktldbl ktldbl_noredo kcbnlc ktbdbc kssdch_int kssdel kss_del_cb ktaidm ksqrcl ksupop kcrf_commit_force_int kcscu8 kcscu8 ksbasend kslawe kcscu8 kslwtbctx kslwaitctx - kcscu8 kslawe kcscu8 kslwaitctx - kcscu8 kslawe kcscu8 kslwaitctx - kcscu8 kslwtectx
Please mind this is an overview of functions which is not complete, it provides enough information to show the flow of functions I want to highlight. There are much more functions involved during the execution of commit.
The first thing that is visible here is that after the kpoal8/opiexe/kksExecuteCommand (kernel compile shared objects Execute Command) function are xct (transaction control) functions. Of course this is logical, a commit ends a transaction and makes changes visible. The xct layer then moves into the k2 layer, which is the distributed execution layer. I am not doing anything distributed, it is my current understanding that this layer is invoked this way so that if anything distributed was pending, it would be handled appropriately. After the k2 layer the function ktdcmt (kernel transaction distributed commit) is executed.
After the distributed layers we enter the ktc layer (kernel transaction control). In ktcCommitTXN_new I see handling of features like the result cache (qesrc) and In-Memory (ktuIM), then the ktu layer (kernel transaction undo) is entered, which enters the kcb layer (kernel cache buffers) using functions we saw in the previous post: kcbchg1 and kcbchg1_main.
In fact, at this point it looks very similar to the insert, in kcbchg1_main, ktiimu_chg and kcopcv (prepare change vectors) are called, but only once (because a commit only involves one block, see a little further for the explanation) instead of three times as we saw with the insert. Then kcrfw_redo_gen_ext is called, which is doing almost the same as the insert: first kcrfw_copy_cv is executed to copy the change vector to the public redo strand, then kcbapl is called to apply the change to a buffer. The kco_issue_callback function calls kturcm (kernel transaction undo redo commit) indicating the type change to the buffer. This means that a commit changes a single buffer, which is the buffer that holds the transaction in the transaction table in the undo segment, and the change is marking the transaction as committed. So a ‘commit marker’ is not a special token that is written into the redo stream, but in fact it’s simply a block change, just like all other change vectors.
After kcbapl, kcrfw_partial_checksum is called to checksum the redo in the public redo strand, again just like we saw with the insert.
Unique to a commit is the invocation of the kcrf_commit_force_int function. This is the first ‘use’ of the kcrf_commit_force_int function (indicated by the second function argument set to zero, not visible in the overview), which is signalling the logwriter to write any unwritten change vectors in the public redo strands. kcrf_commit_force_int checks the on disk SCN and the LWN SCN using kcscu8 (kernel cache scn management read current SCN) in the kcrfsg_ struct to check logwriter progress:
– If the on disk SCN is beyond the process’ commit SCN, the change vectors are written, and no further action is necessary (this function is quit), which also means a second invocation of kcrf_commit_force_int is not necessary.
– If the on disk SCN isn’t progressed beyond the process’ commit SCN, but the LWN SCN is, it means the logwriter is currently writing the change vectors for this commit SCN. In that case there is no need to signal the logwriter, but it requires the process to validate the write later using the second invocation of kcrf_commit_force_int.
– If both the on disk SCN and LWN SCN did not progress beyond or are equal to the process’ commit SCN, this invocation of kcrf_commit_force_int needs to send the logwriter a message using ksbasend (kernel service background processes asynchronous send message) to start writing the public redo strands. ksbasend will only send a message if the messages flag in the kcrfsg_ struct is not set indicating it has already been signalled.
After which the kcrf_commit_force_int function is returned from, as well as the kcrfw_redo_gen_ext function, so we are back in kcbchg1_main.
Also different from an insert is the invocation of the ktuulc (kernel transaction undo unlock; this is a guess) function. Which calls ksqrcl (kernel service enqueue release an enqueue), which calls ksqrcli_int (my guess this (=the addition of _int) is an enhanced version of the enqueue release function), which performs the clearance of the TX enqueue set for the inserted row. This clearance is not a guess, ksqrcli_int does clear the TX enqueue for the inserted row. After clearing the row lock, some more functions returned from: kcbchg1_main and kcbchg1, so we are back in ktucmt.
Because the transaction is now committed, the active transaction count in the undo segment can be decreased, which is done in ktudnx (kernel transaction undo decrease active transaction count). Then the ktucmt function is returned from too, and we are back in ktcCommitTxn_new.
In ktcCommitTxn_new state objects are cleaned up using kssdct (kernel service state object delete children of specified type). A state object is a memory area that keeps the state of various things that are transient, so if they get lost, the state object reflects the last known state. The callback action of the function performs some more housekeeping, the ktldbl (kernel transaction list blocks changed delete list of block SO’s) function removes block SO’s/buffer handles, which calls kcbnlc (kernel cache buffers analyse cleanout), which calls ktbdbc (kernel transaction block fast block cleanout) to perform delayed block cleanout/commit cleanout. This cleans up the state in the data block, which means it cleans up the lock bytes, set Fsc/Scn to the commit SCN and set the commit flag to C— in the ITL in the block.
The next state object that is cleaned is the shared table lock (TM); by looking at the functions it’s quite easy to understand that this is happening, ksqrcl is the function to release and enqueue, and ktaidm is kernel transaction access internal deallocate dml lock function.
Past releasing the TM enqueue, there are other things done for which I didn’t put their function names in, but the execution is returning from a lot of the other functions shown as calling functions. Of course Oracle needs to update all kind of counters and usage statistics, and record audit information. But eventually, everything has been released. However, there is something more that is executed as part of a commit. This is the second invocation of the kcrf_commit_force_int function.
Actually, when kcrf_commit_force_int is executed for the second ‘use’, this is visible with the second argument of the calling arguments is set to ‘1’ (not visible in the function call overview above). The functions that are executed in kcrf_commit_force_int are actually exactly the same as the first invocation:
– kcscu8 is called to read the on disk SCN from kcrfsg_
– kcscu8 is called to read the LWN SCN from kcrfsg_
Also the same logic is applied to the values that are the result of calling kcscu8 to read the SCN values as stated previously. If these SCNs did not progress far enough, ksbasend is called.
The interesting thing of the second execution of kcrf_commit_force_int happens after ksbasend: the kcrf_commit_force_int function loops until the on disk SCN has progressed beyond the process’ commit SCN (which means the change vectors are written from the public redo strands into the online redologfiles). To indicate it’s waiting/looping for the on disk SCN to progress that far, the wait interface is called (kslwtbctx) for the wait ‘log file sync’, after which it loops, for which I put a hyphen before the start of the loop to indicate what to loop consists of.
I illustrated the post/wait mode of log file sync, which is visible with ‘kslawe’ (kernel service lock management add post-wait entry). The post-wait entry is removed inside kslwaitctx, and then setup again. Interestingly, when in post-wait mode, the process must be posted by the logwriter, even if it finds the on disk SCN to have progressed beyond the process’ commit SCN. The other mode for waiting for the on disk SCN is called ‘polling’, search my blog for articles about it if this sparked your interest.
The intention of this blogpost is not to bury you in Oracle internal functions, despite the look of the article and the amount of functions mentioned :-). The aim for spelling out the functions is to show what happens, and to learn about the layers in which they execute.
If you skip past the first couple of functions that are executed with a commit, the ktc (kernel transaction control) layer is crossed, then the ktu (kernel transaction undo) layer, after which the change is executed under supervision of the kcb (kernel cache buffer) layer.
In fact, the rough outline of the change is the same as described in the previous article about insert: kcbchg1, kcbchg1_main, kcopcv, kcrfw_redo_gen_ext, etc. Just like with the insert, the function called in the function kco_issue_callback sets the type of block change, which is kturcm with commit.
A commit is a change to the block that holds the undo segment’s transaction table, and flags the current transaction as committed. This is what is commonly referred to as a ‘commit marker’.
After the kturcm function, the transaction is changed to the status committed. However, if you look closely, there are several functions executed AFTER kturcm, like kco_blkchk, kcoadv_hdr and kcbhfix_tail that complete the change made in kturcm in order to make the block consistent.
After block changes and the change vector checksum in kcrfw_partial_checksum, a function unique to commit is executed: kcrf_commit_force_int. The first time invocation of this function signals the logwriter to write.
At the time of kcrf_commit_force_int and returning from it into the function kcrfw_redo_gen_ext back to kcbchg1_main, the newly inserted value is not available, but when the execution in the kcbchg1_main function reaches ktuulc to clean up the TX enqueue, the the NEW value becomes available!
This is something which I still do find counter intuitive, because this means at the above mentioned time, which is prior to reaching ktuulc the change becomes visible to all sessions but the committing session. The committing session at that point needs to clean up the block a little, and later on remove the shared TM enqueue, and after that, the committing session executes kcrf_commit_force_int again to wait for the ‘commit marker’ and obviously all successive change vectors to complete. WHILE ALL OTHER SESSIONS CAN SEE AND USE THE CHANGED DATA FOR WHICH THE COMMITTING SESSION IS WAITING!
Pingback: A look into oracle redo, part 9a: commit – concurrency considerations | Frits Hoogland Weblog
Pingback: A look into oracle redo: index and overview | Frits Hoogland Weblog