A look into oracle redo, part 9a: commit – concurrency considerations

During the investigations of my previous blogpost about what happens during a commit and when the data becomes available, I used breaks in gdb (GNU debugger) at various places of the execution of an insert and a commit to see what is visible for other sessions during the various stages of execution of the commit.

However, I did find something else, which is very logical but easily overlooked: at certain moments access to the table is blocked/serialised in order to let a session make changes to blocks belonging to the table, or to peripheral blocks like undo, for the sake of consistency. These are changes made at the physical layer of an Oracle segment; at the logical level, the Oracle model still holds that writers don’t block readers.

The first question is whether this is a big issue. For that you should simply look at any normal usage of an Oracle database. I don’t believe these serialisation moments are an issue, because the majority of my clients do not experience serialisation problems.

However, this can become an issue if database processes cannot run at will. If processes are randomly stopped from executing by the operating system or hypervisor, a database process can be stopped while it is at one of the ‘serialisation points’ mentioned earlier. Access for all other processes then remains blocked until the process is scheduled again and gets the opportunity to release its exclusive access to the resource.
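The effect can be mimicked outside the database. The following is a minimal Python sketch, not Oracle code: a plain `threading.Lock` stands in for a latch or a busy buffer, and a `time.sleep` stands in for the holder being taken off CPU. The function name `blocked_time_when_holder_stalls` is made up for this illustration.

```python
import threading
import time


def blocked_time_when_holder_stalls(stall_seconds):
    """Return how long a waiter is blocked while the lock holder is 'off CPU'."""
    resource = threading.Lock()          # stands in for a latch or busy buffer
    holder_has_lock = threading.Event()

    def holder():
        with resource:
            holder_has_lock.set()
            # Normally the critical section is tiny. The sleep mimics the
            # holder being descheduled (or stopped in gdb) while holding it.
            time.sleep(stall_seconds)

    thread = threading.Thread(target=holder)
    thread.start()
    holder_has_lock.wait()               # make sure the holder really holds the lock
    started = time.monotonic()
    with resource:                       # the waiter continues only after release
        waited = time.monotonic() - started
    thread.join()
    return waited


if __name__ == "__main__":
    print(f"waiter was blocked for {blocked_time_when_holder_stalls(0.2):.3f}s")
```

The waiter does nothing wrong and needs the resource only briefly, yet its wait time is dictated entirely by how long the holder is prevented from running.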

What are the reasons a database process cannot run at will?
– CPU oversubscription. If more processes are running than there are CPUs available (think about cores and threads, and what they mean for your CPU architecture), the operating system needs to decide what gets to run and what has to wait. No operating system scheduler understands or can know when an Oracle database process is running in a critical part of code and therefore should not be stopped.
– Memory oversubscription. When a system runs short of memory it starts actively swapping, which means it moves active pages onto the swap device and reads active pages back from the swap device for processes that demand them. This activates additional tasks, and makes processes get stuck waiting for memory to become available at all kinds of points during execution. Often, swapping is the beginning of the end, meaning the services on the server stop functioning altogether for multiple reasons, like the Out Of Memory killer, or simply by timing out for the clients of a process.
– Oversubscription of CPU or memory using virtualisation. As a troubleshooter, I would say this is actually worse than the ‘simple oversubscription’ mentioned in the two earlier points, because typically the only visible part is the virtual machine. The virtual machine itself is not oversubscribed in this case; the oversubscription takes place at a layer invisible to the virtual machine, which then inexplicably performs unpredictably. It is simply not okay to oversubscribe any machine that needs to run something latency sensitive like the Oracle database, unless you fully understand all the trade-offs that come with it and you are able to detect when it starts to occur.
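As a rough first check for the CPU case, the load can be compared with the number of CPUs the operating system reports. This is only a sketch: the helper names `cpu_pressure` and `is_oversubscribed` and the threshold of 1.0 are assumptions for illustration, the load average on Linux also counts tasks in uninterruptible sleep, and inside a virtual machine this only shows the guest's view, not what the hypervisor is doing.

```python
import os


def cpu_pressure(runnable, cpus):
    """Ratio of runnable processes to CPUs; above 1.0 something has to wait."""
    if cpus <= 0:
        raise ValueError("cpus must be positive")
    return runnable / cpus


def is_oversubscribed(runnable, cpus, threshold=1.0):
    """True when there is more demand for CPU than there are CPUs (hypothetical threshold)."""
    return cpu_pressure(runnable, cpus) > threshold


if __name__ == "__main__":
    cpus = os.cpu_count() or 1
    try:
        load_1min = os.getloadavg()[0]   # Unix only: 1-minute load average
    except (OSError, AttributeError):
        load_1min = None
    if load_1min is None:
        print("load average not available on this platform")
    else:
        print(f"load {load_1min:.2f} on {cpus} CPUs, "
              f"oversubscribed: {is_oversubscribed(load_1min, cpus)}")
```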

Whenever database processes cannot run at will, you will see waits you didn’t see when processes could run at will. Typical waits in that case are (not an exhaustive list):
– any latch waits
– buffer busy waits
– any cursor: pin waits
Please mind these waits are not unique to this issue; any of these waits can occur for other reasons too.

Let me show a couple of examples for which the wait is artificially created by stopping execution at a point where this serialisation takes place, thereby mimicking getting put off cpu due to load, using the earlier insert and commit execution:

a) latch: cache buffers chains
Whenever a process needs to read or modify a database buffer, it needs to ‘pin’ the buffer. Such a pin is also known as a buffer handle or a buffer state object. The pin, which is a memory area, must be obtained from a freelist and attached to the buffer header in order to pin the buffer. A pin is obtained using the kcbzgs (kernel cache buffers helper functions get state object) function, which calls the kssadf_numa_intl (kernel service state objects add from freelist numa internal) function which initialises the state object.

There is no single function that performs the actual pin of a block; pinning is done in multiple functions. Two of them are kcbgtcr (kernel cache buffers get consistent read) and kcbgcur (kernel cache buffers get current). Under normal circumstances, concurrency for a consistent read of a buffer (kcbgtcr) does not lead to any blocking, because the latch (cache buffers chains) can be taken in shared mode, and the pinning is done in ‘fast mode’ (my interpretation):

> kcbgtcr(0x7f88a8b79db0, 0, ...)
| > kscn_to_ub8_impl(0x7f88a8b88d4c, 0x6baefdc0, ...)
| < kscn_to_ub8_impl+0x00000000003e returns: 0x287562
| > ktrexf(0x7fff998123f8, 0x7f88ae5d7f30, ...)
| | > ktrEvalBlockForCR(0x7f88a8b88d4c, 0x7f88ae5d7f30, ...)
| | < ktrEvalBlockForCR+0x0000000000f4 returns: 0x1
| < ktrexf+0x0000000000a4 returns: 0x9
| > kcbzgsf(0x1, 0x7fff998120b8, ...)
| | > kssadf_numa_intl(0x36, 0x6ddc0b50, ...)
| | | > kpdbIdToUid(0, 0x6ddc0b50, ...)
| | | < kpdbIdToUid+0x0000000000e7 returns: 0
| | < kssadf_numa_intl+0x000000000235 returns: 0x6bf84908
| < kcbzgsf+0x0000000001a0 returns: 0x6bf84908
| > kcbz_fp_cr(0x8dff9478, 0x6bf84998, ...)
| < kcbz_fp_cr+0x000000000089 returns: 0
| > kscn_to_ub8_impl(0x7f88a8b88d70, 0x1, ...)
| < kscn_to_ub8_impl+0x00000000003e returns: 0
< kcbgtcr+0x000000000b38 returns: 0x8df94014

Inside the kcbgtcr function, kcbzgsf (kernel cache buffers helper functions test state object fast) is called, and after that, the state object is pinned to the block in kcbz_fp_cr (kernel cache buffers helper functions fast pin for consistent read). Do you see the absence of a ksl_get_shared_latch call to serialise access while adding this pin? The cache buffers chains latch is actually gotten, but it’s done inside the kcbgtcr function itself. The latch is gotten in shared mode before ktrexf (kernel transaction redo examine block fast), and it is released after the kcbz_fp_cr function. Multiple processes can pin the block for consistent read without serialisation or blocking each other, because the cache buffers chains latch is a shared latch and it is taken in shared mode.

Why do I show this? In the “old” days, meaning Oracle 8.1, Oracle did not use shared latches (at least on the platforms I worked on; if my memory serves me right, shared latches were introduced starting from Oracle 9), which meant that for any logical read the cache buffers chains latch had to be obtained exclusively. This could get painful for very frequently visited blocks, like index root blocks, under high concurrency. In addition, because there was no post/wait messaging for latch waits, only spinning, this meant lots of processes waiting for latch number 98 and CPU usage going through the roof.

Back to today. Below is shown what the function calls look like when DML is done. When a buffer change is done (data is changed), a buffer has to be obtained in current mode, which is done using the kcbgcur function:

> kcbgcur(0x7ffda5957c70, 0x2, ...)
| > kcbzgs(0x1, 0x2, ...)
| | > kssadf_numa_intl(0x36, 0x7a6086c8, ...)
| | | > kpdbIdToUid(0, 0x7a6086c8, ...)
| | | < kpdbIdToUid+0x0000000000e7 returns: 0
| | < kssadf_numa_intl+0x000000000235 returns: 0x7867c9c0
| < kcbzgs+0x000000000178 returns: 0x7867c9c0
| > kti_is_imu(0x7867ca50, 0xb0f71018, ...)
| | > ktcgtx(0x76f29d50, 0xb0f71018, ...)
| | < ktcgtx+0x00000000001e returns: 0x76f29d50
| < kti_is_imu+0x000000000039 returns: 0
< kcbgcur+0x0000000009fb returns: 0xb0302014

This looks much the same as the previous call overview, which showed kcbgtcr. However, the kcbzgsf function changed to kcbzgs, so minus the ‘f’ for fast, and a few other functions are missing. Another interesting thing (not visible) is that the cache buffers chains latch is obtained after the kcbzgs function that obtains the buffer state object, and the latch is obtained with the special bit 0x1000000000000000 set to indicate the shared latch is obtained in non-shared mode. After the kti_is_imu function the pin is attached to the buffer and the latch is freed.

To make things a little more complicated, a buffer can be gotten in current mode while the cache buffers chains latch is still taken in shared mode. This is what that looks like:

> kcbgcur(0x7ffda5955620, 0x1, ...)
| > kcbzgs(0x1, 0x1, ...)
| | > kssadf_numa_intl(0x36, 0x7a6086c8, ...)
| | | > kpdbIdToUid(0, 0x7a6086c8, ...)
| | | < kpdbIdToUid+0x0000000000e7 returns: 0
| | < kssadf_numa_intl+0x000000000235 returns: 0x7867c9c0
| < kcbzgs+0x000000000178 returns: 0x7867c9c0
| > kcbz_fp_shr(0xb0fa3cb8, 0x7867ca50, ...)
| < kcbz_fp_shr+0x000000000085 returns: 0x7f3011d82f40
< kcbgcur+0x0000000009fb returns: 0xb07a0014

The function kcbzgs without ‘f’ is called, but the cache buffers chains latch is gotten in shared mode (not visible; done in the kcbgcur function after kcbzgs), and there’s the function kcbz_fp_shr (kernel cache buffers helper function fast pin shared) to pin the buffer in shared mode.

In order to be allowed to change a buffer, the cache buffers chains latch must be taken in non-shared mode to change the buffer header to indicate the buffer is busy and guarantee only the process itself can access it. Obtaining the cache buffers chains latch in non-shared mode means that access to the hash bucket (multiple hash buckets actually) to which the buffer is linked is blocked until the latch is freed. Please mind that the period for which the latch is held here is very short.
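The difference between the two modes can be captured in a toy model. This is a sketch under assumptions, not Oracle's implementation: the special bit 0x1000000000000000 comes from the observations in this post, but representing shared holders as a simple count, and storing a holder identifier in the low bits, are assumptions made for the illustration. The class name `SharedLatchModel` is hypothetical.

```python
EXCLUSIVE_BIT = 0x1000000000000000   # special bit observed for non-shared gets


class SharedLatchModel:
    """Toy shared latch: 0 = free, small number = count of shared holders
    (assumption), EXCLUSIVE_BIT | holder_id = held in non-shared mode."""

    def __init__(self):
        self.value = 0

    def try_get_shared(self):
        if self.value & EXCLUSIVE_BIT:
            return False                 # a non-shared holder blocks everyone
        self.value += 1                  # one more shared holder
        return True

    def try_get_exclusive(self, holder_id):
        if self.value != 0:
            return False                 # any holder, shared or not, blocks this get
        self.value = EXCLUSIVE_BIT | holder_id
        return True

    def free(self):
        if self.value == 0:
            raise RuntimeError("latch is not held")
        if self.value & EXCLUSIVE_BIT:
            self.value = 0               # non-shared holder releases the latch
        else:
            self.value -= 1              # one shared holder leaves


if __name__ == "__main__":
    latch = SharedLatchModel()
    print(latch.try_get_shared(), latch.try_get_shared())   # readers coexist
    print(latch.try_get_exclusive(0x22))                    # writer has to wait
```

In this model any number of shared gets succeed together, while a single non-shared get excludes everything else for as long as it is held, which is why the length of that hold matters so much.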

However, this blocking scenario can be easily replayed:
1. session 1: start up a database foreground session.
2. session 2: obtain the PID of session 1’s process and attach to it with gdb.
3. session 2: break on the kti_is_imu function (break kti_is_imu) and continue.
4. session 1: insert a row into a table (insert into test values ('a');). The breakpoint will fire in gdb, stopping execution.
5. session 3: start up another database foreground session.
6. session 3: select from the table on which the insert just broke (select * from test;). This will hang.
7. session 4: start up a SYSDBA session, and obtain the wait state of the foreground sessions:

SQL> @who active

SID_SERIAL_INST OSPID    USERNAME      C SERVER    PROGRAM                                          EXTRA
--------------- -------- ------------- - --------- ------------------------------------------------ -------------
59,53450,@1     28166    SYS           * DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:0ms,event:ON CPU:SQL,seq#:98
102,53150,@1    28083    TS              DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:66867.83ms,event:ON CPU:SQL,seq#:33
101,6637,@1     28186    TS              DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:344013.82ms,event:latch: cache buffers chainsp1:1811473056p2:289p3:24599> Blocked by (inst:sid): 1:102,seq#:39

SID 101 is waiting on the cache buffers chains latch, because SID 102 is holding it. SID 102 is showing ON CPU because we broke execution outside of a wait event, so Oracle thinks it’s busy executing, while it is actually stopped by gdb.
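The EXTRA column in the output above runs the event name and its parameters together, which makes it hard to read. A small sketch to pull them apart is shown here; the function name `parse_wait` and the regular expression are assumptions based only on the output format shown in this post, not part of the script that produced it.

```python
import re

# Matches "event:<name>p1:<n>p2:<n>p3:<n>" as it appears in the EXTRA column.
WAIT_RE = re.compile(
    r"event:(?P<event>.+?)\s*p1:(?P<p1>\d+)\s*p2:(?P<p2>\d+)\s*p3:(?P<p3>\d+)"
)


def parse_wait(extra):
    """Return event name and p1/p2/p3 as a dict, or None if no parameters are shown."""
    match = WAIT_RE.search(extra)
    if match is None:
        return None                      # e.g. "ON CPU" lines carry no p1/p2/p3
    return {
        "event": match.group("event"),
        "p1": int(match.group("p1")),
        "p2": int(match.group("p2")),
        "p3": int(match.group("p3")),
    }


if __name__ == "__main__":
    line = ("time:344013.82ms,event:latch: cache buffers chains"
            "p1:1811473056p2:289p3:24599> Blocked by (inst:sid): 1:102,seq#:39")
    print(parse_wait(line))
```

What p1, p2 and p3 mean depends on the event. For buffer busy waits they are the file number, the block number and the block class.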

Again, this is a wait (latch: cache buffers chains) that is unlikely to be an issue. If it is, you either ran into a bug, or it is merely the way a much bigger problem (oversubscription, bad application design) is showing itself.

b) buffer busy waits
The name of the wait event ‘buffer busy waits’ is self-explanatory. The technical implementation is far less well known. What technically happens during a buffer busy wait is that a session needs to read a certain buffer, and in order to read it, it performs the usual actions of selecting the correct cache buffers chains bucket, following the doubly linked list of buffer headers to find the best buffer, and then obtaining a buffer state object. The critical part for running into the buffer busy wait is when the session checks the state of the buffer via the buffer header and finds the change state (X$BH.CSTATE) to be higher than 0, indicating the block is currently being changed. At this point the session calls kcbzwb (kernel cache buffers helper function wait for buffer), and must wait for the buffer state to change so that the buffer becomes accessible; the wait event for this is buffer busy waits. When the change state is reverted back to 0, the buffer is accessible again.

Once a buffer is pinned in current mode using kcbgcur and kcbzgs, the buffer is still fully accessible for all sessions. Only once the database truly starts to change the buffer, which is done in kcbchg1_main, is the cache buffers chains latch taken non-shared, the CSTATE field in the buffer header changed to 1, and the latch freed again. My current investigations show that the CSTATE is changed to 2 after the kcopcv and kcrfw_copy_cv functions, so when the redo is written to the public redo strand, and set to 4 after the changes have been written to the buffers, which is done in kcbapl. At the end of the kcbchg1_main function, when all the changes are done, the cache buffers chains latch is taken non-shared again, the CSTATE field is set to 0 and the latch is freed to enable access to the buffer again.
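The life cycle of the change state described above can be written down as a small table. This is a sketch of my reading of the text, not of Oracle's code: the step descriptions and CSTATE values come from the paragraph above, while the names `CHANGE_STEPS`, `reader_must_wait` and `timeline` are made up for the illustration.

```python
# (step, CSTATE after the step), following the description in the text
CHANGE_STEPS = [
    ("buffer pinned in current mode (kcbgcur, kcbzgs)", 0),
    ("kcbchg1_main starts the change", 1),
    ("redo copied to the redo strand (kcopcv, kcrfw_copy_cv)", 2),
    ("change applied to the buffer (kcbapl)", 4),
    ("kcbchg1_main finishes the change", 0),
]


def reader_must_wait(cstate):
    """A session that finds CSTATE > 0 has to wait (buffer busy waits)."""
    return cstate > 0


def timeline():
    """Return (step, cstate, reader_must_wait) for every step of a change."""
    return [(step, cstate, reader_must_wait(cstate)) for step, cstate in CHANGE_STEPS]


if __name__ == "__main__":
    for step, cstate, waits in timeline():
        print(f"CSTATE={cstate}  other sessions wait: {waits!s:5}  {step}")
```

The point of the table is that the buffer is unavailable for the whole middle part of the change, not only for the moment the latch is held.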

68801:kcbchg1_main+2536:0x000000006ba4e4d8(Variable Size(pgsz:2048k)|+182772952 shared pool|(child)latch#5383:cache buffers chains+0 shared pool|permanent memor,duration 1,cls perm+2204888 ):R:8:0/0()
68802:kcbchg1_main+2536:0x000000006ba4e4d8(Variable Size(pgsz:2048k)|+182772952 shared pool|(child)latch#5383:cache buffers chains+0 shared pool|permanent memor,duration 1,cls perm+2204888 ):W:8:0x1000000000000022/1152921504606847010()
..
68897:kcbchg1_main+3318:0x00000000967eddda(Variable Size(pgsz:2048k)|+901701082 buffer header|9668A000+66 ):W:1:0x1/1()
..
68939:kcbchg1_main+3715:0x000000006ba4e4d8(Variable Size(pgsz:2048k)|+182772952 shared pool|(child)latch#5383:cache buffers chains+0 shared pool|permanent memor,duration 1,cls perm+2204888 ):R:8:0x1000000000000022/1152921504606847010()
68940:kcbchg1_main+3715:0x000000006ba4e4d8(Variable Size(pgsz:2048k)|+182772952 shared pool|(child)latch#5383:cache buffers chains+0 shared pool|permanent memor,duration 1,cls perm+2204888 ):W:8:0/0()

Above you see a cache buffers chains latch being obtained non shared (special bit 0x1000000000000000 is set), a little later the buffer header is updated which sets the value 1 at offset 66. Once this happens, this is visible in the CSTATE column of X$BH. A little further the latch is freed.
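The latch value written in the trace can be split into the special bit and the rest with a few lines of Python. This is only a sketch: the function name `decode_latch_value` is hypothetical, and the meaning of the remaining low bits is not established in this post, so it is reported as an uninterpreted remainder.

```python
NON_SHARED_BIT = 0x1000000000000000   # special bit for a non-shared get


def decode_latch_value(value):
    """Split a latch value into the non-shared flag and the remaining bits."""
    return {
        "non_shared": bool(value & NON_SHARED_BIT),
        "remainder": value & ~NON_SHARED_BIT,   # meaning not established here
    }


if __name__ == "__main__":
    # Value written at kcbchg1_main+2536 in the trace, in decimal notation.
    print(decode_latch_value(1152921504606847010))
    # Value written at kcbchg1_main+3715, when the latch is freed.
    print(decode_latch_value(0))
```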

Why am I showing all of this? Well, there is quite some work that is done while the buffer is in a changed state (CSTATE>0) and therefore inaccessible. In normal situations this is not a problem, because despite all the work it is done very quickly and is therefore hardly noticeable or measurable, so there’s no or very little time spent in buffer busy waits in most databases.

However… if processes do get stuck randomly, it’s possible that a process gets stuck while making changes to a buffer, which then means any additional access to the buffer results in buffer busy waits. Also mind that a change to a table buffer requires (at least; in the most simple case) three buffers to be changed: the buffer containing the table data, the buffer containing the undo segment transaction table and the buffer containing the actual undo. A commit requires one buffer: the undo segment transaction table.

Let me show you an example:
1. session 1: start up a database foreground session.
2. session 2: obtain the PID of session 1’s process and attach to it with gdb.
3. session 2: break on ktugur and kcopcv and continue.
4. session 3: start up a database foreground session.
5. session 1: insert a row into a table: insert into test values ('a');
6. session 2: gdb breaks execution of session 1 because it encountered ktugur.
7. session 3: select * from test; this runs unblocked.
8. session 4: start up a database foreground session as sysdba.
9. session 4: select * from v$lock; this will hang.

This one wasn’t obvious. I found this upon doing the tests. When a foreground session is in ktugur, it is creating the change vectors (kernel transaction undo generate undo and redo), and it holds two buffers in current mode:

SQL> l
  1  select 	ts.name tablespace,
  2  	dbablk,
  3  	class,
  4  	state,
  5  	decode(state,0,'free',1,'xcur',2,'scur',3,'cr', 4,'read',5,'mrec',6,'irec',7,'write',8,'pi', 9,'memory',10,'mwrite',11,'donated', 12,'protected',  13,'securefile', 14,'siop',15,'recckpt', 16, 'flashfree',  17, 'flashcur', 18, 'flashna') state_decoded,
  6  	mode_held,
  7  	changes,
  8  	cstate,
  9  	flag,
 10  	flag2,
 11  	rflag,
 12  	sflag,
 13  	lru_flag,
 14  	dirty_queue
 15  from 	  x$bh b
 16  	, v$tablespace ts
 17  where 	1=1
 18  and	not (us_nxt = us_prv and wa_nxt = wa_prv and to_number(wa_nxt,'xxxxxxxxxxxxxxxx') = to_number(us_nxt,'xxxxxxxxxxxxxxxx') + 16)
 19* and	b.ts#=ts.ts#
SQL> /

TABLESPACE                         DBABLK      CLASS      STATE STATE_DECO  MODE_HELD    CHANGES     CSTATE       FLAG      FLAG2      RFLAG      SFLAG   LRU_FLAG DIRTY_QUEUE
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -----------
UNDOTBS1                              128         17          1 xcur                2          1          0    2097152          0          0          0          0           0
TS                                  24597          1          1 xcur                2          1          0    2097153          0          0          0          0           0

The thing that was not obvious at first was that the blocks have CSTATE 0, so they can be accessed and read for a consistent read. Then I looked at the wait and its p1/p2/p3:

SID_SERIAL_INST OSPID    USERNAME      C SERVER    PROGRAM                                          EXTRA
--------------- -------- ------------- - --------- ------------------------------------------------ -------------------------------------------------------
100,262,@1      6274     SYS             DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:2519.48ms,event:buffer busy waitsp1:3p2:128p3:17> Blocked by (inst:sid): 1:59,seq#:476
142,15103,@1    6695     TS              DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:47115.23ms,event:SQL*Net message from client,seq#:226
12,25165,@1     7036     SYS           * DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:0ms,event:ON CPU:SQL,seq#:3165
59,31601,@1     5891     TS              DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:31292.51ms,event:ON CPU:SQL,seq#:111

Above, SID 100 is session 4; it hangs on a buffer busy wait. The datafile is #3, which belongs to the undo tablespace UNDOTBS1, and the block that it is waiting for is 128. Why does SID 100 need to wait for the buffer, while another session (session 3 in the above runbook) can run without getting blocked on a buffer that is held in exactly the same state (xcur and CSTATE is zero)?

The answer can be gotten when executing pstack on the waiting process (or attach with gdb and obtain a backtrace):

(gdb) bt
#0  0x00007fd6a085bbda in semtimedop () at ../sysdeps/unix/syscall-template.S:81
#1  0x0000000010f9cca6 in sskgpwwait ()
#2  0x0000000010f9a2e8 in skgpwwait ()
#3  0x0000000010a66995 in ksliwat ()
#4  0x0000000010a65d25 in kslwaitctx ()
#5  0x00000000015c5e5e in kcbzwb ()
#6  0x0000000010b3748e in kcbgcur ()
#7  0x0000000010ab9b17 in ktuGetUsegHdr ()
#8  0x00000000014075d3 in ktcxbcProcessInPDB ()
#9  0x0000000001406de2 in ktcxbc ()
#10 0x0000000010ea8781 in qerfxFetch ()
#11 0x0000000010bb919c in rwsfcd ()
#12 0x00000000035ae618 in qeruaFetch ()
#13 0x00000000035ad837 in qervwFetch ()
#14 0x0000000010bb919c in rwsfcd ()
#15 0x0000000010ea1528 in qerhnFetch ()
#16 0x0000000010bb919c in rwsfcd ()
#17 0x0000000010ea1528 in qerhnFetch ()
#18 0x0000000010c9eeb3 in opifch2 ()
#19 0x000000000268c3de in kpoal8 ()
#20 0x0000000010ca5b3d in opiodr ()
#21 0x0000000010f37a29 in ttcpip ()
#22 0x0000000001eb4674 in opitsk ()
#23 0x0000000001eb914d in opiino ()
#24 0x0000000010ca5b3d in opiodr ()
#25 0x0000000001eb04ed in opidrv ()
#26 0x0000000002afedf1 in sou2o ()
#27 0x0000000000d05577 in opimai_real ()
#28 0x0000000002b09b31 in ssthrdmain ()
#29 0x0000000000d05386 in main ()

The important bit is at #6: kcbgcur (kernel cache buffers get buffer in current mode)! In other words: the block is gotten in current mode while another session is holding the block in current mode, which means it has to wait. The function kcbzwb means kernel cache buffer helper function wait for buffer.

I really wondered why the undo segment buffer is gotten in current mode; this is a read, there is no intention to change something. The current educated guess is that this is a way to be sure this buffer contains the current state of the transactions across all nodes in a RAC cluster. So this might be a clever optimisation to be absolutely sure the current state of the undo segment transaction table is gotten.

10. session 2: continue. This will break on kcopcv.
11. session 3: select * from test; this will hang.

Now we have two hanging sessions waiting in the wait event buffer busy waits:

SQL> @who active

SID_SERIAL_INST OSPID    USERNAME      C SERVER    PROGRAM                                          EXTRA
--------------- -------- ------------- - --------- ------------------------------------------------ -----------------
100,262,@1      6274     SYS             DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:120790ms,event:buffer busy waitsp1:3p2:128p3:35> Blocked by (inst:sid): 1:59,seq#:501
142,15103,@1    6695     TS              DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:1175715.03ms,event:buffer busy waitsp1:4p2:24597p3:1> Blocked by (inst:sid): 1:59,seq#:250
12,25165,@1     7036     SYS           * DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:0ms,event:ON CPU:SQL,seq#:3309
59,31601,@1     5891     TS              DEDICATED sqlplus@memory-presentation.local (TNS V1-V3)    time:9544.97ms,event:ON CPU:SQL,seq#:116

SID 100 still hangs in wait event buffer busy wait for file 3, block 128, but now the session doing the full table scan also hangs in wait event buffer busy wait, but for file 4, block 24597. Let’s look at the buffers held in current mode again:

SQL> @pinned_blocks

TABLESPACE                         DBABLK      CLASS      STATE STATE_DECO  MODE_HELD    CHANGES     CSTATE       FLAG      FLAG2      RFLAG      SFLAG   LRU_FLAG DIRTY_QUEUE
------------------------------ ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- ---------- -----------
UNDOTBS1                              128         35          1 xcur                2          0          1    2097156          0          0          0          0           0
UNDOTBS1                             3127         36          1 xcur                2          0          1    2097156          0          0          0          0           0
TS                                  24597          1          1 xcur                2          0          1    2097157          0          0          0          0           0

Because the insert progressed to kcopcv, the CSTATE of all three buffers involved in the insert (the undo segment transaction table, the undo segment undo entry and the buffer which has the inserted row) is set to 1. This means that the two buffer busy waits actually have two different reasons: the select * from v$lock still waits to get a buffer in current mode, and the select * from test, which does a consistent read, is now waiting for the CSTATE to be set back to 0. In order to resolve the hang, go to session 2 and continue. This will break again on kcopcv, and again if you continue, because there are 3 blocks involved in this insert. Another way to make the insert finish is to disable all breakpoints (disable command) or to quit gdb.
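The two different reasons for waiting can be summarised in a small decision function. This is a simplification based only on what was observed in this test: it ignores buffers pinned in current mode with a shared pin (kcbz_fp_shr), and the function name `must_wait` and its parameters are made up for the illustration.

```python
def must_wait(request, held_in_current_mode, cstate):
    """Decide whether a buffer get has to wait, per the observations above.

    request: "current" for a current mode get, "cr" for a consistent read.
    held_in_current_mode: another session holds the buffer in current mode.
    cstate: the change state of the buffer (X$BH.CSTATE).
    """
    if request == "current":
        # A current mode get conflicts with another current mode holder.
        return held_in_current_mode
    if request == "cr":
        # A consistent read is only blocked while the buffer is being changed.
        return cstate > 0
    raise ValueError(f"unknown request type: {request!r}")


if __name__ == "__main__":
    # Insert stopped in ktugur: buffers held in current mode, CSTATE still 0.
    print("v$lock query waits:", must_wait("current", True, 0))
    print("table scan waits:  ", must_wait("cr", True, 0))
    # Insert stopped in kcopcv: CSTATE has been set to 1.
    print("table scan waits:  ", must_wait("cr", True, 1))
```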

Just for the fun of it, here is another example of how a stuck session can escalate: a process that gets stuck executing the kcbapl function (kernel cache buffers apply a change to a buffer). This scenario can be replayed using the following steps:
1. session 1: start up a database foreground session.
2. session 2: obtain the PID of session 1’s process and attach to it with gdb.
3. session 2: break on kcbapl and continue.
4. session 3: start up a database foreground session.
5. session 3: insert a row into a table: insert into test values ('a');
6. session 1: insert a row into a table: insert into test values ('a');
7. session 2: gdb breaks execution of session 1 because it encountered kcbapl.
8. session 3: commit; this will hang.

Can you reason why session 3 is hanging? Both sessions are doing inserts into the same table… Give it a thought and replay it yourself.

Explanation: the breakpoint stopped execution before the changes were actually made. CSTATE for the buffer is set to 1; it will get set to 2 inside kcbapl to indicate the change vectors have been copied. But the important bit is that a redo copy latch was taken before executing kcbapl. The commit in session 3 sent a message to the logwriter asynchronously asking it to write (first invocation of kcrf_commit_force_int), then freed the logical locking of the insert, and then invoked kcrf_commit_force again to wait for the logwriter to finish writing and get notified. However, the logwriter needs all the redo copy latches to be free (it actually reads a metadata structure that indicates whether the redo copy latches are freed or not), and it has to wait here because session 1 is stuck holding a redo copy latch. So the actual outcome is that session 3 waits in the event ‘log file sync’, and the logwriter waits in ‘LGWR wait for redo copy’. Not only does this show how dangerous it is when sessions can’t run, it also shows how dangerous it can be to run gdb in a production database: it can be quite hard to figure out what the implications of attaching to a process with gdb will be.
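The chain of waits in this scenario can be followed mechanically to find the process that is actually stuck. The sketch below uses a hand-written dictionary as a stand-in for the blocking information; the labels and the function name `blocking_chain` are hypothetical and only illustrate the reasoning.

```python
def blocking_chain(waits_on, start):
    """Follow 'who waits on whom' from start until a process that waits on nobody."""
    chain = [start]
    seen = {start}
    while chain[-1] in waits_on:
        blocker = waits_on[chain[-1]]
        if blocker in seen:
            raise ValueError("cycle in wait chain")   # would be a deadlock
        chain.append(blocker)
        seen.add(blocker)
    return chain


# The situation from the scenario above, written out by hand.
WAITS_ON = {
    "session 3 (log file sync)": "logwriter (LGWR wait for redo copy)",
    "logwriter (LGWR wait for redo copy)": "session 1 (stopped in kcbapl)",
}

if __name__ == "__main__":
    chain = blocking_chain(WAITS_ON, "session 3 (log file sync)")
    print(" -> ".join(chain))
    print("root blocker:", chain[-1])
```

The session that reports the wait (session 3) has no direct relation with the session that causes it (session 1); the logwriter sits in between, which is what makes this kind of escalation hard to spot.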

Summary
The Oracle database is carefully set up to be able to handle concurrency. In most normal situations you will see no, or very little, time spent in wait events that indicate concurrency, like latches or buffer busy waits. If an Oracle database is on a machine that has more active processes than available CPU threads or cores (this depends on the hardware and CPU architecture; for Intel based architectures you should lean towards cores, for SPARC you can lean more towards threads), the operating system needs to make choices, and will do that trying to give all processes an equal share. This means that processes will be put off CPU in order to give all processes a fair share of CPU run time. Also mind other oversubscription possibilities, like using more memory than is physically available, and doing the same using virtualisation, which means handing out more vCPUs than CPU threads, and handing out more memory than is physically available. Because you can’t look outside your virtual machine, it’s often harder to troubleshoot virtualisation oversubscription issues.

Oracle really tries to minimise the amount of time spent holding a non-shared latch. Therefore with modern Oracle databases it’s unlikely that latch waits show up, even if a system is (not too much) oversubscribed. If you do have a latching problem, it’s most likely that you have another problem which causes it.

The most well documented reason for buffer busy waits is a buffer being written onto persistent media, and therefore the buffer is marked busy for the time of the write. Another reason that is documented is getting a buffer in a mode that is not compatible with the mode it is held in. Having a buffer in current mode will block another request for the buffer in current mode. However, having a buffer pinned in current mode does not block consistent read access; it is only while the buffer is being changed that it must be marked busy. This state is externalised in the CSTATE column of the view X$BH, which shows fields of the buffer header struct that is used to manage the buffer cache buffers. Because there is quite some work to be done when a buffer is changed, and while these changes are being made the buffers are not usable (‘busy’) for other processes, a buffer busy wait is more likely to occur when oversubscription is taking place.

Thanks to Jonathan Lewis for explaining how to get the buffer header addresses.
Thanks to Abel Macias for brainstorming.
