
This blog post is about the Oracle database row cache, also known as the dictionary cache. This is a separate cache that holds database metadata, such as database object properties or user properties.

There is surprisingly little in-depth technical detail available about the row cache. To some degree I understand why: issues with the row cache are rare.

I noticed a column in V$ROWCACHE called ‘FASTGETS’. Whatever FASTGETS means, in my database it is being used:

SQL> select cache#, type, parameter, gets, fastgets, getmisses from v$rowcache;
    CACHE# TYPE        PARAMETER                              GETS   FASTGETS  GETMISSES
---------- ----------- -------------------------------- ---------- ---------- ----------
         3 PARENT      dc_rollback_segments                   1000          9         12
         1 PARENT      dc_free_extents                           0          0          0
         4 PARENT      dc_used_extents                           0          0          0
         2 PARENT      dc_segments                            4374       7308       2496
         0 PARENT      dc_tablespaces                         1232      14466        178
         5 PARENT      dc_tablespace_quotas                      0          0          0
         6 PARENT      dc_files                                  5          0          5
        10 PARENT      dc_users                                953      13473        196
         8 PARENT      dc_objects                            32545      13682       3769
        17 PARENT      dc_global_oids                          723          0        165
        12 PARENT      dc_constraints                            0          0          0
        13 PARENT      dc_sequences                              9          0          9
        16 PARENT      dc_histogram_defs                     60851          0      19613

If you look at the description of FASTGETS in the documentation (currently the 20c version), it says: ‘Reserved for internal use’. That’s not very helpful.

Luckily, there is a trace event to make Oracle print additional information regarding the row cache: event 10222.

Please mind that setting events is an unsupported action unless you have the explicit blessing of Oracle support. Even then, an event should only be set if you are 100% sure you need it, and you should unset it as soon as you can. Also document the exact reason for setting it, and re-evaluate any set events during patching and version upgrades.
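
For illustration, this is how such an event is typically set and unset at the session level; the level shown here is an assumption on my part, not something prescribed by documentation:

SQL> alter session set events '10222 trace name context forever, level 1';
SQL> -- run the statements you want traced
SQL> alter session set events '10222 trace name context off';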

Using the event, I found that a normal dictionary lookup looks like this:

kqrpre: start hash=6d2c659c mode=S keyIndex=1 dur=CALL opt=FALSE
kqrpre: found cached object po=0x7e5da720 flg=2
kqrmpin : kqrpre: found po Pin po 0x7e5da720 cid=10 flg=2 hash=70b9dd3f
time=3663934334
kqrpre: pinned po=0x7e5da720 flg=2 pso=0x793472c8
kqrpre: done po=0x7e5da720 cid=10 flg=2 eq=0x7e616a20 pso=0x793472c8 dur=CALL
kqrpre: keyIndex=1 hash=70b9dd3f 6d2c659c 0
kqrpre: time=3663934346
kqrpre: returnVal: TRUE
.
kqrprl: eq=0x7e616a20 fpInUse=FALSE
ksedsts()+418<-kqrprl()+1208<-kkdlGetBaseUser()+248<-kzulgt1()+220<-kksLockUserSchema()+125<-kksLoadChild()+1983<-kxsGetRuntimeLock()+2049<-kksfbc()+17007<-kkspsc0()+2222<-kksParseCursor()+123<-opiosq0()+2266<-kpoopr
x()+420<-kpoal8()+838<-opiodr()+1244<-ttcpip()+1239
<-opitsk()+1943<-opiino()+957<-opiodr()+1244<-opidrv()+1091<-sou2o()+191<-opimai_real()+455<-ssthrdmain()+423<-main()+275<-__libc_start_main()+245.

This is a bit cryptic, but it shows a row cache parent read (kqrpre) and release (kqrprl) for cache id (cid) 10 (dc_users; see the v$rowcache output above), for which the needed entry was found ('found cached object'). It even contains a short-stack format dump of the current location of invocation.

Now see the following lookup:

kqrpre: start hash=70b9dd3f mode=S keyIndex=0 dur=CALL opt=TRUE
kqrpre: optimistic lookup hash=70b9dd3f
kqrpre: KQR_READ_PD_VER: versionAddr=0x628f2a20 version=10
kqrpre: optimistic lookup: success
kqrpre: done po=0x7f651cbb1bf8 cid=10 flg=0 eq=(nil) pso=0x79360288 dur=CALL
kqrpre: keyIndex=0 hash=70b9dd3f 6d2c659c 0
kqrpre: time=3663936115
kqrpre: returnVal: TRUE
.
kqrprl: eq=0x7f651cbb1bf8 fpInUse=TRUE
ksedsts()+418<-kqrprl()+1208<-kkdlGetUserId()+587<-kqlhdlod()+641<-kqlCallback()+120<-kqllod()+667<-kglLoadOnLock()+1107<-kgllkal()+861<-kglLock()+1429<-kglget()+302<-kglgob()+328<-qcdlgbo()+609<-qcdlgob()+982<-qcsfgob()+278<-qcsprfro()+553<-qcsprfro_tree()+380
<-qcsprfro_tree()+150<-qcspafq()+246<-qcspqbDescendents()+281<-qcspqb()+272<-kkmdrv()+192<-opiSem()+1962<-opiDeferredSem()+447<-opitca()+436<-kksFullTypeCheck()+86<-rpiswu2()+583<-kksLoadChild()+7969<-kxsGetRuntimeLock()+2049<-kksfbc()+17007<-kkspsc0()+2222
<-kksParseCursor()+123<-opiosq0()+2266<-kpooprx()+420<-kpoal8()+838<-opiodr()+1244<-ttcpip()+1239<-opitsk()+1943<-opiino()+957<-opiodr()+1244<-opidrv()+1091<-sou2o()+191<-opimai_real()+455<-ssthrdmain()+423<-main()+275<-__libc_start_main()+245.

This is a lookup for cid 10 (dc_users) too, but now an "optimistic" one, as can be seen on the second line. The first line shows "opt=TRUE", which indicates that optimistic reading is probably not decided dynamically, but requested explicitly by the caller. Could this "optimistic" read be what is shown as FASTGETS in v$rowcache?

Let’s look at the actual C functions that are called for such an optimistic row cache read:

> kqrpre1(0xa, 0x7ffc48561380, ...)
| > kqrpre2(0xa, 0x7ffc48561380, ...)
| | > kqrhsh(0x62f471c8, 0x7ffc48561380, ...)
| | | > kgghash(0x7ffc48561380, 0x82, ...)
| | | < kgghash+0x0000000000d0 returns: 0xc34886c9
| | < kqrhsh+0x00000000007d returns: 0xc34886c9
| | > KQR_READ_PD_VER(0x62f471c8, 0x628c7670, ...)
| | < KQR_READ_PD_VER+0x000000000044 returns: 0x4a
| | > kqrFastPoRead(0x62f471c8, 0x7ffc48561380, ...)
| | | > _intel_fast_memcmp(0x7ffc48561380, 0x78fc6834, ...)
| | | < _intel_fast_memcmp+0x000000000053 returns: 0
| | | > kglInstanceIdn(0x7f1bc9f099a0, 0x78fc6800, ...)
| | | <> kglHandleIdn(0x7f1bc9f099a0, 0x7c736fb0, ...)
| | | < kglHandleIdn+0x000000000011 returns: 0x1
| | | > kglPdbInstanceIdn(0x7f1bc9f099a0, 0x7c736fb0, ...)
| | | < kglPdbInstanceIdn+0x000000000024 returns: 0x1
| | < kqrFastPoRead+0x0000000001ac returns: 0x1

Here we see kqrpre1 call kqrpre2, which calls kqrhsh to calculate a hash value to look up the row cache entry. Then we see KQR_READ_PD_VER being called, which is unique to the optimistic lookup, and then kqrFastPoRead. That seems a reasonably definite indication that this is what performs the FASTGET.

The logical question to ask then is: what is the actual difference between a fastget and a (regular) get?

Looking through the function calls of a fastget versus a regular get, what strikes me is the absence of concurrency checking and protection in the fastget, alias optimistic, call. A regular row cache get:
– obtains the row cache hash bucket mutex
– obtains the row cache parent object mutex
– pins the (row cache entry) memory
– frees the row cache hash bucket mutex
– locks the row cache entry (kqrget)
– frees the row cache parent object mutex

– the row cache data is used (this is the actual purpose of the row cache get)

– obtains the row cache parent object mutex
– unpins the (row cache entry) memory
– frees the row cache parent object mutex
– frees the lock (there is no explicit unlock call, I assume it’s done via the state object)

A fastget simply skips most of these concurrency protection measures: it checks the version of the entry before and after reading it, and apparently that is enough to be sure of the consistency of the row cache entry. It does still lock the entry and release it after usage. Especially since the number of row cache lookups can easily get very high, this probably saves a lot of CPU cycles.
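
Conceptually this looks like a seqlock-style read. Here is a minimal sketch in C of that version-check pattern; all names and the struct layout are made up by me, and there is no claim that Oracle implements it exactly this way:

#include <string.h>

/* Hypothetical layout, only to illustrate the generic version-check
   (seqlock-style) read that the trace data suggests. */
struct row_cache_entry {
    volatile unsigned long version; /* bumped by every modification */
    char data[128];                 /* the cached dictionary information */
};

/* Optimistic read: copy the entry without taking any mutex, then check
   that the version did not change while copying. Returns 1 on success;
   on 0 the caller must fall back to a regular, fully protected get.
   A production implementation would also need memory barriers around
   the two version reads. */
int optimistic_read(struct row_cache_entry *e, char out[128])
{
    unsigned long before = e->version;
    memcpy(out, e->data, sizeof e->data);
    unsigned long after = e->version;
    return before == after;
}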

Looking at the bigger picture, the fastget seems to be done when the database executable has already fetched the same entry in the regular way earlier, so it is reasonable to assume the entry will still be there, and safe to read it without all the concurrency control around it.

Is there anything you can do about it? Not that I know of; this is just a blog post to explain something for which there is no documentation.

In case you're wondering when this appeared in Oracle, the answer is 11.2.0.4. The FASTGETS column appeared in that version, as did the C function kqrFastPoRead.

There’s also the undocumented parameter “_kqr_optimistic_reads”, which is TRUE in my (Oracle 18) instance.
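
For reference, a hidden parameter's current value can be looked up as SYS via the x$ fixed tables; a minimal sketch:

SQL> select a.ksppinm, b.ksppstvl
     from x$ksppi a, x$ksppcv b
     where a.indx = b.indx
     and a.ksppinm = '_kqr_optimistic_reads';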

This post is about how to use gdb, a debugger (very simplistically put: an aid for looking at C programs), as a profiler. I use gdb quite a lot for profiling because, for me, it's the easiest way to profile.

Lots of people I know use other tools like perf, systemtap and dtrace for the same purpose, and that's fine. Each tool has its own advantages and disadvantages. One disadvantage of gdb is that it uses ptrace to attach to a process, which makes the debugged process dead slow from a machine perspective, because everything the process then does goes via another process, the debugger. That is simply how a debugger works.

Also, lots of people use gdb like I do, with the basic functionality only: breaking at functions to find out in which sequence they are called, generating backtraces (stack traces) to understand the call stack, and maybe looking at function arguments.

But there is more that you can do. Way more, actually. These examples use the gdb that comes with RHEL7/OL7 via yum.
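
For completeness: all scripts below assume you have attached gdb to a running process first, and detach cleanly when done (the PID is a made-up example); turning off pagination prevents gdb from pausing while a command script produces output:

$ gdb -p 12345
(gdb) set pagination off
(gdb) continue
^C
(gdb) detach
(gdb) quit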

You can break conditionally at a function, which means that a break will be executed only if the condition is met. For example:

break sltrgatime64 if $caller_is("ksbabs")

This only breaks at a function called sltrgatime64 if it is called from a function that is called ksbabs.

But for what I wanted to investigate, I didn't only want to do "something" when the sltrgatime64 function was called from ksbabs, I wanted to be able to distinguish between the two times it was called: the first one and the second one.

I knew that the "first time" happened right after a function named dbktFlush was called. So here's what I did:

set $status = -1

break dbktFlush if $caller_is("ksbabs")
commands
 silent
 set $status = 0
 c
end

break sltrgatime64 if $caller_is("ksbabs")
commands
 silent
 if $status == 0
  set $status = 1
  printf "first invocation\n"
  c
 end
 if $status == 1
  set $status = 2
  printf "second invocation\n"
  c
 end
 if $status > 1
  printf "error. third or more invocations\n"
 end
end

This is much more advanced than a single break command, but I think it is reasonably simple to understand for anyone who can program in a few languages.

There are two breakpoints, and both are conditional: the function that calls the function to break on must be ksbabs.
In the dbktFlush break, the (convenience) variable $status is set to 0, signalling to the other breakpoint that the next sltrgatime64 call is the first invocation.

In the second break, there are three if statements. The first one picks up $status set to 0 (the first invocation) and sets it to 1. The second one picks up $status set to 1 (the second invocation) and sets it to 2, so that if I didn't look carefully and there are more invocations of sltrgatime64, the third if will notify me.

If the breakpoints happened to become active just before the second call to sltrgatime64, the dbktFlush break would not have fired yet, so the $status variable would still be set to -1 and the sltrgatime64 break would skip all the ifs.

Of course this is a very specific case; you should tailor it to your own needs. However, this is the actual investigation I did to find out the purpose of the above sltrgatime64 calls.

In fact, I went a little deeper and wanted to understand what the timing of these two reads of the wall clock time was. For that, I used the python interface in gdb, support for which seems to be compiled into the rpm-based version.

You can use python, and python variables keep their state between gdb interactions. I used a dead simple interaction: calling time.time() to time the invocation of sltrgatime64 that is executed right after dbktFlush was called:

python import time
python var_time=time.time()

break dbktFlush if $caller_is("ksbabs")
commands
 silent
 set $status = 0
 c
end

break sltrgatime64 if $caller_is("ksbabs")
commands
 silent
 if $status == 0
  set $status = 1
  python print time.time()-var_time
  python var_time=time.time()
  c
 else
  c
 end
end

If you set the instance to single logwriter mode and execute this against the logwriter process, you will see that the logwriter tries to keep a 3-second pace by adjusting the sleep time on semtimedop().
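
You can verify that pacing by printing the timeout the logwriter passes to semtimedop(); a sketch that assumes Linux x86-64, where the fourth argument, the pointer to the struct timespec, arrives in the rcx register:

break semtimedop
commands
 silent
 printf "semtimedop timeout: %ld s, %ld ns\n", *(long*)$rcx, *(long*)($rcx+8)
 c
end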

My actual goal was to find out what exactly is timed for the message 'Warning: log write elapsed time ????ms, size ?KB' in the logwriter trace file, which Nenad Noveljic proved is not only about the actual IO time.

To be honest, I did it clumsily, by going backwards and measuring all the calls that read the wall clock time, when I should simply have put a break on write and looked at the backtrace. When I executed a backtrace of the write call that wrote the trace file entry, I could see this line was written from kcrfw_postprocess_write().
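
For reference, that simpler approach would have looked something like this; the file descriptor number here is hypothetical, you would look up the trace file's descriptor in /proc/<pid>/fd first (on x86-64, the first argument of write, the file descriptor, arrives in the rdi register):

break write if $rdi == 257
commands
 bt
 c
end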

This is the gdb script with the timing:

python import time

break skgstmGetEpochTs if $caller_is("kcrfw_redo_write_driver")
commands
 silent
 python var_time=time.time()
 c
end

break kcrfw_do_write
commands
 silent
 print $rip
 c
end

break skgstmGetEpochTs if $caller_is("kcrfw_postprocess_write")
commands
 silent
 shell sleep 1
 python print time.time()-var_time
 c
end

What this gdb script does is always set the python variable var_time, which therefore includes the idle cycles of the logwriter. If the logwriter goes into the function kcrfw_do_write, it is actually going to write; that is also when the line "Warning: log write elapsed time 1024ms, size 0KB" could potentially be produced.
When the function skgstmGetEpochTs is called in the function kcrfw_postprocess_write, the script sleeps for 1 second and then prints the current time minus the time obtained when kcrfw_redo_write_driver was entered.

What this proves is that the timing for the log write elapsed time warning starts at the entry of the kcrfw_redo_write_driver function. That means that everything the logwriter needs to do is included in the timing, excluding only the very last bit it performs, posting the foregrounds, and including dealing with the internal structures that keep track of the redo strands and their latching.
