A look into Oracle redo, part 2: the discovery of the KCRFA structure

This is the second post in a series of blogposts on Oracle database redo internals. If you landed on this blogpost without having read the first blogpost, here is a link to the first blogpost: https://fritshoogland.wordpress.com/2018/01/29/a-look-into-oracle-redo-part-1-redo-allocation-latches/ The first blogpost contains all the versions used and a synopsis on what the purpose of this series of blogposts is.

In the first part, I showed how the principal access to the public redo strands is controlled by redo allocation latches, and showed a snippet of trace information of memory accesses of a foreground session when using the first public redo strand:

kcrfw_redo_gen_ext+11277:0x0000000076fdf3bc(shared pool|permanent memor,duration 1,cls perm+16114620 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf3d8(shared pool|permanent memor,duration 1,cls perm+16114648 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf430(shared pool|permanent memor,duration 1,cls perm+16114736 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3d0(shared pool|permanent memor,duration 1,cls perm+16114640 ):R:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11312:0x0000000076fdf428(shared pool|permanent memor,duration 1,cls perm+16114728 ):W:8:0xdeadbeef0000726e/-2401053092612115858()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3c8(shared pool|permanent memor,duration 1,cls perm+16114632 ):R:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11322:0x0000000076fdf420(shared pool|permanent memor,duration 1,cls perm+16114720 ):W:8:0x100010000726e/281479271707246()
kcrfw_redo_gen_ext+11327:0x0000000076fdf3c0(shared pool|permanent memor,duration 1,cls perm+16114624 ):R:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11332:0x0000000076fdf418(shared pool|permanent memor,duration 1,cls perm+16114712 ):W:8:0x1b800001000/1889785614336(??)
kcrfw_redo_gen_ext+11337:0x0000000076fdf3b8(shared pool|permanent memor,duration 1,cls perm+16114616 ):R:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11342:0x0000000076fdf410(shared pool|permanent memor,duration 1,cls perm+16114704 ):W:8:0x1256f0000b2fe/322633648354046()
kcrfw_redo_gen_ext+11347:0x0000000076fdf3b0(shared pool|permanent memor,duration 1,cls perm+16114608 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf408(shared pool|permanent memor,duration 1,cls perm+16114696 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:8:0x7270/29296()
kcrfw_redo_gen_ext+11312:0x0000000076fdf400(shared pool|permanent memor,duration 1,cls perm+16114688 ):W:8:0x7270/29296()
kcrfw_redo_gen_ext+11317:0x0000000076fdf3a0(shared pool|permanent memor,duration 1,cls perm+16114592 ):R:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8:0xd8e4e200/3638878720(Redo Buffers(pgsz:2048k)|+15000064 redo|PUB_REDO_0+15000064 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf398(shared pool|permanent memor,duration 1,cls perm+16114584 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf3f0(shared pool|permanent memor,duration 1,cls perm+16114672 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf390(shared pool|permanent memor,duration 1,cls perm+16114576 ):R:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11342:0x0000000076fdf3e8(shared pool|permanent memor,duration 1,cls perm+16114664 ):W:8:0x758e9f/7704223()
kcrfw_redo_gen_ext+11347:0x0000000076fdf388(shared pool|permanent memor,duration 1,cls perm+16114568 ):R:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11352:0x0000000076fdf3e0(shared pool|permanent memor,duration 1,cls perm+16114656 ):W:8:0x758ea0/7704224()
kcrfw_redo_gen_ext+11366:0x0000000076fdf438(shared pool|permanent memor,duration 1,cls perm+16114744 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):R:4:0x7270/29296()
kcrfw_redo_gen_ext+11386:0x0000000076fdf3a8(shared pool|permanent memor,duration 1,cls perm+16114600 ):W:4:0x7271/29297()
kcrfw_redo_gen_ext+11392:0x0000000076fdf468(shared pool|permanent memor,duration 1,cls perm+16114792 ):R:8:0xe0000000/3758096384(Redo Buffers(pgsz:2048k)|+134217728 )

And a snippet of trace information of memory accesses of a foreground session when using the second public redo strand:

kcrfw_redo_gen_ext+11277:0x0000000076fdf4e4(shared pool|permanent memor,duration 1,cls perm+16114916 ):R:4:0x1256f/75119()
kcrfw_redo_gen_ext+11290:0x0000000076fdf500(shared pool|permanent memor,duration 1,cls perm+16114944 ):R:8:0/0()
kcrfw_redo_gen_ext+11300:0x0000000076fdf558(shared pool|permanent memor,duration 1,cls perm+16115032 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4f8(shared pool|permanent memor,duration 1,cls perm+16114936 ):R:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11312:0x0000000076fdf550(shared pool|permanent memor,duration 1,cls perm+16115024 ):W:8:0xdeadbeef000000e6/-2401053092612144922()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4f0(shared pool|permanent memor,duration 1,cls perm+16114928 ):R:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11322:0x0000000076fdf548(shared pool|permanent memor,duration 1,cls perm+16115016 ):W:8:0x10001000000e6/281479271678182()
kcrfw_redo_gen_ext+11327:0x0000000076fdf4e8(shared pool|permanent memor,duration 1,cls perm+16114920 ):R:8:0x800/2048()
kcrfw_redo_gen_ext+11332:0x0000000076fdf540(shared pool|permanent memor,duration 1,cls perm+16115008 ):W:8:0x800/2048()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4e0(shared pool|permanent memor,duration 1,cls perm+16114912 ):R:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11342:0x0000000076fdf538(shared pool|permanent memor,duration 1,cls perm+16115000 ):W:8:0x1256f00012488/322633648383112()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4d8(shared pool|permanent memor,duration 1,cls perm+16114904 ):R:8:0/0()
kcrfw_redo_gen_ext+11352:0x0000000076fdf530(shared pool|permanent memor,duration 1,cls perm+16114992 ):W:8:0/0()
kcrfw_redo_gen_ext+11307:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:8:0xe6/230()
kcrfw_redo_gen_ext+11312:0x0000000076fdf528(shared pool|permanent memor,duration 1,cls perm+16114984 ):W:8:0xe6/230()
kcrfw_redo_gen_ext+11317:0x0000000076fdf4c8(shared pool|permanent memor,duration 1,cls perm+16114888 ):R:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11322:0x0000000076fdf520(shared pool|permanent memor,duration 1,cls perm+16114976 ):W:8:0xdc01ce00/3691105792(Redo Buffers(pgsz:2048k)|+67227136 redo|PUB_REDO_1+118272 )
kcrfw_redo_gen_ext+11327:0x0000000076fdf4c0(shared pool|permanent memor,duration 1,cls perm+16114880 ):R:8:0/0()
kcrfw_redo_gen_ext+11332:0x0000000076fdf518(shared pool|permanent memor,duration 1,cls perm+16114968 ):W:8:0/0()
kcrfw_redo_gen_ext+11337:0x0000000076fdf4b8(shared pool|permanent memor,duration 1,cls perm+16114872 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11342:0x0000000076fdf510(shared pool|permanent memor,duration 1,cls perm+16114960 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11347:0x0000000076fdf4b0(shared pool|permanent memor,duration 1,cls perm+16114864 ):R:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11352:0x0000000076fdf508(shared pool|permanent memor,duration 1,cls perm+16114952 ):W:8:0x75a1d2/7709138()
kcrfw_redo_gen_ext+11366:0x0000000076fdf560(shared pool|permanent memor,duration 1,cls perm+16115040 ):W:4:0xdeadbeef/3735928559(Redo Buffers(pgsz:2048k)|+112049903 redo|PUB_REDO_1+44941039 )
kcrfw_redo_gen_ext+11380:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):R:4:0xe6/230()
kcrfw_redo_gen_ext+11386:0x0000000076fdf4d0(shared pool|permanent memor,duration 1,cls perm+16114896 ):W:4:0xe7/231()
kcrfw_redo_gen_ext+11392:0x0000000076fdf590(shared pool|permanent memor,duration 1,cls perm+16115088 ):R:8:0xe0200000/3760193536(Redo Buffers(pgsz:2048k)|+136314880 )

If you pay really close attention you see the memory location moved from 0x76fdf3?? / 0x76fdf4?? to 0x76fdf4?? / 0x76fdf5??, all in permanent memory in the shared pool. Would this mean information for each strand is kept in a struct per strand? Let’s take a memory allocation on a specific offset in the kcrfw_redo_gen_ext function, and take the allocation on exactly the same offset in the kcrfw_redo_gen_ext function, which means it’s probably doing the same, and calculate the difference of the memory allocation:

strand 0
kcrfw_redo_gen_ext+11322:0x0000000076fdf3f8(shared pool|permanent memor,duration 1,cls perm+16114680 ):W:8
strand 1
kcrfw_redo_gen_ext+11322:0x0000000076fdf520(shared pool|permanent memor,duration 1,cls perm+16114976 ):W:8

Now take the memory addresses, and calculate the difference: 0x76fdf520 – 0x76fdf3f8 = 0x128 = 296

Okay, so would the size of the probable struct per strand be 296 bytes? Another question: where would this potential struct start?

Often, the location of structs is fetched from another location, just like we saw earlier for the redo allocation latch, especially if they are more dynamically created than fixed SGA structures, which are not dynamic by nature. Upon close examination of similarities between both pina traces, I found usage of a pointer in the kcrfsg_ struct at offset +0:

kcrfw_redo_gen_ext+14362:0x0000000060016228(fixed sga|var:kcrfsg_+0 ):R:8:0x76fdf388/1996354440(shared pool|permanent memor,duration 1,cls perm+16114568 )

The difference between the memory address I took and the pointer address in kcrfsg_ at offset 0 for the first public redo strand is:
0x76fdf3f8 – 0x76fdf388 = 0x70 = 112
The difference between the memory address I took for the second public redo strand and the pointer in kcrfsg_ at offset 0 is:
0x76fdf520 – 0x76fdf388 = 0x198 = 408
Then subtract with the calculated difference between the memory locations of the first and second public redo strand:
408 – 296 = 112

It would be really helpful to have some kind of explanation about this structure… I searched through My Oracle Support (very seldomly the regular knowledge base provides some internals information, the best chance for internals is the bug database, the rest is…well, let me say: not that helpful. I also hate that other tabs are enabled if nothing is found), oraus.msg, oradebug dumplist, oradebug doc component for any tracing or dumps that could give more information, but nothing seemed to point to any explanation. I dumped the shared pool, but I found the same information (the memory location being permanent memory) as I obtained from x$ksmsp. Also, since the pinatrace annotation takes care of any x$ view that has it’s source address in the SGA, it seems there is no way other than puzzle all the individual locations in what I assume is a structure that holds per strand metadata.

I decided to go through the x$ views one more time, and found an x$ view that I already knew of, which points to both ‘strands’ and ‘kcrf’; x$kcrfstrand. However, the reason it is not annotated is because this x$ view has it’s addresses in the PGA:

SQL> select addr from x$kcrfstrand;


This means that when querying this x$ view, the results are gathered from wherever it comes from (the source can be file headers like redo (x$kccle) and controlfile (x$kcccf) for example, in this case we will the actual source of x$kcrfstrand is the SGA) and then stored temporarily in PGA, potentially manipulated, and then shown from the PGA, as we can see from the addresses (addr field in x$kcrfstrand).

However, what I can do to investigate where the x$kcfstrand takes its information from and run the pina (memory access) trace on the execution of ‘select * from x$kcrfstrand’, and try to see if I can find where it takes its data from. This is how the relevant part looks like:

kcrf_strand_ftcb+23:0x00000000600169b8(Fixed Size(pgsz:2048k)|+92600 fixed sga|var:kcrf_max_strands_+0 ):R:4:0x2/2()
kcrf_strand_ftcb+29:0x00000000600169c8(Fixed Size(pgsz:2048k)|+92616 fixed sga|var:kcrf_actv_pvt_strands_+0 ):R:4:0/0()
kcrf_strand_ftcb+52:0x0000000060016228(Fixed Size(pgsz:2048k)|+90664 fixed sga|var:kcrfsg_+0 shared pool|pointer:shared pool redo struct+0 ):R:8:0x76fdf388/1996354440(shared pool|permanent memor,duration 1,cls perm+16114568 )
__intel_ssse3_rep_memcpy+184:0x0000000076fdf4a0(shared pool|permanent memor,duration 1,cls perm+16114848 ):R:16:0x00000000200040000/8590196736(??)
__intel_ssse3_rep_memcpy+9254:0x0000000076fdf388(shared pool|permanent memor,duration 1,cls perm+16114568 ):R:16:0xfe56e00000fe56e00000/-2305841916834480128()

Going from top to bottom through this snippet:
– qerfxFetch is the query execute rowsource function to query x$ tables, kcrf_strand_ftcb is an helper function (kernel cache recovery file strand fetch copy buffer is my guess) to fetch the kcrf strand data from the shared pool structure; the other functions are memory copy functions.
– Interestingly, the kcrf_strand_ftcb function reads offset 0 of kcrfsg_ just like we seen before. This is quite logical if you think about it, this struct is dynamically allocated in the shared pool, which means it doesn’t have a fixed address, so it needs to be looked up every time something must be read from it.
– The next functions are memcpy functions. I’ve shown the relevant ones. The first memcpy (_intel_fast_memcpy) probably is a helper function that switches to an intel optimised memcpy, __intel_ssse3_rep_memcpy, which takes advantage of intel’s SIMD optimisations (SSE3 means Supplemental Streaming SIMD Extensions 3).
– The memcpy functions at lines 11 and 12 show it’s reading 16 bytes from 0x76fdf4a0, and then writing 16 bytes to 0x7f7780448670.
– 0x76fdf4a0 is the struct for the second strand: kcrfsg_+0 = 0x76fdf388 + 296 = 0x76fdf4b0 + 16 (copy size) = 0x76fdf4a0.
– 0x7f7780448670 is quite probably and address in the process’ PGA, which is used for the intermediate representation of the struct’s data for displaying it for the x$ view.

Even more interesting, this confirms that x$kcrfstrand is the externalisation of the struct that is used together with kcrfsg_, and the calculations that were made previously are correct; kcrfsg_+0 = 0x76fdf388, and that address with 296 added is 0x76fdf4b0, which is the same as 0x76fdf4a0 + 16 (if you look into the above example, the memcpy function copies 16 bytes at a time).

I think it’s pretty safe to assume this kcrf related struct in the shared pool is commonly referred to as ‘KCRFA’ in x$kcrfstrand. Of course we can only guess that the ‘A’ means in kernel cache recovery file, I would say ‘auxiliary’, but is could also very well be the A in ‘strAnd’. It is a structure that clearly was needed when oracle changed the concept of a single log buffer into multiple public redo strands, because identical information of multiple strands need to be stored at the same time, and the information in kcrfsg_ is singular of nature.

Please mind that because of the intermediate representation of the x$ table in PGA, I could not use the trick I showed in my ‘drilldown into the logwriter inner working and communication’, which is query a single column in the x$ view, and see which memory addresses in this struct are touched; the entire struct is copied into PGA and processed before it is used to show the contents, even if only a single column of a single row is queried.

Also, also consider that since the struct data is copied into the process’ PGA in order to make this information human readable, not all offsets (in x$kqfco) of the PGA representation might be applicable to the SGA struct. In fact, I think I have proven that the KCRFA per strand metadata width is 296. Now look at the column offsets from x$kqfco of x$kcrfstrand:

SQL> select c.kqfconam, c.kqfcooff, c.kqfcosiz
from x$kqfco c, x$kqfta t
where t.indx = c.kqfcotab
and t.kqftanam = 'X$KCRFSTRAND'
order by c.kqfcooff;

KQFCONAM                         KQFCOOFF   KQFCOSIZ
------------------------------ ---------- ----------
ADDR                                    0          8
INDX                                    0          4
INST_ID                                 0          4
CON_ID                                  0          2
STRAND_HDR_SCN_KCRFA_CLN                0          8
HIGHEST_SCN_KCRFA_CLN                   0          8
PNEXT_BUF_KCRFA_CLN                    32          8
NEXT_BUF_NUM_KCRFA_CLN                 40          4
BYTES_IN_BUF_KCRFA_CLN                 48          8
AVAILABLE_BUFS_KCRFA_CLN               56          4
MEM_BUFS_AVAILABLE_KCRFA_CLN           60          4
STRAND_HEADER_BNO_KCRFA_CLN            72          4
FLAG_KCRFA_CLN                         76          1
PVT_STRAND_STATE_KCRFA_CLN             77          1
HIGHEST_SUBSCN_KCRFA_CLN               78          2
LWN_BUF_NUM_KCRFA_CLN                  80          4
STRAND_NUM_ORDINAL_KCRFA_CLN           84          4
STRAND_INFO_PTR_KCRFA_CLN              88          8
ACTION_KCRFA                          184          4
LOG_RESIDUE_KCRFA                     188          1
LAST_BUF_WRITTEN_KCRFA                216          4
LAST_BUF_GATHERED_KCRFA               220          4
CONS_LOW_BUFNO_KCRFA                  224          4
FIRST_BUF_KCRFA                       240          8
LAST_BUF_KCRFA                        248          8
TOTAL_BUFS_KCRFA                      256          4
STRAND_SIZE_KCRFA                     260          4
LOWBUF_KCRFA                          272          4
HIBUF_KCRFA                           276          4
WRAPHIBUF_KCRFA                       280          4
LOG_SEQ_KCRFA                         284          4
PTR_KCRF_PVT_STRAND                   304          8
INDEX_KCRF_PVT_STRAND                 312          4
SPACE_KCRF_PVT_STRAND                 316          4
TXN_KCRF_PVT_STRAND                   320          2

This shows that the representation of the KCRFA struct in PGA apparently has a width of 322 bytes, while we know the KCRFA struct itself is 296. I am not sure why this is, the most obvious reason would be some data is processed already in PGA to make easier to show. This might also very well be the reason there is a need for copying it to PGA first, and not display it directly from its shared pool location.


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: