This post is about how to use gdb, which is a debugger, so very simplistically put an aid for looking at C programs, as a profiler. I use gdb quite a lot for profiling because it’s the easiest way for profiling for me.
Lots of people which I know use other tools like perf, systemtap and dtrace for the same purpose and that’s fine. Each tools has its own advantages and disadvantages. One disadvantage of gdb is that it’s using ptrace to attach to a process, which makes it dead slow from a machine perspective, because everything it then does goes via another process, which is the debugger. That is how the debugger works.
Also lots of people use gdb like I do, and use basic functionality, which is breaking at functions, which makes it possible to find out the sequence of how functions are called, generating backtraces (stack traces) to understand the stack and maybe looking at functions arguments.
But there is more that you can do. Way more actually. These examples use the gdb that comes with RHEL7/OL7 via yum.
You can break conditionally at a function, which means that a break will be executed only if the condition is met. For example:
break sltrgatime64 if $caller_is("ksbabs")
This only breaks at a function called sltrgatime64 if it is called from a function that is called ksbabs.
But for what I wanted to do and investigate, I didn’t only want to do “something” when the sltrgatime64 function was called from ksbabs, I wanted be able to distinguish the two times this was called as the first one and the second one.
I knew that the “first time” was actually right after a function with the name of “dbktFlush” was called. So here’s what I did:
set $status = -1 break dbktFlush if $caller_is("ksbabs") commands silent set $status = 0 c end break sltrgatime64 if $caller_is("ksbabs") commands silent if $status == 0 set $status = 1 printf "first invocation\n" c end if $status == 1 set $status = 2 printf "second invocation\n" c end if $status > 1 printf "error. third or more invocations\n" end end
This is much more advanced than using a single command, but I think for anyone who is able to program a few languages is reasonably simple to understand.
There’s two breaks, and both are conditional, the function from which the function to break on must be ksbabs.
In the break of dbktFlush the (convenience) variable $status is set to 0, indicating the the first invocation for the other break.
In the second break, there are 3 if statements. The first one picks up the variable $status set to 0, indicating the first invocation and then sets it to 1 for the second invocation and the second if picks up the second invocation indicated by the variable $status set to 1 and sets it to 2, so if I didn’t look carefully and there are more invocations of sltrgatime64, I will get notified.
If breaks would be activated when the second call to sltrgatime64 was executed, it would not come across dbktFlush, and thus the $status variable would (still) be set to -1, and the break would skip all ifs.
Of course this is a very specific case, you should tailor it for your own needs. However, this was the actual investigation I done to investigate the function of the above sltrgatime64 calls.
In fact, I gone a little deeper and wanted to understand what the timing of of these two reads of the wall clock time did. For that, I used the python interface in gdb, for which the support seems to be compiled in in the rpm based version.
You can use python, and use variables in python that remain state between gdb interactions. I used a dead simple interaction; calling time.time() to time the invocation of the sltrgatime64 executed right after dbktFlush was called:
python import time python var_time=time.time() break dbktFlush if $caller_is("ksbabs") commands silent set $status = 0 c end break sltrgatime64 if $caller_is("ksbabs") commands silent if $status == 0 set $status = 1 python print time.time()-var_time python var_time=time.time() c else c end end
If you set single logwriter to true and execute this against the logwriter, you will see that the logwriter tries to keep a 3 second pace by adjusting the sleep time on semtimedop().
My actual goal was to try to find out what is exactly timed for the message ‘Warning: log write elapsed time ????ms, size ?KB’ in the logwriter trace file, which Nenad Noveljic proved is not only about the actual IO time.
To be honest, I done it clumsy by going backward measuring all the calls that measured wall clock time and should have put a break on write and look at the backtrace. When I executed a backtrace of the write call that wrote the trace file entry, I could see this line was written in kcrfw_postprocess_write().
This is the gdb script with the timing:
python import time break skgstmGetEpochTs if $caller_is("kcrfw_redo_write_driver") commands silent python var_time=time.time() c end break kcrfw_do_write commands silent print $rip c end break skgstmGetEpochTs if $caller_is("kcrfw_postprocess_write") commands silent shell sleep 1 python print time.time()-var_time c end
What does gdb script does, is always set the python variable var_time, which includes the idle cycles of the logwriter. If it goes into the function kcrfw_do_write it actually is going to write. That is also when potentially the line “Warning: log write elapsed time 1024ms, size 0KB” could be produced.
when the function skgstmGetEpochTs is called in the function kcrfw_postprocess_write, it executes a sleep for 1 second, and then prints the current time minus the time obtained when kcrfw_redo_write_driver was entered.
What this proofs, is that the timing for the log write time elapsed time warning is done from the start of the kcrfw_redo_write_driver function. That means that everything that the logwriter needs to do, excluding the the last bit of what the logwriter performs, posting the foregrounds, is timed, including dealing with the internal structures that keep track of the redo strands and the latching of it.