PL/SQL context switch

Whenever you use PL/SQL in SQL statements, the Oracle engine needs to switch from doing SQL to doing PL/SQL, and switch back after it is done. Generally, this is called a “context switch”. This is an example of that:

-- A function that uses PL/SQL 
create or replace function add_one( value number ) return number is
        l_value number(10):= value;
begin
        return l_value+1;
end;
/
-- A SQL statement that uses the PL/SQL function
select sum(add_one(id)) from t2;

Of course the functionality of the function is superfluous, it can easily be done in ‘pure’ SQL with ‘select sum(id+1) from t2’. But that is not the point.
Also, I added a sum() function, for the sake of preventing output to screen per row.

The first thing to check, if there is a difference in performance between executing with sum(id+1) and sum(add_one(id)). If there isn’t we can stop here 🙂

TS@frits > set timing on
TS@frits > select sum(id+1) from t2;

 SUM(ID+1)
----------
5.0000E+11

Elapsed: 00:00:00.19
TS@frits > select sum(add_one(id)) from t2;

SUM(ADD_ONE(ID))
----------------
      5.0000E+11

Elapsed: 00:00:01.13

This statement executes a full table scan, I’ve closely guarded the IO times were alike.
But it looks there is a whopping difference between including PL/SQL and not: 113/19*100=595%, or differently worded: almost six times slower.

How does this work? Let’s have a look using stapflames. The idea behind flame graphs in general is to understand in which (c code) functions (user mode and kernel mode) the time is spend. Because of the full backtrace (all the called functions on top of each other), it gives an insight on how a program is working.

The first thing we need to establish, is how PL/SQL looks like from the perspective of C-functions. For that reason, I created a bogus PL/SQL program to profile:

declare
t number:=0;
begin
while t < 1000000 loop
t:=t+1;
end loop;
end;
/

Yes, that is right, the only thing this anonymous PL/SQL block does, is declare a number to a variable named ‘t’, and then loop adding one to the variable until the variable reaches the number 1000000. Again, what this program does is not interesting nor functional, the only thing it needs to do is run, so when we profile the program we are sure it is doing PL/SQL.

I ran this anonymous PL/SQL block using my stapflame utility to generate a flamegraph, and this is how that looks like:
I have taken the flamegraph of all time considered on cpu by the Oracle database.

First of all, one important property of flamegraphs is shown: the sequence is random. If you look at the kpoal8 function, you see there are two different paths taken from this function: opiexe (oracle program interface execute) and opiosq0 (oracle program interface prepare to parse). Of course the PL/SQL block is first parsed and then executed, so the order is different than shown.

What is also very visible, is that almost all time executing, is done using a function ‘pfrrun’, which seems to be the main function executing PL/SQL. On top of that we can see some functions which roughly resemble the functionality used in the PL/SQL block: pfrinstr_ADDN (addition, t:=t+1) and pfrinstr_RELBRNCH (the loop). This also gives a fair indication PL/SQL is interpreted rather than compiled. Anyway, what is important is that from looking at the little test above, we can distinguish running PL/SQL from SQL by the pfrrun function.

Now let’s look at at a flamegraph of running a PL/SQL function in the SQL statement:
The flamegraph shows all the time considered running on CPU for executing the statement ‘select sum(add_one(id)) from t2’. There is a lot to see!

When we look on top of the function kpoal8, we see the function opifch2. This means the vast majority of the time is spend in the fetch phase of the SQL statement. On top of the opifch2 function we see two functions which start with qer. ‘qer’ probably means Query Execute Rowsource. ‘qertbFetch’ is the fetch procedure for table scans. Then we see the kdsttgr function (kernel data scan table get row), and then the ultra fast table scan function (kdstf; kernel data scan table full) followed by a lot of zero’s and/or one’s and ending with ‘km’ or ‘kmP’. There are a lot of kdstf functions in the Oracle executable, I assume the zero’s and one’s after ‘kdstf’ are a bitmap of potentially needed functions during the full scan, so the full table scan function can omit a lot of choices on runtime, leading to better CPU efficiency. See an article by Tanel on what this means.

In the full table scan function, there are two main functions which consume time, kdst_fetch and qesaFastAggNonDistSS. ‘kdst_fetch’ and deeper functions are functions related to doing IO for reading the data from the data file. ‘qesaFastAggNonDistSS’ is the main function for actually processing the data. The function qesaFastAggNonDistSS has two main functions consuming its time, evaopn2 and a function visible as ‘sl..’, which is actually a function called slnxsum, in other words, the sum() function. The function evaopn2 is a function to evaluate operands. This evaluation is the path towards executing the PL/SQL function.

Actually, when carefully looking at the evaopn2 function, we see the slnxsum function, and ‘evapls’, which is the function to switch to PL/SQL. The two main functions in ‘evapls’ are kgmexec and opiomc. Again here the order is switched; what happens here is first a cursor must be mapped for executing PL/SQL (opiomc function), after which it can be executed (kgmexec function).

In order to understand what the time taken by “switching” to PL/SQL is, we can take the total time the query engine is processing everything PL/SQL related, which is the total time taken by the ‘evapls’ function, and measure the time actually running PL/SQL, which is the time taken by the ‘pfrrun’ function. This can be accomplished by simple systemtap script:
(please mind you need a recent systemtap version, preferably gotten from https://sourceware.org/git/systemtap.git, and kernel version 3.10 to be able to use return probes)

global evapls_time, pfrrun_time, evapls_tot=0, pfrrun_tot=0

probe begin {
	printf("Begin.\n")
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evapls") {
	if ( pid() == target() )
		evapls_time=local_clock_us()
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("evapls").return {
	if ( pid() == target() )
		evapls_tot+=local_clock_us()-evapls_time
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("pfrrun") {
	if ( pid() == target() )
		pfrrun_time=local_clock_us()
}
probe process("/u01/app/oracle/product/12.1.0.2/dbhome_1/bin/oracle").function("pfrrun").return {
	if ( pid() == target() )
		pfrrun_tot+=local_clock_us()-pfrrun_time
}

probe end {
	printf("\nevapls time: %12d\npfrrun time: %12d\n", evapls_tot, pfrrun_tot)
}

This is how it looks like on my machine:

# stap -x 29680 plsql.stap
Begin.
^C
evapls time:      3203637
pfrrun time:       951830

So, the overhead or context switching time, which must be Oracle server code executing between the the evapls function, where it determines it needs to execute PL/SQL and the pfrrun function, which is the PL/SQL interpreter, is on my machine:
(1-951830/3203637)*100=70%
Wow!

One way of reducing this problem, is using subquery factoring, alias the ‘with clause’. To use the function that way, this is how the SQL should be written:

with
function add_one( value number ) return number is
	l_value number(10):= value;
begin
	return l_value+1;
end;
select sum(add_one(id)) from t2;
/

Let’s have a look at the flamegraph of this construction:
It becomes apparent that with subquery factoring, there are way lesser functions between the evapls and pfrrun functions.
Normal PLSQL: kgmexec, kkxmpexe, kkxdexe, peidxexe, peidxr_run, plsql_run
Subquery factoring: kkxmss_speedy_stub, peidxrex
Also mind there is no codepath for mapping a cursor.

Let’s have a look at the timing:

# stap -x 29680 plsql.stap
Begin.
^C
evapls time:      2192685
pfrrun time:       880230

The time spend in PL/SQL, by looking at total time spend in the evapls function reduced by 32% ((1-2192685/3203637)*100).
However, if you calculate the overhead, it is still pretty significant: (1-880230/2192685)*100=60%

Conclusion
The most simple conclusion I can make is: do not use PL/SQL if you can solve it in SQL, like in the example above. This does not mean you should never use PL/SQL, contrary: in a lot of cases processing should be done where the data is, and sometimes you need a procedural language for that.

I made a lot of assumptions in this little investigation. The function naming (the translation from the Oracle C function name to what functionality it is supposed to deliver) are speculations.

The context switch between SQL mode and PL/SQL mode looks like it is technically setting up the execution environment for PL/SQL. Indeed this takes time, and the true PL/SQL execution time when repeatedly executing PL/SQL is very low in my case. Please mind actual times will differ on different systems. However, the main conclusion is that using PL/SQL in SQL execution probably is not the most performant thing to do, including using subquery factoring.

Advertisements
4 comments
  1. Jan Karremans said:

    Very useful end interesting investigation with a good conclusion.
    Thank you for this, Frits, much appreciated.

  2. Xavi Morera said:

    I faced a similar issue some days ago: Problem was that there was a call to a PL/SQL function in a WHERE clause, so it was executed for each of the rows of the table (9M rows):

    select t1.*
    from t1
    where t1.date > pl_sql_function
    ;

    As the function had no parameters and always returned the same result, I moved the function call to a WITH clause, and changed the WHERE clause to reference the WITH block:

    WITH w1 as
    (select /*+ materialize */ pl_sql_function date from dual)
    select t1.*
    from t1, w1
    where t1.date > w1.date
    ;

    And volià, query passed from 36 min to 8 sec!

    Xavi

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: