Direct path and buffered reads again.
Since the direct path read feature for serial processes was discovered after it became available in Oracle 11.2.0.1 (as far as I know; I haven’t checked Oracle 11.1), there have been a lot of blog posts on when this happens. A lot of these do not specify the Oracle version, which is a shortcoming in my opinion, because different versions make different decisions.
The purpose of this blogpost is to show the results of my tests on when the Oracle database engine switches from buffered to direct path reads and vice versa. There probably are decisions made by the database engine for this feature based on internally kept statistics, like general activity and object usage, which means my tests might turn out differently in your database. For that reason I included an anonymous PL/SQL block in this post so you can replay the same test in your own database, except for the table, which you have to create yourself.
There are a few basic rules applied by the database engine for the buffered versus direct path read decision for a full segment scan (a query to look up the underscore parameters involved follows the list):
– If the segment size is smaller than 5 * _SMALL_TABLE_THRESHOLD (11.2.0.1 and 11.2.0.2) or _SMALL_TABLE_THRESHOLD (11.2.0.3 and higher), it will be read into the buffer cache (also known as a buffered read); any block already in the buffer cache will be read from the cache.
– If the segment size is bigger than 5 * the buffer cache size (governed by _VERY_LARGE_OBJECT_THRESHOLD, which is set to 500 by default and appears to mean 500% of the buffer cache), it will be read via direct path, and any blocks that are in the cache are ignored. If blocks of this segment are dirty, these are checkpointed to disk before the segment is read, which is visible with the events ‘messages’ and ‘enq: KO – fast object checkpoint’.
– If a segment is between the above mentioned sizes, the database engine makes a decision. In other words: you can get different behaviour in different scenarios. To give this size range a name, the most logical one is a medium table. Anytime a direct path read is chosen, any dirty blocks are checkpointed to disk first, visible with the events ‘messages’ and ‘enq: KO – fast object checkpoint’.
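For reference, this is a minimal sketch of how to look up the parameters involved. The x$ views require connecting as SYS (or explicit grants), and I only show the two parameters discussed in this post; verify the names on your own version:

select i.ksppinm parameter, v.ksppstvl value
from x$ksppi i, x$ksppcv v
where i.indx = v.indx
and i.ksppinm in ('_small_table_threshold', '_very_large_object_threshold');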
Starting from 11.2.0.3, the starting point of the direct path decision changed from 5 * _SMALL_TABLE_THRESHOLD to _SMALL_TABLE_THRESHOLD. When an NSMTIO trace is taken, an extra line appears in the trace when a segment is sized between _SMALL_TABLE_THRESHOLD and 5 * _SMALL_TABLE_THRESHOLD:
Partial trace lines for a medium segment sized bigger than 5 * _SMALL_TABLE_THRESHOLD (12.1.0.2):
NSMTIO: kcbism: islarge 1 next 0 nblks 21128 type 2, bpid 3, kcbisdbfc 0 kcbnhl 16384 kcbstt 3658 keep_nb 0 kcbnbh 182931 kcbnwp 1
NSMTIO: kcbimd: nblks 21128 kcbstt 3658 kcbpnb 18293 kcbisdbfc 3 is_medium 0
NSMTIO: kcbivlo: nblks 21128 vlot 500 pnb 182931 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
On the first line, nblks indicates the segment size, and kcbstt indicates _SMALL_TABLE_THRESHOLD.
Partial trace lines for a medium segment sized smaller than 5 * _SMALL_TABLE_THRESHOLD (12.1.0.2):
NSMTIO: kcbism: islarge 1 next 0 nblks 3668 type 2, bpid 3, kcbisdbfc 0 kcbnhl 16384 kcbstt 3658 keep_nb 0 kcbnbh 182931 kcbnwp 1
NSMTIO: kcbimd: nblks 3668 kcbstt 3658 kcbpnb 18293 kcbisdbfc 3 is_medium 0
NSMTIO: kcbcmt1: hit age_diff adjts last_ts nbuf nblk has_val kcbisdbfc cache_it 898 8181 7283 182931 3668 1 0 1
NSMTIO: kcbivlo: nblks 3668 vlot 500 pnb 182931 kcbisdbfc 0 is_large 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
The first line shows the segment size (3668) being just above the _SMALL_TABLE_THRESHOLD (3658).
What is interesting to mention is the kcbcmt1 line. Fields like hit, age_diff, adjts and last_ts seem to indicate that additional statistics on (recent) activity are taken into account.
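If you want to take such an NSMTIO trace yourself, the sketch below is one way to do it that I would expect to work on 11.2.0.2 and higher (it uses the trace[] event syntax of the unified tracing infrastructure; verify on your own version). The NSMTIO lines end up in the session's trace file:

alter session set events 'trace[nsmtio]';
select /*+ full(t) */ count(*) from t;
alter session set events 'trace[nsmtio] off';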
For the results shown below, I created an anonymous PL/SQL block that is included at the end of this blogpost. Also, I flushed the buffer cache before every run, so the measurement starts off with an empty cache. If you are running this on a really used database, you shouldn’t flush the buffer cache, and you will probably see small numbers of physically read blocks during the buffered reads.
Version 11.2.0.1, _SMALL_TABLE_THRESHOLD = 1436 (_STT*5=7180), segment size 7040.
table: T_OVER_STT size: 7040 blocks, 330171 rows.
cached - physical reads cache/direct: (10) 749/0
full scan - physical reads cache/direct: 6129/0
cached - physical reads cache/direct: (20) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (30) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (40) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (50) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (60) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (70) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (80) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (100) 0/0
full scan - physical reads cache/direct: 0/0
First, 10% of the table is read into the buffer cache, then a full scan is initiated. The full scan is also done buffered, because the segment size is smaller than 5 * _SMALL_TABLE_THRESHOLD. All subsequent scans are served from the cache, because the table has been read into the cache already, which is why they do not cause any physical reads.
Version 11.2.0.2, _SMALL_TABLE_THRESHOLD = 1492 (_STT*5=7460), segment size 1920.
table: T_JUST_OVER_STT size: 1920 blocks, rows.
cached - physical reads cache/direct: (10) 239/0
full scan - physical reads cache/direct: 1623/0
cached - physical reads cache/direct: (20) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (30) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (40) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (50) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (60) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (70) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (80) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: 0/0
cached - physical reads cache/direct: (100) 0/0
full scan - physical reads cache/direct: 0/0
Here the same thing happens: 11.2.0.2 behaves identically to 11.2.0.1 with a segment that is smaller than 5 * _SMALL_TABLE_THRESHOLD. First, 10% of the blocks is read, then a full scan reads the remaining blocks into the cache, after which all the other scans read from the cache, so no physical reads are shown.
Version 11.2.0.3, _SMALL_TABLE_THRESHOLD = 1482 (_STT*5=7410), segment size 1664.
table: T_OVER_STT size: 1664 blocks, 74404 rows.
cached - physical reads cache/direct: (10) 251/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (20) 126/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (30) 126/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (40) 126/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (50) 252/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (60) 126/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (70) 126/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (80) 126/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (90) 252/0
full scan - physical reads cache/direct: 0/1548
cached - physical reads cache/direct: (100) 44/0
full scan - physical reads cache/direct: 0/1548
Here we see the database behave differently! The same pattern as shown earlier is executed: first a percentage of the table is read with a forced buffered read, then a full scan is executed, which now scans via direct path for a segment that is smaller than 5 * _SMALL_TABLE_THRESHOLD. Another interesting thing to see is that despite the last forced buffered scan reading ALL the blocks into the cache, a full scan keeps scanning via direct path. Please mind we are scanning a segment bigger than _SMALL_TABLE_THRESHOLD, but smaller than 5 * _SMALL_TABLE_THRESHOLD.
Careful testing shows that all versions from 11.2.0.3 up to Oracle 12.1.0.2 behave consistently like this. A final consideration: as mentioned earlier, the testing in my lab is on idle databases, often just started up for the sake of the test, so dynamics and (internally kept) runtime statistics might lead to different behaviour in a really used database.
How about a segment that is truly bigger than 5 * _SMALL_TABLE_THRESHOLD?
Version 11.2.0.1, _SMALL_TABLE_THRESHOLD=1436, 5*_STT=7180, segment size 21504.
table: T2 size: 21504 blocks, 1000000 rows.
cached - physical reads cache/direct: (10) 2135/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (20) 2142/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (30) 2016/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (40) 2149/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (50) 2040/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (60) 2040/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (70) 2168/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (80) 2168/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (90) 2040/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (100) 2044/0
full scan - physical reads cache/direct: 0/0
Here we see a table that is bigger than 5 * _SMALL_TABLE_THRESHOLD, and we see that this size does make a difference! When the amount of blocks in the cache is close to 100%, the full scan can choose a buffered scan instead of a direct path read. In fact, if I change the starting percentage to 90% and the step to 1%, we can see the switching point is at 99%:
table: T2 size: 21504 blocks, 1000000 rows.
cached - physical reads cache/direct: (90) 18898/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (91) 256/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (92) 124/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (93) 256/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (94) 256/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (95) 128/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (96) 256/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (97) 252/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (98) 128/0
full scan - physical reads cache/direct: 0/20941
cached - physical reads cache/direct: (99) 256/0
full scan - physical reads cache/direct: 132/0
cached - physical reads cache/direct: (100) 0/0
full scan - physical reads cache/direct: 0/0
When 99% of the blocks is read into the buffercache, you can see the full scan following it choosing a buffered scan method, and reading the remaining 132 blocks into the cache.
This behaviour is consistent up to Oracle version 12.1.0.2. Just to make sure we are all on the same page again, this is in my test databases. I love to hear if you see something different in real life.
Another thing that influences direct path reads is the number of dirty blocks. If v_read_update in the procedure below is set to UPDATE, instead of scanning a certain percentage into the buffercache, a certain percentage of blocks is updated, after which the same full scan is executed, and then the update is rolled back. Please mind I took a table that is bigger than 5 * _SMALL_TABLE_THRESHOLD in the tests below. When the procedure is set to UPDATE, only the full scan is shown, not the update. The number between brackets is the percentage of the table that has been updated prior to the full scan.
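If you want to verify how many buffers of the segment actually are in the cache, and how many of those are dirty, a sketch like the one below can be run just before the full scan (assuming select rights on v$bh; T2 is my test table, change it to yours):

select b.dirty, count(*)
from v$bh b, user_objects o
where b.objd = o.data_object_id
and o.object_name = 'T2'
and b.status != 'free'
group by b.dirty;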
Version 11.2.0.1
table: T2 size: 21504 blocks, 1000000 rows.
--update
full scan - physical reads cache/direct: (10) 0/20941
full scan - physical reads cache/direct: (20) 0/20941
full scan - physical reads cache/direct: (30) 0/20941
full scan - physical reads cache/direct: (40) 0/20941
full scan - physical reads cache/direct: (50) 0/20941
full scan - physical reads cache/direct: (60) 0/20941
full scan - physical reads cache/direct: (70) 0/20941
full scan - physical reads cache/direct: (80) 4084/0
full scan - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: (100) 0/0
Version 11.2.0.2
table: T2 size: 21504 blocks, 1000000 rows.
--update
full scan - physical reads cache/direct: (10) 0/20941
full scan - physical reads cache/direct: (20) 0/20941
full scan - physical reads cache/direct: (30) 0/20941
full scan - physical reads cache/direct: (40) 0/20941
full scan - physical reads cache/direct: (50) 0/20941
full scan - physical reads cache/direct: (60) 0/20941
full scan - physical reads cache/direct: (70) 0/20941
full scan - physical reads cache/direct: (80) 4084/0
full scan - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: (100) 0/0
Version 11.2.0.3
table: T2 size: 21504 blocks, 1000000 rows.
--update
full scan - physical reads cache/direct: (10) 0/20941
full scan - physical reads cache/direct: (20) 0/20941
full scan - physical reads cache/direct: (30) 0/20941
full scan - physical reads cache/direct: (40) 0/20941
full scan - physical reads cache/direct: (50) 0/20941
full scan - physical reads cache/direct: (60) 8420/0
full scan - physical reads cache/direct: (70) 0/0
full scan - physical reads cache/direct: (80) 0/0
full scan - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: (100) 0/0
Version 11.2.0.4
table: T2 size: 21504 blocks, 1000000 rows.
--update
full scan - physical reads cache/direct: (10) 0/20941
full scan - physical reads cache/direct: (20) 0/20941
full scan - physical reads cache/direct: (30) 0/20941
full scan - physical reads cache/direct: (40) 0/20941
full scan - physical reads cache/direct: (50) 0/20941
full scan - physical reads cache/direct: (60) 8420/0
full scan - physical reads cache/direct: (70) 0/0
full scan - physical reads cache/direct: (80) 0/0
full scan - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: (100) 0/0
Version 12.1.0.1
table: T2 size: 21504 blocks, 1000000 rows.
--update
full scan - physical reads cache/direct: (10) 0/20941
full scan - physical reads cache/direct: (20) 0/20941
full scan - physical reads cache/direct: (30) 0/20941
full scan - physical reads cache/direct: (40) 0/20941
full scan - physical reads cache/direct: (50) 0/20941
full scan - physical reads cache/direct: (60) 8420/0
full scan - physical reads cache/direct: (70) 0/0
full scan - physical reads cache/direct: (80) 0/0
full scan - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: (100) 0/0
Version 12.1.0.2
table: T2 size: 21504 blocks, 1000000 rows.
--update
full scan - physical reads cache/direct: (10) 0/20941
full scan - physical reads cache/direct: (20) 0/20941
full scan - physical reads cache/direct: (30) 0/20941
full scan - physical reads cache/direct: (40) 0/20941
full scan - physical reads cache/direct: (50) 0/20941
full scan - physical reads cache/direct: (60) 8420/0
full scan - physical reads cache/direct: (70) 0/0
full scan - physical reads cache/direct: (80) 0/0
full scan - physical reads cache/direct: (90) 0/0
full scan - physical reads cache/direct: (100) 0/0
During repeated tests, the above figures show there is a difference between Oracle versions 11.2.0.1/11.2.0.2 and version 11.2.0.3 and above. Versions 11.2.0.1 and 11.2.0.2 seem to have their switching point from direct path to buffered at approximately 75% dirty blocks, while versions 11.2.0.3 and higher have their switching point at approximately 50%.
I am convinced there are more things that influence this. In fact, I had to re-run the tests a couple of times (flushing the buffercache before each run of course) in order to come up with the figures above. This means that sometimes this can show fluctuations, even in my idle database.
This is the anonymous PL/SQL procedure to test this in your own database:
declare
  -- this is a direct path read tester.
  -- please mind the user executing this anonymous PL/SQL block must have the following rights:
  -- - alter session
  -- - select on v$mystat and v$statname
  --
  -- -> because user_tables is used, statistics must be correct!
  --
  -- fill out the following details
  -- table to read -> it's best to flush the buffer cache if you want to start with a low percentage of blocks in the cache!
  v_table varchar2(30) := 'YOUR_TABLE_HERE';
  -- percentage to start at for reading blocks in the cache.
  v_current_pct number := 10;
  -- the amount to add to the percentage after the run has been executed.
  v_step number := 10;
  -- read or update (uppercase).
  v_read_update varchar2(10) := 'READ';
  --
  v_nr_blocks number;
  v_numrows number;
  v_readrows number;
  v_rd_cache number;
  v_rd_direct number;
  v_tmp number;
  v_read_to_cache varchar2(200) := 'select /*+ full('||v_table||') */ count(*) from '||v_table||' where rownum < :1';
  v_update varchar2(200) := 'update '||v_table||' set id=id+1 where rownum < :1';
  v_try_read varchar2(200) := 'select /*+ full('||v_table||') */ count(*) from '||v_table;
  v_ret number;
begin
  select blocks into v_nr_blocks from user_segments where segment_type = 'TABLE' and segment_name = v_table;
  select num_rows into v_numrows from user_tables where table_name = v_table;
  dbms_output.put_line('table: '||v_table||' size: '||v_nr_blocks||' blocks, '||v_numrows||' rows.');
  if v_read_update = 'READ' then
    while v_current_pct <= 100 loop
      -- number of rows that roughly corresponds to v_current_pct percent of the blocks
      v_readrows := (v_nr_blocks / 100) * v_current_pct * (v_numrows / v_nr_blocks);
      -- take statistics baselines
      select value into v_rd_cache from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads cache';
      select value into v_rd_direct from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads direct';
      -- force a buffered read of the first v_readrows rows (event 10949 disables serial direct path reads)
      execute immediate 'alter session set events ''10949 trace name context forever, level 1''';
      execute immediate v_read_to_cache into v_ret using v_readrows;
      execute immediate 'alter session set events ''10949 trace name context forever, level 0''';
      select value into v_tmp from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads cache';
      v_rd_cache := v_tmp - v_rd_cache;
      select value into v_tmp from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads direct';
      v_rd_direct := v_tmp - v_rd_direct;
      dbms_output.put_line('cached - physical reads cache/direct: ('||v_current_pct||') '||v_rd_cache||'/'||v_rd_direct);
      -- now do a full scan and let the engine decide between buffered and direct path
      select value into v_rd_cache from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads cache';
      select value into v_rd_direct from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads direct';
      execute immediate v_try_read into v_ret;
      select value into v_tmp from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads cache';
      v_rd_cache := v_tmp - v_rd_cache;
      select value into v_tmp from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads direct';
      v_rd_direct := v_tmp - v_rd_direct;
      dbms_output.put_line('full scan - physical reads cache/direct: '||v_rd_cache||'/'||v_rd_direct);
      v_current_pct := v_current_pct + v_step;
    end loop;
  end if;
  if v_read_update = 'UPDATE' then
    v_current_pct := 10;
    v_step := 10;
    dbms_output.put_line('--update');
    while v_current_pct <= 100 loop
      v_readrows := (v_nr_blocks / 100) * v_current_pct * (v_numrows / v_nr_blocks);
      -- dirty a percentage of the table
      execute immediate v_update using v_readrows;
      select value into v_rd_cache from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads cache';
      select value into v_rd_direct from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads direct';
      execute immediate v_try_read into v_ret;
      select value into v_tmp from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads cache';
      v_rd_cache := v_tmp - v_rd_cache;
      select value into v_tmp from v$mystat, v$statname where v$mystat.statistic#=v$statname.statistic# and name = 'physical reads direct';
      v_rd_direct := v_tmp - v_rd_direct;
      dbms_output.put_line('full scan - physical reads cache/direct: ('||v_current_pct||') '||v_rd_cache||'/'||v_rd_direct);
      execute immediate 'rollback';
      v_current_pct := v_current_pct + v_step;
    end loop;
  end if;
end;
/
This script should be run as the owner of the table you want to investigate for direct path reads.
The Oracle user must have ALTER SESSION rights (to be able to set the event that disables direct path reads), select rights on V$MYSTAT and V$STATNAME, and obviously select rights on the table it reads. Another important thing is that the table must have accurate statistics. If you gather statistics after you have run the PL/SQL procedure, you should flush the buffer cache and the shared pool in order to get rid of the cached cursors and statistics.
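In other words, a typical preparation before a run would look like the sketch below (assuming the necessary privileges; only flush the shared pool on a test database):

exec dbms_stats.gather_table_stats(user, 'YOUR_TABLE_HERE');
alter system flush buffer_cache;
alter system flush shared_pool;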
The block first reads a certain percentage of the table with event 10949 set, which disables direct path reads, then the event is switched off and a full table scan is executed. Next, the percentage is increased by the step size, the forced buffered scan is executed again, followed by the full scan, and so on.
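Outside of the anonymous block, the same forced buffered read can be done by hand; a minimal sketch using the same event the procedure uses (T2 stands for whatever test table you use):

alter session set events '10949 trace name context forever, level 1';
select /*+ full(t2) */ count(*) from t2;  -- serial direct path reads disabled, so this scan is buffered
alter session set events '10949 trace name context forever, level 0';
select /*+ full(t2) */ count(*) from t2;  -- the engine decides buffered versus direct path again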
Change the v_table definition to the table you want to scan.
What you additionally can do is change v_current_pct (10 by default), the starting percentage to read buffered, and v_step (10 by default), the amount the percentage is increased with per iteration. One of the things I did is change v_current_pct to 90 and v_step to 1.
Frits,
do you have trace output for some test made in 11.2.0.3+? I’m interested in the part/point in which the decision is made:
—-11.2.0.4 cached non compressed – last direct read at:
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:DirectRead: tsn: 4, objd: 87878, objn: 87874
ckpt: 1, nblks: 874, ntcache: 436, ntdist:0
NSMTIO: Additional Info: VLOT=132300
Object# = 87878, Object_Size = 874 blocks
SqlId = 1nb5kz7dm1jf6, plan_hash_value = 4229476807, Partition# = 0
NSMTIO: qertbFetch:[MTT < OBJECT_SIZE < VLOT]: Checking cost to read from caches(local/remote) and checking storage reduction factors (OLTP/EHCC Comp)
NSMTIO: kcbdpc:NoDirectRead:[CACHE_READ]: tsn: 4, objd: 87878, objn: 87874
ckpt: 0, nblks: 874, ntcache: 437, ntdist:0
NSMTIO: Additional Info: VLOT=132300
Object# = 87878, Object_Size = 874 blocks
SqlId = 1nb5kz7dm1jf6, plan_hash_value = 4229476807, Partition# = 0
This view is more comfortable, because you have the VLOT from which you can calculate all thresholds, and you can see the object size, cached blocks, and so on. Paste some if you have any.
Thanks.
Hi Nikolay, thanks for stopping by! Here is the output from version 12.1.0.2 where the switch has been made:
Hi, in 12.1.0.2 everything is different because of ABTC. Do you have output from 12.1.0.1 or 11.2.0.4? I’m really curious why my results are different. The only diff I see is the platform. My post is on Solaris, but I tried it on Linux as well and I’m sure that the threshold was 50% cached for non-compressed tables. Anyway, when I have time I’ll test it again. I just need to find time to rebuild my lab… #toobusy
This is version 11.2.0.4:
Could you expand on ABTC and everything being different? I know about ABTC, but haven’t investigated it deeply. As far as I know, it is another mechanism which can be invoked if needed. If you look at the numbers from the investigation, you’ll see there is no difference. However, this is run against an idle database.
Hi Frits,
This algorithm is deprecated/”totally killed” in 12.1.0.2, which makes our discussion almost useless, but it’s still relevant and interesting. I didn’t have time to write about ABTC; I have only one, maybe a bit old but interesting, post here:
http://progeeking.com/2014/10/03/rac-12c-direct-path-reads/
You can check if you have time.
When I manage to, I’ll run the tests and provide you the raw results. I don’t see how I’ll find time in the next three weeks, but anyway, let’s see.
I don’t agree with your conclusion, Nikolay.
If you carefully read my article again, you see that my tests show that starting from 11.2.0.3 (so not from version 12.1.0.1 and higher):
– Tables with a size between _STT and 5*_STT indeed are always read via direct path, regardless of the amount of blocks in the cache.
– However, for tables with a size between 5*_STT and VLOT, if the amount of blocks in the (local) cache is 99% or higher, a buffered read is chosen.
When there are dirty buffers for the table, testing reveals the following figures:
– Version 11.2.0.1 and 11.2.0.2 seem to switch to buffered reads when around 75% of table blocks are dirty.
– Version 11.2.0.3 and up switch around 50% of the table blocks being dirty.
When there are dirty blocks it seems to be a bit less predictable than with reads, but I could get reasonably consistent results during testing.
My tests do not show the mechanism being changed in favour of automatic big table caching (ABTC) since version 12.1.0.2 at all.
Please mind my testing platform is Linux, but I doubt these mechanisms are platform dependent (maybe with the exception of Exadata, although I haven’t been able to witness Exadata-specific behaviour; the mechanism to determine whether the database runs on Exadata is there).
I hope you can redo your tests and confirm, or not. Both outcomes are interesting.
The only playground is a rusty 11.2.0.3 in which I have the same result as in my blog
On Linux?
If you have the time and opportunity, can you run my anonymous plsql block on a table? (which results in a FTS, but you understand that) It’s important to rerun it a couple of times (flushing the buffercache in between of course) to see if it’s consistent. Also indicate if it’s between STT and 5*STT or 5*STT and VLOT.
That is with my procedure. But if I use yours, I get this:
SQL>
table: X size: 6016 blocks, 304896 rows.
cached – physical reads cache/direct: (10) 716/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (20) 566/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (30) 568/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (40) 630/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (50) 566/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (60) 568/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (70) 566/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (80) 630/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (90) 568/0
full scan – physical reads cache/direct: 0/5852
cached – physical reads cache/direct: (100) 568/0
full scan – physical reads cache/direct: 0/0
PL/SQL procedure successfully completed.
SQL>
the result from mine is:
SDRX: cached read at 45% cached rows –no compression
SDRXC: cached read at 86% cached rows — compression
SDRX: cached read at 45% dirty rows –no compression
SDRXC: cached read at 86% dirty rows — compression
That’s very strange to me. The interesting thing here is that I’m not using object statistics and you are. Anyway, I really don’t have time to dig now, but for sure I will when I find some.
Yes:
SQL> select * from v$version;
BANNER
——————————————————————————–
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 – 64bit Production
PL/SQL Release 11.2.0.3.0 – Production
CORE 11.2.0.3.0 Production
TNS for Linux: Version 11.2.0.3.0 – Production
NLSRTL Version 11.2.0.3.0 – Production
SQL>
sorry I hit post by mistake
All objects are 6000 blocks; the STT is 709.
KSPPINM KSPPSTVL
——————————————— ————–
_small_table_threshold 709
SQL>
Can you please run the block and post the output?
Please mind I don’t doubt your investigation; as I said, I am confident there are more variables in play. It would be interesting to see if you get a difference consistently, and to work out what it is.
I have gotten different results from time to time.
This one is 1900 blocks on the same db:
SDRX: cached read at 35% cached rows
SDRXC: cached read at 82% cached rows
SDRX: cached read at 35% dirty rows
SDRXC: cached read at 82% dirty rows
PL/SQL procedure successfully completed.
SQL>
table: SDPX size: 1920 blocks, rows.
cached – physical reads cache/direct: (10) 496/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (20) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (30) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (40) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (50) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (60) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (70) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (80) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (90) 0/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (100) 0/0
full scan – physical reads cache/direct: 0/1820
PL/SQL procedure successfully completed.
SQL>
Sorry for the 10000 comments.
Please gather statistics before running the procedure; missing statistics are what makes it not run well.
Ok, this time in one comment 🙂
All tests are made with basic users in their own schema. In my opinion the algorithm acts differently in/with SYS.
The host is the same, 11.2.0.3 on Linux64
STT is 709;
–mine
objects 1900 blocks:
SDRX: cached read at 35% cached rows
SDRXC: cached read at 82% cached rows –compressed table
SDRX: cached read at 35% dirty rows
SDRXC: cached read at 82% dirty rows –compressed table
–yours with v_read_update varchar2(10) := 'READ'; on non compressed table sdpx /the same as in my procedure/
SQL> exec dbms_stats.gather_table_stats('x','sdpx');
PL/SQL procedure successfully completed.
SQL> alter system flush buffer_cache;
System altered.
SQL> alter system flush buffer_cache;
System altered.
SQL> alter system flush buffer_cache;
System altered.
SQL>
.
.
.
table: SDPX size: 1920 blocks, 1800 rows.
cached – physical reads cache/direct: (10) 210/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (20) 190/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (30) 188/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (40) 190/0
full scan – physical reads cache/direct: 0/1820
cached – physical reads cache/direct: (50) 188/0
full scan – physical reads cache/direct: 884/0
cached – physical reads cache/direct: (60) 0/0
full scan – physical reads cache/direct: 0/0
cached – physical reads cache/direct: (70) 0/0
full scan – physical reads cache/direct: 0/0
cached – physical reads cache/direct: (80) 0/0
full scan – physical reads cache/direct: 0/0
cached – physical reads cache/direct: (90) 0/0
full scan – physical reads cache/direct: 0/0
cached – physical reads cache/direct: (100) 0/0
full scan – physical reads cache/direct: 0/0
PL/SQL procedure successfully completed.
SQL>
–mine
STT=709; objects 7000 blocks:
SDRX: cached read at 45% cached rows
SDRXC: cached read at 86% cached rows –compressed table
SDRX: cached read at 45% dirty rows
SDRXC: cached read at 86% dirty rows –compressed table
–yours with v_read_update varchar2(10) := 'READ'; on non compressed table sdpx /the same as in my procedure/
SQL> exec dbms_stats.gather_table_stats('x','sdpx');
PL/SQL procedure successfully completed.
SQL> alter system flush buffer_cache;
System altered.
SQL> alter system flush buffer_cache;
System altered.
SQL> alter system flush buffer_cache;
System altered.
.
.
.
table: SDPX size: 7168 blocks, 7000 rows.
cached – physical reads cache/direct: (10) 824/0
full scan – physical reads cache/direct: 36/7048
cached – physical reads cache/direct: (20) 692/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (30) 694/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (40) 692/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (50) 694/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (60) 692/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (70) 694/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (80) 756/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (90) 692/0
full scan – physical reads cache/direct: 0/7048
cached – physical reads cache/direct: (100) 694/0
full scan – physical reads cache/direct: 0/0
PL/SQL procedure successfully completed.
SQL>
The results are consistent, I did them 3 times. I think tracing in 11.2.0.4+ is the best way to investigate. Here in 11.2.0.3 the trace provides only basic info like:
NSMTIO: qertbFetch:[OBJECT_SIZE < VLOT]: Direct Read canceled as object's size (7023 blocks) is less than the value of '_very_large_object_threshold'(177390 blocks).
Additional Info: SqlId=aw4bnnzhm1vgz, phv=2,Object#=499330, Partition#=0
Anyway, this is what I can do for now.
There are some inconsistencies in your story, probably due to small copy-and-paste errors.
Your description is about SDRX* tables, while you execute against a table called SDPX.
The first execution is on a table which is smaller than 5 * _STT. Because your reads flipped to cached around 50% (in my procedure), it is worth investigating if there are dirty blocks involved somehow.
The second execution on a table with the same name (SDPX) being bigger than 5 * _STT shows (when looking at my procedure) exactly what I would expect: flipping to cached above 90%, probably at 99%.
Frits, it’s not a copy/paste error. I have this misspelled on my blog post from the beginning. The procedure is using SDPX% tables and reports SDRX% tables. P and R are misspelled. Everything is accurate, do it yourself.
The procedure is in here:
http://progeeking.com/2014/02/25/the-big-q-direct-path-and-cell-offloading/
SDPX% is just reported as SDRX%, that’s all. The table size is the same for both procedures /yours and mine/. The first one is ~1900 and the second is ~7000. My procedure generates the table data automatically by setting a threshold. As I told you, and what I’m trying to say in the post, is that using scripts is not a good idea. The best thing is tracing, because in the trace you have the exact number of blocks cached, the size of the object, the thresholds and so on. The script is used only to generate a proper trace file.
I realised, when reading your comment, that I sometimes saw deviations in the threshold when I had just created the table. I normally bounce the instance after creation. In most (real life) cases tables are not freshly created, but have been around for a while, with instance restarts having happened since.
Hi Frits,
these adaptive decisions are certainly still an interesting topic.
I recently had a case where a Full Table Scan as part of a Parallel Execution plan decided to go for buffered reads on 11.2.0.4 Exadata, something I hadn’t noticed before, because for Parallel Execution the rules are supposed to be different (something like almost always do direct path reads except the segment is very small, something like below “_small_table_threshold” blocks or similar), and the segment was bigger than “_small_table_threshold” I think. Unfortunately I didn’t have the time and mandate to pursue the issue further in that particular environment.
So do you plan to publish something similar about Parallel Execution and the decision between direct path / buffered reads?
Randolf
Hi Randolf, thanks for reading!
I do not have plans currently to expand the investigation to parallel query.
I am not aware PQ takes segment size into consideration? I see a blogpost by Mark Rittman mentioning _PARALLEL_MIN_TABLE_THRESHOLD, which I can’t find in either 12.1.0.2 or 11.2.0.4. Of course it doesn’t make sense to use parallel query on a very small table, but I didn’t know such a check exists. I thought that before in-memory parallel query, parallel query slaves inherently performed direct path IO. Are you sure?
There are a few reasons for PQ to scan a segment buffered (a quick way to check for these is sketched below the list):
– cache attribute is set
– segment is placed in the keep pool
– in-memory parallel execution kicked in
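A sketch of how to check for these three causes; owner and table name are placeholders, and in-memory parallel execution can only kick in with parallel_degree_policy set to AUTO:

select table_name, cache, buffer_pool
from dba_tables
where owner = 'THE_OWNER' and table_name = 'THE_TABLE';

select name, value
from v$parameter
where name = 'parallel_degree_policy';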
Hi Frits,
yes I’m very sure that I saw this cached read – that’s the reason I raised the question here. You’re right, those cases you mention (cache attribute, keep pool, in-memory PX) could result in a deliberate cached read in parallel slaves, but none of these applied to the case I witnessed.
Very likely due to some previous index based access (we saw some index based access in other plans) most of the blocks of the segment were in the cache and perhaps this resulted in a similar decision you see for the serial adaptive algorithm depending on the number of blocks cached.
If I find the time I’ll simply take your test case script (thanks for publishing) and see what I get when I mix this with Parallel Execution.
Randolf
Pingback: Log Buffer #445: A Carnival of the Vanities for DBAs | InsideMySQL
Pingback: Async direct path read : PART2 (playing with db_file_multiblock_read_count and _direct_io_slots) | Hatem Mahmoud Oracle's blog
Hi Frits, great analysis as usual.
Regarding your test “Version 11.2.0.3, _SMALL_TABLE_THRESHOLD = 1482 (_STT*5=7410), segment size 1664”.
I am wondering, am I the only one who sees this “always direct path” behaviour as “bad”?
Given the 3 cases (11.2.0.3 and above), small table (under _STT), medium table (_STT to 5x buffer cache), large table (5x buffer cache).
Why would a “medium” table always be read from disk, but a large table use “cache heuristics”? Would you expect the reverse, where medium tables use cache heuristics, and large tables always go direct path?
Seems very wrong to me.
Thank you for reading Christo!
I am convinced there is more in play, but on my test servers there is no real activity. I published the scripts in order to have somebody post an exciting, different result. I have seen it being different; the only point is that it isn’t consistent enough to test, build and prove a hypothesis on the way it works.
Doing a FTS on a small table (smaller than _STT) is always done buffered, which makes sense, and doing a FTS on a very large table (bigger than 5 * BC) is always done direct path, which also makes sense.
Indeed the difference between medium (bigger than _STT, smaller than 5 * _STT) and large (bigger than 5 * _STT, smaller than _VLOT) is weird.
The only hypothesis I could come up with is that somebody was of the opinion that having direct path reads start at 5 * _STT isn’t aggressive enough, wanted it to kick in earlier, and built that in on top of the existing heuristics. It falls in the same timeframe as Exadata getting widely adopted. However, this is really just a wild theory.
If this mechanism is causing pain, a simple approach would be to set the cache attribute of the table. It will then always be considered “small” by this part of the code, and thus read into the buffer cache. Of course this is a workaround, and this kind of non-obvious setting will make the behaviour very hard to understand for a new DBA who gets assigned the database.
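A sketch of that workaround, with a placeholder table name:

alter table your_table cache;
-- and to undo it:
alter table your_table nocache;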
I still think it’s bad as it is. Imagine you are doing performance testing in DEV. Your results will be skewed dramatically.
I think the cached vs. non-cached reads of tables is a very fundamental functionality and needs to be clearly defined.
Pingback: Direct path and buffered reads again: compressed tables. | Frits Hoogland Weblog
Pingback: Direct path read and ABTC thoughts and mysteries | progeeking
If you are still curious: http://wp.me/p3QpUL-mUo
Pingback: ORACLE 12C Automatic big table caching | Hatem Mahmoud Oracle's blog