Exadata: what kind of IO requests has a cell been receiving?
When you are administering one or more Exadatas, you probably have multiple databases running on different database or “computing” nodes. To understand what kind of IO you are doing, you can look at the statistics of your database and query the data dictionary for what that instance (or those instances, in the case of RAC) has been doing. When using Exadata, there is a near 100% chance you are using either normal or high redundancy, and most people know about the “write amplification” that ASM normal and high redundancy cause: the write statistics in the Oracle data dictionary do not reflect the additional writes needed to satisfy normal (#IO times 2) or high (#IO times 3) redundancy. This means there might be a difference between the IOs you measure, or think your database is doing, and the IOs actually done at the storage level.
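The redundancy multipliers above can be turned into a rough estimate of what the storage layer sees. This is an illustrative Python sketch only: the function name is hypothetical, and it assumes every database write is simply mirrored, ignoring ASM metadata, rebalance and resync IO.

```python
# Copies written per database write for each ASM redundancy level,
# using the multipliers from the text (normal = 2, high = 3).
MIRROR_COPIES = {"normal": 2, "high": 3}


def estimated_storage_writes(db_writes: int, redundancy: str) -> int:
    """Rough number of write IOs reaching the cells for db_writes database writes.

    Assumption: each write is written once per mirror copy; ASM metadata,
    rebalance and resync IO are not included.
    """
    if redundancy not in MIRROR_COPIES:
        raise ValueError(f"unknown redundancy level: {redundancy!r}")
    return db_writes * MIRROR_COPIES[redundancy]


# 10,000 writes in the database statistics become about 20,000 (normal)
# or 30,000 (high) write IOs at the storage layer.
print(estimated_storage_writes(10_000, "normal"))
print(estimated_storage_writes(10_000, "high"))
```

The real number on the cells will typically be higher, because the cells also serve IO that never shows up as a database write at all.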
But what if you want to know what is happening on the storage level, so on the level of the cell, or actually “cellsrv”, which is the process that serves IO to your databases? One option is to run “iostat -x”, but that gives a list that is quite hard to read (too many disk devices), and it doesn’t show you the reason for the IO: redo write? controlfile read? archivelog? Knowing the reason would be especially useful if you want to understand what is happening when your IO behaves differently than you expect and you’ve already ruled out IORM.
Well, it is possible to get an IO overview (cumulative since startup)! Every storage server keeps a table of IO reasons, and this table can be dumped into a trace file on the cell. To generate a dump with an overview of what kind of IOs have been done, use “cellcli” locally on a cell and enter the following command:
alter cell events="immediate cellsrv.cellsrv_dump('ioreasons',0)";
This doesn’t generate anything useful as output on the command line, except for the name of the thread-logfile where we can find the contents of the dump we requested:
Dump sequence #18 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109/log/diag/asm/cell/enkcel01/trace/svtrc_15737_14.trc
Cell enkcel01 successfully altered
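If you script this, for instance to collect dumps from several cells, the sequence number and trace file name can be picked out of that message. A minimal Python sketch; it assumes the message has exactly the wording shown above, and the function name is made up for illustration:

```python
import re

# cellcli reports where the dump went, e.g.
# "Dump sequence #18 has been written to /opt/.../svtrc_15737_14.trc"
DUMP_LINE = re.compile(r"Dump sequence #(\d+) has been written to (\S+)")


def parse_dump_message(output: str):
    """Return (sequence number, trace file path) from the cellcli output."""
    match = DUMP_LINE.search(output)
    if match is None:
        raise ValueError("no 'Dump sequence' line in the cellcli output")
    return int(match.group(1)), match.group(2)


output = (
    "Dump sequence #18 has been written to /opt/oracle/cell11.2.3.2.1_LINUX.X64_130109"
    "/log/diag/asm/cell/enkcel01/trace/svtrc_15737_14.trc\n"
    "Cell enkcel01 successfully altered"
)
sequence, trace_file = parse_dump_message(output)
print(sequence, trace_file)
```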
To find your dump in the thread-logfile, open it with “less” and search for the dump sequence number: press “/” and, using the above example with sequence #18, type sequence\ #18 (the backslash escapes the space).
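The same search can be done programmatically. The helper below is hypothetical and rests on an assumption taken from the example dump: the marker line reads “Dump sequence #N:” and the dump rows that directly follow it all start with “Cache::dumpReasons”.

```python
def extract_dump(trace_text: str, sequence: int, prefix: str = "Cache::dumpReasons"):
    """Return the dump rows that follow the 'Dump sequence #N:' marker line.

    Assumption: the dump body is the run of consecutive lines starting with
    `prefix` directly after the marker, as in the example dump.
    """
    # The trailing colon keeps sequence #18 from also matching #180.
    marker = f"Dump sequence #{sequence}:"
    lines = trace_text.splitlines()
    for i, line in enumerate(lines):
        if marker in line:
            body = []
            for following in lines[i + 1:]:
                if not following.startswith(prefix):
                    break  # first line that is not part of this dump
                body.append(following)
            return body
    raise ValueError(f"{marker!r} not found in trace file")


trace = "\n".join([
    "Cache::dumpReasons I/O Reason Table",
    "2013-10-23 08:11:06.869047*: Dump sequence #18:",
    "Cache::dumpReasons Reason Reads Writes",
    "Cache::dumpReasons UNKNOWN 436784 162942",
    "some unrelated trace line",
])
print(extract_dump(trace, 18))
```

Typically you would read the trace file named in the cellcli message into `trace_text` first.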
This is the dump from a cell in the Enkitec lab that I used for this example:
Cache::dumpReasons I/O Reason Table
2013-10-23 08:11:06.869047*: Dump sequence #18:
Cache::dumpReasons Reason                          Reads   Writes
Cache::dumpReasons ------------------------------------
Cache::dumpReasons UNKNOWN                        436784   162942
Cache::dumpReasons RedoLog Write                       0    80329
Cache::dumpReasons RedoLog Read                      873        0
Cache::dumpReasons ControlFile Read               399993        0
Cache::dumpReasons ControlFile Write                   0   473234
Cache::dumpReasons ASM DiskHeader IO                4326        4
Cache::dumpReasons BufferCache Read                27184        0
Cache::dumpReasons DataHeader Read                  2627        0
Cache::dumpReasons DataHeader Write                    0     1280
Cache::dumpReasons Datafile SeqRead                   45        0
Cache::dumpReasons Datafile SeqWrite                   0      373
Cache::dumpReasons HighPriority Checkpoint Write       0     6146
Cache::dumpReasons DBWR Aged Write                     0      560
Cache::dumpReasons ReuseBlock Write                    0      150
Cache::dumpReasons Selftune Checkpoint Write           0   116800
Cache::dumpReasons RequestLit Write                    0       25
Cache::dumpReasons Archivelog IO                       0      255
Cache::dumpReasons TrackingFile IO                  2586     2698
Cache::dumpReasons ASM Relocate IO                     0      200
Cache::dumpReasons ASM Replacement IO                  0       91
Cache::dumpReasons ASM CacheCleanup IO                 0     4514
Cache::dumpReasons ASM UserFile Relocate               0     2461
Cache::dumpReasons ASM Redo IO                         0    10610
Cache::dumpReasons ASM Cache IO                     1953        0
Cache::dumpReasons ASM PST IO                          0       44
Cache::dumpReasons ASM Heartbeat IO                   26   162984
Cache::dumpReasons ASM BlockFormat IO                  0     3704
Cache::dumpReasons ASM StaleFile IO                    0      675
Cache::dumpReasons OSD Header IO                       0      315
Cache::dumpReasons Smart scan                      11840        0
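To work with these numbers, for instance to see which reasons dominate, the rows can be parsed into a dictionary. A minimal Python sketch, assuming each data row is “Cache::dumpReasons”, the reason text, then the reads and writes columns separated by whitespace; the function names are illustrative, not part of any Oracle tooling:

```python
PREFIX = "Cache::dumpReasons"


def parse_io_reasons(dump_lines):
    """Map each IO reason to a (reads, writes) tuple.

    Assumption: a data row is '<prefix> <reason words> <reads> <writes>';
    the title, header and separator rows are skipped.
    """
    table = {}
    for line in dump_lines:
        if not line.startswith(PREFIX):
            continue
        fields = line[len(PREFIX):].split()
        # A data row ends in two integer columns; anything else is not data.
        if len(fields) < 3 or not (fields[-2].isdigit() and fields[-1].isdigit()):
            continue
        reason = " ".join(fields[:-2])
        table[reason] = (int(fields[-2]), int(fields[-1]))
    return table


def busiest_reasons(table, n=3):
    """Return the n reasons with the most IO requests (reads + writes)."""
    return sorted(table, key=lambda reason: sum(table[reason]), reverse=True)[:n]


sample = [
    "Cache::dumpReasons Reason Reads Writes",
    "Cache::dumpReasons ------------------------------------",
    "Cache::dumpReasons UNKNOWN 436784 162942",
    "Cache::dumpReasons ControlFile Read 399993 0",
    "Cache::dumpReasons ControlFile Write 0 473234",
    "Cache::dumpReasons Smart scan 11840 0",
]
table = parse_io_reasons(sample)
print(busiest_reasons(table))  # ['UNKNOWN', 'ControlFile Write', 'ControlFile Read']
```

Sorting on reads plus writes is just one choice; sorting on writes alone is more useful when you are chasing write amplification.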
Please note that these numbers are IO counts; they say nothing about the size of the IOs. Also note that these are the numbers of a single cell, and you probably have 3, 7 or 14 cells.
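To get a picture of the whole storage layer, the per-cell tables have to be added up. A minimal Python sketch, assuming each cell’s dump has already been turned into a dict mapping reason to (reads, writes); the numbers in the usage example are illustrative, not measurements:

```python
def combine_cells(per_cell_tables):
    """Sum (reads, writes) per IO reason over the tables of several cells.

    Each table maps a reason string to a (reads, writes) tuple. The result
    is still a count of IO requests, not bytes.
    """
    combined = {}
    for table in per_cell_tables:
        for reason, (reads, writes) in table.items():
            total_reads, total_writes = combined.get(reason, (0, 0))
            combined[reason] = (total_reads + reads, total_writes + writes)
    return combined


# Illustrative numbers for two hypothetical cells.
cell1 = {"Smart scan": (100, 0), "RedoLog Write": (0, 50)}
cell2 = {"Smart scan": (120, 0), "ControlFile Read": (30, 0)}
print(combine_cells([cell1, cell2]))
```

Reasons that appear on only some cells are kept, so the combined table is the union of all cells’ reasons.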
In my opinion, this IO summary can be very valuable during IO performance investigations, and also during proofs of concept.
If the cell has been running for a while, these numbers may grow very large. To make an easy baseline, the IO reason numbers can be reset, so you can start your test or proof-of-concept run and measure what actually happened on the cell layer! To reset the IO reason table, enter the following command in cellcli:
alter cell events = "immediate cellsrv.cellsrv_resetstats(ioreasons)";
This will reset the IO reasons table in the cell.
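If resetting the counters on a cell is not an option, a similar baseline can be taken by dumping the table before and after the run and subtracting. This is an alternative approach, not what this post does with resetstats; a minimal Python sketch, assuming both dumps were already parsed into dicts mapping reason to (reads, writes), with a hypothetical function name:

```python
def io_reason_delta(before, after):
    """Per-reason (reads, writes) difference between two snapshots of one cell.

    A negative difference means the counters were reset, or cellsrv was
    restarted, between the snapshots, so the delta would be meaningless.
    """
    delta = {}
    for reason, (reads_after, writes_after) in after.items():
        # A reason missing from the first snapshot is treated as zero.
        reads_before, writes_before = before.get(reason, (0, 0))
        reads, writes = reads_after - reads_before, writes_after - writes_before
        if reads < 0 or writes < 0:
            raise ValueError(f"counter for {reason!r} went backwards; was it reset?")
        delta[reason] = (reads, writes)
    return delta


# Illustrative snapshots, not measurements.
before = {"Smart scan": (11840, 0), "RedoLog Write": (0, 80329)}
after = {"Smart scan": (12840, 0), "RedoLog Write": (0, 80500), "Archivelog IO": (0, 3)}
print(io_reason_delta(before, after))
```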
PS1: Thanks to Nikolay Kovachev for pointing out the ‘ioreasons’ resetstats parameter. Indeed ‘all’ is way too blunt.
PS2: The IO numbers seem to be the number of IO requests the cell has received from its clients (ASM and database) for data, not for metadata. During a smart scan, metadata flows between the database and the cell server before data is actually served.
Hi Frits,
To reset only I/O reason stats you can use: alter cell events="immediate cellsrv.cellsrv_resetstats('ioreasons')";
Also when you have time, could you please check this one: cellsrv – storage indexes to database objects http://wp.me/p3QpUL-2c
Maybe you’ll have some suggestions…
Pingback: Exadata: measuring IO latencies experienced by the cell server | Frits Hoogland Weblog
Pingback: Exadata: disk level statistics | Frits Hoogland Weblog