Orion / ORacle IO Numbers

What is Orion? It used to be the name of a Java application server, which eventually became Oracle’s ‘OC4J’, or ‘Oracle Containers for J2EE’. It’s a product Oracle either licensed or bought (I don’t know which). It’s still available from the original vendor.

But Oracle now has a new Orion: ORacle IO Numbers. This is a tool to test IO with! Unlike other tools, it uses the exact libraries Oracle 10g uses, and issues ‘database-like’ requests. In other words, this tool lets you predict the IO performance of a database fairly easily, or calculate the optimal number of IO slaves (for Oracle’s parallel query).

Orion is available on OTN for the following platforms:
-Linux AMD x86-64
-Linux i386
-Windows

If you are interested, follow the download link on OTN. The download page also contains a document covering usage, output, examples and troubleshooting.

What does it do, and how does it work?

First, you need raw devices or files. Both are written to during the test, which destroys any data currently in them, so create them specifically for this test!

Next, make a file named anything you wish, with the extension “.lun”; for example: testrun.lun. The easiest place for it is the directory where the executable from the zip/msi ended up. In that file, put the full path to each of these files or raw devices, one per line:

/oracle/orion/file1
/oracle/orion/file2
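
Preparing the scratch files and the .lun file can be scripted. The following is a minimal Python sketch and not part of Orion: the helper name `prepare_lun`, the number of files and the file size are assumptions chosen for illustration. It writes zero-filled files and lists their absolute paths, one per line, in `<testname>.lun`.

```python
import os


def prepare_lun(directory, testname, num_files=2, size_bytes=1024 * 1024):
    """Create scratch files and a <testname>.lun file listing them.

    Hypothetical helper: the .lun file only needs one full path per line;
    how the files themselves are created is up to you.
    """
    os.makedirs(directory, exist_ok=True)
    paths = []
    chunk = b"\0" * (1024 * 1024)
    for i in range(1, num_files + 1):
        path = os.path.abspath(os.path.join(directory, "file%d" % i))
        # Write real zero bytes in chunks, so the file is not sparse.
        with open(path, "wb") as f:
            remaining = size_bytes
            while remaining > 0:
                n = min(remaining, len(chunk))
                f.write(chunk[:n])
                remaining -= n
        paths.append(path)
    lun_path = os.path.join(directory, testname + ".lun")
    with open(lun_path, "w") as f:
        f.write("\n".join(paths) + "\n")
    return lun_path, paths
```

Called as `prepare_lun("/oracle/orion", "testrun")`, this would produce file1, file2 and testrun.lun in that directory; the .lun file can then be copied next to the orion executable. The 1 MB default is only there to keep the sketch quick to try; a real test needs much larger files.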

For a simple testrun, execute the following:
./orion -run simple -testname testrun -num_disks 2
This runs Orion in ‘simple’ mode, meaning ‘small’ and ‘large’ IOs are tested in isolation (not concurrently). ‘-num_disks’ determines the number of concurrent IOs that are tested (to my understanding; I’ve gotten 1-5 concurrent IOs when set to ‘1’, and 1-10 concurrent IOs when set to ‘2’).
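
The relation between ‘-num_disks’ and the tested load levels, as observed above, can be written down as a small sketch. The factor of 5 is only what the two runs above suggest (1 disk gave 1-5, 2 disks gave 1-10); it is an assumption rather than a documented rule, and the function name is made up.

```python
def observed_small_io_levels(num_disks, per_disk=5):
    """Concurrency levels seen in the runs above: 1 .. per_disk * num_disks.

    per_disk=5 is inferred from two observations only (an assumption).
    """
    if num_disks < 1:
        raise ValueError("num_disks must be at least 1")
    return list(range(1, per_disk * num_disks + 1))


print(observed_small_io_levels(2))
```

So to see how the storage behaves at a given number of concurrent IOs, ‘-num_disks’ would have to be chosen high enough for that level to be included.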

Currently, I’ve used this tool to measure the number of IOs possible on a system with a performance problem.
The problem is in the nightly batch of a 9.2 database (but I think a measurement with Orion is still valid). Unlike most batches, most of the waiting is on ‘db file sequential read’, which means (among others) index unique scans, index range scans and table fetches by index rowid.

To simulate this type of IO, I’ve used the following command line:
orion -run advanced -testname test -num_disks 2 -type rand -write 5 -matrix row -num_large 1
This command line means:
-num_disks 2: I want to know the throughput for up to 10 concurrent IO requests
-type rand: most of my IOs are not sequential but random (from an o/s perspective, we walk through some branch blocks and, if range-scanned, through the leaf blocks)
-write 5: the ultimate goal of the batch is to process data, so 5% of the IOs are writes
-matrix row: the calculation is based on small IOs; the number of large IOs is fixed with ‘-num_large’
-num_large 1: we also do some ‘large’ IOs (multiblock reads: full table scans, fast full index scans)
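
When running several variations of this test, it can help to assemble the command line in a script. The sketch below only uses the flags shown above; the function name, its defaults and the `binary` parameter are assumptions for illustration.

```python
import shlex


def orion_command(testname, num_disks, io_type="rand", write_pct=5,
                  matrix="row", num_large=1, binary="orion"):
    """Assemble the 'advanced' Orion command line used above as an argv list."""
    if not 0 <= write_pct <= 100:
        raise ValueError("write_pct is a percentage")
    return [
        binary, "-run", "advanced",
        "-testname", testname,
        "-num_disks", str(num_disks),
        "-type", io_type,
        "-write", str(write_pct),
        "-matrix", matrix,
        "-num_large", str(num_large),
    ]


argv = orion_command("test", 2)
# Print the command in a form that can be pasted into a shell.
print(" ".join(shlex.quote(a) for a in argv))
```

The resulting list can be handed to `subprocess.run(argv)` from the directory that holds the .lun file; varying only `write_pct` or `num_large` between runs keeps the results comparable.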

I’ve gotten up to about 500 IOPS (IOs per second) on Windows 2003, using iSCSI over 1Gb ethernet to a NetApp filer. The filer is massively overloaded (CPU at 100% almost the entire day), and the results of two tests on the same files were not consistent. That is in line with an overloaded filer, whose throughput depends on the other IO being done at the same time.
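
To put the 500 IOPS in perspective, it helps to convert it to bandwidth. The sketch below is a back-of-the-envelope calculation only: the 8 KB small IO size is an assumption (it was not set on the command line above), the function names are made up, and the link figure ignores all protocol overhead.

```python
def iops_to_mb_per_s(iops, io_size_kb):
    """Bandwidth in MB/s (1 MB = 1024 KB) for a given IOPS and IO size."""
    return iops * io_size_kb / 1024.0


def link_mb_per_s(gigabits):
    """Raw link bandwidth in MB/s, taking 1 Gb = 1024 Mb, no overhead."""
    return gigabits * 1024 / 8.0


small_io_mb = iops_to_mb_per_s(500, 8)   # assumed 8 KB small IOs
ethernet_mb = link_mb_per_s(1)           # 1Gb ethernet
print(small_io_mb, ethernet_mb)
```

Under these assumptions the small IOs amount to roughly 4 MB/s against about 128 MB/s of raw link bandwidth, which suggests the network link itself is not the limit and fits the overloaded filer explanation.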

26 comments
  1. Steven said:

    I’ve just received some new servers and I’ve been running the linux64 OrIOn on them. There is a single 2Gb Fibre Channel link between the SAN and the disks, so even if the disks are perfect, I can’t possibly have better than 256MB/s. Yet OrIOn has shown me numbers as high as 1500GB/s (which I suspect is close to the bandwidth to RAM). My most recent test shows (summary.txt):

    ORION VERSION 11.1.0.0.0

    Commandline:
    -run advanced -testname data1 -num_disks 8 -size_small 16 -size_large 1024 -type rand -simulate concat -write 40

    This maps to this test:
    Test: data1
    Small IO size: 16 KB
    Large IO size: 1024 KB
    IO Types: Small Random IOs, Large Random IOs
    Simulated Array Type: CONCAT
    Write: 40%
    Cache Size: Not Entered
    Duration for each Data Point: 60 seconds
    Small Columns:, 0
    Large Columns:, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16
    Total Data Points: 38

    Name: /oraetl6/data1/orion.dat Size: 3355443200
    1 FILEs found.

    Maximum Large MBPS=340.22 @ Small=0 and Large=11
    Maximum Small IOPS=5593 @ Small=36 and Large=0
    Minimum Small Latency=0.05 @ Small=1 and Large=0

    I’m not sure if OrIOn is just wrong, or if my server is better than it ought to be? Is the problem that the datafile is small enough to fit entirely in RAM?

    Steven

  2. Administrator said:

    I haven’t had a chance to test it on 64-bit Linux. Judging from summary.txt, it appears to be based on the code of Oracle version 11!

    You’ve got just one file of about 3G. Orion should use the O_DIRECT and O_ASYNC flags, so local (linux) buffer cache shouldn’t matter. (mmh, should test that, I suppose!)

    If you look at the bottom of the summary.txt file, you see the system had a maximum throughput of 340 MBPS, using 11 concurrent large IO requests.

    The maximum IOPS is 5593 (about 89 MBPS), using 36 concurrent IOs.

    So I see 340 MBPS as the maximum throughput, not 1500GB/s. Still, 340 MBPS is more than could possibly be transferred over a single 2Gb HBA. As you said, that’s 2Gb/8 = 256MB/s.

    Obviously, this is the result of a cache somewhere (look in the documentation on the Oracle Orion site; it’s even mentioned there!).

    This could be eliminated by using more large files (which makes it less likely that the IO requests can be cached).

    My guess would be that the write(s) are cached on the HBA. What kind of HBA is it, and how is it configured?

    Please mind that you ought to set up a test situation that resembles the situation you are testing for, otherwise your measurement will give false hope! Also look at the documentation again about how to interpret the IO numbers (Oracle recommends using less than the measured number when calculating throughput!).

  3. Administrator said:

    I’ve read the manpage of the io_submit call (the async IO system call). My conclusion is that, because the files aren’t opened with O_DIRECT, the async IO call works synchronously and uses the Linux buffer cache.

    Look at the system calls which orion uses to open the datafiles:
    1. first readonly
    14097 open("/oracle/orion/file1", O_RDONLY|O_LARGEFILE) = 8
    2. then read/write
    14097 open("/oracle/orion/file1", O_RDWR|O_LARGEFILE) = 8

  4. It’s my first visit to your website. After just a quick browse, I’m really impressed!

  5. Just found your home page; it’s great. It looks like you folks do great service. Keep up the good work.

  6. I cannot attach via iSCSI. We get an error message from the orion tool stating “File identification failed”. Our .lun file contains \\.\d:, which is a mapped drive. Does anybody have any ideas?

  7. Administrator said:

    So you have a .lun file which contains \\.\d:.
    Is the D: drive visible in your iSCSI tools (on the _local_ server)? Does Windows see the drive without reporting any errors?
    What do you mean by a “mapped drive”?
    If so (meaning the D: drive is correct for Windows), remove all other “destinations” from the .lun file, and test each destination with orion one by one.

    The message “File identification failed” means it cannot find \\.\d:.

  8. lale said:

    Hi,
    Where can I find orion for 64-bit Linux? I have seen it on the Oracle site, but that does not look like 64-bit to me.
    Thanks.

  9. Administrator said:

    The download page on technet has 3 installable packages: Windows, Linux 32-bit and Linux EM64T (Intel 64-bit). Doesn’t that help you?

  10. Mike said:

    I’ve successfully used Orion on RHEL, but unsuccessfully on Fedora Core 1. Anybody here experience this error before?

    SKGFR Returned Error -- Async. read failed on FILE: /oradata/tmp/test1.dat
    OER 1: please look up error in Oracle documentation
    rwbase_issue_req: lun_aiorq failed on read
    rwbase_run_test: rwbase_issue_req failed
    rwbase_run_process: rwbase_run_test failed
    rwbase_rwluns: rwbase_run_process failed
    orion_thread_main: rw_luns failed
    Test error occurred
    Orion exiting

    I upgraded libaio to libaio-0.3.99-2 just in case, but it doesn’t seem to help. Any suggestions?

  11. Administrator said:

    Someone on the oracle forums encountered this problem on linux x86_64.
    It seems to be solved by a patch delivered in this bugzilla entry: https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=176361
    Does that solve your problem?

    The latest libaio I have on a 32-bit Linux test system is 0.3.105.

  12. rahul said:

    While running orion on Windows XP, I am getting the following error. Will someone please let me know how to resolve this issue?

    ORION: ORacle IO Numbers -- Version 10.2.0.1.0
    Test will take approximately 23 minutes
    Larger caches may take longer

    rwbase_read_luncfg: SlfFopen error on mytest1.lun
    orion_main: rwbase_read_luncfg failed

  13. Devi said:

    Isai Arredondo,

    Don’t format the disk and then try again; now you won’t get the “File identification failed” error,

    Thanks and Regards,
    Devi Prasad B

  14. Devi said:

    and orion will run successfully without any error.

  15. Jim Burnham said:

    Now that I’ve got some results, I’m trying to interpret them. I think I can conclude that I have something misconfigured since raw is so much faster than ext3. I’m also getting comfortable that similarly misconfigured, my new server will be significantly faster than my existing server.

    I was able to use Orion on a Linux server both against a raw disk and a filesystem. The key for the filesystem testing seems to be using a file which is a multiple of your block size and much larger than can fit in cache. I made a ~1MB file from concatenated text files and once I got it sized as a multiple of the block size, the tests ran. The results were up there with RAM speed, so I increased the file size until I started getting believable results. I went ahead and made it 10GB.

    Please have a look at my results and share your thoughts:

    http://forums.oracle.com/forums/thread.jspa?messageID=2249899

  16. It makes sense that raw is faster than ext3: raw means IO is done directly to the device, whereas with ext3 everything is buffered through the Linux/operating system buffer cache. Another side effect of buffered IO is the other thing you’ve encountered: caching. Caching in itself is a good thing (the most important thing an Oracle database does is caching to speed up transactions), but when Oracle already caches, the probability of the operating system caching effectively is low. That is why direct IO is used: to eliminate double buffering.

    Another thing to test is the IO scheduler: by default, the cfq (completely fair queueing) scheduler is used. I’ve had good results with the deadline scheduler. For raw, there is no (OS) IO scheduling.

  17. William Adams said:

    Could someone answer a few questions about ORION?
    I’ve downloaded the orion10.2_linux.gz. After ungzipping the file, I get a “cannot execute binary file” message. Am I using the wrong version? The file itself is executable.

    Our OS kernel is 2.6.9-55.EL.

    Thanks in advance!

  18. William Adams said:

    Sorry, I actually gunzipped the file.

  19. After the gunzip, you should have 1 file with the name ‘orion’.
    You can use the ‘file’ command to see what kind of file it is:

    $ file orion
    orion: ELF 32-bit LSB executable, Intel 80386, version 1 (SYSV), for GNU/Linux 2.2.5, dynamically linked (uses shared libs), for GNU/Linux 2.2.5, not stripped

    The ‘rights’ on the file depend on your umask, but are probably:
    -rw-r--r-- (644)
    (this can be seen using ‘ls -ls’)

    To make this file executable, enter ‘chmod 755 orion’.
    That means the file can be executed. Now enter ‘./orion’.
    This should result in:
    ./orion
    ORION: ORacle IO Numbers -- Version 10.2.0.1.0
    Parse error: View help screen using orion -help.

  20. William Adams said:

    Fritz,
    Thanks for the suggestion. I ran the file command and got the same results as you. However, after issuing the chmod, I still received “./orion: cannot execute binary file”.

    Is there a specific shell this tool must be executed in?

  21. William Adams said:

    I’m using ia64; do I need to use the orion_linux_em64t version?

  22. Can you issue the following statement: ls -ls orion

    if it displays -rwxr-xr-x orion

    you should be able to issue: ./orion
    (please mind the “./”; this is needed if “.” or the directory where you extracted orion is not in your path)

  23. William Adams said:

    1876 -rwxr-xr-x 1 oracle dba 1914905 orion

  24. William Adams said:

    any other ideas? I’m really lost for suggestions.

  25. You could of course try the ia64 version! please try!
    What does the ‘ldd’ command return with the non-working orion executable?
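
The “cannot execute binary file” message discussed in the last comments typically points at a mismatch between the architecture a binary was built for and the machine it is run on. As a rough sketch of what the ‘file’ command reports, the following Python reads the ELF header directly. The function names are made up, only three machine codes are mapped, and a well-formed header is assumed.

```python
import struct

# A few e_machine values from the ELF specification.
MACHINES = {3: "Intel 80386", 50: "Intel IA-64", 62: "x86-64"}


def describe_elf(header):
    """Return (bits, machine name) from the first 20 bytes of an ELF file."""
    if len(header) < 20 or header[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    bits = {1: 32, 2: 64}.get(header[4])
    if bits is None:
        raise ValueError("unknown ELF class")
    # Byte 5 gives the byte order: 1 = little endian, 2 = big endian.
    endian = "<" if header[5] == 1 else ">"
    # e_machine is a 16-bit field at offset 18.
    (machine,) = struct.unpack_from(endian + "H", header, 18)
    return bits, MACHINES.get(machine, "unknown (%d)" % machine)


def describe_elf_file(path):
    """Read the header of the file at path and describe it."""
    with open(path, "rb") as f:
        return describe_elf(f.read(20))
```

For the 32-bit executable shown in comment 19, `describe_elf_file("./orion")` would be expected to report a 32-bit Intel 80386 binary; comparing that with the output of ‘uname -m’ on the server shows whether the download matches the platform.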
