Playing with ASM disks on linux, part 2

In the previous article about ASM disks we investigated the difference between an ASM disk used “raw” (using the block device nowadays ;-), no more /dev/raw/*) and an ASM disk used via ASM-lib. The difference is an ASM-lib name at offset 0x28 of the device.

According to Alejandro Vargas at Oracle, ASMlib has the following advantages:

ASMLib takes care of
1) Device name labels
2) Device ownership and permissions management
3) Async I/O management
4) I/O optimization
5) Sanity checkups

Device name labels
When devices are used as ASM-raw disks, they are given a name (diskgroup name, underscore, disk number, unless the creator specifies one explicitly), but that is the ASM disk name as seen by the ASM instance, and it is only visible while the disk is mounted.
With ASM-lib there is a separate ASM-lib name (at offset 0x28 of the device), which shows up in the path field of v$asm_disk: the disk is called ‘ORCL:YOURNAME’ instead of ‘/dev/sdb1’.
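To make the “name at offset 0x28” idea concrete, here is a minimal sketch that reads a NUL-terminated string from that offset. Only the offset comes from the text above; the function name and the maximum label length of 24 bytes are my own assumptions for illustration, not something taken from the ASM-lib sources. The demo runs against a scratch file, not a real device.

```python
import os
import tempfile

ASMLIB_LABEL_OFFSET = 0x28  # offset mentioned in the article
ASMLIB_LABEL_MAXLEN = 24    # ASSUMPTION: length is not stated in the article


def read_asmlib_label(path, offset=ASMLIB_LABEL_OFFSET, maxlen=ASMLIB_LABEL_MAXLEN):
    """Return the NUL-terminated ASCII string stored at `offset` in `path`."""
    with open(path, "rb") as dev:
        dev.seek(offset)
        raw = dev.read(maxlen)
    # Keep only the bytes before the first NUL byte.
    return raw.split(b"\x00", 1)[0].decode("ascii", errors="replace")


if __name__ == "__main__":
    # Build a fake "device" with a label at the expected offset.
    with tempfile.NamedTemporaryFile(delete=False) as f:
        f.write(b"\x00" * ASMLIB_LABEL_OFFSET + b"YOURNAME" + b"\x00" * 64)
    print(read_asmlib_label(f.name))
    os.unlink(f.name)
```

Pointed at a real partition (as a user with read access to it), the same function should return the label for an ASM-lib disk and an empty string for a disk without one, assuming the header layout described above.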

Because devices are found by this name, it would suggest that ASM-lib solves the known issue with device naming changes on linux (a disk device can point to a different disk after a reboot, mainly after hardware changes). Is this really an issue?

Let’s move the contents of the /dev/sdb1 partition to the /dev/sdd1 partition and see if we get into trouble:


SQL> select g.name, d.name, d.path from v$asm_diskgroup g, v$asm_disk d where d.group_number=g.group_number;

NAME                           NAME                           PATH
------------------------------ ------------------------------ ------------------------------
DATADG                         YABBA                          /dev/sdb1

I’ve set ownership of the /dev/sdd1 device to oracle, and the asm_diskstring is set to ‘/dev/sd*’, so we should see the other disk:

SQL> select group_number, disk_number, mount_status, header_status, mode_status, state, path from v$asm_disk;

GROUP_NUMBER DISK_NUMBER MOUNT_S HEADER_STATU MODE_ST STATE    PATH
------------ ----------- ------- ------------ ------- -------- ------------------------------
           0           0 CLOSED  CANDIDATE    ONLINE  NORMAL   /dev/sdd1
           1           0 CACHED  MEMBER       ONLINE  NORMAL   /dev/sdb1

Now shut down the instance, copy the contents of /dev/sdb1 to /dev/sdd1, and set ownership of /dev/sdb1 back to root (which makes the /dev/sdb1 device invisible to the ASM instance).

SQL> shut
ASM diskgroups dismounted
ASM instance shutdown
SQL>

The ASM instance is down. Now do some actions as root:

# dd if=/dev/sdb1 of=/dev/sdd1 bs=1M
5114+1 records in
5114+1 records out
5362850304 bytes (5.4 GB) copied, 707.67 seconds, 7.6 MB/s
# chown root /dev/sdb1
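
The article does not verify the copy, but if you want to be sure both partitions hold the same bytes before restarting ASM, a checksum comparison is one way to do it. The sketch below is my own addition: the function names are hypothetical, and it compares only the first N bytes because the target partition may be larger than the source.

```python
import hashlib


def digest_prefix(path, nbytes, chunk=1024 * 1024):
    """SHA-256 of the first `nbytes` bytes of `path`, read in chunks."""
    h = hashlib.sha256()
    remaining = nbytes
    with open(path, "rb") as f:
        while remaining > 0:
            block = f.read(min(chunk, remaining))
            if not block:  # file or device ended early
                break
            h.update(block)
            remaining -= len(block)
    return h.hexdigest()


def same_prefix(src, dst, nbytes):
    """True if the first `nbytes` bytes of both files are identical."""
    return digest_prefix(src, nbytes) == digest_prefix(dst, nbytes)
```

For the copy above, the call would be same_prefix("/dev/sdb1", "/dev/sdd1", 5362850304), using the byte count dd reported, and run as a user that can read both devices.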

And start the ASM instance again:

SQL> startup
ASM instance started

Total System Global Area  284565504 bytes
Fixed Size                  1299428 bytes
Variable Size             258100252 bytes
ASM Cache                  25165824 bytes
ASM diskgroups mounted

Above we see the diskgroups (actually one diskgroup, DATADG) are mounted. This suggests the ASM instance has picked up the other disk! Let’s see:

SQL> select g.name, d.name, d.path from v$asm_diskgroup g, v$asm_disk d where d.group_number=g.group_number;

NAME                           NAME                           PATH
------------------------------ ------------------------------ ------------------------------
DATADG                         YABBA                          /dev/sdd1

Look! The ASM instance picked up the ASM disk from another device!

This means we do not depend on an extra naming facility from ASMlib for remapping of devices; the ASM instance itself is perfectly capable of handling device changes.

Device ownership and permissions management
This is very true; it is convenient to have a service which reads the disk/device headers and makes them accessible to the ASM instance. In fact, that is what I see as the big advantage of ASMlib.

On the other hand, although it is more work to set up a udev rules file, it isn’t a very big problem to set up the devices in /etc/udev/rules.d/50-udev.rules…
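
To give an idea of how little is involved, here is a small sketch that prints one udev rule line per device. KERNEL, OWNER, GROUP and MODE are standard udev rule keys; the helper name, the group (oinstall) and the mode (0660) are assumptions for illustration, so adjust them to your own installation.

```python
def udev_rule(kernel_name, owner="oracle", group="oinstall", mode="0660"):
    """Render one udev rule line that sets ownership and mode for a device.

    ASSUMPTION: owner, group and mode defaults are examples only.
    """
    return 'KERNEL=="{}", OWNER="{}", GROUP="{}", MODE="{}"'.format(
        kernel_name, owner, group, mode
    )


if __name__ == "__main__":
    # One rule per device node that ASM should be able to open.
    for dev in ("sdb1", "sdd1"):
        print(udev_rule(dev))
```

The printed lines would go into the rules file mentioned above. Note that a rule matching on the kernel name is itself sensitive to device renaming, so this only handles the permissions side.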

Async I/O management
I am not sure what this means.
Read literally, it suggests asynchronous IO will be a problem without ASMlib.
AFAIK, asynchronous IO is enabled by default in the oracle executable, at least since version 10g. Let’s see:

-Check if the asynchronous library is linked with the oracle executable:

ldd /oracle/db/11.1.0.6/bin/oracle
linux-gate.so.1 => (0x00d8d000)
libskgxp11.so => /oracle/db/11.1.0.6/lib/libskgxp11.so (0x00595000)
librt.so.1 => /lib/librt.so.1 (0x00d47000)
libnnz11.so => /oracle/db/11.1.0.6/lib/libnnz11.so (0x00138000)
libclsra11.so => /oracle/db/11.1.0.6/lib/libclsra11.so (0x00aa2000)
libdbcfg11.so => /oracle/db/11.1.0.6/lib/libdbcfg11.so (0x0068e000)
libhasgen11.so => /oracle/db/11.1.0.6/lib/libhasgen11.so (0x009f6000)
libskgxn2.so => /oracle/db/11.1.0.6/lib/libskgxn2.so (0x00634000)
libocr11.so => /oracle/db/11.1.0.6/lib/libocr11.so (0x002e2000)
libocrb11.so => /oracle/db/11.1.0.6/lib/libocrb11.so (0x00c85000)
libocrutl11.so => /oracle/db/11.1.0.6/lib/libocrutl11.so (0x00110000)
libaio.so.1 => /usr/lib/libaio.so.1 (0x00119000)
libdl.so.2 => /lib/libdl.so.2 (0x0011b000)
libm.so.6 => /lib/libm.so.6 (0x00358000)
libpthread.so.0 => /lib/libpthread.so.0 (0x0011f000)
libnsl.so.1 => /lib/libnsl.so.1 (0x0037f000)
libc.so.6 => /lib/libc.so.6 (0x00396000)
/lib/ld-linux.so.2 (0x00b25000)

Well, the asynchronous library is linked with the oracle executable, that is for sure.
It *could* ignore the asynchronous library and still do synchronous calls.
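
Eyeballing the ldd output works, but the check is easy to script. This is a minimal sketch (the function names are mine) that parses ldd-style text and reports whether libaio is among the linked libraries. As noted above, this only proves the library is linked, not that it is actually used.

```python
import re

# Matches lines like "libaio.so.1 => /usr/lib/libaio.so.1 (0x00119000)".
# The path part may be empty, as for linux-gate.so.1.
LDD_LINE = re.compile(r"^\s*(\S+)\s+=>\s+(\S*)\s*\((0x[0-9a-fA-F]+)\)\s*$")


def parse_ldd(text):
    """Map library name -> resolved path (or None) from ldd output text."""
    libs = {}
    for line in text.splitlines():
        m = LDD_LINE.match(line)
        if m:
            libs[m.group(1)] = m.group(2) or None
    return libs


def links_libaio(text):
    """True if any library named libaio.so* appears in the ldd output."""
    return any(name.startswith("libaio.so") for name in parse_ldd(text))
```

The text can come from a saved file or from running ldd through the subprocess module, for example against the oracle executable path used above.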

Let’s see how the ASM client, my yoda database, is doing IO against ASM:
The checkpoint process issues a read and a write to the controlfile every 3 seconds:
(edited output of strace -p <pid>)

...
18:41:48 io_submit(1114112, 1, {{0x7bb110, 0, 1, 0, 18}}) = 1
18:41:48 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...
18:41:48 io_submit(1114112, 1, {{0x7bb110, 0, 0, 0, 18}}) = 1
18:41:48 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...
18:41:51 io_submit(1114112, 1, {{0x7bb110, 0, 1, 0, 18}}) = 1
18:41:51 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...
18:41:51 io_submit(1114112, 1, {{0x7bb110, 0, 0, 0, 18}}) = 1
18:41:51 io_getevents(1114112, 1, 128, {{0x7bb110, 0x7bb110, 16384, 0}}, {600, 0}) = 1
...

Well, these sure look like asynchronous calls…
The database is using asynchronous IO (synchronous IO would issue pread64 and/or pwrite64 calls on linux).
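
For longer traces, counting the calls is easier than reading them. The sketch below (names are my own) classifies strace lines using exactly the distinction made above: io_submit and io_getevents count as asynchronous, pread64 and pwrite64 as synchronous, and everything else is counted as "other".

```python
import re
from collections import Counter

ASYNC_CALLS = {"io_submit", "io_getevents"}
SYNC_CALLS = {"pread64", "pwrite64"}

# Optional HH:MM:SS timestamp (as printed by strace -t), then "name(".
SYSCALL = re.compile(r"^\s*(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?(\w+)\(")


def classify_io(strace_text):
    """Count async, sync and other system calls in strace output text."""
    counts = Counter()
    for line in strace_text.splitlines():
        m = SYSCALL.match(line)
        if not m:
            continue  # skips "..." and other non-syscall lines
        name = m.group(1)
        if name in ASYNC_CALLS:
            counts["async"] += 1
        elif name in SYNC_CALLS:
            counts["sync"] += 1
        else:
            counts["other"] += 1
    return counts
```

Fed the trace above, the async counter goes up for every line and the sync counter stays at zero.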

Now let’s see how the IO is done when ASM-lib is in use:
(edited output of strace -p <pid>):

...
16:23:53 read(22, "MSA\2\7P\20\2075\20@\276K"..., 80) = 80
16:23:53 read(22, "MSA\2\7P\20\2075\20"..., 80) = 80
...
16:23:56 read(22, "MSA\2\7P\20\2075\20@\276K"..., 80) = 80
16:23:56 read(22, "MSA\2\7P\20\2075\20"..., 80) = 80
...
16:23:59 read(22, "MSA\2\7P\20\2075\20@\276K"..., 80) = 80
16:23:59 read(22, "MSA\2\7P\20\2075\20"..., 80) = 80
...

Kind of surprising: the read() call is a synchronous call, not an asynchronous call.

Let’s see what file descriptor number 22 is pointing to:


$ ls -ls /proc/13306/fd/22
0 lrwx------ 1 oracle oinstall 64 Feb 20 20:22 22 -> /dev/oracleasm/iid/0000000000000009

That’s a meta-device the ASM-lib kernel module creates for ASM-lib disks!

When using ASM-lib, the processes do not talk to the raw devices directly; they talk to the ASM-lib meta-device, and the ASM-lib kernel module takes care of the “real” IO against the devices.

This means ASM-lib IO is NOT raw IO: the ASM-lib kernel module is a layer between user processes and disk devices.
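
To check this for any process without going through the descriptors one by one, you can resolve all the symlinks in its /proc fd directory. This is a minimal sketch; the function name is hypothetical and the /dev/oracleasm/ prefix is taken from the ls output above.

```python
import os


def fds_pointing_under(fd_dir, prefix="/dev/oracleasm/"):
    """Map fd name -> link target for descriptors whose target starts with prefix."""
    hits = {}
    for name in os.listdir(fd_dir):
        link = os.path.join(fd_dir, name)
        try:
            target = os.readlink(link)
        except OSError:
            continue  # not a symlink, or the fd was closed in the meantime
        if target.startswith(prefix):
            hits[name] = target
    return hits
```

For the process above, that would be fds_pointing_under("/proc/13306/fd"), run as the process owner or as root. A result without any entries would suggest the process is not going through the ASM-lib meta-device.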

I haven’t had time to investigate the inner workings of the ASM-lib kernel module (it’s open source!). Probably the optimisation involves some buffering, although not too much; otherwise it wouldn’t work with clustering/RAC.

I/O Optimization
I guess this is the same point as Async I/O management; using asynchronous IO, the IO is optimised. Probably the user IO is meant by “Async I/O management” and the kernel IO by “I/O optimization”. I can’t judge this part, because at this time I do not know what optimisations the ASM-lib kernel module does.

Sanity Checkups
One sanity check I can think of is ownership management of the ASM devices, so the ASM and database instances can access them. I really cannot think of other sanity checkups. The consistency of the information in the ASM device header is checked by the ASM instance, not by ASM-lib. That would be a very obvious sanity check, but it isn’t done by ASM-lib.
