
Recently we upgraded an Exadata to the current latest version, 11.2.3.2.0. The Exadata software itself consists of an image for the storage servers (the storage servers are essentially re-imaged) and a set of updates for the database/compute nodes, including firmware for the ILOM (lights-out management), BIOS, LSI RAID adapter and InfiniBand adapter, the Linux kernel, drivers and mandatory packages, to name some.

One of the notable things this upgrade does is remove the hot-spare from the RAID set on the database/compute nodes. This is documented in MOS note 1468877.1 as ‘known issue 5: hotspare removed for compute nodes’. For some sites this can actually be a good thing, if they are really tight on disk space on the compute nodes of the Exadata; I must say that we have not encountered this situation. What it means is that the actual HDD configuration on the compute node is now left to the customer, instead of there being one mandatory configuration (three disks in a RAID-5 configuration, plus one hot-spare).

So if you decide to use the former hot-spare disk as an active part of the RAID configuration, you are effectively trading availability for disk space. Please mind the RAID set itself already provides redundancy, even without the hot-spare!
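
If you want to see for yourself what redundancy the RAID set provides before deciding, the virtual drive configuration can be shown with MegaCli too. This is a minimal check (the exact output fields can differ slightly between MegaCli versions); on a compute node the virtual drive should show up as a RAID-5 set over three drives in state Optimal:
/opt/MegaRAID/MegaCli/MegaCli64 -LDInfo -Lall -a0 | grep -iE "RAID Level|State|Number Of Drives"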

On the other hand, I think in most configurations, it makes sense to convert the disk back to being hot-spare.

This is done in the following way:

a) Get an overview of the current disk configuration:
/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 1
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 2
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 3
Firmware state: Unconfigured(good), Spun Up
Device Firmware Level: 0B70

This shows that the disk in slot number 3 has been left “unconfigured”, but in “good” state (if a disk has gone bad because of errors, it is removed from the RAID set and shows up as “unconfigured(bad)”!).
This is the state in which the upgrade to 11.2.3.2.0 leaves your system.

Now let’s make the disk a hot-spare again!

b) Get the enclosure ID:
/opt/MegaRAID/MegaCli/MegaCli64 -encinfo -a0 | grep ID
Device ID : 252

This means we know the enclosure ID (252) and the slot number (3), which is the information the MegaCli utility needs to revert the unconfigured disk to hot-spare again!

c) Revert the unconfigured disk back to hot-spare
/opt/MegaRAID/MegaCli/MegaCli64 -PdHsp -set -EnclAffinity -PhysDrv[252:3] -a0
Adapter: 0: Set Physical Drive at EnclId-252 SlotId-3 as Hot Spare Success.
Exit Code: 0x00

d) Check the disk configuration again:
/opt/MegaRAID/MegaCli/MegaCli64 -pdlist -a0 | grep -iE "slot|firmware"
Slot Number: 0
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 1
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 2
Firmware state: Online, Spun Up
Device Firmware Level: 0B70
Slot Number: 3
Firmware state: Hotspare, Spun Up
Device Firmware Level: 0B70
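
As an extra sanity check, in addition to the pdlist output above, you can ask the adapter for a summary to verify nothing is reported as degraded or failed. This is a quick sketch; the field names in the summary may vary a bit between MegaCli versions:
/opt/MegaRAID/MegaCli/MegaCli64 -AdpAllInfo -a0 | grep -iE "degraded|critical disks|failed disks"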

It appears that the disk being removed from the RAID set by the update to 11.2.3.2.0 generates an ASR message. At least it did at our site, despite this looking like an undocumented bug (7161023, ‘ASR generating false errors in relation to disks’) which is marked as resolved in 11.2.3.1.0 (?). Most sites I encounter have ASR set up, but do not have all messages additionally sent to local, onsite monitoring. I want to stress that it is very important to have the ASR messages sent to your own monitoring too!

Oracle Support does not list all the details from an ASR message it has received. Instead, a Service Request is created with enough information for Oracle itself (!!). In our case, the exact error message was NOT specified, only ‘compute server hard disk predictive failure’ and the node name.

Where do you look on an Exadata for that information? The first logical place is the ASR daemon. I didn’t spend too much time on it, but it seems to be more a proxy for messages than a database; I wasn’t able to find useful information about the systems that were using this daemon.

What are the sources for ASR with an Exadata? These are:

Computing node:
- “compmon daemon” / Linux level monitoring
- ILOM
Storage node:
- “cell daemon” / Linux level monitoring
- ILOM
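
If you want these sources to report to your own monitoring as well as to ASR, it helps to know where each layer sends its SNMP traps. The following is a sketch of how to list the configured subscribers per source; the -get_snmp_subscribers option of the compmon script and the exact ILOM target may differ between Exadata and ILOM versions:
Compute node, compmon daemon:
# /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -get_snmp_subscribers
Compute or storage node, ILOM:
# ipmitool -I open sunoem cli "show /SP/alertmgmt/rules"
Storage node, cell daemon:
# cellcli -e list cell attributes snmpSubscriber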

For the compute node, it’s quite easy to see from the viewpoint of the ILOM whether any failed devices have been detected:
(please mind ipmitool -I open only works on the local system)
# ipmitool -I open sunoem cli "show /SP/logs/event/list Severity==(Major,Critical,Down)"
Connected. Use ^D to exit.
-> show /SP/logs/event/list Severity==(Major,Critical,Down)

ID Date/Time Class Type Severity
----- ------------------------ -------- -------- --------

-> Session closed
Disconnected

This shows that no messages with severity Major, Critical or Down are in the event log of the ILOM. Please mind that logons to the ILOM have severity “Minor”; on most systems these make up the vast majority of the messages, and they are not of interest for this investigation. If you want to know whether something has failed, there is an even simpler command:
# ipmitool -I open sunoem cli "show faulty"
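
Many of the same hardware events also end up in the standard IPMI system event log, so if you prefer plain ipmitool over the ILOM CLI pass-through, that log can be dumped locally as well (this shows all severities, so expect more noise than the filtered query above):
# ipmitool -I open sel elist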

For the “compmon daemon”, grep the process list for “compmon”:
# ps -ef | grep compmon
root 12812 1 0 Oct22 ? 00:00:11 /usr/bin/perl -w /opt/oracle.cellos/compmon/exadata_mon_hw_asr.pl -server

The most important part here is the directory: /opt/oracle.cellos/compmon
If you navigate to that directory, you will see a number of “state files”: asrs.state, traps.state and disks.state.
The disks.state file lists the disk status, including the firmware state we saw in step a).
The most important file for the ASR message investigation is the traps.state file, which lists the traps compmon has sent to ASR. In our case:
1 ; Mon Oct 22 14:39:10 2012 ; 86425886-b359-4587-8d46-f31ff2ecb135 ; Physicaldisk : Make Model: is at status predictive failure. Raised fault id: HALRT-02008 ; Physical disk should be replaced. Exadata Compute Server: Disk Serial Number:
Yes, this is pasted correctly: the Physicaldisk, Make Model and Disk Serial Number information is missing. It has not been omitted for safety; it simply is not listed.
So, the failure which was sent was HALRT-02008 in our case.
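
Since traps.state is a plain-text file, correlating a Service Request that ASR created with the trap the compute node actually sent is a matter of simple grepping, for example on the HALRT code or on the timestamp (the format is shown above):
# grep HALRT /opt/oracle.cellos/compmon/traps.state
# tail -5 /opt/oracle.cellos/compmon/traps.state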

For completeness: the ILOM layer of the storage servers can be investigated in exactly the same way as described for the ILOM on the compute nodes. The Linux layer messages can be investigated with:
# cellcli -e list alerthistory
32 2012-10-17T02:00:27+02:00 info "HDD disk controller battery on disk contoller at adapter 0 is going into a learn cycle. This is a normal maintenance activity that occurs quarterly and runs for approximately 1 to 12 hours. The disk controller cache might go into WriteThrough caching mode during the learn cycle. Disk write throughput might be temporarily lower during this time. The message is informational only, no action is required."
33 2012-10-22T11:43:21+02:00 info "Factory defaults restored for Adapter 0"
34 2012-10-22T11:43:23+02:00 info "Factory defaults restored for Adapter 0"
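
The alerthistory can get quite long on a storage server that has been running for a while. CellCLI allows filtering on attributes, so to look only at the alerts that matter for this kind of investigation you can, for example, restrict the list to critical alerts (use ‘describe alerthistory’ to see which attributes are available on your version):
# cellcli -e "list alerthistory where severity = 'critical' detail"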
