Thursday, June 9, 2011

Using mdadm to send e-mail alerts for RAID failures

Environment

Novell SUSE Linux Enterprise Desktop 10
Novell SUSE Linux Enterprise Server 10
Novell SUSE Linux Enterprise Server 10 Service Pack 1
Novell SUSE Linux Enterprise Desktop 10 Service Pack 1

Situation

Mdadm is a command line utility that can be used to create, manage, and monitor Linux software RAID devices.
This TID explains how to use mdadm to monitor and report problems with a software RAID configuration in SLE Linux. It is not intended to explain software RAID setup in SLE Linux; the mdadm steps below are meant for a system that already has an active software RAID setup.
Steps for setting up e-mail alerting of errors with mdadm:
E-mail error alerting with mdadm can be set up in two ways:
  1. Using a command line directly
  2. Using the /etc/mdadm.conf file to specify an e-mail address
NOTE: e-mails are only sent when one of the following events occurs: Fail, FailSpare, DegradedArray, or TestMessage.
Specifying an e-mail address using the mdadm command line
Using the command line simply involves including the e-mail address in the command. The following explains the mdadm command and how to set it up so that it will load every time the system is started.
mdadm --monitor --scan --daemonize --mail=jdoe@somemail.com
The command could be put in /etc/init.d/boot.local so that it is loaded every time the system starts.
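A minimal sketch of what the boot.local addition might look like, assuming the example address jdoe@somemail.com and that `pgrep` (from procps) is available. The function name `start_mdadm_monitor` is hypothetical; the guard only exists to avoid launching a second monitor if one is already running, and it assumes the monitor was started with the long `--monitor` option as shown above.

```bash
#!/bin/bash
# Hypothetical snippet for /etc/init.d/boot.local.
# Starts the mdadm monitor in the background with an e-mail address,
# unless a process whose command line contains "mdadm --monitor" exists.
start_mdadm_monitor() {
    local mail_to="$1"   # e.g. jdoe@somemail.com (example address)

    # pgrep -f matches against the full command line of running processes
    if pgrep -f "mdadm --monitor" >/dev/null 2>&1; then
        echo "mdadm monitor already running"
        return 0
    fi

    mdadm --monitor --scan --daemonize --mail="$mail_to"
}
```

Usage, at the end of boot.local: `start_mdadm_monitor jdoe@somemail.com`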
To verify that mdadm is running, type the following in a terminal window:
ps aux | grep mdadm
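Note that `ps aux | grep mdadm` also lists the grep process itself, so it always prints at least one line. A common shell idiom is the bracket pattern `[m]dadm`, which still matches "mdadm" but not the text of the grep command line. A minimal sketch wrapping this in a hypothetical helper, assuming the monitor was started with `--monitor` as above:

```bash
#!/bin/bash
# Hypothetical helper: succeeds (exit 0) and prints the matching ps lines
# if an "mdadm ... --monitor" process appears in the process list.
mdadm_monitor_running() {
    # "[m]dadm" matches the text "mdadm" but not this grep's own
    # command line, which contains "[m]dadm" instead.
    # -e lets the second pattern start with a dash.
    ps aux | grep '[m]dadm' | grep -e '--monitor'
}
```

Usage: `mdadm_monitor_running && echo "monitor is up"`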
Specifying an e-mail address using the mdadm.conf file
Using mdadm with the /etc/mdadm.conf file is very similar to the command line method, except that the e-mail address is specified in the mdadm.conf file on a MAILADDR line. The following is an example of an mdadm.conf file:
#~~~~~~~~~~~~ Sample mdadm.conf file ~~~~~~~~~~~~~~~~~~~~~~~~
DEVICE partitions
ARRAY /dev/md0 level=raid1 UUID=1e60d34a:2900a5a6:016ce23d:edbe1177
ARRAY /dev/md1 level=raid1 UUID=b9db4840:b9f19361:ed0112d1:74f6071a
ARRAY /dev/md2 level=raid1 UUID=f6135aa0:dc21f04e:24d4c1e1:4fe7b596
MAILADDR jdoe@somemail.com
#~~~~~~~~~~~~ end of file ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The lines beginning with # were added for this documentation.
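mdadm reads the alert address from a MAILADDR line in /etc/mdadm.conf. A small helper can add one only if the file does not already contain it, so it is safe to run more than once. This is a minimal sketch; the function name `ensure_mailaddr` is hypothetical and the address is the example one used in this document.

```bash
#!/bin/bash
# Hypothetical helper: append "MAILADDR <address>" to an mdadm.conf-style
# file unless a MAILADDR line is already present.
ensure_mailaddr() {
    local conf="$1"   # e.g. /etc/mdadm.conf
    local addr="$2"   # e.g. jdoe@somemail.com

    # The ^ anchor ignores commented-out "#MAILADDR" lines
    if grep -q '^MAILADDR' "$conf" 2>/dev/null; then
        return 0
    fi
    printf 'MAILADDR %s\n' "$addr" >> "$conf"
}
```

Usage (as root): `ensure_mailaddr /etc/mdadm.conf jdoe@somemail.com`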

Using /etc/mdadm.conf simplifies the command line to:
mdadm --monitor --scan --daemonize
This command could be added to /etc/init.d/boot.local so that mdadm runs every time the system starts.
NOTE: It has been found that mdadm will not send an e-mail if the DEVICE partitions line is missing from the /etc/mdadm.conf file. If these lines are missing, a new /etc/mdadm.conf file can be generated with the following command (note that it overwrites any existing file):
mdadm --detail --scan > /etc/mdadm.conf
This command writes only ARRAY lines, so the DEVICE partitions and MAILADDR lines should then be added as well.
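The recreate-then-edit steps above can be combined. A minimal sketch, assuming the example address; `rebuild_mdadm_conf` is a hypothetical name. It keeps a copy of any existing file as `<file>.bak` before overwriting it, and it leaves the file untouched if `mdadm --detail --scan` fails or reports no arrays.

```bash
#!/bin/bash
# Hypothetical helper: regenerate an mdadm.conf-style file containing a
# DEVICE partitions line, the ARRAY lines reported by mdadm, and a
# MAILADDR line.
rebuild_mdadm_conf() {
    local conf="$1"   # e.g. /etc/mdadm.conf
    local addr="$2"   # e.g. jdoe@somemail.com
    local arrays

    # Collect ARRAY lines first so a failed or empty scan changes nothing
    arrays=$(mdadm --detail --scan) && [ -n "$arrays" ] || return 1

    # Back up the current file before replacing it
    [ -f "$conf" ] && cp -p "$conf" "$conf.bak"

    {
        echo "DEVICE partitions"
        printf '%s\n' "$arrays"
        printf 'MAILADDR %s\n' "$addr"
    } > "$conf"
}
```

Usage (as root): `rebuild_mdadm_conf /etc/mdadm.conf jdoe@somemail.com`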

Running an external program when an event occurs
Another option provided by the /etc/mdadm.conf file is to run an external program when an event is detected.
The program could be something as simple as a script that pops up a message on the screen when an event occurs. The following script is one example:
NOTE: The following script is for example purposes only and is NOT supported by Novell.

#!/bin/bash
#
# mdadm RAID health check
#
# mdadm passes the event name as $1 and the md device as $2
# (for some events, the related component device is passed as $3)
#
# NOTE: xmessage is an X client, so it only works if this process
# can reach an X display (DISPLAY must be usable)
#
# Setting variables to readable values
event="$1"
device="$2"
# Check event and then pop up a window with an appropriate message
case "$event" in
    Fail)
        xmessage "A failure has been detected on device $device" ;;
    FailSpare)
        xmessage "A failure has been detected on spare device $device" ;;
    DegradedArray)
        xmessage "A Degraded Array has been detected on device $device" ;;
    TestMessage)
        xmessage "A Test Message has been generated on device $device" ;;
esac
#~~~~~~~~~~~~~~~~~~~~~~~~~ End of Script ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
To add an external program, add the following line to the /etc/mdadm.conf file:
PROGRAM /etc/raid-events
Here /etc/raid-events is the file that contains the script listed above. Ensure that the file is also marked as executable.
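Both steps (the PROGRAM line and the executable bit) can be scripted. A minimal sketch; `register_event_program` is a hypothetical name, and /etc/raid-events is the path used above.

```bash
#!/bin/bash
# Hypothetical helper: make the event script executable and add a
# "PROGRAM <path>" line to an mdadm.conf-style file if none exists yet.
register_event_program() {
    local conf="$1"   # e.g. /etc/mdadm.conf
    local prog="$2"   # e.g. /etc/raid-events

    [ -f "$prog" ] || { echo "missing script: $prog" >&2; return 1; }
    chmod +x "$prog"

    # Only one PROGRAM line is added, even if this is run repeatedly
    if ! grep -q '^PROGRAM' "$conf" 2>/dev/null; then
        printf 'PROGRAM %s\n' "$prog" >> "$conf"
    fi
}
```

Usage (as root): `register_event_program /etc/mdadm.conf /etc/raid-events`. The handler itself can then be exercised by hand with the same arguments mdadm passes, e.g. `/etc/raid-events TestMessage /dev/md0`.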

Testing the configuration to ensure that e-mails are sent
After everything has been set up, you can verify that e-mail alerts are sent and received by running mdadm in test mode:
  1. Open a terminal window and type su to log in as root.
  2. Type mdadm --monitor --scan --test
     Add the --mail parameter if /etc/mdadm.conf does not contain a MAILADDR line.
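Step 2 can be wrapped in a small helper. One detail worth checking in the mdadm(8) man page for the installed version: without `--oneshot` (`-1`), `mdadm --monitor` keeps running in the foreground after sending the test messages and has to be stopped with Ctrl+C, while `--oneshot` makes it check the arrays once and exit. A minimal sketch; `send_test_alerts` is a hypothetical name.

```bash
#!/bin/bash
# Hypothetical helper: ask mdadm to send a TestMessage alert for every
# array found via --scan. --oneshot makes mdadm exit after one pass
# instead of continuing to monitor in the foreground.
send_test_alerts() {
    local mail_to="$1"   # optional; only needed without a MAILADDR line
    local args=(--monitor --scan --test --oneshot)

    if [ -n "$mail_to" ]; then
        args+=(--mail="$mail_to")
    fi
    mdadm "${args[@]}"
}
```

Usage (as root): `send_test_alerts` when MAILADDR is set, or `send_test_alerts jdoe@somemail.com` otherwise.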

An e-mail should be received for each array listed in the /etc/mdadm.conf file.
If e-mails are not received, the /var/log/mail* files can help debug why delivery failed. The most common cause is that the e-mail is being blocked by the receiving gateway.
Also check that postfix is installed on the system, as mdadm uses postfix to send the e-mails.
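The two checks above can be combined into one troubleshooting helper. A minimal sketch, assuming an rpm-based SLE system and the SUSE log location /var/log/mail*; `check_alert_delivery` is a hypothetical name. It prints the mail log lines that mention the alert address and returns nonzero if postfix is missing or nothing matches.

```bash
#!/bin/bash
# Hypothetical helper for troubleshooting missing alert e-mails:
# 1) confirm the postfix package is installed (rpm-based systems)
# 2) show mail log lines that mention the alert address
check_alert_delivery() {
    local addr="$1"               # e.g. jdoe@somemail.com
    local logs="${2:-/var/log}"   # directory holding the mail* logs

    if ! rpm -q postfix >/dev/null 2>&1; then
        echo "postfix is not installed"
        return 1
    fi
    # -F: treat the address as a fixed string; -h: omit file names
    grep -hF -- "$addr" "$logs"/mail* 2>/dev/null
}
```

Usage (as root): `check_alert_delivery jdoe@somemail.com`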
