For years, I’ve run many small servers using the popular ICH/ISW Intel Matrix Storage RAID in a RAID 1 configuration. It has worked perfectly on both Windows and Linux, with no issues. But one thing has always bugged me: what do I do when a drive fails (and they will)? How does ISW handle it?
On Windows this is simple: you launch the Intel Matrix Storage software and click rebuild (if it isn’t already rebuilding automatically). But how do you do this on a Linux server, which has no such software? After hours of Googling, I came across the command “dmraid -R”. But that didn’t work in my test environment.
So I spent a whole afternoon figuring this out. This is what I found.
DMRaid Works. Sort of
DMRaid is the Linux tool for the popular onboard RAID setups. Your RAID can be from Intel, Nvidia, Promise or one of a few other vendors it supports. Intel is the most common, and it’s the one I have on all my Intel servers. *Your* implementation may be different, but this post should still help you.
My test setup was a simple ICH6R machine with two 160 GB Seagate hard drives. I booted up the machine, went into the Intel RAID setup, and created a 20 GB mirror volume called “System”. I then installed 32-bit CentOS 5.5 on this machine and went to work.
Initial results
The first thing I did was find out what I’ve got. Running “dmraid -s” gave me
[root@nasri ~]# dmraid -s
*** Group superset isw_djhffiddde
--> Active Subset
name : isw_djhffiddde_System
size : 41942528
stride : 256
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0
Then running “dmraid -r” gave me
[root@nasri ~]# dmraid -r
/dev/sda: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
/dev/sdb: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
This tells me my mirror set is running with two drives attached, and all is happy.
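If you want to check this from a script rather than by eye, here is a minimal Python sketch that parses “dmraid -r” output. It assumes the line layout shown above; the names RAID_DEV_RE and parse_dmraid_r are my own, not part of dmraid, and other dmraid versions or RAID formats may print something different.

```python
import re

# Matches one line of `dmraid -r` output in the layout shown in this post, e.g.
# /dev/sda: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
# (assumption: other dmraid versions may format this differently)
RAID_DEV_RE = re.compile(
    r'^(?P<device>/dev/\S+): (?P<format>\w+), "(?P<set>[^"]+)", '
    r'(?P<type>\w+), (?P<status>\w+), (?P<sectors>\d+) sectors'
)


def parse_dmraid_r(output):
    """Return one dict per RAID member device found in `dmraid -r` output."""
    members = []
    for line in output.splitlines():
        match = RAID_DEV_RE.match(line.strip())
        if match:  # lines that don't fit the layout (e.g. ERROR lines) are skipped
            info = match.groupdict()
            info["sectors"] = int(info["sectors"])
            members.append(info)
    return members


SAMPLE = '''/dev/sda: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
/dev/sdb: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0'''

members = parse_dmraid_r(SAMPLE)
print(len(members), [m["device"] for m in members])  # 2 ['/dev/sda', '/dev/sdb']
```

On a real machine you would feed it the text captured from running “dmraid -r” as root instead of SAMPLE.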
Broken results
I then turned the machine off, yanked a drive, inserted a different drive, and turned it back on. After fiddling with the BIOS for a few minutes (my machine wanted to boot from the newly installed drive, not the RAID) I got back in, and this is what I saw
[root@nasri ~]# dmraid -s
ERROR: isw: wrong number of devices in RAID set "isw_djhffiddde_System" [1/2] on /dev/sda
*** Group superset isw_djhffiddde
--> *Inconsistent* Active Subset
name : isw_djhffiddde_System
size : 41942528
stride : 256
type : mirror
status : inconsistent
subsets: 0
devs : 1
spares : 0
and
[root@nasri ~]# dmraid -r
/dev/sda: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
So dmraid tells me that the RAID is broken and inconsistent. Great. That’s what I want to see when a disk fails in one of my RAID sets. According to the man page and Google, to repair it you use “dmraid -R <raid id> /dev/<device>”.
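Spotting the broken state can be scripted too. This is a rough Python sketch that reads the “key : value” fields from “dmraid -s” and flags a set whose status is not “ok” or that has fewer members than expected. The helper names parse_dmraid_s and is_degraded, and the default of two expected devices, are my own assumptions for this mirror.

```python
def parse_dmraid_s(output):
    """Collect the "key : value" fields that `dmraid -s` prints for a set."""
    fields = {}
    for line in output.splitlines():
        line = line.strip()
        # Skip ERROR lines, the "*** Group superset" header and "-->" lines.
        if line.startswith(("ERROR", "***", "-->")) or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key.strip()] = value.strip()
    return fields


def is_degraded(fields, expected_devs=2):
    """True when the set is not "ok" or has fewer members than expected."""
    return fields.get("status") != "ok" or int(fields.get("devs", 0)) < expected_devs


BROKEN = '''ERROR: isw: wrong number of devices in RAID set "isw_djhffiddde_System" [1/2] on /dev/sda
*** Group superset isw_djhffiddde
--> *Inconsistent* Active Subset
name : isw_djhffiddde_System
size : 41942528
stride : 256
type : mirror
status : inconsistent
subsets: 0
devs : 1
spares : 0'''

fields = parse_dmraid_s(BROKEN)
print(fields["status"], fields["devs"], is_degraded(fields))  # inconsistent 1 True
```

Treating anything other than “ok” as degraded is a deliberately cautious choice; it also flags states I don’t fully understand, such as “nosync”.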
So, here goes.
[root@nasri ~]# dmraid -R isw_djhffiddde_System /dev/sdb
ERROR: isw: wrong number of devices in RAID set "isw_djhffiddde_System" [1/2] on /dev/sda
isw: drive to rebuild: /dev/sdb
RAID set "isw_djhffiddde_System" already active
device "isw_djhffiddde_System" is now registered with dmeventd for monitoring
Error: Unable to write to descriptor!
Error: Unable to execute set command!
Error: Unable to write to descriptor!
Error: Unable to execute set command!
Hrm. Errors. I don’t like errors. What happened? To be honest, I’ll never know, but it seems the rebuild was not working. dmraid thinks it’s working, but I can’t see it. I can’t hear any grumbling from the drive, nor can I see the LEDs flash. dmraid tells me the following:
[root@nasri ~]# dmraid -s
*** Group superset isw_djhffiddde
--> Active Subset
name : isw_djhffiddde_System
size : 41942528
stride : 256
type : mirror
status : nosync
subsets: 0
devs : 2
spares : 0
OK, so it’s not inconsistent now, but it is “nosync”, and I can’t figure out what that means. I should look at the source code, but I can’t be bothered.
Alright, so it appears that it’s not working.
Plan B
To figure out whether it was doing anything, I turned the machine off, removed the new drive, and put in a Western Digital Raptor. Something that makes sounds. I booted up, and dmraid still showed the same thing: an inconsistent RAID set. Now I added the new WD Raptor to this set.
[root@nasri ~]# dmraid -R isw_djhffiddde_System /dev/sdb
ERROR: isw: wrong number of devices in RAID set "isw_djhffiddde_System" [1/2] on /dev/sda
isw: drive to rebuild: /dev/sdb
RAID set "isw_djhffiddde_System" already active
device "isw_djhffiddde_System" is now registered with dmeventd for monitoring
Oh wow, much better. On top of that, I could hear the grumbling of the WD, and I could see LED activity. So, it works!
I also found a command to monitor the progress. It’s called “dmsetup status”
[root@nasri ~]# dmsetup status
isw_djhffiddde_Systemp2: 0 41720805 linear
isw_djhffiddde_Systemp1: 0 208782 linear
isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 928/1280 1 AA 1 core
VolGroup00-LogVol01: 0 4128768 linear
VolGroup00-LogVol00: 0 37552128 linear
[root@nasri ~]# dmsetup status
isw_djhffiddde_Systemp2: 0 41720805 linear
isw_djhffiddde_Systemp1: 0 208782 linear
isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 936/1280 1 AA 1 core
VolGroup00-LogVol01: 0 4128768 linear
VolGroup00-LogVol00: 0 37552128 linear
[root@nasri ~]# dmsetup status
isw_djhffiddde_Systemp2: 0 41720805 linear
isw_djhffiddde_Systemp1: 0 208782 linear
isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 1280/1280 1 AA 1 core
VolGroup00-LogVol01: 0 4128768 linear
VolGroup00-LogVol00: 0 37552128 linear
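The number to watch is the N/M pair on the mirror line (928/1280, then 936/1280, then 1280/1280). As far as I understand the device-mapper mirror target, that is the count of regions in sync out of the total. Here is a small Python sketch that pulls it out of a status line; the function name mirror_sync_ratio is mine, and it assumes the line layout shown above.

```python
import re


def mirror_sync_ratio(status_line):
    """Return (in_sync, total, fraction) for a device-mapper mirror status line.

    Returns None for other targets such as "linear".
    """
    _name, _, rest = status_line.partition(":")
    tokens = rest.split()
    # tokens: <start> <length> <target type> <target-specific fields...>
    if len(tokens) < 3 or tokens[2] != "mirror":
        return None
    for token in tokens[3:]:
        # Device numbers look like "8:16", so only "N/M" matches here.
        match = re.fullmatch(r"(\d+)/(\d+)", token)
        if match:
            in_sync, total = int(match.group(1)), int(match.group(2))
            return in_sync, total, in_sync / total
    return None


LINES = [
    "isw_djhffiddde_Systemp1: 0 208782 linear",
    "isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 928/1280 1 AA 1 core",
    "isw_djhffiddde_System: 0 41942776 mirror 2 8:16 8:0 1280/1280 1 AA 1 core",
]

for line in LINES:
    print(mirror_sync_ratio(line))
```

The linear partitions give None, and the rebuild is finished once the fraction reaches 1.0.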
And finally
[root@nasri ~]# dmraid -r
/dev/sdb: isw, "isw_djhffiddde", GROUP, ok, 312581806 sectors, data@ 0
/dev/sda: isw, "isw_djhffiddde", GROUP, ok, 72303838 sectors, data@ 0
[root@nasri ~]# dmraid -s
*** Group superset isw_djhffiddde
--> Active Subset
name : isw_djhffiddde_System
size : 41942528
stride : 256
type : mirror
status : ok
subsets: 0
devs : 2
spares : 0
So this is why it “sort of” works. It didn’t work with another Seagate drive, but it worked with a different drive. Afterwards, I yanked the good 80 GB drive from this set, plugged in a 750 GB Seagate, and was able to mirror back to that without a problem. Maybe initially it was my drives.
Conclusion
To fix a broken RAID 1 on an Intel RAID, use “dmraid -R <raidid> <dev>”, then watch “dmsetup status” and wait for the sync ratio to reach 1 (for example, 1280/1280).
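If you would rather not sit there re-running “dmsetup status” by hand, the waiting part can be sketched in Python. This is only an outline under a few assumptions: it needs Python 3.7 or later (newer than what CentOS 5.5 ships), it must run as root, and the names read_dmsetup_status and wait_for_sync are my own. It does not start the rebuild; you still run “dmraid -R” yourself first.

```python
import re
import subprocess
import time


def read_dmsetup_status():
    """Run `dmsetup status` (needs root) and return its output as text."""
    return subprocess.run(
        ["dmsetup", "status"], capture_output=True, text=True, check=True
    ).stdout


def wait_for_sync(set_name, read_status=read_dmsetup_status, interval=10,
                  sleep=time.sleep):
    """Poll until the mirror named set_name reports all regions in sync."""
    # Matches e.g. "<set_name>: 0 41942776 mirror 2 8:16 8:0 928/1280 1 AA 1 core"
    pattern = re.compile(
        r"^" + re.escape(set_name) + r":\s+\d+\s+\d+\s+mirror\s.*?\s(\d+)/(\d+)\s",
        re.MULTILINE,
    )
    while True:
        match = pattern.search(read_status())
        if match is None:
            raise RuntimeError(f"no mirror target named {set_name!r} found")
        in_sync, total = int(match.group(1)), int(match.group(2))
        print(f"{set_name}: {in_sync}/{total} regions in sync")
        if in_sync == total:
            return
        sleep(interval)


# Example, on the affected machine as root, after starting the rebuild:
#   wait_for_sync("isw_djhffiddde_System")
```

The read_status and sleep parameters exist only so the loop can be tried against canned output without touching a real array.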