Tuesday, February 8, 2011

Software RAID - Howto

Red Hat Enterprise Linux 3 doesn't ship with a good guide on how to install and manage a RHEL3 system on a pair of mirrored disks using software RAID. Here is my guide. It should work equally well for the RHEL clones, e.g. Whitebox Linux, CentOS, Tao Linux ...

Installing RHEL

My install hardware was a Pentium 4 machine with 1GB RAM and two 80GB Maxtor IDE hard disks. I booted the machine from RHEL installation CD 1 and started working through the installer.

At the point where disk partitioning takes place, I chose Disk Druid (instead of fdisk / auto) to partition the disks. I created two 100MB software RAID primary partitions (one on each disk), two 512MB Linux swap partitions, and two 79GB software RAID partitions filling the rest of each disk. I made the two 100MB partitions a single RAID 1 device mounted on /boot, and the two large ones a RAID 1 device mounted on /. The rest of the install proceeds as normal.
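For reference, this is the layout that produces (the md device numbering here is taken from the mdadm commands later in this guide):

hda1 + hdc1  ->  /dev/md1  (RAID 1, 100MB)  mounted on /boot 
hda2 , hdc2  ->  swap      (512MB each, not mirrored) 
hda3 + hdc3  ->  /dev/md0  (RAID 1, 79GB)   mounted on /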

When the machine reboots into RHEL, it will have working software RAID; however, the boot loader will only be installed on the first disk (/dev/hda). To install it on the second disk (/dev/hdc) as well, we need to run grub. The device line below temporarily maps (hd0) to the second disk, so that grub writes its boot sector there.

$ grub 
grub> device (hd0) /dev/hdc 
grub> root (hd0,0) 
 Filesystem type is ext2fs, partition type 0xfd 
grub> setup (hd0) 
 Checking if "/boot/grub/stage1" exists... no 
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"... 15 sectors are embedded.
 succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2
 /grub/grub.conf"... succeeded
 Done.

The next thing to do is to take a backup of the partition table on the disk - you will need this to restore onto a replacement disk. You can see it by running fdisk and choosing option 'p' (print partition table).

/dev/hda 
Device      Boot   Start     End     Blocks  Id  System 
/dev/hda1   *          1     203    102280+  fd  Linux raid autodetect 
/dev/hda2            204    1243     524160  82  Linux swap 
/dev/hda3           1244  158816   79416792  fd  Linux raid autodetect
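It's also worth keeping a machine-readable copy. One way (a sketch, assuming the disks are /dev/hda and /dev/hdc as above) is sfdisk's dump format, which can be replayed onto a blank disk later:

$ sfdisk -d /dev/hda > hda.sfdisk (dump the partition layout to a file, keep a copy off the machine) 
$ sfdisk /dev/hda < hda.sfdisk (replay it onto a blank replacement disk later)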

Monitoring the RAID array

This is for my setup with two disks, /dev/hda and /dev/hdc, both carrying identical data.

$ cat /proc/mdstat 
Personalities : [raid1] 
read_ahead 1024 sectors 
Event: 1 
md0 : active raid1 hda3[0] hdc3[1] 
      79416704 blocks [2/2] [UU] 
... 

This gives the status of the RAID array. If both disks are operating, the md0 line looks like this:

md0 : active raid1 hdc3[1] hda3[0]

If the array is broken and only one disk is operating, it looks like this:

md0 : active raid1 hdc3[1]

If it's recovering onto a replacement disk, it looks like this:

md0 : active raid1 hda3[2] hdc3[1] 
.... 
[.>.........] recovery = 3% (.../...) finish=128min speed=10000K/sec 
... 
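To keep an eye on a rebuild as it runs, watch re-runs the status command every couple of seconds:

$ watch -n 2 cat /proc/mdstat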


More information comes from mdadm:

mdadm --query --detail /dev/md0 
.... lots of stuff ... 
Number   Major   Minor   RaidDevice   State 
   0         0       0            0   faulty, removed 
   1        22       3            1   active sync   /dev/hdc3

This tells us that device 0 has failed and been removed, while device 1 is working fine. 
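If a disk is misbehaving but the kernel hasn't kicked it out of the array yet, you can mark its partitions as failed and remove them by hand before powering down to swap the disk - a sketch, assuming /dev/hda is the failing disk in the layout above:

$ mdadm --manage /dev/md0 --fail /dev/hda3 
$ mdadm --manage /dev/md0 --remove /dev/hda3 
$ mdadm --manage /dev/md1 --fail /dev/hda1 
$ mdadm --manage /dev/md1 --remove /dev/hda1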

In theory the mdmonitor service (which runs mdadm in monitor mode) will watch the arrays and mail root when a disk fails, provided /etc/mdadm.conf lists the arrays and a MAILADDR.
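A minimal /etc/mdadm.conf for this layout might look like the following sketch (adjust the device names to your own setup):

DEVICE /dev/hda* /dev/hdc* 
MAILADDR root 
ARRAY /dev/md0 devices=/dev/hda3,/dev/hdc3 
ARRAY /dev/md1 devices=/dev/hda1,/dev/hdc1 

With that in place, enable the service with 'chkconfig mdmonitor on' and 'service mdmonitor start'.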

How to restore from a broken RAID array

In this case /dev/hda has failed and I'm inserting a replacement disk. I start by booting the machine from installation CD 1 and entering rescue mode by typing 'linux rescue' at the boot prompt.

Do not mount any disks or set up the network; you will be dropped into a command prompt.

Partition the new disk with the same partition table as the old disk. It is very important to make sure you partition the correct disk; you may wish to unplug the working disk during this step to guard against user error.

$ fdisk /dev/hda 
n (new) 
p 1 (partition #1) 
1 203 (start and end cylinders) 
t 1 fd (set the partition type to Linux raid autodetect) 
a 1 (toggle the bootable flag on, as on the old disk) 

n 
p 2 
204 1243 
t 2 82 (set the partition type to Linux swap) 

n 
p 3 
1244 158816 
t 3 fd (set the partition type to Linux raid autodetect) 

w (write the partition table to disk and exit)
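Alternatively, if you kept the sfdisk dump taken earlier, the whole layout (bootable flag included) can be restored in one command:

$ sfdisk /dev/hda < hda.sfdisk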

I then boot the machine from its working disk, add the replacement disk's partitions into the arrays, and trigger the rebuild.

mdadm --manage /dev/md0 --add /dev/hda3 
mdadm --manage /dev/md1 --add /dev/hda1 
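The swap partitions sit outside the RAID arrays, so the replacement disk's swap area needs recreating by hand too - a quick sketch, assuming /dev/hda2 as in the partition table above:

$ mkswap /dev/hda2 (write a fresh swap signature) 
$ swapon /dev/hda2 (enable it immediately) 

You can watch the rebuild progress in /proc/mdstat while this runs.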


The new disk has no boot sector - that's not covered by the RAID array. We need to write it back as before. This time the target is /dev/hda, which the BIOS already sees as the first disk, so the device line maps (hd0) to /dev/hda.

$ grub 
grub> device (hd0) /dev/hda 
grub> root (hd0,0) 
 Filesystem type is ext2fs, partition type 0xfd 
grub> setup (hd0)
 Checking if "/boot/grub/stage1" exists... no
 Checking if "/grub/stage1" exists... yes
 Checking if "/grub/stage2" exists... yes
 Checking if "/grub/e2fs_stage1_5" exists... yes
 Running "embed /grub/e2fs_stage1_5 (hd0)"... 15 sectors are embedded. 
 succeeded
 Running "install /grub/stage1 (hd0) (hd0)1+15 p (hd0,0)/grub/stage2 /grub/grub.conf"... succeeded 
 Done. 


At this point the arrays are rebuilding; once /proc/mdstat shows [UU] again, the system is fully restored.

Note: it's entirely possible to do the recovery by booting from the working disk rather than from a rescue CD, but this increases the chance of accidentally destroying all your data. I'd recommend not doing that until you can perform a recovery with a CD without referencing this guide at any point.
