Tuesday, February 8, 2011

Linux: rebuild a broken software RAID array

In this article I will show you how to check whether a RAID array (in our case a RAID1 array) is broken and how to rebuild it. I have tested it on CentOS 5, but it should also work on previous versions of CentOS and on other Linux distributions.
When you look at a "normal" array, you see something like this:
# cat /proc/mdstat
Personalities : [raid1] read_ahead 1024 sectors
md2 : active raid1 hda3[1] hdb3[0]
      262016 blocks [2/2] [UU]
md1 : active raid1 hda2[1] hdb2[0]
      119684160 blocks [2/2] [UU]
md0 : active raid1 hda1[1] hdb1[0]
      102208 blocks [2/2] [UU]
unused devices: <none>
That's the normal state - what you want it to look like. When a drive has failed and been replaced, it looks like this:
Personalities : [raid1] read_ahead 1024 sectors
md0 : active raid1 hda1[1]
      102208 blocks [2/1] [_U]
md2 : active raid1 hda3[1]
      262016 blocks [2/1] [_U]
md1 : active raid1 hda2[1]
      119684160 blocks [2/1] [_U]
unused devices: <none>
Notice that the failed drive's partitions are no longer listed, and that an underscore appears beside each U. This shows that only one drive is active in these arrays - we have no mirror.
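Because the member count and the [UU]-style status both appear on the "blocks" line of /proc/mdstat, this check is also easy to script. A small sketch based on the output format above (the filename and the "degraded:" label are just illustrative choices):

```shell
#!/bin/sh
# Print every md status line whose [UU]-style field contains '_',
# i.e. every array with a failed or missing member. The guard keeps
# the script harmless on machines without software RAID.
if [ -r /proc/mdstat ]; then
    awk '/blocks/ && /\[.*_.*\]/ {print "degraded:", $0}' /proc/mdstat
fi
```

On the degraded system above this would print the three "[2/1] [_U]" lines; on a healthy system it prints nothing.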
Another command that shows the state of the RAID arrays is "mdadm":
# mdadm -D /dev/md0
/dev/md0:
        Version : 00.90.00
  Creation Time : Thu Aug 21 12:22:43 2003
     Raid Level : raid1
     Array Size : 102208 (99.81 MiB 104.66 MB)
    Device Size : 102208 (99.81 MiB 104.66 MB)
   Raid Devices : 2
  Total Devices : 1
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Fri Oct 15 06:25:45 2004
          State : dirty, no-errors
 Active Devices : 1
Working Devices : 1
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       0        0        0      faulty removed
       1       3        1        1      active sync   /dev/hda1
           UUID : f9401842:995dc86c:b4102b57:f2996278
As this shows, we presently only have one drive in the array.
Although I already knew that /dev/hdb was the other half of the RAID array, you can look at /etc/raidtab to see how the arrays were defined:
raiddev             /dev/md1
raid-level          1
nr-raid-disks       2
chunk-size          64k
persistent-superblock 1
nr-spare-disks      0
    device          /dev/hda2
    raid-disk       0
    device          /dev/hdb2
    raid-disk       1
raiddev             /dev/md0
raid-level          1
nr-raid-disks       2
chunk-size          64k
persistent-superblock 1
nr-spare-disks      0
    device          /dev/hda1
    raid-disk       0
    device          /dev/hdb1
    raid-disk       1
raiddev             /dev/md2
raid-level          1
nr-raid-disks       2
chunk-size          64k
persistent-superblock 1
nr-spare-disks      0
    device          /dev/hda3
    raid-disk       0
    device          /dev/hdb3
    raid-disk       1
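Note that /etc/raidtab belongs to the old raidtools package; on systems managed with mdadm, the equivalent information lives in /etc/mdadm.conf, which can be regenerated with "mdadm --detail --scan". A typical entry (shown here using the UUID from the mdadm output above) looks roughly like this:

```
# /etc/mdadm.conf -- one ARRAY line per md device, e.g.:
ARRAY /dev/md0 level=raid1 num-devices=2 UUID=f9401842:995dc86c:b4102b57:f2996278
```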
To get the mirrored drives working properly again, we need to run fdisk to see what partitions are on the working drive:
# fdisk /dev/hda

Command (m for help): p

Disk /dev/hda: 255 heads, 63 sectors, 14946 cylinders
Units = cylinders of 16065 * 512 bytes

   Device Boot    Start       End    Blocks   Id  System
/dev/hda1   *         1        13    104391   fd  Linux raid autodetect
/dev/hda2            14     14913 119684250   fd  Linux raid autodetect
/dev/hda3         14914     14946    265072+  fd  Linux raid autodetect
Duplicate that layout on /dev/hdb: use "n" to create the partitions and "t" to change their type to "fd" (Linux raid autodetect) to match. Once this is done, add the new partitions back into the arrays with "mdadm --add" (on old raidtools systems the equivalent command was "raidhotadd"):
# mdadm /dev/md0 --add /dev/hdb1
# mdadm /dev/md1 --add /dev/hdb2
# mdadm /dev/md2 --add /dev/hdb3
The rebuilding can be seen in /proc/mdstat:
# cat /proc/mdstat
Personalities : [raid1] read_ahead 1024 sectors
md0 : active raid1 hdb1[0] hda1[1]
      102208 blocks [2/2] [UU]
md2 : active raid1 hda3[1]
      262016 blocks [2/1] [_U]
md1 : active raid1 hdb2[2] hda2[1]
      119684160 blocks [2/1] [_U]
      [>....................]  recovery =  0.2% (250108/119684160) finish=198.8min speed=10004K/sec
unused devices: <none>
md0, a small array, has already finished rebuilding (UU), while md1 has only just begun. After it finishes, mdadm will show:
# mdadm -D /dev/md1
/dev/md1:
        Version : 00.90.00
  Creation Time : Thu Aug 21 12:21:21 2003
     Raid Level : raid1
     Array Size : 119684160 (114.13 GiB 122.55 GB)
    Device Size : 119684160 (114.13 GiB 122.55 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Fri Oct 15 13:19:11 2004
          State : dirty, no-errors
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

    Number   Major   Minor   RaidDevice State
       0       3       66        0      active sync   /dev/hdb2
       1       3        2        1      active sync   /dev/hda2
           UUID : ede70f08:0fdf752d:b408d85a:ada8922b
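The resync runs in the background and the arrays stay usable in the meantime. If you want a script to wait for the rebuild to finish (for example before starting another maintenance step), polling /proc/mdstat works. A small sketch, assuming the "recovery =" progress line shown above disappears once the resync is done:

```shell
#!/bin/sh
# Poll /proc/mdstat until no array reports an in-progress recovery.
# The 2>/dev/null redirect keeps the loop harmless on machines
# where /proc/mdstat does not exist.
while grep -q 'recovery' /proc/mdstat 2>/dev/null; do
    sleep 60
done
echo "RAID rebuild finished"
```

For an interactive view, simply re-running "cat /proc/mdstat" every so often shows the progress bar advancing.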
