Recovering a broken mirrored drive – ubuntu 9.04
Posted on June 29, 2011
The other day I got an email from mdadm, a process running on some of our servers that keeps an eye on the raid array.
-----
This is an automatically generated mail message from mdadm running on woo

A DegradedArray event had been detected on md device /dev/md0.

Faithfully yours, etc.

P.S. The /proc/mdstat file currently contains the following:

Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0]
      303805120 blocks [2/1] [U_]

md0 : active raid1 sda1[0]
      7815488 blocks [2/1] [U_]

unused devices: <none>
-----
This was not a happy event – looks like one of the two drives in the array was no longer working.
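If you would rather check for this from a script than wait for the email, here is a minimal sketch. It assumes the usual /proc/mdstat layout shown in the mail, where each array's status line ends in something like [2/1] [U_] and an underscore marks a missing member. The function name degraded_arrays is my own invention, not part of mdadm.

```shell
# Print the name of every md array that has a missing member.
# $1: path to an mdstat-format file (assumption: defaults to /proc/mdstat).
degraded_arrays() {
  awk '
    # A line such as "md1 : active raid1 sda2[0]" starts a new array block.
    /^md[^ ]* :/ { name = $1 }

    # The status line ends in a member map like [UU] or [U_].
    /blocks/ && match($0, /\[[U_]+\]/) {
      if (index(substr($0, RSTART, RLENGTH), "_")) print name
    }
  ' "${1:-/proc/mdstat}"
}
```

Called with no argument it reads /proc/mdstat. Against the mdstat contents in the mail above it would list md1 and md0, one per line.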
- The system is a couple of years old and runs software RAID 1 (mirrored) on 320 GB SATA drives.
- OS is Ubuntu 9.04 running web services.
- The failed drive is no longer readable by the system.
- There are only two partitions on the drive: system and swap.
Easiest thing to do here is to replace the drive (first making a new backup).
I just ran a quick check on the raid status to confirm the email I had received.
cat /proc/mdstat (maybe you will need to sudo this command)
This is my output
-----
Sun Jun 26:02:27 PM:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0]
      303805120 blocks [2/1] [U_]

md0 : active raid1 sda1[0]
      7815488 blocks [2/1] [U_]

unused devices: <none>
Sun Jun 26:02:28 PM:~$
-----
Now I had to pull the bad drive and replace it.
1- Sometimes you can find out which is the bad drive by looking in dmesg for the read failure on the device.
dmesg | grep ata (or whatever is appropriate for you)
2- Shut down and unplug the suspect drive, then reboot to confirm you have the correct device unplugged.
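Before you pull anything, it can help to write down the serial number of the drive that is still working, since the serial is usually printed on the drive's label. Here is a sketch, assuming your udev populates /dev/disk/by-id with symlinks that carry the model and serial in their names. The function name ids_for_disk is made up for this post.

```shell
# List the /dev/disk/by-id names that point at a given whole disk.
# $1: kernel device name without /dev/, e.g. "sda"
# $2: directory of symlinks (assumption: defaults to /dev/disk/by-id)
ids_for_disk() {
  dev=$1
  dir=${2:-/dev/disk/by-id}
  for link in "$dir"/*; do
    [ -L "$link" ] || continue            # skip anything that is not a symlink
    target=$(readlink "$link")            # e.g. ../../sda
    if [ "${target##*/}" = "$dev" ]; then # compare only the final path component
      printf '%s\n' "${link##*/}"
    fi
  done
  return 0
}
```

For example, ids_for_disk sda prints the by-id names that resolve to sda. The drive whose label does not match is the one to unplug.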
3- Plug in the new drive (best if it is unpartitioned/unformatted).
Reboot and watch the boot messages to see if the drive shows up. If you don't see it go by on the screen (I always get distracted by something else and forget to watch carefully), wait until the box is booted up and grep the output of dmesg to find the new device.
4- You can also check (and get important info for the next steps) by running
sudo fdisk -l
-----
Sun Jun 26:02:30 PM:~$ sudo fdisk -l
[sudo] password for ken:

Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x59b728b7

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         973     7815591   fd  Linux raid autodetect
/dev/sda2   *         974       38795   303805215   fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

Disk /dev/sdb doesn't contain a valid partition table

Disk /dev/md0: 8003 MB, 8003059712 bytes
2 heads, 4 sectors/track, 1953872 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md1: 311.0 GB, 311096442880 bytes
2 heads, 4 sectors/track, 75951280 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md1 doesn't contain a valid partition table
Sun Jun 26:02:35 PM:~$
-----
Note that my working drive is sda with a couple of partitions.
The device sdb doesn't have a valid partition table. Your mileage (and drive designations) will vary.
5- Now to get the raid back on track we need to copy the existing partition table from the functioning raid drive to the newly installed drive.
(Dangerous stuff here – I have never tried it but would almost bet money that getting the drives backwards would not be ‘good’)
So here is my output for sudo sfdisk -l
-----
Sun Jun 26:02:49 PM:~$ sudo sfdisk -l

Disk /dev/sda: 38913 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sda1          0+    972     973-   7815591   fd  Linux raid autodetect
/dev/sda2   *    973   38794   37822  303805215   fd  Linux raid autodetect
/dev/sda3          0       -       0          0    0  Empty
/dev/sda4          0       -       0          0    0  Empty

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
No partitions found

Disk /dev/md0: 1953872 cylinders, 2 heads, 4 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/md0: unrecognized partition table type
No partitions found

Disk /dev/md1: 75951280 cylinders, 2 heads, 4 sectors/track

sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/md1: unrecognized partition table type
No partitions found
Sun Jun 26:02:49 PM:~$
-----
6- Check out the man page for sfdisk and read through some of the stuff there.
We are going to use the -d option, which dumps the partition table of one device in a format that sfdisk can read back in, and pipe that into a second sfdisk pointed at the other device. The idea is that the partition layout taken from the good drive gets recreated on the new drive (fingers crossed here).
sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
So again we are just piping the output of the first sfdisk command into the input of the second.
If you want to see the output of the first part of the command before you commit to destroying whatever is on the target of the second sfdisk command, you can enter just that portion and see what you get.
sudo sfdisk -d /dev/sda (again use the appropriate drive designation here for your system – not mine)
You should get some output that sort of makes sense to you…
-----
Sun Jun 26:02:49 PM:~$ sudo sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors

/dev/sda1 : start=       63, size= 15631182, Id=fd
/dev/sda2 : start= 15631245, size=607610430, Id=fd, bootable
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0
Sun Jun 26:02:57 PM:~$
-----
If you point this command at the newly installed drive you should get an error (unless it has an existing partition table that sfdisk recognizes).
Here is mine again
-----
Sun Jun 26:02:57 PM:~$ sudo sfdisk -d /dev/sdb
sfdisk: ERROR: sector 0 does not have an msdos signature
 /dev/sdb: unrecognized partition table type
No partitions found
Sun Jun 26:02:59 PM:~$
-----
All of this double checking makes me feel a little better about continuing…
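One more check in the same spirit, offered as a rough sketch: work out how many sectors the dumped partition table needs and compare that with the size of the target drive. It assumes the dump format shown above, with start= and size= given in sectors. The name dump_end_sector is one I made up.

```shell
# Read `sfdisk -d` output on stdin and print the first sector past the
# end of the last partition, i.e. the largest start + size.
dump_end_sector() {
  awk '
    /start=/ {
      line = $0
      gsub(/[ \t]/, "", line)   # "start=   63, size= 15631182" -> "start=63,size=15631182"
      start = 0; size = 0
      if (match(line, /start=[0-9]+/)) start = substr(line, RSTART + 6, RLENGTH - 6) + 0
      if (match(line, /size=[0-9]+/))  size  = substr(line, RSTART + 5, RLENGTH - 5) + 0
      if (start + size > max) max = start + size
    }
    END { printf "%.0f\n", max }   # prints 0 if no partition lines were seen
  '
}
```

Usage would be sudo sfdisk -d /dev/sda | dump_end_sector, which for the dump above gives 623241675. The target drive needs at least that many 512-byte sectors. The 500107862016-byte drive here has 976773168 of them, so the layout fits with room to spare. Redirecting the same dump to a file first (sudo sfdisk -d /dev/sda > sda.dump) also leaves you a copy of the good table.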
-----
Sun Jun 26:02:59 PM:~$ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
Checking that no-one is using this disk right now ...
OK

Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0

   Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdb1          0+    972     973-   7815591   fd  Linux raid autodetect
/dev/sdb2   *    973   38794   37822  303805215   fd  Linux raid autodetect
/dev/sdb3          0       -       0          0    0  Empty
/dev/sdb4          0       -       0          0    0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0

   Device Boot    Start       End   #sectors  Id  System
/dev/sdb1            63  15631244   15631182  fd  Linux raid autodetect
/dev/sdb2   *  15631245 623241674  607610430  fd  Linux raid autodetect
/dev/sdb3             0         -          0   0  Empty
/dev/sdb4             0         -          0   0  Empty
Successfully wrote the new partition table

Re-reading the partition table ...

If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
Sun Jun 26:03:01 PM:~$
-----
Whew… I always get a little butterfly thing no matter how many drives I break…
(P.S. right at the moment I am listening to a shredder from the early 90s Gary Hoey. No Joe Satriani but still fun sometimes)
7- So now I want to take another quick look at all of the partitions with
sudo fdisk -l
-----
Sun Jun 26:03:02 PM:~$ sudo fdisk -l

Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x59b728b7

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         973     7815591   fd  Linux raid autodetect
/dev/sda2   *         974       38795   303805215   fd  Linux raid autodetect

Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000

   Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         973     7815591   fd  Linux raid autodetect
/dev/sdb2   *         974       38795   303805215   fd  Linux raid autodetect

Disk /dev/md0: 8003 MB, 8003059712 bytes
2 heads, 4 sectors/track, 1953872 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/md1: 311.0 GB, 311096442880 bytes
2 heads, 4 sectors/track, 75951280 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000

Disk /dev/md1 doesn't contain a valid partition table
-----
8- Nice. There is the second drive with appropriate partitions, but the RAID is still not a happy camper: the new partitions have to be added to their arrays before the mirror will rebuild.
Again and again: use the correct device names for your particular system configuration.
In the case of our example system we will use these commands.
This is for the swap partition
sudo mdadm --add /dev/md0 /dev/sdb1
-----
Sun Jun 26:03:13 PM:~$ sudo mdadm --add /dev/md0 /dev/sdb1
mdadm: added /dev/sdb1
-----
and this is for the system partition
sudo mdadm --add /dev/md1 /dev/sdb2
-----
Sun Jun 26:03:13 PM:~$ sudo mdadm --add /dev/md1 /dev/sdb2
mdadm: added /dev/sdb2
-----
So now that we did that, let's see what is going on by looking at mdstat again.
-----
Sun Jun 26:03:16 PM:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb2[2] sda2[0]
      303805120 blocks [2/1] [U_]
      [>....................]  recovery =  0.9% (2881792/303805120) finish=80.2min speed=62493K/sec

md0 : active raid1 sdb1[1] sda1[0]
      7815488 blocks [2/2] [UU]

unused devices: <none>
Sun Jun 26:03:16 PM:~$
-----
Awesome stuff. Look, the computer machine is working to bring the newly added device up to snuff. Love it.
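If you just want the percentage without the rest of the noise, here is a small sketch that pulls it out of mdstat. It assumes the progress line looks like the one above ("recovery = 0.9% ..."; I believe a resync shows the same shape with "resync" instead). The function name rebuild_progress is my own.

```shell
# Print "<array> <percent>" for every array that is currently rebuilding.
# $1: path to an mdstat-format file (assumption: defaults to /proc/mdstat).
rebuild_progress() {
  awk '
    # Remember which array the following lines belong to.
    /^md[^ ]* :/ { name = $1 }

    # Progress line, e.g. "[>....]  recovery =  0.9% (2881792/303805120) ..."
    match($0, /(recovery|resync) *= *[0-9.]+%/) {
      pct = substr($0, RSTART, RLENGTH)
      sub(/^[a-z]+ *= */, "", pct)   # strip the "recovery = " prefix
      print name, pct
    }
  ' "${1:-/proc/mdstat}"
}
```

With the mdstat output above this prints "md1 0.9%". It prints nothing once no array is rebuilding. Running watch cat /proc/mdstat works just as well if you prefer staring at the raw thing.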
You can also get additional information using
sudo mdadm --detail /dev/md1
sudo mdadm --detail /dev/md0
-----
Sun Jun 26:04:02 PM:~$ sudo mdadm --detail /dev/md0
/dev/md0:
        Version : 00.90
  Creation Time : Sat Sep 19 19:59:31 2009
     Raid Level : raid1
     Array Size : 7815488 (7.45 GiB 8.00 GB)
  Used Dev Size : 7815488 (7.45 GiB 8.00 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0
    Persistence : Superblock is persistent

    Update Time : Sun Jun 26 15:15:35 2011
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0

           UUID : c6fe5bf1:47145c2e:8f53a666:581a3da1
         Events : 0.604

    Number   Major   Minor   RaidDevice State
       0       8        1        0      active sync   /dev/sda1
       1       8       17        1      active sync   /dev/sdb1
Sun Jun 26:04:04 PM:~$
-----
Sun Jun 26:03:30 PM:~$ sudo mdadm --detail /dev/md1
/dev/md1:
        Version : 00.90
  Creation Time : Sat Sep 19 19:59:48 2009
     Raid Level : raid1
     Array Size : 303805120 (289.73 GiB 311.10 GB)
  Used Dev Size : 303805120 (289.73 GiB 311.10 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 1
    Persistence : Superblock is persistent

    Update Time : Sun Jun 26 16:02:40 2011
          State : active, degraded, recovering
 Active Devices : 1
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 1

 Rebuild Status : 55% complete

           UUID : a004ba5a:4a61bca9:f20d5c50:35d36b51
         Events : 0.4423477

    Number   Major   Minor   RaidDevice State
       0       8        2        0      active sync   /dev/sda2
       2       8       18        1      spare rebuilding   /dev/sdb2
-----
9- Now go get a cup of coffee, tea, water… whatever you enjoy. This recovery will take a bit of time to complete. I am going to have some leftover pasta from dinner last night.
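When you come back, it is worth confirming that the rebuild really has finished before doing anything else to the box. Here is a sketch that reads the State line from mdadm --detail output like the samples above. It assumes a healthy array reports a state without "degraded", "recovering" or "resyncing" in it. The name array_is_healthy is hypothetical, not an mdadm feature.

```shell
# Read `mdadm --detail` output on stdin.
# Exit 0 if the array looks healthy, 1 if it is degraded or still
# rebuilding, 2 if no State line was found at all.
array_is_healthy() {
  # Take the first "State : ..." line; the device table header also
  # contains the word State but does not start with it.
  state=$(awk '/^[ \t]*State[ \t]*:/ { sub(/^[^:]*:[ \t]*/, ""); print; exit }')
  [ -n "$state" ] || return 2
  case $state in
    *degraded*|*recovering*|*resyncing*) return 1 ;;
    *) return 0 ;;
  esac
}
```

Something like sudo mdadm --detail /dev/md1 | array_is_healthy && echo "rebuild done" would then stay quiet until the array is back to a clean state.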
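10- Finally we want to install GRUB onto the new drive, so the box can still boot if the old drive is the next one to die.
sudo grub-install /dev/md1
(This points grub-install at the array. If yours wants a physical disk instead, the new drive in this example is /dev/sdb.)
Good luck, ken.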