Recovering a broken mirrored drive – Ubuntu 9.04

Posted on June 29, 2011

The other day I got an email from mdadm, a process running on some of our servers that keeps an eye on the RAID arrays.

—–

This is an automatically generated mail message from mdadm running on woo
A DegradedArray event had been detected on md device /dev/md0.
Faithfully yours, etc.
P.S. The /proc/mdstat file currently contains the following:
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0]
303805120 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
7815488 blocks [2/1] [U_]
unused devices: <none>

—–

This was not a happy event: it looks like one of the two drives in the array was no longer working. The [2/1] [U_] on each array means only one of its two members is up.

  • This is a system that’s a couple of years old, running software RAID 1 (mirrored) on two 320 GB SATA drives.
  • The OS is Ubuntu 9.04, running web services.
  • The failed drive is no longer readable by the system.
  • There are only two partitions on the drive: System and Swap.

Easiest thing to do here is to replace the drive (first making a new backup).

I just ran a quick check on the raid status to confirm the email I had received.

cat /proc/mdstat (maybe you will need to sudo this command)

This is my output

 

—–

Sun Jun 26:02:27 PM:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sda2[0]
303805120 blocks [2/1] [U_]
md0 : active raid1 sda1[0]
7815488 blocks [2/1] [U_]
unused devices: <none>
Sun Jun 26:02:28 PM:~$
—–
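If you would rather not eyeball the [U_] markers, the same check can be scripted. This is only a rough sketch and not part of the procedure I actually ran: the function name degraded_arrays is made up for the example, and it assumes the mdstat layout shown above (an "mdN :" header line followed by a status line ending in something like [U_]).

```shell
# Print the name of every md array that has a missing member.
# $1 is an mdstat-style file; it defaults to the live /proc/mdstat.
degraded_arrays() {
    awk '
        /^md[0-9]+ :/     { name = $1 }    # header line, e.g. "md1 : active raid1 sda2[0]"
        /\[[U_]*_[U_]*\]/ { print name }   # a "_" in the status, e.g. [U_], is a missing member
    ' "${1:-/proc/mdstat}"
}

# Usage on a live system:
#   degraded_arrays
```

With the output above it should list both md1 and md0; an empty result means every array has all of its members.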

Now I had to pull the bad drive and replace it.

1- Sometimes you can find out which is the bad drive by looking in dmesg for the read failure on the device.

dmesg | grep ata (or whatever is appropriate for you)

2- Shut down and unplug the suspect drive, then reboot to confirm you have the correct device unplugged.

3- Plug in the new drive (best if it is unpartitioned/unformatted)

Reboot and watch the boot messages to see if the drive shows up. If you don’t see it go by on the screen (I always get distracted by something else and forget to watch carefully), don’t worry.

Once the box is booted up, grep the output of dmesg to find the new device.
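Another way to spot the new device, if dmesg is too noisy, is to list the whole disks the kernel knows about. The sketch below is an assumption on my part rather than something from the original run: whole_disks is a made-up name, and it relies on the usual /proc/partitions layout (major, minor, #blocks, name) and on the disks being named sdX.

```shell
# List whole SCSI/SATA disks (sda, sdb, ...) with their size in 1 KiB blocks.
# $1 is a file in /proc/partitions format; it defaults to the live one.
whole_disks() {
    # Field 4 is the device name and field 3 the block count;
    # partitions (sda1, sda2, ...) and md devices are skipped.
    awk '$4 ~ /^sd[a-z]+$/ { print $4, $3 }' "${1:-/proc/partitions}"
}

# Usage on a live system:
#   whole_disks
```

A disk that was not in the list before the swap, or whose size has changed, is most likely the new one.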

4- You can also check (and get important info for the next steps) by running

sudo fdisk -l

——–

Sun Jun 26:02:30 PM:~$ sudo fdisk -l
[sudo] password for ken:
Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x59b728b7
Device       Boot      Start         End      Blocks      Id  System
/dev/sda1                   1                 973       7815591        fd  Linux raid autodetect
/dev/sda2   *           974              38795   303805215   fd  Linux raid autodetect
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Disk /dev/sdb doesn’t contain a valid partition table
Disk /dev/md0: 8003 MB, 8003059712 bytes
2 heads, 4 sectors/track, 1953872 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000
Disk /dev/md0 doesn’t contain a valid partition table
Disk /dev/md1: 311.0 GB, 311096442880 bytes
2 heads, 4 sectors/track, 75951280 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000
Disk /dev/md1 doesn’t contain a valid partition table
Sun Jun 26:02:35 PM:~$
—–

Note that my working drive is sda with a couple of partitions.

The device sdb doesn’t have a valid partition table. Your mileage (and drive designations) will vary.
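Picking the blank disk out of that wall of text can also be done with a filter. This is a hedged sketch: unpartitioned_disks is a hypothetical helper, and it only works with fdisk versions that print the "contain a valid partition table" message seen above (newer ones may word it differently or not print it at all).

```shell
# Print every sdX disk that fdisk reports as having no valid partition table.
# Reads "fdisk -l" output on stdin. md devices are deliberately ignored: they
# hold a filesystem or swap directly, so they show the same message.
unpartitioned_disks() {
    awk '/^Disk \/dev\/sd[a-z]+ .*contain a valid partition table/ { print $2 }'
}

# Usage on a live system (2>&1 in case the message goes to stderr):
#   sudo fdisk -l 2>&1 | unpartitioned_disks
```

On the system in this post it should print /dev/sdb and nothing else.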

5- Now to get the raid back on track we need to copy the existing partition table from the functioning raid drive to the newly installed drive.

(Dangerous stuff here. I have never tried it, but I would almost bet money that getting the drives backwards would not be ‘good’: the partition table gets written to whichever drive you name as the target.)

So here is my output for sudo sfdisk -l

—–

Sun Jun 26:02:49 PM:~$ sudo sfdisk -l
Disk /dev/sda: 38913 cylinders, 255 heads, 63 sectors/track
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sda1          0+    972     973-   7815591   fd  Linux raid autodetect
/dev/sda2   *    973   38794   37822  303805215   fd  Linux raid autodetect
/dev/sda3          0       –       0          0    0  Empty
/dev/sda4          0       –       0          0    0  Empty
Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/sdb: unrecognized partition table type
No partitions found
Disk /dev/md0: 1953872 cylinders, 2 heads, 4 sectors/track
sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/md0: unrecognized partition table type
No partitions found
Disk /dev/md1: 75951280 cylinders, 2 heads, 4 sectors/track
sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/md1: unrecognized partition table type
No partitions found
Sun Jun 26:02:49 PM:~$
—–

6- Check out the man page for sfdisk and read through some of the stuff there.

We are going to use the -d option, which dumps the partition information of one device in a format that sfdisk can read back in, and pipe that through to the other device. Hopefully that uses the partition information gleaned from the good drive to recreate the same partitions on the new drive… (fingers crossed here)

 

sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb

 

So again we are just piping the output of the first sfdisk command into the input of the second.

If you want to see the output of the first part of the command before you commit to destroying whatever is on the target of the second sfdisk command, you can enter just that portion and see what you get.

sudo sfdisk -d /dev/sda (again use the appropriate drive designation here for your system – not mine)

You should get some output that sort of makes sense to you…

 

—–

Sun Jun 26:02:49 PM:~$ sudo sfdisk -d /dev/sda
# partition table of /dev/sda
unit: sectors
/dev/sda1 : start=       63, size= 15631182, Id=fd
/dev/sda2 : start= 15631245, size=607610430, Id=fd, bootable
/dev/sda3 : start=        0, size=        0, Id= 0
/dev/sda4 : start=        0, size=        0, Id= 0
Sun Jun 26:02:57 PM:~$
—–
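One more sanity check that can be scripted: make sure the dump really describes some partitions before feeding it to the second sfdisk. The sketch below is an assumption, not what I ran. The name nonempty_partitions and the file name in the usage lines are made up, and the pattern expects the dump format shown above ("start=..., size=..., Id=...").

```shell
# Print the partitions in an "sfdisk -d" dump that have a non-zero size.
# Reads the dump on stdin, e.g. "/dev/sda1 : start=       63, size= 15631182, Id=fd".
nonempty_partitions() {
    # sed pulls out "device size" pairs, awk keeps the ones with size > 0.
    sed -n 's|^\(/dev/[^ ]*\) : start= *[0-9]*, size= *\([0-9]*\),.*|\1 \2|p' |
        awk '$2 > 0 { print $1 }'
}

# Usage on a live system (the file name is only an example):
#   sudo sfdisk -d /dev/sda > sda-partition-table.txt
#   nonempty_partitions < sda-partition-table.txt
```

Saving the dump to a file first also leaves you with a copy of the good drive's partition table, which is handy if anything goes sideways later.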

If you point this command at the newly installed drive you should get an error (unless it has an existing partition table that sfdisk recognizes).

Here is mine again

 

—–

Sun Jun 26:02:57 PM:~$ sudo sfdisk -d /dev/sdb
sfdisk: ERROR: sector 0 does not have an msdos signature
/dev/sdb: unrecognized partition table type
No partitions found
Sun Jun 26:02:59 PM:~$

—–

 

All of this double checking makes me feel a little better about continuing…

 

—–

Sun Jun 26:02:59 PM:~$ sudo sfdisk -d /dev/sda | sudo sfdisk /dev/sdb
Checking that no-one is using this disk right now …
OK
Disk /dev/sdb: 60801 cylinders, 255 heads, 63 sectors/track
Old situation:
Units = cylinders of 8225280 bytes, blocks of 1024 bytes, counting from 0
Device Boot Start     End   #cyls    #blocks   Id  System
/dev/sdb1          0+    972     973-   7815591   fd  Linux raid autodetect
/dev/sdb2   *    973   38794   37822  303805215   fd  Linux raid autodetect
/dev/sdb3          0       –       0          0    0  Empty
/dev/sdb4          0       –       0          0    0  Empty
New situation:
Units = sectors of 512 bytes, counting from 0
Device Boot    Start       End   #sectors  Id  System
/dev/sdb1            63  15631244   15631182  fd  Linux raid autodetect
/dev/sdb2   *  15631245 623241674  607610430  fd  Linux raid autodetect
/dev/sdb3             0         –          0   0  Empty
/dev/sdb4             0         –          0   0  Empty
Successfully wrote the new partition table
Re-reading the partition table …
If you created or changed a DOS partition, /dev/foo7, say, then use dd(1)
to zero the first 512 bytes:  dd if=/dev/zero of=/dev/foo7 bs=512 count=1
(See fdisk(8).)
Sun Jun 26:03:01 PM:~$
—–

Whew… I always get a little butterfly thing, no matter how many drives I break…

(P.S. right at the moment I am listening to a shredder from the early 90s Gary Hoey. No Joe Satriani but still fun sometimes)

 

7- So now I want to take another quick look at all of the partitions with

sudo fdisk -l

—–

Sun Jun 26:03:02 PM:~$ sudo fdisk -l
Disk /dev/sda: 320.0 GB, 320072933376 bytes
255 heads, 63 sectors/track, 38913 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x59b728b7
Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         973     7815591   fd  Linux raid autodetect
/dev/sda2   *         974       38795   303805215   fd  Linux raid autodetect
Disk /dev/sdb: 500.1 GB, 500107862016 bytes
255 heads, 63 sectors/track, 60801 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes
Disk identifier: 0x00000000
Device Boot      Start         End      Blocks   Id  System
/dev/sdb1               1         973     7815591   fd  Linux raid autodetect
/dev/sdb2   *         974       38795   303805215   fd  Linux raid autodetect
Disk /dev/md0: 8003 MB, 8003059712 bytes
2 heads, 4 sectors/track, 1953872 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000
Disk /dev/md0 doesn’t contain a valid partition table
Disk /dev/md1: 311.0 GB, 311096442880 bytes
2 heads, 4 sectors/track, 75951280 cylinders
Units = cylinders of 8 * 512 = 4096 bytes
Disk identifier: 0x00000000
Disk /dev/md1 doesn’t contain a valid partition table

—–
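Comparing the two drives by eye works for two partitions, but the comparison can also be done mechanically. A minimal sketch, assuming you dump both drives with sfdisk -d first; same_layout and the file names are hypothetical, and the only thing it ignores is the device name and the comment header.

```shell
# Succeed (exit status 0) when two "sfdisk -d" dump files describe the same
# layout, ignoring the device names and the "# partition table of ..." header.
same_layout() {
    a=$(grep -v '^#' "$1" | sed 's|^/dev/[a-z]*\([0-9]*\) :|part\1 :|')
    b=$(grep -v '^#' "$2" | sed 's|^/dev/[a-z]*\([0-9]*\) :|part\1 :|')
    [ "$a" = "$b" ]
}

# Usage on a live system (file names are only examples):
#   sudo sfdisk -d /dev/sda > sda.dump
#   sudo sfdisk -d /dev/sdb > sdb.dump
#   same_layout sda.dump sdb.dump && echo "partition tables match"
```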

8- Nice, there is the second drive with the appropriate partitions, but the raid is still not a happy camper: the new partitions have to be added to the arrays.

Again and again – use the correct nomenclature for your particular system configuration

 

In the case of our example system we will use these commands

This is for the swap partition

sudo mdadm --add /dev/md0 /dev/sdb1

—–

Sun Jun 26:03:13 PM:~$ sudo mdadm --add /dev/md0 /dev/sdb1
mdadm: added /dev/sdb1
—–

and this is for the system partition

sudo mdadm --add /dev/md1 /dev/sdb2

—–

Sun Jun 26:03:13 PM:~$ sudo mdadm --add /dev/md1 /dev/sdb2
mdadm: added /dev/sdb2
—–
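Since typing the wrong partition here is easy, it may help to have the commands written out for you to review first. The sketch below only prints the mdadm lines and runs nothing. It is an assumption rather than the method used above: plan_readd is a made-up name, and it assumes each degraded array lists just its surviving member (like sda2[0]) and that the new disk uses the same partition numbers.

```shell
# Print (do not run) an "mdadm --add" line for each degraded array.
# $1 = new disk name without /dev/ (e.g. sdb)
# $2 = mdstat-style file, defaults to the live /proc/mdstat
plan_readd() {
    awk -v disk="$1" '
        /^md[0-9]+ :/ { name = $1; member = $NF }   # last field, e.g. sda2[0]
        /\[[U_]*_[U_]*\]/ {
            part = member
            sub(/\[[0-9]+\]$/, "", part)            # sda2[0] becomes sda2
            sub(/^[a-z]+/, "", part)                # sda2 becomes 2
            printf "mdadm --add /dev/%s /dev/%s%s\n", name, disk, part
        }
    ' "${2:-/proc/mdstat}"
}

# Usage on a live system:
#   plan_readd sdb
```

Read the printed lines against your own fdisk output before running any of them with sudo.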
So now that we did that, let’s see what is going on by looking at mdstat again.

—–

Sun Jun 26:03:16 PM:~$ cat /proc/mdstat
Personalities : [linear] [multipath] [raid0] [raid1] [raid6] [raid5] [raid4] [raid10]
md1 : active raid1 sdb2[2] sda2[0]
303805120 blocks [2/1] [U_]
[>....................]  recovery =  0.9% (2881792/303805120) finish=80.2min speed=62493K/sec
md0 : active raid1 sdb1[1] sda1[0]
7815488 blocks [2/2] [UU]
unused devices: <none>
Sun Jun 26:03:16 PM:~$

—–
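If you want just the interesting numbers out of that output, something like the following might do. It is a sketch under the assumption that the progress line looks like the one above ("recovery = 0.9% ... finish=80.2min"); rebuild_progress is a hypothetical name.

```shell
# Print "array percent eta" for every array that is currently rebuilding.
# $1 is an mdstat-style file; it defaults to the live /proc/mdstat.
rebuild_progress() {
    awk '
        /^md[0-9]+ :/ { name = $1 }
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /%$/)       { pct = $i; sub(/^=/, "", pct) }   # e.g. 0.9%
                if ($i ~ /^finish=/) { eta = substr($i, 8) }            # e.g. 80.2min
            }
            print name, pct, eta
        }
    ' "${1:-/proc/mdstat}"
}

# Usage on a live system:
#   rebuild_progress
```

Running plain "watch cat /proc/mdstat" gives the same information in raw form if you prefer to keep an eye on it that way.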

Awesome stuff. Look, the computer machine is working to bring the newly added device up to snuff. Love it.

 

You can also get additional information using

sudo mdadm --detail /dev/md1

sudo mdadm --detail /dev/md0

—–

Sun Jun 26:04:02 PM:~$ sudo mdadm --detail /dev/md0
/dev/md0:
Version : 00.90
Creation Time : Sat Sep 19 19:59:31 2009
Raid Level : raid1
Array Size : 7815488 (7.45 GiB 8.00 GB)
Used Dev Size : 7815488 (7.45 GiB 8.00 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 0
Persistence : Superblock is persistent
Update Time : Sun Jun 26 15:15:35 2011
State : clean
Active Devices : 2
Working Devices : 2
Failed Devices : 0
Spare Devices : 0
UUID : c6fe5bf1:47145c2e:8f53a666:581a3da1
Events : 0.604
Number   Major   Minor   RaidDevice State
0       8        1        0      active sync   /dev/sda1
1       8       17        1      active sync   /dev/sdb1
Sun Jun 26:04:04 PM:~$
-----
Sun Jun 26:03:30 PM:~$ sudo mdadm --detail /dev/md1
/dev/md1:
Version : 00.90
Creation Time : Sat Sep 19 19:59:48 2009
Raid Level : raid1
Array Size : 303805120 (289.73 GiB 311.10 GB)
Used Dev Size : 303805120 (289.73 GiB 311.10 GB)
Raid Devices : 2
Total Devices : 2
Preferred Minor : 1
Persistence : Superblock is persistent
Update Time : Sun Jun 26 16:02:40 2011
State : active, degraded, recovering
Active Devices : 1
Working Devices : 2
Failed Devices : 0
Spare Devices : 1
Rebuild Status : 55% complete
UUID : a004ba5a:4a61bca9:f20d5c50:35d36b51
Events : 0.4423477
Number   Major   Minor   RaidDevice State
0       8        2        0      active sync   /dev/sda2
2       8       18        1      spare rebuilding   /dev/sdb2
—–

9- Now go get a cup of coffee, tea, water… whatever you enjoy. This recovery will take a bit of time to complete (mdstat estimated about 80 minutes for md1 above). I am going to have some leftover pasta from dinner last night.
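If you would rather have the shell tell you when it is done, a small polling loop is one option. This is a sketch, not something from the original session: wait_for_rebuild is a made-up name, and it assumes a rebuild shows up in mdstat as a "recovery =" or "resync =" line like the one above.

```shell
# Block until no array in the given mdstat-style file is rebuilding.
# $1 = file to poll (default /proc/mdstat)
# $2 = seconds to sleep between polls (default 60)
wait_for_rebuild() {
    while grep -Eq '(recovery|resync) *=' "${1:-/proc/mdstat}"; do
        sleep "${2:-60}"
    done
}

# Usage on a live system:
#   wait_for_rebuild && echo "rebuild finished"
```

mdadm itself also has a --wait option that may do the same job, if your version supports it.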

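10- Finally, once the rebuild has finished, we want to install GRUB onto the new drive so the box can still boot if sda ever dies. Point grub-install at the new disk itself (sdb on my system), not at the md device.

sudo grub-install /dev/sdb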
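A rough way to check that a boot loader actually landed on a disk is to look for the string "GRUB" in its first sector. Treat this as a sketch under assumptions: has_grub_signature is a hypothetical helper, and it relies on GRUB's boot sector code embedding that ASCII string, so a match is a hint and not proof that the machine will boot from that disk.

```shell
# Succeed when the first 512 bytes of $1 (a disk or an image file) contain
# the ASCII string "GRUB".
has_grub_signature() {
    dd if="$1" bs=512 count=1 2>/dev/null |
        LC_ALL=C tr -c '[:print:]' '\n' |   # turn binary bytes into line breaks
        grep -q 'GRUB'
}

# Usage on a live system (needs read access to the disk, so use a root shell):
#   has_grub_signature /dev/sdb && echo "GRUB string found on sdb"
```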

Good luck, ken.

 

 

 

 


  • About

    This website is supported by Ken Lombardi @ analogman consulting.
    phone: 253.two.two.two-7626
    email: ken@analogman'dot'org
    tweet: analogmanorg
