7 Steps to Safely Replace a Drive in a Linux MD RAID Array

How to safely replace a hard drive in a Linux MDRAID Array.

Whether you have a failed drive, or want to step up to a larger set of drives, or even just want to clone a mirrored system, here is the procedure I use.

Let’s lay out an example scenario: say we have a mirrored (RAID1) array, and I just got an email alert from smartmontools telling me that the drive with serial number 7S8DJ3D2KEH has failed.

Step 1 - Check the array status

First, check the array’s status with cat /proc/mdstat:

dan@raidlab1:~$ cat /proc/mdstat
  md2 : active raid1 sdc1[0] sde1[1]
    1942896704 blocks super 1.2 [2/2] [UU]
    bitmap: 2/15 pages [8KB], 65536KB chunk

Looks good: both members are up ([UU]) and no re-sync is running, so we can safely continue.
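If you would rather script this check than eyeball it, here is a minimal sketch. The function name array_is_idle is my own invention, and it assumes the usual /proc/mdstat layout, where a running or queued sync shows up as a line containing "resync =", "recovery =", "reshape =" or "check =".

```shell
# Hypothetical helper: succeeds only if no array in the given mdstat file
# has a sync action (resync, recovery, reshape or check) running or queued.
# Usage: array_is_idle [mdstat-file]   (defaults to /proc/mdstat)
array_is_idle() {
    # Progress lines look like "[==>......]  recovery = 12.3% (...)";
    # queued actions look like "resync=DELAYED"
    ! grep -Eq '(resync|recovery|reshape|check) *=' "${1:-/proc/mdstat}"
}

# Example:
# array_is_idle && echo "no sync running, safe to continue"
```

Reading from a file argument instead of hard-coding /proc/mdstat makes it easy to try the function against a saved copy of the output first.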

Step 2 - Find the serial number if you don’t already have it

If you don’t have a failed drive, but want to replace one for another reason, you can use the hdparm utility to find out which serial number belongs to which drive (output trimmed to the relevant line):

dan@raidlab1:~$ sudo hdparm -I /dev/sdc
  /dev/sdc:
    Serial Number:  7S8DJ3D2KEH

dan@raidlab1:~$ sudo hdparm -I /dev/sde
  /dev/sde:
    Serial Number:  D82KG9L1LTL
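With more than a couple of drives, it gets tedious to run this by hand. Here is a rough sketch that pulls just the serial number out of the hdparm -I output; serial_from_hdparm is a name I made up, and it assumes the "Serial Number:" label shown above.

```shell
# Hypothetical helper: print the serial number found in `hdparm -I` output
# read from standard input.
serial_from_hdparm() {
    # Strip the "Serial Number:" label, then any trailing whitespace,
    # and keep only the first match.
    sed -n 's/^[[:space:]]*Serial Number:[[:space:]]*//p' \
        | sed 's/[[:space:]]*$//' \
        | head -n 1
}

# Example (needs root, so it is left commented out):
# for disk in /dev/sdc /dev/sde; do
#     printf '%s  %s\n' "$disk" "$(sudo hdparm -I "$disk" | serial_from_hdparm)"
# done
```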

Step 3 - Remove the drive from the array

OK, let’s replace the sdc drive. The first step is to tell mdadm that the drive has failed, and then remove it from the array:

dan@raidlab1:~$ sudo mdadm --manage /dev/md2 --fail /dev/sdc1
dan@raidlab1:~$ sudo mdadm --manage /dev/md2 --remove /dev/sdc1
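If you want to confirm that the member really was flagged as failed before you remove it, /proc/mdstat marks failed members with (F). A small sketch of that check follows; member_is_faulty is a hypothetical name, and it assumes the flag directly follows the slot number, as in sdc1[0](F).

```shell
# Hypothetical helper: succeeds if the member (e.g. sdc1) carries the
# faulty flag "(F)" in the given mdstat file.
# Usage: member_is_faulty sdc1 [mdstat-file]   (defaults to /proc/mdstat)
member_is_faulty() {
    member="$1"
    mdstat="${2:-/proc/mdstat}"
    # A failed member is listed as "sdc1[0](F)" on the array's status line
    grep -Eq "(^|[[:space:]])${member}\[[0-9]+\]\(F\)" "$mdstat"
}

# Example, run between the --fail and --remove commands:
# member_is_faulty sdc1 && echo "sdc1 is marked faulty"
```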

Step 4 - Remove the drive from the kernel

Next, tell the OS to delete its reference to the drive. This doesn’t remove any data; it just tells the kernel that the disk is no longer available:

dan@raidlab1:~$ echo 1 | sudo tee /sys/block/sdc/device/delete
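Since this step detaches whichever disk you name, a typo can hurt. Here is a sketch of a guard that refuses to detach a disk that still shows up as an array member. Both function names are hypothetical, and the check only looks at MD RAID membership; it knows nothing about mounted filesystems or swap on the same disk.

```shell
# Hypothetical guard: succeeds if the disk (e.g. sdc) or any of its
# partitions is still listed as an array member in the given mdstat file.
# Usage: disk_in_array sdc [mdstat-file]   (defaults to /proc/mdstat)
disk_in_array() {
    disk="$1"
    mdstat="${2:-/proc/mdstat}"
    # Members appear as "sdc1[0]"; whole-disk members as "sdc[0]"
    grep -Eq "(^|[[:space:]])${disk}p?[0-9]*\[" "$mdstat"
}

# Hypothetical wrapper around the delete step shown above.
# Usage: detach_disk sdc [mdstat-file]
detach_disk() {
    if disk_in_array "$1" "${2:-/proc/mdstat}"; then
        echo "refusing: $1 is still part of an md array" >&2
        return 1
    fi
    echo 1 | sudo tee "/sys/block/$1/device/delete" > /dev/null
}
```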

Step 5 - Physically change out the drive

Now that we are sure there are no write operations occurring on this drive, we can physically remove it from the system and replace it with the new drive. You can use the lsblk command to look for the new drive’s location in the /dev directory; if it is brand new, it will not list any partitions. In my experience, if you remove the drive sdc, for example, and replace it with another drive, the new drive will also be sdc, or whichever name the original drive had. That is not guaranteed, so confirm the name before you write anything to the disk.
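To spot the new drive quickly, you can filter the lsblk output for disks that have no partitions. This is only a sketch: blank_disks is a name I made up, it expects the raw output of lsblk -rn -o NAME,TYPE,PKNAME on standard input, and it will also list any other unpartitioned disk in the system.

```shell
# Hypothetical helper: read `lsblk -rn -o NAME,TYPE,PKNAME` output on
# standard input and print every disk that has no partitions.
blank_disks() {
    awk '
        $2 == "disk" { disks[$1] = 1 }      # remember each whole disk
        $2 == "part" { has_part[$3] = 1 }   # PKNAME is the parent disk
        END {
            for (d in disks)
                if (!(d in has_part)) print d
        }
    ' | sort
}

# Example:
# lsblk -rn -o NAME,TYPE,PKNAME | blank_disks
```

Comparing the serial number of the candidate disk against the label on the new drive is a good second check before moving on.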

Step 6 - Partition the new drive

So in this example the new drive is /dev/sdc, just as the old one was. Now we need to copy the partition table from another drive in the array, sde in this case. We will use the sfdisk command to dump the partition table from sde, and pipe that data back into sfdisk to write the table to sdc:

dan@raidlab1:~$ sudo sfdisk -d /dev/sde | sudo sfdisk /dev/sdc

Just to be safe, let’s compare the partition tables of each of those drives:

dan@raidlab1:~$ sudo fdisk -l /dev/sde

Apart from the device names, it should show the same partition sizes and types as:

dan@raidlab1:~$ sudo fdisk -l /dev/sdc
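Comparing two fdisk listings by eye is error-prone, so here is a sketch that reduces each sfdisk -d dump to the start, size and type of every partition, which makes a plain diff meaningful. The helper name partition_layout and the temporary file names are my own, and the pattern assumes the usual "start=, size=, type=" dump lines.

```shell
# Hypothetical helper: reduce an `sfdisk -d` dump (standard input) to one
# "start size type" line per partition, dropping device names and UUIDs.
partition_layout() {
    sed -n -E 's/^[^ ]+ : start= *([0-9]+), size= *([0-9]+), type=([^, ]+).*/\1 \2 \3/p'
}

# Example (needs root, so it is left commented out):
# sudo sfdisk -d /dev/sde | partition_layout > /tmp/sde.layout
# sudo sfdisk -d /dev/sdc | partition_layout > /tmp/sdc.layout
# diff /tmp/sde.layout /tmp/sdc.layout && echo "layouts match"
```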

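If the disks use a GPT partition table, you’ll also need to randomize the GUIDs on the new disk, because the copied table carries the same disk and partition GUIDs as the source drive and that can cause conflicts:

dan@raidlab1:~$ sudo sgdisk -G /dev/sdc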

Step 7 - Add the new drive to the array

All that’s left is to add the new drive to the array and let it re-sync:

dan@raidlab1:~$ sudo mdadm --manage /dev/md2 --add /dev/sdc1

We can check the progress of the resync by again running:

dan@raidlab1:~$ cat /proc/mdstat
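If you want a script to wait for the rebuild instead of re-running that command yourself, the percentage can be pulled out of the progress line. A minimal sketch follows; sync_progress is a hypothetical name, and it assumes the progress line contains "recovery = N%" or "resync = N%".

```shell
# Hypothetical helper: print the completion percentage of any resync or
# recovery in the given mdstat file; prints nothing when none is running.
# Usage: sync_progress [mdstat-file]   (defaults to /proc/mdstat)
sync_progress() {
    sed -n -E 's/.*(resync|recovery) *= *([0-9.]+)%.*/\2/p' "${1:-/proc/mdstat}"
}

# Example: poll every 30 seconds until no progress line is left
# while [ -n "$(sync_progress)" ]; do sleep 30; done
# echo "re-sync finished"
```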

Lastly, if you have smartmontools installed and running, restart the daemon so it doesn’t keep warning about the drive we removed:

dan@raidlab1:~$ sudo systemctl restart smartd

Wrap up

And that’s all there is to it!