How to safely replace a hard drive in a Linux MDRAID Array.
Whether you have a failed drive, or want to step up to a larger set of drives, or even just want to clone a mirrored system, here is the procedure I use.
Let’s lay out an example scenario: say we have a mirrored (RAID1) array, and I just got an email alert from smartmontools telling me that the drive with serial number 7S8DJ3D2KEH has failed.
Step 1 - Check the array status
First, check the array’s status with:
dan@raidlab1:~$ cat /proc/mdstat
md2 : active raid1 sdc1 sde1
      1942896704 blocks super 1.2 [2/2] [UU]
      bitmap: 2/15 pages [8KB], 65536KB chunk
Looks good: no re-sync is running, so we can safely continue.
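If you would rather not eyeball the output, the same check can be scripted. This is only a minimal sketch: the function name md_sync_running is made up for illustration, and it assumes that an active or pending sync shows up in /proc/mdstat as a word like "resync" or "recovery" followed by an equals sign.

```shell
# Return 0 if the given mdstat file shows a resync, recovery, reshape, or
# check in progress (or pending), and non-zero otherwise.
# With no argument it reads the live /proc/mdstat.
md_sync_running() {
    grep -Eq '(resync|recovery|reshape|check) *=' "${1:-/proc/mdstat}"
}
```

Something like `md_sync_running && echo "sync in progress, wait before pulling a drive"` could then act as a guard at the top of a replacement script.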
Step 2 - Find the serial number if you don’t already have it
If you don’t have a failed drive but want to replace one for another reason, you can use the hdparm utility to find out which serial number belongs to which drive:
dan@raidlab1:~$ sudo hdparm -I /dev/sdc
/dev/sdc:
Serial Number: 7S8DJ3D2KEH
dan@raidlab1:~$ sudo hdparm -I /dev/sde
/dev/sde:
Serial Number: D82KG9L1LTL
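With more than a couple of drives, it can help to loop over them and print only the serial numbers. The sketch below assumes the hdparm -I output contains a "Serial Number:" line as in the session above; the function names serial_from_hdparm and list_serials are illustrative and not part of any tool.

```shell
# Read `hdparm -I` output on stdin and print only the serial number.
serial_from_hdparm() {
    awk '/Serial Number:/ { print $NF; exit }'
}

# Print "device serial" for every device passed as an argument.
# hdparm needs root, so it is called through sudo here.
list_serials() {
    for dev in "$@"; do
        printf '%s %s\n' "$dev" "$(sudo hdparm -I "$dev" | serial_from_hdparm)"
    done
}
```

For the example array this would be called as `list_serials /dev/sdc /dev/sde`.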
Step 3 - Remove the drive from the array
OK, let’s replace the sdc drive. The first step is to tell mdadm that the drive has failed, and then remove it from the array:
dan@raidlab1:~$ sudo mdadm --manage /dev/md2 --fail /dev/sdc1
dan@raidlab1:~$ sudo mdadm --manage /dev/md2 --remove /dev/sdc1
Step 4 - Remove the drive from the kernel
Next, tell the OS to delete its reference to the drive. This doesn’t remove any data; it just tells the kernel that the disk is no longer available:
dan@raidlab1:~$ echo 1 | sudo tee /sys/block/sdc/device/delete
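Before running the delete, it may be worth confirming that no array still holds the partition. The sketch below assumes the kernel lists holders (such as md2) under /sys/class/block/&lt;partition&gt;/holders; the function name partition_holders and its optional second argument are hypothetical, the latter existing only so the function can be tried against a scratch directory.

```shell
# Print the devices (for example md2) that still hold the given partition,
# one per line. Prints nothing if the partition has no holders.
# $1: partition name without /dev/, e.g. sdc1
# $2: sysfs root, defaults to /sys (override only for experiments)
partition_holders() {
    dir="${2:-/sys}/class/block/$1/holders"
    [ -d "$dir" ] || return 0
    ls "$dir"
}
```

If `partition_holders sdc1` still prints md2, the partition has not been removed from the array yet and the drive should stay where it is.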
Step 5 - Physically change out the drive
Now that we are sure no write operations are occurring on this drive, we can physically remove it from the system and replace it with the new drive. You can use the lsblk command to find the new drive’s name in the /dev directory; a brand-new drive will not list any partitions. In my experience, if you remove sdc, for example, and replace it with another drive, the new drive will also come up as sdc, or whichever name the original drive had.
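Since the device name is only "in my experience" stable, a quick cross-check against the serial number printed on the new drive’s label can rule out surprises. This is a small sketch that assumes lsblk is asked for its NAME and SERIAL columns; the function name disk_by_serial is made up, and NEWSERIAL below is a placeholder for whatever is printed on your drive.

```shell
# Read `lsblk -dn -o NAME,SERIAL` output on stdin and print the name of the
# disk whose serial number equals $1. Prints nothing if no disk matches.
disk_by_serial() {
    awk -v want="$1" '$2 == want { print $1 }'
}
```

Usage would look like `lsblk -dn -o NAME,SERIAL | disk_by_serial NEWSERIAL`, which should print sdc in this example if the new drive took over the old name.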
Step 6 - Partition the new drive
So in this example the new drive is /dev/sdc, just as the old one was. Now we need to copy the partition table from another drive in the array, sde in this case. We will use the sfdisk command to dump the partition table from sde and pipe that data back into sfdisk to write the table to sdc:
dan@raidlab1:~$ sudo sfdisk -d /dev/sde | sudo sfdisk /dev/sdc
Just to be safe, let’s compare the partition tables of each of those drives:
dan@raidlab1:~$ sudo fdisk -l /dev/sde
The partition layout (start, end, size, and type) should match the output of:
dan@raidlab1:~$ sudo fdisk -l /dev/sdc
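Comparing two fdisk listings by eye gets tedious, so here is one way the comparison could be scripted using sfdisk dumps instead. It is a sketch under assumptions: the dump is expected to have "label-id:" and "device:" header lines and optional "uuid=" fields on partition lines, all of which are stripped because they are expected to differ between disks. The names normalize_dump and same_layout are illustrative only.

```shell
# Read an `sfdisk -d` dump on stdin and strip what is expected to differ
# between two disks: the label ID, the device line, per-partition UUIDs,
# and the /dev/<disk> prefix on each partition line.
# $1: disk name without /dev/, e.g. sde
normalize_dump() {
    sed -e '/^label-id:/d' \
        -e '/^device:/d' \
        -e 's/, *uuid=[^,]*//' \
        -e "s|^/dev/$1|part|"
}

# Succeed only if both disks have the same layout after normalizing.
# $1, $2: disk names without /dev/, e.g. sde sdc
same_layout() {
    a=$(sudo sfdisk -d "/dev/$1" | normalize_dump "$1")
    b=$(sudo sfdisk -d "/dev/$2" | normalize_dump "$2")
    [ "$a" = "$b" ]
}
```

`same_layout sde sdc && echo "partition tables match"` would then replace the manual comparison. If it fails, any remaining difference is worth reading by hand rather than treating as an error automatically, especially when the two disks are not the same size.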
On GPT disks, you’ll also need to randomize the GUIDs on the new disk so they don’t conflict with the drive the table was copied from:
dan@raidlab1:~$ sudo sgdisk -G /dev/sdc
Step 7 - Add the new drive to the array
All that’s left is to add the new drive to the array and let it re-sync:
dan@raidlab1:~$ sudo mdadm --manage /dev/md2 --add /dev/sdc1
We can check the progress of the resync by again running:
dan@raidlab1:~$ cat /proc/mdstat
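For a script that should report progress instead of dumping the whole file, the percentage can be pulled out of /proc/mdstat. This sketch assumes the progress line has the usual "recovery = 8.5%" or "resync = 8.5%" shape; the function name md_sync_progress is hypothetical.

```shell
# Print the resync/recovery percentage found in the given mdstat file
# (default: the live /proc/mdstat). Prints nothing when no sync is running.
md_sync_progress() {
    grep -Eo '(resync|recovery) *= *[0-9.]+%' "${1:-/proc/mdstat}" |
        grep -Eo '[0-9.]+%'
}
```

If all you need is to block until the rebuild is done, `sudo mdadm --wait /dev/md2` is a simpler option, since it returns only once resync or recovery activity on the array has finished.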
Lastly, if you have smartmontools installed and running, restart the daemon so it doesn’t keep warning about the drive we removed:
dan@raidlab1:~$ sudo systemctl restart smartd
And that’s all there is to it!