Removing, replacing, and resyncing a disk in a degraded RAID array 

This guide explains how to remove a faulty disk from an existing RAID array, add a new disk, and resynchronize the array.

Our data center constantly monitors RAID arrays for the operating systems we provide. If we detect a RAID array degradation or failure event, we will promptly notify the customer and recommend scheduling maintenance to replace the failed disk. The replacement process usually takes around 15-20 minutes and should not lead to data loss. As long as the remaining disks continue functioning, the array rebuilds its data onto the new disk and keeps running throughout.

Some key points to consider: 

  • This guide applies to various RAID configurations, including RAID 1, RAID 5, and RAID 10. It covers the process for software RAID arrays managed with mdadm.
  • RAID arrays provide data redundancy and improved performance, but a degraded array can compromise data integrity. Properly removing, replacing, and resynchronizing hard drives is critical to maintaining the health of your RAID system. 
  • Ensure you have “smartmontools” installed, as it is required to retrieve the serial number of the failed drive.   

Step 1: Check the current RAID array status 

  1. Open your terminal. 
  2. Connect to your server. 
  3. Check the current RAID array status (see Fig. 1)
    $ cat /proc/mdstat

    Fig. 1. Checking the current RAID array status inside the terminal. 

    This command displays the status of your RAID arrays. A faulty drive is usually marked with (F), or it may have disappeared from the array entirely, in which case the member map shows an underscore, for example [U_] instead of [UU].
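
The check above can also be scripted. The following is a minimal sketch rather than part of the procedure itself: the function name list_degraded_arrays is hypothetical, and it assumes the usual /proc/mdstat layout in which each array's status line ends with a member map such as [UU] or [U_].

```shell
# List md arrays whose member map (e.g. [U_]) shows a missing or failed disk.
# Hypothetical helper; reads /proc/mdstat unless another file is given.
list_degraded_arrays() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        # Array header line, e.g. "md0 : active raid1 sda2[0] sdb2[1](F)"
        /^md/ { name = $1 }
        # Status line, e.g. "976630336 blocks super 1.2 [2/1] [U_]"
        name != "" && /\[[U_]+\]/ {
            if ($NF ~ /_/) print name   # underscore = member not active
            name = ""
        }
    ' "$mdstat_file"
}

# Usage (not executed here):
#   list_degraded_arrays
```

An empty result suggests that every array currently reports all members as active.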

Step 2: Mark the faulty disk as failed 

  1. Identify the faulty disk by running the following command:
    $ cat /proc/mdstat
    The faulty disk is marked with (F) in the output.

    If a drive is missing, use the following command to identify the missing disk:
    $ lsblk
  2. Mark the disk as failed (see Fig. 2)
    $ mdadm --manage /dev/md0 --fail /dev/sdb2 

    Fig. 2. Marking /dev/sdb2 as a faulty disk. 

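    Replace "/dev/md0" with your RAID array device and "/dev/sdb2" with the faulty disk. The mdadm commands in this guide require root privileges, so run them as root or prefix them with sudo.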
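
To cross-check which member should be passed to --fail, the (F) markers in /proc/mdstat can be extracted with a short script. This is an illustrative sketch only: list_failed_members is a hypothetical name, and the parsing assumes member entries of the form sdb2[1](F).

```shell
# Print "<array> <device>" for every member flagged (F) in /proc/mdstat.
# Hypothetical helper; reads /proc/mdstat unless another file is given.
list_failed_members() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        /^md/ {
            for (i = 3; i <= NF; i++) {
                # Member entries look like "sdb2[1](F)"
                if ($i ~ /\[[0-9]+\]\(F\)/) {
                    dev = $i
                    sub(/\[[0-9]+\].*$/, "", dev)   # keep only "sdb2"
                    print $1, "/dev/" dev
                }
            }
        }
    ' "$mdstat_file"
}

# Usage (not executed here):
#   list_failed_members
```

Running mdadm --detail /dev/md0 reports the state of each member of a single array in a more verbose form.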

Step 3: Install “smartmontools” (if not already installed) and retrieve the serial number of the faulty disk

  1. Install “smartmontools”:
     
    For Debian/Ubuntu operating systems: 
    $ sudo apt-get install smartmontools 

    For CentOS/RHEL operating systems: 

    $ sudo yum install smartmontools 

    For Fedora:

    $ sudo dnf install smartmontools 

    For Arch Linux:

    $ sudo pacman -S smartmontools 
  2. Find the serial number of the faulty disk (see Fig. 3)
    $ smartctl -a /dev/sdb2 | grep "Serial Number" 

    Fig. 3. Finding the serial number of "/dev/sdb2" using “smartmontools”.

    Replace "/dev/sdb2" with the faulty disk's device name. This command displays the faulty disk's serial number, which identifies the physical drive that needs to be replaced.
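
smartctl reports on the physical drive, so the whole-disk name (for example /dev/sdb rather than /dev/sdb2) is the more conventional argument. The sketch below derives that name from a partition name. It is a rough illustration under stated assumptions: whole_disk is a hypothetical helper, and it only understands sd/vd/hd/xvd-style names and pN-style names such as nvme0n1p2. On a live system, lsblk -no PKNAME /dev/sdb2 should give the same answer.

```shell
# Map a partition name to its whole-disk name (hypothetical helper).
# Assumes names like /dev/sdb2 or /dev/nvme0n1p2; other names pass through unchanged.
whole_disk() {
    printf '%s\n' "$1" | sed -E \
        -e 's#^(/dev/(nvme[0-9]+n[0-9]+|mmcblk[0-9]+))p[0-9]+$#\1#' \
        -e 's#^(/dev/(sd|vd|hd|xvd)[a-z]+)[0-9]+$#\1#'
}

# Usage (not executed here):
#   sudo smartctl -i "$(whole_disk /dev/sdb2)" | grep -i 'serial number'
```

The case-insensitive grep is a precaution in case the capitalization of the label differs between drive types.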

Step 4: Remove the faulty disk from the array 

  1. Remove the disk from the array (see Fig. 4)
    $ mdadm --manage /dev/md0 --remove /dev/sdb2 

    Fig. 4. Removing the faulty disk from the array. 

    This command removes the failed disk from the RAID array. 
  2. Verify the removal (see Fig. 5)
    $ cat /proc/mdstat 

    Fig. 5. Verifying that the disk has been removed from the array. 

    Ensure the faulty disk is no longer part of the RAID array. 
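
A physical disk often carries several partitions that belong to different arrays (for example /dev/sdb1 in one array and /dev/sdb2 in another). Whether that is the case on your server is an assumption to verify. The sketch below lists every array that still references the disk, so that each member can be failed and removed before the drive is pulled. The name arrays_on_disk is hypothetical.

```shell
# Print "<array> <device>" for every md member located on the given disk.
# Hypothetical helper; reads /proc/mdstat unless a second argument is given.
arrays_on_disk() {
    disk="${1#/dev/}"
    mdstat_file="${2:-/proc/mdstat}"
    awk -v disk="$disk" '
        /^md/ {
            for (i = 3; i <= NF; i++) {
                if ($i !~ /\[[0-9]+\]/) continue   # skip "active", "raid1", ...
                dev = $i
                sub(/\[.*$/, "", dev)              # "sdb2[1](F)" -> "sdb2"
                # Match the disk itself or its partitions (sdb2, nvme0n1p2)
                if (dev == disk || dev ~ ("^" disk "p?[0-9]+$"))
                    print $1, "/dev/" dev
            }
        }
    ' "$mdstat_file"
}

# Usage (not executed here):
#   arrays_on_disk /dev/sdb
```

An empty result indicates that no array lists a partition of that disk any longer.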

Step 5: Contact support and power off the server 

  1. Contact technical support: 
    Inform them that you have completed all the above steps and the server is ready for maintenance. 
  2. Power off the server: 
    $ sudo shutdown -h now 
    Technical support will replace the faulty drive with a new one and power the server back on. 

Step 6: Add the new disk 

  1. Power on the server if it is not already powered on. 
  2. Identify the good disk (e.g., /dev/sda) and the new disk (e.g., /dev/sdb), then copy the partition table from the good disk to the new disk:
    $ sfdisk -d /dev/sda | sfdisk /dev/sdb
    Replace "/dev/sda" with the good disk's device name and "/dev/sdb" with the new disk's device name. Double-check the order before running the command, because it overwrites the partition table of the second device.
  3. Verify the partition table on the new disk matches the other disks in the array (see Fig. 6): 
    $ fdisk -l

    Fig. 6. Checking the partition table of the new drive to ensure it matches the other disks.

    Confirm that the partition layout of the new disk is the same as that of the other disks in the array.
  4. Add the new disk to the RAID array (see Fig. 7)
    $ mdadm --manage /dev/md0 --add /dev/sdb2 

    Fig. 7. Adding the new disk to the RAID array. 

    Replace "/dev/sdb2" with the new disk's partition name.
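
Before adding the new disk to the array, the partition layouts can be compared mechanically instead of by eye. The following is a minimal sketch under stated assumptions: layout_signature is a hypothetical helper, and it expects the text format printed by sfdisk -d, in which each partition line carries start=, size=, and type= fields. Device names and identifiers are ignored, since they legitimately differ between disks.

```shell
# Reduce an "sfdisk -d" dump (read from stdin) to one line per partition
# containing only start, size and type. Hypothetical helper.
layout_signature() {
    awk -F'[:,]' '
        # Partition lines look like:
        # "/dev/sda1 : start= 2048, size= 1048576, type=fd, bootable"
        /^\/dev\// {
            sig = ""
            for (i = 2; i <= NF; i++) {
                f = $i
                gsub(/[ \t]/, "", f)
                if (f ~ /^(start|size|type)=/)
                    sig = (sig == "" ? f : sig " " f)
            }
            print sig
        }
    '
}

# Usage (not executed here):
#   good=$(sudo sfdisk -d /dev/sda | layout_signature)
#   new=$(sudo sfdisk -d /dev/sdb | layout_signature)
#   [ "$good" = "$new" ] && echo "layouts match" || echo "layouts differ"
```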

Step 7: Synchronize the RAID array

  1. Check the synchronization status: 
    $ cat /proc/mdstat 

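    Synchronization starts automatically once the new disk has been added. Monitor its progress until the array reports all members as active, for example [UU]. This may take some time, depending on the size of the disks and the RAID level.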
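
While the array rebuilds, /proc/mdstat shows a progress line for it. The sketch below extracts the percentage and the estimated time remaining. It is illustrative only: rebuild_progress is a hypothetical name, and the parsing assumes a progress line such as "recovery =  8.5% (...) finish=75.2min speed=...".

```shell
# Print "<array> <percent> <finish=...>" for every array that is rebuilding.
# Hypothetical helper; reads /proc/mdstat unless another file is given.
rebuild_progress() {
    mdstat_file="${1:-/proc/mdstat}"
    awk '
        /^md/ { name = $1 }
        /(recovery|resync) *=/ {
            pct = ""; eta = ""
            for (i = 1; i <= NF; i++) {
                # The percentage may be glued to "=" (e.g. "=100.0%")
                if ($i ~ /%$/) { pct = $i; sub(/^=/, "", pct) }
                if ($i ~ /^finish=/) eta = $i
            }
            if (pct != "") print name, pct, eta
        }
    ' "$mdstat_file"
}

# Usage (not executed here):
#   rebuild_progress
#   watch -n 5 cat /proc/mdstat     # refresh the raw status every 5 seconds
```

For scripts that only need to block until the rebuild is over, mdadm --wait /dev/md0 is an alternative to polling.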

Troubleshooting 

If you encounter any issues during these steps, consult the mdadm and smartctl manual pages (man mdadm, man smartctl) for more detailed information, or contact technical support.

Summary 

By following these steps, you can replace a failed disk and restore your RAID array to full redundancy with minimal downtime and without data loss.