How to use "Rescue mode" to troubleshoot your server

A guide for troubleshooting and repairing your server using "Rescue mode".

"Rescue mode" is a valuable tool for troubleshooting your server if it becomes unavailable, the file system is corrupted, or you must perform system-level tasks on unmounted partitions. It allows you to boot into a minimal environment to repair or inspect your system. Here's a detailed guide on how to use rescue mode on Cherry Servers to troubleshoot your server.

Key Points to Consider:

  • "Rescue mode" is beneficial when your server’s primary operating system isn’t working correctly and you need to recover from issues like file system corruption, failed migrations, or boot problems.
  • This guide covers Cherry Servers' bare metal servers.
  • Rescue mode allows you to perform tasks such as file system checks (fsck), reinstalling the boot-loader, or backing up data without the operating system running.

Additional use cases for "Rescue mode":

  • File system repair: If your server fails to boot due to file system corruption, you can use tools like fsck to check and repair the disk.
  • Data recovery: "Rescue mode" allows you to mount the file system and copy important data to another location.
  • Partition management: Modify, resize, or recreate partitions when the operating system is not running.
  • Boot-loader fixing: In case of a boot-loader failure, you can reinstall GRUB or other boot-loaders in "Rescue mode".

If you want to reset the root password specifically, please refer to this guide, which walks through the steps of resetting the root password in "Rescue mode".

Step-by-Step Instructions

Step 1: Access the Cherry Servers client portal

  1. Log in to your Cherry Servers account using your credentials.
  2. Navigate to the server you need to troubleshoot.

Step 2: Activate "Rescue mode"

  1. Select the server you need to work on.
  2. Click on "Actions" in the top-right corner of the client portal (see Fig. 1).

    Fig. 1. Click the "Actions" button in the Cherry Servers client portal.
  3. From the drop-down menu, select "Enter rescue mode" (see Fig. 2).

    Fig. 2. Select the "Enter rescue mode" option.
  4. A pop-up window will appear to generate a temporary password. This password allows you to access your server while it’s booted into rescue mode (see Fig. 3).

    Fig. 3. Create a temporary password for "Rescue mode" access.
  5. Click "Enter rescue mode" and wait a few minutes while the server reboots into rescue mode (see Fig. 4).

    Fig. 4. Wait for the server to boot into "Rescue mode".
  6. Once the server is successfully in "Rescue mode", you’ll see a confirmation message in the client portal (see Fig. 5).

    Fig. 5. Server booted into "Rescue mode".

Step 3: Connect to your server in "Rescue mode"

  1. Use an SSH client to connect to your server. The IP address will be displayed in the Cherry Servers client portal.
  2. Use the "root" username and the password you generated earlier. Ensure you have the password saved, as it will not be displayed again for security reasons.
    • If you encounter issues connecting via SSH, check out the SSH tutorial.
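
As a minimal sketch of the connection step, the function below wraps the ssh call. The function name rescue_ssh and the address 203.0.113.10 are placeholders, not part of the client portal. It also assumes that the rescue environment may present a different SSH host key than your installed system, in which case your SSH client would warn that the host identification has changed; the two -o options keep the temporary rescue key from being checked against or saved to your known_hosts file.

```shell
# Connect to a server in rescue mode as root.
# Assumption: the rescue environment may present a different host key,
# so the key is neither checked against nor saved to ~/.ssh/known_hosts.
rescue_ssh() {
    ssh -o UserKnownHostsFile=/dev/null \
        -o StrictHostKeyChecking=no \
        "root@$1"
}

# Usage (203.0.113.10 is a placeholder; use the IP shown in the client portal):
#   rescue_ssh 203.0.113.10
```

Skipping host key verification is a trade-off that only makes sense for a short-lived rescue session; for normal access, keep the default checks.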

Step 4: Mount the filesystem (if needed)

Once logged in, the server’s primary drive won’t be automatically mounted. Depending on the task, you may need to mount the drive manually.

  1. Identify the root partition with the following command:
    $ lsblk
    The output will show available devices, such as /dev/sda1, /dev/nvme0n1p1, or /dev/md127 if you're using RAID (see Fig. 6).

    Fig. 6. Output showing available partitions.
  2. To mount the root partition, use:
    $ mount /dev/md127 /mnt
    Replace /dev/md127 with the root partition or RAID device you identified in the previous step.
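
If the lsblk output is long, a small filter can narrow it down to devices that carry a mountable Linux filesystem. The sketch below is illustrative only: the helper name list_fs_candidates is hypothetical, and it assumes plain partitions or md RAID devices whose path is /dev/ followed by the lsblk name (LVM volumes live under /dev/mapper and would need a different prefix).

```shell
# Print device paths whose filesystem type suggests a mountable Linux filesystem.
# Reads "NAME FSTYPE" pairs on stdin, as produced by: lsblk -rno NAME,FSTYPE
# sort -u removes duplicates, since a RAID device can be listed once per member disk.
list_fs_candidates() {
    awk '$2 ~ /^(ext[234]|xfs|btrfs)$/ { print "/dev/" $1 }' | sort -u
}

# Usage on the rescue system:
#   lsblk -rno NAME,FSTYPE | list_fs_candidates
```

The result is only a list of candidates; confirm which one is the root partition (for example by its size) before mounting it.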

Step 5: Perform troubleshooting tasks

Once the filesystem is mounted, the next step is to chroot into it so that you can operate as if you were working directly on your server’s root filesystem. This allows you to run commands as if the system were booted normally.

Chroot into the mounted filesystem:

After mounting the root filesystem in Step 4, follow these commands to prepare for troubleshooting:

  1. Mount necessary filesystems for chroot: Before chrooting, you need to mount several additional directories to ensure that everything works inside the chroot environment:
    $ mount --bind /dev /mnt/dev
    $ mount --bind /proc /mnt/proc
    $ mount --bind /sys /mnt/sys
  2. Chroot into the mounted filesystem: Now, change the root directory to the mounted system:
    $ chroot /mnt
    Once inside the chroot environment, you can perform the troubleshooting tasks below that rely on your server's installed system, such as examining logs or reinstalling GRUB.
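
The bind mounts and the chroot call above can be combined into one helper, sketched below. The function name enter_chroot is hypothetical; the commands it runs are the same ones listed in the steps, and it stops early if any bind mount fails so that you do not enter a half-prepared chroot.

```shell
# Bind-mount the virtual filesystems a chroot needs, then enter it.
# $1 is the mount point of the root filesystem (the guide uses /mnt).
enter_chroot() {
    target=$1
    for fs in dev proc sys; do
        mount --bind "/$fs" "$target/$fs" || return 1
    done
    chroot "$target"
}

# Usage:
#   enter_chroot /mnt
```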

File system check (fsck):

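If you're troubleshooting file system corruption, use the fsck tool to check and repair the file system. Run fsck only on an unmounted file system, because repairing a mounted file system can cause further damage: either run it from the rescue environment before mounting the partition in Step 4, or exit the chroot and unmount everything under /mnt first. Then run the following command to check your root partition or RAID device: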

$ fsck /dev/md127

Replace /dev/md127 with the correct device for your filesystem. For non-RAID setups, it might be something like /dev/sda1. If there are errors, fsck will attempt to repair them.
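
Before letting fsck repair anything, it can help to confirm that the device is not mounted and to run a read-only pass first. The following is a minimal sketch under stated assumptions: the helper names is_mounted and check_fs are hypothetical, the device is assumed to appear in /proc/mounts under the same path you pass in, and the -n option is assumed to be honored by the filesystem's checker (it is for ext2/3/4 via e2fsck; other checkers may behave differently).

```shell
# Succeed if the device in $1 appears as a mount source in the mounts table.
# $2 is the mounts table to read (defaults to /proc/mounts).
is_mounted() {
    awk -v dev="$1" '$1 == dev { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Run a read-only check (-n asks the checker to make no changes),
# and refuse to touch a device that is still mounted.
check_fs() {
    if is_mounted "$1"; then
        echo "$1 is mounted; unmount it before running fsck" >&2
        return 1
    fi
    fsck -n "$1"
}

# Usage:
#   check_fs /dev/md127
```

If the read-only pass reports errors, rerun fsck without -n to repair them.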

Backup data:

If you need to back up important data before performing repairs, use the scp command to copy files to another machine or external storage. Run it from the rescue environment rather than from inside the chroot, so that your server's files are reachable under /mnt:

$ scp /mnt/path_to_file user@remote:/path_to_backup/
  • Replace /mnt/path_to_file with the path to the file you want to back up.
  • Replace user@remote:/path_to_backup/ with your remote server’s username, IP, and destination path.

You can back up entire directories (add the -r option to scp) or critical configuration files to prevent data loss during troubleshooting.
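
For whole directories, one option is to pack them into a single tarball first and copy that file. The sketch below assumes the rescue environment has enough free space for the archive; the helper name archive_dir and all paths are placeholders. Unlike a plain recursive copy, a tar archive records file ownership and permissions, which is useful if you later restore the files.

```shell
# Pack a directory into a compressed tarball.
# $1: directory to back up (e.g. /mnt/etc), $2: tarball to create.
archive_dir() {
    tar -czf "$2" -C "$(dirname "$1")" "$(basename "$1")"
}

# Usage from the rescue environment (paths and remote host are placeholders):
#   archive_dir /mnt/etc /tmp/etc-backup.tar.gz
#   scp /tmp/etc-backup.tar.gz user@remote:/path_to_backup/
```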

Examine system logs:

System logs can provide crucial information to diagnose the problem. You can access logs from within the chroot environment by running:

$ less /var/log/syslog
  • Replace syslog with the log file you need (e.g., dmesg, auth.log, or messages). The files available depend on your distribution.
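
Large logs are easier to scan with a filter. As a rough sketch, the helper below prints the most recent lines that mention common failure keywords; the name recent_errors and the keyword list are assumptions, so adjust the pattern to the problem you are chasing.

```shell
# Print the last N lines of a log that mention common failure keywords.
# $1: log file, $2: number of matching lines to show (default 20).
recent_errors() {
    grep -iE 'error|fail|panic' "$1" | tail -n "${2:-20}"
}

# Usage inside the chroot:
#   recent_errors /var/log/syslog 50
```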

Reinstall GRUB boot-loader:

If the issue is related to the boot-loader, such as a corrupted GRUB installation, you can reinstall GRUB from within the chroot environment:

  1. Install GRUB on the appropriate device(s): For RAID or any multi-drive setup, install GRUB on each physical drive:
    $ grub-install /dev/sda
    $ grub-install /dev/sdb
    Replace /dev/sda and /dev/sdb with the correct physical devices for your server. If you are not using RAID, only install GRUB on the primary boot drive.
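  2. Update GRUB configuration: After installing GRUB, regenerate the configuration to apply the changes:
    $ update-grub
    update-grub is available on Debian and Ubuntu. On RHEL-based distributions, the equivalent is typically grub2-mkconfig -o /boot/grub2/grub.cfg (and grub2-install in place of grub-install).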
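
The grub-install commands above target a disk device, which is the form used for legacy BIOS booting. On UEFI systems, grub-install is usually invoked with --target=x86_64-efi and --efi-directory pointing at the mounted EFI system partition instead. The sketch below shows one way to check the boot mode; the helper name firmware_mode is hypothetical, and note the assumption that the rescue environment boots in the same mode as your installed system, which may not hold, since /sys inside the chroot reflects the rescue kernel.

```shell
# Report whether the running kernel was booted via UEFI or legacy BIOS.
# $1: sysfs firmware directory to inspect (defaults to /sys/firmware).
firmware_mode() {
    if [ -d "${1:-/sys/firmware}/efi" ]; then
        echo uefi
    else
        echo bios
    fi
}

# Usage:
#   firmware_mode
```

An /boot/efi entry in the installed system's /etc/fstab is another hint that the server boots via UEFI.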

Exit the chroot environment

Once you have completed your troubleshooting tasks, exit the chroot environment:

$ exit

Unmount filesystems

After exiting chroot, unmount the filesystems that were mounted for chroot, followed by the root filesystem itself:

$ umount /mnt/dev
$ umount /mnt/proc
$ umount /mnt/sys
$ umount /mnt
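
The cleanup can also be wrapped in a helper that unmounts in reverse order and stops at the first failure, for example when a shell is still using a directory under the mount point. This is a sketch only; the function name leave_rescue_mounts is hypothetical.

```shell
# Unmount the chroot helper mounts, then the root filesystem itself.
# $1 is the mount point of the root filesystem (the guide uses /mnt).
leave_rescue_mounts() {
    target=$1
    for fs in sys proc dev; do
        umount "$target/$fs" || return 1
    done
    umount "$target"
}

# Usage:
#   leave_rescue_mounts /mnt
```

Recent util-linux versions also accept umount -R /mnt, which unmounts a mount point and everything below it in one command.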

Now you can continue with any other steps necessary or proceed to reboot the server once you've finished troubleshooting.

Step 6: Exit "Rescue mode"

Once you have completed your troubleshooting tasks, exit "Rescue mode":

  1. Return to the Cherry Servers client portal and select your server.
  2. Click on "Actions" in the top-right corner and select "Exit rescue mode" (see Fig. 7).

    Fig. 7. Select the "Exit rescue mode" option.
  3. Wait a few minutes while your server reboots into its normal operating mode.

Step 7: Verify

Once the server has rebooted, verify that the issue is resolved and the server functions normally. You can log in via SSH again to ensure everything is back to normal.
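
A few standard commands give a quick picture of the server's state after you log in. The sketch below groups them in a hypothetical helper named post_reboot_check and assumes the installed system uses systemd; on other init systems, replace the systemctl line with the equivalent service check.

```shell
# Print a quick health summary after the server has left rescue mode.
# Assumption: the installed system uses systemd.
post_reboot_check() {
    uptime                            # confirms a recent boot
    df -h /                           # root filesystem is mounted and has free space
    systemctl --failed --no-legend    # lists units that failed to start, if any
}

# Usage on the rebooted server:
#   post_reboot_check
```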

Summary

Rescue mode is a powerful tool for troubleshooting various server issues. Whether you need to fix file system problems, recover data, or reinstall the boot-loader, it provides a safe environment to perform essential repairs.