DevOps Portfolio

Linux Permission Disaster Recovery: When chmod -R 777 / Goes Wrong

Recently, I faced a challenging Linux troubleshooting scenario on an EC2 instance in Amazon Web Services. A colleague accidentally executed the command:

chmod -R 777 /

This command recursively gave every user read, write, and execute permission on every file and directory in the filesystem, which broke the instance in several ways.
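The damage goes beyond "everyone can write everything". With GNU chmod, a numeric mode like 777 also clears special bits on regular files, including the setuid bit that binaries such as /usr/bin/sudo need. The throwaway demo below shows this on a temporary file (never on a real system path). The variable names are illustrative.

```bash
#!/usr/bin/env bash
# Demo only: shows how a numeric chmod 777 wipes the setuid bit.
# Works on a throwaway temp file, never on real system paths.

demo_file="$(mktemp)"

chmod 4755 "$demo_file"                  # rwsr-xr-x: setuid set, like /usr/bin/sudo
before="$(stat -c '%a' "$demo_file")"

chmod 777 "$demo_file"                   # what chmod -R 777 / did to every file
after="$(stat -c '%a' "$demo_file")"

echo "before: $before  after: $after"
rm -f "$demo_file"
```

On a typical Linux system this prints `before: 4755  after: 777`. The setuid bit is gone, and with it the ability of sudo to gain root privileges.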

To complicate matters, the instance did not have the SSM Agent installed, so AWS Systems Manager Session Manager could not be used for remote access once SSH stopped working.

Recovery Approach

  1. Stopped the EC2 instance
  2. Detached the root EBS volume
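  3. Launched a temporary rescue instance in the same Availability Zone (an EBS volume can only attach to instances in its own AZ)
  4. Attached the broken volume as a secondary disk
  5. Mounted its root partition and entered the broken system using chroot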
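Steps 1, 2 and 4 can be scripted with the AWS CLI. This is a minimal sketch, assuming the rescue instance is already running in the same Availability Zone as the broken volume. The instance and volume IDs are placeholders, `move_root_volume_to_rescue` is a hypothetical helper name, and /dev/sdf is an assumed free device name on the rescue instance. The sketch also prints the original root device name, which is needed later to move the volume back.

```bash
#!/usr/bin/env bash
# Sketch of rescue steps 1, 2 and 4 with the AWS CLI.
# Assumptions (hypothetical): IDs are passed in, the rescue instance is in the
# same AZ as the volume, and /dev/sdf is free on the rescue instance.

move_root_volume_to_rescue() {
  local broken_id="$1" volume_id="$2" rescue_id="$3"

  # Record the original root device name (e.g. /dev/sda1) for the reattach later.
  aws ec2 describe-instances --instance-ids "$broken_id" \
    --query 'Reservations[0].Instances[0].RootDeviceName' --output text || return 1

  # 1. Stop (not terminate) the broken instance and wait until it is stopped.
  aws ec2 stop-instances --instance-ids "$broken_id" || return 1
  aws ec2 wait instance-stopped --instance-ids "$broken_id" || return 1

  # 2. Detach its root EBS volume and wait until the volume is free.
  aws ec2 detach-volume --volume-id "$volume_id" || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # 4. Attach the volume to the rescue instance as a secondary disk.
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$rescue_id" --device /dev/sdf || return 1
  aws ec2 wait volume-in-use --volume-ids "$volume_id"
}
```

Example call with placeholder IDs: `move_root_volume_to_rescue i-0aaaaaaaaaaaaaaaa vol-0bbbbbbbbbbbbbbbb i-0cccccccccccccccc`. The `wait` subcommands keep each step from starting before the previous state change has finished.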

Recovery Commands

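sudo mkdir -p /mnt/recovery
sudo mount /dev/xvdf1 /mnt/recovery   # on Nitro instances the partition may appear as /dev/nvme1n1p1; check with lsblk
sudo chroot /mnt/recovery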
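A slightly fuller version of the mount step also bind-mounts /dev, /proc and /sys so that tools inside the chroot behave normally. This sketch assumes it runs as root on the rescue instance. The device path is an assumption to confirm with `lsblk -f`, because Nitro-based instances expose EBS volumes as NVMe devices regardless of the /dev/sdf name used when attaching. `enter_broken_root` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: mount the broken root volume on the rescue instance and chroot into it.
# Run as root. DEVICE is an assumption: confirm it with `lsblk -f` first.

enter_broken_root() {
  local device="$1" target="${2:-/mnt/recovery}"

  mkdir -p "$target" || return 1
  mount "$device" "$target" || return 1

  # Bind-mount kernel filesystems so tools such as passwd work inside the chroot.
  local fs
  for fs in dev proc sys; do
    mount --bind "/$fs" "$target/$fs" || return 1
  done

  # Open a root shell inside the broken system.
  chroot "$target" /bin/bash
}
```

Example: `enter_broken_root /dev/nvme1n1p1`. Leaving the shell with `exit` returns to the rescue instance. The bind mounts stay in place until they are unmounted.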

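Inside the chroot I was already root. I reset the ubuntu user's password so the account could log in through the EC2 Serial Console, which uses a password prompt, and fixed the permissions that SSH key authentication depends on: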

Permission Fixes

passwd ubuntu                                  # no sudo: already root in the chroot, and sudo rejects a world-writable /etc/sudoers
chmod 755 /home/
chmod 750 /home/ubuntu/
chmod 700 /home/ubuntu/.ssh
chmod 600 /home/ubuntu/.ssh/authorized_keys
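The same fixes can also be applied from the rescue instance without entering the chroot, by prefixing each path with the mount point. The sketch below goes a little further than the commands above and also restores a few other modes that a numeric chmod 777 breaks: SSH host private keys, the setuid bit on sudo, /etc/sudoers, and the sticky bit on /tmp and /var/tmp. That extra list is an illustrative assumption based on stock Ubuntu defaults, not a complete restore, and `fix_critical_modes` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: reapply sane modes to a few critical paths on the mounted broken root.
# ROOT is the mount point (e.g. /mnt/recovery); USER_NAME defaults to ubuntu.
# Partial, illustrative list only; many other system files stay at 777.

fix_critical_modes() {
  local root="$1" user_name="${2:-ubuntu}"
  local home="$root/home/$user_name" path

  # SSH key login: same modes as the manual fix.
  chmod 755 "$root/home" || return 1
  chmod 750 "$home" || return 1
  chmod 700 "$home/.ssh" || return 1
  chmod 600 "$home/.ssh/authorized_keys" || return 1

  # sshd ignores host private keys that other users can access.
  for path in "$root"/etc/ssh/ssh_host_*_key; do
    if [ -e "$path" ]; then chmod 600 "$path" || return 1; fi
  done

  # Special bits and strict modes that a numeric chmod 777 wiped out.
  if [ -e "$root/usr/bin/sudo" ]; then chmod 4755 "$root/usr/bin/sudo" || return 1; fi
  if [ -e "$root/etc/sudoers" ]; then chmod 440 "$root/etc/sudoers" || return 1; fi
  chmod 1777 "$root/tmp" "$root/var/tmp"
}
```

Example: `fix_critical_modes /mnt/recovery ubuntu`, run as root on the rescue instance. Because this only covers a handful of paths, restoring the volume from a snapshot or rebuilding the instance is often the safer long-term fix.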

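After correcting the permissions, I exited the chroot and unmounted the volume. I then moved it back to the original instance as its root device and started the instance. I was able to log in again via SSH and through the EC2 Serial Console.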
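The hand-back can be scripted the same way. This sketch makes several assumptions: it runs as root on the rescue instance after leaving the chroot, the rescue instance has credentials allowing these EC2 calls, the IDs are placeholders, and the original root device name was recorded before the volume was detached (many Ubuntu AMIs use /dev/sda1). `return_root_volume` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: hand the repaired volume back to the original instance and boot it.
# Assumptions (hypothetical): placeholder IDs; ROOT_DEVICE is the name recorded
# before detaching (e.g. /dev/sda1); the target is the earlier mount point.

return_root_volume() {
  local volume_id="$1" broken_id="$2" root_device="$3" target="${4:-/mnt/recovery}"

  # -R unmounts the bind mounts first, then the volume itself.
  umount -R "$target" || return 1

  aws ec2 detach-volume --volume-id "$volume_id" || return 1
  aws ec2 wait volume-available --volume-ids "$volume_id" || return 1

  # The instance cannot start without a volume at its original root device name.
  aws ec2 attach-volume --volume-id "$volume_id" \
    --instance-id "$broken_id" --device "$root_device" || return 1
  aws ec2 wait volume-in-use --volume-ids "$volume_id" || return 1

  aws ec2 start-instances --instance-ids "$broken_id" || return 1
  aws ec2 wait instance-running --instance-ids "$broken_id"
}
```

Example: `return_root_volume vol-0bbbbbbbbbbbbbbbb i-0aaaaaaaaaaaaaaaa /dev/sda1`.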

Lessons Learned

⚠️ Never run recursive permission changes on /

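The root filesystem holds binaries, configuration files, and keys whose exact modes matter. A numeric chmod 777 also removes special bits, such as the setuid bit on /usr/bin/sudo and the sticky bit on /tmp. Tools like sshd and sudo deliberately refuse to trust files with overly permissive modes.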
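Before choosing between spot fixes and a rebuild, it helps to measure how far the damage reaches. This sketch counts world-writable entries on a mounted root filesystem and lists those that lack the sticky bit. The mount point and the function names are assumptions for illustration.

```bash
#!/usr/bin/env bash
# Sketch: gauge how much of a mounted root filesystem is world-writable.
# ROOT is an assumption (e.g. /mnt/recovery). Symlinks are skipped because their
# own mode is always 777 and Linux ignores it for access checks.

count_world_writable() {
  local root="$1"
  # -xdev stays on this filesystem; -perm -0002 matches the "others can write" bit.
  find "$root" -xdev ! -type l -perm -0002 -print | wc -l
}

# On a healthy system only a few directories such as /tmp should be
# world-writable, and those should also carry the sticky bit (mode 1777).
list_world_writable_without_sticky() {
  local root="$1"
  find "$root" -xdev ! -type l -perm -0002 ! -perm -1000 -print
}
```

Example: `count_world_writable /mnt/recovery`. After `chmod -R 777 /`, expect a count close to the total number of files on the volume.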

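🔧 Always install and register the SSM Agent for emergency access

Session Manager provides shell access that does not depend on SSH keys or sshd. It only works if the agent is running, the instance has an IAM instance profile with the AmazonSSMManagedInstanceCore policy (or equivalent permissions), and the instance can reach the Systems Manager endpoints.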
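You can verify ahead of time that this backup path works, from any machine with AWS CLI credentials. This is a minimal sketch. `ssm_ready` is a hypothetical helper name, and the instance ID in the example is a placeholder.

```bash
#!/usr/bin/env bash
# Sketch: confirm an instance is reachable through Session Manager before you need it.
# Requires AWS CLI credentials allowing ssm:DescribeInstanceInformation.

ssm_ready() {
  local instance_id="$1" status
  status="$(aws ssm describe-instance-information \
    --filters "Key=InstanceIds,Values=$instance_id" \
    --query 'InstanceInformationList[0].PingStatus' --output text)" || return 1
  # "Online" means the agent is running and registered; "None" means SSM does
  # not know the instance at all.
  [ "$status" = "Online" ]
}
```

Example: `ssm_ready i-0aaaaaaaaaaaaaaaa && aws ssm start-session --target i-0aaaaaaaaaaaaaaaa`. Starting a session from a workstation also needs the Session Manager plugin for the AWS CLI.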

🔒 Linux file permissions are critical for system stability

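Permissions are checked along whole paths, not just on individual files. For example, with OpenSSH's default StrictModes yes, sshd refuses public-key login when the home directory, ~/.ssh, or authorized_keys is writable by group or others, which is exactly what 777 produces.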
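The StrictModes mode check can be approximated with find and stat. This sketch only looks at the group-write and world-write bits on the home directory, ~/.ssh and authorized_keys. sshd also checks ownership and parent directories, which this sketch skips. `ssh_modes_ok` is a hypothetical helper name.

```bash
#!/usr/bin/env bash
# Sketch: approximate the StrictModes mode check sshd applies to a user's key files.
# HOME_DIR is an assumption (e.g. /home/ubuntu). Ownership checks are not covered.

ssh_modes_ok() {
  local home_dir="$1" path bad=0
  for path in "$home_dir" "$home_dir/.ssh" "$home_dir/.ssh/authorized_keys"; do
    if [ ! -e "$path" ]; then
      echo "missing: $path"
      bad=1
      continue
    fi
    # -perm /022 matches if the group-write OR other-write bit is set.
    if [ -n "$(find "$path" -maxdepth 0 -perm /022)" ]; then
      echo "too permissive: $path ($(stat -c %a "$path"))"
      bad=1
    fi
  done
  return "$bad"
}
```

Example: `ssh_modes_ok /mnt/recovery/home/ubuntu` on the rescue instance. It prints each offending path and returns non-zero if any check fails.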

💡 DevOps Reality Check

DevOps sometimes involves not just automation but also deep troubleshooting under pressure. This incident highlights the importance of understanding Linux fundamentals and having recovery strategies in place before disaster strikes.

#Linux #AWS #Troubleshooting #DevOps #SRE #EC2