|
| 1 | +--- |
| 2 | +title: "Disaster Recovery Guide: My Approach to Safeguarding Critical Data" |
| 3 | +date: 2025-03-01 10:00:00 +0200 |
| 4 | +categories: [infrastructure] |
| 5 | +tags: [homelab, backup, disaster-recovery, proxmox, synology, hetzner, pbs, security] |
| 6 | +description: A structured disaster recovery plan for homelab enthusiasts and businesses. |
| 7 | +image: |
| 8 | + path: /assets/img/posts/disaster-recovery-banner.webp |
| 9 | + alt: Disaster Recovery Guide for Homelab and Business |
| 10 | +--- |
| 11 | + |
| 12 | +Data loss can be catastrophic, whether you run a homelab or manage IT for an organization of any size. This guide walks through my personal disaster recovery strategy: the backup chain, the restore procedures, and the security measures that protect both. |
| 13 | + |
| 14 | +## Common Backup Mistakes and Ransomware Risks |
| 15 | + |
| 16 | +> Even with a structured backup plan, common mistakes can make backups ineffective when disaster strikes. |
| 17 | +{: .prompt-warning } |
| 18 | + |
| 19 | +### ❌ Frequent Backup Mistakes |
| 20 | + |
| 21 | +- **Lack of Backup Testing** – A backup is useless if you've never tested restoring from it. |
| 22 | +- **Storing Backups on the Same System** – Backups stored on the same machine or network are vulnerable to failures, ransomware, and accidental deletion. |
| 23 | +- **Unencrypted Backups** – Without encryption, your backups can be easily compromised. |
| 24 | +- **Overwriting Previous Backups** – Without versioning, ransomware or file corruption can render all backup copies useless. |
| 25 | +- **No Offsite Backup** – Only keeping local copies increases risk in case of fire, theft, or natural disasters. |
| 26 | + |
| 27 | +### 🔥 Ransomware & Backup Protection |
| 28 | + |
| 29 | +> Modern ransomware attacks actively seek and encrypt backup files, making recovery impossible unless preventive measures are in place. |
| 30 | +{: .prompt-danger } |
| 31 | + |
| 32 | +To mitigate these threats: |
| 33 | + |
| 34 | +- **🛡 Implement Backup Protections** |
| 35 | + - PBS allows **marking snapshots as 'protected'**, preventing accidental deletion. |
| 36 | + - **Note:** Users with sufficient permissions can still remove this protection. |
| 37 | + - More details: [Proxmox Forum Discussion](https://forum.proxmox.com/threads/immutable-backups.107332/) |
| 38 | +- **🔌 Air-Gapped Backups** – Maintain at least one backup that is offline or isolated from the network. |
| 39 | +- **🔑 Enable Multi-Factor Authentication (MFA)** – Restrict access to backup systems to prevent unauthorized tampering. |
| 40 | +- **📊 Set Up Alerts for Backup Failures** – Ensure you're notified immediately when a backup job fails, so you can take action. |
| 41 | + |
| 42 | +> By addressing these risks, you can ensure your backups remain resilient against both accidental failures and cyber threats. |
| 43 | +{: .prompt-tip } |
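The alerting item above can be approximated with a freshness check on a backup log. This is a minimal sketch, assuming GNU `stat` and `date`. The function name `backup_is_stale` and the 192-hour threshold (eight days, roughly one missed weekly run) are illustrative assumptions, not part of the setup described in this post.

```shell
#!/bin/bash
# Sketch only: backup_is_stale is a hypothetical helper, not a PBS feature.
# Succeeds (exit 0) when FILE is missing or was last modified
# MAX_AGE_HOURS or more hours ago.
backup_is_stale() {
    local file="$1" max_age_hours="$2"
    local now mtime age_hours

    # A missing log means the job never ran, so treat it as stale.
    [[ -f "$file" ]] || return 0

    now=$(date +%s)
    mtime=$(stat -c %Y "$file")          # GNU stat: mtime in epoch seconds
    age_hours=$(( (now - mtime) / 3600 ))

    (( age_hours >= max_age_hours ))
}
```

A cron job could call `backup_is_stale /var/log/rsync_backup.log 192` and send a notification through whatever channel you already use when it succeeds. A stale log only shows that the job stopped running, so it complements a check of the job's exit status rather than replacing it.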
| 44 | + |
| 45 | +## My Infrastructure Stack |
| 46 | + |
| 47 | +### Core Components |
| 48 | + |
| 49 | +- **Primary Storage**: Synology DS223 with 2x2TB drives in RAID1 configuration |
| 50 | + |
| 51 | +{: width="600" height="400" } |
| 52 | +_Synology DS223 NAS setup_ |
| 53 | + |
| 54 | +- **Backup Server**: Proxmox Backup Server (PBS) running with 4x Intel D3-S4510 SSDs in RAIDz2 |
| 55 | + |
| 56 | +{: width="600" height="400" } |
| 57 | +_Proxmox Backup Server configuration_ |
| 58 | + |
| 59 | +- **Offsite Storage**: Hetzner VPS with attached StorageBox for geographical redundancy |
| 60 | + |
| 61 | +{: width="600" height="400" } |
| 62 | +_Hetzner StorageBox setup_ |
| 63 | + |
| 64 | +- **Secure Key Management**: Local KeyPass on iOS/macOS containing encryption keys |
| 65 | + |
| 66 | +### The 3-2-1 Backup Strategy in Action |
| 67 | + |
| 68 | +I've implemented the widely recommended 3-2-1 backup approach: |
| 69 | +- **3** copies of data (original + 2 backups) |
| 70 | +- **2** different storage types (local NAS and cloud storage) |
| 71 | +- **1** offsite copy (Hetzner StorageBox) |
| 72 | + |
| 73 | +### Backup Flow & Schedule |
| 74 | + |
| 75 | +My automated backup chain ensures data flows through the system with minimal intervention: |
| 76 | + |
| 77 | +1. **VM/LXC → PBS**: Every Saturday at 02:00 |
| 78 | + - Primary backup of all virtual machines and containers |
| 79 | + - Encrypted at rest for security |
| 80 | + |
| 81 | +{: width="700" height="400" } |
| 82 | +_Automated backup schedule configuration_ |
| 83 | + |
| 84 | +2. **PBS → Synology**: Every Saturday at 06:00 |
| 85 | + - Secondary local copy using rsync and crontab |
| 86 | + - RAID1 protection against single drive failure |
| 87 | + |
| 88 | +3. **Synology → Hetzner**: Every Saturday at 08:00 |
| 89 | + - Offsite copy for geographic redundancy |
| 90 | + - Protection against local disasters (fire, theft, etc.) |
| 91 | + |
| 92 | +## Implementation Details |
| 93 | + |
| 94 | +### Critical Scripts for Backup Automation |
| 95 | + |
| 96 | +#### PBS to Synology Rsync (Running on PBS Server) |
| 97 | + |
| 98 | +```bash |
| 99 | +0 6 * * 6 rsync -av --delete --progress /mnt/datastore/ /mnt/hyperbackup/ >> /var/log/rsync_backup.log 2>&1 |
| 100 | +``` |
| 101 | + |
| 102 | +The client scripts in the next two sections are deployed as described in [Scenario 2](#scenario-2-hetzner-vpsstoragebox-failure). |
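Because `rsync --delete` mirrors deletions, running it against an unmounted (and therefore empty) source directory would empty the copy on the NAS. Running it against an unmounted destination would fill the local disk instead. The sketch below guards against both cases. It assumes `mountpoint` from util-linux is available and uses the `.chunks` directory as a marker that the datastore is really there. All three function names are hypothetical.

```shell
#!/bin/bash
# Sketch only: guard functions for the rsync cron job. Names are hypothetical.

# Succeeds when SRC contains the .chunks directory of a PBS datastore.
source_looks_like_datastore() {
    [[ -d "$1/.chunks" ]]
}

# Succeeds when DST is an active mount point (mountpoint is from util-linux).
dest_is_mounted() {
    mountpoint -q "$1"
}

# Mirrors SRC to DST only when both guards pass; otherwise logs and returns 1.
guarded_sync() {
    local src="$1" dst="$2"

    if ! source_looks_like_datastore "$src"; then
        echo "skip: $src has no .chunks directory" >&2
        return 1
    fi
    if ! dest_is_mounted "$dst"; then
        echo "skip: $dst is not a mount point" >&2
        return 1
    fi

    rsync -a --delete "$src/" "$dst/"
}
```

The cron job could then call `guarded_sync /mnt/datastore /mnt/hyperbackup` in place of the bare `rsync` command, so a failed mount produces a log line instead of a destructive sync.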
| 103 | + |
| 104 | +#### Proxmox Backup Client Script (backup-pbs.sh) |
| 105 | + |
| 106 | +```bash |
| 107 | +#!/bin/bash |
| 108 | + |
| 109 | +# 1) Export token secret as "PBS_PASSWORD" |
| 110 | +export PBS_PASSWORD='token-secret-from-PBS' |
| 111 | + |
| 112 | +# 2) API token ID, in the form user@realm!tokenname |
| 113 | +export PBS_USER_STRING='token-id-from-PBS' |
| 114 | + |
| 115 | +# 3) PBS IP/hostname |
| 116 | +export PBS_SERVER='PBS-IP' |
| 117 | + |
| 118 | +# 4) Datastore name |
| 119 | +export PBS_DATASTORE='DATASTORE_PBS' |
| 120 | + |
| 121 | +# 5) Build complete repository |
| 122 | +export PBS_REPOSITORY="${PBS_USER_STRING}@${PBS_SERVER}:${PBS_DATASTORE}" |
| 123 | + |
| 124 | +# 6) Get local server shortname |
| 125 | +export PBS_HOSTNAME="$(hostname -s)" |
| 126 | + |
| 127 | +# 7) ENCRYPTION KEY |
| 128 | +export PBS_KEYFILE='/root/pbscloud_key.json' |
| 129 | + |
| 130 | +echo "Run pbs backup for $PBS_HOSTNAME ..." |
| 131 | + |
| 132 | +proxmox-backup-client backup \ |
| 133 | + srv.pxar:/srv \ |
| 134 | + volumes.pxar:/var/lib/docker/volumes \ |
| 135 | + netw.pxar:/var/lib/docker/network \ |
| 136 | + etc.pxar:/etc \ |
| 137 | + scripts.pxar:/usr/local/bin \ |
| 138 | + --keyfile "$PBS_KEYFILE" \ |
| 139 | + --skip-lost-and-found \ |
| 140 | + --repository "$PBS_REPOSITORY" |
| 141 | + |
| 142 | +# List existing backups |
| 143 | +proxmox-backup-client list --repository "${PBS_REPOSITORY}" |
| 144 | + |
| 145 | +echo "Done." |
| 146 | +``` |
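The script above keeps the token secret inline, which is simple but leaves it readable by anyone who can read the script. One alternative is to keep the `PBS_*` values in a separate root-only file and load them at runtime. This is a hedged sketch, assuming GNU `stat`. The helper name `load_pbs_env` and the path `/root/.pbs-env` are hypothetical and not part of my current setup.

```shell
#!/bin/bash
# Sketch only: load_pbs_env is a hypothetical helper.
# Sources ENV_FILE only when it is a regular file with mode 600 or 400,
# and exports every variable the file assigns.
load_pbs_env() {
    local env_file="$1" mode

    if [[ ! -f "$env_file" ]]; then
        echo "missing env file: $env_file" >&2
        return 1
    fi

    mode=$(stat -c %a "$env_file")       # GNU stat: octal mode, e.g. 600
    case "$mode" in
        600|400) ;;
        *)
            echo "refusing $env_file: mode $mode, expected 600 or 400" >&2
            return 1
            ;;
    esac

    set -a                               # auto-export assignments while sourcing
    # shellcheck disable=SC1090
    source "$env_file"
    set +a
}
```

With this in place, the backup script could start with `load_pbs_env /root/.pbs-env || exit 1` and drop the inline `export PBS_PASSWORD=...` lines. The env file would contain plain assignments such as `PBS_PASSWORD='...'`.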
| 147 | + |
| 148 | +#### Proxmox Backup Client Restore Script (backup-pbs-restore.sh) |
| 149 | + |
| 150 | +```bash |
| 151 | +#!/bin/bash |
| 152 | + |
| 153 | +# Global configs |
| 154 | +export PBS_PASSWORD='token-secret-from-PBS' |
| 155 | +export PBS_USER_STRING='token-id-from-PBS' |
| 156 | +export PBS_SERVER='PBS_IP' |
| 157 | +export PBS_DATASTORE='DATASTORE_FROM_PBS' |
| 158 | +export PBS_KEYFILE='/root/pbscloud_key.json' |
| 159 | +export PBS_REPOSITORY="${PBS_USER_STRING}@${PBS_SERVER}:${PBS_DATASTORE}" |
| 160 | + |
| 161 | +# Input parameters |
| 162 | +SNAPSHOT_PATH="$1" |
| 163 | +ARCHIVE_NAME="$2" |
| 164 | +RESTORE_DEST="$3" |
| 165 | + |
| 166 | +# Parameter validation |
| 167 | +if [[ -z "$SNAPSHOT_PATH" || -z "$ARCHIVE_NAME" || -z "$RESTORE_DEST" ]]; then |
| 168 | + echo "Usage: $0 <snapshot_path> <archive_name> <destination>" |
| 169 | + echo "Example: $0 \"host/cloud/2025-01-22T15:19:17Z\" srv.pxar /root/restore-srv" |
| 170 | + exit 1 |
| 171 | +fi |
| 172 | + |
| 173 | +# Create destination if needed |
| 174 | +mkdir -p "$RESTORE_DEST" |
| 175 | + |
| 176 | +# Summary display |
| 177 | +echo "=== PBS Restore ===" |
| 178 | +echo "Snapshot: $SNAPSHOT_PATH" |
| 179 | +echo "Archive: $ARCHIVE_NAME" |
| 180 | +echo "Destination: $RESTORE_DEST" |
| 181 | +echo "Repository: $PBS_REPOSITORY" |
| 182 | +echo "Encryption key: $PBS_KEYFILE" |
| 183 | +echo "=====================" |
| 184 | + |
| 185 | +# Run restore |
| 186 | +proxmox-backup-client restore \ |
| 187 | + "$SNAPSHOT_PATH" \ |
| 188 | + "$ARCHIVE_NAME" \ |
| 189 | + "$RESTORE_DEST" \ |
| 190 | + --repository "$PBS_REPOSITORY" \ |
| 191 | + --keyfile "$PBS_KEYFILE" |
| 192 | + |
| 193 | +EXIT_CODE=$? |
| 194 | + |
| 195 | +if [[ $EXIT_CODE -eq 0 ]]; then |
| 196 | + echo "=== Restore completed successfully! ===" |
| 197 | +else |
| 198 | + echo "Restore error (code $EXIT_CODE)." |
| 199 | +fi |
| 200 | + |
| 201 | +exit $EXIT_CODE |
| 202 | +``` |
| 203 | + |
| 204 | +## Disaster Recovery Scenarios |
| 205 | + |
| 206 | +Having a backup is only half the solution—knowing how to restore is equally critical. Here are my documented procedures for various failure scenarios: |
| 207 | + |
| 208 | +### Scenario 1: Synology NAS Failure |
| 209 | + |
| 210 | +Even if my primary NAS fails, data remains safe in two locations: |
| 211 | +1. Proxmox Backup Server (4x Intel SSDs in RAIDz2) |
| 212 | +2. Hetzner StorageBox (offsite) |
| 213 | + |
| 214 | +**Recovery Steps:** |
| 215 | +1. Replace the failed hardware components |
| 216 | +2. Reconfigure RAID1 on the new or repaired NAS |
| 217 | +3. Restore HyperBackup schedule (targeting Saturday 08:00) |
| 218 | +4. Verify successful completion of first backup cycle |
| 219 | + |
| 220 | +### Scenario 2: Hetzner VPS/StorageBox Failure |
| 221 | + |
| 222 | +If my cloud provider experiences issues: |
| 223 | + |
| 224 | +1. Provision a new VPS with appropriate specifications |
| 225 | +2. Install proxmox-backup-client: |
| 226 | + - For Ubuntu: Follow the [community guide](https://forum.proxmox.com/threads/install-the-backup-client-on-ubuntu-desktop-24-04.146065/) |
| 227 | + - For Debian: Install the `proxmox-backup-client` package with `apt` after adding the Proxmox client repository |
| 228 | +3. Create the encryption key file at `/root/pbscloud_key.json`: |
| 229 | + - Retrieve the key from KeyPass (stored on iOS/macOS) |
| 230 | +4. Deploy backup automation scripts: |
| 231 | + - `backup-pbs.sh` for regular backups |
| 232 | + - `backup-pbs-restore.sh` for potential recoveries |
| 233 | +5. Test both backup and restore functionality to verify operations |
| 234 | +6. Restore the crontab entry for automatic backups: |
| 235 | + |
| 236 | +```bash |
| 237 | +0 2 * * * /usr/local/bin/backup-pbs.sh >> /var/log/backup-cloud.log 2>&1 |
| 238 | +``` |
| 239 | + |
| 240 | +### Scenario 3: PBS Server Failure |
| 241 | + |
| 242 | +In case my primary backup server fails: |
| 243 | + |
| 244 | +1. Download and install the latest PBS ISO |
| 245 | +2. Configure storage properly: |
| 246 | + |
| 247 | +```bash |
| 248 | +# /etc/proxmox-backup/datastore.cfg |
| 249 | +datastore: raidz2 |
| 250 | + comment |
| 251 | + gc-schedule sat 03:30 |
| 252 | + notification-mode notification-system |
| 253 | + path /mnt/datastore |
| 254 | +``` |
| 255 | + |
| 256 | +3. Verify that `/etc/fstab` contains the correct mount point: |
| 257 | + |
| 258 | +```bash |
| 259 | +#raidz2 |
| 260 | +/dev/sdb /mnt/datastore ext4 defaults 0 2 |
| 261 | +``` |
| 262 | + |
| 263 | +> **Note**: In this setup PBS runs as a VM, so `/dev/sdb` is a second virtual disk provisioned from the RAIDz2 pool rather than the pool itself. |
| 264 | +{: .prompt-info } |
| 265 | + |
| 266 | +4. Ensure the datastore has the required structure: |
| 267 | + - `.chunks` |
| 268 | + - `vm` |
| 269 | + - `.gc-status` |
| 270 | + - `ct` |
| 271 | + - `host` |
| 272 | + |
| 273 | +5. Data can be restored from multiple sources: |
| 274 | + - Original RAIDz2 array (if drives survived) |
| 275 | + - Hetzner StorageBox (`/mnt/storagebox/Storage_1`) |
| 276 | + - Synology NAS (`/volume1/Backup/Proxmox/hyperbackup`) |
| 277 | + |
| 278 | +6. Import VM/LXC encryption key from KeyPass into the new PVE environment. |
| 279 | + |
| 280 | +## Security Best Practices |
| 281 | + |
| 282 | +Based on my experience, here are critical security measures for robust disaster recovery: |
| 283 | + |
| 284 | +### Encryption Throughout the Chain |
| 285 | + |
| 286 | +1. **Data-at-Rest Encryption**: All my backups are encrypted using strong keys |
| 287 | +2. **Transport Encryption**: Using secure SSH tunnels for data transfer |
| 288 | +3. **Key Management**: Isolated storage of encryption keys in KeyPass |
| 289 | +4. **Regular Key Rotation**: Changing encryption keys periodically |
| 290 | + |
| 291 | +### Access Control |
| 292 | + |
| 293 | +1. **Principle of Least Privilege**: Backup systems have minimal permissions |
| 294 | +2. **Token-Based Authentication**: Using secure tokens rather than passwords |
| 295 | +3. **Network Segmentation**: Backup systems on separate network segments |
| 296 | +4. **Firewall Rules**: Strict ingress/egress rules for backup traffic |
| 297 | + |
| 298 | +### Critical Files and Keys |
| 299 | + |
| 300 | +Always securely store: |
| 301 | +- Encryption keys in KeyPass (iOS/macOS): |
| 302 | + - `/root/pbscloud_key.json` |
| 303 | + - PBS VM/LXC encryption key |
| 304 | +- PBS Configuration: `/etc/proxmox-backup/datastore.cfg` |
| 305 | +- Backup location references: |
| 306 | + 1. PBS: `/mnt/datastore` |
| 307 | + 2. Synology: `/volume1/Backup/Proxmox/hyperbackup` |
| 308 | + 3. Hetzner: `/mnt/storagebox/Storage_1` |
| 309 | + |
| 310 | +## Continuous Improvement Recommendations |
| 311 | + |
| 312 | +> No backup system stays reliable without ongoing validation and improvement. Here are practices I'm implementing or planning to adopt: |
| 313 | +{: .prompt-tip } |
| 314 | + |
| 315 | +### ✅ Regular Backup Verification |
| 316 | +- **🔄 Monthly integrity checks** on random files |
| 317 | +- **🔍 Checksum validation** to detect bit rot |
| 318 | +- **📊 Log analysis** for backup completion and failures |
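For the plain-file copies (for example the rsync target), checksum validation can be sketched with a SHA-256 manifest. This assumes GNU coreutils and findutils (`sha256sum`, `sort -z`, `xargs -0 -r`). The function names are hypothetical, and the manifest must be stored outside the directory it describes. PBS datastores also have their own built-in verification, which this sketch does not replace.

```shell
#!/bin/bash
# Sketch only: manifest helpers for detecting silent corruption in plain files.

# Writes a SHA-256 manifest of every regular file under DIR to MANIFEST.
# Paths in the manifest are relative to DIR.
write_manifest() {
    local dir="$1" manifest="$2"
    ( cd "$dir" && find . -type f -print0 | sort -z | xargs -0 -r sha256sum ) > "$manifest"
}

# Re-hashes the files under DIR and compares them with MANIFEST.
# Succeeds only when every listed file exists and matches its recorded hash.
verify_manifest() {
    local dir="$1" manifest="$2"
    ( cd "$dir" && sha256sum --quiet -c - ) < "$manifest"
}
```

A monthly job could run `verify_manifest` against the previous manifest and raise an alert on a non-zero exit. Files that changed legitimately since the manifest was written will also fail the check, so this fits data that is expected to stay static between runs.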
| 319 | + |
| 320 | +### 🛠 Automated Recovery Testing |
| 321 | +- **🔄 Quarterly test restores** to verify recoverability |
| 322 | +- **📜 Documented results** with timing measurements |
| 323 | +- **🎯 Improvement targets** based on test results |
| 324 | + |
| 325 | +### 🔔 Monitoring and Alerting |
| 326 | +- **📡 Real-time monitoring** of backup processes |
| 327 | +- **⚠ Alert systems** for backup failures or delays |
| 328 | +- **📉 Storage capacity trend analysis** to prevent space issues |
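As a starting point for capacity monitoring, a simple threshold check can flag a filesystem before it fills up. This is a minimal sketch assuming GNU `df` (for `--output`). The helper name `usage_exceeds` is hypothetical, and it checks only a fixed threshold, not a trend over time.

```shell
#!/bin/bash
# Sketch only: usage_exceeds is a hypothetical helper; assumes GNU df.

# Succeeds when the filesystem holding TARGET is at or above
# THRESHOLD percent used.
usage_exceeds() {
    local target="$1" threshold="$2" pct

    # df prints a header line and then a value such as " 42%";
    # keep the last line and strip everything except digits.
    pct=$(df --output=pcent "$target" | tail -n 1 | tr -dc '0-9')

    (( pct >= threshold ))
}
```

For example, `usage_exceeds /mnt/datastore 85 && echo "datastore above 85%"` could run from cron after the weekly backup. Logging the percentage on each run would give the raw data for a real trend analysis later.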
| 329 | + |
| 330 | +### 📖 Documentation and Training |
| 331 | +- **📑 Keeping recovery documentation updated** |
| 332 | +- **🔄 Regular practice** of recovery procedures |
| 333 | +- **👥 Cross-training** to ensure multiple people can perform recovery |
| 334 | + |
| 335 | +### 🔐 Security Updates |
| 336 | +- **🔄 Regular patching** of backup systems |
| 337 | +- **🛡 Vulnerability scanning** of the backup infrastructure |
| 338 | +- **🔑 Updating encryption standards** as needed |
| 339 | + |
| 340 | +## Conclusion |
| 341 | + |
| 342 | +> Disaster recovery isn't just about having backups—it's about having a **proven, tested strategy** that can be executed confidently when needed. |
| 343 | +{: .prompt-tip } |
| 344 | + |
| 345 | +For **homelabbers and businesses alike**, the approach outlined here provides a **solid foundation** for data protection **without enterprise-level budgets**. |
| 346 | + |
| 347 | +By implementing **proper backup chains, documenting recovery procedures, and regularly testing your systems**, you can achieve **peace of mind** knowing your critical data can survive: |
| 348 | +✔ Hardware failures |
| 349 | +✔ Human errors |
| 350 | +✔ Malicious attacks |
| 351 | + |
| 352 | +**💬 What disaster recovery strategies do you use in your environment?** |
| 353 | +I'd love to hear your thoughts and experiences in the comments below! 🚀 |
| 354 | + |
| 355 | +--- |
| 356 | + |
| 357 | +*Disclaimer: This approach works for my specific needs but should be adapted to your unique requirements. Always test your recovery procedures thoroughly before relying on them in an actual disaster scenario.* |