---
title: "Restoring from Longhorn Backups: A Complete Guide"
date: 2025-09-11 14:30:00 +0300
categories: [kubernetes, disaster-recovery]
#tags: [kubernetes,longhorn,backup,restore,disaster-recovery,k8s,storage,minio,prometheus,grafana,jellyfin,flux,gitops]
description: Complete step-by-step guide to restoring Kubernetes applications from Longhorn backups stored in MinIO, including scaling strategies, PVC management, and real-world lessons learned.
image:
  path: /assets/img/posts/k8s-longhorn-restore.webp
  alt: Kubernetes Longhorn backup restoration guide
draft: true
---

# Restoring Your Kubernetes Applications from Longhorn Backups: A Complete Guide

When disaster strikes your Kubernetes cluster, having a solid backup strategy isn't enough; you also need to know how to restore your applications quickly and reliably. Recently, I had to rebuild my entire Kubernetes cluster from scratch and restore all my applications from 3-month-old Longhorn backups stored in MinIO. Here's the complete step-by-step process that got my media stack and observability tools back online.

## The Situation

After redeploying my Kubernetes cluster with Flux GitOps, I found myself with:

- ✅ Fresh cluster with all applications deployed via Flux
- ✅ Longhorn storage configured and connected to the MinIO backend
- ✅ All backup data visible in the Longhorn UI
- ❌ Empty volumes for all applications
- ❌ Lost configurations, dashboards, and media metadata

The challenge? Restore 6 critical applications to their backed-up state without losing the current Flux-managed infrastructure.

## Applications to Restore

Here's what needed restoration:

- **Prometheus** (45 GB) - Monitoring metrics and configuration
- **Loki** (20 GB) - Log aggregation and retention
- **Jellyfin** (10 GB) - Media library and metadata
- **Grafana** (10 GB) - Dashboards and data sources
- **qBittorrent** (5 GB) - Torrent client configuration
- **Sonarr** (5 GB) - TV show management settings

## Prerequisites

Before starting, ensure you have:

- A Kubernetes cluster with kubectl access
- Longhorn installed and configured
- An accessible backup storage backend (MinIO/S3)
- The applications deployed (whether they're currently scaled up or down doesn't matter)
- Access to the Longhorn UI for backup management

## Step 1: Assess Current State

First, let's understand what we're working with:

```bash
# Check current deployments and statefulsets
kubectl get deployments -A
kubectl get statefulsets -A

# Review current PVCs
kubectl get pvc -A -o wide

# Verify Longhorn storage class
kubectl get storageclass
```

This gives you a complete picture of your current infrastructure and identifies which PVCs need replacement.

## Step 2: Scale Down Applications

**Critical:** Before touching any storage, scale down applications to prevent data corruption:

```bash
# Scale down deployments
kubectl scale deployment jellyfin --replicas=0 -n default
kubectl scale deployment qbittorrent --replicas=0 -n default
kubectl scale deployment sonarr --replicas=0 -n default
kubectl scale deployment grafana --replicas=0 -n observability

# Scale down statefulsets
kubectl scale statefulset loki --replicas=0 -n observability
kubectl scale statefulset prometheus-kube-prometheus-stack --replicas=0 -n observability
kubectl scale statefulset alertmanager-kube-prometheus-stack --replicas=0 -n observability
```

Wait for all pods to terminate before proceeding.
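
If you'd rather not just eyeball it, `kubectl wait` can block until the pods are actually gone. The label selectors below are assumptions; adjust them to whatever labels your charts apply:

```bash
# Block until the scaled-down pods have been deleted (selectors are examples; adjust to your labels)
kubectl wait --for=delete pod -l app.kubernetes.io/name=jellyfin -n default --timeout=120s
kubectl wait --for=delete pod -l app.kubernetes.io/name=grafana -n observability --timeout=120s

# Or simply watch until the namespaces quiet down
kubectl get pods -n default -w
```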

## Step 3: Remove Current Empty PVCs

Since the current PVCs are backed by empty, freshly provisioned volumes, we need to remove them:

```bash
# Delete PVCs in default namespace
kubectl delete pvc jellyfin -n default
kubectl delete pvc qbittorrent -n default
kubectl delete pvc sonarr -n default

# Delete PVCs in observability namespace
kubectl delete pvc grafana -n observability
kubectl delete pvc storage-loki-0 -n observability
kubectl delete pvc prometheus-kube-prometheus-stack-db-prometheus-kube-prometheus-stack-0 -n observability
```
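
Before restoring anything, it's worth confirming the old claims are really gone (a PVC can hang around in `Terminating` if a pod is still mounting it):

```bash
# Confirm the empty claims have been removed before restoring
kubectl get pvc -n default
kubectl get pvc -n observability
```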

## Step 4: Restore Backups via Longhorn UI

This is where the magic happens. Access your Longhorn UI and navigate to the **Backup** tab.

For each backup, click the **⟲ (restore)** button and configure:

### Prometheus Backup
- **Name**: `prometheus-restored`
- **Storage Class**: `longhorn`
- **Access Mode**: `ReadWriteOnce`

### Loki Backup
- **Name**: `loki-restored`
- **Storage Class**: `longhorn`
- **Access Mode**: `ReadWriteOnce`

### Jellyfin Backup
- **Name**: `jellyfin-restored`
- **Storage Class**: `longhorn`
- **Access Mode**: `ReadWriteOnce`

### Continue for all other backups...

**Important**: Wait for all restore operations to complete before proceeding. You can monitor progress in the Longhorn UI.
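
You can also watch the restores from the terminal. Restored volumes show up as Longhorn `Volume` custom resources, so something like this (a sketch, assuming the `-restored` naming used above) lets you see them fill up and settle into a detached, healthy state:

```bash
# Watch the restored Longhorn volumes until the restore operations finish
kubectl get volumes.longhorn.io -n longhorn-system -w | grep restored
```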

## Step 5: Create PersistentVolumes

Once restoration completes, the restored Longhorn volumes need PersistentVolumes to be accessible by Kubernetes:

```yaml
# Example for Jellyfin - repeat for all applications
apiVersion: v1
kind: PersistentVolume
metadata:
  name: jellyfin-restored-pv
spec:
  capacity:
    storage: 10Gi
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: longhorn
  csi:
    driver: driver.longhorn.io
    fsType: ext4
    volumeAttributes:
      numberOfReplicas: "3"
      staleReplicaTimeout: "30"
    volumeHandle: jellyfin-restored
```

Apply this pattern for all restored volumes, adjusting the `storage` capacity and `volumeHandle` to match your backups.
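
If you want to double-check a restored volume's exact name and size before writing its PV, the Longhorn CRDs expose both; the manifest filename below is just an example:

```bash
# Read the size (in bytes) Longhorn reports for the restored volume
kubectl get volumes.longhorn.io jellyfin-restored -n longhorn-system -o jsonpath='{.spec.size}{"\n"}'

# Apply the matching PersistentVolume manifest
kubectl apply -f jellyfin-restored-pv.yaml
```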

## Step 6: Create PersistentVolumeClaims

Now create PVCs that bind to the restored PersistentVolumes:

```yaml
# Example for Jellyfin
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: jellyfin
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  storageClassName: longhorn
  volumeName: jellyfin-restored-pv
```

The key here is using `volumeName` to bind the PVC to the specific PV we created.

## Step 7: Verify Binding

Check that all PVCs are properly bound:

```bash
# Check binding status
kubectl get pvc -n default | grep -E "(jellyfin|qbittorrent|sonarr)"
kubectl get pvc -n observability | grep -E "(grafana|storage-loki|prometheus)"

# Verify Longhorn volume status
kubectl get volumes -n longhorn-system | grep "restored"
```

You should see all PVCs in `Bound` status and the restored Longhorn volumes reported as healthy (they will switch to `attached` once the pods are scaled back up).
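
If a claim sits in `Pending` instead, describing it usually reveals the reason, typically a capacity or `storageClassName` mismatch with the PV, or a typo in `volumeName`:

```bash
# The Events section explains why a claim failed to bind
kubectl describe pvc jellyfin -n default
```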

## Step 8: Scale Applications Back Up

With storage properly restored and connected, bring your applications back online:

```bash
# Scale deployments back up
kubectl scale deployment jellyfin --replicas=1 -n default
kubectl scale deployment qbittorrent --replicas=1 -n default
kubectl scale deployment sonarr --replicas=1 -n default
kubectl scale deployment grafana --replicas=1 -n observability

# Scale statefulsets back up
kubectl scale statefulset loki --replicas=1 -n observability
kubectl scale statefulset prometheus-kube-prometheus-stack --replicas=1 -n observability
kubectl scale statefulset alertmanager-kube-prometheus-stack --replicas=1 -n observability
```

## Step 9: Final Verification

Confirm everything is working correctly:

```bash
# Check for pods that are not Running or Completed
kubectl get pods -A | grep -v Running | grep -v Completed

# Verify the restored Longhorn volumes are healthy
kubectl get volumes -n longhorn-system | grep "restored"

# Test application functionality
kubectl get pods -n default -o wide
kubectl get pods -n observability -o wide
```
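
Pod status alone doesn't prove the data actually came back, so it's worth tailing a log or two and opening the UIs. A quick spot-check, using the workload names from the earlier steps:

```bash
# Spot-check that the applications start up against their restored data
kubectl logs deployment/jellyfin -n default --tail=50
kubectl logs statefulset/prometheus-kube-prometheus-stack -n observability --tail=50
```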

## Results and Key Lessons

### What Was Restored Successfully ✅

- **Jellyfin**: Complete media library, metadata, and user settings
- **Grafana**: All dashboards, data sources, and alerting rules
- **Prometheus**: Historical metrics and configuration
- **Loki**: Log retention policies and stored logs
- **qBittorrent**: Torrent configurations and download states
- **Sonarr**: TV show monitoring and quality profiles

### Important Considerations

1. **Data Age**: My backups were 3 months old, so any data created after that point was lost. Plan backup frequency accordingly.

2. **Storage Sizes**: Pay attention to backup sizes vs. current PVC sizes. My Prometheus backup was 45 GB while the current PVC was only 15 GB, so the restore required updating the PVC size (see the sketch after this list for checking sizes up front).

3. **Volume Naming**: Longhorn creates restored volumes with specific names. The PV `volumeHandle` must match exactly.

4. **Application Dependencies**: Some applications have interdependencies. Restore core infrastructure (Prometheus, Grafana) before application-specific services.
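
To catch the size mismatch before writing any PVs, the backup sizes can be read from Longhorn's CRDs as well as from the UI. A minimal sketch, assuming a recent Longhorn release where backups are exposed as `backups.longhorn.io` (column paths may differ between versions):

```bash
# List Longhorn backups with the volume they came from and their size in bytes
kubectl get backups.longhorn.io -n longhorn-system \
  -o custom-columns=NAME:.metadata.name,VOLUME:.status.volumeName,SIZE:.status.size
```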

## Alternative: CLI-Based Restoration

For automation, or when UI access isn't available, you can restore via Longhorn's `Volume` CRD:

```yaml
apiVersion: longhorn.io/v1beta1
kind: Volume
metadata:
  name: jellyfin-restored
  namespace: longhorn-system
spec:
  size: "10737418240" # Size in bytes
  restoreVolumeRecurringJob: false
  fromBackup: "s3://your-minio-bucket/backups/backup-name"
```
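
Whichever route creates the volume, it still needs the PV and PVC from Steps 5 and 6 before a pod can mount it; applying the manifest only kicks off the data restore from the backup target (the filename here is hypothetical):

```bash
# Create the volume and watch Longhorn restore it from the backup target
kubectl apply -f jellyfin-restored-volume.yaml
kubectl get volumes.longhorn.io jellyfin-restored -n longhorn-system -w
```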

## Conclusion

Restoring Kubernetes applications from Longhorn backups requires careful orchestration of scaling, PVC management, and volume binding. The process took about 45 minutes for 6 applications, but the result was a complete restoration to the previous backup state.

Key takeaways:

- **Always scale down applications first** to prevent corruption
- **Understand the relationship** between Longhorn volumes, PVs, and PVCs
- **Test your backup restoration process** before you need it
- **Document your PVC naming conventions** for faster recovery
- **Monitor backup age** vs. acceptable data loss

Having a solid backup strategy is crucial, but knowing how to restore efficiently under pressure is what separates good infrastructure management from great infrastructure management.

## Next Steps

Consider implementing:

- **Automated backup validation** to ensure restorability
- **Backup age monitoring** with alerts
- **Documentation of critical PVC mappings**
- **Regular disaster recovery drills**

Your future self will thank you when disaster strikes again.