Surviving the Crash: A Battle-Tested Disaster Recovery Blueprint for Norwegian Ops
Let's be honest. If your disaster recovery (DR) plan consists solely of a nightly cron job sending a tarball to an AWS S3 bucket in Frankfurt, you don't have a DR plan. You have a placebo.
I learned this the hard way in 2019. We had a "robust" backup strategy. Then a rogue script corrupted our primary RAID array. When we tried to hydrate the S3 archives, we hit two walls: the download latency and the realization that the database schema had drifted three months prior. It took 18 hours to restore. In this industry, 18 hours is an eternity.
For Norwegian businesses operating under strict Datatilsynet oversight and the shadow of Schrems II, relying on US-owned cloud giants for your failover is a compliance gamble. You need sovereignty, and you need speed.
This is not a high-level management summary. This is a technical blueprint for building a resilient, low-latency DR site using standard Linux tools available today (early 2022).
The Geometry of Latency: Why Oslo Matters
Physics is undefeated. If your primary users are in Oslo or Bergen and your failover site is in Virginia, every replicated byte pays a transatlantic round trip: the replica lags further behind, and a bulk restore over that link blows out your RTO (Recovery Time Objective). Even routing through Frankfurt adds unnecessary milliseconds.
When we architect infrastructure at CoolVDS, we emphasize the proximity to NIX (Norwegian Internet Exchange). A local DR site means your replication stream experiences minimal jitter. This allows for synchronous or near-synchronous replication without stalling the primary application thread.
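Do not take my word for it; measure. A quick sanity check from your primary, assuming it can reach the candidate DR node (dr.example.com is a placeholder hostname), using nothing more exotic than ping and mtr:
# Round-trip time to the candidate DR site
ping -c 10 dr.example.com
# Per-hop latency and jitter, useful for spotting detours via Frankfurt
mtr --report --report-cycles 50 dr.example.com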
The Stack
We are going to build a DR pipeline using three core technologies:
- WireGuard: For a secure, high-performance encrypted transport layer.
- BorgBackup: For deduplicated, encrypted, and mountable file-level archives.
- PostgreSQL Streaming Replication: For the database layer.
1. The Transport Layer: WireGuard
Forget IPsec. It is bloated and hard to debug. OpenVPN is single-threaded and slow. WireGuard is part of the Linux kernel (since 5.6), making it the only logical choice for high-throughput replication in 2022.
On your CoolVDS recovery instance (let's call it dr-node), generate your keys:
umask 077; wg genkey | tee privatekey | wg pubkey > publickey
Create the configuration at /etc/wireguard/wg0.conf. Note the MTU. If you are tunneling over the internet, fragmentation kills performance. WireGuard itself adds roughly 60-80 bytes of overhead, so 1420 is the usual default; 1360 leaves extra headroom for PPPoE or IPv6 underlays and keeps every encapsulated packet inside a standard 1500-byte frame.
[Interface]
Address = 10.10.10.2/24
SaveConfig = true
PostUp = iptables -A FORWARD -i wg0 -j ACCEPT; iptables -t nat -A POSTROUTING -o eth0 -j MASQUERADE
PostDown = iptables -D FORWARD -i wg0 -j ACCEPT; iptables -t nat -D POSTROUTING -o eth0 -j MASQUERADE
ListenPort = 51820
PrivateKey = [INSERT_DR_NODE_PRIVATE_KEY]
MTU = 1360
[Peer]
PublicKey = [INSERT_PRIMARY_NODE_PUBLIC_KEY]
AllowedIPs = 10.10.10.1/32
Endpoint = primary.example.com:51820
PersistentKeepalive = 25
Bring it up. There is no lengthy negotiation phase: the first handshake completes in a single round trip and traffic flows immediately after.
systemctl enable --now wg-quick@wg0
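Before trusting the tunnel, verify the handshake and confirm that nothing fragments at the chosen MTU. A quick check from the DR node (1332 bytes of ICMP payload plus 28 bytes of headers equals the 1360-byte tunnel MTU):
wg show wg0
ping -c 3 10.10.10.1
# Force Don't-Fragment to confirm the MTU choice survives the path
ping -c 3 -M do -s 1332 10.10.10.1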
2. The Data Layer: BorgBackup
Rsync is fine for simple mirroring, but for disaster recovery you need versioning and encryption at rest. BorgBackup handles deduplication better than anything else I have used, which is critical when you are paying for storage.
Pro Tip: A compromised primary must never be able to destroy its own backup history. In an ideal world you would pull backups from the DR side so the primary holds no credentials at all, but Borg is push-based over SSH. The practical equivalent is to make the primary's access as powerless as possible: restrict its SSH key to the borg serve command, locked to the repository path, and consider --append-only so ransomware on the primary can add archives but never delete them (pruning then has to be managed from the DR side).
On your CoolVDS DR instance, create a dedicated backupuser with that restricted key (an example authorized_keys entry follows the init command), then initialize the repo:
borg init --encryption=repokey /mnt/nvme_storage/backups/main_repo
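For the key restriction, a single line in backupuser's ~/.ssh/authorized_keys on the DR node does the job. The key material and comment are placeholders; add --append-only to the command if you want the extra protection described above:
command="borg serve --restrict-to-path /mnt/nvme_storage/backups",restrict ssh-ed25519 AAAA... primary-backup-key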
We use CoolVDS NVMe instances because the bottleneck in deduplication is often random I/O looking up chunk hashes. Rotating rust drives will choke here. NVMe makes the difference between a 1-hour backup and a 10-minute backup.
Here is a battle-tested backup script. Save this as /usr/local/bin/run-backup.sh:
#!/bin/bash
set -uo pipefail

LOG="/var/log/borg.log"

# The passphrase lives in this file, so keep the script root-only (chmod 700)
export BORG_PASSPHRASE="your-super-secret-passphrase"

# Must match the path used in borg init on the DR node
REPOSITORY="ssh://backupuser@10.10.10.2/mnt/nvme_storage/backups/main_repo"

echo "Starting backup at $(date)" >> "$LOG"

# Back up everything except temporary junk; lz4 keeps CPU overhead low
borg create --stats \
    --compression lz4 \
    --exclude '/var/www/*/var/cache' \
    --exclude '/home/*/.cache' \
    "$REPOSITORY::{hostname}-{now:%Y-%m-%d_%H:%M}" \
    /etc /home /var/www >> "$LOG" 2>&1
create_rc=$?

# Prune old archives from this host to keep storage costs down
borg prune -v --list \
    --keep-daily=7 --keep-weekly=4 --keep-monthly=6 \
    --prefix '{hostname}-' \
    "$REPOSITORY" >> "$LOG" 2>&1
prune_rc=$?

# borg exit codes: 0 = success, 1 = warning, 2 = error
echo "Backup finished at $(date) (create=$create_rc, prune=$prune_rc)" >> "$LOG"
3. The Database: Hot Standby
File backups are useless for a live database. You need a hot standby. With PostgreSQL 14 (current stable), setting up streaming replication is straightforward.
On the Primary node, configure postgresql.conf to listen on the WireGuard IP:
listen_addresses = 'localhost,10.10.10.1'
wal_level = replica
max_wal_senders = 10
Allow replication in pg_hba.conf:
host replication replicator 10.10.10.2/32 scram-sha-256
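These snippets assume a dedicated replication role called replicator. If it does not exist yet, create it on the primary (the password is a placeholder), then restart to apply everything; listen_addresses needs a full restart, while pg_hba.conf changes alone would only need a reload. pg_basebackup on the DR node will prompt for this password unless you add it to the postgres user's ~/.pgpass.
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'change-me';"
systemctl restart postgresql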
On the CoolVDS DR node, stop Postgres, clear the data directory, and pull the base backup:
systemctl stop postgresql
rm -rf /var/lib/postgresql/14/main/*
# Run as postgres user
pg_basebackup -h 10.10.10.1 -D /var/lib/postgresql/14/main -U replicator -P -X stream -R
The -R flag automatically creates the standby.signal file and configures the connection settings. Start the service, and you have a replica that is milliseconds behind the primary.
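Trust, but verify. On the primary, pg_stat_replication should show the DR node streaming; on the standby, pg_is_in_recovery() must return true:
# On the primary
sudo -u postgres psql -c "SELECT client_addr, state, sync_state FROM pg_stat_replication;"
# On the DR node
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"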
The Compliance Angle (Schrems II & GDPR)
Here is the reality check. If you replicate your customer data to a US-controlled cloud provider, even if the data center is in Europe, you are navigating a legal minefield regarding data transfer mechanisms. The Datatilsynet (Norwegian Data Protection Authority) has been very clear about risk assessments.
By using a provider like CoolVDS, where the legal entity and the infrastructure are strictly within Norwegian/European jurisdiction, you simplify your compliance posture significantly. Your data travels over an encrypted WireGuard tunnel to a server in Oslo. No Atlantic crossings. No FISA warrants.
Why Hardware Matters for Recovery
When disaster strikes, you don't just need your data; you need it online. We have seen clients try to restore terabytes of data to cheap VPS providers with throttled I/O. The restore process takes days because the "SSD" is actually a shared tier with massive IOPS limits.
We built CoolVDS on KVM virtualization with direct NVMe pass-through where possible. We don't oversell CPU to the point of steal-time death. When you run borg extract, you need raw CPU power for decryption and high IOPS for writing millions of small files. Anything less is negligence.
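If you want numbers instead of marketing, benchmark the DR node's random-write path before you need it. A short fio run (4K random writes, direct I/O) gives a rough floor for what restoring millions of small files will feel like; the test file path is just an example, and you should delete it afterwards:
fio --name=restore-sim --filename=/mnt/nvme_storage/fio-test --size=2G \
    --rw=randwrite --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=60 --time_based --group_reporting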
Final Check
A backup is Schrödinger's cat: it both exists and doesn't exist until you observe it. Schedule a monthly "Fire Drill." Shut down the primary interface. Promote the Postgres standby on the CoolVDS node. Point your DNS to the failover IP.
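Promotion itself is one command on the DR node (PostgreSQL 12 and later); the firewall rules, DNS TTLs, and application configs around it are what the drill is really testing:
sudo -u postgres psql -c "SELECT pg_promote();"
# Should now return false
sudo -u postgres psql -c "SELECT pg_is_in_recovery();"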
If you sweat during the drill, your automation isn't good enough yet.