Disaster Recovery is No Longer Just About Backups: It's About Sovereignty
If you are running infrastructure in 2022 without a Disaster Recovery (DR) plan that accounts for both ransomware and Schrems II compliance, you aren't a CTO; you're a gambler. I've sat in boardrooms in Oslo where the question shifted from "How much data did we lose?" to "Why is our backup data sitting on a server in a jurisdiction that violates GDPR?"
The days of lazily dumping tarballs into an Amazon S3 bucket in us-east-1 are over. For Norwegian businesses, the intersection of Datatilsynet (the Norwegian Data Protection Authority) requirements and the physical reality of latency creates a tight corridor for architectural decisions. You need speed, you need immutability, and you need to know exactly where the physical drive sits.
This isn't a theoretical think-piece. We are going to look at how to build a DR strategy that balances RTO (Recovery Time Objective) with TCO (Total Cost of Ownership), using tools that exist right now in your terminal.
The "Cold" Tier: Immutable Offsite Backups
Ransomware attackers in 2022 attack your backups first. If your backup server is mounted as a writable volume on your production server, it’s not a backup. It’s a target.
For a cost-effective "cold" recovery strategy, we use Restic. Unlike simple rsync scripts, Restic provides encryption by default and supports deduplication, which is critical when you are paying for NVMe storage capacity.
Here is a production-grade wrapper script pattern we use to push encrypted backups to a secondary CoolVDS storage instance located in a separate datacenter zone. Note the use of restricted SFTP:
#!/bin/bash
# /usr/local/bin/run-backup.sh
# Fail fast on errors, unset variables, and mid-pipeline failures
set -euo pipefail

export RESTIC_REPOSITORY="sftp:backup-user@10.10.20.5:/backups/prod-app-01"
export RESTIC_PASSWORD_FILE="/root/.restic_pw"

# Initialize the repository if it does not exist yet (one-time check)
restic snapshots > /dev/null 2>&1 || restic init

# Back up /var/www and /etc
# --exclude-caches skips directories marked with a CACHEDIR.TAG file
# (pip, cargo, and many other tools tag their caches this way)
restic backup /var/www /etc --exclude-caches --tag scheduled_job

# Retention policy: keep last 24 hourly, last 7 daily, last 4 weekly snapshots
restic forget --keep-hourly 24 --keep-daily 7 --keep-weekly 4 --prune
Pro Tip: On the receiving CoolVDS instance, configure the SSH keys in .ssh/authorized_keys with the command="internal-sftp" flag. This prevents a compromised production server from getting a shell on your backup server. It can only transfer files.
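On the backup node, that looks like this; a minimal sketch, assuming an ed25519 key pair (the key material and the trailing comment are placeholders):

# /home/backup-user/.ssh/authorized_keys on the backup server
# "restrict" (OpenSSH 7.2+) disables PTY allocation and all forwarding in one word
command="internal-sftp",restrict ssh-ed25519 AAAA...placeholder... backup@prod-app-01

Combine this with a ChrootDirectory directive in sshd_config if you also want to pin the key to the /backups path.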
The "Warm" Tier: Database Replication
Backups are for disasters. Replication is for continuity. If you are running a high-traffic e-commerce site targeting the Nordics, a 4-hour restore time from a cold backup is unacceptable.
For PostgreSQL 14 (current stable standard), we avoid complex third-party tools for basic DR. Streaming replication is built-in, robust, and free. The goal is to have a replica on a standby node that can be promoted instantly.
On the Primary node (postgresql.conf):
wal_level = replica
max_wal_senders = 10
wal_keep_size = 512MB              # retain extra WAL to survive network jitter
listen_addresses = '10.10.10.2'    # private network IP
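Before the replica can connect, the primary also needs a dedicated replication role and a pg_hba.conf entry. A minimal sketch, assuming a Debian-style config path and the private subnet from the example above (role name and password are illustrative):

# On the primary: create a role that is allowed to stream WAL
sudo -u postgres psql -c "CREATE ROLE replicator WITH REPLICATION LOGIN PASSWORD 'SecretPassword123';"

# Allow replication connections from the private subnet, then reload the config
echo "host replication replicator 10.10.10.0/24 scram-sha-256" >> /etc/postgresql/14/main/pg_hba.conf
sudo -u postgres psql -c "SELECT pg_reload_conf();"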
On the Replica node: since PostgreSQL 12 there is no recovery.conf. Instead, create an empty standby.signal file in the data directory and put the connection details in postgresql.conf:
hot_standby = on                   # standby-side setting: allow read-only queries during recovery
primary_conninfo = 'host=10.10.10.2 port=5432 user=replicator password=SecretPassword123'
primary_slot_name = 'dr_replica'   # physical slot on the primary; see the bootstrap sketch below
restore_command = 'cp /var/lib/postgresql/wal_archive/%f %p'   # only used if you also archive WAL
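To seed the standby, clone the primary with pg_basebackup. A sketch assuming the same Debian-style layout; note that the -R flag writes primary_conninfo into postgresql.auto.conf and creates standby.signal for you:

# On the primary: create the physical replication slot referenced above
sudo -u postgres psql -c "SELECT pg_create_physical_replication_slot('dr_replica');"

# On the replica: stop Postgres, clear the data directory, clone, restart
systemctl stop postgresql
sudo -u postgres rm -rf /var/lib/postgresql/14/main
sudo -u postgres pg_basebackup -h 10.10.10.2 -U replicator \
    -D /var/lib/postgresql/14/main -S dr_replica -P -R
systemctl start postgresql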
The Latency Factor: If your primary server is in Oslo and your DR site is in Frankfurt, the speed of light imposes a latency penalty; that route typically adds 20-30 ms of round-trip time to every acknowledged commit. True synchronous replication (setting synchronous_standby_names on the primary, with synchronous_commit = on) over that distance will kill your write performance. Stick to asynchronous replication for cross-border DR. If you use CoolVDS, you can utilize low-latency private networking between nodes within the same region to enable synchronous replication without the performance hit.
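Not sure which side of that line you are on? Measure it. Run this from the replica against the primary's private IP from the configs above:

# 20 pings; the avg figure in the summary line is your per-commit penalty floor
ping -c 20 10.10.10.2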
The Infrastructure: Why "Cloud Agnostic" is a Lie
Many CTOs try to build "cloud agnostic" Terraform scripts. In practice, this triples complexity. The pragmatic approach is to choose a provider that respects standard Linux primitives.
When we deploy DR infrastructure, we look for three non-negotiables:
- KVM Virtualization: Containers (LXC/OpenVZ) share a kernel. If the host kernel panics, your "isolated" container dies too. KVM provides the hardware abstraction needed for true segmentation.
- NVMe Storage: Recovery Time Objective (RTO) is largely disk-bound. Restoring 500GB of data on SATA SSDs takes hours; on NVMe, it takes minutes. CoolVDS NVMe instances regularly benchmark at 5-6x the IOPS of standard cloud SSDs; verify it yourself with the fio sketch after this list.
- Local Compliance: Your provider must be a legal entity within the EEA/Norway to simplify your Record of Processing Activities (ROPA).
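Don't take those three on faith during procurement; verify the first two on the running instance. A minimal sketch (the scratch file path is illustrative, and absolute IOPS numbers vary by instance size):

# Confirm KVM rather than a shared-kernel container
systemd-detect-virt          # should print: kvm

# Measure 4k random-read IOPS on the volume you would restore onto
fio --name=randread --filename=/var/tmp/fio-test.bin --size=4G \
    --rw=randread --bs=4k --iodepth=32 --ioengine=libaio --direct=1 \
    --runtime=60 --time_based --numjobs=4 --group_reporting
rm /var/tmp/fio-test.bin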
Calculating the Cost of Downtime
Before you approve the budget for a standby node, run the numbers.
| Metric | Description | Typical Cost (SME) |
|---|---|---|
| RPO | Maximum data loss you can tolerate | 1 hour of lost transactions = thousands of dollars |
| RTO | Maximum time until service is restored | 4 hours offline = reputation death |
| CoolVDS Standby | Monthly cost of a warm standby node | A fraction of a single outage |
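To make "run the numbers" concrete, here is a back-of-the-envelope model. The figures are hypothetical; substitute your own revenue and staffing costs:

#!/bin/bash
# Hypothetical SME figures -- replace with your own P&L data
REVENUE_PER_HOUR=20000   # NOK of lost sales per hour offline
STAFF_PER_HOUR=5000      # NOK of engineering firefighting per hour
OUTAGE_HOURS=4           # your current realistic RTO

echo "One outage costs roughly $(( (REVENUE_PER_HOUR + STAFF_PER_HOUR) * OUTAGE_HOURS )) NOK"
# => One outage costs roughly 100000 NOK; compare to a standby node's monthly bill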
The "Hot" Tier: Automated Failover with Keepalived
For the ultimate "Pragmatic CTO" setup, we use a Floating IP (VIP) managed by keepalived using VRRP. This allows you to point your DNS A-record to a single IP that floats between your primary and secondary load balancers.
vrrp_instance VI_1 {
    state MASTER                # on the standby node: state BACKUP
    interface eth0
    virtual_router_id 51
    priority 100                # standby gets a lower priority, e.g. 90
    advert_int 1                # advertisement interval in seconds
    authentication {
        auth_type PASS
        auth_pass 1111          # max 8 characters -- change this in production
    }
    virtual_ipaddress {
        192.168.1.100
    }
}
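Out of the box this only fails over when the node or the interface dies, not when your load balancer process does. A sketch of a service-level check (the pgrep path and thresholds are illustrative) to add to keepalived.conf:

vrrp_script chk_nginx {
    script "/usr/bin/pgrep nginx"   # non-zero exit = nginx is down
    interval 2                      # run the check every 2 seconds
    fall 2                          # two failures mark the node faulty
    rise 2                          # two successes bring it back
}

Reference it from the vrrp_instance block with a track_script { chk_nginx } stanza, so a dead nginx drops the node out of the VRRP election and releases the VIP.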
When the master fails, the backup stops hearing VRRP advertisements and claims the VIP within a few seconds (roughly three missed advert_int intervals). No DNS propagation delays. No TTL waiting games.
Conclusion: Verify or Vanish
A DR plan that hasn't been tested is just a Word document. It won't save you.
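The drill doesn't need to be heroic. With the Restic setup from the cold tier, a quarterly test can be two commands (the target directory is illustrative):

# Restore the latest snapshot to a scratch directory and inspect the contents
restic restore latest --target /srv/restore-drill

# Verify repository integrity, re-reading a random 5% of the stored data
restic check --read-data-subset=5%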
The regulatory landscape in Norway creates a unique pressure cooker for tech teams. We have to be faster than the hackers and compliant enough for the lawyers. By leveraging standard tools like Restic and Postgres replication on top of robust, compliant infrastructure like CoolVDS, you satisfy both.
Next Step: Audit your current RTO. If it's over 4 hours, spin up a secondary CoolVDS KVM instance today and configure that replication slot. It’s cheaper than a ransom payment.