Disaster Recovery Architecture: Surviving Ransomware and Auditors in Norway

Let’s be blunt. If your Disaster Recovery (DR) plan relies solely on a nightly cron job that tars everything and ships it to an S3 bucket, your infrastructure is already dead; you just haven't looked at the logs yet. In 2024, data loss isn't just about hardware failure: it's about ransomware that targets backups first, and regulatory bodies like Datatilsynet that hand out fines based on negligence.

I’ve audited enough startup infrastructure in Oslo to know the pattern: great CI/CD pipelines, decent Kubernetes clusters, and absolutely fragile recovery protocols. When a Postgres node corrupts or an rm -rf / goes rogue, the "cloud magic" evaporates, leaving you with cold, hard recovery time objectives (RTO).

This guide isn't for hobbyists. It's for the Pragmatic CTO who needs to secure business continuity while navigating GDPR, Schrems II, and tight budgets. We will build a DR strategy that actually works when the terminal goes dark.

The "Schrems II" Reality Check

Before we touch a single config file, we must address the legal elephant in the server room. For Norwegian and European companies, simply dumping encrypted backups into a US-owned cloud provider poses a legal risk post-Schrems II. Data sovereignty is no longer optional.

Your DR site needs to be geographically separated from your primary production environment but ideally remain within the same legal jurisdiction to minimize compliance friction. This is where a local, sovereign provider like CoolVDS becomes a strategic asset rather than just a vendor. Hosting your failover or backup repositories on VPS Norway infrastructure ensures that your data never leaves the EEA, satisfying auditors and keeping latency low (often sub-10ms via NIX) for faster synchronization.
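
A quick sanity check before committing to a DR site is to measure the actual round-trip time from your primary environment. A minimal sketch, assuming mtr is installed (the hostname is illustrative):

# Average round-trip time over 20 cycles to the candidate DR site
mtr --report --report-cycles 20 dr-site.example.no

# Or a simple baseline with ping
ping -c 20 dr-site.example.no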

Architecture: HA is not DR

A common misconception is treating High Availability (HA) as Disaster Recovery. They are distinct disciplines.

  • High Availability: Surviving a node failure (e.g., Kubernetes self-healing, Database Replication).
  • Disaster Recovery: Surviving data corruption, ransomware, or entire datacenter loss.

If you execute DROP DATABASE production; on your primary node, your HA cluster will faithfully replicate that destruction to all secondaries in milliseconds. Congratulations, you are now highly available and entirely empty.
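
One hedged mitigation, which is not a substitute for real backups, is to keep a single replica deliberately lagging behind the primary so a destructive statement can be caught before it applies there. A minimal sketch for MariaDB (the one-hour delay is illustrative):

# On a dedicated delayed replica: hold replicated events back for one hour
mysql -e "STOP SLAVE; CHANGE MASTER TO MASTER_DELAY = 3600; START SLAVE;"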

The 3-2-1-1-0 Rule

In 2024, the classic 3-2-1 rule is outdated. We use 3-2-1-1-0:

  • 3 copies of data.
  • 2 different media types (NVMe block storage + Object Storage/Tape).
  • 1 offsite location (e.g., CoolVDS Oslo datacenter).
  • 1 offline/immutable copy (Air-gapped or Object Lock).
  • 0 errors on recovery verification.
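
The immutable copy in that list can be implemented with Object Lock on any S3-compatible store. A minimal sketch with the AWS CLI (bucket name and retention period are illustrative; point --endpoint-url at your provider if it is not AWS):

# Object Lock must be enabled when the bucket is created
aws s3api create-bucket --bucket dr-backups-immutable \
    --object-lock-enabled-for-bucket

# Default 30-day COMPLIANCE retention: objects cannot be deleted or
# overwritten before the retention period expires, even by an admin
aws s3api put-object-lock-configuration --bucket dr-backups-immutable \
    --object-lock-configuration '{"ObjectLockEnabled":"Enabled","Rule":{"DefaultRetention":{"Mode":"COMPLIANCE","Days":30}}}'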

Technical Implementation: The Immutable Backup

Ransomware attackers actively hunt for backup mount points to encrypt them. To defeat this, we use a "pull" model rather than a "push" model, or utilize immutable flags.

Strategy 1: The Pull Model with Restic

Don't let your production server have SSH access to your backup server. Instead, the backup server should connect to production to pull data. If production is compromised, the attacker cannot reach the backup repository.

# On the Backup Server (e.g., CoolVDS Storage Instance)
# Initialize a local repository
restic init --repo /mnt/secure-backups/prod-01

# restic cannot back up a remote path directly, so mount the production
# web root read-only over SSH first; the pull happens from this side
mkdir -p /mnt/prod-01-www
sshfs -o ro,IdentityFile=/root/.ssh/id_rsa_pull \
    backupuser@production-server:/var/www/html /mnt/prod-01-www

# The backup command (executed via cron on the BACKUP server)
restic -r /mnt/secure-backups/prod-01 backup /mnt/prod-01-www

# Detach the mount when finished
fusermount -u /mnt/prod-01-www
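
If a strict pull is impractical, a hedged alternative under the same threat model is restic's rest-server running in append-only mode: production can add new snapshots but can never delete or overwrite existing ones. A minimal sketch (hostnames and paths are illustrative):

# On the backup server: serve the repository over HTTP, append-only
rest-server --path /mnt/secure-backups --listen :8000 --append-only

# On the production server: push snapshots it cannot prune or delete
restic -r rest:http://backup-server:8000/prod-01 backup /var/www/html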

Strategy 2: Database Consistency

File-level backups are useless for running databases. You need transactionally consistent dumps. For MySQL/MariaDB, mysqldump with --single-transaction is the bare minimum, but for large datasets (50GB+), you should be using Percona XtraBackup or Mariabackup to avoid table locking.
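
For smaller datasets, the bare-minimum consistent dump looks roughly like this (the database name is a placeholder):

# Consistent logical dump of InnoDB tables without locking writes
# (credentials supplied via ~/.my.cnf; "shop_production" is illustrative)
mysqldump --single-transaction --quick --routines shop_production \
    | gzip > /backup/mysql/shop_$(date +%F).sql.gz

For larger datasets, stream the Mariabackup output straight to compression: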

#!/bin/bash
# /usr/local/bin/dr_backup_mysql.sh
# Fail the pipeline if mariabackup fails, not just gzip
set -o pipefail

TIMESTAMP=$(date +"%F")
BACKUP_DIR="/backup/mysql/$TIMESTAMP"
mkdir -p "$BACKUP_DIR/chkpoint"

# Stream backup directly to compression to save local NVMe I/O
mariabackup --backup \
    --stream=xbstream \
    --extra-lsndir="$BACKUP_DIR/chkpoint" \
    --user=backup_user \
    --password="$SECURE_PASS" \
    | gzip > "$BACKUP_DIR/full_backup.xb.gz"

# Check exit code
if [ $? -eq 0 ]; then
    echo "Backup Successful"
else
    logger -p local0.err "MySQL Backup Failed"
    exit 1
fi

Minimizing RTO with NVMe

Recovery Time Objective (RTO) is the maximum downtime your business can tolerate. If your backup sits on slow, spinning rust (HDD), restoring 500GB might take 6 hours. This is unacceptable for modern commerce.

Pro Tip: Network throughput usually isn't the bottleneck in recovery; disk I/O is. When restoring a database, the system must write massive amounts of data and rebuild indices simultaneously. This is why we standardize on NVMe storage at CoolVDS. The high IOPS capability allows you to ingest the backup stream and rebuild indices 4x-10x faster than standard SSDs.
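
For reference, restoring the streamed Mariabackup archive from earlier is a three-step job: unpack, prepare (apply the redo log), then copy back into an empty datadir. A sketch, assuming the restore path below (it is illustrative):

# 1. Unpack the compressed xbstream archive
mkdir -p /restore/mysql
gunzip -c /backup/mysql/latest/full_backup.xb.gz | mbstream -x -C /restore/mysql

# 2. Apply the redo log so the datadir is transactionally consistent
mariabackup --prepare --target-dir=/restore/mysql

# 3. Copy into an empty /var/lib/mysql, fix ownership, start the server
mariabackup --copy-back --target-dir=/restore/mysql
chown -R mysql:mysql /var/lib/mysql
systemctl start mariadb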

Automating the "Fire Drill"

A backup that hasn't been restored is Schrödinger's Backup—it effectively doesn't exist. You must automate verification. Here is a simplified approach using a temporary container.

# verify_backup.sh
# Restore the latest backup into a throwaway container and prove it is readable

RESTORE_DIR=$(mktemp -d)

# Unpack the archive and apply the redo log so the datadir is consistent
zcat /backup/mysql/latest/full_backup.xb.gz | mbstream -x -C "$RESTORE_DIR"
mariabackup --prepare --target-dir="$RESTORE_DIR"

# Spin up a Docker container on top of the prepared datadir
# (the image version should match the server the backup came from)
docker run -d --name dr-test -v "$RESTORE_DIR":/var/lib/mysql mariadb:10.11

# Wait for init
sleep 30

# Verify data integrity ("shop" is a placeholder for the schema holding your
# critical tables; credentials come from the restored production datadir)
docker exec dr-test mysql -uroot -p"$PROD_ROOT_PASS" -e "CHECK TABLE shop.critical_orders;"

if [ $? -eq 0 ]; then
  echo "DR Verification Passed: Data is readable."
  docker rm -f dr-test && rm -rf "$RESTORE_DIR"
else
  echo "DR Verification FAILED"
  # Trigger PagerDuty/OpsGenie alert here
fi

Network Considerations: The Northern Advantage

When disaster strikes, you may need to reroute traffic via DNS or BGP. If your primary site is in Oslo and your DR site is in Frankfurt, you introduce latency changes that can break application logic or trigger timeouts in synchronous microservices.
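
If DNS is your failover lever, the record TTL decides how quickly the world follows you. A quick check, with an illustrative hostname:

# A 3600s TTL means up to an hour of clients still hitting the dead primary;
# drop it (e.g. to 60s) long before you ever need to repoint the record
dig +noall +answer app.example.no A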

Maintaining your DR presence within Norway (using a provider like CoolVDS) keeps the network topology predictable. With direct peering at NIX (Norwegian Internet Exchange), latency between your primary office and the DR site remains negligible, ensuring that remote desktop sessions or admin consoles remain snappy during the crisis.

The Final Word on Infrastructure

You cannot script your way out of bad hardware. While we focus heavily on software resilience, the underlying metal matters. Stability comes from redundant power feeds and enterprise-grade hardware virtualization (KVM) rather than container-based virtualization, which suffers from noisy neighbors.

Disaster Recovery is an insurance policy. You hope to never use it, but the premium you pay—in time and resources—must purchase certainty. By leveraging immutable backup techniques, local NVMe infrastructure, and rigorous automated testing, you transform a potential catastrophe into a manageable incident.

Action Item: Audit your current RTO. If it exceeds 4 hours, it's time to re-architect. Spin up a dedicated storage instance on CoolVDS today and test your cross-site bandwidth throughput.
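
A simple starting point for that throughput test, assuming iperf3 is installed on both ends (the hostname is illustrative):

# On the DR instance: run the receiver
iperf3 -s

# From the primary site: four parallel streams for 30 seconds
iperf3 -c dr-site.example.no -P 4 -t 30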