Surviving the Blackout: A Pragmatic Disaster Recovery Strategy for Norwegian Systems
It is 3:00 AM. Your monitoring dashboard is a sea of red. The SSH handshake times out. Your hosting provider's status page just updated: "Major power failure in Zone A. Cooling systems offline."
If your stomach just dropped, your Disaster Recovery (DR) plan is theoretical. If you poured a coffee and executed a Terraform script to spin up your failover environment, you have an actual plan.
In the last year, we have seen data centers literally burn to the ground in Europe and ransomware gangs target mid-sized Nordic enterprises with ruthless efficiency. Coupled with the legal complexities of Schrems II, the strategy of "just dump it to S3" is no longer just lazy—it is potentially illegal for Norwegian companies handling sensitive data.
This is not a high-level management summary. This is a technical architect's guide to keeping the lights on when the world goes dark.
The Legal Reality: Why Location Matters in 2022
Before we touch a single config file, we must address the elephant in the server room: Data Sovereignty. Since the CJEU's Schrems II ruling, transferring personal data to US-controlled clouds (even those with servers in Europe) carries significant risk due to the US CLOUD Act. The Norwegian Data Protection Authority (Datatilsynet) has been clear about the scrutiny required for international transfers.
The Fix: Keep your primary and DR sites within jurisdictions that respect GDPR without extraterritorial caveats. Hosting on CoolVDS ensures your data resides on infrastructure governed by Norwegian and EEA law, eliminating the compliance headache of explaining to auditors why your customer database is technically accessible by a foreign subpoena.
Defining the Metrics: RTO vs. RPO
You cannot optimize what you do not measure. In every Service Level Agreement (SLA), two metrics dictate your architecture:
- RPO (Recovery Point Objective): How much data can you afford to lose? (e.g., "We can lose the last 15 minutes of transactions.")
- RTO (Recovery Time Objective): How long can you be offline? (e.g., "We must be back up within 4 hours.")
If you demand an RPO of zero (no data loss) and an RTO of near-zero, you are not looking for backups; you are looking for synchronous multi-site replication. That is expensive. For most robust web applications, an RPO of 1 hour and RTO of 4 hours is the sweet spot between cost and continuity.
Phase 1: The Database Layer
Your stateless application containers are easy to restore. Your database is where the war is won or lost. For a standard MySQL/MariaDB setup on a VPS, reliance on `mysqldump` alone is insufficient for high-traffic sites due to locking and restoration time.
Strategy: Asynchronous Replication + Point-in-Time Recovery
On CoolVDS NVMe instances, I/O performance is high enough to handle binary logging without noticeable latency. Enable binary logging in your my.cnf to allow for point-in-time recovery.
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
expire_logs_days = 7
max_binlog_size = 100M
binlog_format = ROW
sync_binlog = 1
innodb_flush_log_at_trx_commit = 1
The sync_binlog = 1 and innodb_flush_log_at_trx_commit = 1 flags are the "paranoid mode" settings. They force a disk write for every transaction. On spinning rust (HDD), this kills performance. On CoolVDS NVMe storage, the latency penalty is negligible, buying you ACID compliance without the tears.
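Binary logs are only useful if you know how to replay them before the pressure is on. Here is a minimal point-in-time recovery sketch; the database name, file names, and timestamps are illustrative, so adapt them to your own dump schedule and binlog location.

# Point-in-time recovery sketch: restore the last full dump, then replay
# binary logs up to just before the incident.

# 1. Restore the nightly dump (taken with --single-transaction so it
#    records a consistent binlog position).
mysql -u root -p shop_db < /backups/nightly/shop_db_2022-03-14.sql

# 2. Replay the binary log from that recorded position, stopping right
#    before the bad transaction or the outage.
mysqlbinlog --start-position=154 \
  --stop-datetime="2022-03-15 02:55:00" \
  /var/log/mysql/mysql-bin.000042 | mysql -u root -p shop_db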
Phase 2: Immutable File Backups with Borg
Ransomware loves to encrypt your backups too. If your backup server is mounted as a writable drive on your main server, you are vulnerable. We use BorgBackup because it supports encryption, compression, and deduplication natively.
The Push vs. Pull Debate: whichever direction you choose, lock down access to the backup repository. Ideally, the backup server pulls data from production; if you must push, use a restricted SSH key on the backup destination that can only execute the borg serve command.
# Restricting SSH keys in .ssh/authorized_keys on the backup destination
command="borg serve --restrict-to-path /mnt/backups/repo",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-rsa AAAAB3Nza... user@production-server
This ensures that even if your production server is compromised, the attacker cannot wipe the existing backups on the remote repository.
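For reference, a nightly Borg run from the production side can be as simple as the sketch below. The repository URL, passphrase handling, and source paths are illustrative, but the repo path matches the restricted /mnt/backups/repo from the key above.

# Nightly Borg backup from the production server (run from cron or a
# systemd timer). Repo URL and passphrase file are illustrative.
export BORG_REPO='ssh://backup@backup-host/mnt/backups/repo'
export BORG_PASSPHRASE="$(cat /root/.borg-passphrase)"

# Encrypted, compressed, deduplicated archive named after host and date.
borg create --compression zstd --stats \
    ::'{hostname}-{now:%Y-%m-%d}' \
    /etc /var/www /var/backups/mysql

# Keep a sensible retention window.
borg prune --keep-daily 7 --keep-weekly 4 --keep-monthly 6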
Pro Tip: Test your restoration speed. I recently saw a client with 2TB of data on a budget VPS provider. Their download speed was capped, and restoring the data would have taken 4 days. CoolVDS offers unmetered internal traffic, making restores from a secondary storage instance lightning fast.
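Putting a number on your own restore speed takes a couple of minutes. A rough check, with an illustrative archive name:

# Time how long it takes to pull last night's archive to scratch space.
mkdir -p /tmp/restore-test && cd /tmp/restore-test
time borg extract --progress ::web-01-2022-03-14

If that figure, scaled up to your full dataset, blows past your RTO, you have found the problem before the disaster did.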
Phase 3: Infrastructure as Code (IaC)
If your server vanishes, do you remember exactly which PHP modules were installed? Probably not. In 2022, manual server configuration is negligence.
Use Terraform to define your CoolVDS infrastructure. If the worst happens, you flip the dr_mode variable and run terraform apply.
resource "coolvds_instance" "recovery_node" {
count = var.dr_mode ? 1 : 0
hostname = "dr-web-01"
plan = "nvme-16gb"
location = "oslo-datacenter-b"
image = "ubuntu-20.04"
ssh_keys = [
var.admin_ssh_key
]
}
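Failing over (and failing back after a test) then becomes a one-variable change, assuming dr_mode is declared as a boolean variable:

# Failover: create the recovery node.
terraform apply -var="dr_mode=true"

# After the incident or the game day: tear it down again.
terraform apply -var="dr_mode=false"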
Note: CoolVDS integrates with standard cloud-init for first-boot provisioning; regardless, keeping your Terraform state files encrypted and off-site is mandatory.
The "War Game" Scenario
A DR plan that hasn't been tested is just a hope. Once a quarter, you should perform a "Game Day":
- Spin up a fresh CoolVDS instance in a separate isolation group.
- Deploy your Ansible playbooks or Docker Compose files.
- Restore the database from last night's backup.
- Measure the time.
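A minimal stopwatch for the exercise might look like this; the playbook, inventory, and dump paths are placeholders for your own tooling.

# Game-day stopwatch: measure the whole run, not just the parts you like.
START=$(date +%s)

terraform apply -auto-approve -var="dr_mode=true"
ansible-playbook -i inventory/dr site.yml
time mysql -u root -p shop_db < /tmp/restore-test/shop_db.sql

echo "Total recovery time: $(( $(date +%s) - START )) seconds"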
If the database restore takes 3 hours, and your CEO expects 1 hour, you have a hardware problem. This is usually where disk IOPS become the bottleneck. Standard SATA SSDs often choke on large SQL imports. This is why we standardize on NVMe for all tiers at CoolVDS—high IOPS are not a luxury; they are a recovery requirement.
Summary: The 3-2-1-1 Strategy
Update the classic rule for the modern threat landscape:
- 3 copies of data.
- 2 different media types (e.g., Block Storage and Object Storage).
- 1 offsite location (Geographically separated).
- 1 immutable copy (or offline) to prevent ransomware encryption.
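With the Borg setup from Phase 2, the immutable copy is one flag away: run the restricted key on the backup host in append-only mode, so a compromised production server can add new archives but never delete or overwrite old ones. The snippet below assumes the same repository path as earlier.

# Same restricted key as before, now in append-only mode.
command="borg serve --append-only --restrict-to-path /mnt/backups/repo",no-pty,no-agent-forwarding,no-port-forwarding,no-X11-forwarding,no-user-rc ssh-rsa AAAAB3Nza... user@production-server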
Disaster recovery is about removing the panic from the equation. When you know your data is sovereign, your pipelines are automated, and your hardware is performant enough to handle a rapid restore, a 3:00 AM outage isn't a crisis. It's just a checklist.
Ready to harden your infrastructure? Don't let slow I/O compromise your RTO. Deploy a high-availability test environment on CoolVDS today and experience the difference raw NVMe performance makes.