Disaster Recovery for Norwegian Ops: When rm -rf / Happens to You
I remember the first time I felt my stomach drop through the floor of a server room. It was 2014. A junior dev had run a cleanup script on production instead of staging. In less than three seconds, the customer database was gone. No "Are you sure?" prompt. Just empty directories and a terrifying silence.
We recovered, but it took 18 hours. That's 18 hours of lost revenue, angry calls, and damaged reputation.
If you are running mission-critical infrastructure in Norway today, relying on a daily mysqldump isn't a strategy; it's negligence. With the EU General Data Protection Regulation (GDPR) coming into force in May 2018, the requirements for data integrity and availability are about to get legal teeth. Here is how we build a Disaster Recovery (DR) plan that actually works, using tools available right now in early 2017.
The Norwegian Context: Latency and Legality
Before we touch a single config file, we need to address location. Many VPS providers will happily sell you a "backup solution" that pipes your data to a data center in Virginia or Frankfurt. For a Norwegian business, this is a problem for two reasons:
- Latency: If you need to failover your application, routing traffic from Oslo to Ashburn adds 100ms+ latency. Your application will feel sluggish, and TCP handshakes will drag.
- Data Sovereignty: While we have the Privacy Shield framework (which replaced Safe Harbor), keeping Norwegian user data within Norwegian borders is often the safest route for compliance with Datatilsynet (the Norwegian Data Protection Authority), especially for health or financial data.
This is why I stick to providers like CoolVDS. Their infrastructure is local. You get low latency to NIX (Norwegian Internet Exchange) and the legal safety of data residency. But CoolVDS is just the infrastructure; the architecture is on you.
RTO and RPO: The Metrics That Matter
Stop thinking in terms of "backups" and start thinking in terms of two metrics:
- Recovery Time Objective (RTO): How long can you be down? (e.g., "We must be back up in 1 hour.")
- Recovery Point Objective (RPO): How much data can you lose? (e.g., "We can lose up to 15 minutes of transactions.")
If your boss says "zero downtime and zero data loss," ask for a budget of 5 million NOK. If they don't have that, we use Master-Slave Replication and Continuous Synchronization.
Step 1: Real-Time File Mirroring with Lsyncd
Rsync is great, but cron jobs leave gaps. If you run rsync every hour, you risk losing 59 minutes of uploaded files. In 2017, we use lsyncd (Live Syncing Daemon). It watches your local directory trees via the kernel's inotify event interface and spawns rsync processes to synchronize the changes to a remote disaster recovery VPS.
Install it on your primary CoolVDS instance:
apt-get install lsyncd
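Lsyncd pushes files over SSH, so the primary needs passwordless access to the DR box before anything will sync. A one-time setup sketch — the host, user, and key path here are placeholders, adjust them for your environment:

```shell
#!/bin/bash
# setup_dr_keys.sh -- one-time key exchange between primary and DR server.
# Host, user, and key path are examples, not real infrastructure.
setup_dr_keys() {
    local dr_host="dr-user@10.0.0.5"
    local key="/root/.ssh/id_rsa_dr"
    # Generate a dedicated key for DR sync (no passphrase, so lsyncd can use it)
    [ -f "$key" ] || ssh-keygen -t rsa -b 4096 -f "$key" -N ""
    # Install the public key on the DR server
    ssh-copy-id -i "${key}.pub" "$dr_host"
    # Verify that non-interactive login works before handing it to lsyncd
    ssh -i "$key" -o BatchMode=yes "$dr_host" 'echo DR link OK'
}
# Run once by hand: setup_dr_keys
```

Use a dedicated key rather than your personal one, so you can revoke the DR link without locking yourself out.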
Configure /etc/lsyncd/lsyncd.conf.lua to mirror your web root to your secondary server. This setup assumes you have SSH key-based auth set up between servers.
settings {
    logfile    = "/var/log/lsyncd/lsyncd.log",
    statusFile = "/var/log/lsyncd/lsyncd.status"
}

sync {
    default.rsync,
    source = "/var/www/html",
    target = "dr-user@10.0.0.5:/var/www/html",
    delete = false, -- Safety first! Don't propagate deletes instantly in DR.
    delay  = 1,
    rsync  = {
        binary   = "/usr/bin/rsync",
        archive  = true,
        compress = true,
        _extra   = { "--bwlimit=2000" } -- Don't saturate the NIC
    }
}
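Once lsyncd is running, keep an eye on its status file: if the queue of pending changes keeps growing, your DR copy is drifting behind. A small check along these lines — the "There are N delays" line format is my reading of lsyncd's status report, and the threshold is arbitrary:

```shell
# check_lsyncd_backlog: warn when lsyncd has a large queue of unsynced changes.
# Reads the statusFile configured above; the threshold (100) is arbitrary.
check_lsyncd_backlog() {
    local status_file="$1" max_delays="${2:-100}" delays
    if [ ! -f "$status_file" ]; then
        echo "CRITICAL: status file missing -- is lsyncd running?"
        return 2
    fi
    # lsyncd's status report contains a line like "There are 3 delays"
    delays=$(grep -oE '[0-9]+ delays' "$status_file" | grep -oE '[0-9]+' | head -n1)
    delays=${delays:-0}
    if [ "$delays" -gt "$max_delays" ]; then
        echo "WARNING: $delays queued changes -- DR copy is lagging"
        return 1
    fi
    echo "OK: $delays queued changes"
}
# Example: check_lsyncd_backlog /var/log/lsyncd/lsyncd.status
```

Drop it into cron or your monitoring agent so a silent sync failure doesn't go unnoticed for weeks.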
Pro Tip: Notice delete = false. If an attacker wipes your primary server, you do not want lsyncd to instantly wipe your backup. Use a separate script to clean up old files on the DR server during a maintenance window.
Step 2: MySQL 5.7 GTID Replication
Database dumps are too slow for recovery. By the time you restore a 50GB dump, your customers have already moved to a competitor. We need replication. MySQL 5.7 (the current stable standard) makes this much easier with Global Transaction Identifiers (GTIDs). No more messing with log file positions.
On the Primary (Master) Server my.cnf:
[mysqld]
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
gtid_mode = ON
enforce_gtid_consistency = ON
log_slave_updates = ON
On the DR (Slave) Server my.cnf:
[mysqld]
server-id = 2
relay_log = /var/log/mysql/mysql-relay-bin.log
gtid_mode = ON
enforce_gtid_consistency = ON
read_only = ON  # Reject stray writes until this node is deliberately promoted
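The user-creation and seeding step described in the next paragraph can be sketched concretely. The IPs and credentials below are the placeholders used throughout this article, and the commands assume a box that can reach both servers:

```shell
#!/bin/bash
# provision_replica.sh -- one-time seed of the DR database server.
# IPs, user, and password are placeholders from the examples in this article.
provision_replica() {
    # 1. Create the replication account on the master
    mysql -h 10.0.0.2 -e "
        CREATE USER 'repl_user'@'10.0.0.5' IDENTIFIED BY 'StrongPassword123!';
        GRANT REPLICATION SLAVE ON *.* TO 'repl_user'@'10.0.0.5';"
    # 2. Dump with GTID info embedded: --set-gtid-purged writes the starting
    #    GTID set into the dump, so the slave knows where to pick up
    mysqldump -h 10.0.0.2 --all-databases --single-transaction \
        --set-gtid-purged=ON > /tmp/seed.sql
    # 3. Load the seed dump on the slave
    mysql -h 10.0.0.5 < /tmp/seed.sql
}
# Run once: provision_replica
```

The --single-transaction flag keeps the dump consistent without locking your InnoDB tables for the duration.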
After restarting both services, create a replication user on the master and dump the data to the slave. Then, on the slave, simply run:
CHANGE MASTER TO
MASTER_HOST='10.0.0.2',
MASTER_USER='repl_user',
MASTER_PASSWORD='StrongPassword123!',
MASTER_AUTO_POSITION = 1;
START SLAVE;
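Before trusting the standby, verify that replication is actually flowing. A small health check that parses the output of SHOW SLAVE STATUS — the parsing logic is my own sketch, the field names are standard MySQL:

```shell
# check_replication: parse SHOW SLAVE STATUS output and report health.
# Feed it the output of: mysql -e 'SHOW SLAVE STATUS\G'
check_replication() {
    local status="$1" io sql lag
    io=$(echo "$status"  | awk '/Slave_IO_Running:/ {print $2}')
    sql=$(echo "$status" | awk '/Slave_SQL_Running:/ {print $2}')
    lag=$(echo "$status" | awk '/Seconds_Behind_Master:/ {print $2}')
    if [ "$io" = "Yes" ] && [ "$sql" = "Yes" ]; then
        echo "OK: replication running, ${lag}s behind master"
    else
        echo "CRITICAL: replication broken (IO=$io SQL=$sql)"
    fi
}
# Example:
# check_replication "$(mysql -e 'SHOW SLAVE STATUS\G')"
```

Seconds_Behind_Master is also the number that feeds directly into your RPO: if it regularly climbs above 15 minutes, you are not meeting a 15-minute RPO.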
Now you have a hot standby. If the primary node fails, you can promote the slave to master in seconds. With CoolVDS's private networking, this replication traffic doesn't eat into your public bandwidth quota and remains secure from the public internet.
Step 3: The "Oh Sh*t" Automated Recovery Script
When disaster strikes, your hands will be shaking. You do not want to be typing complex commands. You want a big red button. I use Ansible for this, but a well-tested Bash script works for smaller deployments.
Here is a snippet of a recovery script that switches DNS (using a theoretical API, or CloudFlare if you use them) and promotes the DB:
#!/bin/bash
# promote_dr.sh - USE WITH CAUTION
set -e
echo "Stopping slave replication..."
mysql -e "STOP SLAVE; RESET SLAVE ALL;"
echo "Setting server to read-write..."
mysql -e "SET GLOBAL read_only = OFF;"
echo "Updating DNS records..."
# This assumes you have a CLI tool for your DNS provider.
# Point the record at the DR server's PUBLIC address, not its private one.
/usr/local/bin/update-dns --record www.example.com --ip 203.0.113.10
echo "DR site is now LIVE. Check the logs."
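Because hands shake, I would also gate the big red button. A confirmation prompt like this at the top of the script keeps a fat-fingered enter key from triggering a full failover — the magic word is arbitrary, pick your own:

```shell
# confirm_promotion: require the operator to type a magic word before the
# failover proceeds. "PROMOTE" is an arbitrary choice.
confirm_promotion() {
    local answer
    read -r -p "Type PROMOTE to fail over to the DR site: " answer
    if [ "$answer" != "PROMOTE" ]; then
        echo "Aborted. Nothing was changed."
        return 1
    fi
}
# Put this near the top of promote_dr.sh:
# confirm_promotion || exit 1
```

Typing a full word forces a half-second of deliberation, which is exactly what you want at 3 AM.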
Why Infrastructure Choice Matters
You can script until your fingers bleed, but if the underlying hypervisor is unstable, you are building on sand. This is why I prefer KVM-based virtualization, which is standard on CoolVDS.
Unlike OpenVZ (common in cheaper hosting), where every container shares the host's kernel, KVM gives each guest its own kernel and true isolation. If a "noisy neighbor" panics their OS, your instance keeps humming. For DR, I recommend provisioning a CoolVDS Storage Optimized instance. They offer higher disk density, which is perfect for keeping archival logs and backup snapshots without breaking the bank.
The 2017 Checklist for Norwegian DR
| Component | Strategy | Tool |
|---|---|---|
| File System | Real-time Mirroring | Lsyncd + Rsync |
| Database | Hot Standby | MySQL 5.7 GTID Replication |
| Off-site Backup | Encrypted Archives | Duplicity (GPG encrypted) |
| Infrastructure | Isolated Kernel | CoolVDS KVM |
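The off-site backup row deserves a concrete example too, since replication happily replicates a DROP TABLE. A minimal Duplicity sketch — the GPG key ID, paths, target host, and retention window are all placeholders:

```shell
#!/bin/bash
# offsite_backup.sh -- GPG-encrypted Duplicity archive to a third location.
# Key ID, paths, target, and retention are placeholders; adjust before use.
offsite_backup() {
    local gpg_key="0xDEADBEEF"                           # your GPG key ID
    local target="sftp://dr-user@10.0.0.5//backups/www"  # off-site target
    # Start a fresh full chain weekly, incrementals in between
    duplicity --full-if-older-than 7D --encrypt-key "$gpg_key" \
        /var/www/html "$target"
    # Keep roughly four weeks of history
    duplicity remove-older-than 28D --force "$target"
}
# Call offsite_backup from a nightly cron job.
```

Unlike replication, these archives are point-in-time: a mistake made on Tuesday can still be undone from Monday's chain.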
Final Thoughts
Disaster recovery is not a product you buy; it is a process you practice. Set a calendar reminder for next Tuesday. Pretend your primary server just vanished. Can you restore it? If the answer is "maybe," you have work to do.
Start by spinning up a secondary instance on CoolVDS today. The cost of a second VPS is negligible compared to the cost of telling your CEO that the data is gone forever.