The 3:00 AM Kernel Panic
If you have been in this industry long enough, you know the sound. It is not a siren; it is the vibrating hum of a phone on a nightstand. It is 3:14 AM. Your primary database node just went dark. SSH times out. The monitoring dashboard is a sea of red.
After October's massive DDoS attack on Dyn's DNS infrastructure, the illusion of internet stability has been shattered. We are entering 2017 with a new reality: redundancy is not an upsell; it is survival.
Too many sysadmins in Oslo think a nightly tar.gz dumped to an FTP server is a Disaster Recovery (DR) plan. It is not. That is an archive. A DR plan is measured in RTO (Recovery Time Objective, how long you can afford to be down) and RPO (Recovery Point Objective, how much data you can afford to lose). If it takes you six hours to provision a new VPS, install the LEMP stack, and unzip that archive, you have lost a day's revenue.
This guide is for the operators who need to get back online in minutes, not hours. We will look at building a warm standby using standard tools available in Ubuntu 16.04 LTS.
The Compliance Minefield: Datatilsynet & The Coming GDPR
Before we touch the config files, we must address the legal architecture. With the General Data Protection Regulation (GDPR) looming on the horizon for 2018 enforcement, and the current Norwegian Personal Data Act (Personopplysningsloven) in full effect, where you put your failover matters.
Using a US-based cloud for your backup might seem cheap, but with the death of Safe Harbor and the shakiness of the Privacy Shield framework, you are exposing your company to legal liability. Data sovereignty is critical.
Architect's Note: Keep your primary and failover nodes within Norwegian borders or the EEA. Hosting with CoolVDS ensures your data sits in Oslo, physically secured and legally compliant with Norwegian law. Low latency to NIX (Norwegian Internet Exchange) is just a bonus.
Step 1: The Infrastructure (KVM vs. OpenVZ)
For DR, the virtualization type matters. Many budget hosts use OpenVZ. In a disaster scenario, you might need to load kernel modules for specific filesystems or firewall frameworks (iptables/nftables); a shared kernel takes that option away.
We only use KVM at CoolVDS. You need a dedicated kernel. You need the ability to mount an ISO if the bootloader gets corrupted. Do not gamble your recovery on a container that depends on the host node's kernel version.
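Not sure what your current provider actually sold you? A quick sanity check from inside the guest (assuming a systemd-based distro such as Ubuntu 16.04):

```bash
# Detect the virtualization technology from inside the guest.
# On KVM this prints "kvm"; on container platforms it prints "openvz", "lxc", etc.
systemd-detect-virt

# hostnamectl shows the same information in a human-readable summary.
hostnamectl | grep -i virtualization
```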
Step 2: Database Replication (MySQL 5.7)
Files are easy; databases are hard. We will set up Master-Slave replication, which gives you a warm standby: if the master dies, you promote the slave.
On the Primary (Master) - /etc/mysql/mysql.conf.d/mysqld.cnf:
```ini
[mysqld]
bind-address = 0.0.0.0
server-id = 1
log_bin = /var/log/mysql/mysql-bin.log
binlog_format = ROW
# Durability settings for 2016 hardware
innodb_flush_log_at_trx_commit = 1
sync_binlog = 1
```
Restart MySQL and create a replication user:
```sql
mysql> CREATE USER 'replicator'@'10.0.0.%' IDENTIFIED BY 'StrongPass_2016!';
mysql> GRANT REPLICATION SLAVE ON *.* TO 'replicator'@'10.0.0.%';
mysql> FLUSH PRIVILEGES;
```
On the Failover (Slave) - CoolVDS Instance B:
```ini
[mysqld]
bind-address = 0.0.0.0
server-id = 2
log_bin = /var/log/mysql/mysql-bin.log
relay-log = /var/log/mysql/mysql-relay-bin.log
read_only = 1
```
The read_only = 1 flag is your safety net. It prevents your application from accidentally writing to the backup node while the master is alive.
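Before the slave can replicate anything, it needs a consistent copy of the master's data plus the binlog coordinates to start from. A rough seeding sketch follows; the 10.0.0.1/10.0.0.2 addresses, the root credentials, and the MASTER_LOG_FILE/POS values are placeholders (read the real coordinates from your own dump):

```bash
#!/bin/bash
# Rough seeding sketch -- run step 1 on the MASTER, steps 2-4 on the SLAVE.

# 1. On the master: consistent snapshot with binlog coordinates embedded.
mysqldump -u root -p --all-databases --single-transaction \
  --master-data=2 --routines --triggers > /tmp/seed.sql

# 2. Copy the dump over and import it on the slave.
scp /tmp/seed.sql root@10.0.0.2:/tmp/
# (on the slave) mysql -u root -p < /tmp/seed.sql

# 3. On the slave: point replication at the master.
#    Read the real MASTER_LOG_FILE / MASTER_LOG_POS from the dump:
#    grep 'CHANGE MASTER' /tmp/seed.sql
mysql -u root -p -e "
CHANGE MASTER TO
  MASTER_HOST='10.0.0.1',
  MASTER_USER='replicator',
  MASTER_PASSWORD='StrongPass_2016!',
  MASTER_LOG_FILE='mysql-bin.000001',
  MASTER_LOG_POS=154;
START SLAVE;"

# 4. Both replication threads should report 'Yes' and lag should trend to 0.
mysql -u root -p -e "SHOW SLAVE STATUS\G" | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master'
```

With --master-data=2 the coordinates land in the dump as a comment rather than being executed on import, which is what you want when you issue CHANGE MASTER TO by hand.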
Step 3: Asset Synchronization
Database replication handles the data, but what about user uploads? rsync is still the king here. Don't overcomplicate it with distributed file systems like GlusterFS unless you have a team to manage the split-brain scenarios.
Run this via cron every 5 minutes on the Master:
```bash
#!/bin/bash
# Simple DR Sync Script
SRC="/var/www/html/uploads/"
DEST="user@failover_ip:/var/www/html/uploads/"
rsync -avz --delete -e "ssh -i /root/.ssh/id_rsa_dr" "$SRC" "$DEST"
```
Pro Tip: Network latency kills sync speed. The round-trip time (RTT) between two CoolVDS instances in our Oslo datacenter is negligible (<1ms). If you are syncing to a server in Frankfurt, that latency adds up across thousands of small PHP or image files.
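As for wiring up the five-minute schedule, a cron entry along these lines does the job; the script path is an assumption, drop it wherever you keep your ops scripts:

```bash
# /etc/cron.d/dr-sync -- run the sync every 5 minutes as root
# (assumes the script above is saved as /usr/local/bin/dr-sync.sh and is executable)
*/5 * * * * root /usr/local/bin/dr-sync.sh >> /var/log/dr-sync.log 2>&1
```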
The NVMe Factor
Here is the bottleneck nobody talks about: Restore IOPS.
When you trigger a failover, your caches are cold. Your InnoDB buffer pool is empty. The server has to pull data from disk for every request until the RAM warms up. On a VPS backed by spinning HDDs, your site will be up, but it will be unresponsive due to I/O wait.
This is why we standardized on NVMe storage for all CoolVDS plans. NVMe handles the random read/write spikes of a cold boot 5x-10x faster than standard SSDs via SATA. In a disaster, speed isn't a luxury; it's the difference between a 30-second blip and a 10-minute outage.
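Hardware aside, MySQL 5.7 can also shorten the warm-up window by persisting the buffer pool contents across restarts. The dump/load settings already default to ON in current 5.7 releases, but being explicit in the failover node's config costs nothing; the 75% figure below is just a suggestion, not a requirement:

```ini
# my.cnf on the failover node -- keep the buffer pool warm across restarts
[mysqld]
innodb_buffer_pool_dump_at_shutdown = 1
innodb_buffer_pool_load_at_startup  = 1
innodb_buffer_pool_dump_pct         = 75   # persist the hottest 75% of pages
```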
The Failover Procedure
Automated failover is dangerous. It invites "flapping," where both servers think they are the master, and that is how you end up with data corruption. In 2016, manual promotion is still the safest bet for small-to-mid-sized teams.
To Promote the Slave:
- Log into the Slave CoolVDS instance.
- Stop the slave (after confirming it has caught up; see the check sketched after this list):
  ```sql
  mysql> STOP SLAVE;
  ```
- Disable read-only mode:
  ```sql
  mysql> SET GLOBAL read_only = OFF;
  ```
- Update your DNS or load balancer (HAProxy/Nginx) to point to the new IP.
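The catch-up check referenced above, as one possible sketch: make sure the SQL thread has applied everything the IO thread managed to pull down before the master died.

```bash
# On the slave, before STOP SLAVE:
mysql -u root -p -e "SHOW SLAVE STATUS\G" \
  | grep -E 'Slave_(IO|SQL)_Running|Seconds_Behind_Master|Read_Master_Log_Pos|Exec_Master_Log_Pos'
# Exec_Master_Log_Pos should match Read_Master_Log_Pos (i.e. the relay log is
# fully applied) before you cut over -- otherwise you are promoting a copy
# that is behind what the dead master had already acknowledged.
```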
Comparison: CoolVDS vs. Legacy Hosting
| Feature | Legacy VPS | CoolVDS Architecture |
|---|---|---|
| Virtualization | OpenVZ (Shared Kernel) | KVM (Full Isolation) |
| Storage | SATA SSD / HDD | NVMe (PCIe) |
| Location | Often Central Europe | Oslo, Norway (Low Latency) |
| DDoS Protection | Null-route (Offline) | Advanced Mitigation |
Summary
Disaster recovery isn't about buying a second server; it's about engineering reliability. You need valid backups, replication, and the underlying hardware performance to handle the load when the worst happens.
Do not wait for the next botnet to test your infrastructure. Build your redundancy on a platform that respects the physics of I/O and the laws of Norway.
Ready to harden your stack? Deploy a KVM NVMe instance on CoolVDS today and configure your warm standby before the next 3:00 AM call.