Zero-Downtime Database Migration: A Survival Guide for Norwegian DevOps

There is no sound louder than the silence of a production database that refuses to start after a migration. I have been there. It is 3:00 AM, the coffee is cold, and systemctl status postgresql is glaring back at you in red. Migrations are the open-heart surgery of systems administration. One wrong move with a PID file or a permission bit, and your e-commerce platform bleeds revenue.

If you are reading this, you are likely planning to move terabytes of critical data. Maybe you are fleeing a legacy dedicated server, or perhaps you are repatriating data to Norway to satisfy Datatilsynet’s strict interpretation of GDPR and Schrems II. Whatever the reason, the goal is binary: zero data loss, near-zero downtime.

This guide is not theoretical. It covers the specific, battle-tested strategies for migrating high-load databases (PostgreSQL 17 and MySQL 8.4 LTS contexts) within the Nordic infrastructure landscape. We will focus on the "Replicate and Switch" method, which remains the gold standard in late 2025.

The Architecture of Trust: Why Latency Kills Migrations

Before we touch a single config file, we must address the physical layer. Database replication relies heavily on network throughput and latency. If you are migrating a 500GB dataset from a server in Frankfurt to a VPS in Oslo, the speed of light matters. High latency during the synchronization phase leads to replication lag. If the lag exceeds the rate of incoming writes, you will never catch up. You will be stuck in a permanent state of "almost synced."

Pro Tip: Always test the network path before starting. Use mtr (My Traceroute) to check for packet loss between your source and destination. If you see jitter >10ms within Norway, abort. Stability is a prerequisite, not a luxury.
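A quick sanity check might look like this (the destination hostname is a placeholder):

# Send 100 probes per hop and print a report; look for packet loss and jitter
mtr --report --report-cycles 100 your-destination-host.example.com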

This is where infrastructure choice dictates success. At CoolVDS, we engineer our network stack specifically for the Nordic region, peering directly at NIX (Norwegian Internet Exchange). When your destination server sits on high-performance NVMe storage with optimized local peering, the replication stream writes to disk as fast as the network delivers it. Spinning rust (HDDs) simply cannot handle the random write IOPS required during the catch-up phase of a live migration.

Phase 1: The Setup (PostgreSQL Focus)

Let's assume a standard scenario: Migrating a PostgreSQL 16 cluster to a fresh CoolVDS instance running PostgreSQL 17. We will use logical replication, which allows us to migrate between major versions with minimal downtime.

1. Prepare the Source (The "Old" Server)

First, ensure your source database allows the new server to connect for replication. You need to modify pg_hba.conf. Note that logical replication connections are matched against the database they connect to, not against the replication keyword. Do not open this to 0.0.0.0/0. Be precise.

# /etc/postgresql/16/main/pg_hba.conf
host    my_database    replicator    192.168.1.50/32    scram-sha-256
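The subscription later in this guide connects as a dedicated replicator user. If that role does not exist yet, create it on the source. A minimal sketch, assuming your tables live in the public schema and reusing the placeholder password from the examples below:

CREATE ROLE replicator WITH LOGIN REPLICATION PASSWORD 'SecretPassword';
-- The initial table copy requires SELECT on every published table
GRANT SELECT ON ALL TABLES IN SCHEMA public TO replicator;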

Next, check your `postgresql.conf`. You need wal_level set to logical. If this wasn't already set, you will need a restart. This is the only downtime required on the source until the final switch.

wal_level = logical
max_replication_slots = 10
max_wal_senders = 10
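After the restart, confirm the settings actually took effect before moving on:

-- Run in psql on the source
SHOW wal_level;          -- expect 'logical'
SHOW max_wal_senders;    -- expect 10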

2. The Destination Environment

On your CoolVDS instance, infrastructure tuning is critical before you install the DB software. We want to maximize disk I/O.

Here is a snippet to optimize the kernel for high-throughput database workloads. This goes into /etc/sysctl.conf:

# Optimize for Database Workloads (NVMe targets)
vm.swappiness = 1
vm.dirty_background_ratio = 5
vm.dirty_ratio = 15

# Increase network buffer sizes for faster replication
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Apply with sysctl -p. These settings prevent the OS from aggressively swapping database pages and allow larger TCP windows for the replication stream.

Phase 2: The Replication Stream

We avoid `pg_dump` for the main migration because it freezes the state of the DB at the start of the dump. For a 1TB database, that restore takes hours. During those hours, your production app is either down or drifting. Instead, we use Logical Replication.

Step 1: Create Publication on Source

CREATE PUBLICATION my_migration_pub FOR ALL TABLES;

Step 2: Schema Migration

Logical replication does not copy DDL (Data Definition Language). You must dump the schema only and restore it on the destination.

pg_dump -h source_ip -U postgres --schema-only --no-publications my_database | psql -h dest_ip -U postgres my_database

Step 3: Create Subscription on Destination (CoolVDS)

This command starts the data flow. The destination will pull all existing data and then stay in sync with new changes.

CREATE SUBSCRIPTION my_migration_sub
CONNECTION 'host=source_ip port=5432 user=replicator password=SecretPassword dbname=my_database'
PUBLICATION my_migration_pub
WITH (copy_data = true);

Watch the logs. You should see the initial sync start. On a CoolVDS NVMe instance, you will notice the I/O wait is negligible compared to standard cloud SSDs. This is where hardware quality shines.
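If you prefer catalog views to log files, two queries give a live picture of the sync; a sketch, run where indicated:

-- On the destination: per-table copy state ('r' means ready and streaming)
SELECT srrelid::regclass AS table_name, srsubstate
FROM pg_subscription_rel;

-- On the source: how far the subscriber lags behind, in bytes
SELECT application_name, state,
       pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes
FROM pg_stat_replication;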

Phase 3: Connection Pooling and the Switchover

The scariest part of migration is changing the application configuration to point to the new IP. DNS propagation is too slow. VIPs (Virtual IPs) are great but require complex routing. The pragmatic solution for 2025? PgBouncer.

By routing your application traffic through a local PgBouncer instance, you can "pause" traffic, switch the backend configuration, and resume. The application just sees a momentary latency spike, not a connection error.

Here is a robust pgbouncer.ini configuration for the switchover phase:

[databases]
# Initially points to Source
my_db = host=10.0.0.1 port=5432 dbname=my_database

[pgbouncer]
listen_port = 6432
listen_addr = *
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = transaction
server_reset_query = DISCARD ALL
max_client_conn = 1000
default_pool_size = 20
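The auth_file referenced above maps users to password hashes. With auth_type = md5, each line holds a quoted username and the string md5 followed by the MD5 hex digest of the password concatenated with the username. A sketch with placeholder credentials:

# Generate the hash for user 'app_user' with password 'SecretPassword'
echo -n "SecretPasswordapp_user" | md5sum

# Then add a line like this to /etc/pgbouncer/userlist.txt:
# "app_user" "md5<digest printed by the command above>"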

The Switchover Procedure

  1. Pause the Pool: PAUSE my_db; (in PgBouncer console). Traffic queues up.
  2. Verify Sync: Check pg_stat_replication on source. LSN difference should be zero.
  3. Stop Source: Prevent any accidental writes.
  4. Promote Destination: Update sequences (logical replication does not replicate sequence values; reset them on the new primary with setval).
  5. Reconfigure PgBouncer: Edit the config to point to the new CoolVDS IP.
  6. Reload and Resume: RELOAD; RESUME my_db;

Total write downtime? Often less than 5 seconds.
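A condensed sketch of the six steps above, assuming PgBouncer runs next to the application and your user is listed in admin_users; new_coolvds_ip, old-db-server and the sequence name are placeholders:

# 1. Pause the pool: clients queue instead of erroring
psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "PAUSE my_db;"

# 2. On the source: confirm the subscriber has replayed everything (lag should be 0)
psql -h 10.0.0.1 -U postgres -d my_database -c \
  "SELECT pg_wal_lsn_diff(pg_current_wal_lsn(), replay_lsn) AS lag_bytes FROM pg_stat_replication;"

# 3. Stop the source so no stray writes slip in
ssh old-db-server 'systemctl stop postgresql'

# 4. On the destination: reset sequence values (example for one hypothetical sequence)
psql -h new_coolvds_ip -U postgres -d my_database -c \
  "SELECT setval('orders_id_seq', (SELECT max(id) FROM orders));"

# 5 & 6. Point PgBouncer at the new backend, then reload and resume
sed -i 's/host=10.0.0.1/host=new_coolvds_ip/' /etc/pgbouncer/pgbouncer.ini
psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "RELOAD;"
psql -h 127.0.0.1 -p 6432 -U pgbouncer -d pgbouncer -c "RESUME my_db;"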

Security and Compliance (The Norwegian Context)

In 2025, you cannot ignore the legal layer. If you are handling Norwegian citizen data, storing it on US-owned cloud infrastructure presents a compliance risk due to the CLOUD Act. Moving to a sovereign Norwegian provider like CoolVDS mitigates this.

However, migration opens a temporary vulnerability window. Ensure your replication traffic is encrypted. If you aren't using a VPN or private VPC peering, you must enforce SSL.

# On the Destination (in the CREATE SUBSCRIPTION statement)
CONNECTION 'host=... sslmode=verify-full sslrootcert=/etc/ssl/certs/ca.crt ...'

We see too many "temporary" migrations where security is disabled "just to get it working." That is how data leaks happen. Since the subscriber connects to the publisher, configure the firewall on the source (UFW or nftables) so that port 5432 accepts connections only from the new server's IP.

# Simple UFW rule for the pragmatic admin (run on the source; 192.0.2.15 is the new server)
ufw allow from 192.0.2.15 to any port 5432 proto tcp comment 'Allow replication from new DB server'

Troubleshooting: When Things Go Wrong

Even with perfect planning, ghosts exist in the machine. A common issue during the initial copy phase is the source discarding WAL that the subscriber still needs: the replication slot gets invalidated, or the logs report that a required WAL segment has already been removed. This happens under heavy write load when WAL retention is capped and the slot cannot hold back the segments long enough, so the subscriber loses its place in the stream.

Solution: You might need to temporarily increase max_slot_wal_keep_size in PostgreSQL 13+ (still valid in v17).

ALTER SYSTEM SET max_slot_wal_keep_size = '100GB';
SELECT pg_reload_conf();
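To see how much WAL a slot is currently pinning, and how close it is to that ceiling, check the source while the copy runs:

-- On the source: retained WAL per slot; wal_status of 'unreserved' or 'lost' means trouble
SELECT slot_name, active, wal_status,
       pg_size_pretty(pg_wal_lsn_diff(pg_current_wal_lsn(), restart_lsn)) AS retained_wal
FROM pg_replication_slots;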

Another issue is CPU Steal. If you are migrating to a noisy multi-tenant cloud, your CPU cycles might get stolen by a neighbor mining crypto. This slows down the decryption of SSL traffic during replication. CoolVDS uses strict KVM isolation to prevent this. We guarantee dedicated CPU cycles so your hashing throughput remains consistent.
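If you suspect a noisy neighbour, the steal column in vmstat settles it; values consistently above a few percent mean another tenant is eating your cycles:

# Sample CPU counters once per second, five times; watch the 'st' (steal) column
vmstat 1 5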

Final Thoughts

Database migration is 20% technology and 80% preparation. The tools available in 2025 make it easier than ever, but they do not remove the risk. They just shift it. By choosing a provider that offers raw, unadulterated NVMe I/O and low-latency peering within Norway, you remove the hardware bottleneck from the equation.

Don't let slow I/O or network jitter compromise your data integrity. If you are planning a move, verify your target environment first.

Ready to test your replication throughput? Deploy a high-performance instance on CoolVDS in under 55 seconds and ping oslo.ix to see the difference.