Console Login

Zero Downtime: Architecting Synchronous MySQL Replication with Galera Cluster

The 3 AM Wake-Up Call You Can Avoid

If you have been in operations long enough, you know the sound of a pager going off when a primary database master dies. It is never 2 PM on a Tuesday. It is always 3 AM on a Saturday. In the traditional master-slave setups we have relied on for years, failover is a terrifying manual dance. You have to promote a slave, update application config files, and pray you did not lose the last few seconds of transactions during the crash.

It is 2014. We can do better than shell scripts and prayer. While asynchronous replication (standard MySQL) is fine for offloading reads, it does not guarantee data consistency. For true High Availability (HA), we need synchronous replication.

Today, I am walking you through a battle-tested setup using MariaDB 10.0 with Galera Cluster. This is not theoretical; this is the exact architecture we recently deployed for a high-traffic e-commerce client in Oslo who could not afford a single second of downtime during the upcoming holiday sales.

Why Galera? (And Why Network Latency Matters)

Galera is a multi-master solution. This means you can read and write to any node in the cluster. If one node fails, the others keep working without data loss. However, there is a catch: Physics.

In a synchronous setup, a transaction is not committed until all nodes agree. If you have a node in Oslo and another in a cheap datacenter with jittery routing in Amsterdam, your write speed drops to the speed of that slow link. The cluster waits for the slowest node.

Pro Tip: For a Galera cluster, network stability is more critical than raw CPU. We host our reference clusters on CoolVDS instances because their peering at NIX (Norwegian Internet Exchange) ensures single-digit millisecond latency between nodes, which is absolutely mandatory for synchronous replication. Do not try this over public internet links with high packet loss.

Step 1: The Environment

We will use three nodes to avoid split-brain scenarios (quorum requires an odd number). We are using CentOS 6.5, which is standard for enterprise deployments right now.

  • Node 1: 10.0.0.1 (db1)
  • Node 2: 10.0.0.2 (db2)
  • Node 3: 10.0.0.3 (db3)

Step 2: Repository and Installation

First, stop using the default repositories. They are outdated. We need the official MariaDB 10.0 repo.

[mariadb] name = MariaDB baseurl = http://yum.mariadb.org/10.0/centos6-amd64 gpgkey=https://yum.mariadb.org/RPM-GPG-KEY-MariaDB gpgcheck=1

Install the cluster packages on all three nodes:

yum install MariaDB-Galera-server MariaDB-client gallery-4 -y

Step 3: Configuring the Write-Set Replication

This is where the magic happens. We need to edit /etc/my.cnf.d/server.cnf. The default configuration is not tuned for performance. Specifically, check your innodb_buffer_pool_size. On a standard 4GB CoolVDS instance, allocate about 2GB to the pool.

Here is the critical WSREP (Write Set Replication) configuration for Node 1:

[galera] # Mandatory settings wsrep_on=ON wsrep_provider=/usr/lib64/galera/libgalera_smm.so wsrep_cluster_address="gcomm://10.0.0.1,10.0.0.2,10.0.0.3" binlog_format=row default_storage_engine=InnoDB innodb_autoinc_lock_mode=2 # Cluster Name wsrep_cluster_name="coolvds_norway_cluster" # Node specific (Change this for Node 2 and 3) wsrep_node_address="10.0.0.1" wsrep_node_name="db1" # Tuning for 1Gbps Links wsrep_slave_threads=4 wsrep_sst_method=rsync

Step 4: Bootstrapping the Cluster

The most common mistake I see is trying to start all nodes at once. You must bootstrap the first node to create the new cluster UUID.

On Node 1 only:

/etc/init.d/mysql bootstrap

Once Node 1 is running, you can start Node 2 and Node 3 normally:

/etc/init.d/mysql start

Check the status to ensure they have joined:

mysql -u root -p -e "SHOW STATUS LIKE 'wsrep_cluster_size'" +--------------------+-------+ | Variable_name | Value | +--------------------+-------+ | wsrep_cluster_size | 3 | +--------------------+-------+

If you see '3', congratulations. You have a multi-master cluster.

Step 5: Load Balancing with HAProxy

Even though you can write to any node, it is often best practice to write to one and read from many, or use a smart load balancer to handle failed nodes. Application drivers are often terrible at failover.

We use HAProxy 1.5 (released recently stable) to manage this.

listen mysql-cluster 0.0.0.0:3306 mode tcp balance roundrobin option tcpka option mysql-check user haproxy_check server db1 10.0.0.1:3306 check weight 1 server db2 10.0.0.2:3306 check weight 1 server db3 10.0.0.3:3306 check weight 1

The Storage Bottleneck

Replication generates significant disk I/O. Every write has to be written to the binary log and the storage engine. If you are running this on legacy spinning rust (HDDs), your "wait" times will skyrocket, causing the cluster to throttle itself (Flow Control) to let the slow node catch up.

This is why we standardized on Pure SSD storage for all VPS tiers. In our benchmarks, SSD-backed instances handled 400% more transactions per second (TPS) in a Galera cluster compared to traditional SAS SANs often found in budget hosting.

Data Sovereignty and Compliance

Working in Norway, we have to respect the Personopplysningsloven (Personal Data Act). Data processing agreements are getting stricter, and relying on US-based cloud giants creates legal gray areas regarding Safe Harbor. Hosting your database cluster on Norwegian soil, within the jurisdiction of Datatilsynet, removes that headache entirely.

Final Thoughts

Building a cluster is easy; keeping it healthy requires robust infrastructure. A split-brain scenario caused by a flapping network interface will ruin your day. Do not cheap out on the underlying network.

Ready to test this architecture? Deploy a 3-node SSD VPS environment on CoolVDS in under 60 seconds and start building a database layer that actually stays up.