Database Sharding in 2019: Survival Strategies for High-Traffic Norwegian Workloads

There is a specific feeling of dread that every senior sysadmin knows. It usually happens around 20:00 on a peak traffic day. Your primary database server—the one you already upgraded to 64GB RAM and the fastest SSDs available—is sitting at load average 40.0. The iowait is climbing. Queries that usually take 50ms are taking 5 seconds.

You have hit the vertical scaling wall.

In the Nordic hosting market, where we pride ourselves on stability and speed, throwing more hardware at a monolithic database eventually stops working. The next logical step is sharding (horizontal partitioning). But let's be honest: sharding is a nightmare of complexity. It breaks referential integrity, complicates backups, and makes your application logic significantly heavier. You should only do it when you have no other choice.

This guide covers how to shard without destroying your data consistency, specifically focusing on the tools available to us right now in early 2019.

The Architecture of Failure: Why Vertical Scaling Dies

Before we cut the database into pieces, you must understand why your current setup is failing. It's usually I/O contention. Even with the NVMe storage we standardise on at CoolVDS, a single kernel can only schedule so many disk operations and context switches before lock contention sets in.

When you shard, you split your data across multiple instances (shards). Each shard holds a subset of the data. This allows you to scale writes linearly. If one CoolVDS KVM instance can handle 5,000 write IOPS, two can handle 10,000.

Strategy 1: Application-Level Sharding

This is the "brute force" method where your application code decides which database node to talk to. It is the most flexible but requires significant code refactoring.

The Routing Logic

You need a Shard Key. For a SaaS platform serving Norwegian businesses, the customer_id or organization_id is a natural choice. This keeps all data for one customer on the same physical node, preserving local joins.

Here is a basic hash-and-modulo router in Python (3.6+ for the f-string syntax), which you might run in your middleware layer:

import hashlib

class ShardRouter:
    def __init__(self, shards):
        # shards = ['db-shard-01', 'db-shard-02', 'db-shard-03']
        self.shards = shards

    def get_shard(self, routing_key):
        # Create a deterministic hash of the key
        hash_value = int(hashlib.md5(str(routing_key).encode('utf-8')).hexdigest(), 16)
        
        # Modulo operation determines the index
        shard_index = hash_value % len(self.shards)
        return self.shards[shard_index]

# Usage
router = ShardRouter(['10.0.0.101', '10.0.0.102', '10.0.0.103'])
customer_shard = router.get_shard(41992)  # customer_id used as the routing key
print(f"Connect to {customer_shard} for this transaction")

The Risk: the moment you add a new shard node, the modulo result changes for nearly every key, your mapping breaks, and you are forced into a large data migration. That is why consistent hashing (placing nodes on a hash ring) is preferred over simple modulo, as sketched below.
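
A minimal sketch of such a ring, reusing the same MD5 hash as the router above (the virtual-node count of 100 is an arbitrary illustration value, not a recommendation):

import bisect
import hashlib

class ConsistentHashRing:
    def __init__(self, shards, virtual_nodes=100):
        # Each physical shard gets several virtual nodes on the ring,
        # so keys spread evenly even with only a handful of shards.
        self._ring = []  # sorted list of (hash_position, shard)
        for shard in shards:
            for i in range(virtual_nodes):
                self._ring.append((self._hash(f"{shard}#{i}"), shard))
        self._ring.sort()
        self._positions = [pos for pos, _ in self._ring]

    @staticmethod
    def _hash(value):
        return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)

    def get_shard(self, routing_key):
        # Walk clockwise around the ring to the first node at or after the key.
        position = self._hash(str(routing_key))
        index = bisect.bisect(self._positions, position) % len(self._ring)
        return self._ring[index][1]

# Adding '10.0.0.104' later only remaps the keys that fall between the new
# node's ring positions and their predecessors -- not every key in the system.
ring = ConsistentHashRing(['10.0.0.101', '10.0.0.102', '10.0.0.103'])
print(ring.get_shard(41992))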

Strategy 2: Native Partitioning (PostgreSQL 11)

If you are running PostgreSQL, version 11 (released in October 2018) added hash partitioning and made declarative partitioning much more usable. It allows the database engine to handle the routing transparently. This is often cleaner than application-side logic.

You can define a parent table and partition it by hash. This setup works beautifully on our high-performance VPS instances because you can mount different partitions on different storage volumes if needed.

-- Parent table
CREATE TABLE traffic_logs (
    log_id uuid NOT NULL,
    server_id int NOT NULL,
    log_data jsonb,
    created_at timestamp
) PARTITION BY HASH (server_id);

-- Create partitions (Shards)
CREATE TABLE traffic_logs_0 PARTITION OF traffic_logs 
    FOR VALUES WITH (MODULUS 3, REMAINDER 0);

CREATE TABLE traffic_logs_1 PARTITION OF traffic_logs 
    FOR VALUES WITH (MODULUS 3, REMAINDER 1);

CREATE TABLE traffic_logs_2 PARTITION OF traffic_logs 
    FOR VALUES WITH (MODULUS 3, REMAINDER 2);

Pro Tip: Postgres partitioning improves read performance significantly because the query planner knows which partitions to ignore (Partition Pruning). Make sure enable_partition_pruning = on is set in your postgresql.conf.
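
To confirm that pruning actually kicks in, inspect the plan for a query that filters on the partition key. A quick sketch using psycopg2, assuming you have that driver installed (the connection string is a placeholder for your own shard):

import psycopg2

# Placeholder DSN -- point this at one of your PostgreSQL 11 instances.
conn = psycopg2.connect("dbname=logs user=app host=10.0.0.101")

with conn, conn.cursor() as cur:
    # With a literal filter on the partition key, the planner should only
    # reference a single traffic_logs_N partition in the resulting plan.
    cur.execute("EXPLAIN SELECT * FROM traffic_logs WHERE server_id = %s", (42,))
    plan = "\n".join(row[0] for row in cur.fetchall())
    print(plan)
    scanned = [line for line in plan.splitlines() if "traffic_logs_" in line]
    print(f"Partitions touched: {len(scanned)}")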

Infrastructure Matters: The "Noisy Neighbor" Problem

Sharding multiplies your infrastructure footprint. Instead of one big server, you now manage five or ten medium-sized ones. This is where the underlying virtualization technology becomes critical.

If you host your shards on cheap, oversold hosting, you will suffer from "CPU Steal" and fluctuating I/O latency. A database shard is sensitive to latency. If Shard A responds in 2ms but Shard B takes 200ms because a neighbor is mining crypto, your application hangs.

This is why we strictly use KVM (Kernel-based Virtual Machine) at CoolVDS. Unlike OpenVZ or LXC containers, KVM provides hard resource isolation. Your RAM is dedicated. Your CPU cycles are reserved. For a sharded database cluster, consistent latency is more important than raw burst speed.
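
If you suspect steal time is already hurting an existing instance, the Linux kernel exposes it in /proc/stat; a quick sampling sketch (the one-second window is arbitrary):

import time

def cpu_steal_percent(interval=1.0):
    """Rough CPU steal percentage over a short sampling window."""
    def read_cpu_times():
        with open("/proc/stat") as f:
            # First line: 'cpu  user nice system idle iowait irq softirq steal ...'
            return [int(x) for x in f.readline().split()[1:]]

    before = read_cpu_times()
    time.sleep(interval)
    after = read_cpu_times()

    deltas = [b - a for a, b in zip(before, after)]
    total = sum(deltas)
    steal = deltas[7] if len(deltas) > 7 else 0  # 8th field is steal time
    return 100.0 * steal / total if total else 0.0

print(f"CPU steal: {cpu_steal_percent():.1f}%")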

Optimizing MySQL 8.0 for Shards

When deploying multiple MySQL shards on NVMe storage, the default configuration is rarely sufficient. You need to ensure the InnoDB engine is flushing to disk efficiently without blocking threads. Here is a battle-tested configuration snippet for a 16GB RAM shard instance:

[mysqld]
# Basic settings
max_connections = 500
default_authentication_plugin = mysql_native_password

# InnoDB Optimization for NVMe
innodb_buffer_pool_size = 12G
innodb_log_file_size = 2G
innodb_flush_method = O_DIRECT
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000

# Durability vs Performance trade-off (Careful here!)
# Set to 1 for full ACID compliance (Slower)
# Set to 2 for slight risk but higher speed
innodb_flush_log_at_trx_commit = 1

# Thread handling
innodb_read_io_threads = 8
innodb_write_io_threads = 8

Legal & Network Latency: The Norwegian Context

In 2018, GDPR changed how we architect systems. If you are sharding data for Norwegian users, you must know where those shards physically reside. Storing personal data (PII) on a shard in a US data center while the rest stays in Europe can trigger compliance headaches, even under the current EU-US Privacy Shield framework.

Furthermore, latency impacts consensus. If you are running a Galera Cluster or a Raft-based consensus layer, the round-trip time (RTT) between nodes sets the floor for your write latency.

Route                      | Approximate Latency | Impact on Sharding
Oslo to Oslo (Local LAN)   | < 1 ms              | Ideal for synchronous replication.
Oslo to Amsterdam          | ~18 ms              | Acceptable for async replicas.
Oslo to New York           | ~90 ms              | Do not use for synchronous writes.
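
These figures are ballpark values; you can measure your own application-to-shard RTT with a plain TCP connect timer (port 3306 and the shard addresses below are placeholders):

import socket
import time

def tcp_rtt_ms(host, port=3306, samples=5):
    """Median TCP connect time in milliseconds to a shard node."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        with socket.create_connection((host, port), timeout=2):
            pass  # we only care about how long the handshake took
        timings.append((time.perf_counter() - start) * 1000)
    return sorted(timings)[len(timings) // 2]

for shard in ['10.0.0.101', '10.0.0.102', '10.0.0.103']:
    print(f"{shard}: {tcp_rtt_ms(shard):.1f} ms")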

At CoolVDS, our data center is connected directly to the NIX (Norwegian Internet Exchange). This ensures that traffic between your application servers in Oslo and your database shards stays within the country, minimizing hops and latency spikes.

The DevOps Workflow for Sharding

Implementing this isn't just about config files; it's about operations. You cannot back up a sharded cluster with a simple mysqldump per node. You need a snapshot taken across all nodes at the same logical point in time, or a restore will leave cross-shard references logically broken.

  1. Provisioning: Use Terraform or Ansible. Do not manually configure shards. Configuration drift will kill you.
  2. Monitoring: You need aggregated metrics. Watching 10 different htop windows is impossible. We recommend Prometheus + Grafana (which is becoming the standard in 2019) to aggregate metrics like mysql_global_status_threads_running across all shards.
  3. Failover: If Shard 3 dies, what happens? Your application must handle the connection timeout gracefully and perhaps degrade functionality for users on that shard, rather than crashing the whole site (see the sketch below).
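
A minimal sketch of that degradation path, reusing the ShardRouter from Strategy 1 (the PyMySQL driver, credentials, and table names are illustrative assumptions, not a prescribed stack):

import pymysql  # assumption: PyMySQL as the client driver

def fetch_customer_dashboard(customer_id, router):
    """Return dashboard data, or a degraded placeholder if the shard is down."""
    shard_host = router.get_shard(customer_id)
    try:
        conn = pymysql.connect(host=shard_host, user="app", password="secret",
                               database="saas", connect_timeout=2)
    except pymysql.err.OperationalError:
        # Shard unreachable: degrade for this customer's requests instead of
        # returning a 500 for the whole site. Alerting should fire separately.
        return {"status": "degraded", "detail": f"shard {shard_host} unavailable"}

    try:
        with conn.cursor() as cur:
            cur.execute("SELECT name, plan FROM customers WHERE id = %s", (customer_id,))
            row = cur.fetchone()
        return {"status": "ok", "customer": row}
    finally:
        conn.close()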

Conclusion

Sharding is a necessary evil for successful platforms. It introduces operational overhead that requires discipline and robust hardware. You cannot run a high-performance sharded database on budget, oversold containers. You need the predictability of KVM and the speed of NVMe.

If you are planning to split your database, start with a solid foundation. Low latency, data sovereignty in Norway, and raw IO performance are non-negotiable.

Ready to benchmark your sharding logic? Deploy a high-performance KVM instance on CoolVDS in less than 55 seconds and see the difference dedicated resources make.