Database Sharding is a Nightmare: Strategies to Wake Up Without Data Loss
I remember the first time I decided to shard a production database. It was 2018, and I was running a SaaS platform that had just hit 10TB of user data. Queries were taking 4 seconds. The CTO was breathing down my neck. We thought splitting the database would solve everything. Instead, we introduced race conditions, cross-shard join impossibilities, and a maintenance overhead that cost us three senior engineers.
Sharding is not a feature you enable. It is an architectural commitment that breaks your application logic if done poorly. But when you hit the ceiling of what a single node can handle—even with the massive NVMe throughput we see on modern virtualization—it becomes necessary.
Here is how to approach sharding without destroying your data integrity, specifically tailored for high-throughput environments targeting the Nordic market.
The "Do You Really Need This?" Check
Before we touch sharding keys, look at your infrastructure. Most bottlenecks I diagnose in Oslo data centers aren't capacity issues; they are I/O latency issues. Using standard SATA SSDs for a high-transaction database is professional suicide.
Pro Tip: Before refactoring your code for sharding, check your iostat output. If %iowait is high, simply moving to a CoolVDS instance with local NVMe storage and dedicated CPU cores often solves the problem instantly. Vertical scaling is cheaper than developer time.
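If you would rather script that check than eyeball iostat, here is a minimal sketch using the psutil library (assumptions: a Linux host, psutil installed via pip, and a 10% threshold picked purely for illustration):

# Quick I/O-wait check before deciding to shard.
# Assumes Linux and the third-party psutil package (pip install psutil).
import psutil

def iowait_percent(sample_seconds: float = 5.0) -> float:
    """Return the share of CPU time spent waiting on disk I/O over the sample window."""
    times = psutil.cpu_times_percent(interval=sample_seconds)
    # The 'iowait' field only exists on Linux; treat its absence as "no data".
    return getattr(times, "iowait", 0.0)

if __name__ == "__main__":
    pct = iowait_percent()
    if pct > 10.0:  # arbitrary threshold, tune for your workload
        print(f"%iowait is {pct:.1f} - storage is the bottleneck; fix disks before sharding")
    else:
        print(f"%iowait is {pct:.1f} - I/O looks fine; the bottleneck is probably elsewhere")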
However, if you are pushing 50k+ writes per second or brushing up against ext4's 16TB single-file limit, let's talk strategy.
Strategy 1: Application-Level Sharding (The "Manual" Way)
This is the oldest trick in the book. Your application knows which customer it is serving, so it directs the query to the correct database server.
The Concept
You define a Sharding Key (usually user_id or company_id). You use a modulus operator or a lookup table to determine the destination.
Pros: Full control. No extra middleware latency.
Cons: You must rewrite every single SQL query in your app. Cross-shard joins are impossible.
Here is a Python logic example of how this routing looks in the backend:
class ShardRouter:
    def __init__(self):
        # Define connections to your CoolVDS instances
        self.shards = {
            0: "db_node_oslo_01",
            1: "db_node_oslo_02",
            2: "db_node_oslo_03",
            3: "db_node_oslo_04",
        }

    def get_db_connection(self, user_id):
        # Simple Modulo Sharding: the key decides which node owns the row
        shard_id = user_id % len(self.shards)
        host = self.shards[shard_id]
        print(f"Routing User {user_id} to Shard {shard_id} ({host})")
        return connect_to_host(host)  # stand-in for your driver's connect call

# Usage
router = ShardRouter()

# User 105 routes to Shard 1 (105 % 4 = 1)
conn = router.get_db_connection(105)
cursor = conn.cursor()
cursor.execute("SELECT * FROM orders WHERE user_id = 105")
This works until one shard gets hot. If db_node_oslo_02 hosts your biggest client, that node will melt while others sit idle. You then need a Virtual Bucket approach, but that's a topic for another day.
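That deeper dive can wait, but the core idea fits in a few lines: hash users onto a fixed, larger pool of virtual buckets and keep a small, editable map from buckets to physical nodes. Rebalancing a hot spot then means moving one bucket entry instead of re-hashing every user. A rough sketch (the bucket count and the extra db_node_oslo_05 host are made up for illustration):

NUM_BUCKETS = 256  # fixed forever; far more buckets than physical shards

# Small, editable map from virtual buckets to physical nodes.
# Rebalancing = changing entries here, not changing the hash function.
bucket_to_host = {b: f"db_node_oslo_{(b % 4) + 1:02d}" for b in range(NUM_BUCKETS)}

def host_for_user(user_id: int) -> str:
    bucket = user_id % NUM_BUCKETS  # users map to buckets, never directly to hosts
    return bucket_to_host[bucket]

# Hot spot? Move a single bucket to a less loaded (hypothetical) node:
bucket_to_host[42] = "db_node_oslo_05"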
Strategy 2: Middleware Sharding (ProxySQL / Vitess)
If you don't want to rewrite your application code, you put a smart proxy in front of your database. For MySQL, ProxySQL is the gold standard.
The application connects to ProxySQL as if it were a single MySQL server. ProxySQL parses the SQL, checks the routing rules, and forwards the packet to the correct backend node.
Here is how you configure the routing through ProxySQL's admin interface (SQL-like commands issued on the admin port, 6032 by default); this example pins each shard's dedicated MySQL user to its own backend hostgroup:
-- Define your backend CoolVDS instances
INSERT INTO mysql_servers (hostgroup_id, hostname, port) VALUES
(10, '10.0.0.5', 3306), -- Shard 1
(20, '10.0.0.6', 3306); -- Shard 2
-- Define sharding rules based on username or schema
INSERT INTO mysql_query_rules (rule_id, active, username, destination_hostgroup, apply)
VALUES (1, 1, 'shard_user_1', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, username, destination_hostgroup, apply)
VALUES (2, 1, 'shard_user_2', 20, 1);
-- Load to runtime
LOAD MYSQL SERVERS TO RUNTIME;
SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
Using middleware introduces a tiny bit of latency. However, if your servers are in the same data center (like our Oslo facility), the latency is negligible (often <0.5ms). A ping -c 4 10.0.0.5 between our nodes usually returns around 0.2ms.
Strategy 3: Native Partitioning (PostgreSQL)
If you are on PostgreSQL 14, 15, or 16, you have mature declarative partitioning (the feature has existed since PostgreSQL 10, but pruning and performance have improved steadily since). This isn't technically "sharding" across multiple servers out of the box (unless you use Citus), but it solves the "table too big" problem by splitting data into smaller chunks on disk.
This improves cache locality. The database engine only scans the relevant partition.
-- Create the parent table
CREATE TABLE sensor_data (
sensor_id INT NOT NULL,
recorded_at TIMESTAMP NOT NULL,
temperature DECIMAL(5,2)
) PARTITION BY RANGE (recorded_at);
-- Create partitions for Q1 and Q2 2024
CREATE TABLE sensor_data_2024_q1 PARTITION OF sensor_data
FOR VALUES FROM ('2024-01-01') TO ('2024-04-01');
CREATE TABLE sensor_data_2024_q2 PARTITION OF sensor_data
FOR VALUES FROM ('2024-04-01') TO ('2024-07-01');
-- Verify where data goes
EXPLAIN SELECT * FROM sensor_data WHERE recorded_at = '2024-02-15';
The EXPLAIN output will show that Postgres skips the Q2 table entirely thanks to partition pruning. This is crucial for performance on time-series data.
The Nordic Context: Latency and GDPR
When you shard, you increase network chatter. If Shard A is in Frankfurt and Shard B is in Oslo, your JOIN equivalent (done in app code) will be agonizingly slow.
Latency Matters: For a Norwegian user base, serving from Oslo is non-negotiable. Round-trip time (RTT) from Oslo to Amsterdam can be 15-20ms. Inside Oslo (NIX), it's roughly 1-2ms.
Compliance (Schrems II): Splitting data across borders adds a compliance headache. Datatilsynet (The Norwegian Data Protection Authority) is strict. If you shard user data, keep all shards within the same jurisdiction unless you have a very expensive lawyer.
Performance Tuning the Shards
Just because you shard doesn't mean you ignore node config. Each shard on CoolVDS should be tuned as a dedicated entity.
Check your MySQL config /etc/my.cnf:
[mysqld]
# Ensure buffer pool is 70-80% of RAM
innodb_buffer_pool_size = 8G
# Crucial for heavy write loads on NVMe
innodb_io_capacity = 2000
innodb_io_capacity_max = 4000
# Prevent double buffering with OS
innodb_flush_method = O_DIRECT
Don't forget to restart the service: systemctl restart mysqld.
When to Choose CoolVDS?
We built our platform specifically for this "Battle-Hardened" scenario. We don't oversubscribe CPU cores aggressively like the budget providers. When you are calculating a sharding hash or rebalancing chunks, you need raw integer performance.
- Low Latency Network: Our internal network allows high-speed communication between shards without hitting the public internet.
- NVMe Standard: Database shards are I/O bound. We only use enterprise NVMe.
- Scalability: You can resize a shard from 4GB RAM to 64GB RAM in moments, delaying the need to re-shard.
Sharding is a necessary evil for massive scale. But for 95% of businesses, a well-tuned, vertically scaled database on high-performance infrastructure is the smarter, more stable choice.
Is your database choking on I/O? Stop optimizing bad queries on slow disks. Spin up a CoolVDS High-Performance instance in Oslo and watch your load average drop.