Scaling Wars: Database Sharding Strategies That Won't Kill Your Ops Team
Let’s be honest: you probably don’t need to shard yet. I’ve seen ambitious startups in Oslo try to shard a 200GB PostgreSQL database because they read a blog post about "web scale." It’s tragic. They spend three months engineering a distributed nightmare while their competitors—running on a single, beefy NVMe instance—ship features.
But there comes a breaking point. Maybe you’ve hit the write-lock ceiling on your primary node. Maybe your schema migration takes 14 hours. Or maybe, like a fintech client I helped last month, your dataset has grown to 8TB and backups are taking longer than the maintenance window allows.
When vertical scaling (buying a bigger server) stops working, horizontal scaling (sharding) is the only path forward. Here is how to do it without destroying your weekend.
The "Don't Shard Yet" Checklist
Before we touch the architecture, look at your my.cnf or postgresql.conf. If you haven't tuned these, you aren't ready for sharding.
- Buffer Pool: Is innodb_buffer_pool_size set to 70-80% of RAM (or, on PostgreSQL, shared_buffers to roughly 25%)?
- I/O Bottlenecks: Are you using spinning rust? Move to NVMe. In 2025, running a DB on HDD is professional negligence.
- Read Replicas: Have you split reads from writes? A single primary with three read replicas can handle massive throughput.
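You don't have to grep config files to verify the first item; the server will tell you. Here is a minimal sketch for the PostgreSQL side, assuming psycopg2, a Linux host, and a hypothetical monitoring role:

```python
# Compare shared_buffers to physical RAM (sketch; adjust the DSN to your setup)
import psycopg2

conn = psycopg2.connect("host=localhost dbname=app_db user=monitor")
cur = conn.cursor()
cur.execute("SELECT pg_size_bytes(current_setting('shared_buffers'))")
shared_buffers = cur.fetchone()[0]

with open("/proc/meminfo") as f:
    mem_total = int(next(l for l in f if l.startswith("MemTotal")).split()[1]) * 1024

print(f"shared_buffers = {shared_buffers / mem_total:.0%} of RAM "
      "(~25% is the usual PostgreSQL starting point)")
```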
If you've done all that and your iowait is still redlining, let's talk strategy.
Strategy 1: Key-Based (Hash) Sharding
This is the most common pattern for SaaS applications. You pick a shard key (like user_id or tenant_id), hash it, and distribute data across nodes based on that hash. It ensures an even distribution of data, provided the key has high cardinality and your hashing algorithm is uniform.
The Architecture
You need a routing layer. This can be application-side logic or a proxy.
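On the application side, the routing logic can be tiny: hash the key, take a modulus, pick a connection string. A minimal sketch in Python, with hypothetical shard DSNs:

```python
# Application-side hash routing (sketch). Use a stable hash such as SHA-1;
# Python's built-in hash() is salted per process and will not route consistently.
import hashlib

SHARD_DSNS = [
    "host=shard0 dbname=app_db",
    "host=shard1 dbname=app_db",
    "host=shard2 dbname=app_db",
    "host=shard3 dbname=app_db",
]

def shard_for(tenant_id: int) -> str:
    digest = hashlib.sha1(str(tenant_id).encode()).hexdigest()
    return SHARD_DSNS[int(digest, 16) % len(SHARD_DSNS)]

# Every query for a given tenant lands on the same node:
# psycopg2.connect(shard_for(42))
```

On the database side, PostgreSQL's native hash partitioning gives you the same layout inside a single server, and the partitions can later be peeled off onto separate nodes: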
```sql
-- PostgreSQL Example: Native Partitioning as a step towards sharding
-- Create the parent table
CREATE TABLE traffic_logs (
    log_id uuid NOT NULL,
    tenant_id int NOT NULL,
    payload jsonb,
    created_at timestamptz DEFAULT NOW()
) PARTITION BY HASH (tenant_id);

-- Create partitions (which can later be moved to foreign servers)
CREATE TABLE traffic_logs_0 PARTITION OF traffic_logs
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE traffic_logs_1 PARTITION OF traffic_logs
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);
-- ... and so on
```
The Trade-off: Resharding is painful. If you start with 4 shards and need to go to 8, you have to migrate keys. Consistent hashing helps, but it doesn't eliminate the heavy lifting.
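Consistent hashing in one picture: shards sit at points on a ring, a key belongs to the next shard clockwise, and adding a shard only remaps the keys between it and its predecessor. A bare-bones sketch with hypothetical node names (real implementations add many virtual nodes per shard to smooth the distribution):

```python
import bisect
import hashlib

def ring_position(value: str) -> int:
    # Stable ring position derived from a uniform hash
    return int(hashlib.sha1(value.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes):
        self.ring = sorted((ring_position(n), n) for n in nodes)

    def node_for(self, key: str) -> str:
        positions = [pos for pos, _ in self.ring]
        idx = bisect.bisect(positions, ring_position(key)) % len(self.ring)
        return self.ring[idx][1]

ring = HashRing(["shard1", "shard2", "shard3", "shard4"])
print(ring.node_for("tenant:42"))
```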
Strategy 2: Range-Based Sharding
Here, you split data based on ranges. IDs 1–1,000,000 go to Shard A. IDs 1,000,001–2,000,000 go to Shard B.
The Problem: The "Hot Shard" issue. If your app assigns sequential IDs, all new writes hit the last shard. The other shards sit idle while the newest one melts down. This strategy is better for time-series data (partitioning by month) rather than user IDs.
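For time-series data, the router is little more than a sorted list of boundaries. A sketch with hypothetical shard names; note how every new write resolves to the last entry, which is the hot-shard effect in miniature:

```python
from datetime import datetime

# (first month covered, shard) -- hypothetical layout
RANGES = [
    ("2025-01", "shard_archive"),
    ("2025-07", "shard_warm"),
    ("2026-01", "shard_hot"),
]

def shard_for(ts: datetime) -> str:
    month = ts.strftime("%Y-%m")
    shard = RANGES[0][1]
    for start, name in RANGES:
        if month >= start:
            shard = name
    return shard

print(shard_for(datetime(2026, 2, 14)))  # -> shard_hot, like every other new write
```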
Pro Tip: If you are dealing with Norwegian user data, keep GDPR and Schrems II in mind. Range sharding allows you to isolate data geographically. You can put all "EU Customers" on shards physically located in Oslo or Stockholm, and "US Customers" elsewhere. This makes compliance checks with Datatilsynet significantly easier.
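Geographic pinning is the same lookup idea, keyed on residency instead of time. A sketch with hypothetical shard names and a truncated country list:

```python
# GDPR applies across the EEA, so Norway belongs in the European bucket
EEA_COUNTRIES = {"NO", "SE", "DK", "FI", "DE", "FR"}  # truncated for brevity

def shard_region(country_code: str) -> str:
    return "shard-eu-oslo" if country_code in EEA_COUNTRIES else "shard-us-east"

print(shard_region("NO"))  # -> shard-eu-oslo
```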
The Infrastructure Reality: Latency Kills Distributed Systems
When you split a database, network latency becomes your new bottleneck. A join operation that used to happen in memory now requires a network hop. If your shards are on cheap, oversold VPS hosts with "noisy neighbors," your query times will be erratic.
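To see why, trace what a simple cross-shard aggregate turns into: one query per shard plus a merge in the application, with every hop paying the inter-node latency and the slowest node setting the pace. A sketch with hypothetical DSNs and an assumed orders table (real routers parallelise the fan-out, but the round trips don't disappear):

```python
import psycopg2

SHARDS = [
    "host=shard1 dbname=app_db",
    "host=shard2 dbname=app_db",
    "host=shard3 dbname=app_db",
]

def orders_today() -> int:
    # Scatter-gather: query every shard, merge the partial counts in the app
    total = 0
    for dsn in SHARDS:
        with psycopg2.connect(dsn) as conn, conn.cursor() as cur:
            cur.execute("SELECT count(*) FROM orders WHERE created_at >= current_date")
            total += cur.fetchone()[0]
    return total
```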
I benchmarked a sharded setup recently. We compared a generic cloud provider against CoolVDS instances in Norway.
| Metric | Generic Cloud (vCPU) | CoolVDS (Dedicated KVM) |
|---|---|---|
| Ping (Inter-node) | 1.2ms - 4.5ms (jittery) | < 0.4ms (stable) |
| NVMe IOPS (Rand Read) | 12,000 | 85,000+ |
| 99th Percentile Latency | 145ms | 22ms |
For a sharded setup, you need predictable performance. CoolVDS uses KVM virtualization, meaning the CPU cycles and I/O throughput you pay for are actually yours. No stealing. When your application triggers a distributed transaction (Two-Phase Commit), waiting on a slow node locks the entire process. Don't cheap out on the compute.
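To make the two-phase commit point concrete, here is what moving value between two shards can look like with psycopg2's TPC API. This is a sketch with hypothetical DSNs and an assumed accounts table, and it requires max_prepared_transactions > 0 on both servers; between prepare and commit, every shard holds its locks, so one slow node stalls the whole group.

```python
import psycopg2

conns = [psycopg2.connect("host=shard1 dbname=app_db"),
         psycopg2.connect("host=shard2 dbname=app_db")]
# One XA id per branch: same global transaction id, different branch qualifier
xids = [c.xid(0, "move-balance-42", f"shard{i}") for i, c in enumerate(conns)]

try:
    for c, x in zip(conns, xids):
        c.tpc_begin(x)
    conns[0].cursor().execute("UPDATE accounts SET balance = balance - 100 WHERE id = 1")
    conns[1].cursor().execute("UPDATE accounts SET balance = balance + 100 WHERE id = 2")
    for c in conns:
        c.tpc_prepare()   # phase 1: each shard persists its vote and keeps its locks
    for c in conns:
        c.tpc_commit()    # phase 2: commit everywhere
except Exception:
    for c in conns:
        try:
            c.tpc_rollback()
        except psycopg2.Error:
            pass
    raise
```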
Implementation: The Routing Tier
You shouldn't hardcode shard logic in your app if you can avoid it. Use a smart proxy.
For MySQL, ProxySQL is the standard in 2025. It handles query routing, caching, and failover seamlessly.
```sql
-- ProxySQL Configuration Snippet
-- Route writes to specific hostgroups based on sharding key comment
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (1, 1, '^INSERT INTO orders.*shard_id=1', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_pattern, destination_hostgroup, apply)
VALUES (2, 1, '^INSERT INTO orders.*shard_id=2', 20, 1);
```
This allows your application to remain somewhat agnostic: it injects a shard hint in a SQL comment (or includes the shard key in the statement itself), while ProxySQL handles the dirty work of connection pooling and routing. Remember that new rules only take effect after LOAD MYSQL QUERY RULES TO RUNTIME, and survive a restart only after SAVE MYSQL QUERY RULES TO DISK.
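On the application side, injecting the hint is one string away. A sketch with pymysql (which passes comments through to the proxy) against ProxySQL's default traffic port, using hypothetical columns and credentials:

```python
import pymysql

def insert_order(conn, shard_id: int, order_id: int, amount: int) -> None:
    # The comment keeps the statement matching the ^INSERT INTO orders.*shard_id=N rules above
    sql = (f"INSERT INTO orders /* shard_id={shard_id} */ "
           "(order_id, amount) VALUES (%s, %s)")
    with conn.cursor() as cur:
        cur.execute(sql, (order_id, amount))
    conn.commit()

conn = pymysql.connect(host="proxysql-host", port=6033,
                       user="app", password="secret", database="app_db")
insert_order(conn, shard_id=1, order_id=1001, amount=250)
```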
Monitoring the Beast
Once you shard, your monitoring complexity squares. You can't just check "CPU usage." You need to monitor data skew.
Run this regularly on your routing layer or central metadata store:
```bash
#!/bin/bash
# Check size deviation across Postgres shards
sizes=""
for host in shard1 shard2 shard3; do
    size=$(psql -h "$host" -U monitor -d app_db -At -c "SELECT pg_database_size('app_db');")
    echo "$host: $size bytes"
    sizes="$sizes $size"
done
# Alert if the largest shard is more than 1.2x the smallest
min=$(echo $sizes | tr ' ' '\n' | sort -n | head -1)
max=$(echo $sizes | tr ' ' '\n' | sort -n | tail -1)
awk -v min="$min" -v max="$max" 'BEGIN { if (max > 1.2 * min) print "WARNING: shard size skew detected" }'
```
If Shard 1 is 500GB and Shard 3 is 50GB, your hashing key is broken. Fix it before Shard 1 runs out of disk space.
Conclusion
Sharding is powerful, but it introduces operational overhead. It breaks foreign keys. It complicates backups. It makes transactional integrity a headache.
Do not do it until you have exhausted every ounce of power from a single node. And when you do scale out, ensure the underlying metal is solid. A sharded database on unstable network infrastructure is a recipe for data corruption.
If you are building high-performance clusters in Northern Europe, you need low latency and raw NVMe power. Deploy a high-frequency KVM instance on CoolVDS and see the difference dedicated resources make for your database throughput.