Scaling Write-Heavy Workloads: A Realist's Guide to Database Sharding
There is a specific feeling of dread that settles in when you realize your primary database node has hit 95% CPU utilization and you have already upgraded to the largest instance type available. You are running on 128GB of RAM, 64 vCPUs, and fast storage, yet the iowait is climbing.
At this stage, most engineering teams panic. They start caching things that shouldn't be cached, or worse, they start deleting data. But if you are building a system intended to survive meaningful scale, you eventually have to stop scaling up (vertical) and start scaling out (horizontal).
Welcome to sharding. It is complex, it creates maintenance overhead, and if you implement it poorly, you will lose data. But for high-throughput applications targeting the Norwegian and wider European market, it is often the only path forward.
The Latency Trap: Why Infrastructure Matters Before Code
Before we touch a single line of configuration, we need to address the physical reality of sharding. When you split a monolithic database into ten shards, you are effectively turning local function calls into network calls. If your shards are hosted on oversold hardware with "noisy neighbors," your P99 latency will destroy your application performance.
Pro Tip: Never shard on standard HDD or hybrid storage. The random read/write patterns of a sharded architecture demand high IOPS. This is why we standardize on enterprise-grade NVMe storage at CoolVDS. If your disk latency isn't measured in microseconds, your database is already dead.
Furthermore, for Norwegian businesses, the location of these shards is not just a technical detail—it is a legal one. Under GDPR and the interpretation of Schrems II, keeping data residency within the EEA (and specifically Norway for sensitive government or health data) is critical. Hosting your user shards in Oslo (via local NIX peering) ensures you aren't just compliant, but you are also shaving milliseconds off every transaction compared to routing through Frankfurt or Amsterdam.
Strategy 1: Application-Level Sharding (The "Do It Yourself" Approach)
The simplest way to shard is to handle the routing logic inside your application code. You don't need complex middleware; you just need a deterministic way to decide which server holds the data.
The most common approach is Algorithmic Sharding: a deterministic hash of the key, reduced modulo the number of shards. (Consistent hashing is a refinement of the same idea that reduces data movement when the shard count changes.) Here is a stripped-down Python example of how you might route a user's write operation based on their User ID:
import zlib

# Configuration of your CoolVDS database instances
SHARDS = {
    0: {'host': '10.0.0.1', 'name': 'shard_alpha'},
    1: {'host': '10.0.0.2', 'name': 'shard_beta'},
    2: {'host': '10.0.0.3', 'name': 'shard_gamma'},
    3: {'host': '10.0.0.4', 'name': 'shard_delta'}
}

def get_shard_id(key):
    """
    Returns the shard index based on the input key (e.g., user_id).
    Uses CRC32 for deterministic hashing.
    """
    # crc32 ensures the same key always goes to the same shard
    checksum = zlib.crc32(str(key).encode('utf-8'))
    return checksum % len(SHARDS)

def get_db_connection(user_id):
    shard_id = get_shard_id(user_id)
    shard_config = SHARDS[shard_id]
    print(f"Routing User {user_id} to {shard_config['name']} ({shard_config['host']})")
    # In production, return an actual SQLAlchemy engine or connection pool here
    return shard_config

# Simulation
get_db_connection(user_id=4521)
get_db_connection(user_id=1092)
This approach is fast because it requires no lookup tables. However, the downside is resharding. If you add a fifth node, the modulo changes, and you have to migrate massive amounts of data. This is why we recommend starting with a high number of "logical" shards (e.g., 100) mapped to a smaller number of physical nodes.
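To make the logical-shard idea concrete, here is a minimal sketch (the shard count, host IPs, and function names are illustrative assumptions, not a prescribed layout). The hash always lands on one of 100 fixed logical shards, and a separate map records which physical node currently owns each logical shard, so rebalancing means moving a few logical shards and updating the map rather than rehashing every key:

import zlib

NUM_LOGICAL_SHARDS = 100  # chosen up-front and never changed

# Illustrative assignment of logical shards to physical nodes.
# Rebalancing = reassigning some logical shards here and copying their data.
PHYSICAL_NODES = ['10.0.0.1', '10.0.0.2', '10.0.0.3', '10.0.0.4']
LOGICAL_TO_PHYSICAL = {
    logical: PHYSICAL_NODES[logical % len(PHYSICAL_NODES)]
    for logical in range(NUM_LOGICAL_SHARDS)
}

def get_logical_shard(key):
    """Hash the key onto a fixed number of logical shards."""
    return zlib.crc32(str(key).encode('utf-8')) % NUM_LOGICAL_SHARDS

def get_physical_host(key):
    """Resolve the logical shard to whichever node currently owns it."""
    return LOGICAL_TO_PHYSICAL[get_logical_shard(key)]

# The key's logical shard never changes; only its owning node might move
print(get_physical_host(4521))

Because the logical shard count is fixed, adding a fifth node means reassigning some entries in the map and copying only those logical shards, instead of re-hashing the entire keyspace.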
Strategy 2: Directory-Based Sharding (For Compliance & Multi-Tenancy)
If you are a SaaS provider in Oslo serving clients across Europe, you might need to isolate specific customers to specific hardware. Directory-based sharding uses a lookup table service to find where data lives.
| Sharding Method | Pros | Cons |
|---|---|---|
| Algorithmic | Even data distribution, extremely fast routing. | Resharding is painful; difficult to implement tiered storage. |
| Directory/Lookup | Flexible. Can move specific tenants to high-performance NVMe nodes easily. | Lookup service becomes a Single Point of Failure (SPOF). Requires extra query. |
| Geographic | Low latency for local users; data residency simplifies GDPR compliance. | Uneven load (Oslo might be busier than Bergen). |
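To illustrate the directory approach, here is a minimal sketch (the tenant names, hosts, and in-memory dict are stand-ins; in production the directory would live in a small replicated store and be fronted by a cache):

# Illustrative directory: tenant -> physical shard assignment.
SHARD_DIRECTORY = {
    'client_norway':  {'host': '10.0.1.10', 'tier': 'nvme-dedicated'},
    'client_sweden':  {'host': '10.0.1.11', 'tier': 'nvme-shared'},
    'client_denmark': {'host': '10.0.1.11', 'tier': 'nvme-shared'},
}

_directory_cache = {}

def lookup_shard(tenant_id):
    """Resolve a tenant to its shard, caching so the directory stays off the hot path."""
    if tenant_id not in _directory_cache:
        # In production this is the "extra query" noted in the table above
        _directory_cache[tenant_id] = SHARD_DIRECTORY[tenant_id]
    return _directory_cache[tenant_id]

print(lookup_shard('client_norway'))  # {'host': '10.0.1.10', 'tier': 'nvme-dedicated'}

Moving a tenant to dedicated NVMe hardware becomes a directory update plus a data copy; the price is that the directory itself must be replicated and cached aggressively so it never becomes the SPOF listed above.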
The Configuration Reality: Tuning for Shards
Whether you use MySQL or PostgreSQL, the configuration on the individual nodes changes when you shard. You are no longer tuning for one massive buffer pool; you are tuning for high concurrency and network resilience.
On a dedicated CoolVDS instance running MySQL 8.0, you need to ensure your InnoDB settings are optimized for write throughput, as sharding is usually a response to write pressure.
[mysqld]
# /etc/my.cnf optimization for a write-heavy shard node
# 70-80% of available RAM on the VPS
innodb_buffer_pool_size = 8G
# Crucial for write performance.
# Set to 1 for ACID compliance (safer).
# Set to 2 if you can tolerate losing up to ~1 second of commits for massive speed gains.
innodb_flush_log_at_trx_commit = 1
# Larger redo logs reduce checkpoint stalls under sustained write pressure
innodb_log_file_size = 2G
# Networking tuning for inter-shard communication
max_connections = 2000
thread_cache_size = 50
# Leave ~10% free space per index page so non-sequential keys (UUIDs, Snowflakes)
# cause fewer page splits
innodb_fill_factor = 90
Middleware Solutions: Vitess and ProxySQL
For those who don't want to rewrite their application code, middleware is the standard 2023 solution. Vitess provides a full clustering layer that manages sharding, resharding, and query routing on top of MySQL, while ProxySQL takes a lighter approach: it sits between your app and the database nodes, inspects each SQL statement, and routes it to the right backend based on rules you define.
Here is a snippet of how you configure query rules in ProxySQL to split reads and writes, or route based on schema:
-- Rule 1: keep SELECT ... FOR UPDATE on the writer hostgroup (HG 10);
-- Rule 2: route all remaining SELECTs to the read-replica hostgroup (HG 20)
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (1, 1, '^SELECT.*FOR UPDATE$', 10, 1);
INSERT INTO mysql_query_rules (rule_id, active, match_digest, destination_hostgroup, apply)
VALUES (2, 1, '^SELECT', 20, 1);
-- Sharding by Schema Name (e.g., client_a, client_b)
INSERT INTO mysql_query_rules (rule_id, active, schemaname, destination_hostgroup, apply)
VALUES (10, 1, 'client_norway', 30, 1);
LOAD MYSQL QUERY RULES TO RUNTIME;
SAVE MYSQL QUERY RULES TO DISK;
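From the application's side, nothing changes: you point your connection at ProxySQL's traffic port (6033 by default) instead of an individual shard, and the rules above decide where each statement lands. A minimal sketch using PyMySQL, with host, credentials, and table names as placeholder assumptions:

import pymysql

# Connect to ProxySQL's MySQL-protocol port, not to a database node directly.
# Host and credentials are placeholders for your own setup.
conn = pymysql.connect(host='127.0.0.1', port=6033,
                       user='app_user', password='change_me',
                       database='appdb')

with conn.cursor() as cur:
    # Plain SELECT: matched by rule 2 above, routed to the read-replica hostgroup (HG 20)
    cur.execute("SELECT id, email FROM users WHERE id = %s", (4521,))
    print(cur.fetchone())

conn.close()

A SELECT ... FOR UPDATE would hit rule 1 and stay on the writer hostgroup, and anything matching no rule falls back to the connecting user's default hostgroup, so the application never needs to know which physical node it is talking to.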
The CoolVDS Advantage: Pure KVM and High IOPS
Virtualization overhead is the enemy of sharding. Many budget providers use container-based virtualization (like OpenVZ/LXC) where kernel resources are shared. If another customer on the host gets DDoSed, your database shards stall.
We don't play that game. CoolVDS uses KVM (Kernel-based Virtual Machine). This provides hardware-level virtualization. Your RAM is your RAM. Your CPU cycles are reserved. When you are coordinating distributed transactions across four shards, consistent latency is paramount. We also provide DDoS protection tuned for TCP floods, ensuring that an attack on your public interface doesn't saturate the internal network links your shards use to communicate.
Conclusion
Sharding is not a toggle you flip; it is an architectural commitment. It introduces network latency, consistency challenges, and operational complexity. But when your dataset exceeds 2TB or your write operations saturate a single NVMe drive, it is the standard for modern infrastructure.
Don't build a complex sharded architecture on shaky foundations. You need predictable I/O, strict data residency in Norway, and root access to tune your kernel parameters.
Ready to test your sharding logic? Spin up a high-frequency NVMe instance on CoolVDS in under 55 seconds and see the difference raw performance makes.