The Storage Bottleneck is Dead. Long Live the Interrupt Storm.
For the last decade, we blamed the disk. Spindle drives were too slow. SATA SSDs were an improvement but capped by the interface. Even early NVMe implementations often saturated the bus before the controller broke a sweat.
That era ended when we racked our first PCIe 5.0 servers in Oslo this January.
With PCIe 5.0, we are looking at raw throughput limits pushing 14 GB/s per drive and random 4K read speeds exceeding 3 million IOPS. If you are running a standard VPS setup, your bottleneck is no longer the drive. It is your kernel's ability to handle interrupts. Most hosting providers slap a "High Performance" label on PCIe Gen4 hardware and call it a day. That is negligence.
Here is what happens when you put Gen5 storage under a real workload, and how to configure Linux to actually use it.
The raw math: Gen4 vs. Gen5
PCIe 5.0 doubles the signaling rate per lane from 16 GT/s to 32 GT/s. Across the four lanes a typical NVMe drive uses, that is roughly 15.75 GB/s of theoretical bandwidth after 128b/130b encoding, which is why ~14 GB/s drives are now realistic. For database architects this is a step change. If you manage PostgreSQL clusters or high-traffic caching layers (Redis/Memcached) where persistence matters, look at the raw numbers below.
| Spec | PCIe 4.0 NVMe | PCIe 5.0 NVMe (2025 Standard) |
|---|---|---|
| Seq Read | ~7,000 MB/s | ~14,000 MB/s |
| Seq Write | ~5,000 MB/s | ~12,000 MB/s |
| Random 4K Read | ~1M IOPS | ~3M+ IOPS |
The "CoolVDS" Reality Check: CPU Steal
We recently migrated a client's analytics engine to our new Oslo node. They were moving huge datasets for a fintech firm operating under strict Datatilsynet compliance. They expected faster queries. Instead, they saw CPU spikes.
Why? Soft IRQs.
When an NVMe drive pushes 3 million IOPS, the CPU has to process 3 million completion interrupts per second. If your VPS provider oversubscribes vCPUs (which 90% of them do), your "fast storage" waits in a queue for the CPU to wake up and acknowledge the data.
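You can watch this happening on your own box. Assuming the sysstat package is installed, check the %soft and %steal columns while a benchmark runs, and see how NVMe completion interrupts are spread across cores:
# Per-core softirq and steal time, refreshed every second (needs sysstat)
mpstat -P ALL 1
# How many completion interrupts each NVMe queue has fired, and on which cores
grep nvme /proc/interrupts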
At CoolVDS, we pin vCPUs on high-performance plans. We do not gamble with steal time.
Benchmarking the Beast
Don't trust the marketing sticker. Verify your throughput. In 2025, the standard dd command is still useless for measuring NVMe performance: it is single-threaded and runs at a queue depth of 1, so it never keeps the drive busy. You need fio.
Here is the exact job file we use to stress-test new drives before they enter our production pool:
[global]
ioengine=libaio
direct=1
# Use hugepages if available for better latency
# hugepage_size=2m
[random-read-4k-iops]
rw=randread
bs=4k
size=10G
# 8 workers x QD 128 = 1024 outstanding I/Os, enough to keep a Gen5 controller busy
numjobs=8
iodepth=128
group_reporting
runtime=60
time_based
filename=/data/testfile
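Assuming you save the job file as gen5-test.fio (the name is arbitrary) and that /data sits on the NVMe volume you want to test, the run is a single command:
# Run the job; the line you care about is "read: IOPS=..." in the output
fio gen5-test.fio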
Run this on your current setup. If your IOPS are under 100k, you aren't using NVMe; you're using a glorified SATA emulation.
Tuning Linux Kernel 6.8+ for Gen5
To handle 14 GB/s, the stock sysctl settings on Ubuntu 24.04 or AlmaLinux 9 are insufficient. You need to widen the I/O path.
1. Asynchronous I/O Limits
Database engines like MySQL (InnoDB) and ScyllaDB rely heavily on asynchronous I/O. The default fs.aio-max-nr is often too low (65536).
# Check current limit
sysctl fs.aio-max-nr
# Increase significantly for Gen5 throughput
sudo sysctl -w fs.aio-max-nr=1048576
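sysctl -w only lasts until the next reboot. To make the setting persistent, drop it into a sysctl fragment (the file name here is just our convention):
# Persist across reboots
echo "fs.aio-max-nr = 1048576" | sudo tee /etc/sysctl.d/90-nvme-aio.conf
sudo sysctl --system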
2. NVMe Interrupt Coalescing
You can trade a few microseconds of latency for a massive reduction in CPU load by adjusting interrupt coalescing. This tells the NVMe controller: "Wait until you have X completions or Y microseconds have passed before interrupting the CPU."
Using nvme-cli:
# Install nvme-cli if missing (apt on Ubuntu, dnf on AlmaLinux)
sudo apt-get install nvme-cli
# Feature 0x08 = Interrupt Coalescing. Dword 11: bits 7:0 = aggregation
# threshold (0-based), bits 15:8 = aggregation time in 100 us increments.
# 0x0107 = wait up to 100 us or 8 completions before raising an interrupt.
sudo nvme set-feature /dev/nvme0 -f 0x08 -v 0x0107
Pro Tip: Be careful with coalescing on latency-sensitive apps like high-frequency trading (HFT). For HFT setups in our Oslo zone, we disable coalescing entirely to ensure immediate interrupt handling, provided the client has dedicated cores.
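To read back what the controller actually accepted, or to switch coalescing off again on a latency-critical box, the same feature ID applies (a value of 0 means interrupt on every completion):
# Inspect the current coalescing setting
sudo nvme get-feature /dev/nvme0 -f 0x08
# Disable coalescing entirely
sudo nvme set-feature /dev/nvme0 -f 0x08 -v 0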
3. I/O Scheduler
For NVMe, the only acceptable scheduler is none or kyber. The mq-deadline scheduler can introduce overhead at these speeds.
# Check current scheduler
cat /sys/block/nvme0n1/queue/scheduler
# [none] mq-deadline kyber
# If it's not [none], change it on the fly (root is needed to write to sysfs):
echo none | sudo tee /sys/block/nvme0n1/queue/scheduler
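That sysfs write only survives until reboot. A minimal udev rule sketch makes the scheduler choice stick for every NVMe namespace (the file name is our convention):
# /etc/udev/rules.d/60-nvme-scheduler.rules
ACTION=="add|change", KERNEL=="nvme[0-9]n[0-9]", ATTR{queue/scheduler}="none"
# Reload and apply without rebooting
sudo udevadm control --reload-rules && sudo udevadm trigger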
The Norwegian Context: Latency & Compliance
Hardware is global, but latency is local. If your target market is Scandinavia, hosting on a "fast" server in Frankfurt adds 15-20ms of round-trip time (RTT). In the world of PCIe 5.0, where disk latency is measured in microseconds, adding 20ms of network lag negates the hardware investment.
Furthermore, Norway is not in the EU, though we follow GDPR via the EEA agreement. Data sovereignty is becoming a massive headache for CTOs post-Schrems II. Hosting on CoolVDS infrastructure in Oslo ensures your data stays within the NIX (Norwegian Internet Exchange) ecosystem, leveraging green hydropower and strict privacy laws.
Code: Checking NUMA Alignment
On dual-socket servers (common for AMD EPYC Genoa builds), each NVMe drive hangs off one specific CPU socket. If your VM runs on the other socket, every I/O crosses the inter-socket link (Infinity Fabric on EPYC, UPI on Intel Xeon), killing performance.
Verify your alignment:
# Check which NUMA node the NVMe device belongs to
cat /sys/class/nvme/nvme0/device/numa_node
# Check which NUMA node your CPU cores are on
lscpu | grep NUMA
If you see a mismatch, pin your processes using taskset or numactl:
# Run MySQL pinned to NUMA node 0 (where the drive is)
numactl --cpunodebind=0 --membind=0 mysqld
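In practice mysqld is launched by systemd rather than by hand, so the pinning belongs in a drop-in override. A sketch, assuming the stock Ubuntu unit name and binary path (adjust both for your distro):
# /etc/systemd/system/mysql.service.d/numa.conf
[Service]
ExecStart=
ExecStart=/usr/bin/numactl --cpunodebind=0 --membind=0 /usr/sbin/mysqld
# Apply the override
sudo systemctl daemon-reload && sudo systemctl restart mysql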
Summary: Speed is a System, Not a Component
Buying PCIe 5.0 storage without optimizing the OS is like putting rocket fuel in a lawnmower. The engine will just overheat.
You need:
- Kernel 6.x with updated NVMe drivers.
- Dedicated CPU cores to handle the interrupt storm.
- Local proximity to reduce network latency.
We built the CoolVDS Gen5 platform to solve the "noisy neighbor" and interrupt bottleneck problems inherent in older virtualization stacks. We provide the raw IOPS, the NUMA-aware provisioning, and the low-latency connectivity to NIX.
Don't let slow I/O kill your SEO or your database locks. Deploy a Gen5 test instance in Oslo today.