Stop Blaming the Code: Your API Gateway is Choking on Physics
It’s 2 AM. PagerDuty just screamed at you because your 99th percentile latency hit 4 seconds. Your backend microservices are humming along fine, yet the load balancer is throwing 502 Bad Gateway errors like it's going out of style. I’ve been there. In a recent deployment for a Norwegian fintech scaling for Vipps integration, we hit a wall. The code was optimized, but the gateway collapsed under the "thundering herd" of concurrent connections.
The problem usually isn't your application logic. It's the plumbing. Specifically, the Linux kernel defaults and the hypervisor jitter beneath your feet. In 2020, with traffic volumes exploding and the legal landscape shifting under Schrems II, you cannot afford a default configuration.
Let's fix your infrastructure.
1. The OS Layer: Tuning the Kernel for Concurrency
Most Linux distributions, including the Ubuntu 20.04 LTS images you're likely pulling, ship with conservative defaults intended for desktop usage or light web serving. When your API gateway needs to handle 10,000 concurrent connections, these defaults are fatal. The first bottleneck is almost always file descriptors.
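Before changing anything, it helps to see what you are starting from. A quick audit (values vary by kernel and image, so treat the numbers as examples, not gospel):
# Per-process open-file limit for the current shell (often 1024 out of the box)
ulimit -n
# System-wide file descriptor ceiling
sysctl fs.file-max
# Listen backlog ceilings
sysctl net.core.somaxconn net.ipv4.tcp_max_syn_backlog
# Ephemeral ports available for upstream connections
sysctl net.ipv4.ip_local_port_range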
Everything in Linux is a file, and every open TCP connection burns a file descriptor. Exhaust the descriptor limit and accept() starts failing with "Too many open files"; overflow the listen backlog and the kernel silently drops incoming SYN packets. Either way, your clients see timeouts and 502s. Here is the baseline sysctl.conf tuning we deploy on every CoolVDS instance acting as an edge node:
# /etc/sysctl.conf
# Increase system-wide file descriptor limit
fs.file-max = 2097152
# Widen the port range for outgoing connections
net.ipv4.ip_local_port_range = 10000 65000
# Reuse sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# Increase the backlog queue for incoming connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 4096
# Boost TCP buffer sizes for high-bandwidth links
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
After applying this with sysctl -p, you must also raise the per-process limits (ulimits) so the NGINX or HAProxy workers can actually use those descriptors. For PAM-managed sessions, set them in /etc/security/limits.conf (systemd-managed services ignore this file; see the drop-in sketch after the block):
* soft nofile 1048576
* hard nofile 1048576
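One caveat: limits.conf is enforced by PAM, so services started by systemd never read it. If NGINX runs as a systemd unit, raise the limit with a drop-in instead (the path below assumes the stock nginx.service unit name):
# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=1048576
Then run systemctl daemon-reload and restart the service so the new limit takes effect.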
Pro Tip: Never rely on the default ephemeral port range if you are proxying to upstream servers. You will run out of ports (ephemeral port exhaustion) long before you run out of RAM. Widening the range gives you breathing room during traffic spikes.
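To see how much breathing room you actually have, count the sockets stuck in TIME_WAIT; each one pins an ephemeral port until its timer expires (standard iproute2 tooling, nothing exotic):
# Sockets currently parked in TIME_WAIT
ss -tan state time-wait | wc -l
# If that number approaches the size of your ephemeral range during peak traffic,
# new upstream connections will start failing with EADDRNOTAVAIL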
2. NGINX Configuration: The Keepalive Trap
Whether you are using raw NGINX, Kong, or OpenResty, the mistake is the same: failing to maintain keepalive connections to your upstream backends. By default, NGINX acts as a polite HTTP/1.0 client to your backend, closing the connection after every request. This forces a new TCP handshake (and potentially a TLS handshake) for every single API call.
This adds milliseconds of latency that you cannot afford. Here is how you configure the upstream block correctly:
upstream backend_microservices {
server 10.0.0.5:8080;
server 10.0.0.6:8080;
# Cache up to 64 idle connections to the backend, per worker process
keepalive 64;
}
server {
location /api/ {
proxy_pass http://backend_microservices;
# Required to enable keepalive
proxy_http_version 1.1;
proxy_set_header Connection "";
}
}
Without proxy_http_version 1.1 and the cleared Connection header, NGINX sends Connection: close to the upstream on every request, defeating the purpose. This change alone dropped our internal upstream latency from 15ms to 2ms in the Oslo data center.
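To confirm the handshakes are actually gone, log the upstream connect time; on a reused keepalive connection it reads as effectively zero. A minimal sketch for the http block (the log_format name and log path are arbitrary):
# Place inside the http {} context
log_format upstream_timing '$remote_addr "$request" '
                           'connect=$upstream_connect_time '
                           'response=$upstream_response_time';
access_log /var/log/nginx/upstream_timing.log upstream_timing;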
3. The Hardware Reality: Why "Cloud" Often Fails
You can tune your software perfectly, but if your Virtual Private Server (VPS) is fighting for CPU cycles, your API latency will jitter. This is the "Noisy Neighbor" effect. In cheap container-based hosting (like OpenVZ or LXC), resources are often oversold. When a neighbor compiles a kernel, your API gateway stalls.
For an API gateway, consistency matters more than raw burst speed. We benchmarked a standard container-based VPS against a dedicated KVM instance:
| Metric | Standard Container VPS | CoolVDS KVM (NVMe) |
|---|---|---|
| Avg Latency | 45ms | 12ms |
| 99th % Latency | 350ms (Jitter) | 28ms (Stable) |
| Disk I/O (Write) | 120 MB/s | 1.2 GB/s |
At CoolVDS, we utilize KVM virtualization exclusively. This ensures that the CPU cores assigned to your gateway are effectively yours. Furthermore, API gateways generate massive logs. If your disk I/O blocks while writing access logs, the request hangs. Our local NVMe storage eliminates that I/O wait state.
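Regardless of how fast the disk is, you can also take log writes off the request path entirely. Buffered access logging is a one-line change (the buffer and flush values here are illustrative; tune them to your log volume):
# Batch log writes in a 64k buffer, flushed at most every 5 seconds,
# so a momentary I/O stall never blocks a worker mid-request
access_log /var/log/nginx/access.log combined buffer=64k flush=5s;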
4. Benchmarking the Truth
Don't take my word for it. Install wrk and test it yourself. If you are targeting a Norwegian audience, run this test from a node inside the NIX (Norwegian Internet Exchange) network to minimize network hops.
# Run a 30-second test with 12 threads and 400 connections
wrk -t12 -c400 -d30s --latency https://your-api-endpoint.com/health
If you see a "Socket errors: connect" counter, your somaxconn or file-max is too low. If your transfer rate is high but latency is erratic, your CPU is being stolen by a noisy neighbor.
5. The Legal Latency: Schrems II and Data Sovereignty
Performance isn't just about speed; it's about availability. In July 2020, the CJEU's Schrems II ruling invalidated the Privacy Shield framework. If you are hosting your API Gateway on US-owned hyperscalers, you are now navigating a legal minefield regarding the transfer of personal data (IP addresses are personal data under GDPR).
Hosting in Norway, outside the direct reach of the US CLOUD Act, is becoming a technical requirement for compliance-focused CTOs. CoolVDS infrastructure is owned and operated here in Europe. Combining that legal safety with the low latency of local peering at NIX is the pragmatic choice for 2021 planning.
Final Thoughts
Building a high-performance API gateway is an exercise in removing bottlenecks. First, open the kernel gates. Second, stop closing backend connections. Third, ensure your underlying hardware isn't lying to you about resources.
Don't let I/O wait kill your SLA. Deploy a KVM-based, NVMe-powered instance on CoolVDS today and see what real dedicated resources feel like.