Crushing API Latency: Advanced Nginx & Kernel Tuning for High-Throughput Gateways

If your API gateway adds more than 15ms to a request, you are bleeding users. In high-frequency trading or real-time bidding, that kind of latency is a death sentence. For the rest of us building SaaS platforms in 2019, it's just sloppy engineering.

Most developers spin up a default Ubuntu 18.04 LTS instance, install Nginx via apt, and wonder why their throughput plateaus at 2,000 requests per second (RPS). The bottleneck isn't usually your application logic—it's the default Linux network stack and a web server configuration designed for compatibility, not speed.

I've spent the last month auditing infrastructure for a fintech client here in Oslo. They were hosting on a generic cloud provider, suffering from noisy neighbors and unpredictable I/O wait times. We moved them to dedicated KVM instances and tuned the hell out of the stack. The result? A 400% increase in throughput.

Here is the exact blueprint we used.

1. The Kernel is Your First Bottleneck

Linux defaults are conservative. They are designed to run on everything from a Raspberry Pi to a workstation. For a dedicated API gateway handling thousands of concurrent TCP connections, we need to tell the kernel to stop being polite.

Open your /etc/sysctl.conf. We are going to adjust the TCP stack to handle connection churn efficiently. When you have high traffic, you run out of ephemeral ports fast, and connections get stuck in the TIME_WAIT state.

Pro Tip: Before applying these, record your current values with sysctl -a so you have a baseline to roll back to. Documentation is your friend.
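
For instance, a quick way to snapshot just the keys we are about to touch:

# Print current values of the parameters changed below
sysctl net.core.somaxconn net.core.netdev_max_backlog
sysctl net.ipv4.ip_local_port_range net.ipv4.tcp_tw_reuse
sysctl net.ipv4.tcp_fin_timeout net.ipv4.tcp_fastopen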

Optimized sysctl.conf

# Maximize the backlog for incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Increase ephemeral port range
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of sockets in TIME_WAIT state for new outgoing connections
net.ipv4.tcp_tw_reuse = 1

# Reduce the time a connection stays in FIN-WAIT-2
net.ipv4.tcp_fin_timeout = 15

# TCP Fast Open (TFO) - reduces handshake RTT (3 = enable client and server)
net.ipv4.tcp_fastopen = 3

# Increase TCP buffer sizes for 10Gbps+ networks (standard on CoolVDS)
net.ipv4.tcp_rmem = 4096 87380 67108864
net.ipv4.tcp_wmem = 4096 65536 67108864

Apply these changes with sysctl -p. The tcp_tw_reuse flag is critical for a proxy: every request Nginx forwards upstream consumes an ephemeral port, and without reuse those outgoing sockets pile up in TIME_WAIT during traffic spikes, causing connection timeouts even while your CPU is idling.
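
To confirm whether TIME_WAIT exhaustion is actually biting you, count the sockets while a load test is running:

# Sockets currently stuck in TIME_WAIT
ss -tan state time-wait | wc -l

# Full per-state summary
ss -s

One more caveat on the settings above: tcp_fastopen = 3 only enables TFO in the kernel. For Nginx to actually serve TFO, you also need the fastopen parameter on the listen directive (e.g. listen 443 ssl fastopen=256;).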

2. Nginx: Beyond the Basics

Nginx is the industry standard for a reason. But getting the most out of it as a reverse proxy requires a few specific directives. The most common mistake I see is neglecting upstream keepalives.

By default, Nginx opens a new connection to your backend service (Node.js, Go, PHP-FPM) for every single incoming request. This involves a full TCP handshake for every call. It consumes file descriptors and CPU cycles unnecessarily.

The Keepalive Configuration

In your upstream block, you must explicitly enable keepalives.

upstream backend_api {
    server 10.0.0.5:8080;
    server 10.0.0.6:8080;
    
    # Keep 64 idle connections open to the backend
    keepalive 64;
}

Then, in your location block, you must force HTTP/1.1 and clear the Connection header. By default, Nginx speaks HTTP/1.0 to upstreams and sends its own "Connection: close" with every proxied request, so the backend tears the connection down immediately, defeating the purpose.

location /api/ {
    proxy_pass http://backend_api;
    
    # Required for keepalive to work
    proxy_http_version 1.1;
    proxy_set_header Connection "";
    
    # Pass real IP (crucial for logs and GDPR compliance audits)
    proxy_set_header X-Real-IP $remote_addr;
    proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
}
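
You can verify the pool is working by watching connections to the backend between bursts of traffic. With keepalive enabled, idle connections should stay established instead of vanishing (adjust the port to match your upstream):

# Established connections from the gateway to backends on port 8080
ss -tn state established '( dport = :8080 )'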

3. TLS 1.3: The New Standard

It is 2019. If you aren't supporting TLS 1.3 yet, you are lagging behind. OpenSSL 1.1.1 was released late last year, and it brings TLS 1.3 support, which cuts a full round-trip out of the handshake (1-RTT, versus 2-RTT for a fresh TLS 1.2 negotiation).

Check your OpenSSL version:

openssl version
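
Keep in mind the system binary is not necessarily what Nginx links against; a build from a third-party repo may use a different OpenSSL. Check the build info directly:

# Show the OpenSSL version Nginx was built with
nginx -V 2>&1 | grep -i openssl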

If you are on Ubuntu 18.04, ensure you are fully patched; OpenSSL 1.1.1 landed there through the standard updates. Then configure Nginx to prioritize the new protocol. This improves security and speed simultaneously, a rare win-win.

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers ECDHE-ECDSA-AES128-GCM-SHA256:ECDHE-RSA-AES128-GCM-SHA256:ECDHE-ECDSA-AES256-GCM-SHA384:ECDHE-RSA-AES256-GCM-SHA384:ECDHE-ECDSA-CHACHA20-POLY1305:ECDHE-RSA-CHACHA20-POLY1305:DHE-RSA-AES128-GCM-SHA256:DHE-RSA-AES256-GCM-SHA384;
ssl_prefer_server_ciphers off;
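
One subtlety: the ssl_ciphers list only governs TLS 1.2 and below. OpenSSL handles TLS 1.3 cipher suites separately and enables the strong ones by default, so there is nothing extra to do there. While you are in this block, session resumption is also worth enabling, since resumed handshakes skip the certificate exchange entirely. A minimal sketch (the cache size and timeout are assumptions; tune them for your traffic):

# Shared cache across workers; 10 MB holds roughly 40,000 sessions
ssl_session_cache shared:SSL:10m;
ssl_session_timeout 1h;

If your Nginx is 1.15.4 or newer, you can also experiment with ssl_early_data on; for 0-RTT resumption, but be aware that early data is replayable, so keep it away from non-idempotent endpoints.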

4. The Hardware Reality: Why Virtualization Matters

You can tune sysctl until your fingers bleed, but if your underlying disk I/O is slow, your database-backed API will crawl. This is where the distinction between "Cloud" and "Performance VPS" becomes clear.

Many providers oversell their storage throughput. In a shared environment, if your neighbor decides to run a massive backup or a heavy compile job, your API latency spikes. This is the "noisy neighbor" effect.
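
You don't have to take any provider's storage claims on faith, ours included. A quick fio run shows what the disk actually delivers under random 4K reads, the access pattern that dominates database-backed APIs (the test file path and size here are arbitrary):

# 30-second random-read benchmark, 4K blocks, direct I/O
fio --name=randread --filename=/tmp/fiotest --size=1G \
    --rw=randread --bs=4k --direct=1 --ioengine=libaio \
    --iodepth=32 --runtime=30 --time_based --group_reporting

Watch the IOPS and the completion-latency percentiles: on oversold shared storage they swing wildly between runs; on dedicated NVMe they should stay flat.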

At CoolVDS, we specifically architected our Norwegian zones to mitigate this:

  • Pure NVMe Storage: We don't use spinning rust. NVMe offers significantly higher IOPS and lower latency than SATA SSDs.
  • KVM Virtualization: Unlike OpenVZ or LXC containers (which share a kernel), KVM provides true isolation. Your kernel tuning parameters stick.
  • Local Peering: Our Oslo datacenter peers directly at NIX (Norwegian Internet Exchange). If your customers are in Norway, their packets don't need to detour through Frankfurt or Stockholm.

For GDPR compliance, which Datatilsynet has been enforcing strictly since May 2018, keeping data physically within Norwegian borders (or the EEA) on hardware you control is a significant advantage.

5. Validation

Don't take my word for it. Measure. Use wrk to benchmark your endpoint before and after these changes.

wrk -t12 -c400 -d30s https://your-api.com/endpoint
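
Add the --latency flag to get the full percentile breakdown instead of just averages:

wrk -t12 -c400 -d30s --latency https://your-api.com/endpoint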

You should see a tighter distribution in latency percentiles and higher requests per second. Performance isn't magic; it's physics and configuration. When you pair a tuned Linux kernel with the raw I/O power of NVMe, the results speak for themselves.

Ready to stop waiting on I/O? Deploy a CoolVDS NVMe instance in Oslo today and see what your code is actually capable of.