API Gateway Tuning: Squeezing Microseconds Out of NGINX and Kong in 2024
If your API takes 200ms to respond, your users are already looking at a competitor. On the high-frequency trading floors of Oslo and in the rapid-fire e-commerce checkouts across Scandinavia, milliseconds aren't just time: they are revenue. Most DevOps engineers slap a default NGINX or Kong container in front of their microservices and call it a day. Then they wonder why they hit a latency wall at 500 requests per second (RPS).
The problem isn't usually your code. It's the glue holding it together. Default Linux kernel settings and stock gateway configurations are designed for general-purpose compatibility, not high-throughput performance. I've seen robust Go binaries choked by a gateway that couldn't handle the TCP handshake overhead.
This guide cuts through the noise. We are going to tune the Linux networking stack, optimize upstream connections, and look at why hardware proximity to NIX (Norwegian Internet Exchange) matters more than your CDN cache policy.
The Hidden Killer: TCP Handshakes and Ephemeral Ports
I recall debugging a payment gateway implementation for a Norwegian fintech client last month. Their logs were clean, but 502 Bad Gateway errors were spiking during lunch hours. The culprit? Ephemeral port exhaustion. Their API gateway was opening a new TCP connection for every single request to the backend microservices, flooding the connection tracking table.
To fix this, you need to treat your gateway servers like high-performance routers, not web servers.
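If you suspect the same failure mode, you can confirm it in seconds from the gateway host. A quick diagnostic sketch: the backend address is the example upstream used later in this article, and the conntrack counters only exist if the nf_conntrack module is loaded.
# Socket state summary: a huge "timewait" count is the smoking gun
ss -s
# Count TIME_WAIT sockets toward one backend (10.0.0.5 is the example upstream below)
ss -tan state time-wait dst 10.0.0.5 | wc -l
# How full is the connection tracking table?
cat /proc/sys/net/netfilter/nf_conntrack_count /proc/sys/net/netfilter/nf_conntrack_max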
1. Kernel Level Tuning
Before touching the application layer, we must prep the OS. On a standard Ubuntu 24.04 LTS install (which runs beautifully on CoolVDS), the defaults are too conservative.
Edit your /etc/sysctl.conf:
# Allow reusing sockets in TIME_WAIT state for new outbound connections
net.ipv4.tcp_tw_reuse = 1
# Increase the range of ephemeral ports
net.ipv4.ip_local_port_range = 1024 65535
# Maximize the backlog of pending connections
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 4096
# Increase TCP buffer sizes for 10Gbps+ links (standard on high-end VPS)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
Apply these changes with sysctl -p. This ensures your CoolVDS instance doesn't drop packets just because the queue is full.
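On systemd-based distributions you can also drop the snippet into its own file, for example /etc/sysctl.d/99-gateway.conf (the filename is arbitrary), then reload everything and verify the values actually took effect:
# Reload every sysctl file, including drop-ins under /etc/sysctl.d/
sudo sysctl --system
# Spot-check the live values
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse net.ipv4.ip_local_port_range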
NGINX / Kong Configuration: Stop Closing Connections
The single biggest mistake in API gateway configuration is failing to reuse connections to the upstream (backend) services. TLS handshakes are expensive. Establishing a TCP connection takes time. Do not pay that cost on every request.
Here is the reference configuration for NGINX to enable upstream keepalives (Kong is built on NGINX/OpenResty, so the same directives sit underneath its generated template):
http {
    upstream backend_api {
        server 10.0.0.5:8080;
        server 10.0.0.6:8080;
        # CRITICAL: cache up to 100 idle connections per worker process
        keepalive 100;
    }

    server {
        location /api/ {
            proxy_pass http://backend_api;
            # HTTP/1.1 is required for upstream keepalive
            proxy_http_version 1.1;
            # Clear the Connection header so NGINX does not send "close" upstream
            proxy_set_header Connection "";
            # Buffer tuning for typical JSON payloads
            proxy_buffers 16 16k;
            proxy_buffer_size 32k;
        }
    }
}
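After editing, validate and reload gracefully, then check that the gateway is actually holding connections open to the backends. The port filter below assumes the example upstreams above:
# Validate the configuration before touching the running process
sudo nginx -t
# Reload workers without dropping in-flight requests
sudo systemctl reload nginx
# Idle keepalive connections to the backends show up as ESTABLISHED
ss -tn state established '( dport = :8080 )'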
Pro Tip: If you are using Kong Gateway 3.6, check the `upstream_keepalive_pool_size` property in `kong.conf` (the `upstreams` Admin API object does not control this). Raise it to at least 256 for high-traffic endpoints; if the pool is disabled, every proxied request pays a fresh handshake, which destroys CPU efficiency.
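For a containerized Kong deployment, that usually means setting the property via its environment-variable equivalent. A minimal sketch, assuming the standard `KONG_*` mapping and the `upstream_keepalive_pool_size` / `upstream_keepalive_idle_timeout` names from the Kong 3.x docs; the values are illustrative, so double-check them against your version.
# kong.conf properties can be overridden with KONG_-prefixed environment variables
export KONG_UPSTREAM_KEEPALIVE_POOL_SIZE=256
export KONG_UPSTREAM_KEEPALIVE_IDLE_TIMEOUT=60
# Restart Kong (or redeploy the container) so the new pool size takes effect
kong restart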
The Hardware Reality: NVMe and CPU Steal
Software tuning only gets you so far. If your underlying storage IOPS are capped or your hypervisor is over-committing CPU, your 99th percentile (p99) latency will be erratic. In the virtualized world, "Noisy Neighbor" syndrome is the enemy of consistent API performance.
We specifically engineered CoolVDS infrastructure to mitigate this. By utilizing local NVMe storage passed through via KVM, rather than network-attached block storage which adds latency, we reduce I/O wait times drastically. When your API gateway logs access data or buffers a large payload to disk, that write operation needs to happen instantly.
| Metric | Standard HDD VPS | SATA SSD VPS | CoolVDS NVMe |
|---|---|---|---|
| Random Read IOPS | ~300 | ~5,000 | ~50,000+ |
| Write Latency | 10-20ms | 1-3ms | < 0.1ms |
| Throughput Limit | 100 MB/s | 500 MB/s | 2000+ MB/s |
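Do not take a table like this on faith; measure your own instance. fio will tell you what the disk actually delivers, and the st column in vmstat exposes CPU steal from an over-committed hypervisor. The test file path and sizes here are arbitrary:
# Random 4k reads against a 1 GiB scratch file for 30 seconds
fio --name=randread --filename=./fio-test --size=1G --rw=randread \
    --bs=4k --ioengine=libaio --iodepth=32 --direct=1 \
    --runtime=30 --time_based --group_reporting
# Watch the "st" column: anything persistently above 0 is stolen CPU time
vmstat 1 10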
Geography: The Oslo Advantage
Light travels fast, but not infinitely fast. Round-trip time (RTT) from Oslo to Frankfurt is roughly 20-25ms. From Oslo to a local data center in Norway? Less than 2ms.
For a standard REST API call involving a pre-flight OPTIONS request and the actual GET/POST, you pay that 20ms round trip several times over: once for the TCP handshake, once or twice for the TLS handshake, once for the pre-flight, and once more for the real request. That is easily 80ms of dead air before your code even executes.
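You can watch exactly where those round trips go with curl's timing variables; run it from the region your users are in, against your own endpoint:
# Break one HTTPS request into its network phases (all times in seconds)
curl -o /dev/null -s -w 'dns=%{time_namelookup} tcp=%{time_connect} tls=%{time_appconnect} ttfb=%{time_starttransfer} total=%{time_total}\n' \
  https://api.yourdomain.no/v1/status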
Hosting your API Gateway physically close to your users—or at least close to the NIX peering points—is the easiest performance win available. Furthermore, keeping data within Norwegian borders simplifies GDPR compliance and adheres to Datatilsynet recommendations regarding data sovereignty, a concern that has only grown since the Schrems II ruling.
Monitoring the Wins
You cannot improve what you do not measure. Use wrk to benchmark your endpoints before and after applying these changes. Do not run the benchmark from the same machine; use a separate CoolVDS instance to simulate real network traffic.
# Install wrk
sudo apt-get update && sudo apt-get install wrk
# Run a 30-second test with 12 threads and 400 connections
wrk -t12 -c400 -d30s --latency https://api.yourdomain.no/v1/status
Look specifically at the Latency Distribution. An average response time of 50ms is useless if your max is 2 seconds. The kernel and NGINX tuning we discussed above targets that tail latency, smoothing out the spikes caused by connection overhead.
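wrk shows what synthetic load sees; production traffic leaves the same story in your access log. A rough sketch, assuming you append $request_time as the last field of your NGINX log_format:
# Approximate p99 request time from the last field of the access log
awk '{print $NF}' /var/log/nginx/access.log | sort -n | \
  awk '{a[NR]=$1} END {print "p99 (s):", a[int(NR*0.99)]}'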
Conclusion
Building a high-performance API gateway is an exercise in removing friction. You remove friction at the kernel level by opening up file descriptors and ports. You remove friction at the application level by keeping connections alive. And you remove friction at the infrastructure level by choosing high-frequency compute and local NVMe storage.
Don't let your infrastructure be the bottleneck. Deploy a high-performance, NVMe-backed instance on CoolVDS today and see what your code can actually do when the brakes are taken off.