The Latency API: Tuning NGINX & Kong for High-Throughput Norwegian Workloads

I recently audited a fintech setup in Oslo that was bleeding customers. Their backend logic was solid—Go microservices responding in sub-5ms—but their mobile app felt sluggish. The culprit? An untuned API Gateway adding a staggering 150ms overhead per request. In a world where Datatilsynet (The Norwegian Data Protection Authority) demands strict compliance and users demand instant feedback, that kind of latency is a business killer.

Most DevOps engineers slap a default NGINX or Kong container on a server and call it a day. They assume the defaults are "good enough." They aren't. Default configurations are designed for compatibility, not high-performance throughput on modern NVMe-backed infrastructure. If you are running serious workloads, you need to get your hands dirty with kernel parameters and gateway directives.

This isn't a theoretical overview. This is the exact tuning checklist I use when deploying critical infrastructure on CoolVDS for clients who cannot afford downtime.

1. The Foundation: Kernel-Level Tuning

Before touching the application layer, you must prep the OS. Linux defaults are often conservative, dating back to when 512MB RAM was a luxury. On a modern CoolVDS instance with dedicated cores, we can push these limits significantly.

The first bottleneck you will hit is the file descriptor limit. Every incoming TCP connection consumes a file descriptor. If your gateway hits the ceiling, it stops accepting new connections.

ulimit -n 65535

That is the temporary fix. For permanence, you edit /etc/security/limits.conf. But the real magic happens in sysctl.conf. We need to optimize the TCP stack to handle a flood of short-lived connections, typical in API traffic.
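
For reference, the permanent entry looks roughly like this (the nginx user is an assumption; use whichever account your gateway workers run as, and note that services launched by systemd take their limit from LimitNOFILE in the unit file rather than from PAM):

# /etc/security/limits.conf entries for the gateway user (nginx here is an assumption)
nginx soft nofile 65535
nginx hard nofile 65535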

Here is the baseline sysctl configuration I deploy on Ubuntu 24.04 LTS nodes:

# /etc/sysctl.conf optimization for API Gateways

# Increase system-wide file descriptor limit
fs.file-max = 2097152

# Widen the port range to allow more concurrent connections
net.ipv4.ip_local_port_range = 1024 65535

# Allow reuse of TIME_WAIT sockets for new outbound connections (safe for most setups)
net.ipv4.tcp_tw_reuse = 1

# Increase the maximum number of connections in the backlog
net.core.somaxconn = 65535
net.ipv4.tcp_max_syn_backlog = 65535

# Increase buffer sizes for high-speed networks (Crucial for 10Gbps+ links)
net.core.rmem_max = 16777216
net.core.wmem_max = 16777216
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

Pro Tip: Apply these changes with sysctl -p. If you are on a standard shared VPS, your provider might block write access to `net.core`. This is why we use KVM at CoolVDS—you get a real kernel, not a containerized slice where you have to beg for permissions.
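
A quick sanity check after reloading, to confirm the values actually landed:

# Reload sysctl settings and spot-check a couple of them
sudo sysctl -p
sysctl net.core.somaxconn net.ipv4.tcp_tw_reuse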

2. NGINX & Kong Configuration

Whether you are using raw NGINX or Kong (which is built on OpenResty/NGINX), the worker configuration is paramount. The goal is to pin workers to CPU cores to prevent context switching, which destroys cache locality.

In your nginx.conf, avoid worker_processes auto; if you are on a noisy host. However, on a dedicated resource plan like CoolVDS, auto works well because the cores are actually yours. The bigger win comes from worker_rlimit_nofile and the event model.

worker_processes auto;
worker_rlimit_nofile 65535;

events {
    multi_accept on;
    use epoll;
    worker_connections 16384;
}

This tells NGINX to accept as many connections as possible immediately (`multi_accept`) and use the efficient Linux `epoll` method.
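
If you want NGINX to do the core pinning mentioned earlier for you, there is a directive for that as well; a minimal sketch for the main context of nginx.conf:

# Bind each worker to its own CPU core (NGINX 1.9.10+); pairs naturally with worker_processes auto
worker_cpu_affinity auto;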

Upstream Keepalive: The Silent Performance Killer

This is where 90% of setups fail. By default, NGINX opens a new connection to your upstream service (your actual API) for every single request. This adds a full TCP handshake (and potentially SSL handshake) to every call. It sends latency through the roof.

You must configure an upstream block with `keepalive`. This keeps a pool of idle connections open to your backend services.

upstream backend_microservice {
    server 10.0.0.5:8080;
    # Keep 64 idle connections open to this upstream
    keepalive 64;
}

server {
    location /api/v1/ {
        proxy_pass http://backend_microservice;
        
        # Required for keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    } 
}

Without those two directives, NGINX talks HTTP/1.0 to the upstream and sends "Connection: close" on every request, tearing down the keepalive pool you just tried to build.
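
If Kong sits in front instead of raw NGINX, the same pool is configured in kong.conf rather than in a hand-written upstream block. The property names below come from recent Kong releases, so treat the values as a sketch and verify them against the kong.conf.default shipped with your version:

# kong.conf: upstream keepalive pool (verify property names against your Kong version)
upstream_keepalive_pool_size = 64
upstream_keepalive_max_requests = 1000
upstream_keepalive_idle_timeout = 60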

3. SSL/TLS Offloading with Hardware Acceleration

In 2025, TLS 1.3 is mandatory. If you are handling traffic from Norwegian ISPs like Telenor or Altibox, you want 0-RTT (Zero Round Trip Time) resumption enabled. This allows returning visitors to send data immediately without a full handshake.

However, crypto is expensive. If you are serving thousands of requests per second, OpenSSL can eat 30-40% of your CPU. Check if your CPU supports AES-NI (Intel/AMD) or equivalent instructions.

grep -o aes /proc/cpuinfo

If that returns output, you are in luck. Ensure your NGINX SSL directives are modernized:

ssl_protocols TLSv1.2 TLSv1.3;
ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
ssl_prefer_server_ciphers on;
ssl_session_cache shared:SSL:50m;
ssl_session_timeout 1d;
ssl_session_tickets off;

# OCSP Stapling (Vital for speed and privacy)
ssl_stapling on;
ssl_stapling_verify on;
resolver 1.1.1.1 8.8.8.8 valid=300s;
resolver_timeout 5s;
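
The block above gives you resumption through the shared session cache, but the 0-RTT behaviour mentioned earlier has to be switched on explicitly. Treat the sketch below as opt-in: early data can be replayed by an attacker, so only enable it if replayed requests are harmless, and pass the flag upstream so the backend can reject anything non-idempotent:

# TLS 1.3 0-RTT (NGINX 1.15.4+). Early data is replayable; restrict it to idempotent endpoints.
ssl_early_data on;

# Inside the proxied location: tell the backend when a request arrived as early data
proxy_set_header Early-Data $ssl_early_data;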

4. Observability vs. Performance

You cannot tune what you cannot measure. But be careful: logging to disk is an I/O-heavy operation. If your gateway writes a verbose access log for 10,000 req/s to a standard SATA drive, iowait will skyrocket and your workers will stall on disk writes instead of serving network traffic.

This is where infrastructure choice becomes an architectural decision. We standardized on NVMe storage for all CoolVDS instances because rotating rust (HDDs) simply cannot handle the random write patterns of high-volume access logs. Even with NVMe, I recommend buffering your logs:

access_log /var/log/nginx/access.log combined buffer=32k flush=5s;

This holds logs in RAM and flushes them in chunks, drastically reducing I/O syscalls.
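
To see whether the buffering actually pays off, watch the log volume while a load test runs; iostat from the sysstat package gives a rough before/after picture:

# Extended per-device stats every second: w/s and %util on the log volume
# should drop noticeably once buffered logging is in place
iostat -x 1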

5. The CoolVDS Factor: Why Infrastructure Matters

You can apply every optimization in this article, but if your hypervisor is oversubscribing CPU, your p99 latency will still be garbage. CPU Steal (%) is the metric to watch. If you see this above 0% on your dashboard, your neighbors are stealing your processing power.
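
You can also check it straight from the VM with stock tools:

# The "st" column is CPU steal; anything persistently above 0 means a contended hypervisor
vmstat 1 5

# Per-core view (%steal column), if sysstat is installed
mpstat -P ALL 1 5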

Feature            | Standard VPS                           | CoolVDS Architecture
Virtualization     | Container (LXC/OpenVZ), shared kernel  | KVM, dedicated kernel
Storage I/O        | Shared SATA/SSD (noisy neighbors)      | Dedicated NVMe lanes
CPU Scheduling     | Oversubscribed                         | Dedicated cores available
Norwegian Latency  | Variable (often routed via Frankfurt)  | Direct peering options
We built our platform for the "Pragmatic CTO" and the "Performance Obsessive." When you are dealing with GDPR data that must stay within the EEA, and you need low latency to the NIX exchange, you need a host that respects the physics of hardware.

Final Thoughts

Performance tuning is an iterative process. Start with the kernel, move to the gateway config, and always keep an eye on your underlying infrastructure metrics. High latency is rarely a single bug; it is usually death by a thousand misconfigured defaults.

Ready to see what your API is actually capable of? Don't let slow I/O kill your benchmarks. Deploy a test instance on CoolVDS in 55 seconds and run your load tests on real NVMe hardware.