
Crushing Latency: API Gateway Tuning for High-Throughput Norwegian Workloads

If your API response time hits 200ms, you have a problem. If it fluctuates between 50ms and 500ms, you have a crisis. In the high-frequency trading floors of Oslo or the real-time logistics hubs of the Nordics, variance (jitter) is the enemy of stability.

Most DevOps engineers spend weeks refactoring Go or Rust microservices, only to put them behind a default Nginx or HAProxy configuration that throttles traffic like a bottleneck on the E6 during rush hour. I recently audited a fintech setup in Bergen where the backend could handle 50k RPS, but the gateway choked at 12k. The culprit wasn't the code; it was the Linux kernel TCP stack and a noisy neighbor on a cheap VPS.

Let’s fix that. We are going to tune the Linux networking stack for 2022 standards, optimize Nginx for massive concurrency, and discuss why the underlying virtualization technology—specifically KVM—is non-negotiable.

1. The Hardware Reality Check: CPU Steal

Before touching a single config file, look at %st (steal time) in top. If you are hosting on a budget container-based platform (like OpenVZ or LXC), your API gateway is sharing kernel resources with twenty other tenants. If one of them decides to mine crypto or compile a heavy Java app, your SSL handshake times will spike.
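A quick way to check: the st field in top (or the st column in vmstat) shows how much time the hypervisor held your vCPU back. Anything consistently above zero means a neighbor is eating your cycles.

# One-shot snapshot of CPU usage; the last field (st) is steal time
top -bn1 | grep "Cpu(s)"

# Or sample over five seconds; the st column should sit at 0 on well-isolated hardware
vmstat 1 5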

This is why we architect CoolVDS around KVM (Kernel-based Virtual Machine). It provides hard hardware isolation. When we allocate 4 vCPUs to your gateway, they are yours. You can pin interrupts to specific cores without fighting the hypervisor. For an API gateway doing heavy TLS termination, raw compute consistency is better than "burstable" promises.
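As a concrete sketch of what that isolation lets you do, you can steer NIC interrupts onto one core and leave the rest for Nginx workers. The interface name and IRQ number below are assumptions; check your own with ip link and /proc/interrupts, and note that irqbalance will overwrite manual affinity unless you stop it.

# List the IRQs used by the NIC (eth0 is an assumption)
grep eth0 /proc/interrupts

# Keep irqbalance from rewriting manual settings
sudo systemctl stop irqbalance

# Pin a hypothetical IRQ 24 to CPU core 1 (bitmask 0x2)
echo 2 | sudo tee /proc/irq/24/smp_affinity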

2. Kernel Tuning: Beyond Defaults

The default Linux kernel settings are tuned for conservative, general-purpose use, not for handling 100,000 simultaneous TCP connections. To handle high throughput, we need to widen the TCP pipe.

Edit your /etc/sysctl.conf. These settings are aggressive but safe for a dedicated CoolVDS instance running Ubuntu 20.04 or 22.04 LTS.

# /etc/sysctl.conf

# Maximize the backlog of incoming connections
net.core.somaxconn = 65535
net.core.netdev_max_backlog = 65535

# Increase the range of ephemeral ports for outgoing upstream connections
net.ipv4.ip_local_port_range = 1024 65535

# Allow reusing sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1

# BBR congestion control (available in mainline kernels since 4.9)
# Drastically improves throughput on lossy, high-latency links
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

# Increase TCP buffer sizes (16MB) for modern high-speed networks
net.ipv4.tcp_rmem = 4096 87380 16777216
net.ipv4.tcp_wmem = 4096 65536 16777216

# Protect against SYN floods without killing performance
net.ipv4.tcp_syncookies = 1
net.ipv4.tcp_syn_retries = 2

Apply these changes with sysctl -p. The BBR congestion control algorithm is particularly vital if your API serves clients across Europe; it handles packet loss much more gracefully than the old CUBIC algorithm.
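A quick sanity check after reloading: confirm that the kernel actually offers BBR (it ships as a module in stock Ubuntu kernels and loads automatically when selected) and that fq is the active qdisc.

# bbr should appear in the available list and be the active algorithm
sysctl net.ipv4.tcp_available_congestion_control
sysctl net.ipv4.tcp_congestion_control

# fq should be the default queueing discipline
sysctl net.core.default_qdisc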

3. Nginx / OpenResty Optimization

Whether you use raw Nginx, Kong, or APISIX, the underlying engine is likely Nginx. The default nginx.conf is often too conservative.

Worker Processes and File Descriptors

Nginx is event-driven, so a handful of workers can each hold tens of thousands of connections, but only if the process is allowed to open enough file descriptors. A common error I see in logs is "worker_connections are not enough".
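Before changing anything, confirm you are actually hitting the ceiling by checking the limit of the running master process (a quick diagnostic, assuming Nginx is already up):

# Open-file limit of the running Nginx master
cat /proc/$(pgrep -o nginx)/limits | grep "Max open files"

# Limit of your current shell, for comparison
ulimit -n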

First, raise the system-wide limits in /etc/security/limits.conf:

* soft nofile 1000000
* hard nofile 1000000
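One caveat: limits.conf only applies to PAM login sessions, so it does not by itself change the limit of a systemd-managed Nginx. The worker_rlimit_nofile directive below usually covers the workers (the master runs as root), but a systemd drop-in is the belt-and-braces way to raise the unit's limit:

# /etc/systemd/system/nginx.service.d/override.conf
[Service]
LimitNOFILE=1000000

Run systemctl daemon-reload and restart Nginx for the drop-in to take effect.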

Then, configure Nginx to use them:

# nginx.conf

worker_processes auto; # Matches number of CPU cores
worker_rlimit_nofile 1000000; # Allow workers to open many files

events {
    worker_connections 20000;
    use epoll;
    multi_accept on;
}

http {
    # Disable access logs for high-traffic assets if you have metrics elsewhere
    access_log off;
    
    # Sendfile copies data between FDs within the kernel
    sendfile on;
    tcp_nopush on;
    tcp_nodelay on;
    
    # KEEPALIVE TO UPSTREAM IS CRITICAL
    upstream backend_api {
        server 10.0.0.5:8080;
        keepalive 64;
    }
}
Pro Tip: The keepalive directive in the upstream block is the most overlooked setting. Without it, Nginx opens and closes a new TCP connection to your backend service for every single request. This adds significant latency and exhausts ephemeral ports. Enabling keepalive reuses existing connections.
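One caveat: upstream keepalive only kicks in when the proxied request uses HTTP/1.1 and the Connection header is cleared. A minimal location sketch (the /api/ path is illustrative):

    location /api/ {
        proxy_pass http://backend_api;

        # Required for upstream keepalive to work
        proxy_http_version 1.1;
        proxy_set_header Connection "";
    }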

4. SSL/TLS Offloading Nuances

TLS termination is CPU intensive. In 2022, you should prefer TLS 1.3 for speed (a one-round-trip handshake, plus optional 0-RTT resumption) and security, while keeping TLS 1.2 only for legacy clients. However, processing large SSL buffers can block the worker.

Adjust the ssl_buffer_size. By default, it's 16k. For APIs sending small JSON payloads, this causes unnecessary buffering latency.

server {
    listen 443 ssl http2;
    
    ssl_protocols TLSv1.2 TLSv1.3;
    ssl_ciphers EECDH+AESGCM:EDH+AESGCM;
    
    # Reduce buffer size for lower Time To First Byte (TTFB)
    ssl_buffer_size 4k;
    
    # Session cache speeds up repeated handshakes
    ssl_session_cache shared:SSL:50m;
    ssl_session_timeout 1d;
    ssl_session_tickets off;
}
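After reloading, verify from a client that TLS 1.3 is actually negotiated and see where the handshake time goes (the hostname is the same placeholder used in the benchmark below):

# Confirm the negotiated protocol version
openssl s_client -connect your-api.no:443 -tls1_3 </dev/null 2>/dev/null | grep "Protocol"

# Break the request down: TCP connect, TLS handshake, time to first byte
curl -o /dev/null -s -w "connect: %{time_connect}s  tls: %{time_appconnect}s  ttfb: %{time_starttransfer}s\n" https://your-api.no/endpoint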

5. Local Data Sovereignty & Network Topology

Technical tuning is useless if your physics are wrong. If your customers are in Oslo and your server is in Frankfurt, you are adding ~20ms of round-trip time (RTT) that no amount of kernel tuning can remove. With the current legal landscape regarding Schrems II and GDPR, keeping data within Norwegian borders or at least strict EU jurisdictions is not just a performance play; it's a compliance requirement.

CoolVDS infrastructure is peered directly at major Nordic exchange points. This reduces hops. Fewer hops mean less jitter. When you combine our NVMe storage (which prevents I/O blocking on logs) with a properly tuned kernel, you create a gateway capable of handling spikes without sweating.
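You can verify the physics yourself before committing to a region. From a machine in your target market, measure RTT and per-hop jitter to the gateway (again using the placeholder hostname):

# Per-hop latency and packet loss over 50 probes
mtr --report --report-cycles 50 your-api.no

# Simple RTT baseline
ping -c 20 your-api.no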

6. Benchmarking the Results

Don't guess. Measure. Use wrk to load test your tuned gateway. Here is a command to simulate 1000 concurrent users for 30 seconds:

wrk -t12 -c1000 -d30s --latency https://your-api.no/endpoint

Look at the latency distribution, specifically the 99th percentile. On a standard VPS, you might see p99 at 250ms. On a tuned CoolVDS instance, we aim for sub-50ms.

Summary Checklist for Deployment

Component | Action                         | Benefit
Kernel    | Enable BBR, increase somaxconn | Better throughput, queue handling
Nginx     | Upstream keepalive             | Reduces backend connection overhead
SSL       | Reduce buffer to 4k            | Lower TTFB for API responses
Hardware  | CoolVDS KVM + NVMe             | No noisy neighbors, fast I/O

Performance isn't magic; it's engineering. By controlling the entire stack—from the KVM slice up to the TCP window size—you ensure your application performs as well in production as it did on your laptop.

Ready to drop your latency? Don't let slow I/O kill your SEO. Deploy a test instance on CoolVDS in 55 seconds and see the difference raw power makes.