Kubernetes Networking Deep Dive: Optimizing Packet Flow for Nordic Compliance and Speed

I still remember the Tuesday morning my pager went off at 03:42. A fintech client in Oslo was reporting intermittent 502 errors during a batch processing window. Their dashboard was green, their pods were running, but the network was silently dropping packets. The culprit? A default conntrack table limit in the kernel that nobody touched because "Kubernetes manages that, right?" Wrong.

Kubernetes networking is not magic. It is a complex abstraction layer built on top of ancient Linux primitives—iptables, IPVS, and namespaces. If you don't understand the underlying plumbing, you are building a skyscraper on a swamp. In 2024, with the adoption of eBPF and the Gateway API maturing, the landscape has shifted, but the physics of latency remain the same.

This guide cuts through the marketing noise. We are going to look at how to architect K8s networking for high-performance workloads, specifically within the context of the Norwegian infrastructure market where data sovereignty (thanks, Datatilsynet) and latency to NIX (Norwegian Internet Exchange) matter.

1. The CNI Battlefield: Why We Moved to eBPF

For years, Flannel and Calico were the defaults. Flannel is simple (VXLAN) but lacks policy features. Calico is robust but traditionally relied heavily on iptables, which becomes a bottleneck when you have thousands of services. Every packet traversing a service IP has to run the gauntlet of a linear list of iptables rules. It’s O(n) complexity. It kills CPU.
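
You can see the scale of the problem on any node still running kube-proxy in iptables mode by counting the NAT rules it has programmed. The chains prefixed `KUBE-SVC` and `KUBE-SEP` grow with every Service and endpoint, and the `KUBE-SERVICES` chain is walked rule by rule for new connections:

# Count the Service/endpoint rules kube-proxy has programmed
iptables-save -t nat | grep -cE 'KUBE-SVC|KUBE-SEP'

# Total size of the NAT table the kernel has to work through
iptables-save -t nat | wc -l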

In our recent deployments, we have standardized on Cilium. Why? Because it uses eBPF (Extended Berkeley Packet Filter) to bypass much of the kernel's network stack overhead. It handles load balancing directly, often removing the need for kube-proxy entirely.

Pro Tip: If you are running a cluster on CoolVDS, enable the "host-routing" mode in Cilium. Since our VDS instances provide dedicated KVM resources without the noisy neighbor effect of shared kernels, you get near-metal performance.

Here is a production-ready `values.yaml` snippet for deploying Cilium via Helm, explicitly optimizing for low latency:

cluster:
  name: coolvds-oslo-01
  id: 1

kubeProxyReplacement: "strict"

# Enable eBPF host routing for maximum performance
tunnel: "disabled"
autoDirectNodeRoutes: true
ipv4:
  enabled: true
# Pod CIDR that is routable without encapsulation
ipv4NativeRoutingCIDR: "10.0.0.0/16"

loadBalancer:
  mode: "dsr" # Direct Server Return saves bandwidth
  acceleration: "native"

hubble:
  relay:
    enabled: true
  ui:
    enabled: true

This configuration uses Direct Server Return (DSR). The request comes in through the load-balancing node, but the response goes directly from the backend pod to the client, skipping the return hop through the middle. That trims latency on every reply and preserves the client's source IP for your logs.
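
Deploying this is the standard Helm flow. The commands below are a sketch rather than a canonical procedure: they assume the Cilium Helm repository and the Cilium CLI are installed, and that the values above are saved locally as `values.yaml`. Pin the chart version to the one you validated the values against; a few keys (such as `kubeProxyReplacement` and `tunnel`) have changed names across recent Cilium releases.

# Install or upgrade Cilium with the tuned values
helm repo add cilium https://helm.cilium.io/
helm upgrade --install cilium cilium/cilium \
  --namespace kube-system \
  -f values.yaml

# Wait for the agents, then confirm kube-proxy replacement is actually active
cilium status --wait
kubectl -n kube-system exec ds/cilium -- cilium status | grep KubeProxyReplacement

If kube-proxy replacement does not report as enabled, check the node kernel first: the eBPF host-routing path needs a reasonably modern kernel (5.10 or newer is a safe baseline).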

2. The Hidden Bottleneck: etcd and Disk I/O

You might ask, "What does disk I/O have to do with networking?" Everything. Kubernetes networking state (Services, Endpoints, NetworkPolicies) is stored in etcd. If etcd is slow, network updates propagate slowly. I've seen clusters take 45 seconds to update an Ingress rule because the underlying storage was choking.

To verify if your storage is killing your network convergence, run this:

etcdctl check perf

If your 99th-percentile fsync latency isn't below 10ms, you are in trouble. This is why we insist on NVMe storage for all CoolVDS instances. Spinning rust, standard SATA SSDs, or network-attached volumes (an untuned Ceph cluster, for example) simply cannot handle the write-heavy load of a busy K8s control plane.
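
If `etcdctl` isn't available on the node, you can emulate etcd's write pattern directly with `fio`: small sequential writes, each followed by an `fdatasync`, which is exactly what the etcd WAL does. This is a rough sketch; run it on the same disk that backs the etcd data directory, and watch the fdatasync latency percentiles in the output.

# Small sequential writes, each followed by fdatasync - similar to etcd's WAL
# Use a directory on the disk that actually holds /var/lib/etcd
mkdir -p /var/lib/etcd-disk-test
fio --name=etcd-fsync-test \
    --directory=/var/lib/etcd-disk-test \
    --rw=write --ioengine=sync --fdatasync=1 \
    --size=22m --bs=2300
rm -rf /var/lib/etcd-disk-test

The 99th percentile of those fdatasync durations should stay under 10ms; local NVMe usually lands comfortably below that.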

3. Tuning the Node Kernel

Out of the box, most Linux distributions are tuned for general-purpose computing, not high-throughput packet forwarding. You need to manipulate `sysctl`. This is where managed Kubernetes services often handcuff you—they won't let you touch the node kernel configuration. On a CoolVDS instance, you have root. Use it.

Check your current connection tracking limit:

sysctl net.netfilter.nf_conntrack_max

For a high-traffic ingress node, the default (often 65536) is laughable. Here is the `sysctl.conf` block we deploy for high-performance nodes handling web traffic:

# Increase connection tracking table size
net.netfilter.nf_conntrack_max = 524288
net.netfilter.nf_conntrack_tcp_timeout_established = 86400

# Enable TCP BBR congestion control for better throughput over WAN
net.core.default_qdisc = fq
net.ipv4.tcp_congestion_control = bbr

# Widen the port range for outbound connections
net.ipv4.ip_local_port_range = 1024 65535

# Allow reusing sockets in TIME_WAIT state for new connections
net.ipv4.tcp_tw_reuse = 1

# Increase backlog for incoming packets
net.core.netdev_max_backlog = 5000
net.ipv4.tcp_max_syn_backlog = 5000

Apply this with `sysctl -p`. The TCP BBR algorithm is particularly crucial if your users are connecting from mobile networks in rural Norway, where signal quality can fluctuate. It handles packet loss much more gracefully than CUBIC.
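
`sysctl -p` applies the values immediately; to make them survive reboots and node rebuilds, drop the same block into `/etc/sysctl.d/` instead of hand-editing `/etc/sysctl.conf` on every node. The snippet below is a minimal sketch assuming a systemd-based distro, with the block above saved as `90-k8s-net.conf` (the filename is just an example).

# Persist the tuning and load the BBR module (tcp_bbr ships with kernel 4.9+)
cp 90-k8s-net.conf /etc/sysctl.d/90-k8s-net.conf
modprobe tcp_bbr
sysctl --system

# Confirm BBR is active and check how close we are to the conntrack ceiling
sysctl net.ipv4.tcp_congestion_control
echo "conntrack: $(cat /proc/sys/net/netfilter/nf_conntrack_count) of $(cat /proc/sys/net/netfilter/nf_conntrack_max)"

If the count creeps toward the maximum at peak traffic, the kernel starts dropping packets and logs "nf_conntrack: table full, dropping packet" to dmesg long before anything shows up as a Kubernetes event.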

4. Ingress: NGINX vs. The World

While the Gateway API is the future, the NGINX Ingress Controller is still the workhorse for the vast majority of production workloads we see in 2024. However, the default config is conservative. We frequently see "upstream timed out" errors that are really just symptoms of undersized buffers and missing upstream keepalives.

You need to inject this configuration into your NGINX ConfigMap to handle high concurrency:

apiVersion: v1
kind: ConfigMap
metadata:
  name: nginx-configuration
  namespace: ingress-nginx
data:
  worker-processes: "auto"
  max-worker-connections: "10240"
  keep-alive: "60"
  upstream-keepalive-connections: "100"
  compute-full-forwarded-for: "true"
  use-forwarded-headers: "true"
  client-body-buffer-size: "64k"
  proxy-buffer-size: "16k"

Pay particular attention to `upstream-keepalive-connections`. If the keepalive pool to your backends is too small, NGINX keeps tearing down and re-opening TCP connections to the pods under load, and the extra handshakes add measurable overhead to every request.
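
After patching the ConfigMap, confirm that the controller actually rendered your values into `nginx.conf`; an unrecognized key is ignored rather than rejected. A quick check, assuming the standard deployment name `ingress-nginx-controller`:

# Dump the rendered NGINX config from the controller and look for the tuned values
kubectl -n ingress-nginx exec deploy/ingress-nginx-controller -- \
  nginx -T 2>/dev/null | grep -E 'worker_connections|keepalive'

# The controller logs a reload event once the ConfigMap change is picked up
kubectl -n ingress-nginx logs deploy/ingress-nginx-controller --tail=50 | grep -i reload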

5. The Nordic Context: Latency and Sovereignty

If your target audience is in Norway, hosting in Frankfurt or Amsterdam adds 20-30ms of round-trip time. That doesn't sound like much, but when a single user click triggers a chain of 50 API calls and several of those round trips have to cross that extra distance, the delay compounds into lag the user can feel.

Verify your latency to the Norwegian Internet Exchange (NIX):

mtr -rwc 10 nix.no
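
Running that from your laptop measures your ISP, not your cluster. To see what the pods themselves experience, run the same trace from inside the cluster with a throwaway pod; the `nicolaka/netshoot` image used below is just one convenient choice of network tooling image.

# One-off pod with mtr on board; removed automatically when the command exits
kubectl run netcheck --rm -it --restart=Never \
  --image=nicolaka/netshoot -- mtr -rwc 10 nix.no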

Furthermore, GDPR and the Schrems II ruling dictate strict control over where data flows. By running your K8s cluster on CoolVDS infrastructure located physically in Norway, you simplify your compliance posture significantly. You know exactly where the physical drive sits.

Summary: The Hardware Matters

You can tune software until you are blue in the face, but you cannot software-patch slow hardware. Kubernetes requires low-latency disk I/O for etcd and high single-thread performance for the API server.

Checklist for your next deployment:

  • Replace `kube-proxy` with Cilium (eBPF).
  • Verify disk write speeds are NVMe class (`fio` or `etcdctl check perf`).
  • Tune kernel `sysctl` parameters for high connection counts.
  • Ensure data residency aligns with your legal requirements.

Don't let legacy infrastructure throttle your container orchestration. If you need a foundation that respects the physics of networking, deploy a test instance on CoolVDS in 55 seconds. Experiencing raw NVMe speed is better than reading about it.