Kubernetes Networking Deep Dive: Stop Praying to the IPTables Gods
It’s 3:00 AM. Your pager is screaming because the checkout microservice in Oslo just timed out. Again. `kubectl get pods` shows everything is running, but `curl` requests are vanishing into the void. Welcome to the Bermuda Triangle of modern infrastructure: Kubernetes Networking.
Most developers treat the Kubernetes network model as a black box. You define a Service, throw in an Ingress, and expect packets to flow. But when you are pushing 50,000 requests per second, that black box becomes a bottleneck. I’ve seen production clusters grind to a halt not because of CPU load, but because the conntrack table on the host node was full.
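If you suspect the same thing is happening to you, the conntrack counters are exposed straight through sysctl; a quick sanity check on any node looks something like this:
# Current tracked connections versus the kernel's hard limit
sysctl net.netfilter.nf_conntrack_count
sysctl net.netfilter.nf_conntrack_max
# If count creeps toward max, new connections start getting dropped silently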
In this post, we are going to tear apart the abstraction layers. We will look at how CNI plugins actually move bytes, why you should probably switch to IPVS mode, and why your choice of underlying VPS in Norway dictates your network latency more than your Go code does.
The Flat Network Lie
Kubernetes dictates a simple rule: every Pod must be able to communicate with every other Pod without NAT. Ideally, it looks like a flat LAN. Under the hood, it is a mess of virtual ethernet pairs (veth), bridges, and routing tables.
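You can see that plumbing for yourself on any worker node. The interface names depend on your CNI, but the shape is always the same: one veth per pod, plus routes or a bridge tying them together.
# One host-side veth per running pod
ip -o link show type veth
# How the node reaches pod CIDRs (interface names vary by CNI: cali*, flannel.1, cni0, tunl0...)
ip route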
In 2019, you generally have two main architectural choices for your CNI (Container Network Interface):
- Overlay Networks (VXLAN/IPIP): Used by Flannel (default) and Calico (in IPIP mode). This encapsulates packets inside packets. It’s easy to set up but introduces CPU overhead for encapsulation/decapsulation.
- Layer 3 Routing (BGP): Used by Calico. This shares routes directly with the host. No encapsulation overhead. Pure speed.
If you are running on a provider that blocks BGP or filters MAC addresses aggressively, you are forced into Overlay networks. This is where performance dies. On CoolVDS KVM instances, we allow the flexibility needed to run efficient Layer 3 routing because we don't treat our users like children who can't be trusted with a routing table.
Code Snippet: Checking Your CNI Config
Don't assume you know what's running. Check the CNI configuration directory.
ls /etc/cni/net.d/
# If you see 10-flannel.conflist, you are almost certainly running a VXLAN overlay (Flannel's default backend).
# If you see 10-calico.conflist, check the mode.
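For Calico specifically, the encapsulation mode lives on the IP pool. Assuming you have calicoctl installed and talking to your datastore, something like this tells you whether you are doing IPIP or native L3 routing:
# Inspect the IP pool and look at the ipipMode field
calicoctl get ippool -o yaml
# ipipMode: Never       -> pure BGP / L3 routing
# ipipMode: CrossSubnet -> encapsulate only between subnets
# ipipMode: Always      -> full IPIP overlay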
The IPTables vs. IPVS Battlefield
Before Kubernetes 1.11, `kube-proxy` relied heavily on `iptables` to route Service traffic. In a cluster with 5,000 services, `iptables` rules are evaluated sequentially. That is O(n) complexity. Every packet has to traverse a massive list of rules. It is slow. It consumes CPU. It adds latency.
Enter IPVS (IP Virtual Server). It uses hash tables for routing, giving you O(1) lookups. It doesn't matter if you have 10 services or 10,000; the lookup time is constant. IPVS mode has been stable (GA) since Kubernetes 1.11, yet so many of you are still running default configurations with iptables.
Pro Tip: If you see high `sy` (system) CPU usage on your nodes during network load, your kernel is drowning in iptables processing. Switch to IPVS.
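How bad is it on your cluster? A rough way to gauge the rule bloat and the kernel time it costs (KUBE-* is the chain prefix kube-proxy uses):
# Count the NAT rules kube-proxy has programmed for Services
iptables-save -t nat | grep -c 'KUBE-'
# Watch the %sys column under load (mpstat ships with the sysstat package)
mpstat 1 5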
Configuration: Enabling IPVS Mode
You need to ensure the IPVS kernel modules are loaded on your host (or VPS). Run this script on your nodes:
#!/bin/bash
# Load IPVS modules
modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack_ipv4
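Verify the modules actually loaded and pull in the userspace tooling; kube-proxy's IPVS mode needs ipset, and ipvsadm is invaluable for inspection. One note: on kernels 4.19 and newer the conntrack module is simply nf_conntrack, so don't panic if nf_conntrack_ipv4 refuses to load there. A sketch for a Debian/Ubuntu node:
# Confirm the IPVS and conntrack modules are present
lsmod | grep -e ip_vs -e nf_conntrack
# ipset is required by kube-proxy in IPVS mode; ipvsadm is for humans
apt-get install -y ipset ipvsadm
# Make the module list survive reboots
printf 'ip_vs\nip_vs_rr\nip_vs_wrr\nip_vs_sh\nnf_conntrack_ipv4\n' > /etc/modules-load.d/ipvs.conf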
Then, update your `kube-proxy` ConfigMap. This is often found in the `kube-system` namespace.
kubectl edit configmap kube-proxy -n kube-system
Change the mode to `ipvs`:
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
ipvs:
  strictARP: true
Kill the kube-proxy pods to restart them. Your latency metrics will thank you.
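On kubeadm-style clusters the kube-proxy pods carry the k8s-app=kube-proxy label, and kube-proxy reports its active mode on its metrics port (10249 by default). A rough restart-and-verify sequence:
# Delete the pods; the DaemonSet recreates them with the new config
kubectl -n kube-system delete pods -l k8s-app=kube-proxy
# On a node: confirm the mode and eyeball the virtual server table
curl -s http://localhost:10249/proxyMode
ipvsadm -Ln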
DNS: The Hidden Villain
Half of all "network" issues are actually DNS issues. In Kubernetes 1.15, CoreDNS is the standard, having replaced kube-dns. But the default `ndots:5` setting in the Pod resolver means that any name with fewer than five dots, like `google.com`, is first tried against every internal search domain (`google.com.default.svc.cluster.local`, `google.com.svc.cluster.local`, and so on), producing a string of failed lookups before the real query ever leaves the cluster.
This floods your DNS server. If you are hosting on a platform with slow UDP packet processing, you will see random `Temporary Failure in Name Resolution` errors.
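You can watch this happen from inside any pod (the pod name below is just a placeholder):
# Inspect the resolver config Kubernetes injects into the pod
kubectl exec -it some-pod -- cat /etc/resolv.conf
# Typical output:
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5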
The Fix: Customize your `resolv.conf` in the Pod spec if you don't need internal service discovery for that pod, or optimize CoreDNS caching.
apiVersion: v1
kind: Pod
metadata:
  name: intense-worker
spec:
  dnsPolicy: "None"
  dnsConfig:
    nameservers:
      - 1.1.1.1
    searches:
      - svc.cluster.local
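To confirm the override took effect, and to see what a fully qualified name buys you, a quick check like this works — assuming the container image ships nslookup; the trailing dot marks the name as absolute so the search list is skipped entirely:
# The injected resolv.conf should now show only 1.1.1.1 and a single search domain
kubectl exec intense-worker -- cat /etc/resolv.conf
# Trailing dot = no search-domain expansion, one query, done
kubectl exec intense-worker -- nslookup google.com.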
Securing the Mesh: Network Policies
By default, Kubernetes is an open promiscuous network. Any pod can talk to any pod. If an attacker compromises your frontend Nginx pod, they can port scan your internal Redis database.
We operate under GDPR. The Norwegian Data Inspectorate (Datatilsynet) does not look kindly on architectures that allow unchecked lateral movement. You need NetworkPolicies. These are essentially firewall rules for pods.
Example: Deny All Ingress (The "Zero Trust" Start)
Apply this to a namespace to block all incoming traffic, then whitelist what you need. One caveat: NetworkPolicies are only enforced if your CNI supports them — Calico does, plain Flannel does not.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-all
  namespace: backend-secure
spec:
  podSelector: {}
  policyTypes:
  - Ingress
Then, allow traffic into the database pods. Note that the two entries under `from` below are ORed together: traffic is admitted from any pod in a namespace labelled project: myproject, or from pods labelled role: frontend in `backend-secure` itself.
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: allow-frontend
  namespace: backend-secure
spec:
  podSelector:
    matchLabels:
      app: database
  ingress:
  - from:
    - namespaceSelector:
        matchLabels:
          project: myproject
    - podSelector:
        matchLabels:
          role: frontend
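Trust, but verify. A quick smoke test — the namespace, pod, and Service names below are placeholders, the database is assumed to listen on 6379, and nc must exist in the images:
# From a pod in a namespace labelled project=myproject: should connect
kubectl exec -n frontend-ns frontend-pod -- nc -zv -w 3 database.backend-secure 6379
# From a pod in an unlabelled namespace: should time out, courtesy of default-deny-all
kubectl exec -n default random-pod -- nc -zv -w 3 database.backend-secure 6379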
The Hardware Reality: Why Virtualization Matters
You can tune sysctls until you are blue in the face, but if the underlying hypervisor steals CPU cycles or throttles I/O, your K8s cluster will be sluggish. Network processing is CPU intensive. Every packet interrupt and every context switch adds microseconds.
In a containerized environment, a packet crosses the container's network namespace, the pod's veth pair, the node's virtual interface, and finally the physical interface. That is a lot of hops.
This is why we built CoolVDS with performance optimization in mind. We use KVM (Kernel-based Virtual Machine) which provides near-native performance. Unlike older OpenVZ containers where you share a kernel with noisy neighbors, KVM gives you dedicated resources. When you are pushing packets through a complex Calico BGP mesh, you need that raw CPU stability.
Benchmark: Latency to NIX (Norwegian Internet Exchange)
| Provider Type | Virtualization | Avg Ping to NIX (Oslo) | Jitter |
|---|---|---|---|
| Budget VPS | OpenVZ / Shared | 12ms | ±45ms |
| Global Cloud | Xen / KVM | 8ms | ±15ms |
| CoolVDS | KVM / NVMe | 2ms | ±1ms |
(Figures based on internal benchmarks from our Oslo datacenter, May 2019)
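If you want to reproduce that kind of comparison from your own node, plain ping gets you most of the way; the mdev figure in the summary is your jitter. The target below is a stand-in — pick any stable host close to your nearest exchange:
# 100 probes, 200ms apart; the last line prints min/avg/max/mdev
ping -c 100 -i 0.2 target.example.no | tail -n 2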
Final Thoughts
Kubernetes networking is brittle if you ignore the fundamentals. Stop using defaults. Switch to IPVS. Implement Network Policies before your security audit fails. And most importantly, run your control plane and worker nodes on infrastructure that respects packet priority.
If you are tired of debugging latency ghosts in the machine, it might not be your config—it might be your host. Deploy a KVM-based instance on CoolVDS today and see what happens when your CNI plugin actually gets the CPU cycles it asks for.