Kubernetes Networking is Broken (Until You Fix It): A Deep Dive for 2022

The Network is the Computer (And it's Slow)

I still remember the first time a production cluster imploded on me. It wasn't a memory leak. It wasn't a rogue `rm -rf`. It was a default Flannel configuration trying to push gigabits of traffic through a choked VXLAN overlay on underpowered virtual machines. The latency spiked to 400ms. The database connections timed out. The client—a major logistics firm in Oslo—was not amused.

Kubernetes networking is often treated as a black box. You run `kubectl apply -f cni.yaml`, see the pods go `Running`, and move on. That is a mistake. In 2022, with the removal of dockershim in favor of containerd and the explosion of microservices, the network layer is where performance goes to die.

This is not a beginner's guide. We are going to look at what actually happens to a packet when it hits your node, why `iptables` is failing you at scale, and how to architect a cluster on CoolVDS infrastructure that complies with Schrems II while screaming along at wire speed.

The CNI Battlefield: Calico vs. Cilium vs. Flannel

Your Container Network Interface (CNI) choice dictates your cluster's heartbeat. If you are still using the default settings from a tutorial written in 2019, you are leaving performance on the table.

1. Flannel (VXLAN)

The old reliable. It creates a simple overlay network. It encapsulates packets in UDP. It works everywhere. Do not use it for high-throughput production. The CPU overhead of encapsulation/decapsulation (encap/decap) on every single packet adds up. On a standard VPS with noisy neighbors, this jitter kills real-time applications.

2. Calico (BGP)

Calico offers a choice: VXLAN (overlay) or pure Layer 3 routing via BGP. If you control the network—like you do when deploying on bare metal or high-performance KVM slices provided by CoolVDS—you want BGP. No encapsulation overhead. Just pure Linux routing tables.

Here is a snippet to configure Calico to peer with a route reflector, which is essential for larger clusters to avoid a full mesh explosion:

apiVersion: projectcalico.org/v3
kind: BGPPeer
metadata:
  name: my-route-reflector
spec:
  peerIP: 192.168.10.1
  asNumber: 64512
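Peering alone does not stop Calico from also building its default full mesh. You disable that with a `BGPConfiguration` resource — a minimal sketch, with the AS number mirroring the example peer above:

```yaml
apiVersion: projectcalico.org/v3
kind: BGPConfiguration
metadata:
  name: default
spec:
  nodeToNodeMeshEnabled: false  # nodes learn routes via the route reflector instead
  asNumber: 64512               # example AS, matching the BGPPeer above
```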

3. Cilium (eBPF)

This is where the industry is heading in 2022. Cilium bypasses `iptables` almost entirely by using eBPF (Extended Berkeley Packet Filter) logic inside the kernel. It’s faster, more secure, and provides visibility that `tcpdump` struggles to match. If you are running Kubernetes 1.21+, you should be evaluating this.
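Because Cilium can replace `kube-proxy` outright, the install needs to know where the API server lives. A hedged values-file sketch for the official `cilium/cilium` Helm chart (assuming Cilium 1.11+; the host and port are placeholders for your own control plane):

```yaml
# values.yaml for the cilium/cilium Helm chart (sketch, not exhaustive)
kubeProxyReplacement: "strict"   # eBPF datapath handles Services; kube-proxy not needed
k8sServiceHost: "API_SERVER_IP"  # placeholder: your API server address
k8sServicePort: "6443"           # placeholder: your API server port
```

Install with `helm install cilium cilium/cilium -n kube-system -f values.yaml`, then remove the kube-proxy DaemonSet so the two datapaths don't fight over Service traffic.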

The Bottleneck: kube-proxy and iptables

By default, `kube-proxy` uses `iptables` to handle Service discovery and load balancing. When a Service has a ClusterIP, `iptables` rules intercept traffic destined for that IP and DNAT it to a pod.

This works fine for 50 services. It works okay for 500. At 5,000 services, the Linux kernel struggles to update the sequential list of rules. Rule updates become O(N). Latency creeps in.

The Fix: IPVS (IP Virtual Server).

IPVS is a kernel-space load balancer based on hash tables. Lookups are O(1). It doesn't care if you have 10 services or 10,000. Switching to IPVS is often the single biggest performance upgrade you can make for service-to-service communication.

Here is how you force `kube-proxy` into IPVS mode (assuming you are using `kubeadm`):

# Edit the kube-proxy ConfigMap
kubectl edit configmap kube-proxy -n kube-system

# Inside the config.conf key, change mode from "" or "iptables" to "ipvs"
mode: "ipvs"

# Optionally pick the IPVS scheduler
ipvs:
  scheduler: "rr" # Round Robin

Before you restart the pods, ensure the kernel modules are loaded on your CoolVDS nodes:

modprobe ip_vs
modprobe ip_vs_rr
modprobe ip_vs_wrr
modprobe ip_vs_sh
modprobe nf_conntrack
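Those `modprobe` calls last only until the next reboot. On systemd-based distros, persist them with a modules-load.d fragment (standard path, shown as a sketch):

```
# /etc/modules-load.d/ipvs.conf
ip_vs
ip_vs_rr
ip_vs_wrr
ip_vs_sh
nf_conntrack
```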

Pro Tip: If you are using an overlay network (VXLAN/Geneve), check your MTU (Maximum Transmission Unit). The default Ethernet MTU is 1500. The overlay header takes ~50 bytes. If your inner MTU is also 1500, packets fragment. Fragmentation destroys performance. Set your CNI MTU to 1450.
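The arithmetic behind that 1450 is worth spelling out. A quick shell sanity check (the 50-byte figure is the usual VXLAN-over-IPv4 overhead; Geneve is similar but its header length can vary):

```shell
# VXLAN-over-IPv4 overhead per packet:
# 20 (outer IPv4) + 8 (UDP) + 8 (VXLAN) + 14 (inner Ethernet) = 50 bytes
outer_mtu=1500
overhead=$((20 + 8 + 8 + 14))
inner_mtu=$((outer_mtu - overhead))
echo "set CNI MTU to: ${inner_mtu}"
```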

Infrastructure Matters: The Physical Reality

You can tune sysctls until your fingers bleed, but you cannot tune the speed of light. If your users are in Norway and your Kubernetes cluster is in a massive datacenter in Iowa, you have lost before you started.

Latency is the silent killer of microservices. If Service A calls Service B, which calls Service C, 30ms of latency between nodes becomes a 90ms delay for the user. In a deep call chain the hops add up, and once you factor in retries and fan-out, the effect multiplies.

This is why we built CoolVDS with a specific focus on the Nordic region. Our datacenter interconnects are optimized for low latency to NIX (Norwegian Internet Exchange). Furthermore, raw I/O matters: etcd, the brain of Kubernetes, is extremely sensitive to disk write latency. If `fsync` is slow, heartbeats get delayed and the cluster churns through leader elections.
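You can measure this instead of guessing. The check commonly recommended by the etcd community drives `fio` with a per-write fdatasync to mimic etcd's WAL; this sketch assumes `fio` is installed and `/var/lib/etcd-bench` is scratch space on the disk under test. The usual rule of thumb: 99th-percentile fdatasync latency should stay under 10ms.

```shell
# Benchmark fdatasync latency the way etcd's WAL stresses it
mkdir -p /var/lib/etcd-bench
fio --name=etcd-wal --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 \
    --size=22m --bs=2300
# Read the fsync/fdatasync latency percentiles in the report
```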

We use enterprise-grade NVMe storage. Not