Stop Trusting the Daemon: A Realist's Guide to Container Security
There is a dangerous misconception circulating in junior DevOps circles right now. It is the idea that docker run is equivalent to spinning up a firewall. It is not. If you are deploying containers in production without hardening them, you are essentially handing a loaded gun to anyone who manages to compromise your application layer. Just this February (2019), we saw CVE-2019-5736, a critical vulnerability in runc that allowed a malicious container to overwrite the host runc binary and gain root execution on the host machine. If that didn't wake you up, nothing will.
I have spent the last month auditing clusters for a fintech client in Oslo. The findings were terrifying. Root users everywhere. Writable filesystems. Secrets stored in environment variables. We need to do better. Efficiency is the goal, but security is the requirement. Here is how you lock down your container infrastructure using the tools available to us today.
1. The Root Problem (Literally)
By default, a process inside a Docker container runs as root. If an attacker breaks out of the container (via a kernel exploit, for example), they are root on your host. User namespace remapping can blunt this, but it is off by default, and relying on defaults is negligence.
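For defense in depth at the daemon level, you can enable that remapping in /etc/docker/daemon.json; the value "default" tells Docker to create and use a dedicated dockremap user:

```json
{
  "userns-remap": "default"
}
```

Restart the daemon after changing this. Be aware that remapping is incompatible with a few features (sharing the host's network or PID namespace, for example), so test it before rolling it out fleet-wide.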
Your Dockerfile should strictly define a non-privileged user. Never let the build process decide your runtime user ID.
Bad Practice:
```dockerfile
FROM node:10-alpine
CMD ["node", "app.js"]
```
Production Standard:
```dockerfile
FROM node:10-alpine

# Create an unprivileged system group and user
RUN addgroup -S appgroup && adduser -S appuser -G appgroup

# Switch the runtime context away from root
USER appuser
WORKDIR /home/appuser

# Files copied after USER remain root-owned, which is fine:
# the app can read them but cannot tamper with them
COPY . .

CMD ["node", "app.js"]
```
Pro Tip: If you are using Kubernetes, enforce this at the cluster level. Use a PodSecurityPolicy (PSP) with a runAsUser rule of MustRunAsNonRoot. If a developer tries to deploy a root container, the API server rejects it immediately. A minimal sketch of such a policy against the v1.14 policy/v1beta1 API follows (the name is illustrative, and the PSP only takes effect once you authorize its use via RBAC).
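```yaml
apiVersion: policy/v1beta1
kind: PodSecurityPolicy
metadata:
  name: no-root-pods
spec:
  privileged: false
  runAsUser:
    rule: MustRunAsNonRoot   # rejects any pod that would run as UID 0
  seLinux:
    rule: RunAsAny
  supplementalGroups:
    rule: RunAsAny
  fsGroup:
    rule: RunAsAny
  volumes:
  - configMap
  - secret
  - emptyDir
  - persistentVolumeClaim
```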
2. Runtime Hardening: Drop Capabilities
The Linux kernel divides the privileges traditionally associated with superuser into distinct units, known as capabilities. Most web applications need to bind to a port and write to a log file. They do not need to modify kernel modules, manipulate network stacks, or change system time.
Yet, by default, Docker grants a broad set of capabilities. We operate on a "deny all, permit some" basis. When running containers, especially on high-performance infrastructure like CoolVDS where you have full control over the daemon, you should drop everything and add back only what is necessary.
```sh
docker run \
  --cap-drop=ALL \
  --cap-add=NET_BIND_SERVICE \
  --read-only \
  --tmpfs /tmp:rw,noexec,nosuid \
  my-app:latest
```
This command does four things:
- --cap-drop=ALL: strips every capability from the process.
- --cap-add=NET_BIND_SERVICE: adds back only the ability to bind ports below 1024 (drop this too if your app listens on a high port).
- --read-only: mounts the container's root filesystem as read-only. Attackers cannot download scripts or modify binaries if the disk rejects writes.
- --tmpfs /tmp:rw,noexec,nosuid: gives the application a writable scratch space, but one where nothing can be executed or run setuid.
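You can verify all of this from inside the running container. A quick sketch, assuming the image ships a shell (alpine-based images do):

```sh
# Effective capabilities: with only NET_BIND_SERVICE added back,
# almost every bit in this mask should be zero
docker exec <container> grep CapEff /proc/1/status

# Writes outside the tmpfs should be rejected
docker exec <container> sh -c 'touch /etc/probe || echo "read-only fs holds"'
```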
3. The Isolation Myth: Why KVM Matters
Containers share the host kernel. This is their efficiency superpower, but it is their security Achilles' heel. If a noisy neighbor on a shared hosting platform triggers a kernel panic or exploits a syscall vulnerability, your container goes down with the ship.
This is why serious infrastructure relies on KVM (Kernel-based Virtual Machine) for the base layer. At CoolVDS, we do not oversell "container hosting" on shared kernels. We provide KVM instances. This means you get your own dedicated kernel.
| Feature | Shared Container Hosting (LXC/OpenVZ) | CoolVDS (KVM) |
|---|---|---|
| Kernel Isolation | Shared (High Risk) | Dedicated (High Security) |
| Performance | Variable (Noisy Neighbors) | Consistent (NVMe + Dedicated RAM) |
| Sysctl Tuning | Restricted | Full Control |
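Not sure what your current provider actually gave you? Check from inside the guest:

```sh
systemd-detect-virt                # prints "kvm" on a KVM guest, "openvz" on OpenVZ
grep -i hypervisor /proc/cpuinfo   # the hypervisor CPU flag indicates a VM
```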
If you are handling sensitive user data subject to the GDPR, relying on soft separation via namespaces and cgroups is a risk that Datatilsynet (the Norwegian Data Protection Authority) might frown upon if a breach occurs. Hardware virtualization is the only boundary I trust.
4. Network Segmentation and Data Sovereignty
In 2019, we cannot ignore where the packets flow. If you are hosting in Norway, you want your traffic to stay in Norway, routing through NIX (Norwegian Internet Exchange) for minimal latency. But inside the cluster, we need segmentation.
A compromised frontend container should not be able to talk to your database. It should talk to the API, which talks to the database. In Kubernetes (v1.14), NetworkPolicies are your best friend.
```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: db-access-policy
  namespace: production
spec:
  podSelector:
    matchLabels:
      role: database
  policyTypes:
  - Ingress
  ingress:
  - from:
    - podSelector:
        matchLabels:
          role: api-backend
    ports:
    - protocol: TCP
      port: 5432
```
Without this, a breach in a WordPress container (we all know how often plugins get exploited) allows the attacker to scan your entire internal network. Block it by default.
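"Block it by default" means exactly that: a deny-all ingress policy in each namespace. The empty podSelector below matches every pod, and a matched pod with no ingress rules accepts nothing:

```yaml
apiVersion: networking.k8s.io/v1
kind: NetworkPolicy
metadata:
  name: default-deny-ingress
  namespace: production
spec:
  podSelector: {}
  policyTypes:
  - Ingress
```

With this in place, the db-access-policy above acts as an explicit whitelist entry. One caveat: NetworkPolicies are only enforced if your CNI plugin supports them (Calico, Cilium, and Weave Net do; plain flannel does not).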
5. Supply Chain: Pin Your Digests
Using the :latest tag is unprofessional. "Latest" is a moving target. The postgres:latest you pulled today might be different from the one you pull tomorrow, potentially introducing a new vulnerability or breaking change.
For high-security environments, use the SHA256 digest. This guarantees that the code you tested is exactly the code you are deploying.
Insecure:
```dockerfile
FROM nginx:latest
```
Secure:
```dockerfile
# digest truncated for readability
FROM nginx@sha256:59fd10...
```
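To find the digest of an image you have already pulled and tested, docker inspect exposes it via RepoDigests:

```sh
docker pull nginx:latest
docker inspect --format='{{index .RepoDigests 0}}' nginx:latest
# nginx@sha256:... (full 64-character digest)
```

Pin that full digest in your Dockerfile, and rebuilds stay byte-for-byte reproducible at the base-image level.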
Combining this with a local registry or a scanning tool like Clair lets you catch known CVEs before they ever hit your KVM instance. When we provision high-performance NVMe storage on CoolVDS, image scanning runs fast enough that the usual I/O-bottleneck excuse for skipping security checks disappears.
Conclusion
Security is not a product; it is a process of reducing surface area. By dropping capabilities, enforcing user limits, and—crucially—ensuring your containers run on top of true hardware virtualization like KVM, you mitigate the majority of 2019's threat landscape.
Do not let a shared kernel be your single point of failure. Build on a foundation that respects isolation.
Ready to harden your stack? Deploy a CoolVDS KVM instance in Oslo today. With pure NVMe storage and dedicated resources, you get the performance of bare metal with the flexibility of the cloud.