Zero-Touch Production: A Battle-Tested GitOps Workflow for Nordic Infrastructure
If you are still SSH-ing into a server to run git pull or, god forbid, manually editing an nginx.conf in Vi while traffic is live, you are a ticking time bomb. I say this not to be harsh, but because I have been there. I have seen an entire e-commerce platform vanish during a Black Friday sale because of a "quick fix" applied manually that was overwritten by an automated script an hour later.
Manual operations are technical debt. In 2024, the only acceptable standard for managing infrastructure is GitOps. The concept is simple: Git is the single source of truth. If it's not in the repo, it doesn't exist in the cluster.
However, implementing GitOps in a European context—specifically here in Norway—adds layers of complexity regarding latency, data residency (GDPR), and infrastructure reliability. This guide breaks down the exact workflow we use to manage high-availability clusters, relying on ArgoCD, GitLab CI, and high-performance underlying infrastructure like CoolVDS to keep the reconciliation loops tight.
The Architecture: Pull vs. Push
Traditional CI/CD is "Push-based". Your Jenkins server runs a script and pushes changes to the target environment. This is a security nightmare. It requires your CI server to have root/admin access to your production cluster. If your CI server is compromised, your production environment is gone.
We use a "Pull-based" approach (GitOps). The cluster pulls its own configuration.
The Stack
- Code & CI: GitLab (self-hosted or SaaS, ideally running on local VPS instances in Norway for speed).
- CD Controller: ArgoCD running inside the Kubernetes cluster.
- Infrastructure: KVM-based Virtualization (CoolVDS) with strict NVMe storage requirements.
Step 1: The CI Pipeline (Building the Artifact)
The job of the CI pipeline is strictly to run tests and build a container image. It should never touch the production cluster directly. Here is a stripped-down, production-ready .gitlab-ci.yml using Kaniko for secure builds (no Docker-in-Docker daemon required).
stages:
  - test
  - build
  - update-manifests

variables:
  REGISTRY: registry.example.no
  IMAGE_NAME: $REGISTRY/backend-service

unit_tests:
  stage: test
  image: golang:1.22-alpine
  script:
    - go test ./... -v

build_image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.19.2-debug
    entrypoint: [""]
  script:
    # YAML folds these lines into a single shell command; no trailing backslashes needed.
    - /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$IMAGE_NAME:$CI_COMMIT_SHORT_SHA"

# The "GitOps" magic happens here
update_gitops_repo:
  stage: update-manifests
  image: bitnami/git:2.44.0
  script:
    - git config --global user.email "ci-bot@example.no"
    - git config --global user.name "ci-bot"
    - git clone https://oauth2:${GITOPS_TOKEN}@gitlab.example.no/ops/cluster-manifests.git
    - cd cluster-manifests
    - sed -i "s|image: .*|image: $IMAGE_NAME:$CI_COMMIT_SHORT_SHA|" deployments/backend.yaml
    - git commit -am "Update image to $CI_COMMIT_SHORT_SHA"
    - git push origin main
Notice the final stage. The pipeline commits a change to a separate repository containing Kubernetes manifests. This separation is crucial for auditing and rollbacks.
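It also makes rollbacks trivial: the previous deployment is simply the previous commit. A rough sketch of a rollback against the same manifests repository:
# Revert the last image bump; ArgoCD syncs the cluster back on its next poll.
git clone https://gitlab.example.no/ops/cluster-manifests.git
cd cluster-manifests
git revert --no-edit HEAD
git push origin main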
Step 2: The CD Controller (ArgoCD)
Inside your Kubernetes cluster, ArgoCD watches that manifest repository. When it sees the commit from the CI pipeline, it detects drift: the actual state of the cluster no longer matches the desired state in Git. It then applies the changes.
Here is the Application manifest we deploy to configure ArgoCD itself:
apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: backend-service-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://gitlab.example.no/ops/cluster-manifests.git'
    targetRevision: HEAD
    path: deployments/prod
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: backend
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
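Once the Application is applied, you can watch the reconciliation from the ArgoCD CLI. A quick sketch, assuming you are already logged in with argocd login:
argocd app get backend-service-prod    # sync status, health, and currently deployed revision
argocd app sync backend-service-prod   # trigger an immediate sync instead of waiting for the next poll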
The Hardware Bottleneck: Why etcd Latency Matters
This is where most tutorials fail you. They assume hardware is infinite. In a GitOps workflow, ArgoCD is constantly polling your Git repository and hammering your Kubernetes API server, and every object it applies or reconciles ends up as a write to etcd, the database Kubernetes relies on. Each of those writes is fsynced to etcd's write-ahead log on disk.
Pro Tip: etcd is extremely sensitive to disk write latency. If fsync takes longer than 10ms, your cluster becomes unstable. Leader elections fail. Pods get stuck in "Terminating".
We ran benchmarks comparing standard cloud volume storage against CoolVDS NVMe storage. The difference in a high-churn GitOps environment is night and day.
Check your etcd WAL (Write-Ahead Log) fsync latency. With Prometheus scraping etcd, this PromQL query gives you the 99th percentile in seconds:
histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))
If you see values consistently above 0.01 (10ms), your storage is too slow. This is why we default to CoolVDS for our control planes. The raw NVMe pass-through ensures we stay in the sub-millisecond range, keeping the reconciliation loop instant.
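Before we promote a node into a control plane, we benchmark its disk with fio using a write pattern close to etcd's WAL: small sequential writes, each followed by fdatasync. A sketch; the target directory is a placeholder:
# p99 of the reported fsync/fdatasync latency should be well under 10ms.
mkdir -p /var/lib/etcd-bench
fio --name=etcd-wal-check --directory=/var/lib/etcd-bench \
    --rw=write --ioengine=sync --fdatasync=1 --size=22m --bs=2300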
Step 3: Managing Configuration Drift
One of the biggest risks in DevOps is manual intervention. A developer fixes a bug by running:
kubectl edit deployment backend -n prod
They fix the issue, but they forget to update Git. Two days later, ArgoCD syncs, and the bug returns. To prevent this, ArgoCD has a Self-Heal mechanism. If someone changes the cluster manually, ArgoCD immediately reverts it to match Git.
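If someone genuinely needs to poke at a live environment (staging, never prod), pause automated sync instead of fighting the controller. A sketch using the ArgoCD CLI:
argocd app set backend-service-prod --sync-policy none                    # pause auto-sync while debugging
argocd app set backend-service-prod --sync-policy automated --self-heal   # restore; drift is reverted on the next sync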
Structure your Helm values to handle environment specifics without code duplication:
# values-prod.yaml
replicaCount: 5

resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 1Gi

autoscaling:
  enabled: true
  minReplicas: 5
  maxReplicas: 20
  targetCPUUtilizationPercentage: 75

# GDPR Compliance Flag (App Specific)
# Ensures logs are scrubbed of PII before export
env:
  DATA_RESIDENCY_MODE: "strict_eea"
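Inside the chart, the deployment template consumes these values, so prod and staging differ only in their values files. A hypothetical excerpt from templates/deployment.yaml:
# templates/deployment.yaml (excerpt)
spec:
  {{- if not .Values.autoscaling.enabled }}
  replicas: {{ .Values.replicaCount }}
  {{- end }}
  template:
    spec:
      containers:
        - name: backend
          resources:
            {{- toYaml .Values.resources | nindent 12 }}
          env:
            - name: DATA_RESIDENCY_MODE
              value: {{ .Values.env.DATA_RESIDENCY_MODE | quote }}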
Network Latency and The "Oslo" Factor
For Norwegian businesses, the physical location of your Git repository and your cluster matters. If your Git repo is hosted in US-East and your cluster is in Oslo, the polling latency adds up. We host our GitLab instances on CoolVDS servers in the same datacenter as our production clusters.
Testing latency to NIX (Norwegian Internet Exchange) is a good proxy for local connectivity:
ping -c 4 nix.no
On our infrastructure, we consistently see:
64 bytes from 194.19.83.10: icmp_seq=1 ttl=58 time=1.2 ms
64 bytes from 194.19.83.10: icmp_seq=2 ttl=58 time=1.1 ms
This low latency means that when you push code, ArgoCD picks up the change and starts the deployment almost immediately, provided its polling interval is tuned accordingly (see below).
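Out of the box, ArgoCD polls the manifest repository roughly every three minutes. With the repo in the same datacenter, we tighten that via the timeout.reconciliation key in the argocd-cm ConfigMap. A sketch; the 60s value is our own choice, not an upstream recommendation:
kubectl -n argocd patch configmap argocd-cm --type merge \
  -p '{"data":{"timeout.reconciliation":"60s"}}'
# Component names depend on your install; in a stock manifest install the controller is a StatefulSet.
kubectl -n argocd rollout restart statefulset argocd-application-controller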
Security: The Locked Down Cluster
In this workflow, no developer needs kubectl access to production. Access is restricted to:
- The ArgoCD Controller: Runs inside the cluster.
- Break-glass Admins: A tiny group of seniors with physical hardware keys (YubiKeys).
To verify your cluster isn't exposing unnecessary ports, run a quick scan from an external node:
nmap -p 6443,443,80 -sV your-cluster-ip
You should only see 443/80 open for traffic, and 6443 (API) should be firewall-restricted to your management VPN. CoolVDS includes DDoS protection at the edge, which is vital because automated pipelines can trigger false positives on aggressive WAFs if not whitelisted correctly.
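Locking 6443 down is a one-time firewall change on each control-plane node. A sketch using ufw; the 10.8.0.0/24 subnet is a placeholder for your own management VPN:
# Allow the Kubernetes API only from the management VPN, drop everything else on 6443.
ufw allow from 10.8.0.0/24 to any port 6443 proto tcp
ufw deny 6443/tcp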
Conclusion
GitOps is not just a buzzword; it is the operational model that separates professionals from amateurs. It creates the audit trail that data privacy regulations like the GDPR demand, and it keeps production stable.
However, the software stack is only as good as the iron it runs on. You cannot run a high-frequency reconciliation loop on oversold, noisy-neighbor hardware. You need dedicated resources and fast I/O.
Ready to stabilize your pipeline? Don't let IOwait kill your deployments. Spin up a high-performance, GitOps-ready environment on CoolVDS today and experience the difference of pure NVMe performance.