Zero-Touch Production: A Battle-Tested GitOps Workflow for Nordic Infrastructure

If you are still SSH-ing into a server to run git pull or, god forbid, manually editing an nginx.conf in Vi while traffic is live, you are a ticking time bomb. I say this not to be harsh, but because I have been there. I have seen an entire e-commerce platform vanish during a Black Friday sale because of a "quick fix" applied manually that was overwritten by an automated script an hour later.

Manual operations are technical debt. In 2024, the only acceptable standard for managing infrastructure is GitOps. The concept is simple: Git is the single source of truth. If it's not in the repo, it doesn't exist in the cluster.

However, implementing GitOps in a European context—specifically here in Norway—adds layers of complexity regarding latency, data residency (GDPR), and infrastructure reliability. This guide breaks down the exact workflow we use to manage high-availability clusters, relying on ArgoCD, GitLab CI, and high-performance underlying infrastructure like CoolVDS to keep the reconciliation loops tight.

The Architecture: Pull vs. Push

Traditional CI/CD is "Push-based". Your Jenkins server runs a script and pushes changes to the target environment. This is a security nightmare. It requires your CI server to have root/admin access to your production cluster. If your CI server is compromised, your production environment is gone.
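
For illustration, this is the kind of push-based deploy job we want to eliminate (a hypothetical .gitlab-ci.yml sketch; the $KUBE_CONFIG variable and manifest path are placeholders):

# Push-based anti-pattern: the CI runner holds production credentials
deploy_prod:
  stage: deploy
  image: bitnami/kubectl:1.29
  script:
    # If $KUBE_CONFIG leaks, the production cluster leaks with it
    - echo "$KUBE_CONFIG" | base64 -d > kubeconfig
    - KUBECONFIG=kubeconfig kubectl apply -f k8s/ -n prod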

We use a "Pull-based" approach (GitOps). The cluster pulls its own configuration.

The Stack

  • Code & CI: GitLab (self-hosted or SaaS, ideally running on VPS instances located in Norway for speed).
  • CD Controller: ArgoCD running inside the Kubernetes cluster.
  • Infrastructure: KVM-based Virtualization (CoolVDS) with strict NVMe storage requirements.

Step 1: The CI Pipeline (Building the Artifact)

The job of the CI pipeline is strictly to run tests and build a container image. It should never touch the production cluster directly. Here is a stripped-down, production-ready .gitlab-ci.yml using Kaniko for secure builds (no Docker-in-Docker daemon required).

stages:
  - test
  - build
  - update-manifests

variables:
  REGISTRY: registry.example.no
  IMAGE_NAME: $REGISTRY/backend-service

unit_tests:
  stage: test
  image: golang:1.22-alpine
  script:
    - go test ./... -v

build_image:
  stage: build
  image:
    name: gcr.io/kaniko-project/executor:v1.19.2-debug
    entrypoint: [""]
  script:
    # Folded scalar (>-) joins the lines into one command without stray backslashes
    - >-
      /kaniko/executor
      --context "$CI_PROJECT_DIR"
      --dockerfile "$CI_PROJECT_DIR/Dockerfile"
      --destination "$IMAGE_NAME:$CI_COMMIT_SHORT_SHA"

# The "GitOps" magic happens here
update_gitops_repo:
  stage: update-manifests
  image: bitnami/git:2.44.0
  script:
    - git config --global user.name "ci-bot"
    - git config --global user.email "ci-bot@example.no"
    - git clone https://oauth2:${GITOPS_TOKEN}@gitlab.example.no/ops/cluster-manifests.git
    - cd cluster-manifests
    - sed -i "s|image: .*|image: $IMAGE_NAME:$CI_COMMIT_SHORT_SHA|" deployments/backend.yaml
    - git commit -am "Update image to $CI_COMMIT_SHORT_SHA"
    - git push origin main

Notice the final stage. The pipeline commits a change to a separate repository containing Kubernetes manifests. This separation is crucial for auditing and rollbacks.
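
For reference, the layout we assume for the cluster-manifests repository looks roughly like this (illustrative; adapt it to your own environments):

cluster-manifests/
├── deployments/
│   ├── backend.yaml          # plain manifest patched by the CI job above
│   └── prod/                 # Helm chart and values watched by ArgoCD
│       ├── Chart.yaml
│       ├── values-prod.yaml
│       └── templates/
└── README.md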

Step 2: The CD Controller (ArgoCD)

Inside your Kubernetes cluster, ArgoCD watches that manifest repository. When it sees the commit from the CI pipeline, the desired state (Git) no longer matches the actual state (cluster), so the application is marked OutOfSync. ArgoCD then applies the changes to reconcile the two.

Here is the Application manifest we deploy to configure ArgoCD itself:

apiVersion: argoproj.io/v1alpha1
kind: Application
metadata:
  name: backend-service-prod
  namespace: argocd
spec:
  project: default
  source:
    repoURL: 'https://gitlab.example.no/ops/cluster-manifests.git'
    targetRevision: HEAD
    path: deployments/prod
    helm:
      valueFiles:
        - values-prod.yaml
  destination:
    server: 'https://kubernetes.default.svc'
    namespace: backend
  syncPolicy:
    automated:
      prune: true
      selfHeal: true
    syncOptions:
      - CreateNamespace=true
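
Once this Application is applied, verify that ArgoCD has picked it up and is in sync. Either kubectl or the argocd CLI will do (the commands below assume the names from the manifest above):

kubectl -n argocd get application backend-service-prod
argocd app get backend-service-prod
# Trigger an immediate sync instead of waiting for the next poll
argocd app sync backend-service-prod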

The Hardware Bottleneck: Why etcd Latency Matters

This is where most tutorials fail you. They assume hardware is infinite. In a GitOps workflow, ArgoCD is constantly polling your Git repository and your Kubernetes API server. Kubernetes relies on etcd as its database.

Pro Tip: etcd is extremely sensitive to disk write latency. The upstream guidance is to keep the 99th percentile of WAL fsync below 10ms; beyond that, heartbeats time out, leader elections fire, and Pods get stuck in "Terminating".

We ran benchmarks comparing standard cloud volume storage against CoolVDS NVMe storage. The difference in a high-churn GitOps environment is night and day.
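
Before trusting a disk with an etcd member, benchmark its fsync behaviour yourself. The etcd documentation recommends an fio run along these lines (the size and block size approximate etcd's WAL write pattern; point --directory at the disk you plan to use):

mkdir -p test-data
fio --rw=write --ioengine=sync --fdatasync=1 --directory=test-data --size=22m --bs=2300 --name=etcd-fsync-test

The fdatasync percentiles in the output are the number to watch; the 99th percentile should sit comfortably below 10ms.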

Check your etcd WAL (Write Ahead Log) fsync duration. If Prometheus scrapes your etcd members, the 99th percentile is one query away:

histogram_quantile(0.99, rate(etcd_disk_wal_fsync_duration_seconds_bucket[5m]))

If you see values consistently above 0.01 (10ms), your storage is too slow. This is why we default to CoolVDS for our control planes. The raw NVMe pass-through ensures we stay in the sub-millisecond range, keeping the reconciliation loop instant.

Step 3: Managing Configuration Drift

One of the biggest risks in DevOps is manual intervention. A developer fixes a bug by running:

kubectl edit deployment backend -n backend

They fix the issue but forget to update Git. Two days later, ArgoCD syncs, and the bug returns. ArgoCD's self-heal mechanism (the selfHeal: true flag in the Application above) makes this failure mode impossible to miss: any manual change to the cluster is reverted within moments to match Git, so every fix is forced through the repository.
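
You can watch this in action. The commands below are illustrative and reuse the names from the Application manifest above:

# A manual "hotfix" that bypasses Git
kubectl -n backend scale deployment backend --replicas=1

# Moments later the app is flagged OutOfSync and, with selfHeal enabled, reverted to the Git-defined state
argocd app get backend-service-prod
kubectl -n backend get deployment backend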

Structure your Helm values to handle environment specifics without code duplication:

# values-prod.yaml
replicaCount: 5

resources:
  limits:
    cpu: 1000m
    memory: 2Gi
  requests:
    cpu: 500m
    memory: 1Gi

autoscaling:
  enabled: true
  minReplicas: 5
  maxReplicas: 20
  targetCPUUtilizationPercentage: 75

# GDPR Compliance Flag (App Specific)
# Ensures logs are scrubbed of PII before export
env:
  DATA_RESIDENCY_MODE: "strict_eea"
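
The same chart then serves staging with a smaller footprint simply by swapping the values file. A hypothetical values-staging.yaml for illustration:

# values-staging.yaml
replicaCount: 2

resources:
  limits:
    cpu: 500m
    memory: 1Gi
  requests:
    cpu: 250m
    memory: 512Mi

autoscaling:
  enabled: false

env:
  DATA_RESIDENCY_MODE: "strict_eea"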

Network Latency and The "Oslo" Factor

For Norwegian businesses, the physical location of your Git repository and your cluster matters. If your Git repo is hosted in US-East and your cluster is in Oslo, the polling latency adds up. We host our GitLab instances on CoolVDS servers in the same datacenter as our production clusters.

Testing latency to NIX (Norwegian Internet Exchange) is a good proxy for local connectivity:

ping -c 4 nix.no

On our infrastructure, we consistently see:

64 bytes from 194.19.83.10: icmp_seq=1 ttl=58 time=1.2 ms
64 bytes from 194.19.83.10: icmp_seq=2 ttl=58 time=1.1 ms

This low latency ensures that when you push code, the deployment starts effectively instantly.
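
Because the round trip to the Git server is so cheap, you can also afford to shorten ArgoCD's repository polling interval (180 seconds by default). This lives in the timeout.reconciliation key of the argocd-cm ConfigMap; a minimal sketch:

apiVersion: v1
kind: ConfigMap
metadata:
  name: argocd-cm
  namespace: argocd
  labels:
    app.kubernetes.io/part-of: argocd
data:
  # Poll the manifest repo every 60s instead of every 180s
  timeout.reconciliation: 60s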

Security: The Locked Down Cluster

In this workflow, no developer needs kubectl access to production. Access is restricted to:

  1. The ArgoCD Controller: Runs inside the cluster.
  2. Break-glass Admins: A tiny group of seniors with physical hardware keys (YubiKeys).

To verify your cluster isn't exposing unnecessary ports, run a quick scan from an external node:

nmap -p 6443,443,80 -sV your-cluster-ip

You should only see 443/80 open for traffic, and 6443 (API) should be firewall-restricted to your management VPN. CoolVDS includes DDoS protection at the edge, which is vital because automated pipelines can trigger false positives on aggressive WAFs if not whitelisted correctly.
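
Locking the API server down is usually a couple of firewall rules on the node. With ufw, assuming a management VPN on 10.8.0.0/24 (substitute your own subnet), it looks roughly like this:

# Allow the API server only from the management VPN, then deny it for everyone else
ufw allow from 10.8.0.0/24 to any port 6443 proto tcp
ufw deny 6443/tcp
ufw allow 80,443/tcp

Order matters: ufw evaluates rules in the order they are added, so the VPN allow rule must come before the blanket deny.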

Conclusion

GitOps is not just a buzzword; it is the operational model that separates professionals from amateurs. It creates the audit trail that data-privacy regulations such as the GDPR demand, and it keeps the cluster stable by eliminating manual changes.

However, the software stack is only as good as the iron it runs on. You cannot run a high-frequency reconciliation loop on oversold, noisy-neighbor hardware. You need dedicated resources and fast I/O.

Ready to stabilize your pipeline? Don't let IOwait kill your deployments. Spin up a high-performance, GitOps-ready environment on CoolVDS today and experience the difference of pure NVMe performance.