CI/CD Pipelines Are I/O Bound: Why Your Builds Are Slow (And How to Fix It)

It is Friday, 15:45. The production hotfix is committed. You push to master. You switch to the pipeline view. And then you wait. And wait. The spinner rotates, mocking you. Your build takes 18 minutes. It should take three. Why?

Most DevOps engineers throw CPU cores at slow pipelines. That is usually the wrong move. After analyzing hundreds of deployment workflows across the Nordic region in 2024, we found the culprit is rarely raw compute power; it is almost always disk I/O and network latency. Modern CI/CD is an artifact-shuffling game: you are pulling Docker images, extracting the node_modules cache, compiling binaries, and pushing artifacts. These are read/write-intensive operations.

If your runner is sitting on a shared HDD or a throttled cloud instance, your CPU is spending half its life in iowait. Here is how we fix it, focusing on infrastructure realities available today.

1. Diagnosing the I/O Bottleneck

Before optimizing, we measure. If you are running your own GitLab Runners or Jenkins agents (which you should be doing for cost control and performance), SSH into the runner during a build.

Run iostat to see if your disk is the choke point.

iostat -xz 1

Look at the %util and await columns. If %util is hovering near 100% or await (average wait time for I/O requests) exceeds 10ms, your storage is too slow. This is common in cheap VPS providers that oversell storage. At CoolVDS, we enforce strict KVM isolation on NVMe drives specifically to prevent this "noisy neighbor" effect. Your IOPS are yours.
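To confirm that the CPU is actually stalling on I/O rather than doing useful work, vmstat reports the share of time spent in iowait under the wa column:

vmstat 1 5

As a rough rule of thumb, double-digit wa percentages throughout a build corroborate what iostat is telling you.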

Another quick check: see how much disk Docker's images, volumes, and build cache are consuming on the runner:

docker system df -v
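If that output shows a build cache that has ballooned into tens of gigabytes, trim it instead of wiping everything; for example, dropping cache entries older than three days (tune the window to your release cadence):

docker builder prune --filter until=72h -f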

2. The Docker Layer Caching Strategy

Many developers treat Dockerfiles like a grocery list, throwing commands in random order. This kills the build cache. Docker caches image layers: once an instruction (or a file it copies) changes, that layer and every layer after it must be rebuilt.

Bad Practice: Copying source code before installing dependencies. If you change one line of code, Docker re-installs all dependencies.

Good Practice: Copy dependency definitions first, install, then copy source.

Here is a robust, multi-stage Dockerfile optimizing for cache hits and binary size:

# Multi-stage build: heavy toolchain in the builder stage, slim runtime image at the end
FROM golang:1.23-alpine AS builder

# Set working directory
WORKDIR /app

# 1. Copy go.mod and go.sum FIRST
# This layer is cached unless dependencies change
COPY go.mod go.sum ./

# 2. Download dependencies
RUN go mod download

# 3. Copy the rest of the source code
COPY . .

# 4. Build the application
# CGO_ENABLED=0 for static binaries
RUN CGO_ENABLED=0 GOOS=linux go build -o myapp .

# Final Stage
FROM alpine:3.19
WORKDIR /root/
COPY --from=builder /app/myapp .
CMD ["./myapp"]

Structured this way, the expensive go mod download step is skipped during routine code changes, because its layer inputs (go.mod and go.sum) have not changed. This alone can shave three to five minutes off a pipeline.
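If you are on BuildKit (the default builder since Docker Engine 23.0), you can go a step further with a cache mount, which keeps the Go module cache on the build host between runs, so even a go.mod change avoids re-downloading every module. A minimal sketch of the builder stage using this approach (the final stage stays the same as above):

# syntax=docker/dockerfile:1
FROM golang:1.23-alpine AS builder
WORKDIR /app
COPY go.mod go.sum ./
# The cache mount persists /go/pkg/mod across builds on this host
RUN --mount=type=cache,target=/go/pkg/mod go mod download
COPY . .
RUN --mount=type=cache,target=/go/pkg/mod \
    CGO_ENABLED=0 GOOS=linux go build -o myapp .

Keep in mind that cache mounts live on the builder host, so on ephemeral auto-scaling runners they only pay off if the host, or a remote cache backend, survives between jobs, which is exactly the problem the next section addresses.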

3. Distributed Caching with MinIO

Local caching works great for single runners. But in a scaled environment with auto-scaling runners, the next job might land on a fresh server. You need distributed object storage for your cache.

In Norway, data sovereignty is critical. Using US-based S3 buckets for caching introduces latency (approx. 90ms RTT from Oslo to us-east-1) and potential GDPR headaches with Datatilsynet if code artifacts contain sensitive data. The solution is hosting a MinIO instance within the same datacenter as your runners.

Pro Tip: Network latency within the same datacenter is <1ms. Transferring a 500MB cache file over a local multi-gigabit link takes a couple of seconds; round-tripping it to a US-based S3 bucket takes significantly longer and racks up egress fees every time it is pulled back.

Configure your GitLab Runner config.toml to use a local MinIO/S3 compatible endpoint:

[[runners]]
  name = "coolvds-nvme-runner-01"
  url = "https://gitlab.com/"
  token = "REGISTRATION_TOKEN"
  executor = "docker"
  [runners.docker]
    image = "docker:26.1.3"
    privileged = true
    volumes = ["/cache"]
  [runners.cache]
    Type = "s3"
    Path = "gitlab-runner"
    Shared = true
    [runners.cache.s3]
      ServerAddress = "minio.internal.coolvds.net:9000"
      AccessKey = "ACCESS_KEY"
      SecretKey = "SECRET_KEY"
      BucketName = "runner-cache"
      Insecure = false
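With the runner-side cache pointed at MinIO, individual jobs opt in through the cache keyword in .gitlab-ci.yml. A typical stanza for a Node.js project, keyed on the lockfile so the cache only invalidates when dependencies change (the paths are illustrative):

cache:
  key:
    files:
      - package-lock.json
  paths:
    - node_modules/
    - .npm/
  policy: pull-push

The pull-push policy is the default; jobs that only consume dependencies can set policy: pull to skip the upload step entirely.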

4. Tuning the Kernel for Heavy Docker Workloads

Default Linux kernel settings are optimized for general use, not for a CI server spinning up and destroying 500 containers a day. You need to adjust network stack and file watcher limits.

Increase the file watch limit, or file-watching tools like Webpack and heavy linters will fail with "ENOSPC: System limit for number of file watchers reached" errors:

sysctl -w fs.inotify.max_user_watches=524288

Optimize the ephemeral port range for high-concurrency network calls during testing:

sysctl -w net.ipv4.ip_local_port_range="1024 65535"

Persist these in /etc/sysctl.conf, or in a drop-in file under /etc/sysctl.d/, so they survive a reboot.
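A drop-in file (the name is arbitrary) keeps the settings applied across reboots and can be reloaded with sysctl --system:

# /etc/sysctl.d/99-ci-runner.conf
fs.inotify.max_user_watches = 524288
net.ipv4.ip_local_port_range = 1024 65535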

5. The Pipeline Configuration (GitLab CI Example)

Using Docker-in-Docker (dind) is standard, but it is slow if the daemon is not using the overlay2 storage driver. Ensure your .gitlab-ci.yml requests overlay2 explicitly; its performance leans heavily on the underlying filesystem (again, NVMe matters).

variables:
  # Use the overlay2 driver for improved performance
  DOCKER_DRIVER: overlay2
  # Disable TLS for dind to save setup time on a trusted internal network,
  # and point the client at the dind service over plain TCP (port 2375)
  DOCKER_TLS_CERTDIR: ""
  DOCKER_HOST: tcp://docker:2375

services:
  - name: docker:27-dind
    command: ["--mtu=1400"]

stages:
  - build
  - test

build_image:
  stage: build
  image: docker:27
  before_script:
    - echo "$CI_REGISTRY_PASSWORD" | docker login $CI_REGISTRY -u $CI_REGISTRY_USER --password-stdin
  script:
    # Pull the previous 'latest' image to seed the layer cache; the
    # BUILDKIT_INLINE_CACHE build-arg embeds cache metadata in the pushed
    # image so the next build can reuse its layers
    - docker pull $CI_REGISTRY_IMAGE:latest || true
    - >
      docker build
      --cache-from $CI_REGISTRY_IMAGE:latest
      --build-arg BUILDKIT_INLINE_CACHE=1
      --tag $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
      --tag $CI_REGISTRY_IMAGE:latest
      .
    - docker push $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA
    - docker push $CI_REGISTRY_IMAGE:latest

This configuration implements inline caching by pulling the previous image and letting the builder reuse its layers. Note that it requires real bandwidth: if your VPS has a capped 100Mbps port, pulling the cache can take longer than rebuilding it. CoolVDS offers high-bandwidth ports connected directly to major European peering exchanges, so cache pulls are never the bottleneck.
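A quick way to check whether the cache pull pays for itself is to time it against a cold build on the same runner:

time docker pull $CI_REGISTRY_IMAGE:latest

For scale: a 1GB image is roughly 8 gigabits, so a capped 100Mbps port needs well over a minute just for the transfer, while a multi-gigabit link finishes it in a few seconds.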

6. RAM Disks for Temporary Artifacts

If you have extremely transient data (like unit test results or temporary DBs for integration testing) that doesn't need to persist after the job, mount a RAM disk.

In your runner config or Docker run command:

--tmpfs /var/lib/mysql:rw,noexec,nosuid,size=512m

This puts the MySQL data directory entirely in RAM. Integration tests involving database writes will fly.
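On GitLab runners you can make this the default instead of passing flags by hand; the Docker executor exposes tmpfs mounts for service containers in config.toml (the mount point and size here mirror the example above):

[runners.docker]
  [runners.docker.services_tmpfs]
    "/var/lib/mysql" = "rw,noexec,nosuid,size=512m"

The equivalent [runners.docker.tmpfs] section does the same for the job container itself. Keep the size comfortably inside the runner's RAM, since tmpfs pages compete with everything else running on the host.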

Why Infrastructure Choice Dictates Pipeline Velocity

You can optimize Dockerfiles and yaml configs for weeks, but you hit a hard ceiling defined by physics. Shared hosting environments with "noisy neighbors" introduce unpredictable latency. When another tenant on the node runs a backup, your build stalls.

For professional DevOps teams in Norway and Europe, consistency is the metric of success. We built CoolVDS on KVM virtualization to guarantee that when you pay for 4 vCPUs and NVMe storage, you get 100% of those cycles and IOPS. No stealing, no oversubscription.

Don't let slow I/O kill your developer momentum. Deploy a dedicated CI/CD runner on a CoolVDS NVMe instance in 55 seconds and see your build times drop.