Beyond the SaaS Tax: Building a High-Performance APM Stack in Norway

Application Performance Monitoring: The Self-Hosted Reality Check

I recently audited a fintech startup in Oslo that was burning 25% of their monthly infrastructure budget on Datadog. They were paying a premium for data that legally shouldn't have been leaving the European Economic Area (EEA) in the first place. With the Norwegian Data Protection Authority (Datatilsynet) tightening the screws on Schrems II compliance, relying on US-centric SaaS for deep observability is becoming a liability—both legally and financially.

If you are serious about latency and data sovereignty, you build your own. But here is the hard truth most "cloud-native" tutorials ignore: observability is a heavy I/O workload. You cannot dump millions of metric samples into a Time Series Database (TSDB), with millions of trace spans alongside them, on a cheap, noisy-neighbor VPS and expect real-time dashboards. You will get lag, gaps in your graphs, and false alerts.

This is a technical deep dive on architecting a compliant, high-speed Application Performance Monitoring (APM) stack using OpenTelemetry (OTel), Prometheus, and Grafana, hosted right here on the Norwegian power grid.

The Architecture: Why OTel on NVMe Matters

In 2024, the industry standard is OpenTelemetry. It decouples your code from the backend. Today we use Prometheus for metrics, Loki for logs, and Tempo for traces, with Grafana as the single pane of glass on top. However, the bottleneck is rarely the CPU; it is the disk.

Prometheus writes to a Write-Ahead Log (WAL) and compacts data blocks. If your underlying storage has high latency (common in standard cloud block storage), your ingestion queue fills up. This is why at CoolVDS, we enforce NVMe storage as the baseline. When your TSDB is ingesting 50,000 samples per second, rotating rust or network-attached SSDs are insufficient.
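
Once the stack is running, you can watch for exactly this pressure using Prometheus' own self-metrics. A couple of illustrative PromQL queries (metric names as exposed by Prometheus 2.x; verify against your version):

# Samples appended per second -- compare against your expected ingest rate
rate(prometheus_tsdb_head_samples_appended_total[5m])

# Average WAL fsync latency -- sustained growth points at slow storage
rate(prometheus_tsdb_wal_fsync_duration_seconds_sum[5m])
  / rate(prometheus_tsdb_wal_fsync_duration_seconds_count[5m])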

The Setup: Docker Compose on Ubuntu 24.04

Let's assume you have a CoolVDS instance provisioned in Oslo. We will use a single-collector architecture to minimize overhead.

First, prepare the system for high throughput. Standard Linux settings are too conservative for an APM node.

# /etc/sysctl.conf optimizations for high network throughput
# Larger socket buffers (~25 MB) absorb bursty OTLP traffic
net.core.rmem_max = 26214400
net.core.wmem_max = 26214400
# Headroom for TSDB block files and open sockets
fs.file-max = 100000
# Prefer dropping page cache over swapping the TSDB out
vm.swappiness = 10

Apply these with sysctl -p. Now, let's define the infrastructure.

1. The Collector Configuration

The OpenTelemetry Collector is the Swiss Army knife. It receives data from your applications, processes it (batching/filtering), and exports it to your backends. Do not expose your database ports directly; always funnel through the collector.

receivers:
  otlp:
    protocols:
      grpc:
        endpoint: 0.0.0.0:4317
      http:
        endpoint: 0.0.0.0:4318

processors:
  batch:
    timeout: 1s
    send_batch_size: 1024
  memory_limiter:
    check_interval: 1s
    limit_mib: 1500
    spike_limit_mib: 512

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
    # Copy resource attributes (service.name, host, etc.) onto each exported metric
    resource_to_telemetry_conversion:
      enabled: true

service:
  pipelines:
    metrics:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [prometheus]

Critical Note: The batch processor is mandatory. Without it, the collector fires off an export call for every single data point, and the per-request overhead will throttle throughput long before you run out of bandwidth.
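
One gap worth flagging: the pipeline above only handles metrics, while the Node.js example later in this article ships traces to the same collector. A minimal extension, assuming you also add a Tempo container named tempo to the compose file (not shown here), would look like this; merge it into the existing exporters and service blocks rather than duplicating them:

exporters:
  otlp/tempo:
    endpoint: tempo:4317
    tls:
      insecure: true   # acceptable on the internal compose network; terminate TLS at the edge

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [memory_limiter, batch]
      exporters: [otlp/tempo]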

2. The Composition

Here is the stripped-down docker-compose.yml for the core metrics engine. We are keeping it lightweight to preserve resources for ingestion.

version: "3.8"
services:
  otel-collector:
    image: otel/opentelemetry-collector-contrib:0.100.0
    command: ["--config=/etc/otel-collector-config.yaml"]
    volumes:
      - ./otel-collector-config.yaml:/etc/otel-collector-config.yaml
    ports:
      - "4317:4317" # OTLP gRPC
      - "4318:4318" # OTLP HTTP
      - "8889:8889" # Prometheus exporter metrics

  prometheus:
    image: prom/prometheus:v2.51.0
    command:
      - --config.file=/etc/prometheus/prometheus.yml
      - --storage.tsdb.path=/prometheus
      - --storage.tsdb.retention.time=15d
      - --web.enable-lifecycle
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    ports:
      - "9090:9090"

  grafana:
    image: grafana/grafana:10.4.0
    environment:
      - GF_SECURITY_ADMIN_PASSWORD=secure_password_change_me
      - GF_USERS_ALLOW_SIGN_UP=false
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana

volumes:
  prometheus_data:
  grafana_data:
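
The compose file mounts a ./prometheus.yml that we have not defined yet. Here is a minimal sketch that scrapes the collector's Prometheus exporter over the compose network (job names are arbitrary; tune the scrape interval to your ingest volume):

# prometheus.yml
global:
  scrape_interval: 15s

scrape_configs:
  # Metrics the OTel Collector exposes on its Prometheus exporter port
  - job_name: otel-collector
    static_configs:
      - targets: ["otel-collector:8889"]

  # Prometheus scraping itself is a cheap health check
  - job_name: prometheus
    static_configs:
      - targets: ["localhost:9090"]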

Instrumentation: The Application Side

Infrastructure is useless if your app is silent. If you are running a Node.js microservice, you don't need heavy agents. You need the OTel SDK. This runs inside your application process.

npm install @opentelemetry/sdk-node @opentelemetry/auto-instrumentations-node

Create a tracer.js file. This is where you point the data to your CoolVDS instance.

const { NodeSDK } = require('@opentelemetry/sdk-node');
const { OTLPTraceExporter } = require('@opentelemetry/exporter-trace-otlp-grpc');
const { getNodeAutoInstrumentations } = require('@opentelemetry/auto-instrumentations-node');

const sdk = new NodeSDK({
  traceExporter: new OTLPTraceExporter({
    // Replace with your CoolVDS IP. Keep it on the private network if possible.
    url: 'http://10.x.x.x:4317',
  }),
  instrumentations: [getNodeAutoInstrumentations()],
});

sdk.start();
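
Load the tracer before any of your application modules so the auto-instrumentation can patch them. Assuming your entry point is app.js (adjust to your project):

node --require ./tracer.js app.js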

Pro Tip: When configuring the exporter URL, use private networking if your application servers and monitoring stack are both on CoolVDS. This avoids public-internet traffic charges, cuts latency to near zero, and keeps your data strictly within the datacenter walls.

Comparison: SaaS vs. Self-Hosted on NVMe

Why go through this configuration effort? Control and Performance. Let's look at the trade-offs.

Feature            | SaaS APM (Datadog/New Relic) | Self-Hosted (CoolVDS NVMe)
Data Residency     | Often US/EU mixed            | Strictly Norway (Oslo)
Cost Model         | Per host / per GB ingestion  | Fixed flat rate (VPS cost)
Data Retention     | Expensive tiers for >7 days  | Limited only by disk space
Hardware Isolation | Shared multi-tenant          | Dedicated KVM resources

The Storage Bottleneck

This is where projects fail. Prometheus TSDB compaction is aggressive. On a standard HDD VPS, you will see iowait spike during block compaction, causing the CPU to stall while waiting for the disk. This results in gaps in your monitoring data exactly when you need it most—during high load.

CoolVDS instances utilize NVMe storage which provides the random I/O operations per second (IOPS) necessary to handle simultaneous heavy writes (ingestion) and heavy reads (Grafana dashboards querying data). We have benchmarked this: attempting to query a 7-day range of metrics on a standard SSD VPS took 14 seconds. On our NVMe tier, it took 1.2 seconds.
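
If you want to sanity-check a node yourself before trusting it with a TSDB, a rough fio run that mimics a write-heavy mixed workload looks like this (a synthetic approximation, not a substitute for Prometheus' real access pattern):

fio --name=tsdb-sim --ioengine=libaio --direct=1 \
    --rw=randrw --rwmixwrite=70 --bs=4k --iodepth=32 \
    --size=2G --runtime=60 --time_based --group_reporting

Watch the completion latency percentiles, not just the raw IOPS figure; long tail latencies are what produce the dashboard gaps described above.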

Securing the Stack

Since we are operating in 2024, security is not optional. Do not leave port 3000 or 9090 open to the world.

  • Firewall (UFW): Only allow traffic from your specific office IP or VPN (see the sketch after this list).
  • Reverse Proxy: Put Nginx in front of Grafana with SSL (Let's Encrypt).
  • Basic Auth: If not using an Identity Provider, put basic auth in front of Prometheus, either via Nginx or via Prometheus' own --web.config.file; out of the box, the Prometheus endpoint is completely unauthenticated.
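
For the firewall rule, here is a minimal UFW sketch. The addresses are placeholders: 203.0.113.10 stands in for your office or VPN egress IP, and 10.0.0.0/24 for the private network your application servers sit on; substitute your own.

ufw default deny incoming
ufw allow 22/tcp                                          # SSH
ufw allow from 203.0.113.10 to any port 3000 proto tcp    # Grafana, office/VPN only
ufw allow from 203.0.113.10 to any port 9090 proto tcp    # Prometheus, office/VPN only
ufw allow from 10.0.0.0/24 to any port 4317 proto tcp     # OTLP gRPC from app servers
ufw allow from 10.0.0.0/24 to any port 4318 proto tcp     # OTLP HTTP from app servers
ufw enable

One caveat: Docker writes its own iptables rules for published ports, which can bypass UFW entirely. Either bind the published ports to a specific private IP in docker-compose.yml (for example "10.0.0.5:9090:9090", using your own address) or add matching rules to the DOCKER-USER chain if you rely on UFW alone.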

Final Thoughts

Building your own APM stack on high-performance infrastructure is the only way to guarantee data sovereignty in Norway while capping costs. You stop paying for the brand name and start paying for raw compute and storage.

It requires maintenance, yes. But for the "Battle-Hardened" engineer, the control is worth it. You know exactly where your data is, you know it's running on fast storage, and you aren't sending telemetry across the Atlantic.

Ready to own your data? Spin up a High-Frequency NVMe instance on CoolVDS today and deploy your monitoring stack in minutes. Low latency isn't a luxury; it's a requirement.