Stop Guessing: A Battle-Hardened Guide to APM and Observability in Norway
Most developers treat their servers like a black box. You push code, it runs, and occasionally it crashes. When a client in Bergen complains that the checkout page is hanging, you stare at htop, restart PHP-FPM, and hope the problem vanishes. It usually does—until 3:00 AM the next day.
I have spent the last decade debugging high-load systems across the Nordics. I've seen massive e-commerce clusters fall over not because of bad code, but because of IOPS bottlenecks that the hosting provider hid behind a "Cloud" marketing sticker. If you cannot visualize your infrastructure's heartbeat, you are flying blind.
Today, we build a proper Application Performance Monitoring (APM) stack. No expensive SaaS that sends your customer data to US servers (a GDPR nightmare under Schrems II). We are doing this on bare-metal-class VPS instances right here in Norway, using tools that became industry standards by 2025.
The Latency Lie: Why P99 Matters More Than Averages
Averages mask failure. If 95% of your requests finish in 50ms but 5% take 10 seconds, the average (0.95 × 50ms + 0.05 × 10,000ms ≈ 550ms) reads like a mild slowdown, while one in twenty requests is actually hanging for ten full seconds. That 5% often represents your most active users, the ones actually trying to pay you. In the Norwegian market, where fiber connectivity is ubiquitous, high latency stands out immediately.
Pro Tip: Always optimize for the 99th percentile (P99). If your P99 latency is stable, your infrastructure is solid. If it spikes, your disk I/O or CPU steal time is likely the culprit.
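To make that concrete, here are the two PromQL queries I pin to every dashboard once the stack we build below is running. The histogram name is an assumption: I am using the conventional http_request_duration_seconds that most Prometheus client libraries emit, so substitute whatever your application actually exports. The steal query uses node_cpu_seconds_total, which the node exporter from Step 1 provides.
# P99 request latency over the last 5 minutes
histogram_quantile(0.99, sum(rate(http_request_duration_seconds_bucket[5m])) by (le))
# CPU time stolen by the hypervisor / noisy neighbors (should sit near zero)
rate(node_cpu_seconds_total{mode="steal"}[5m])
If the first graph spikes while the second one climbs, the host is your problem, not your code.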
The Stack: Prometheus + Grafana on Ubuntu 24.04
We are using the classic, unkillable combo: Prometheus for scraping metrics and Grafana for visualization. By mid-2025, this stack has matured into the absolute default for self-hosted observability. We keep everything strictly local, so there is nothing for Datatilsynet (the Norwegian Data Protection Authority) to object to on data sovereignty grounds.
Step 1: The Foundation
First, assume you are running a fresh CoolVDS instance with Ubuntu 24.04 LTS. Why CoolVDS? Because metrics are useless if your noisy neighbors on a cheap shared host are stealing CPU cycles. You need consistent CPU performance to trust your measurements.
Install the node exporter. This is the agent that exposes hardware metrics.
sudo apt update && sudo apt install prometheus-node-exporter -y
Verify it is spitting out data:
curl -s localhost:9100/metrics | head -n 5
You should see raw text output. If you get connection refused on localhost, the exporter isn't running; check systemctl status prometheus-node-exporter. UFW does not filter loopback traffic by default, but it will matter the moment a remote Prometheus tries to scrape this port.
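One more thing before moving on: never expose the exporter to the internet, because it leaks a detailed map of your system. If Prometheus will run on a separate monitoring host, open port 9100 to that host alone. A minimal rule, assuming UFW's default deny policy and a hypothetical monitoring server at 10.0.0.5:
sudo ufw allow from 10.0.0.5 to any port 9100 proto tcp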
Step 2: Configuring Prometheus
Install the server itself (sudo apt install prometheus -y). The package ships a default config; replace it with the one below. We want a scrape interval of 15 seconds: aggressive enough to catch spikes, but light enough on storage.
File: /etc/prometheus/prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'node_exporter'
    static_configs:
      - targets: ['localhost:9100']

  # This path is exposed by the Nginx VTS module; a stock Nginx build
  # needs nginx-prometheus-exporter instead (see Step 3).
  - job_name: 'nginx_vts'
    scrape_interval: 10s
    metrics_path: /status/format/prometheus
    static_configs:
      - targets: ['localhost:8080']
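One bad indent in this file and Prometheus refuses to start, so validate it before touching the service. The Ubuntu package ships promtool alongside the server:
promtool check config /etc/prometheus/prometheus.yml
You should see a SUCCESS line for the config.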
Start the service:
sudo systemctl enable --now prometheus
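Confirm that both targets are being scraped, either in the web UI at http://localhost:9090/targets or straight from the HTTP API:
curl -s localhost:9090/api/v1/targets | grep -o '"health":"[a-z]*"'
Both jobs should report "health":"up". The nginx job will stay down until Step 3 is done; that's expected.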
Step 3: Exposing Application Metrics (Nginx)
System metrics aren't enough. You need to know how Nginx is handling connections. In 2025, we rely heavily on the ngx_http_stub_status_module, or better, the VTS module if you compiled a custom Nginx (VTS speaks Prometheus format natively, which is what the nginx_vts job above expects). For standard setups, enable the basic status page and bridge it with an exporter, as shown after the config below.
Add this server block to your nginx.conf. It binds only to loopback, so the status page is never reachable from outside:
server {
    listen 127.0.0.1:8080;
    server_name localhost;

    location /stub_status {
        stub_status on;
        allow 127.0.0.1;
        deny all;
    }
}
Test the configuration, then reload:
sudo nginx -t && sudo nginx -s reload
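One catch for standard setups: stub_status returns plain text, not the Prometheus exposition format, so Prometheus cannot scrape it directly. The usual bridge is the official nginx-prometheus-exporter, which reads the status page and re-exposes it as metrics on port 9113. A minimal sketch, assuming you have downloaded the release binary (check --help on your version, as flag spellings have shifted between releases):
# Confirm the status page answers locally
curl -s 127.0.0.1:8080/stub_status
# Point the exporter at it; metrics appear on :9113/metrics by default
./nginx-prometheus-exporter --nginx.scrape-uri=http://127.0.0.1:8080/stub_status
Then retarget the nginx job in prometheus.yml to ['localhost:9113'] with the default /metrics path, and active connections, accepts, and request rates land in Grafana next to your node metrics.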