Observability is Not Just "More Monitoring"
It’s 3:00 AM. Your phone buzzes. PagerDuty is screaming. You open your Grafana dashboard. All the lights are green. CPU is at 40%, RAM is steady, disk I/O is nominal. Yet your support inbox is flooding with messages from Norwegian users who can't complete a purchase.
This is the failure of Monitoring. Monitoring answers the question: "Is the system healthy?" based on thresholds you defined three years ago. It handles "known unknowns."
Observability, on the other hand, answers: "Why is the system behaving this way?" regardless of what you predicted. It handles the "unknown unknowns." In the high-stakes environment of Nordic e-commerce and SaaS, relying solely on basic metrics is negligence.
The Anatomy of a Lie: When HTTP 200 is a Failure
I recall a specific incident while deploying a microservices architecture for a fintech client in Oslo. We had strict SLAs. Our Nginx monitoring reported 100% uptime and sub-100ms response times. But the application logic was silently failing due to a race condition in the database layer that only triggered under specific high-concurrency writes.
Monitoring saw HTTP 200 OK because the API gateway successfully returned a generic "Please try again" JSON payload. The infrastructure was fine. The business was bleeding money.
We only caught it because we had implemented distributed tracing via OpenTelemetry. We saw a span duration spike in the payment-service that didn't correlate with CPU load. It was a thread lock.
The "LGTM" Stack: A 2024 Standard
While the ELK stack (Elasticsearch, Logstash, Kibana) was the king of the 2010s, it is heavy, resource-hungry, and expensive to scale. In 2024, the pragmatic choice for serious DevOps teams is the LGTM stack (Loki, Grafana, Tempo, Mimir/Prometheus). It decouples storage from compute effectively.
Here is how you actually set this up. Don't just install packages; configure them for high cardinality.
1. The Collector Configuration
You need an OpenTelemetry Collector to sit between your apps and your backend. This allows you to sanitize data (critical for GDPR compliance) before it hits the disk.
```yaml
receivers:
  otlp:
    protocols:
      grpc:
      http:

processors:
  batch:
  # Scrubbing PII for GDPR compliance before storage
  attributes/gdpr:
    actions:
      - key: user.email
        action: hash

exporters:
  prometheus:
    endpoint: "0.0.0.0:8889"
  otlp:
    endpoint: "tempo:4317"
    tls:
      insecure: true

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [batch, attributes/gdpr]
      exporters: [otlp]
    metrics:
      receivers: [otlp]
      processors: [batch]
      exporters: [prometheus]
```
Note the `attributes/gdpr` processor. If you are logging raw user data in Norway without anonymization, Datatilsynet (The Norwegian Data Protection Authority) will eventually have a very expensive chat with you.
Infrastructure Matters: The I/O Bottleneck
Here is the trade-off nobody talks about: Observability generates massive amounts of write-heavy data.
If you enable full tracing on a high-traffic application, you are writing gigabytes of logs and traces per hour. On a budget VPS with standard SSDs (or worse, spinning rust), your `iowait` will skyrocket. The observability tool itself becomes the cause of your outage. This is the Heisenberg Uncertainty Principle of DevOps: measuring the system crashes the system.
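Before you throw hardware at the problem, cut the volume at the source by sampling traces in the collector. Here is a minimal sketch that extends the collector config above, assuming you run the OpenTelemetry Collector Contrib distribution (which ships the `probabilistic_sampler` processor); the 15% rate is an arbitrary placeholder, not a recommendation:

```yaml
processors:
  # Keep roughly 15% of traces and drop the rest before they ever hit disk.
  probabilistic_sampler:
    sampling_percentage: 15

service:
  pipelines:
    traces:
      receivers: [otlp]
      processors: [probabilistic_sampler, batch, attributes/gdpr]
      exporters: [otlp]
```

Head-based sampling like this is crude (tail-based sampling keeps the interesting slow or failing traces), but it is the cheapest way to stop tracing from eating your I/O budget.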
Pro Tip: Never run your observability stack on the same disk controller as your database. If you can't separate the hardware, ensure you have high-throughput NVMe storage. This is why we standardized on NVMe for all CoolVDS instances—monitoring shouldn't kill your production.
Self-Hosting vs. SaaS (Schrems II & Cost)
In 2024, sending your telemetry data to Datadog or New Relic comes with two problems. First, the cost scales linearly with traffic. Second, data residency: under Schrems II, shipping log data containing IP addresses or user identifiers to US-controlled clouds is legally risky for European companies.
Self-hosting Grafana and Loki in Norway gives you two advantages:
- Legal Safety: Data stays within the jurisdiction.
- Latency: Sending traces to a US endpoint adds 100ms+ overhead to the request loop if you are using synchronous blocking calls (don't do that, but legacy apps happen). Sending it to a local instance in Oslo takes <2ms.
Deploying Loki with Docker Compose
Here is a battle-tested snippet for getting Loki and Promtail up. Pair it with sane retention limits in loki-config.yaml (excerpt further below) to prevent filling your disk:
```yaml
version: "3"

services:
  loki:
    image: grafana/loki:2.9.0
    command: -config.file=/etc/loki/local-config.yaml
    volumes:
      - ./loki-config.yaml:/etc/loki/local-config.yaml
      - loki-data:/loki
    ports:
      - "3100:3100"
    restart: unless-stopped

  promtail:
    image: grafana/promtail:2.9.0
    volumes:
      - /var/log:/var/log
      - ./promtail-config.yaml:/etc/promtail/config.yaml
    command: -config.file=/etc/promtail/config.yaml
    restart: unless-stopped

volumes:
  loki-data:
```
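The retention itself lives in loki-config.yaml, not in the Compose file. This is a minimal, retention-only sketch assuming the default single-node filesystem store; 14 days is an arbitrary starting point, and you should verify the keys against the Loki 2.9 configuration reference:

```yaml
# loki-config.yaml (retention-related settings only)
limits_config:
  retention_period: 336h          # keep 14 days of logs

compactor:
  working_directory: /loki/compactor
  shared_store: filesystem        # matches the single-node filesystem setup
  retention_enabled: true         # actually delete chunks older than retention_period
  retention_delete_delay: 2h      # grace period before deletion
```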
Comparison: Traditional vs. Observable
| Feature | Traditional Monitoring | Modern Observability |
|---|---|---|
| Core Question | Is it working? | Why is it broken? |
| Data Source | Aggregates (Averages) | High-cardinality Events |
| Granularity | Server / Host | Request / User ID |
| Infrastructure Needs | Low (SNMP, Ping) | High (NVMe, RAM) |
Implementation Strategy
Don't try to boil the ocean. Start by instrumenting your most critical API endpoints. Use the "RED" method (sketched as Prometheus recording rules after this list):
- Rate (Requests per second)
- Errors (The number of those requests that are failing)
- Duration (The amount of time those requests take)
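To make that concrete, here is a sketch of the three RED signals expressed as Prometheus recording rules. The metric names (http_server_requests_total, http_server_request_duration_seconds_bucket) and the service label are placeholders; substitute whatever your instrumentation actually emits:

```yaml
groups:
  - name: red-method
    interval: 30s
    rules:
      # Rate: requests per second, per service
      - record: service:http_requests:rate5m
        expr: sum by (service) (rate(http_server_requests_total[5m]))
      # Errors: share of requests returning 5xx
      - record: service:http_errors:ratio_rate5m
        expr: |
          sum by (service) (rate(http_server_requests_total{status=~"5.."}[5m]))
          /
          sum by (service) (rate(http_server_requests_total[5m]))
      # Duration: 95th percentile latency
      - record: service:http_request_duration_seconds:p95
        expr: |
          histogram_quantile(0.95,
            sum by (service, le) (rate(http_server_request_duration_seconds_bucket[5m])))
```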
Once you have metrics, add Tracing to the slow endpoints. Finally, correlate Logs to those traces using TraceIDs.
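Wiring logs to traces is mostly a Grafana provisioning exercise. Here is a sketch of a Loki datasource definition that turns a trace_id=<id> token in your log lines into a clickable link to Tempo; the regex, the tempo datasource uid, and the log format are assumptions you will need to adapt:

```yaml
# grafana/provisioning/datasources/loki.yaml
apiVersion: 1
datasources:
  - name: Loki
    type: loki
    access: proxy
    url: http://loki:3100
    jsonData:
      derivedFields:
        - name: TraceID
          # Assumes log lines contain "trace_id=<hex id>"
          matcherRegex: "trace_id=(\\w+)"
          url: "$${__value.raw}"    # "$$" escapes env-var interpolation in provisioning files
          datasourceUid: tempo      # must match the uid of your Tempo datasource
```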
The Verdict
Observability is an investment in your sleep schedule. It allows you to debug production without logging into the server. But it demands respect for the underlying hardware. You cannot run a heavy Grafana/Loki stack on oversold, noisy-neighbor hosting environments.
If you are building for the Nordic market, you need the low latency of local peering and the raw I/O throughput to handle ingestion spikes without choking your actual application. We built CoolVDS to handle exactly these kinds of workloads—where performance guarantees aren't just marketing copy, but a technical necessity.
Ready to stop guessing? Deploy your own observability stack on a CoolVDS NVMe instance today. Spin it up in under 60 seconds and see what your application is really doing.