

Beyond AWS Lambda: Implementing Private Serverless on Bare-Metal VDS

Let’s cut through the marketing noise immediately: "Serverless" is a misnomer that usually translates to "expensive, high-latency compute owned by a mega-corp." For many developers in the Nordic region, the promise of Function-as-a-Service (FaaS) quickly dissolves in the face of cold starts, unpredictable billing spikes, and the GDPR headaches that come with shunting data through US-owned data centers in Frankfurt or Ireland. I have seen too many production environments stall because a Lambda function took 4 seconds to warm up during a traffic spike, or because the API Gateway costs eclipsed the actual compute spend. The solution isn't to abandon the event-driven pattern—it is efficient and scalable—but to repatriate the infrastructure. By building a private serverless substrate on top of high-performance Virtual Dedicated Servers (VDS), specifically using K3s and OpenFaaS, we gain the developer velocity of FaaS with the raw metal performance of a dedicated Linux environment, all while keeping the data squarely within Norwegian jurisdiction under Datatilsynet’s watchful eye.

The Architecture: K3s, OpenFaaS, and NVMe

To replicate the agility of public cloud serverless without the "Lambda Tax," we rely on a stack that was battle-tested throughout 2023 and solidified in 2024: Kubernetes (via K3s) for orchestration, OpenFaaS for the function runtime, and a high-IOPS backing store. The choice of hardware here is binary; you either have NVMe or you have a bottleneck. In a recent deployment for a fintech client in Oslo, we migrated their transaction processing from Azure Functions to a CoolVDS instance running this exact stack. The primary challenge in self-hosted FaaS is the control plane overhead. Standard Kubernetes (K8s) is too heavy for a single-node or small-cluster VDS setup, eating up valuable RAM before you even deploy a function. K3s strips away the legacy cloud provider plugins and uses SQLite by default (though we will swap this for etcd on larger setups), resulting in a single binary under 100MB. This efficiency allows the VDS to dedicate its cycles to your actual Python or Go workers rather than managing the cluster state. Furthermore, by running this on a VDS in Oslo, we slashed network latency from 45ms (Oslo to Frankfurt) to roughly 2-3ms for local users. That is an immediate responsiveness upgrade for end users before a single line of code is optimized.
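For reference, when a single node is no longer enough, the same installer we use below can point K3s at an external etcd cluster instead of the embedded SQLite file. A minimal sketch, assuming three etcd members are already reachable on the private network (the endpoints are placeholders):

# Hypothetical HA variant: swap SQLite for an external etcd datastore
curl -sfL https://get.k3s.io | sh -s - server \
  --datastore-endpoint="https://10.0.0.11:2379,https://10.0.0.12:2379,https://10.0.0.13:2379"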

Step 1: The Base Metal Configuration

Before installing the orchestration layer, the Linux kernel needs tuning to handle high-churn container workloads. Default sysctl settings are designed for general-purpose computing, not for spinning up and destroying thousands of containers per hour. We need to adjust the bridge settings and raise the inotify watch limit for Kubernetes.

# /etc/sysctl.d/99-k3s-serverless.conf

# Enable IP forwarding for container networking
net.ipv4.ip_forward = 1

# Make bridged pod traffic visible to iptables (requires the br_netfilter module)
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-call-ip6tables = 1

# Increase inotify watches for high-density pod scenarios
fs.inotify.max_user_watches = 524288

# Optimize connection tracking for high throughput
net.netfilter.nf_conntrack_max = 131072

# Reduce swap tendency to prioritize RAM for active functions
vm.swappiness = 1
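The bridge sysctls only take effect once the br_netfilter module is loaded, so load it (plus overlay, which containerd needs) before applying the profile. A minimal sketch of the standard prerequisite steps; the modules-load.d file name is arbitrary:

# Load the kernel modules now and persist them across reboots
modprobe overlay
modprobe br_netfilter
printf "overlay\nbr_netfilter\n" > /etc/modules-load.d/k3s.conf

# Apply the new sysctl profile
sysctl --system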

Deploying the Control Plane

With the kernel primed, we deploy K3s. In this architecture, we treat the CoolVDS instance as a converged node—acting as both the control plane and the worker. This is acceptable for high-performance VDS instances because the CPU isolation provided by the underlying KVM hypervisor ensures we aren't fighting for cycles with other tenants, a critical factor often overlooked in cheaper shared hosting. Public cloud vCPUs are often heavily oversubscribed; a dedicated slice on CoolVDS behaves more like bare metal.

curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC="server --disable traefik" sh -

Pro Tip: We disable the default Traefik ingress controller here. For a serverless API gateway, we want tighter control using OpenFaaS's gateway or a custom Nginx ingress that can handle rate limiting and caching logic specifically tuned for function invocations.
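Before moving on, confirm the node reports Ready and the core pods are running. K3s writes its kubeconfig to /etc/rancher/k3s/k3s.yaml and symlinks kubectl during install:

export KUBECONFIG=/etc/rancher/k3s/k3s.yaml
kubectl get nodes -o wide
kubectl get pods -n kube-system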

The Event Loop: Implementing OpenFaaS

Once K3s is up and the node reports Ready, we install OpenFaaS using arkade (the de facto installer CLI in 2024). OpenFaaS abstracts the complexity of Kubernetes, giving you a simple CLI to deploy functions. However, the default configuration assumes a "toy" setup. For production, especially when handling financial or real-time data, we must configure the queue-worker limits. If your VDS has 8 vCPUs, you don't want the queue worker to consume all of them, starving the synchronous API gateway. We also need to configure the faas-netes provider to utilize the NVMe storage for temporary scratch space during function execution, which is significantly faster than the network-attached block storage common in public clouds.

arkade install openfaas \
  --load-balancer=false \
  --set gateway.upstreamTimeout=60s \
  --set gateway.writeTimeout=60s \
  --set queueWorker.ackWait=60s \
  --set faasIdler.dryRun=false
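Once the chart has rolled out, install faas-cli, fetch the generated admin password from the basic-auth secret, and log the CLI in. Since we skipped the LoadBalancer, a port-forward (or your own ingress) exposes the gateway locally:

arkade get faas-cli
kubectl rollout status -n openfaas deploy/gateway
kubectl port-forward -n openfaas svc/gateway 8080:8080 &

PASSWORD=$(kubectl get secret -n openfaas basic-auth \
  -o jsonpath="{.data.basic-auth-password}" | base64 --decode)
echo -n "$PASSWORD" | faas-cli login --username admin --password-stdin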

Defining a High-Performance Function

Let's look at a concrete example of a function configuration. In a public cloud, you are often limited by the vendor's runtime versions. Here, the function's build template and its Dockerfile are under our control, which gives us total control over the binary. For a high-throughput image processing task, we can compile a Go binary with specific optimizations for the host CPU architecture.

version: 1.0
provider:
  name: openfaas
  gateway: http://127.0.0.1:8080
functions:
  img-processor:
    lang: go
    handler: ./img-processor
    image: registry.coolvds-client.no/img-processor:latest
    labels:
      com.openfaas.scale.min: "1"
      com.openfaas.scale.max: "20"
    annotations:
      # Critical for K8s scheduling on high-load nodes
      com.openfaas.health.http.initialDelay: "2s"
    limits:
      memory: 128Mi
      cpu: 100m
    requests:
      memory: 64Mi
      cpu: 50m
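With the stack file saved (stack.yml here; the name is yours to choose), a single command builds, pushes, and deploys the function, and the gateway exposes it over plain HTTP routes. The test-image.jpg payload below is a stand-in:

faas-cli up -f stack.yml

# Synchronous invocation through the gateway
curl -s --data-binary @test-image.jpg \
  http://127.0.0.1:8080/function/img-processor

# Fire-and-forget via the async queue (NATS-backed)
curl -s --data-binary @test-image.jpg \
  http://127.0.0.1:8080/async-function/img-processor

Note that faas-cli up pushes to the registry referenced in the image field, so log Docker in to registry.coolvds-client.no first.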

The Persistence Layer: Where Public Cloud Fails

The Achilles' heel of serverless is state. AWS Lambda effectively forces state out to DynamoDB or S3, adding a network round trip to every read. On our CoolVDS architecture, we can run a stateful Redis instance right next to the functions on the same NVMe-backed host. This creates a "Data Locality" advantage that is impossible to replicate in distributed public serverless environments without massive cost. For a recent project, we configured Redis as a persistent store for session data, achieving sub-millisecond read times.

# redis.conf optimized for VDS NVMe

# Snapshotting strategy
save 900 1
save 300 10
save 60 10000

# Append Only File for durability
appendonly yes
appendfsync everysec

# Memory management
maxmemory 2gb
maxmemory-policy allkeys-lru
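To show what that locality buys, here is a minimal sketch of a handler for the classic OpenFaaS go template that reads session data from the co-located Redis instance. The Redis address, key scheme, and the go-redis v9 dependency are assumptions to adapt to your own deployment:

// handler.go — classic OpenFaaS "go" template handler (func Handle).
// Assumes go-redis v9 is in go.mod and Redis is reachable at the
// address below; adjust both to your environment.
package function

import (
	"context"
	"fmt"
	"time"

	"github.com/redis/go-redis/v9"
)

var rdb = redis.NewClient(&redis.Options{
	Addr:        "redis.default.svc.cluster.local:6379", // assumed service address
	DialTimeout: 100 * time.Millisecond,                 // local instance: fail fast
})

// Handle treats the request body as a session key and returns its value.
func Handle(req []byte) string {
	ctx, cancel := context.WithTimeout(context.Background(), 50*time.Millisecond)
	defer cancel()

	val, err := rdb.Get(ctx, string(req)).Result()
	if err == redis.Nil {
		return "session not found"
	} else if err != nil {
		return fmt.Sprintf("redis error: %v", err)
	}
	return val
}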

Comparison: Public FaaS vs. CoolVDS Private FaaS

Feature             Public Cloud FaaS (e.g., Lambda)     Private FaaS on CoolVDS
Cold Start          Variable (100ms - 2s)                Constant / Tunable (Keep-warm is free)
Execution Limit     Often 15 mins max                    Unlimited
Storage I/O         Networked / Throttled                Local NVMe (Direct Access)
Data Sovereignty    Complex (US Cloud Act concerns)      Guaranteed (Norway/EU)

Ultimately, the choice to move to a self-hosted serverless pattern is about maturity. It signifies a transition from the "move fast and break things" phase to the "optimize and stabilize" phase. While CoolVDS provides the robust, raw compute power necessary to run these orchestration layers without jitter, the real value lies in owning the entire stack. You are no longer subject to a sudden deprecation of a Node.js runtime version or a change in pricing tiers. You control the kernel, the network stack, and the data. For Norwegian businesses dealing with sensitive data, this architectural pattern is not just a performance upgrade; it is a compliance necessity.

Don't let cold starts and foreign routing kill your application's perceived performance. Take control of your infrastructure. Deploy a K3s-ready NVMe instance on CoolVDS today and start building a serverless platform that actually serves you.