Console Login

Surviving the Slashdot Effect: High Availability with HAProxy on CentOS

Surviving the Slashdot Effect: High Availability with HAProxy on CentOS

It starts with a slow page load. Then, the connection times out. Finally, your monitoring system starts screaming because Apache has consumed every bit of RAM and Swap on your server. If you are running a single LAMP stack, you are sitting on a ticking time bomb. I've seen it happen too many times: a marketing campaign goes viral, or you get linked on a high-traffic site, and suddenly your revenue stream is dead because your server is thrashing.

Stop throwing money at RAM. It doesn’t scale linearly. The solution isn't a bigger server; it's smart architecture.

Today, we are going to set up HAProxy 1.4 (High Availability Proxy) to split traffic across multiple web nodes. This is the exact setup I recommend for clients who need to handle spikes without downtime.

Why HAProxy?

You could use Nginx as a balancer, but HAProxy is a dedicated scalpel for this surgery. It handles connection regulation better than anything else in the open-source world right now. It is purely event-driven, meaning it doesn't spawn a process per connection like Apache prefork does. It sips CPU while pushing thousands of concurrent connections.

The Architecture

We are going to move from a single point of failure to a redundant setup:

  • Node 1 (LB): HAProxy (Public IP)
  • Node 2 (Web A): Apache/PHP (Private Network)
  • Node 3 (Web B): Apache/PHP (Private Network)

Using CoolVDS instances for this is ideal because we provide a distinct private network backend (vLAN), reducing latency between your balancer and your web heads to almost zero.

The Configuration

Assuming you are on CentOS 5.5 or the new Debian 6 (Squeeze), install HAProxy. Once installed, back up the default config and open /etc/haproxy/haproxy.cfg. We aren't going to use the default fluff.

Here is a battle-tested configuration for HTTP traffic:

global
    log 127.0.0.1 local0
    maxconn 4096
    user haproxy
    group haproxy
    daemon
    # Spreading checks prevents CPU spikes on the LB
    spread-checks 5

defaults
    log     global
    mode    http
    option  httplog
    option  dontlognull
    retries 3
    option  redispatch
    maxconn 2000
    contimeout 5000
    clitimeout 50000
    srvtimeout 50000

listen  webfarm 0.0.0.0:80
    mode http
    stats enable
    stats auth admin:password123
    stats uri /haproxy?stats
    balance roundrobin
    option httpclose
    option forwardfor
    option httpchk HEAD /check.txt HTTP/1.0
    server web01 192.168.1.10:80 check inter 2000 rise 2 fall 3
    server web02 192.168.1.11:80 check inter 2000 rise 2 fall 3

Key Directives Explained

  • balance roundrobin: This distributes requests sequentially. Web01, then Web02, then Web01. Simple and effective for stateless apps.
  • option httpchk: This is critical. HAProxy checks if a specific file exists. If Apache is up but PHP is broken, a TCP check might pass, but an HTTP check will fail, correctly taking the node out of rotation.
  • option forwardfor: Since the web servers only see the Load Balancer's IP, this adds the X-Forwarded-For header so your logs show the real visitor IP.

The