Stop Flying Blind: Advanced Log Analysis with AWStats
If you are still monitoring your traffic by running tail -f /var/log/httpd/access_log and hoping for the best, you are doing it wrong. I have seen seasoned sysadmins lose hours trying to diagnose a traffic spike using grep and awk while the server load climbed above 20.0. It is not heroic. It is inefficient.
In the Norwegian hosting market, where bandwidth costs are reasonable but latency is scrutinized, knowing exactly who is hitting your server and what they are grabbing is critical. Webalizer is ancient. Google Analytics is client-side only (and misses all the bots scraping your content). You need server-side log analysis. You need AWStats properly configured.
The I/O Bottleneck of Log Parsing
Here is the reality check. AWStats is a Perl script. It parses massive text files. If you run this on a cheap, oversold shared host, the disk I/O wait (iowait) will cripple your Apache processes. The CPU waits for the disk to spin, and your site hangs.
I worked on a project last month for a media client in Oslo. They had 2GB log files daily. Running the update script on a standard SATA drive took 45 minutes, during which the MySQL database locked up due to resource starvation. We moved them to a CoolVDS Xen-based VPS with RAID-10 SAS storage (15k RPM drives matter), and the parse time dropped to 3 minutes. Architecture matters.
Step-by-Step Implementation
Let's assume you are running CentOS 5 or Debian Lenny. First, stop relying on the default repositories if they are outdated. Grab the latest stable release (currently 6.95).
1. The Apache Configuration
AWStats needs a specific log format to extract the most value. Open your httpd.conf or apache2.conf:
LogFormat "%h %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" combined
CustomLog /var/log/httpd/access_log combined
Don't forget to reload Apache. If you don't use the combined format, you lose User-Agent data, meaning you won't distinguish between a Firefox user and a malicious botnet.
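To see what the combined format gives you, here is a minimal Python sketch that splits one log line into named fields. The regular expression, the function name parse_combined and the sample line are illustrative assumptions, not part of AWStats, and the pattern ignores escaped quotes inside fields.

```python
import re

# Mirrors: %h %l %u %t "%r" %>s %b "%{Referer}i" "%{User-Agent}i"
# Simplification: quoted fields are assumed to contain no escaped quotes.
COMBINED_RE = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)


def parse_combined(line):
    """Return a dict of fields, or None if the line is not in combined format."""
    match = COMBINED_RE.match(line)
    return match.groupdict() if match else None


# Hypothetical sample line for illustration only.
SAMPLE = (
    '192.0.2.10 - - [10/Oct/2009:13:55:36 +0200] '
    '"GET /index.html HTTP/1.1" 200 2326 '
    '"http://example.com/start.html" '
    '"Mozilla/5.0 (X11; U; Linux i686) Firefox/3.5"'
)

if __name__ == "__main__":
    entry = parse_combined(SAMPLE)
    print(entry["host"], entry["status"], entry["agent"])
```

A line written in the shorter common format has no Referer or User-Agent fields, so parse_combined returns None for it. That is the data you give up without the combined format.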
2. Configuring the .conf File
Copy the model config file (awstats.model.conf) in /etc/awstats/ to a per-domain file such as awstats.yourdomain.conf. The critical setting most people miss is DNS lookup.
# /etc/awstats/awstats.yourdomain.conf
DNSLookup=1
Warning: Setting DNSLookup=1 gives you beautiful hostname resolution (seeing telenor.net instead of an IP), but it is a performance killer. It forces a reverse DNS lookup for every IP. If you are on a high-latency network or a slow server, turn this OFF (set to 0) or use the GeoIP plugin (backed by the libgeo-ip-perl package) for country resolution instead.
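To get a feel for the cost before enabling lookups, this rough Python sketch counts the distinct client addresses in a log and multiplies by an assumed per-lookup latency. The function name estimate_dns_seconds and the 50 ms default are illustrative assumptions. It models the worst case, where every distinct address needs one uncached lookup.

```python
def estimate_dns_seconds(log_lines, seconds_per_lookup=0.05):
    """Estimate worst-case reverse DNS time for a batch of log lines.

    Assumes the client address is the first space-separated field and that
    each distinct address costs exactly one uncached lookup.
    Returns (number_of_distinct_addresses, estimated_seconds).
    """
    unique_ips = {line.split(" ", 1)[0] for line in log_lines if line.strip()}
    return len(unique_ips), len(unique_ips) * seconds_per_lookup


if __name__ == "__main__":
    # Path taken from the CustomLog directive; adjust for your system.
    with open("/var/log/httpd/access_log") as handle:
        count, seconds = estimate_dns_seconds(handle)
    print(f"{count} distinct addresses, roughly {seconds / 60:.1f} minutes of lookups")
```

If the estimate is a large share of your total parse time, that is a good sign to leave DNSLookup at 0.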
Pro Tip: Schedule your cron job to run the update script (awstats_updateall.pl) every hour, not every 24 hours. Spreading the I/O load prevents that midnight server spike that wakes you up.
Compliance: The Norwegian Context (Datatilsynet)
Operating a server in Norway means adhering to the Personal Data Act (Personopplysningsloven). IP addresses can be considered personal data. If you are storing logs indefinitely without anonymization, you are walking a fine line.
To stay compliant while hosting in Oslo or deploying for European clients, consider masking IP addresses in your analysis or rotating logs on a strict schedule. While the Safe Harbor principles cover data transfer, local storage requires strict access control. Ensure your /awstats/ directory is password protected via .htaccess.
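One way to mask addresses is to pre-process the log before it is archived or analysed. The Python sketch below zeroes the host part of each client address. The helper names mask_ip and mask_log_line and the prefix lengths (/24 for IPv4, /48 for IPv6) are assumptions chosen for illustration. Whether that level of truncation satisfies your own obligations is a separate question to check.

```python
import ipaddress


def mask_ip(address):
    """Zero the host part of an address: keep /24 for IPv4, /48 for IPv6."""
    ip = ipaddress.ip_address(address)
    prefix = 24 if ip.version == 4 else 48  # assumed prefix lengths
    network = ipaddress.ip_network(f"{ip}/{prefix}", strict=False)
    return str(network.network_address)


def mask_log_line(line):
    """Mask the first field of a log line if it is an IP address."""
    host, sep, rest = line.partition(" ")
    try:
        return mask_ip(host) + sep + rest
    except ValueError:
        # First field is a hostname or malformed; leave the line untouched.
        return line


if __name__ == "__main__":
    print(mask_log_line('192.0.2.123 - - [10/Oct/2009:13:55:36 +0200] "GET / HTTP/1.1" 200 512'))
```

Masking trades precision for privacy: visitors who share a /24 collapse into a single address, so unique-visitor counts will read lower than they would with full addresses.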
Why Virtualization Choice Impacts Analysis
Log analysis is bursty. It demands high CPU and Disk I/O for short periods. This is where the underlying virtualization technology of your provider exposes itself.
| Virtualization | Log Parsing Performance | Verdict |
|---|---|---|
| OpenVZ / Virtuozzo | Variable. "Noisy neighbors" can steal your I/O operations. | Risky for high traffic sites. |
| Xen PV (CoolVDS Standard) | Guaranteed. RAM and Disk throughput are isolated. | Recommended. Predictable performance. |
At CoolVDS, we stick to Xen paravirtualization. When you run a heavy Perl process to crunch 500MB of logs, you need to know the CPU cycles are yours, not shared with 50 other users. Our high-performance storage arrays are designed to handle these sequential read operations without creating latency for your live web users.
Final Thoughts
Tools like AWStats are essential for detecting hotlinking, understanding bandwidth theft, and optimizing your site for the actual browsers visiting you. But software is only as good as the hardware it runs on.
Don't let log analysis become a denial-of-service attack on your own infrastructure. Is your current hosting sluggish when you try to view your stats? It might be time to upgrade to dedicated resources.
Need a server that can crunch logs without breaking a sweat? Deploy a Xen VPS with CoolVDS today and get root access in under 2 minutes.