Fixing Logrotate Varnish Errors

Every day logrotate complains about varnishlog and varnishnsda. Let's enable varnish logging to quieten it.

The Error

Every day I get an e-mail from logrotate complaining with the following message:

/etc/cron.daily/logrotate:
Job for varnishlog.service failed. See 'systemctl status varnishlog.service' and 'journalctl -xn' for
+details.
invoke-rc.d: initscript varnishlog, action "reload" failed.
error: error running non-shared postrotate script for /var/log/varnish/varnish.log of
+'/var/log/varnish/varnish.log '
Job for varnishncsa.service failed. See 'systemctl status varnishncsa.service' and 'journalctl -xn' for
+details.
invoke-rc.d: initscript varnishncsa, action "reload" failed.
error: error running non-shared postrotate script for /var/log/varnish/varnishncsa.log of
+'/var/log/varnish/varnishncsa.log '
run-parts: /etc/cron.daily/logrotate exited with return code 1

Basically logrotate is trying to rotate Varnish's logs, and then it is trying to tell the logging programs to reload and stop logging using the file they have open.

The problem? I don't have varnish logging enabled. Rather than telling logrotate to check if the varnish loggers are running, I'm just gonig to enable the varnish loggers. Thus, if you don't have much disk space, this might not be the solution for you.

The Fix

Debian Jessie uses systemd so /etc/default/varnishlog and /etc/default/varnishncsa are irrelevant (except to learn what files need editing).

For varnishncsa, we need to copy /lib/systemd/system/varnishncsa.service to /etc/systemd/system/

sudo su
cp /lib/systemd/system/varnishncsa.service /etc/systemd/system/

I want to add the hostname to the logs, so I need to modify varnishncsa.service:

nano /etc/systemd/system/varnishncsa.service

[Unit]
Description=Varnish HTTP accelerator log daemon
After=varnish.service

[Service]
Environment="LOG_FORMAT='%%h %%l %%u %%t %%{host}i \"%%r\" %%s %%b \"%%{Referer}i\" \"%%{User-agent}i\"'"
User=varnishlog
ExecStart=/usr/bin/varnishncsa -a -w /var/log/varnish/varnishncsa.log -F $LOG_FORMAT
ExecReload=/bin/kill -HUP $MAINPID

[Install]
WantedBy=multi-user.target

After saving, I just need to enable it.

systemctl daemon-reload
systemctl start varnishncsa
systemctl enable varnishncsa

As for varnishlog, I don't need to modify anything, just enable it:

cp /lib/systemd/system/varnishlog.service /etc/systemd/system/
systemctl start varnishlog
systemctl enable varnishlog
exit

When using systemctl enable it will probably complain that the file exists. That is fine.

The only way to find out if logrotate no longer complains is to wait until tomorrow.

The Logs

We now have two varnish logs, and two log formats. varnishlog is logging in varnish format (massive log files - it is the same format as watching varnishlog) whereas varnishncsa logs in NCSA Standard Log Format (although I have modified it above so the requested host is included just before the request, so it is not NCSA format.

logrotate should now stop complaining about varnishlog.service and varnishncsa.service, and I should stop getting mail every day with logrotate complaints.

Unlike my nginx logs on my VPS, I am not keeping my varnish logs beyond the 7 days.

The reason for that would be obvious once you saw the logs.

varnishlog

varnishlog format:

*   << Request  >> 32804     
-   Begin	  req 32803 rxreq
-   Timestamp      Start: 1447057723.086865 0.000000 0.000000
-   Timestamp      Req: 1447057723.086865 0.000000 0.000000
-   ReqStart       fdd7:5938:e2e6:3::6c8d 50762
-   ReqMethod      HEAD
-   ReqURL	 /inc/esi/site_name
-   ReqProtocol    HTTP/1.1
-   ReqHeader      Host: web.johncook.uk
-   ReqHeader      Connection: close
-   ReqHeader      X-Forwarded-For: fdd7:5938:e2e6:3::6c8d
-   VCL_call       RECV
-   ReqUnset       Host: web.johncook.uk
-   ReqHeader      host: web.johncook.uk
-   ReqHeader      Surrogate-Capability: varnish=ESI/1.0
-   VCL_return     hash
-   VCL_call       HASH
-   VCL_return     lookup
-   Hit	    2147483651
-   VCL_call       HIT
-   VCL_return     deliver
-   RespProtocol   HTTP/1.1
-   RespStatus     200
-   RespReason     OK
-   RespHeader     X-Content-Type-Options: nosniff
-   RespHeader     X-Powered-By: PHP/5.6.14-0+deb8u1
-   RespHeader     Vary: Accept-Encoding, Cookie
-   RespHeader     Last-Modified: Mon, 02 Nov 2015 06:58:07 GMT
-   RespHeader     Content-type: text/plain;charset=UTF-8
-   RespHeader     Date: Mon, 09 Nov 2015 08:19:55 GMT
-   RespHeader     Server: lighttpd/1.4.35
-   RespHeader     Cache-Control: Public, max-age=7200, must-revalidate, s-maxage=86400, proxy-revalidate, no-transform
-   RespHeader     Content-Encoding: gzip
-   RespHeader     X-Varnish: 32804 3
-   RespHeader     Age: 527
-   RespHeader     Via: 1.1 varnish-v4
-   VCL_call       DELIVER
-   VCL_return     deliver
-   Timestamp      Process: 1447057723.087098 0.000232 0.000232
-   RespHeader     Content-Length: 43
-   RespUnset      Content-Encoding: gzip
-   RespUnset      Content-Length: 43
-   Debug	  "RES_MODE 40"
-   RespHeader     Connection: close
-   Timestamp      Resp: 1447057723.087216 0.000351 0.000118
-   Debug	  "XXX REF 2"
-   ReqAcct	78 0 78 438 0 438
-   End	    

*   << Session  >> 32803     
-   Begin	  sess 0 HTTP/1
-   SessOpen       fdd7:5938:e2e6:3::6c8d 50762 [fdd7:5938:e2e6:1::80:c]:6081 fdd7:5938:e2e6:1::80:c 6081 1447057723.086689 11
-   Link	   req 32804 rxreq
-   SessClose      RESP_CLOSE 0.001
-   End

varnishncsa

varnishncsa format:

fdd7:5938:e2e6:3::6c8d - - [09/Nov/2015:08:28:43 +0000] web.johncook.uk "HEAD http://web.johncook.uk/inc/esi/site_name HTTP/1.1" 200 0 "-" "-"

Varnish Logs

Those log entries are for the "are you alive?" monitoring. A HEAD request is the shortest request/response possible on the Web, and sending If-Modified-Since (to use 304s) would actually increase bandwidth usage for these requests because the request header would be larger. In fact, since "HTTP/1.1 304 Not Modified" is longer than "HTTP/1.1 200 OK" the response would be larger in size as well.

This is an HTTP response like the one the varnish instance on my VPS would have received for that HEAD request:

HTTP/1.1 200 OK
X-Content-Type-Options: nosniff
X-Powered-By: PHP/5.6.14-0+deb8u1
Vary: Accept-Encoding, Cookie
Last-Modified: Mon, 02 Nov 2015 06:58:07 GMT
Content-type: text/plain;charset=UTF-8
Date: Mon, 09 Nov 2015 08:50:18 GMT
Server: lighttpd/1.4.35
Cache-Control: Public, max-age=7200, must-revalidate, s-maxage=86400, proxy-revalidate, no-transform
Content-Encoding: gzip
X-Varnish: 138
Age: 0
Via: 1.1 varnish-v4
Content-Length: 43
Connection: close

Now, here are typical log entries for a single request from a search engine:

fdd7:5938:e2e6:3::6c8d - - [09/Nov/2015:08:31:11 +0000] web.johncook.uk "GET http://web.johncook.uk/blogs/politics/the-jury-team-2010-election HTTP/1.1" 200 4401 "-" "Crawler"
fdd7:5938:e2e6:3::6c8d - - [09/Nov/2015:08:31:11 +0000] web.johncook.uk "GET http://web.johncook.uk/inc/copyright HTTP/1.1" 304 0 "-" "-"

The crawler requested a page from CloudFlare that wasn't cached, so CloudFlare asked nginx on my VPS. My VPS's nginx asked my VPS's varnish cache and it wasn't cached so it asked the varnish cache on my home server.

My varnish cache on my home server asked lighttpd for the file (not seen in this log file) and sent it back to the varnish cache on my vps. The varnish instance on my VPS processed the ESI includes and saw that the copyright include (/inc/copyright) had expired in the cache.

The reason I use Varnish 4 is because it supports If-Modified-Since and Last-Modified headers between it and its upstream servers (304 support in other words). Because /inc/copyright hadn't been changed, it got a 304 response back and knew the old copy it had is still good.

The reason I don't want to keep these logs around is because they are for varnish. While they could be useful for longer-term cache hits/misses statistics, they aren't much use for anything other than debugging caching problems. If a problem exists for longer than 7 days then it has probably also been logged within the last 7 days.

nginx

Anyway, this is what the nginx log on my VPS for that exact same request looked like:

108.162.223.245 - - [09/Nov/2015:08:31:11 +0000] http://web.johncook.uk:80 "GET /blogs/politics/the-jury-team-2010-election HTTP/1.1" 200 5427 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

So, Baidu's crawler requested the page, it requested it from the canonical domain (although the canonical URL is on the https:// protocol and on port 443), and it got a 200 response. A WHOIS on the IP shows that it came via CloudFlare.

Real IP Addresses

Now, real IP addresses is something I've been meaning to change for a while. CloudFlare have two lists listing their IPv4 and IPv6 IP address ranges, and I have a script somewhere that can make GET requests using If-Modified-Since.

Actually, I never wrote that script. The commands are still listed at Ubuntu Server VPS Upgrade and because it needs doing manually my TLD list hasn't been updated since July 15th. Better do that now.

I have now created two scripts. The first one, for CloudFlare's IP addresses and nginx, is in my article CloudFlare and NGINX - Real IP Addresses.