The Error
Every day I get an e-mail from logrotate complaining with the following message:
/etc/cron.daily/logrotate: Job for varnishlog.service failed. See 'systemctl status varnishlog.service' and 'journalctl -xn' for +details. invoke-rc.d: initscript varnishlog, action "reload" failed. error: error running non-shared postrotate script for /var/log/varnish/varnish.log of +'/var/log/varnish/varnish.log ' Job for varnishncsa.service failed. See 'systemctl status varnishncsa.service' and 'journalctl -xn' for +details. invoke-rc.d: initscript varnishncsa, action "reload" failed. error: error running non-shared postrotate script for /var/log/varnish/varnishncsa.log of +'/var/log/varnish/varnishncsa.log ' run-parts: /etc/cron.daily/logrotate exited with return code 1
Basically logrotate is trying to rotate Varnish's logs, and then it is trying to tell the logging programs to reload and stop logging using the file they have open.
The problem? I don't have varnish logging enabled. Rather than telling logrotate to check if the varnish loggers are running, I'm just gonig to enable the varnish loggers. Thus, if you don't have much disk space, this might not be the solution for you.
The Fix
Debian Jessie uses systemd so /etc/default/varnishlog and /etc/default/varnishncsa are irrelevant (except to learn what files need editing).
For varnishncsa, we need to copy /lib/systemd/system/varnishncsa.service to /etc/systemd/system/
sudo su
cp /lib/systemd/system/varnishncsa.service /etc/systemd/system/
I want to add the hostname to the logs, so I need to modify varnishncsa.service:
nano /etc/systemd/system/varnishncsa.service
[Unit]
Description=Varnish HTTP accelerator log daemon
After=varnish.service
[Service]
Environment="LOG_FORMAT='%%h %%l %%u %%t %%{host}i \"%%r\" %%s %%b \"%%{Referer}i\" \"%%{User-agent}i\"'"
User=varnishlog
ExecStart=/usr/bin/varnishncsa -a -w /var/log/varnish/varnishncsa.log -F $LOG_FORMAT
ExecReload=/bin/kill -HUP $MAINPID
[Install]
WantedBy=multi-user.target
After saving, I just need to enable it.
systemctl daemon-reload
systemctl start varnishncsa
systemctl enable varnishncsa
As for varnishlog, I don't need to modify anything, just enable it:
cp /lib/systemd/system/varnishlog.service /etc/systemd/system/
systemctl start varnishlog
systemctl enable varnishlog
exit
When using systemctl enable
it will probably complain that the file exists. That is fine.
The only way to find out if logrotate no longer complains is to wait until tomorrow.
The Logs
We now have two varnish logs, and two log formats. varnishlog is logging in varnish format (massive log files - it is the same format as watching varnishlog) whereas varnishncsa logs in NCSA Standard Log Format (although I have modified it above so the requested host is included just before the request, so it is not NCSA format.
logrotate should now stop complaining about varnishlog.service and varnishncsa.service, and I should stop getting mail every day with logrotate complaints.
Unlike my nginx logs on my VPS, I am not keeping my varnish logs beyond the 7 days.
The reason for that would be obvious once you saw the logs.
varnishlog
varnishlog format:
* << Request >> 32804 - Begin req 32803 rxreq - Timestamp Start: 1447057723.086865 0.000000 0.000000 - Timestamp Req: 1447057723.086865 0.000000 0.000000 - ReqStart fdd7:5938:e2e6:3::6c8d 50762 - ReqMethod HEAD - ReqURL /inc/esi/site_name - ReqProtocol HTTP/1.1 - ReqHeader Host: web.johncook.uk - ReqHeader Connection: close - ReqHeader X-Forwarded-For: fdd7:5938:e2e6:3::6c8d - VCL_call RECV - ReqUnset Host: web.johncook.uk - ReqHeader host: web.johncook.uk - ReqHeader Surrogate-Capability: varnish=ESI/1.0 - VCL_return hash - VCL_call HASH - VCL_return lookup - Hit 2147483651 - VCL_call HIT - VCL_return deliver - RespProtocol HTTP/1.1 - RespStatus 200 - RespReason OK - RespHeader X-Content-Type-Options: nosniff - RespHeader X-Powered-By: PHP/5.6.14-0+deb8u1 - RespHeader Vary: Accept-Encoding, Cookie - RespHeader Last-Modified: Mon, 02 Nov 2015 06:58:07 GMT - RespHeader Content-type: text/plain;charset=UTF-8 - RespHeader Date: Mon, 09 Nov 2015 08:19:55 GMT - RespHeader Server: lighttpd/1.4.35 - RespHeader Cache-Control: Public, max-age=7200, must-revalidate, s-maxage=86400, proxy-revalidate, no-transform - RespHeader Content-Encoding: gzip - RespHeader X-Varnish: 32804 3 - RespHeader Age: 527 - RespHeader Via: 1.1 varnish-v4 - VCL_call DELIVER - VCL_return deliver - Timestamp Process: 1447057723.087098 0.000232 0.000232 - RespHeader Content-Length: 43 - RespUnset Content-Encoding: gzip - RespUnset Content-Length: 43 - Debug "RES_MODE 40" - RespHeader Connection: close - Timestamp Resp: 1447057723.087216 0.000351 0.000118 - Debug "XXX REF 2" - ReqAcct 78 0 78 438 0 438 - End * << Session >> 32803 - Begin sess 0 HTTP/1 - SessOpen fdd7:5938:e2e6:3::6c8d 50762 [fdd7:5938:e2e6:1::80:c]:6081 fdd7:5938:e2e6:1::80:c 6081 1447057723.086689 11 - Link req 32804 rxreq - SessClose RESP_CLOSE 0.001 - End
varnishncsa
varnishncsa format:
fdd7:5938:e2e6:3::6c8d - - [09/Nov/2015:08:28:43 +0000] web.johncook.uk "HEAD http://web.johncook.uk/inc/esi/site_name HTTP/1.1" 200 0 "-" "-"
Varnish Logs
Those log entries are for the "are you alive?" monitoring. A HEAD request is the shortest request/response possible on the Web, and sending If-Modified-Since (to use 304s) would actually increase bandwidth usage for these requests because the request header would be larger. In fact, since "HTTP/1.1 304 Not Modified" is longer than "HTTP/1.1 200 OK" the response would be larger in size as well.
This is an HTTP response like the one the varnish instance on my VPS would have received for that HEAD request:
HTTP/1.1 200 OK X-Content-Type-Options: nosniff X-Powered-By: PHP/5.6.14-0+deb8u1 Vary: Accept-Encoding, Cookie Last-Modified: Mon, 02 Nov 2015 06:58:07 GMT Content-type: text/plain;charset=UTF-8 Date: Mon, 09 Nov 2015 08:50:18 GMT Server: lighttpd/1.4.35 Cache-Control: Public, max-age=7200, must-revalidate, s-maxage=86400, proxy-revalidate, no-transform Content-Encoding: gzip X-Varnish: 138 Age: 0 Via: 1.1 varnish-v4 Content-Length: 43 Connection: close
Now, here are typical log entries for a single request from a search engine:
fdd7:5938:e2e6:3::6c8d - - [09/Nov/2015:08:31:11 +0000] web.johncook.uk "GET http://web.johncook.uk/blogs/politics/the-jury-team-2010-election HTTP/1.1" 200 4401 "-" "Crawler" fdd7:5938:e2e6:3::6c8d - - [09/Nov/2015:08:31:11 +0000] web.johncook.uk "GET http://web.johncook.uk/inc/copyright HTTP/1.1" 304 0 "-" "-"
The crawler requested a page from CloudFlare that wasn't cached, so CloudFlare asked nginx on my VPS. My VPS's nginx asked my VPS's varnish cache and it wasn't cached so it asked the varnish cache on my home server.
My varnish cache on my home server asked lighttpd for the file (not seen in this log file) and sent it back to the varnish cache on my vps. The varnish instance on my VPS processed the ESI includes and saw that the copyright include (/inc/copyright) had expired in the cache.
The reason I use Varnish 4 is because it supports If-Modified-Since and Last-Modified headers between it and its upstream servers (304 support in other words). Because /inc/copyright hadn't been changed, it got a 304 response back and knew the old copy it had is still good.
The reason I don't want to keep these logs around is because they are for varnish. While they could be useful for longer-term cache hits/misses statistics, they aren't much use for anything other than debugging caching problems. If a problem exists for longer than 7 days then it has probably also been logged within the last 7 days.
nginx
Anyway, this is what the nginx log on my VPS for that exact same request looked like:
108.162.223.245 - - [09/Nov/2015:08:31:11 +0000] http://web.johncook.uk:80 "GET /blogs/politics/the-jury-team-2010-election HTTP/1.1" 200 5427 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
So, Baidu's crawler requested the page, it requested it from the canonical domain (although the canonical URL is on the https:// protocol and on port 443), and it got a 200 response. A WHOIS on the IP shows that it came via CloudFlare.
Real IP Addresses
Now, real IP addresses is something I've been meaning to change for a while. CloudFlare have two lists listing their IPv4 and IPv6 IP address ranges, and I have a script somewhere that can make GET requests using If-Modified-Since.
Actually, I never wrote that script. The commands are still listed at Ubuntu Server VPS Upgrade and because it needs doing manually my TLD list hasn't been updated since July 15th. Better do that now.
I have now created two scripts. The first one, for CloudFlare's IP addresses and nginx, is in my article CloudFlare and NGINX - Real IP Addresses.