The Problem
In my most recent article, Fixing Logrotate Varnish Errors, I noted the following:
108.162.223.245 - - [09/Nov/2015:08:31:11 +0000] http://web.johncook.uk:80 "GET /blogs/politics/the-jury-team-2010-election HTTP/1.1" 200 5427 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"

So, Baidu's crawler requested the page, it requested it from the canonical domain (although the canonical URL is on the https:// protocol and on port 443), and it got a 200 response. A WHOIS on the IP shows that it came via CloudFlare.
John Cook, Fixing Logrotate Varnish Errors
Now, my previous decision was that requests that come in via CloudFlare should show CloudFlare's IP addresses. There are, however, some situations where I would rather the originating IP address be logged.
For example, the other day a Web crawler/spider/bot decided to parse the text of my pages and follow anything that looks like a link. It followed a slash ellipsis (representing, in text, all URLs starting with a certain string) and it also somehow managed to try and visit domain1/domain2/domain3, because somewhere I had written my three new domains without linking them, probably because I was referring to them as a/b/c and didn't want to link them.
— John Cook (@WatfordJC)

Anyway, it now looks like it would be a good idea to log the actual IP addresses, although it does mean losing the ability to tell whether a request came via CloudFlare or not.
The Fix
The fix is rather simple: you just need nginx compiled --with-http_realip_module (nginx -V will show if that is the case) and to tell nginx which IP addresses/ranges to trust.
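The include this article builds up consists of one set_real_ip_from line per trusted CloudFlare range plus a real_ip_header directive. As a minimal hand-written sketch (the two ranges shown are illustrative only; the authoritative lists live at cloudflare.com/ips-v4 and /ips-v6):

```nginx
# Trust these proxy addresses (illustrative ranges only).
set_real_ip_from 199.27.128.0/21;
set_real_ip_from 2400:cb00::/32;
# Take the client address from the header CloudFlare adds to each request.
real_ip_header CF-Connecting-IP;
```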
cd /home/thejc/Scripts
mkdir cloudflare-ips
cd cloudflare-ips
curl --compressed --interface eth0 --location --cacert /etc/ssl/certs/ca-certificates.crt -z /home/thejc/Scripts/cloudflare-ips/ips-v4 --create-dirs -o /home/thejc/Scripts/cloudflare-ips/ips-v4 --verbose --silent https://www.cloudflare.com/ips-v4
Last-Modified: Wed, 04 Nov 2015 19:26:58 GMT
touch -t `date -u --date="Wed, 04 Nov 2015 19:26:58 GMT" +%Y%m%d%H%M.%S` /home/thejc/Scripts/cloudflare-ips/ips-v4
curl --compressed --interface eth0 --location --cacert /etc/ssl/certs/ca-certificates.crt -z /home/thejc/Scripts/cloudflare-ips/ips-v6 --create-dirs -o /home/thejc/Scripts/cloudflare-ips/ips-v6 --verbose --silent https://www.cloudflare.com/ips-v6
Last-Modified: Wed, 04 Nov 2015 19:26:58 GMT
touch -t `date -u --date="Wed, 04 Nov 2015 19:26:58 GMT" +%Y%m%d%H%M.%S` /home/thejc/Scripts/cloudflare-ips/ips-v6
I'll get around to automating that at some point, I'm sure.
A quick script to parse the lines in the files and create an include for nginx:
cd /home/thejc/Scripts
nano update-cloudflare-ips.sh
#!/bin/sh
OUTFILE=/home/thejc/Scripts/cloudflare-ips/nginx.cloudflare-real-ips
echo "#CloudFlare" > "$OUTFILE"
while read -r line; do
echo "set_real_ip_from $line;" >> "$OUTFILE"
done < /home/thejc/Scripts/cloudflare-ips/ips-v4
while read -r line; do
echo "set_real_ip_from $line;" >> "$OUTFILE"
done < /home/thejc/Scripts/cloudflare-ips/ips-v6
echo "real_ip_header CF-Connecting-IP;" >> "$OUTFILE"
sudo chown www-data:www-data "$OUTFILE"
sudo mv "$OUTFILE" /etc/nginx/includes/
sudo service nginx reload
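The two read loops could equally be collapsed into a single sed invocation. A self-contained sketch, using made-up sample ranges written to the current directory in place of the downloaded files:

```shell
#!/bin/sh
# Sketch: build the nginx include with sed instead of two read loops.
# The CIDRs below are made-up stand-ins for the downloaded lists.
printf '199.27.128.0/21\n173.245.48.0/20\n' > ips-v4
printf '2400:cb00::/32\n' > ips-v6
{
    echo "#CloudFlare"
    # Prefix every address with the directive name and suffix it with ";".
    sed -e 's/^/set_real_ip_from /' -e 's/$/;/' ips-v4 ips-v6
    echo "real_ip_header CF-Connecting-IP;"
} > nginx.cloudflare-real-ips
cat nginx.cloudflare-real-ips
```

Either way produces the same include; the loop version is arguably easier to read at 2am.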
A quick edit to /etc/nginx/sites-enabled/johncook.uk so the file is included (the last line in this snippet is what I've added):
server {
include /etc/nginx/includes/web.johncook.uk-ips-https;
include /etc/nginx/includes/web.johncook.uk-ips-http;
server_name web.johncook.uk;
include /etc/nginx/includes/web.johncook.uk-ssl;
include /etc/nginx/includes/nginx.cloudflare-real-ips;
And, of course, for the bare domain:
server {
include /etc/nginx/includes/web.johncook.uk-ips-https;
include /etc/nginx/includes/web.johncook.uk-ips-http;
server_name johncook.uk;
include /etc/nginx/includes/web.johncook.uk-ssl;
include /etc/nginx/includes/nginx.cloudflare-real-ips;
And finally:
chmod +x update-cloudflare-ips.sh
sudo ./update-cloudflare-ips.sh
Voilà, all the IPs logged by nginx on my only CloudFlare-enabled site are now the real IP addresses. I won't be logging real IPs in Varnish or lighttpd because my internal ULA IP addresses are better for diagnosing issues.
180.76.15.153 - - [09/Nov/2015:09:59:14 +0000] http://web.johncook.uk:80 "GET / HTTP/1.1" 200 7334 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
180.76.15.156 - - [09/Nov/2015:10:00:32 +0000] http://web.johncook.uk:80 "GET / HTTP/1.1" 200 7286 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)"
Now to work on that having-to-do-things-manually issue.
Automation
The -D option for curl dumps the received headers to a file, making --verbose no longer needed. So, for the IPv4 address file I can change the command to the following:
curl --compressed --interface eth0 --location --cacert /etc/ssl/certs/ca-certificates.crt -z /home/thejc/Scripts/cloudflare-ips/ips-v4 --create-dirs -o /home/thejc/Scripts/cloudflare-ips/ips-v4 --silent https://www.cloudflare.com/ips-v4 -D /home/thejc/Scripts/cloudflare-ips/ips-v4-headers
And then it becomes as simple as:
touch -t $(date -u --date="$(grep -E "^Last-Modified:" /home/thejc/Scripts/cloudflare-ips/ips-v4-headers | sed 's/Last-Modified: //')" +%Y%m%d%H%M.%S) /home/thejc/Scripts/cloudflare-ips/ips-v4
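Extracting the header and converting it into a touch -t timestamp can be tested in isolation. A sketch, assuming GNU date and using the Last-Modified value from earlier in the article (the headers file here is written by hand to stand in for curl's -D output):

```shell
#!/bin/sh
# Sketch: turn a saved Last-Modified header into a touch -t timestamp.
printf 'Last-Modified: Wed, 04 Nov 2015 19:26:58 GMT\r\n' > ips-v4-headers
# curl-saved headers end in CRLF, so strip the carriage return as well.
LASTMOD=$(sed -n 's/^Last-Modified: //p' ips-v4-headers | tr -d '\r')
STAMP=$(date -u --date="$LASTMOD" +%Y%m%d%H%M.%S)
echo "$STAMP"
```

touch -t "$STAMP" ips-v4 would then set the file's modification time to match the header.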
So, putting it all together (and while I was at it, adding in support for Expires headers):
#!/bin/sh
if [ ! $(id -u) -eq 0 ]; then
>&2 echo "Please run with sudo or as root."
>&2 echo "sudo $0"
exit 1
fi
SCRIPT_OUTPUT_DIR=/home/thejc/Scripts/cloudflare-ips
SCRIPT_OUTPUT_FILE="$SCRIPT_OUTPUT_DIR/nginx.cloudflare-real-ips"
NGINX_INCLUDE_DIR=/etc/nginx/includes
CAFILE=/etc/ssl/certs/ca-certificates.crt
IP4_WEB=https://www.cloudflare.com/ips-v4
IP4_FILE="$SCRIPT_OUTPUT_DIR/ips-v4"
IP4_HEADERS="$SCRIPT_OUTPUT_DIR/ips-v4-headers"
IP6_WEB=https://www.cloudflare.com/ips-v6
IP6_FILE="$SCRIPT_OUTPUT_DIR/ips-v6"
IP6_HEADERS="$SCRIPT_OUTPUT_DIR/ips-v6-headers"
HAVE_EXPIRED=0
FILES_UPDATED=0
CURRENTTIME=$(date -u +%s)
EXPIRES_HEADER=$(grep -E "^Expires:" "$IP4_HEADERS" 2>/dev/null || echo 1970-01-01)
EXPIRES=$(date -u --date="$(echo $EXPIRES_HEADER | sed 's/Expires: //')" +%s)
if [ $EXPIRES -lt $CURRENTTIME ]; then
HAVE_EXPIRED=1
FILELASTMOD=$(stat -c %Y "$IP4_FILE" 2>/dev/null || echo 0)
curl --compressed --interface eth0 --location --cacert "$CAFILE" -z "$IP4_FILE" --create-dirs -o "$IP4_FILE" --silent "$IP4_WEB" -D "$IP4_HEADERS"
if [ ! $? -eq 0 ]; then
>&2 echo "curl error, $IP4_WEB"
exit 1
fi
LASTMOD_HEADER=$(grep -E "^Last-Modified:" "$IP4_HEADERS" 2>/dev/null || echo 1970-01-01)
LASTMOD=$(date -u --date="$(echo $LASTMOD_HEADER | sed 's/Last-Modified: //')" +%s)
if [ $LASTMOD -gt $FILELASTMOD ]; then
touch -t $(date -u --date="@$LASTMOD" +%Y%m%d%H%M.%S) "$IP4_FILE"
if [ ! $? -eq 0 ]; then
>&2 echo "touch error"
fi
FILES_UPDATED=1
fi
fi
EXPIRES_HEADER=$(grep -E "^Expires:" "$IP6_HEADERS" 2>/dev/null || echo 1970-01-01)
EXPIRES=$(date -u --date="$(echo $EXPIRES_HEADER | sed 's/Expires: //')" +%s)
if [ $EXPIRES -lt $CURRENTTIME ]; then
HAVE_EXPIRED=1
FILELASTMOD=$(stat -c %Y "$IP6_FILE" 2>/dev/null || echo 0)
curl --compressed --interface eth0 --location --cacert "$CAFILE" -z "$IP6_FILE" --create-dirs -o "$IP6_FILE" --silent "$IP6_WEB" -D "$IP6_HEADERS"
if [ ! $? -eq 0 ]; then
>&2 echo "curl error, $IP6_WEB"
exit 1
fi
LASTMOD_HEADER=$(grep -E "^Last-Modified:" "$IP6_HEADERS" 2>/dev/null || echo 1970-01-01)
LASTMOD=$(date -u --date="$(echo $LASTMOD_HEADER | sed 's/Last-Modified: //')" +%s)
if [ $LASTMOD -gt $FILELASTMOD ]; then
touch -t $(date -u --date="@$LASTMOD" +%Y%m%d%H%M.%S) "$IP6_FILE"
if [ ! $? -eq 0 ]; then
>&2 echo "touch error"
fi
FILES_UPDATED=1
fi
fi
if [ $HAVE_EXPIRED -eq 0 ]; then
echo "Previous expires headers are in the future."
exit 0
fi
if [ $FILES_UPDATED = 1 ]; then
echo "#CloudFlare" > "$SCRIPT_OUTPUT_FILE"
while read -r line; do
echo "set_real_ip_from $line;" >> "$SCRIPT_OUTPUT_FILE"
done < "$IP4_FILE"
while read -r line; do
echo "set_real_ip_from $line;" >> "$SCRIPT_OUTPUT_FILE"
done < "$IP6_FILE"
echo "real_ip_header CF-Connecting-IP;" >> "$SCRIPT_OUTPUT_FILE"
chown www-data:www-data "$SCRIPT_OUTPUT_FILE"
mv "$SCRIPT_OUTPUT_FILE" "$NGINX_INCLUDE_DIR/"
service nginx reload
else
echo "304 Files Not Modified"
fi
A rather long script compared to what it was, but there are quite a few checks in it. Let's walk through it.
Code Walk Through
First, a check is made that the script is being run as user ID 0. Rather than checking against $UID or $EUID, id -u returns 0 whether the script is run as root or with sudo.
Next, we set variables for the output directory for the downloaded files and their headers, and for the temporary copy of the nginx include file.
We then have the file for the certificate authority (or in this case, the bundle file of all trusted CAs on the machine).
We then have the location of the file on the Web, where it will be stored locally, and where the headers will be stored. We have the same thing for the IPv6 addresses file.
We then have boolean variables indicating whether the Expires header is in the past, and whether updated files have been downloaded.
We check the Expires header for the ips-v4 file (if the headers file doesn't exist, we fall back to a date of 0 seconds after the Unix epoch), and if the expiry time is earlier than the current time we proceed.
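The expiry check can be exercised on its own. A sketch with a hard-coded, already-expired header (assumes GNU date; "headers" stands in for the file saved by curl -D):

```shell
#!/bin/sh
# Sketch of the expiry check in isolation.
printf 'Expires: Thu, 05 Nov 2015 19:26:58 GMT\n' > headers
# Convert both the header value and "now" to seconds since the epoch.
EXPIRES=$(date -u --date="$(sed -n 's/^Expires: //p' headers)" +%s)
CURRENTTIME=$(date -u +%s)
if [ "$EXPIRES" -lt "$CURRENTTIME" ]; then
    echo "expired, fetching a fresh copy"
fi
```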
We then download the ips-v4 file if it has been modified; if it has, we set the file modification time to the time in the Last-Modified header and set the boolean FILES_UPDATED variable to 1.
We then repeat this process for the ips-v6 file.
If at this point both files are "cached" because their expiry dates are in the future we exit.
If not, we check if the files have been updated, and if they have we do everything we did in the old script to create the nginx include. If the files haven't been updated, we just exit.
Portability of Code
Now this script is easily modifiable for other uses where a file needs to be regularly downloaded if it has been modified.
For example, for the very job I originally borrowed the curl command from.
Public Suffix List Updater
I hadn't updated the effective_tld_names.dat file since July. Now I can do it automatically.
#!/bin/sh
if [ ! $(id -u) -eq 0 ]; then
>&2 echo "Please run as root or with sudo."
>&2 echo "sudo $0"
exit 1
fi
SCRIPT_OUTPUT_DIR=/home/thejc/Scripts/output
SCRIPT_OUTPUT_FILE="$SCRIPT_OUTPUT_DIR/effective_tld_names.dat"
CAFILE=/etc/ssl/certs/ca-certificates.crt
WEB_FILE=https://publicsuffix.org/list/effective_tld_names.dat
WEB_HEADERS="$SCRIPT_OUTPUT_FILE"-headers
HAVE_EXPIRED=0
FILES_UPDATED=0
CURRENTTIME=$(date -u +%s)
EXPIRES_HEADER=$(grep -E "^Expires:" "$WEB_HEADERS" 2>/dev/null || echo 1970-01-01)
EXPIRES=$(date -u --date="$(echo $EXPIRES_HEADER | sed 's/Expires: //')" +%s)
if [ $EXPIRES -lt $CURRENTTIME ]; then
HAVE_EXPIRED=1
FILELASTMOD=$(stat -c %Y "$SCRIPT_OUTPUT_FILE" 2>/dev/null || echo 0)
curl --compressed --interface eth0 --location --cacert "$CAFILE" -z "$SCRIPT_OUTPUT_FILE" --create-dirs -o "$SCRIPT_OUTPUT_FILE" --silent "$WEB_FILE" -D "$WEB_HEADERS"
if [ ! $? -eq 0 ]; then
>&2 echo "curl error, $WEB_FILE"
exit 1
fi
LASTMOD_HEADER=$(grep -E "^Last-Modified:" "$WEB_HEADERS" 2>/dev/null || echo 1970-01-01)
LASTMOD=$(date -u --date="$(echo $LASTMOD_HEADER | sed 's/Last-Modified: //')" +%s)
if [ $LASTMOD -gt $FILELASTMOD ]; then
touch -t $(date -u --date="@$LASTMOD" +%Y%m%d%H%M.%S) "$SCRIPT_OUTPUT_FILE"
if [ ! $? -eq 0 ]; then
>&2 echo "touch error"
fi
FILES_UPDATED=1
fi
fi
if [ $HAVE_EXPIRED -eq 0 ]; then
echo "Previous expires headers are in the future."
exit 0
fi
if [ $FILES_UPDATED = 1 ]; then
service opendmarc restart
service postfix reload
else
echo "304 Files Not Modified"
fi
sudo ./update-effective_tlds.sh
Restarting OpenDMARC: opendmarc. * Reloading Postfix configuration... [ OK ]
Now, there is a slight problem: publicsuffix.org sends no Last-Modified header on a 304 response. That won't be an issue here because the script handles it. publicsuffix.org's server doesn't do a date comparison against the If-Modified-Since header; it checks for an exact match.
The problem is the following line (replaced with new code in the above scripts):
LASTMOD=$(date -u --date="$(grep -E "^Last-Modified:" "$WEB_HEADERS" | sed 's/Last-Modified: //')" +%s)
In order to fix this, we need to check if a Last-Modified header exists in the file, and if it doesn't, abort updating the file modification time. I have already updated the above source code with the new code.
LASTMOD_HEADER=$(grep -E "^Last-Modified:" "$WEB_HEADERS" 2>/dev/null || echo 1970-01-01)
LASTMOD=$(date -u --date="$(echo $LASTMOD_HEADER | sed 's/Last-Modified: //')" +%s)
This does the same thing as is done with the Expires header: if the Last-Modified header doesn't exist, it is assumed to be 0 seconds after the Unix epoch.
Since zero will never be greater than the file's existing modification time in seconds since the Unix epoch (excluding a Year 2038 bug), the file modification time is not changed.
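The fallback behaviour is easy to verify in isolation ("empty-headers" here is a stand-in for a saved headers file containing no Last-Modified line):

```shell
#!/bin/sh
# Sketch of the fallback when no Last-Modified header was sent:
# grep finds nothing, so the epoch date stands in and LASTMOD becomes 0.
: > empty-headers
LASTMOD_HEADER=$(grep -E "^Last-Modified:" empty-headers 2>/dev/null || echo 1970-01-01)
LASTMOD=$(date -u --date="$(echo "$LASTMOD_HEADER" | sed 's/^Last-Modified: //')" +%s)
echo "$LASTMOD"
```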
sudo ./update-effective_tlds.sh
Previous expires headers are in the future.
One more thing…
cd ~/Scripts/
sudo chown root:thejc update-*
sudo chmod 750 update-*
Cronjobs
sudo crontab -e
05 2 * * * /home/thejc/Scripts/update-cloudflare-ips.sh
07 2 * * * /home/thejc/Scripts/update-effective_tlds.sh
I run cronjobs at minutes that are prime numbers to minimise the number of cronjobs starting at the same time (e.g. a */2 job will only coincide with a job scheduled at minute 2, because no jobs are scheduled for minutes 4, 6, 8, and so on).
My OCSP stapling updater runs at 02:02 every Mon, Wed, Fri, and Sat; my postsuper queue releaser runs at 3 minutes past every even-numbered hour (i.e. 00:03, 02:03, 04:03, etc.); and my CloudFlare IP address updater and TLD updater now run every 24 hours at 02:05 and 02:07 respectively.
A final modification I've made to the scripts, which is something I haven't done before: if I'm echoing something and then exiting with a return code of 1, I redirect those echoes to stderr rather than stdout.
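For example, wrapping the redirection in a small helper keeps stdout clean (err is a name made up for this illustration):

```shell
#!/bin/sh
# Sketch: route error messages to stderr so stdout stays clean for
# output that other scripts might want to consume.
err() { >&2 echo "$@"; }
err "Please run with sudo or as root."
echo "normal output"
```

Running the script and redirecting 2>/dev/null shows only "normal output"; the error line goes to the terminal via stderr.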