New Xen VPS, No Default Gateway

After my VPS provider set up my new VPS container (Ubuntu 10.04 LTS), I tried connecting to it over SSH using the details supplied, but the connection was refused.

I used XenVZ's (my provider's) recovery console to successfully SSH in, and had a poke around looking for the problem.

The first thing I thought was that SSH might be having a problem, so I tried running apt-get update, but that just returned a load of errors, all saying the same thing: Temporary failure resolving 'gb.archive.ubuntu.com'.

Searching Google for that gave me Temporary failure resolving 'archive.ubuntu.com', but that page gave me nothing to go on.

Next up, trying to ping Google. Unknown Host. OK, how about pinging my other VPS in the same data centre? Unknown Host. Problem resolving hosts? How about pinging an IP in /etc/resolv.conf? Network is unreachable.

Googling ubuntu server unknown host brought me to [SOLVED] Ping - Unknown Host, which suggested looking for a default gateway (a route with the UG flags).

```
root@vps2:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
78.129.150.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
```

Well then, that there is the problem. I can't access the outside world because there's no route to the outside world. All I have to do is add a default gateway? OK, where do I find what it is?
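The check I was doing by eye can be scripted. This is a minimal sketch, not something from the thread: a hypothetical `has_default_route` function that reads `route -n` output on standard input and succeeds only if a row has destination 0.0.0.0 and a G (gateway) flag. It assumes the column layout shown above (Destination first, Flags fourth), and it deliberately does not count a device-only default route with just the U flag.

```shell
#!/bin/sh
# Hypothetical helper: succeed (exit 0) if `route -n` output on stdin
# contains a default route through a gateway, i.e. Destination 0.0.0.0
# with a G in the Flags column. The two header lines never match,
# because their first field is not 0.0.0.0.
has_default_route() {
    awk '$1 == "0.0.0.0" && $4 ~ /G/ { found = 1 } END { exit !found }'
}

# Usage on a live system:
#   route -n | has_default_route || echo "no default gateway"
```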

```
root@vps2:~# man route
-bash: man: command not found
```

Googling ubuntu 10.04 network brought me to Network Configuration, which had me look at /etc/network/interfaces (without the help of nano; I hate vi):

```
# This configuration file is auto-generated.
# WARNING: Do not edit this file unless you enable the power user option
# within our control panel, otherwise your changes will be lost.


# Auto generated eth0 interfaces
auto eth0 lo
iface eth0 inet static
        address 78.129.150.96
        netmask 255.255.255.0
        up route add -net 78.129.150.4 netmask 255.255.255.0 dev eth0
        up route add default gw 78.129.150.4
iface lo inet loopback

auto eth0:0
iface eth0:0 inet static
        address 78.129.150.97
        netmask 255.255.255.0
```

route add default gw 78.129.150.4 worked, and I could finally apt-get update and apt-get upgrade.

That wasn't the end of the problem, though. I didn't know why the default gateway wasn't being set. What would happen if I ran the line above it?

```
root@vps2:~# route add -net 78.129.150.4 netmask 255.255.255.0 dev eth0
route: netmask doesn't match route address
```

Googling for xen "netmask doesn't match route address" led me to UBUNTU - OpenVPN Network Unreachable, and following the hard-to-find link to where the question was originally asked brought me to ~~OpenVPN Network Unreachable~~ and the answer (obvious to anyone who understands netmasks): 78.129.150.4 was wrong, and should be 78.129.150.0. A -net route has to name the network address, which is the IP address with the host bits masked off by the netmask; with 255.255.255.0, that means the last octet must be 0.
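The arithmetic behind that answer is a bitwise AND of the address and the netmask, octet by octet. A minimal POSIX shell sketch (the function name `network_address` is hypothetical, purely for illustration):

```shell
#!/bin/sh
# Hypothetical helper: print the network address for an IPv4 address and
# a dotted-quad netmask by ANDing them octet by octet.
# The body is a subshell "( ... )", so changing IFS and the positional
# parameters does not leak into the caller.
network_address() (
    IFS=.
    # Unquoted expansion splits "a.b.c.d" on dots: $1..$4 become the
    # address octets and $5..$8 the netmask octets.
    set -- $1 $2
    printf '%d.%d.%d.%d\n' "$(( $1 & $5 ))" "$(( $2 & $6 ))" \
        "$(( $3 & $7 ))" "$(( $4 & $8 ))"
)

# network_address 78.129.150.4 255.255.255.0   -> 78.129.150.0
```

The same function gives 78.129.150.0 for the VPS's own address, 78.129.150.96, which is the destination the route line needed.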

OK, problem. Because changes to that file aren't remembered unless I "enable the power user option", I'd better find out if that has any drawbacks.

When booting Xen VPS our systems will normally mount your filesystem in order to automatically write various configuration files such as network settings and kernel modules for worry-free maintenance. Advanced users may enable the Power User flag to disable this behaviour. If you choose to enable Power User, please ensure your kernel modules are kept up-to-date as we will no longer automatically do so which may prevent your VPS from operating as expected.

I searched around, and realised I'd rather not have to worry about keeping kernel modules up-to-date, but I wanted to test if what I thought was a fix would work. Therefore, I enabled the Power User option, changed the configuration file line up route add -net 78.129.150.4 netmask 255.255.255.0 dev eth0 to up route add -net 78.129.150.0 netmask 255.255.255.0 dev eth0 (yes, I'd installed nano by then), saved, and then rebooted my VPS.

Not only did it boot fine, but I tested the change by SSH'ing in the regular way, rather than through the recovery console, and I got in. I double-checked the routing table, and saw a minor annoyance.

```
root@vps2:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
78.129.150.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
78.129.150.0    0.0.0.0         255.255.255.0   U     0      0        0 eth0
0.0.0.0         78.129.150.4    0.0.0.0         UG    0      0        0 eth0
```

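A double entry for the same route. The first is most likely the route the kernel adds by itself when eth0 is given its address and netmask (it was the only entry before my fix), and the second the one added by my corrected line. A look at my OpenVZ routing table caused me to question my fix.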

```
root@vps:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
0.0.0.0         0.0.0.0         0.0.0.0         U     0      0        0 venet0
```

Should all traffic be routed through 78.129.150.4, or just the traffic for the network 78.129.150.0? Should everything else just be routed through the device eth0? Should I just raise a support ticket and get them to fix the issue?

I decided to comment out my fix in /etc/network/interfaces, leaving just the default gateway line, and rebooted. Goodbye double entry, hello all traffic being routed through the default gateway.

Next, I wanted to switch off the Power User option. After Googling for various things and coming up empty (everything I found mentioned editing /etc/network/interfaces or editing network manager settings), I searched for a way to run a cron job on boot. Googling crontab reboot brought me to Unix Crontab – mkaz tumbles along and the rather obvious @reboot special entry.

I knew the default gateway command would work, albeit with me having that duplicate line back, and I'd possibly have an issue if the job was run before eth0 was up (still don't know when @reboot jobs are run), but I decided to go ahead and find an alternative solution later (I appended dev eth0 just in case).

```
# m h  dom mon dow   command
@reboot /sbin/route add default gw 78.129.150.4 dev eth0
```

After rebooting, XenVZ's default interfaces configuration file was back, and the cron job successfully added the default gateway route. I decided to just leave it alone as it was, and move on to setting up the DNS servers.

I experienced several issues setting up my DNS servers. CurveDNS has issues with AXFR, so I had to use iptables to redirect packets destined for port 53 from the IP address of my backup DNS server (ns1.twisted4life.com) to axfrdns, and to send packets destined for port 53 from everyone else to CurveDNS.

Not only did I learn a fair bit while I got it to work, but a pesky mistake in a script I wrote to get iptables-save and iptables-restore to work on rebooting caused me to learn about upstart. To quote a tweet of mine:

— John Cook (@WatfordJC) June 4, 2011

Having grasped a very basic understanding of upstart, I decided to come back to the default gateway issue. What would be the best way of ditching an OK cron job and replacing it with something that meets the following goals?

I do not have to enable Power User.
I do not have to use cron.
If my VPS provider fixes the configuration file or makes changes to it, I stop using a workaround.
If my VPS provider doesn't want all my traffic routed through the default gateway, there is an easy way for them to change it without changing/disabling my VPS.
Bonus: I garner enough knowledge from my solution to override any of the configuration files that my VPS provider overwrites on boot.

Rather than showing all my working out, I will just provide my final solution, which covers everything and meets all my goals.

Create the directory that will hold the files the upstart script will need:
```
root@vps2:~# mkdir -p /root/upstart/network-fix-gateway
```
Create an md5sum file containing the md5sum of the VPS provider's /etc/network/interfaces configuration file:
```
root@vps2:~# md5sum /etc/network/interfaces > /root/upstart/network-fix-gateway/interfaces.md5sum
```
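For reference, the checksum file holds a single line of the form `<hash>  /etc/network/interfaces`, and `md5sum -c` re-hashes the named file and compares. Because the path was given as an absolute path, the check works from any working directory. The sketch below demonstrates the exit codes on a throwaway copy rather than the real file; the file contents are made up for the demonstration:

```shell
#!/bin/sh
# Demonstrates md5sum -c exit codes on a scratch file (not the real
# /etc/network/interfaces). The file contents here are made up.
tmpdir=$(mktemp -d)
printf 'iface eth0 inet static\n' > "$tmpdir/interfaces"

# Writes one line: "<hash>  <absolute path>".
md5sum "$tmpdir/interfaces" > "$tmpdir/interfaces.md5sum"

# --status prints nothing; only the exit code reports the result.
md5sum -c --status "$tmpdir/interfaces.md5sum" && unchanged_status=0 || unchanged_status=$?

# Simulate the provider editing the file, then check again.
printf 'up route add default gw 78.129.150.4\n' >> "$tmpdir/interfaces"
md5sum -c --status "$tmpdir/interfaces.md5sum" && changed_status=0 || changed_status=$?

echo "unchanged: $unchanged_status, changed: $changed_status"
rm -r "$tmpdir"
```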

Create a new file that will replace /etc/network/interfaces:

```
root@vps2:~# nano /root/upstart/network-fix-gateway/interfaces
```

Edit the contents of the file to my liking and save:

```
# This configuration file is auto-generated.
# WARNING: Do not edit this file unless you enable the power user option
# within our control panel, otherwise your changes will be lost.


# Auto generated eth0 interfaces
auto eth0 lo
iface eth0 inet static
        address 78.129.150.96
        netmask 255.255.255.0
#        up route add -net 78.129.150.0 netmask 255.255.255.0 dev eth0
        up route add default gw 78.129.150.4
iface lo inet loopback

auto eth0:0
iface eth0:0 inet static
        address 78.129.150.97
        netmask 255.255.255.0
```

Create a new file for my upstart script.

```
root@vps:~# nano /root/upstart/network-fix-gateway/network-fix-gateway.conf
```

Edit the contents of the file to my liking and save (this script does not function properly! See further down for my final script):

```
# networking-add-gateway - configure eth0 default gateway
#
# Default Gateway does not get set on boot using xenvz.co.uk
# default /etc/network/interfaces script.

description     "adds default gateway"

start on (filesystem and net-device-up IFACE=eth0)

script
if [ -f /root/upstart/network-fix-gateway/interfaces.md5sum ]; then
        /usr/bin/md5sum -c --quiet /root/upstart/network-fix-gateway/interfaces.md5sum
        if [ $? -eq 0 ]; then
                /bin/cp /root/upstart/network-fix-gateway/interfaces /etc/network/interfaces
                /sbin/ifdown --force eth0
                /sbin/ip route flush dev eth0
                /sbin/ifup eth0
        fi
fi
end script
```

Copy my new upstart configuration file to the directory it monitors:

```
root@vps2:~# cp /root/upstart/network-fix-gateway/network-fix-gateway.conf /etc/init
```

My upstart script pretty much does the following:

Start when filesystems have been mounted, and eth0 has been brought up.
Check if the md5sum file exists.
Read and check the md5sums in the file. The --quiet option is used because I don't care about the output, just the exit code.
If the exit code is 0, then reconfigure eth0. If the md5sum is different (e.g. after this script has run or my VPS provider has changed their file), then do nothing. I'm not sure whether the script ends after /sbin/ifup eth0 has returned, nor whether bringing eth0 back up will cause the script to run again. Could the modification cause an infinite loop?
Copy my network interfaces file to /etc/network/interfaces
Take down eth0 forcefully, because it's misconfigured and won't go down otherwise.
Flush any routes associated with eth0. Possibly redundant now that ifdown works.
Bring eth0 up using my configuration file.

One obvious problem with this is that eth0 is brought up on boot, taken down again, and brought up a final time. All other scripts that fire when eth0 is brought up are started twice on each boot, a waste of resources and time.

I've just found Upstart Intro, Cookbook and Best Practices, so I'll have a flip through the recipes and see if I can find a better way of doing things - overwriting /etc/network/interfaces before eth0 and lo are brought up (and before the configuration file is read) would be ideal.

Having done some testing, I found the script didn't do exactly what I thought it did. Upstart runs script sections with /bin/sh -e, so any command that returns a non-zero exit code aborts the script, and nothing after it runs. The obvious culprit? The md5sum check returns 1 when it fails, so the script stopped right there, rendering the subsequent exit code check pointless.
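A minimal demonstration of that -e behaviour, with `false` standing in for a failing `md5sum -c` (a simplification for illustration). Commands used as the condition of an if, or on the left of `||` and `&&`, are exempt from -e, which is why moving the check into a condition avoids the problem:

```shell
#!/bin/sh
# Upstart runs script stanzas with "sh -e"; these two runs mimic that.
# "false" stands in for a failing md5sum -c check.

# First-script pattern: the failing command aborts the inner shell before
# any "if [ $? -eq 0 ]" style check is reached, so nothing is printed.
# "|| true" keeps this outer script alive if it is itself run with -e.
first=$(sh -e -c 'false; echo "reached status check"' || true)

# Same check used as an if condition: -e does not apply to it, so the
# else branch runs and prints a message.
second=$(sh -e -c 'if false; then echo "unchanged"; else echo "changed, skipping"; fi')

printf 'first:  [%s]\nsecond: [%s]\n' "$first" "$second"
```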

I now came up with a new list of what I wanted to achieve with the script:

It starts before /etc/network/interfaces has been read, but after the filesystems have been mounted.
It starts before any attempt to add a network interface.
It simply overwrites the /etc/network/interfaces file, and doesn't do anything with eth0 itself.
It is capable of saying "I'm stopping because of X", rather than just stopping.
It works.

So, what did I come up with?

```
# networking-add-gateway - configure eth0 default gateway
#
# Default Gateway does not get set on boot using xenvz.co.uk
# default /etc/network/interfaces script.

description     "adds default gateway"

start on (filesystem and starting networking)

pre-start script
        [ -f /root/upstart/network-fix-gateway/interfaces.md5sum ]
end script

script
        ((/usr/bin/md5sum -c --status /root/upstart/network-fix-gateway/interfaces.md5sum) ||
                (logger -t "network-fix-gateway" "md5sum failed on /etc/network/interfaces" &&
                /bin/cp /etc/network/interfaces /root/upstart/network-fix-gateway/interfaces.original &&
                logger -t "network-fix-gateway" "Copied changed /etc/network/interfaces file to /root/upstart/network-fix-gateway/interfaces.original" &&
                stop))
        ((/bin/cp /root/upstart/network-fix-gateway/interfaces /etc/network/interfaces &&
                logger -t "network-fix-gateway" "Custom /etc/network/interfaces file installed.") ||
                (logger -t "network-fix-gateway" "Unable to install custom /etc/network/interfaces file. Please check write permissions." &&
                stop))
end script
```

After making many changes to the above code, I finally got it right - stupid brackets. After the next reboot, I yet again saw in my syslog:

```
Jun  6 13:45:08 vps2 network-fix-gateway: md5sum failed on /etc/network/interfaces
Jun  6 13:45:08 vps2 network-fix-gateway: Copied changed /etc/network/interfaces file to /root/upstart/network-fix-gateway/interfaces.original
```

Yet again I looked in /etc/network/interfaces to see if the file had been changed. For some reason I had previously been getting the message about interfaces.original, with that file containing my custom configuration, but I thought I'd fixed that. This time the script must have worked, because interfaces.original contained my VPS provider's copy of /etc/network/interfaces.

Yes, the script worked, and I still got the message that the file had changed: XenVZ had fixed their version. A quick check showed:

```
root@vps2:~# route -n
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
78.129.150.4    0.0.0.0         255.255.255.255 UH    0      0        0 eth0
0.0.0.0         78.129.150.4    0.0.0.0         UG    0      0        0 eth0
```

A quick md5sum /etc/network/interfaces > interfaces.md5sum, and an initctl start network-fix-gateway, and voila:

```
Jun  6 13:52:15 vps2 network-fix-gateway: Custom /etc/network/interfaces file installed.
```

So, I got my final solution working when it was too late. Not that their fixing it is a major disaster; I still achieved the bonus goal.
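For comparison, here is a minimal sketch of the same logic written with if/else instead of nested brackets. It is an illustration, not the script running on my VPS: the FIX_DIR and TARGET variables and the log and fix_interfaces functions are hypothetical names added so the logic can be tried on scratch files, and it returns from the function where the upstart script calls stop. Because the md5sum check is an if condition, sh -e no longer aborts on it.

```shell
#!/bin/sh
# Hypothetical if/else version of the final script body. FIX_DIR and TARGET
# default to the real paths but can be pointed at scratch files for testing.
FIX_DIR=${FIX_DIR:-/root/upstart/network-fix-gateway}
TARGET=${TARGET:-/etc/network/interfaces}

log() {
    # Prefer syslog via logger; fall back to stderr if logger is missing
    # or fails, so logging itself never makes the function fail.
    logger -t network-fix-gateway "$1" 2>/dev/null ||
        echo "network-fix-gateway: $1" >&2
}

fix_interfaces() {
    # An if condition is exempt from sh -e, so a failed check no longer
    # aborts the script before it can log anything.
    if ! md5sum -c --status "$FIX_DIR/interfaces.md5sum"; then
        log "md5sum failed on $TARGET"
        cp "$TARGET" "$FIX_DIR/interfaces.original" &&
            log "Copied changed $TARGET file to $FIX_DIR/interfaces.original"
        return 0    # leave the provider's file in place
    fi
    if cp "$FIX_DIR/interfaces" "$TARGET"; then
        log "Custom $TARGET file installed."
    else
        log "Unable to install custom $TARGET file. Please check write permissions."
    fi
}
```

In an upstart job, the script stanza would presumably just contain these definitions followed by a call to fix_interfaces.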