So my ultimate goal here is to try out VXLAN between some VMs on instances in Google Compute Engine, but today I’m just going to get VXLAN working, because that took a fair bit longer than I expected. First off, boot your instances. Because I will need nested virtualization later, I chose two instances on Google Cloud. Please note that you need to do a bit of a dance to turn on nested virtualization there. I also chose to use Debian for this experiment:
gcloud compute instances create vx-1 --zone us-central1-b --min-cpu-platform "Intel Haswell" --image nested-vm-image
Now do those standard things you do to all new instances:
sudo apt-get update
sudo apt-get dist-upgrade -y
Now let’s set up VXLAN between the two nodes, with a big nod to this web page. First create a VXLAN interface on each machine (if you care about your VXLAN traffic using the IANA-assigned port, see the postscript at the end of this post):
sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0
Now we need to put the two nodes into a mesh. In the first command, 34.70.161.180 is the IP of the node we are not running the command on. In the second command, the overlay IP address needs to be different on each machine (192.168.200.1 on one, 192.168.200.2 on the other).
sudo bridge fdb append to 00:00:00:00:00:00 dst 34.70.161.180 dev vxlan0
sudo ip addr add 192.168.200.1/24 dev vxlan0
sudo ip link set up dev vxlan0
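With more than two nodes, the first command has to be repeated once per peer. A minimal sketch of that as a dry run, which only prints the commands rather than running them. The function name mesh_commands and the PEERS variable are made up for illustration, and 203.0.113.7 is a placeholder address, not a real node:

```shell
# Dry run: print one FDB command per peer instead of executing it.
# PEERS is a hypothetical list; replace it with the underlay IPs of every
# node except the one this is run on (203.0.113.7 is a placeholder).
PEERS="34.70.161.180 203.0.113.7"

mesh_commands() {
  for peer in "$@"; do
    # The all-zeros MAC entry tells the kernel where to send unknown and
    # broadcast traffic; appending one per peer replicates it to each of them.
    echo "sudo bridge fdb append to 00:00:00:00:00:00 dst $peer dev vxlan0"
  done
}

# PEERS is left unquoted on purpose so it splits into one argument per address.
mesh_commands $PEERS
```

Once the printed commands look right, they can be pasted into a shell or piped into sh.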
I am pretty sure that this style of mesh (all nodes connected to each other) wouldn’t scale to non-trivial sizes, but hey, baby steps, right? Finally, because we’re using Google Cloud we need to add firewall rules to allow our traffic into the instances:
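As a minimal sketch, a rule along these lines can be created with the gcloud CLI. The rule name allow-vxlan and the default network are assumptions, and UDP port 8472 is taken from the packet capture further down:

```shell
# Allow encapsulated traffic on UDP port 8472 into the instances.
# "allow-vxlan" is a hypothetical rule name and "default" an assumed network.
# 34.70.161.180 is the peer address used in this post; list every node's
# address here so that each side accepts traffic from the other.
gcloud compute firewall-rules create allow-vxlan \
    --network default \
    --direction INGRESS \
    --allow udp:8472 \
    --source-ranges 34.70.161.180/32
```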
Note that these rules are a source of confusion for me right now. I wanted (and configured) VXLAN. So why do I need to allow OTV (UDP port 8472) for this to work? I suspect Linux has politely ignored my request and used OTV, not VXLAN, for my traffic (but see the postscript for a more likely explanation).
We should now be able to ping those newly configured IP addresses from each machine:
ping 192.168.200.2 -c 1
PING 192.168.200.2 (192.168.200.2) 56(84) bytes of data.
64 bytes from 192.168.200.2: icmp_seq=1 ttl=64 time=1.76 ms
Which produces traffic like this on the underlay network:
tcpdump -n -i eth0 host 34.70.161.180
tcpdump: verbose output suppressed, use -v or -vv for full protocol decode
listening on eth0, link-type EN10MB (Ethernet), capture size 262144 bytes
09:01:58.159092 IP 10.128.0.9.59341 > 34.70.161.180.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.1 > 192.168.200.2: ICMP echo request, id 20119, seq 1, length 64
09:01:58.160786 IP 34.70.161.180.48471 > 10.128.0.9.8472: OTV, flags [I] (0x08), overlay 0, instance 42
IP 192.168.200.2 > 192.168.200.1: ICMP echo reply, id 20119, seq 1, length 64
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
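If the ping gets no reply, two read-only commands are a reasonable first check. This is a sketch, assuming the interface is still named vxlan0 as above. The first should list the VXLAN id, the underlay device, and the destination port the kernel actually picked; the second should list the all-zeros entry pointing at the other node:

```shell
# Show VXLAN-specific details (id, underlay device, UDP port) for the interface.
ip -d link show vxlan0

# List the forwarding entries added with "bridge fdb append".
bridge fdb show dev vxlan0
```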
Hopefully this is helpful to someone else. Thanks again to Joe Julian for a very helpful post.
Postscript: Dale Shaw pointed out on Twitter that I might still be talking VXLAN, just on a weird port. This is supported by this comment I found on the internets: “when VXLAN was first implemented in linux, UDP ports were not specified. Many vendors use 8472, and Linux uses the same port. Later, IANA allocated 4789 as the port. If you need to use the IANA port, you need to specify it with dstport”. That would fit the capture above, where the traffic is on UDP port 8472 and tcpdump labels it as OTV.
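Going by that comment, switching to the IANA port should just be a matter of recreating the interface with an explicit dstport. A minimal sketch, which I have not tested on these instances. Deleting the interface also drops its FDB entry and address, so those commands need to be run again afterwards, and the firewall rule would have to allow UDP port 4789 instead of 8472:

```shell
# Remove the existing interface (this also removes its FDB entries and address).
sudo ip link del vxlan0

# Recreate it with the IANA-assigned VXLAN port instead of the Linux default.
sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 4789
```

Both nodes need the same dstport, otherwise each side sends to a port the other is not listening on.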