cloud – Page 2 – Made by Mikal

Setting up VXLAN on Google Compute Engine

Post author:mikal
Post published:January 5, 2020
Post category:Google Google Compute Engine

So my ultimate goal here is to try out VXLAN between some VMs on instances in Google Compute Engine, but today I'm just going to get VXLAN working because that took a fair bit longer than I expected.

First off, boot your instances. Because I will need nested virt later, I chose two instances on Google Cloud. Please note that you need to do a bit of a dance to turn on nested virt there. I also chose to use Debian for this experiment:

    gcloud compute instances create vx-1 --zone us-central1-b --min-cpu-platform "Intel Haswell" --image nested-vm-image

Now do those standard things you do to all new instances:

    sudo apt-get update
    sudo apt-get dist-upgrade -y

Now let's set up VXLAN between the two nodes, with a big nod to this web page. First create a VXLAN interface on each machine (if you care about the port your VXLAN traffic is on being to IANA standards, see the postscript at the end of this):

    sudo ip link add vxlan0 type vxlan id 42 dev eth0 dstport 0

Now we need to put the two nodes into a mesh, where 34.70.161.180 is the IP of the node we are not running this command on…
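On the port question mentioned in the excerpt: as far as I know, dstport 0 leaves the choice to the kernel, which on Linux defaults to UDP 8472 and not the IANA-assigned 4789. Here is a minimal sketch of the same interface creation with the IANA port made explicit. The helper name vxlan_create_cmd is hypothetical and only prints the command, so it can be reviewed before being run with sudo on both nodes (both ends need to agree on the port).

```shell
#!/bin/sh
# Dry-run helper: prints the ip(8) command for a VXLAN interface instead of
# running it. vxlan_create_cmd is a hypothetical name, not from the post.
# Arguments: interface name, VXLAN id, underlying device, optional UDP port.
vxlan_create_cmd() {
    name="$1"
    vni="$2"
    dev="$3"
    port="${4:-4789}"   # default to the IANA-assigned VXLAN port
    echo "ip link add $name type vxlan id $vni dev $dev dstport $port"
}

# Same interface as in the post, but on the IANA port instead of dstport 0.
vxlan_create_cmd vxlan0 42 eth0
```

Once the interface exists, ip -d link show vxlan0 should list the destination port actually in use.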

Further thoughts on Azure instance start times

Post author:mikal
Post published:December 23, 2019
Post category:Azure

My post from the other day about slow instance starts on Azure caused some commentary (mainly on reddit) that prompted me to think more about all this. In the end, there were a few more experiments I wanted to run to see if I could squeeze more performance out of Azure. First off, looking at the logs from my initial testing it looks like resource groups are slow. The original terraform creates a resource group as part of the test and then cleans it up at the end. What if instead we had a single permanent resource group and created instances within that? Here is a series of instance starts and deletes using the terraform from the last post: You'll notice that there's no delete value for the last instance. That's because terraform crashed and never deleted the instance. You can also see that instance starts are somewhat consistent, except for being slower in the second half of the test than the first, and occasionally spiking out to very very slow. Oh, and deletes are almost always really slow. What happens if we use a permanent resource group and network? This means that all the "instance start terraform" is doing…
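For anyone wanting to repeat this sort of measurement, here is a rough sketch of timing a start and a delete from the shell. This is not the harness used for the numbers in the post; time_cmd is a hypothetical helper, and the terraform invocations in the comments assume you are sitting in a directory containing the instance terraform.

```shell
#!/bin/sh
# Prints how many whole seconds a command took to run. The command's own
# output is discarded, and its exit status is passed back so that a crashed
# run can be told apart from a slow one.
time_cmd() {
    start=$(date +%s)
    "$@" >/dev/null 2>&1
    status=$?
    end=$(date +%s)
    echo $((end - start))
    return $status
}

# Assumed usage, one start and one delete per iteration:
#   secs=$(time_cmd terraform apply -auto-approve)   || echo "apply failed"
#   echo "start took ${secs}s"
#   secs=$(time_cmd terraform destroy -auto-approve) || echo "destroy failed"
#   echo "delete took ${secs}s"
```

Checking the exit status matters here because a failed destroy would otherwise look like a missing delete value with no explanation.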

Why is Azure so slow to start instances?

Post author:mikal
Post published:December 21, 2019
Post category:Azure

I've been playing with terraform recently, and decided to see how different the terraform for launching a simple Ubuntu instance in various clouds is. There are two big questions there for me -- how big is the variation between OpenStack derived clouds; and how painful is it to move between the proprietary clouds? Part of this is because terraform doesn't present a standardised layer of cloud functionality, it has a provider per cloud. (Although, I suspect there's nothing stopping someone from writing a libcloud provider or something like that. It is an interesting idea which requires some additional thought.) My terraform implementations for each cloud are on github if you're interested. I don't want to spend a lot of analysis on the actual terraform, because I think the really interesting thing I found isn't where I expected it to be (there's a hint in the title for this post). That said, the OpenStack clouds vary mostly by capabilities. vexxhost for example seems to only offer flavors that require boot-from-volume. The proprietary clouds are complete re-writes, but are generally relatively simple and well documented. However, that interesting accidental thing -- as best as I can tell, Microsoft Azure is really really slow to…

Further adventures with base images in OpenStack

Post author:mikal
Post published:January 4, 2012
Post category:OpenStack

I was bored over the New Year's weekend, so I figured I'd have a go at implementing image cache management as discussed previously. I actually have an implementation of about 75% of that blueprint now, but it's not ready for prime time yet. The point of this post is more to document some stuff I learnt about VM startup along the way so I don't forget it later. So, you want to start a VM on a compute node. Once the scheduler has selected a node to run the VM on, the next step is the compute instance on that machine starting the VM up. First the specified disk image is fetched from your image service (in my case glance), and placed in a temporary location on disk. If the image is already a raw image, it is then renamed to the correct name in the instances/_base directory. If it isn't a raw image then it is converted to raw format, and that converted file is put in the right place. Optionally, the image can be extended to a specified size as part of this process. Then, depending on whether you have copy on write (COW) images turned on or…
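The fetch, rename-or-convert, and optional extend steps described above can be sketched roughly as follows. This is an illustration of the flow using qemu-img, not the actual compute node code; the function names and the example paths are hypothetical.

```shell
#!/bin/sh
# Decide what to do with a fetched image based on its format: raw images are
# simply renamed into place, anything else is converted to raw.
base_image_action() {
    if [ "$1" = "raw" ]; then echo rename; else echo convert; fi
}

# prepare_base_image FETCHED_FILE BASE_DIR NAME [SIZE]
# Puts a raw copy of FETCHED_FILE at BASE_DIR/NAME, optionally resized to SIZE.
prepare_base_image() {
    src="$1"
    dest="$2/$3"
    size="$4"

    # qemu-img info prints a line such as "file format: qcow2".
    fmt=$(qemu-img info "$src" | awk '/^file format:/ {print $3}')
    [ -n "$fmt" ] || return 1

    if [ "$(base_image_action "$fmt")" = "rename" ]; then
        mv "$src" "$dest" || return 1
    else
        qemu-img convert -O raw "$src" "$dest" || return 1
        rm -f "$src"   # dropping the temporary copy is my assumption
    fi

    # Optional extend step.
    if [ -n "$size" ]; then
        qemu-img resize -f raw "$dest" "$size"
    fi
}

# Example with made-up paths:
#   prepare_base_image /tmp/fetched.part /tmp/instances/_base example-image 10G
```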

Openstack compute node cleanup

Post author:mikal
Post published:December 20, 2011
Post category:OpenStack

I've never used OpenStack before, which I imagine is true of many other people out there. It's actually pretty cool, although I encountered a problem the other day that I think is worthy of some more documentation. OpenStack runs virtual machines for users, in much the same manner as Amazon's EC2 system. These instances are started with a base image, and then copy on write is used to write differences for the instance as it changes stuff. This makes sense in a world where a given machine might be running more than one copy of the instance. However, I encountered a compute node which was running low on disk. This is because there is currently nothing which cleans up these base images, so even if none of the instances on a machine require that image, and even if the machine is experiencing disk stress, the images still hang around. There are a few blog posts out there about this, but nothing really definitive that I could find. I've filed a bug asking for the Ubuntu package to include some sort of cleanup script, and interestingly that led me to learn that there are plans for a pretty comprehensive image management…