What would a Nova developer tell a deployer to think about before their first OpenStack install? This was the question I wanted to answer for my linux.conf.au OpenStack miniconf talk, and writing this essay seemed like a reasonable way to take the bullet point list of ideas we generated and turn it into a cohesive story. Hopefully this essay is also useful to people who couldn't make it to the conference talk.
Please understand that none of these are hard rules — what I want is for you to consider your options and make informed decisions. It's really up to you how you deploy Nova.
Operating environment
- Consider what base OS you use for your hypervisor nodes if you're using Linux. I know that many environments have standardized on a given distribution, and that many have a preference for a long term supported release. However, Nova is at its most basic level a way of orchestrating tools packaged by your distribution via APIs. If those underlying tools are buggy, then your Nova experience will suffer as well. Sometimes we can work around known issues in older versions of our dependencies, but often those work-arounds are hard to implement (and therefore likely to be less than perfect) or have performance impacts. There are many examples of the problems you can encounter; hypervisor kernel panics and disk image corruption are just two. We are trying to work with distributions on ensuring they back port fixes, but the distributions might not always be willing to do that. Sometimes upgrading the base OS on your hypervisor nodes might be a better call.
- The version of Python you use matters. The OpenStack project only tests with specific versions of Python, and versions we don't test can have bugs we haven't seen. This is especially true for very old versions of Python (anything older than 2.7) and for new versions (Python 3 is not supported, for example). Your choice of base OS will affect the versions of Python available, so this is related to the previous point.
- There are existing configuration management recipes for most configuration management systems. I'd avoid reinventing the wheel here and use the community supported recipes. There are definitely resources available for chef, puppet, juju, ansible and salt. If you're building a very large deployment from scratch, consider triple-o as well. Please please please don't fork the community recipes. I know it's tempting, but contribute to upstream instead. Invariably upstream will continue developing their recipes, and if you fork you'll spend a lot of effort just keeping in sync.
- Have a good plan for log collection and retention at your intended scale. The hard reality at the moment is that diagnosing Nova often requires that you turn on debug logging, which is very chatty. Whilst we're happy to take bug reports where we've gotten the log level wrong, we haven't had a lot of success at systematically fixing this issue. Your log infrastructure therefore needs to be able to handle the demands of debug logging when it's turned on. If you're using central log servers, think seriously about how much disk they will require. If you're not doing centralized syslog logging, perhaps consider something like logstash. (There's a small sketch of what I mean just after this list.)
- Pay attention to memory usage on your controller nodes. OpenStack python processes can often consume hundreds of megabytes of virtual memory space. If you run many controller services on the same node, make sure you have enough RAM to deal with the number of processes that will, by default, be spawned for the many service endpoints. After a day or so of running a controller node, check in on the virtual memory used by the python processes and make any adjustments needed to your "workers" configuration settings (again, see the sketch after this list).
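On the logging point, here's a minimal sketch of what I mean. The debug and verbose options are the nova.conf settings as they stood around the Juno and Kilo releases, so check them against the configuration reference for your release:

# In /etc/nova/nova.conf, debug logging is controlled by:
[DEFAULT]
debug = True
verbose = True
# Remember to turn debug off again when you are done; it is very chatty.

# Keep an eye on how fast the logs grow while debug logging is enabled:
$ du -sh /var/log/nova/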
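And on the memory point, here's a rough way to see how much resident memory the nova-api workers are consuming, along with the sort of settings that control worker counts. The option names (osapi_compute_workers, metadata_workers and the conductor workers setting) are my recollection of the Juno/Kilo options, so treat them as an assumption and verify them for your release:

# Total resident memory used by the nova-api worker processes on a controller node:
$ ps -C nova-api -o rss= | awk '{sum += $1} END {printf "%d workers, %.0f MiB RSS\n", NR, sum/1024}'

# Worker counts are then tuned in /etc/nova/nova.conf, for example:
[DEFAULT]
osapi_compute_workers = 4
metadata_workers = 4
[conductor]
workers = 4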
Scale
- Estimate your final scale now. Sure, you're building a proof of concept, but these things have a habit of becoming entrenched. If you are planning a deployment that is likely to end up being thousands of nodes, then you are going to need to deploy with cells. This is also possibly true if you're going to have more than one hypervisor or hardware platform in your deployment — it's very common to have a cell per hypervisor type or per hardware platform. Cells is relatively cheap to deploy for your proof of concept, and it helps when that initial deploy grows into a bigger thing, so consider deploying cells from the beginning. It should be noted however that not all features are currently implemented in cells; we are working on that at the moment.
- Consider carefully what SQL database to use. Nova supports many SQL databases via sqlalchemy, but some are better tested and more widely deployed than others. For example, the Postgres back end is rarely deployed and is less tested. I'd recommend a variant of MySQL for your deployment. Personally I've seen good performance on Percona, but I know that many use the stock MySQL as well. There are known issues at the moment with Galera as well, so show caution there. There is active development happening on the select-for-update problems with Galera at the moment, so that might change by the time you get around to deploying in production. You can read more about our current Galera problems on Jay Pipes' blog.
- We support read only replicas of the SQL database. Nova supports offloading read only SQL traffic to read only replicas of the main SQL database, but I do not believe this is widely deployed. It might be of interest to you though (there's a sketch of the relevant configuration after this list).
- Expect a lot of SQL database connections. While Nova has the nova-conductor service to control the number of connections to the database server, other OpenStack services do not, and you will quickly outpace the default number of connections allowed, at least for a MySQL deployment. Actively monitor your SQL database connection counts so you know before you run out (see the example after this list). Additionally, there are many places in Nova where a user request will block on a database query, so if your SQL back end isn't keeping up this will affect performance of your entire Nova deployment.
- There are options with message queues as well. We currently support rabbitmq, zeromq and qpid, but rabbitmq is the original and by far the most widely deployed, which makes it a reasonable default choice.
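To make the database points above concrete, this is roughly what the relevant nova.conf section looked like in the Juno/Kilo era; slave_connection is the read-only replica offloading I mentioned. The option names and connection strings are shown as an illustration, so verify them against the configuration reference for your release:

# Excerpt from /etc/nova/nova.conf on a controller node:
[database]
connection = mysql://nova:NOVA_DB_PASSWORD@db.example.com/nova
# Optional: send read-only queries to a replica
slave_connection = mysql://nova:NOVA_DB_PASSWORD@db-replica.example.com/nova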
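For monitoring connection counts on a MySQL-family database, something as simple as this (run with a suitably privileged database user) will tell you how close you are to the limit:

$ mysql -u root -p -e "SHOW STATUS LIKE 'Threads_connected'; SHOW VARIABLES LIKE 'max_connections';"

Feed those two numbers into whatever monitoring system you already use, and alert well before Threads_connected approaches max_connections.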
Hypervisors
- Not all hypervisor drivers are created equal. Let’s be frank here — some hypervisor drivers just aren’t as actively developed as others. This is especially true for drivers which aren’t in the Nova code base — at least the ones the Nova team manage are updated when we change the internals of Nova. I’m not a hypervisor bigot — there is a place in the world for many different hypervisor options. However, the start of a Nova deploy might be the right time to consider what hypervisor you want to use. I’d personally recommend drivers in the Nova code base with active development teams and good continuous integration, but ultimately you have to select a driver based on its merits in your situation. I’ve included some more detailed thoughts on how to evaluate hypervisor drivers later in this post, as I don’t want to go off on a big tangent during my nicely formatted bullet list.
- Remember that the hypervisor state is interesting debugging information. For example with the libvirt hypervisor, the contents of the instances directory (/var/lib/nova/instances by default) is super useful for debugging misbehaving instances. Additionally, all of the existing libvirt tools work, so you can use those to investigate as well (there's a quick example after this list). However, I strongly recommend you only change instance state via Nova, and not go directly to the hypervisor.
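As a quick example of poking at hypervisor state read-only, assuming the libvirt driver and the default instances path (the instance name and uuid below are made up):

# List the domains libvirt knows about on this hypervisor node:
$ virsh list --all

# Inspect a particular instance, using the name shown by virsh list:
$ virsh dominfo instance-0000002a

# Look at the on-disk state for an instance, including its console log and disk images:
$ ls -l /var/lib/nova/instances/<instance uuid>/

Look all you like, but as I said above, make any changes through Nova.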
Networking
- Avoid new deployments of nova-network. nova-network has been on the deprecation path for a very long time now, and we're currently working on the final steps of a migration plan for nova-network users to neutron. If you're a new deployment of Nova and therefore don't yet depend on any of the features of nova-network, I'd start with Neutron from the beginning. This will save you a possibly troublesome migration to Neutron later.
Testing and upgrades
- You need a test lab. For a non-trivial deployment, you need a realistic test environment. It's expected that you test all upgrades before you do them in production, and rollbacks can sometimes be problematic. For example, some database migrations are very hard to roll back, especially if new instances have been created in the time it took you to decide to roll back. Perhaps consider turning off API access (or putting the API into a read only state) while you are validating a production deploy post upgrade, so that you can restore a database snapshot if you need to undo the upgrade. We know this isn't perfect and are working on a better upgrade strategy for information stored in the database, but we will always expect you to test upgrades before deploying them.
- Test database migrations on a copy of your production database before doing them for real. Another reason to test upgrades before doing them in production is that some database migrations can be very slow. It's hard for the Nova developers to predict which migrations will be slow, but we do try to test for this and minimize the pain. However, aspects of your deployment can affect this in ways we don't expect — for example, if you have large numbers of volumes per instance, then that could result in database tables being larger than we expect. You should always test database migrations in a lab and report any problems you see (there's a sketch of one way to do this after this list).
- Think about your upgrade strategy in general. While we now support having the control infrastructure running a newer release than the services on hypervisor nodes, we only support that for one release (so you could have your control plane running Kilo, for example, while you are still running Juno on your hypervisors, but you couldn't run Icehouse on the hypervisors). Are you going to upgrade every six months? Or are you going to do it less frequently but step through a series of upgrades in one session? I suspect the latter option is more risky — if you encounter a bug in a previous release we would need to back port a fix, which is a much slower process than fixing the most recent release. There are also deployments which choose to "continuously deploy" from trunk. This gets them access to features as they're added, but means that the deployment needs more operational skill and a closer association with the upstream developers. In general, continuous deployers are larger public clouds as best as I can tell.
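Here's a sketch of one way to test a migration on a copy, assuming a MySQL deployment with the usual database name; the exact nova-manage invocation and dump options are worth double checking for your environment:

# Take a dump of the production nova database:
$ mysqldump --single-transaction nova > nova-prod.sql

# Restore it into a scratch database in the lab:
$ mysql -e "CREATE DATABASE nova"
$ mysql nova < nova-prod.sql

# On a lab controller running the *new* release, run the migrations and time them:
$ time nova-manage db sync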
libvirt specific considerations
- For those intending to run the libvirt hypervisor driver, not all libvirt hypervisors are created equal. libvirt implements pluggable hypervisors, so if you select the Nova libvirt hypervisor driver, you then need to select which hypervisor to use with libvirt as well (there's a configuration sketch at the end of this list). It should be noted however that some hypervisors work better than others, with kvm being the most widely deployed.
- There are two types of storage for instances. There is "instance storage", which is block devices that exist for the life of the instance and are then cleaned up when the instance is destroyed. There is also block storage provided by Cinder, which is persistent and arguably easier to manage than instance storage. I won't discuss storage provided by Cinder any further however, because it is outside the scope of this post. Instance storage is provided by a plug-in layer in the libvirt hypervisor driver, which presents you with another set of deployment decisions.
- Shared instance storage is attractive, but it comes at a cost. Shared instance storage is an attractive option, but isn't required for live migration of instances using the libvirt hypervisor. Think about the costs of shared storage though — for example putting everything on network attached storage is likely to be expensive, especially if most of your instances don't need the facility. There are other options such as Ceph, but the storage interface layer in libvirt is one of the areas of code where we need to improve testing, so be wary of bugs before relying on those storage back ends.
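To show where the choices in this section land in configuration, here's a rough nova.conf excerpt for a hypervisor node using kvm with local instance storage. The virt_type and images_type option names are my recollection of the [libvirt] section in recent releases, so treat them as an assumption and check the configuration reference for yours:

# Excerpt from /etc/nova/nova.conf on a hypervisor node:
[libvirt]
# Which hypervisor libvirt should drive: kvm, qemu, lxc, xen, ...
virt_type = kvm
# How instance storage is provided: default (file based), lvm, or rbd (Ceph)
images_type = default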
Thoughts on how to evaluate hypervisor drivers
As promised, I also have some thoughts on how to evaluate which hypervisor driver is the right choice for you. First off, if your organization has a lot of experience with a particular hypervisor, then there is always value in that. If that is the case, then you should seriously consider running the hypervisor you already have experience with, as long as that hypervisor has a driver for Nova which meets the criteria below.
What’s important is to be looking for a driver which works well with Nova, and a good measure of that is how well the driver development team works with the Nova development team. The obvious best case here is where both teams are the same people — which is true for drivers that are in the Nova code base. I am aware there are drivers that live outside of Nova’s code repository, but you need to remember that the interface these drivers plug into isn’t a stable or versioned interface. The risk of those drivers being broken by the ongoing development of Nova is very high. Additionally, only a very small number of those “out of tree” drivers contribute to our continuous integration testing. That means that the Nova team also doesn’t know when those drivers are broken. The breakages can also be subtle, so if your vendor isn’t at the very least doing tempest runs against their out of tree driver before shipping it to you then I’d be very worried.
You should also check out how many bugs are open in Launchpad for your chosen driver (this assumes the Nova team is aware of the existence of the driver, I suppose). Here's an example link to the libvirt driver bugs currently open. As well as total bug count, I'd be looking for bug close activity — it's nice if there is a very small number of bugs filed, but perhaps that's because there aren't many users, and it doesn't necessarily mean the team for that driver is super awesome at closing bugs. The easiest way to look into bug close rates (and general code activity) is to check out the code for Nova and then look at the log for your chosen driver. For example, for the libvirt driver again:
$ git clone http://git.openstack.org/openstack/nova
$ cd nova/nova/virt/libvirt
$ git log .
That will give you a report on all the commits ever for that driver. You don’t need to read the entire report, but it will give you an idea of what the driver authors have recently been thinking about.
Another good metric is the specification activity for your driver. Specifications are the formal design documents that Nova adopted for the Juno release, and they document all the features that we’re currently working on. I write summaries of the current state of Nova specs regularly, which you can see posted at stillhq.com with this being the most recent summary at the time of writing this post. You should also check how much your driver authors interact with the core Nova team. The easiest way to do that is probably to keep an eye on the Nova team meeting minutes, which are posted online.
Finally, the OpenStack project believes strongly in continuous integration testing. Testing has clear value in the number of bugs it finds in code before our users experience them, and I would be very wary of driver code which isn't continuously integrated with Nova. Thus, you need to ensure that your driver has well maintained continuous integration testing. This is easy for "in tree" drivers, as we do that for all of them. For out of tree drivers, continuous integration testing is done with a thing called "third party CI".
How do you determine if a third party CI system is well maintained? First off, I’d start by determining if a third party CI system actually exists by looking at OpenStack’s list of known third party CI systems. If the third party isn’t listed on that page, then that’s a very big warning sign. Next you can use Joe Gordon’s lastcomment tool to see when a given CI system last reported a result:
$ git clone https://github.com/jogo/lastcomment
$ cd lastcomment
$ ./lastcomment.py --name "DB Datasets CI"
last 5 comments from 'DB Datasets CI'
[0] 2015-01-07 00:46:33 (1:35:13 old) https://review.openstack.org/145378 'Ignore 'dynamic' addr flag on gateway initialization'
[1] 2015-01-07 00:37:24 (1:44:22 old) https://review.openstack.org/136931 'Use session with neutronclient'
[2] 2015-01-07 00:35:33 (1:46:13 old) https://review.openstack.org/145377 'libvirt: Expanded test libvirt driver'
[3] 2015-01-07 00:29:50 (1:51:56 old) https://review.openstack.org/142450 'ephemeral file names should reflect fs type and mkfs command'
[4] 2015-01-07 00:15:59 (2:05:47 old) https://review.openstack.org/142534 'Support for ext4 as default filesystem for ephemeral disks'
You can see here that the most recent run is 1 hour 35 minutes old when I ran this command. That’s actually pretty good given that I wrote this while most of America was asleep. If the most recent run is days old, that’s another warning sign. If you’re left in doubt, then I’d recommend appearing in the OpenStack IRC channels on freenode and asking for advice. OpenStack has a number of requirements for third party CI systems, and I haven’t discussed many of them here. There is more detail on what OpenStack considers a “well run CI system” on the OpenStack Infrastructure documentation page.
General operational advice
Finally, I have some general advice for operators of OpenStack. There is an active community of operators who discuss their use of the various OpenStack components on the openstack-operators mailing list. If you're deploying Nova, you should consider joining that mailing list. While you're welcome to ask questions about deploying OpenStack on that list, you can also ask questions on the more general OpenStack mailing list if you want to.
There are also many companies now which will offer to operate an OpenStack cloud for you. For some organizations engaging a subject matter expert will be the right decision. Probably the most obvious way to evaluate which of those companies to use is to look at their track record of successful deployments, as well as their overall involvement in the OpenStack community. You need a partner who can advocate for you with the OpenStack developers, as well as keeping an eye on what’s happening upstream to ensure it meets your needs.
Conclusion
Thanks for reading this far! I hope this document is useful to someone out there. I'd love to hear your feedback — are there other things we wish deployers would consider before committing to a plan? Am I simply wrong somewhere? Finally, this is the first time I've posted an essay form of a conference talk instead of just the slide deck, and I'd be interested to hear whether people find this format more useful than a YouTube video after the conference. Please drop me a line and let me know if you find this useful!