This is the next post summarizing the Juno Nova mid-cycle meetup. This post covers the cells functionality used by some deployments to scale Nova.
For those unfamiliar with cells, it’s a way of combining smaller Nova installations into a thing which feels like a single large Nova install. So for example, Rackspace deploys Nova in cells of hundreds of machines, and these cells form a Nova availability zone which might contain thousands of machines. The cells in one of these deployments form a tree: users talk to the top level of the tree, which might only contain API services. That cell then routes requests to child cells which can actually perform the operation requested.
There are a few reasons why Rackspace does this. Firstly, it keeps the MySQL databases smaller, which can improve the performance of database operations and backups. Additionally, cells can contain different types of hardware, which are then partitioned logically. For example, OnMetal (Rackspace’s Ironic-based baremetal product) instances come from a cell which contains OnMetal machines and only publishes OnMetal flavors to the parent cell.
Cells was originally written by Rackspace to meet its deployment needs, but is now used by other sites as well. However, I think it would be a stretch to say that cells is commonly used, and it is certainly not the deployment default. In fact, most deployments don’t run any of the cells code, so you can’t really call them even a “single cell install”. One of the reasons cells isn’t more widely deployed is that it doesn’t implement the entire Nova API, which means some features are missing. As a simple example, you can’t live-migrate an instance between two child cells.
At the meetup, the first thing we discussed regarding cells was a general desire to see cells finished and become the default deployment method for Nova. Perhaps most people end up running a single cell, but in that case at least the cells code paths are well used. The first step to get there is improving the Tempest coverage for cells. There was a recent openstack-dev mailing list thread on this topic, which was discussed at the meetup. There was commitment from several Nova developers to work on this, and notably not all of them are from Rackspace.
It’s important that we improve the Tempest coverage for cells, because it positions us for the next step in the process, which is bringing feature parity to cells compared with a non-cells deployment. There is some level of frustration that the work on cells hasn’t really progressed in Juno, and that it is currently incomplete. At the meetup, we made a commitment to bringing a well-researched plan to the Kilo summit for implementing feature parity for a single cell deployment compared with a current default deployment. We also made a commitment to make cells the default deployment model when this work is complete. If this doesn’t happen in time for Kilo, then we will be forced to seriously consider removing cells from Nova. A half-done cells deployment has so far stopped other development teams from trying to solve the problems that cells addresses, so we either need to finish cells, or get out of the way so that someone else can have a go. I am confident that the cells team will take this feedback on board and come to the summit with a good plan. Once we have a plan we can ask the whole community to rally around and help finish this effort, which I think will benefit all of us.
In the next blog post I will cover something we’ve been struggling with for the last few releases: how we get our bug count down to a reasonable level.