I thought I should write up the dev summit sessions I am hosting now that the program is starting to look solid. This is mostly for my own benefit, so I have a solid understanding of where to start these sessions off. Both are short brainstorm sessions, so I am not intending to produce slide decks or anything like that. I just want to make sure there is something to kick discussion off.
Image caching, where to from here (nova hypervisors)
As of essex libvirt has an image cache to speed startup of new instances. This cache stores images direct from glance, as well as resized images. There is a periodic task which cleans up images in the cache which are no longer needed. The periodic task can also optionally detect images which have become corrupted on disk.
So first off, do we want to implement this for other hypervisors as well? As mentioned in a recent blog post I’d like to see the image cache manager become common code and have all the hypervisors deal with this in exactly the same manner — that makes it easier to document, and means that on-call operations people don’t need to determine what hypervisor a compute node is running before starting to debug. However, that requires the other hypervisor implementations to change how they stage images for instance startup, and I think it bears further discussion.
Additionally, the blueprint (https://blueprints.launchpad.net/nova/+spec/nova-image-cache-management) proposed that popular / strategic images could be pre-cached on compute nodes. Is this something we still want to do? What factors do we want to use for the reference implementation? I have a few ideas here that are listed in the blueprint, but most of them require talking to glance to implement. There is some hesitance in adding glance calls to a periodic task, because in a keystone’d implementation that would require an admin token in the nova configuration file. Is there a better way to do this, or is it ok to rely on glance in a periodic task?
Ops pain points (nova other)
Apart from my own ideas (better instance logging for example), I’m very interested in hearing from other people about what we can do to make nova easier for ops people to run. This is especially true for relatively easy to implement things we can get done in Folsom. This blueprint for deployer friendly configuration files is a good example of changes which don’t look too hard to implement, but that would make the world a better place for opsen. There are many other examples of blueprints in this space, including:
- An ability to disable an entire user account
- Reduced downtime upgrades
- Reaping old data from the database
- More friendly error messages for common configuration errors
What else can we be doing to make life better for opsen? I’m especially interested in getting people who actually run openstack in the wild into the room to tell us what is painful for them at the moment.