Welcome to the next exciting installment of the Nova Juno mid-cycle meetup summary. In the previous chapter, our hero battled a partially complete cells implementation using his +2 smile of good intentions. In this next exciting chapter, watch him battle our seemingly never-ending pile of bugs! Sorry, now that I’m on to my sixth post in this series I feel like it’s time to get more adventurous with the introductions.
For at least the last cycle, and probably longer, Nova has been struggling with the number of bugs filed in Launchpad. I don’t think the problem is that Nova has terrible code; rather, we have a lot of users filing bugs, and the team working on triaging and closing them is small. The complexity of Nova’s deployment options makes this problem worse, and that complexity increases as we allow new drivers for things like different storage engines to land in the code base.
The growing number of possible Nova configurations is a problem for our CI systems as well. We don’t cover all of these options, which sometimes means we only discover in the field that they don’t work as expected. CI is a tangent from the main intent of this post, though, so I will save further discussion of our CI system for a later post.
Tracy Jones and Joe Gordon have been doing good work this cycle trying to get a grip on the state of the bugs filed against Nova. For example, a very large number of bugs (hundreds) were for problems we’d already fixed, but where the bug bot had failed to close the bug when the fix merged. Many other bugs were waiting for feedback from users, and had been waiting for longer than six months. In both cases the response was to close the bug, with the understanding that the user can always reopen it if they come back to talk to us again. “Quick hit” work like this has reduced our open bug count to about one thousand. Tracy has produced a dashboard showing the state of our bugs at http://54.201.139.117/nova-bugs.html. I believe Joe has been working on moving this onto OpenStack-hosted infrastructure, but that hasn’t happened yet.
At the mid-cycle meetup, the goal of the conversation was to find other ways to get our bug queue further under control. Some of the suggestions were largely mechanical, like tightening up our definitions of the confirmed (we agree this is a bug) and triaged (and we know how to fix it) bug states. Others were things like auto-abandoning bugs that have been marked incomplete for more than 60 days without a reply from the person who filed them, or unassigning bugs when the review that proposed a fix is abandoned in Gerrit.
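To make those two automation ideas a little more concrete, here is a minimal Python sketch of how a triage bot might apply them to a single bug. Everything in it is hypothetical: the `Bug` record, its field names, the status strings and `apply_triage_rules` are illustrative only, and a real bot would need to read and update bug state in Launchpad and check review state in Gerrit, which this sketch doesn’t attempt.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# The 60-day threshold discussed at the meetup.
INCOMPLETE_EXPIRY = timedelta(days=60)


@dataclass
class Bug:
    # Hypothetical, simplified bug record; not the Launchpad data model.
    id: int
    status: str                     # e.g. "Incomplete", "Confirmed", "Triaged"
    last_reporter_reply: datetime   # when the reporter last responded
    assignee: Optional[str] = None
    review_abandoned: bool = False  # True if the linked Gerrit review was abandoned


def apply_triage_rules(bug: Bug, now: datetime) -> Bug:
    """Apply both proposed rules to one bug and return it (mutated in place)."""
    # Rule 1: abandon bugs left Incomplete with no reporter reply for
    # more than 60 days. "Expired" is an assumed status name.
    if bug.status == "Incomplete" and now - bug.last_reporter_reply > INCOMPLETE_EXPIRY:
        bug.status = "Expired"
    # Rule 2: if the proposed fix was abandoned in review, unassign the bug
    # so someone else can pick it up.
    if bug.review_abandoned and bug.assignee is not None:
        bug.assignee = None
    return bug
```

Keeping the rules as a plain function over a simple record makes them easy to test before pointing anything at real bugs.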
Unfortunately, we have more ideas for automating how we deal with bugs than we have people writing automation. If there’s someone out there who wants to have a big impact on Nova but isn’t sure where to begin, helping us with this automation would be a super helpful way to get started. Let Tracy or me know if you’re interested.
We also talked about having more targeted bug days. This was prompted by our last bug day, which was largely unsuccessful. Instead, we’re proposing that the next bug day have a really well-defined theme, such as moving bugs from the “undecided” to the “confirmed” state, or similar. I believe the current plan is to run a bug day like this after J-3, when we’re winding down from feature development and starting to focus on stabilization.
Finally, I would encourage people fixing bugs in Nova to do a quick search for duplicates when they close a bug. I wouldn’t be at all surprised to discover that many bugs have duplicates that can be closed at the same time with minimal effort.
In the next post I’ll cover our discussions of the state of the current scheduler work in Nova.