Rethinking velocity

I’ve been thinking about the concept of “velocity” in software development the last few days, in response to a couple of cases recently where I’ve seen people express dislike for it. I can understand why, and the more I think about it the more I think that the concept does more harm than good.

When I was first introduced to the concept it was with the analogy of running a race. To win, a runner wants to increase their velocity, running the same distance in a shorter amount of time. Even though the distance is the same each time they run the race, with practice the runner can complete it faster and faster.

The distance run, in the software development analogy, is the feature completed. In Scrum, velocity is the measure of how many story points a team completes in a sprint. Individual pieces of work are sized by their “complexity”, so with practice, a team should be able to increase their velocity by finishing work of a given complexity in less time. I have trouble with this first because story points are problematic at best, so any velocity you try to calculate will be easily manipulated. Since I’ve gotten into trouble with the Scrum word police before, I’m going to put that aside for a moment and say that the units you use don’t matter for what I’m talking about.

It should be fair to say that increasing velocity as Scrum defines it is about being able to do more complex work within a sprint without actually doing more work (more time, more effort), because the team gets better at doing that work. (This works either for a larger amount of work of a fixed complexity, or for a sprint’s worth of work that is more complex than could have been done in previous sprints.) Without worrying about some nebulous “points”, the concept is still about being able to do more than you could before in a fixed amount of time.
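As a rough illustration of that definition, here is a minimal sketch of velocity as average points completed per sprint. The sprint numbers and the `velocity` helper are made up for the example; they aren’t taken from any real team or tool.

```python
from statistics import mean

# Hypothetical data: story points completed in each of four sprints.
completed_points = [21, 23, 26, 30]

def velocity(points_per_sprint):
    """Average story points completed per sprint (illustrative helper)."""
    return mean(points_per_sprint)

early = velocity(completed_points[:2])  # first two sprints
late = velocity(completed_points[2:])   # last two sprints
print(f"early velocity: {early}, late velocity: {late}")
```

The sprint length is the same in both halves, so the only thing this number can show is more points fitting into the same amount of time.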

But that’s not what people actually hear when you say we need to “increase velocity”.

Rather, it feels like being asked to do the work faster and faster. Put the feature factory at full steam! You need to work faster, you need to get more done, you need to be able to finish any feature in less than two weeks. Asking how you can increase velocity doesn’t ask “how can we make this work easier?” It asks, “why is this taking so long?” It feels like a judgement, and so we react negatively to it.

While it certainly does make sense to try to make repeated work easier with each iteration, I don’t think that should be the goal of a team. The point of being agile as I’ve come to understand it (and I’ll go with a small “a” here again to avoid the language police) is to be flexible to change by encouraging shorter feedback cycles, which itself is only possible by delivering incrementally to customers earlier than if we worked in large process-heavy single-delivery projects.

Building working-ish versions of a widget and delivering incremental improvements more often might take longer to get to the finished widget, but with the constant corrections along the way the end result should actually be better than it otherwise would have been. And, of course, if the earlier iterations can meet some level of the customer’s need, then they start getting value far sooner in the process as well. The complexity of the widget doesn’t change, but I’d be happy to take the lower velocity for a higher quality product.

I’m bringing it back to smaller increments and getting feedback because one of the conversations that led to this thinking was about whether putting our products in front of users sooner was the same as asking for increased velocity. Specifically, I said “If you aren’t putting your designs in front of users, you’re wasting your time.” In a sense, I am asking for something to be faster, and going faster means velocity, so these concepts get conflated.

The “velocity” I care about isn’t the number of points done in a given time frame (or number of stories, or number of any kind of deliverable). What I care about is, how many feedback points did I get in that time? How many times did we check our assumptions? How sure are we that the work we’ve been doing is actually going to provide the value needed? Maybe “feedback frequency” is what we should be talking about.
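If you wanted to put a number on “feedback frequency”, one possible sketch is to count feedback points per week rather than story points per sprint. The dates and the `feedback_per_week` helper below are hypothetical, and what counts as a feedback point is left entirely to the team.

```python
from datetime import date

# Hypothetical dates on which the team checked its work against real users.
feedback_dates = [
    date(2024, 1, 8),
    date(2024, 1, 17),
    date(2024, 1, 29),
    date(2024, 2, 9),
]

def feedback_per_week(dates, start, end):
    """Feedback points per week between start and end, inclusive."""
    weeks = ((end - start).days + 1) / 7
    in_range = [d for d in dates if start <= d <= end]
    return len(in_range) / weeks

# Six weeks of work with four feedback points along the way.
rate = feedback_per_week(feedback_dates, date(2024, 1, 1), date(2024, 2, 11))
print(f"{rate:.2f} feedback points per week")
```

Nothing in this number depends on how much was delivered, only on how often assumptions were checked.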

[Figure: a straight line from start to finish for a completed widget, with feedback at the start and end, versus a looping line with seven points of feedback that takes longer to reach the end. Caption: And this is generously assuming you have a good idea of what needs to be built in the first place.]

Importantly, I’m not necessarily talking about getting feedback on a completed (or prototype) feature delivered to production. Much like I argued that you can demo things that aren’t done, there is information to be gained at every stage of development, from initial idea through design to prototypes and final polish. I’ve always been an information junkie, so I value any information about the outside world, from anecdotal oddities to huge statistical models tracking behaviours in your app. Even just making observations about the world, learning about your intended users’ needs before you know what to offer them, feeds into this category. Too often this happens only once at the outset. A second time when all else is said and done if you’re lucky. I’m not well versed in the design and user experience side of things yet, but I wager even the big picture, blue-sky, big steps and exploration we might want to do can still be checked against the outside world more often than most people think.

Much like “agile” and “automation”, the word “velocity” itself has become a distraction. People associate it with the sense of wanting to do the same thing faster and faster. What I actually want is to do things in smaller chunks, more often. Higher frequency adjustments to remain agile and build better products, not just rushing for the finish line.

In defence of time over story points

I have to admit, there was a time when I was totally on board with estimating work in “story points”. Briefly I was the resident point-apologist around town, explaining metaphors about how points are like the distance of a race that people complete in different times. These days, while estimating complexity has its uses, I’m coming to appreciate those old-fashioned time estimates.

Story points are overrated. Here are a few of the reasons why I think so. Strap yourselves in, this is a bit of a rant. But don’t worry, I’ll hedge at the end.

The scale is arbitrary and unintuitive

How do you measure complexity? What units do you use? Can you count the number of requirements, the acceptance criteria, the number of changes, the smelliness of the code to be changed, the number of test cases required, or the temperature of the room after the developers have debated the best implementation?

To avoid that question, story points use an arbitrary scale with arbitrary increments. It could be the Fibonacci sequence, powers of two, or just numbers 1 through 5. That itself is not necessarily a problem — Fahrenheit and Celsius are both arbitrary scales that measure something objective — but if you ask 10 developers what a “1” means you’ll get zero answers if they haven’t used points yet and 20 answers 6 months later.

I don’t know anybody who has an intuition for estimating “complexity” because there’s no scale for it. There’s nothing to check it against. Meanwhile we’ve all been developing an intuition for time ever since we started asking “are we there yet?” from the back of the car or complaining that it wasn’t late enough for bedtime.

People claim that you can build your scale by taking the simplest task as a “1” and going from there. But complexity doesn’t scale like that. What’s twice as complicated as, say, changing a configuration value? Even if you compare tickets being estimated with previous ones, you’re never going to place it in an ordered list (even if binned) of all previous tickets. You’re guaranteed to have some that are more “complex” than others rated at lower points because you were feeling confident that day or didn’t have a full picture of the work. (Though if you do try this, it can give you the side benefit of questioning whether those old tickets really deserve the points they got.)

It may not be impossible to get a group of people to come to a common intuition around estimating complexity, but it sure takes a lot longer than agreeing on how long a day or a week is. Even if you did reach that common understanding, nobody outside the team will understand it.

Points aren’t what people actually care about

People, be it the business or dependent teams, need to schedule things. If we want to have goals and try to reach them, we have to have some idea of how much we have to do to get there and how much time it will take to do that work. If someone asks “when can we start work on feature B” and you say “well feature A has 16 points”, their next question is “OK, and how long will that take?” or “and when will it be done?” Points don’t answer either question, and nobody is going to be happy if you tell them the question can’t be answered.

In practice (at least in my experience) people use time anyway. “It’ll only take an hour so I’m giving it one point”. “I’d want to spend a week on this so let’s give it 8 points.” When someone says “This is more complicated so we better give it more points” it’s because they’ll need more time to do it!

Maybe I care about complexity because complexity breeds risk and I’ll need to be more careful testing it. That’s fair, and a decent reason for asking the question, but it also just means you need more time to test it. Complexity is certainly one dimension of that but it isn’t the whole story (impact and probability of risks manifested are others).

Even the whole premise of points, to be able to measure the velocity of a team, admits that time is the important factor. Velocity matters because it tells you how much work you can reasonably put into your sprint. But given a sprint length you already know how many hours you can fit into a sprint. What’s the point of beating around the bush about it?

Points don’t provide feedback

Time has built-in feedback that points can’t provide. That’ll take me less than a day, I say. Two days later we have a problem.

Meanwhile I say something is 16 story points. Two days later it isn’t done… do I care? Am I running behind? What about 4 weeks later? Was this really a 16 point story or not? Oh, actually, someone was expecting it last Thursday? That pesky fourth dimension again!

Points don’t avoid uncertainty

I once heard someone claim that one benefit of story points is that they don’t change when something unexpected comes up. In one sense that’s true, but only if there’s no feedback on the actual value of points. Counterexamples are about as easy to find as stories themselves.

Two systems interact with each other in a way the team didn’t realize. Someone depends on the legacy behaviour so you need to add a migration plan. The library that was going to make the implementation a single line has a bug in it. Someone forgot to mention a crucial requirement. There are new stakeholders that need to be looped in. Internet Explorer is a special snowflake. The list goes on, and each new thing can make something more complex. If they don’t add complexity after you’ve assigned a number, what creates the complexity in the first place?

Sure you try to figure out all aspects of the work as early as possible, maybe even before it gets to the point of estimating for a sprint. Bring in the three amigos! But all the work you do to nail down the “complexity” of a ticket isn’t anything special about “complexity” as a concept, it’s exactly the same kind of work you’d do to refine a time estimate. Neither one has a monopoly on certainty.

Points don’t represent work

One work ticket might require entering configurations for 100 clients for a feature we developed last sprint. It’s dead simple brainless work and there’s minimal risk beyond copy-paste errors that there are protections for anyway. Complexity? Nah, it’s one point, but I’ll spend the whole sprint doing it.

Another work ticket is replacing a legacy piece of code to support an upcoming batch of features. We know the old code tends to be buggy and we’ve been scared to touch it for years because of that. The new version is already designed but it’ll be tricky to plug in and test thoroughly to make sure we don’t break anything in the process. Not a big job—it can still be done in one sprint—but relatively high risk and complex. 20 points.

So wait, if both of those fit in one sprint, why do I care what the complexity is? There are real answers to that, but answering the question of how much work it is isn’t one of them. If you argue that those two examples should have similar complexity since they both take an entire sprint, then you’re already using time as the real estimator and I don’t need to convince you.

Points are easily manipulated

Like any metric, we must question how points can be manipulated and whether there’s incentive to do so.

In order to see an increase in velocity, you have to have a really well understood scale. The only way to calibrate that scale without using a measurable unit is to spend months “getting a feel for it”.

Now if you’re looking for ways to increase your velocity, guaranteed the cheapest way to do that (deliberately or not) is to just start assigning more points to things. Now that the team has been at this for a while, one might say, they can better estimate complexity. Fewer unknowns mean more knowns, which are more things to muddy the discussion and push up those complexity estimates. (Maybe you are estimating more accurately, but how can you actually know that?) Voila. Faster velocity brought to you in whole by the arbitrary, immeasurable, and subjective nature of points.
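To see how cheap that trick is, here is a small sketch with invented numbers: the same three tickets, taking the same hours, estimated once before and once after the team’s scale has drifted upward. The ticket hours, the point values, and the `sprint_velocity` helper are all assumptions for illustration.

```python
# Hypothetical: three tickets that take the same hours in both sprints.
hours = [6, 16, 30]

points_before = [1, 3, 5]  # estimates from six months ago
points_after = [2, 5, 8]   # same work, estimated more generously today

def sprint_velocity(points):
    """Total story points completed in one sprint (illustrative helper)."""
    return sum(points)

before = sprint_velocity(points_before)
after = sprint_velocity(points_after)
increase = (after - before) / before

print(f"hours worked: {sum(hours)} in both sprints")
print(f"velocity: {before} -> {after} ({increase:.0%} increase)")
```

The hours never change, so the jump in velocity comes entirely from the estimates.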

Let’s say we avoid that trap, and we actually are getting better at the work we’re doing. Something that was really difficult six months ago can be handled pretty quickly now without really thinking about it. Is that ticket still as complex as it was six months ago? If the work hasn’t changed it should be, but it sure won’t feel as complex. So is your instinct going to be to put the same points on it? Velocity stagnates even though you’re getting more done. Not only can velocity be manipulated through malice, it doesn’t even correlate with the thing you want to measure!

It’s a feature, not a bug

One argument I still anticipate in favour of points is that the incomprehensibility of them is actually a feature, not a bug. It’s arbitrary on purpose so that it’s harder for people outside the team to translate them into deadlines to be imposed onto that team. It’s a protection mechanism. A secret code among developers to protect their own sanity.

If that’s the excuse, then you’ve got a product management problem, not an estimation problem.

In fact it’s a difficulty with metrics, communication, and overzealous people generally, not something special about time. The further metrics get from the thing they measure, the more likely they are to be misused. Points, if anybody understood them, would be just as susceptible to that.

A final defence of complexity

As far as a replacement for estimating work in time, story points are an almost entirely useless concept that introduces more complexity than it estimates. There’s a lot of jumping through hoops and hand waving to make it look like you’re not estimating things in time anymore. I’d much rather deal in a quantity we actually have units for. I’m tempted to say save yourself the effort, except for one thing: trying to describe the complexity of proposed work is a useful tool for fleshing out what the work actually requires and to get everybody on an equal footing understanding that work. That part doesn’t go away, though the number you assign to it might as well. Just don’t pretend it’s more meaningful than hours on a clock.