Agile Testing book club: Let them feel pain

This is the second part in a series of exercises where I highlight one detail from a chapter or two of Agile Testing by Janet Gregory and Lisa Crispin. Part one of the series can be found here. This installment comes from Chapter 3.

Let them feel pain

This chapter is largely about making the transition into agile workflows, and the growing pains that can come from that. I’ve mentioned before on this blog that when I went through that transition, I worried about maintaining the high standard of testing that we had in place. The book is coming from a slightly different angle, of trying to overcome reluctance to introduce good quality practices, but the idea is the same. This is the sentence that stuck out most to me in the whole chapter:

Let them feel pain: Sometimes you just have to watch the train wreck.

I did eventually learn this lesson, though it took probably 6 months of struggling against the tide and a tutorial session on metrics by Mike Sowers at STAR Canada before it really sank in. Metrics are a bit of a bugaboo in testing these days, but just hold your breath and power through with me for a second. Mike was going over the idea of “Defect Detection Percentage”, which basically just asks what percentage of bugs you caught before releasing. The useful insight was that you can probably push it arbitrarily high, so that you catch 99% of bugs before release, but you have to be willing to spend the time to do it. On the other end, maybe your customers are happy with a few bugs if it means they get access to the new features sooner, in which case you can afford to limit the kinds of testing you do. If you maintain an 80% defect detection percentage and still keep your customers happy, it’s not worth the extra testing time it would take to push that number higher. Yes, this all depends on how you count bugs, and happiness, and which bugs you’re finding, and maybe you can test better instead of faster, but none of that is the point. This is:

If you drop some aspect of testing and the end result is the same, then it’s probably not worth the effort to do it in the first place.

There are dangers here, of course. You don’t want to drop one kind of testing just because it takes a lot of time if it’s covering a real risk. People will be happy only until that risk manifests itself as a nasty failure in the product. As ever, this is an exercise of balancing concerns.
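For what it’s worth, the arithmetic behind a defect detection percentage is trivial; the contentious part is deciding what counts as a bug and over what window you count it. A minimal sketch of the calculation, with made-up numbers, might look like this:

def defect_detection_percentage(found_before_release, found_after_release):
    # Share of all known defects that were caught before release.
    total = found_before_release + found_after_release
    if total == 0:
        return None  # no defects recorded at all, so the metric is undefined
    return 100.0 * found_before_release / total

# e.g. 80 bugs caught in testing and 20 reported from production gives 80.0
print(defect_detection_percentage(80, 20))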

Being in a bad spot

Part of why this idea stuck with me at the time was that the rocky transition I was going through left me in a pretty bad mental space. I eventually found myself thinking, “Fine, if nobody else cares about following these established test processes like I do, then let everybody else stop doing them and we’ll see how you like it when nothing works anymore.”

This is the cynical way of reading the advice from Janet and Lisa to “let them feel pain” and sit back to “watch the train wreck”. In the wrong work environment you can end up reaching the same conclusion from a place of spite, much like I did. But it doesn’t have to come from a negative place. Mike framed it in terms of metrics and balancing cost and benefit in a way that provided some clarity for an analytical mind like mine, and I think Lisa and Janet are being deliberately a bit facetious here. Now that I’m working in a much more positive space (mentally and corporately) I have a much better way of interpreting this idea: the best motivation for people to change their practices is for them to have a reason to change them.

What actually happened for us when we started to drop our old test processes was that everything was more or less fine. The official number of bugs recorded went down overall, but I suspect that was as much a consequence of the change in our reporting habits in small agile teams as anything else. We definitely still pushed bugs into production, but they weren’t dramatically worse than before. What I do know for sure is that nobody came running to me saying “Greg you were right all along, we’re putting too many bugs into production, please save us!”

If that had happened, then great, there would be motivation to change something. But it didn’t, so we turned our attention to other things entirely.

Introducing change (and when not to)

When thinking about this, I kept coming back to two other passages that I had highlighted earlier in the same chapter:

If you are a tester in an organization that has no effective quality philosophy, you probably struggle to get good quality practices accepted. The agile approach will provide you with a mechanism for introducing good quality-oriented practices.

and also

If you’re a tester who is pushing for the team to implement continuous integration, but the programmers simply refuse to try, you’re in a bad spot.

Agile might provide a way of introducing new processes, but that doesn’t mean anybody is going to want to embrace or even try them. If you have to twist some arms to get a commitment to try something new for even one sprint, and it doesn’t have a positive (or at least neutral) impact, you had better be prepared to let the team drop it (or to convince them that the effects need more time to be seen). If everybody already feels that the deployment process is going swimmingly, why do you need to introduce continuous integration at all?

It might be easy when you’re deciding not to keep something new, but when already established test processes were on the line, this was a very hard thing for me to do. In a lot of ways it was like being forced to admit that we had been wrong about what testing we needed to be doing, even though all of it had been justified at one time or another. We had to accept that certain tests were generating pain for the team, and the only way we could tell whether that pain was really worth it was to drop them and see what happened.

The takeaway

Today I’m in a much different place. I’m no longer coping with the loss or fragmentation of huge and well established test processes, but rather looking at establishing new processes on my team and those we work with. As tempting as it is to latch onto various testing ideas and “best” practices I hear about, it’s likely wasted effort if I don’t first ask “where are we feeling the most pain?”

How do you know you actually did something?

When I was defending my thesis, one of the most memorable questions I was asked was:

“How do you know you actually did something?”

It was not only an important question about my work at the time, but has also been a very useful question for me in at least two ways. In a practical sense, it comes up as something like “did we actually fix that bug?” More foundationally, it can be as simple as “what is this test for?”

Did it actually work?

The original context was a discussion about the efforts I had gone through to remove sources of noise in my data. As I talked about in my previous post, I was using radio telescopes to measure the hydrogen distribution in the early universe. It was a very difficult measurement because even in the most optimistic models the signal was expected to be several orders of magnitude dimmer than everything else the telescopes could see. Not only did we have to filter out radio emission from things in the sky we weren’t interested in, but, unsurprisingly, there’s also a whole lot of radio coming from Earth. Although we weren’t actually looking in the FM band, it would be a lot like trying to pick out some faint static in the background of the local classic rock radio station.

One of the reasons these telescopes were built in rural India was because there was relatively little radio in use in the area. Some of it we couldn’t do anything about, but it turned out that a fair amount of radio noise in the area was accidental. The most memorable example was a stray piece of wire that had somehow been caught on some high voltage power lines and was essentially acting like an antenna leaking power from the lines and broadcasting it across the country.

We developed a way of using the telescopes to scan the horizon for bright sources of radio signals on the ground and triangulate their locations. I spent a good deal of time wandering through the countryside with a GPS device in one hand and a radio antenna in the other trying to find these sources. This is what led to what has since become the defining photo of my graduate school experience:

[Photo: standing in a field with a radio antenna, next to a cow]
Cows don’t care, they have work to do.

Getting back to the question at hand, after spending weeks wandering through fields tightening loose connections, wrapping things in radio shielding, and getting the local power company to clear stray wires from their transmission lines… did we actually fix anything? Did we reduce the noise in our data? Did we make it easier to see the hydrogen signal we were after?

Did we actually fix the bug?

In many ways, getting rid of those errant radio emitters was like removing bugs in data. Noisy little things that were there only accidentally and that we could blame for at least some of the noise in our data.

But these weren’t the bugs that you find before you release. Those are the bugs you find because you anticipated that they would come up and planned your testing around them. These things, in contrast, were already out in the field. They were the kinds of bugs that come from a user report or something weird someone notices in the logs. You don’t know what caused them, you didn’t uncover them in testing, and you’re not sure at first what is triggering them. These are the ones that are met only with a hunch, an idea that “this might have to do with X” and “we should try doing Y and that might fix it.”

But how do you actually know that you’ve fixed it? Ideally you should have a test for it, but coming up with a new test that will catch a regression and seeing it pass isn’t enough. The test needs to fail first, against the unfixed code. If it doesn’t, you can’t show that fixing the bug is what actually causes it to pass. If you aren’t able to see the issue in the first place, it doesn’t tell you anything when you make a fix and then still don’t see the issue.
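As a toy sketch of what I mean (the function and the bug here are entirely hypothetical, not from any real project), write the regression test, point it at the unfixed code, and make sure it fails there before you trust a green run after the fix:

# Hypothetical bug: prices with a thousands separator are parsed wrong.
def parse_price_buggy(text):
    return int(text.split(",")[0])     # "1,200" -> 1

def parse_price_fixed(text):
    return int(text.replace(",", ""))  # "1,200" -> 1200

def test_parse_price_handles_thousands_separator():
    # This assert has to fail when run against the buggy version;
    # only then does a pass against the fix actually tell you something.
    assert parse_price_fixed("1,200") == 1200

if __name__ == "__main__":
    assert parse_price_buggy("1,200") != 1200  # reproduces the bug first
    test_parse_price_handles_thousands_separator()
    print("regression test passes against the fixed code")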

For us in the field, the equivalent of reproducing the bug was going out with an antenna, pointing it at what we thought was a source, and hearing static on a handheld radio. The first step after fixing it (or after the power company told us they fixed it) was to go out with the same antenna and see if the noise had gone away or not. The next step was turning on the antennas and measuring the noise again; push the fix to production and see what happens.

What is this test for?

Where this can go wrong — whether you know there’s a bug there or not — is when you have a test that doesn’t actually test anything useful. The classic example is an automated test that doesn’t actually check anything, but it can just as easily be the test that checks the wrong thing, the test that doesn’t check what it claims, or even the test that doesn’t check anything different from another one.

To me this is just like asking “did you actually do something?”, because running tests that don’t check anything useful doesn’t do anything useful. If your tests don’t fail when there are bugs, then your tests don’t work.
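A hypothetical example of the “doesn’t actually check anything” variety, with a discount function made up just for illustration:

def apply_discount(price, percent):
    # Stand-in implementation, invented for this example.
    return price * (1 - percent / 100)

def test_discount_runs():
    # Looks like a test and always passes: it exercises the code
    # but asserts nothing, so it can never fail when the logic breaks.
    apply_discount(100, percent=10)

def test_discount_is_applied():
    # Actually checks the behaviour we care about, and fails if it changes.
    assert apply_discount(100, percent=10) == 90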

In a world where time is limited and context is king, whether you can articulate what a test is for can be a useful heuristic for deciding whether or not something is worth doing. It’s much trickier than knowing whether you fixed a specific bug, though. We could go out into the field and hear clearly on the radio that a noise source had been fixed, but it was much harder to answer whether it paid off for our research project overall. Was it worth investing more time into it or not?

How do you know whether a test is worth doing?