Chapter 5 had a lot of interesting things about metrics and strategy that I’ve taken notes on and have planned to incorporate into my work. I’m working on another post about metrics that will probably draw on some of the things discussed, but before I get into the nitty gritty of metrics I want to stay a bit higher level. The part that I want to focus on now was about having a goal or a purpose.
When you’re trying to figure out what to measure, first understand what problem you’re trying to solve. — Agile Testing by Lisa Crispin and Janet Gregory, Chapter 5
This is very much connected to other ideas I’ve talked about before, like letting the team feel pain; pain provides a tangible goal and gives anything you do to address it a clear purpose.
This also put something from Chapter 4 into another context. Lisa and Janet talked about using retrospectives to figure out if a problem could benefit from hiring more testers. I’ve thought about this a few times in my current role: does the team need another tester?
The answer has to come from considering what led you to ask the question in the first place. If your team is limited just by the number of hours in a day and the number of people you have, then you probably want to ask about what skills would be best to add to the team, be it testing or otherwise. If the team is struggling on testing specifically, then another tester may just be a band-aid; could the team be better served by working on building everybody’s testing skills? What problem are you trying to solve? The answer is never “we have the wrong tester to developer ratio.”
It’s also a great grounding question. Last week I wrote about how important it is to identify specifically what you’re trying to test at any moment. The motivation is the same. If you find yourself struggling with too many variables in the air and a hundred contingencies and tangents about all kinds of other things, stop for a moment and ask: “What problem am I trying to solve?” or “What problem do I need to solve right now?”
Arrakis teaches the attitude of the knife—chopping off what’s incomplete and saying: “Now it’s complete because it’s ended here.”
—from “Collected Sayings of Muad’Dib” by the Princess Irulan, in Frank Herbert’s Dune
Testing isn’t explicitly mentioned this time, but most testers who have been around the block a few times will probably recognize why this caught my eye. Often it feels like there are an infinite number of things to test, and the only way testing can be “done” is by putting down the knife.
Is there any testing that the “attitude of the knife” doesn’t apply to?
Of course, we like to attribute more agency to ourselves than that. Maybe we say our testing is complete when we’ve tested all the aspects described in our well-reasoned test plan. You know, that perfect test plan that takes into consideration all the stakeholders’ needs, the risks involved, the impact on the users, and the timeliness required to arrive at just the right amount of testing for this particular context. At the end you can push back from your terminal and say, “Yes, I tested everything I set out to test. Everything about it has been validated and verified to the extent reasonable, and I’m not just saying that because we’ve run out of time.”
But then again, did you not come up with your test plan or strategy knowing how much time would be reasonable to spend on this? Did you know that the knife was going to come down on a certain day? Did you have a sense of the tolerance of your stakeholders, and balance that against the risks?
We all know that exhaustive testing is impossible. Most of us probably realize that exhaustive testing isn’t the goal anyway, and not just because we don’t have infinite time. The knife has to come down sometime, and our testing will be complete because it ended.*
Though… does anybody ever finish testing anything? Short of moving on to another product or company entirely, I don’t think I do. Maybe we don’t live on Arrakis.
I recently read Frank Herbert’s Dune and was surprised to find a couple passages on testing. I thought it would be fun to take them completely out of context (that’s a good habit for testers, right?) and try to apply them to software instead of spice. Here’s the first:
Any road followed precisely to its end leads precisely nowhere. Climb the mountain just a little bit to test that it’s a mountain. From the top of the mountain, you cannot see the mountain.
—Bene Gesserit proverb, from Frank Herbert’s Dune
This proverb appeared in one of the excerpts from in-universe books that open each chapter, all meant to have been written after the main events of the story and usually highlighting some important aspects of the plot or characters in the chapter to come.
To be honest, I’m not really sure what the relevance of this passage to the story was. It comes immediately before the chapter where Lady Jessica finds the greenhouse in their new home on Arrakis. In it, she finds a warning that there are double agents among them advancing a plot to overthrow the Duke. Maybe these initial clues at treachery are like the first steps on the road, or the first bit of the mountain. I don’t know. It doesn’t matter.
Is there any way we can say this proverb applies to software testing?
One interesting bit for us is the advice to “climb the mountain just a little bit to test that it’s a mountain.” My first thought was that it could be about how you can’t practically test everything completely, but it equally implies that you could climb the mountain or follow the road to its end (whereas you can’t test everything). This isn’t saying anything about what’s possible or not. I’d be inclined to say that it means we don’t need to climb the mountain—that we can tell it’s a mountain without going to the summit—except for the other half of the proverb:
From the top of the mountain, you cannot see the mountain.
This isn’t as easy to recklessly apply to software. With the road, once you’ve gone to the end of it, it goes nowhere else. For the mountain, it sounds like the essence of the mountain is that it is this large thing towering above the landscape. It’s something that goes up. So you can’t appreciate its mountain-ness from the top. Does this apply to software? Do we lose something about the software by testing it to its end? It might work on something like scientific study—once you understand everything there’s no more science to do—but a piece of software doesn’t stop being useful just because we’ve already exercised everything it can do.
Alternatively, perhaps it means that you lose sight of what the mountain or the road are once you’re that far along them. When you’re fully immersed in something, when you’ve studied that software every way you can think of and think you understand it inside and out, you lose the ability to see it as a beginner or an outsider. As a tester, when you become an expert on a particular system, do you risk losing the ability to see the software for what it is? I don’t feel like that has ever happened to me, but it’s also the sort of thing that you wouldn’t notice happening. It is certainly true that the things I think to test on a product after two years are different from what I would test if I were seeing it for the first time. Hopefully it’s because I know more and have a better idea of where the risks are, but maybe not.
This would be where it would be helpful to understand what the passage actually meant in the context of the novel. Maybe there’s nothing here about testing at all even though it mentions how to test something. I’d love to hear other interpretations!
In a future post, when I don’t have a real testing topic for the week, I’ll post the second passage from the book that is quite the opposite: there’s no specific mention of testing but it will be quite familiar to how we test. (Update: part two is here.)
When I was defending my thesis, one of the most memorable questions I was asked was:
“How do you know you actually did something?”
It was not only an important question about my work at the time, but has also been a very useful question for me in at least two ways. In a practical sense, it comes up as something like “did we actually fix that bug?” More foundationally, it can be as simple as “what is this test for?”
Did it actually work?
The original context was a discussion about the efforts I had gone through to remove sources of noise in my data. As I talked about in my previous post, I was using radio telescopes to measure the hydrogen distribution in the early universe. It was a very difficult measurement because even in the most optimistic models it was expected to be several orders of magnitude dimmer than everything else the telescopes could see. Not only did we have to filter out radio emission from things in the sky we weren’t interested in, but, unsurprisingly, there’s also a whole lot of radio coming from Earth. Although we weren’t actually looking in the FM band, it would be a lot like trying to pick out some faint static in the background of the local classic rock radio station.
One of the reasons these telescopes were built in rural India was that there was relatively little radio in use in the area. Some of it we couldn’t do anything about, but it turned out that a fair amount of radio noise in the area was accidental. The most memorable example was a stray piece of wire that had somehow been caught on some high voltage power lines and was essentially acting like an antenna, leaking power from the lines and broadcasting it across the country.
We developed a way of using the telescopes to scan the horizon for bright sources of radio signals on the ground and triangulate their locations. I actually spent a lot of time wandering through the countryside with a GPS device in one hand and a radio antenna in the other trying to find these sources. This is what led to what has since become the defining photo of my graduate school experience.
Getting back to the question at hand, after spending weeks wandering through fields tightening loose connections, wrapping things in radio shielding, and getting the local power company to clean wires off their transmission lines… did we actually fix anything? Did we reduce the noise in our data? Did we make it easier to see the hydrogen signal we were after?
Did we actually fix the bug?
In many ways, getting rid of those errant radio emitters was like removing bugs in data. Noisy little things that were there only accidentally and that we could blame for at least some of the noise in our data.
But these weren’t the bugs that you find before you release. Those are the bugs you find because you anticipated that they would come up and planned your testing around them. These things, in contrast, were already out in the field. They were the kinds of bugs that come from the user or something weird someone notices in the logs. You don’t know what caused them, you didn’t uncover them in testing, and you’re not sure at first what is triggering them. These are the ones that are met only with a hunch, an idea that “this might have to do with X” and “we should try doing Y and that might fix it.”
But how do you actually know that you’ve fixed it? Ideally you should have a test for it, but coming up with a new test that will catch a regression and seeing it pass isn’t enough. The test needs to fail first, before the fix is in place. If it doesn’t, you can’t show that fixing the bug actually causes it to pass. If you aren’t able to see the issue in the first place, it doesn’t tell you anything if you make a fix and then still don’t see the issue.
For us in the field, the equivalent of reproducing the bug was going out with an antenna, pointing it at what we thought was a source, and hearing static on a handheld radio. One step after fixing it (or after the power company told us they fixed it) was to go out with the same antenna and see if the noise had gone away or not. The next step was turning on the antennas and measuring the noise again; push the fix to production and see what happens.
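The software version of "hear the static first, then check that it went away" can be sketched in a few lines of Python. Everything here is made up for illustration: the percent parser, its whitespace bug, and the helper names are assumptions, not code from any real project. The point is only that a fix counts as demonstrated when the same test fails on the old code and passes on the new code.

```python
def parse_percent_buggy(text):
    """Hypothetical pre-fix code: breaks when the input has surrounding spaces."""
    return int(text.rstrip("%")) / 100


def parse_percent_fixed(text):
    """Hypothetical post-fix code: strips whitespace before parsing."""
    return int(text.strip().rstrip("%")) / 100


def regression_check(parse_percent):
    """Return True if the reported bug is absent, False if the check catches it."""
    try:
        return parse_percent(" 50% ") == 0.5
    except ValueError:
        return False


def blind_check(parse_percent):
    """A check that never exercises the reported input, so it cannot see the bug."""
    return parse_percent("50%") == 0.5


def fix_is_demonstrated(before, after, check):
    """The check must fail on the old code and pass on the new code."""
    return (not check(before)) and check(after)


print(fix_is_demonstrated(parse_percent_buggy, parse_percent_fixed, regression_check))
print(fix_is_demonstrated(parse_percent_buggy, parse_percent_fixed, blind_check))
```

The first call reports that the fix is demonstrated, because the check reproduces the bug on the old code. The second does not, even though the blind check passes after the fix: it passed before the fix too, so it says nothing about whether anything changed.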
What is this test for?
Where this can go wrong — whether you know there’s a bug there or not — is when you have a test that doesn’t actually test anything useful. The classic example is an automated test that doesn’t actually check anything, but it can just as easily be the test that checks the wrong thing, the test that doesn’t check what it claims, or even the test that doesn’t check anything different from another one.
To me this is just like asking “did you actually do something”, because running tests that don’t actually check anything useful doesn’t do anything useful. If your tests don’t fail when there are bugs, then your tests don’t work.
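One rough way to ask "can this test fail at all?" is to hand it a deliberately broken version of the code and see whether it notices. The sketch below is in Python and uses a hypothetical discount function and a hand-written broken variant; both are assumptions for illustration only. Mutation testing tools automate this idea, but the manual version is enough to show it.

```python
def apply_discount(price, percent):
    """Hypothetical code under test."""
    return price * (1 - percent / 100)


def broken_apply_discount(price, percent):
    """Deliberately wrong variant: adds the discount instead of subtracting it."""
    return price * (1 + percent / 100)


def check_runs_only(fn):
    """Calls the code but never looks at the result, so it always passes."""
    fn(200, 25)
    return True


def check_result(fn):
    """Compares the result against the expected discounted price."""
    return fn(200, 25) == 150


def can_detect_bug(check, good_fn, broken_fn):
    """A useful check passes on the good code and fails on the broken code."""
    return check(good_fn) and not check(broken_fn)


print(can_detect_bug(check_runs_only, apply_discount, broken_apply_discount))
print(can_detect_bug(check_result, apply_discount, broken_apply_discount))
```

The first check cannot tell the two implementations apart, so it fails the "did you actually do something" question. The second one can.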
In a world where time is limited and context is king, whether you can articulate what a test is for can be a useful heuristic for deciding whether or not something is worth doing. It’s much trickier than knowing whether you fixed a specific bug, though. We could go out into the field and hear clearly on the radio that a noise source had been fixed, but it was much harder to answer whether it paid off for our research project overall. Was it worth investing more time into it or not?