Testing on Dune (Part 1)

I recently read Frank Herbert’s Dune and was surprised to find a couple passages on testing. I thought it would be fun to take them completely out of context (that’s a good habit for testers, right?) and try to apply them to software instead of spice. Here’s the first:

Any road followed precisely to its end leads precisely nowhere. Climb the mountain just a little bit to test that it’s a mountain. From the top of the mountain, you cannot see the mountain.

—Bene Gesserit proverb, from Frank Herbert’s Dune

This proverb appeared in one of the excerpts from in-universe books that open each chapter, all meant to have been written after the main events of the story and usually highlighting some important aspects of the plot or characters in the chapter to come.

To be honest, I’m not really sure what the relevance of this passage to the story was. It comes immediately before the chapter where Lady Jessica finds the greenhouse in their new home on Arrakis. In it, she finds a warning that there are double agents among them advancing a plot to overthrow the Duke. Maybe these initial hints of treachery are like the first steps on the road, or the first bit of the mountain. I don’t know. It doesn’t matter.

Is there any way we can say this proverb applies to software testing?

One interesting bit for us is the advice to “climb the mountain just a little bit to test that it’s a mountain.” My first thought was that it could be about how you can’t practically test everything completely, but the proverb implies that you could climb the mountain or follow the road to its end (whereas you really can’t test everything); it isn’t saying anything about what’s possible or not. I’d be inclined to say that it means we don’t need to climb the mountain—that we can tell it’s a mountain without going to the summit—except for the other half of the proverb:

From the top of the mountain, you cannot see the mountain.

This isn’t as easy to recklessly apply to software. With the road, once you’ve gone to the end of it, it goes nowhere else. For the mountain, it sounds like the essence of the mountain is that it is this large thing towering above the landscape. It’s something that goes up. So you can’t appreciate its mountain-ness from the top. Does this apply to software? Do we lose something about the software by testing it to its end? It might work on something like scientific study—once you understand everything there’s no more science to do—but a piece of software doesn’t stop being useful just because we’ve already exercised everything it can do.

Alternatively, perhaps it means that you lose sight of what the mountain or the road are once you’re that far along them. When you’re fully immersed in something, when you’ve studied that software every way you can think of and think you understand it inside and out, you lose the ability to see it as a beginner or an outsider. As a tester, when you become an expert on a particular system, do you risk losing the ability to see the software for what it is? I don’t feel like that has ever happened to me, but it’s also the sort of thing that you wouldn’t notice happening. It is certainly true that the things I think to test on a product after two years are different from what I would test if I were seeing it for the first time. Hopefully it’s because I know more and have a better idea of where the risks are, but maybe not.

This would be where it would be helpful to understand what the passage actually meant in the context of the novel. Maybe there’s nothing here about testing at all even though it mentions how to test something. I’d love to hear other interpretations!

In a future post, when I don’t have a real testing topic for the week, I’ll post the second passage from the book that is quite the opposite: there’s no specific mention of testing but it will be quite familiar to how we test. (Update: part two is here.)

My first game of TestSphere

Today (as I write this, last week as it is published) I had my first experience playing TestSphere. I’ve had a deck for ages but only recently suggested trying to play it with the QA community of practice in my department. Going from never having played it at all to facilitating a session with a whole group was quite a leap, and I wasn’t at all sure how it would go. Here are some of my observations about the experience.

Test sphere cards laid out on a table

Seven thoughts about TestSphere

1. Ten’s a crowd: The weekly meeting of the group usually has anywhere from 4 to 16 people attending, with the typical number around 12. I planned on playing the standard game, which the box says is best for 4 to 8 people. I was prepared to split us into two groups if needed, but in the end tried playing with the full group of 10 that came that day.

2. One for all or a bunch for each: The instructions say to reveal one or more cards depending on the experience level of the group, though it’s not clear to me which way those should correlate. I decided to go with one card of each colour so there would be a variety of types of things to think about. This turned out to be exactly the wrong number. Though I deliberately put us at a small table, people still had to pick up cards from the middle to read them. As soon as we started, 5 people were reading cards and 5 people were doing nothing. Should I do this again, I would try one extreme or the other: 1 or 2 cards that the whole group could focus on together, or 3-5 cards each to think about independently and have people play cards from their own hand. In the latter case I can then imagine combo play (“I have a card that applies to that story!” or “I have an experience with that too, plus this other concept from my hand”) but let’s not get carried away.

3. Combining cards: Nobody attempted to combine multiple cards into a single story, which I thought would be part of the fun of trying to “win”. This may have just been because people were passing cards around one at a time rather than looking at them as a group. I suspect it would have been easier to combine cards with fewer people, or with a group that was already familiar with the cards.

4. Minimalism: We didn’t make use of most of the text on the cards. The examples are great and really show the amount of good work Beren Van Daele and the MoT put into designing the deck, but it was just too much to make use of in this format. While the extra text is useful to fully understand the concept, a minimal deck with just the concept, slogan, and a simple graphic might be less intimidating. (The Easter egg here is that Minimalism is one of the cards we talked about in our group today; going back and reading the card again I’m really torn by this since the examples really do illuminate it in a way the slogan alone doesn’t, and the three are so different from each other that even limiting it to one would not be quite the same.)

5. Waiting patiently: The group naturally developed a pattern of picking up new cards as soon as they came up and holding on to them until it was their turn to tell the story. I wouldn’t say that I expected it to be a raucous fight for cards and who got to tell their story first, but I didn’t expect it to be so calm and orderly either. Once or twice this resulted in someone who had picked up a card just to read it getting roped into telling a story about that card whether they meant to or not.

6. Everybody had a story: The energy of the game varied quite a bit depending on who was speaking. Some people are just better storytellers or more comfortable with public speaking than others. Nonetheless, I was quite happy that nobody dominated the conversation too much, and by the end everybody had shared at least once. I had laid out a rule at the beginning that if two people had a story to share we would defer to whoever hadn’t spoken yet, but we only had to invoke it once.

7. My QA is not your QA: Several times I was surprised by the stories people told given the card they picked up, often struggling to see what the connection was. To me this illustrates how differently people think, which would keep this interesting to play with another group of people. Not only that, but the cards would likely work just as well outside of QA circles. At one point we had only one person left who hadn’t collected any cards yet. “I’m a developer,” he said, “I only have developer stories.” But when prompted he was able to pick up a card just as easily as anybody else.

The forgotten debrief

In the end, we shared about 15 stories in 50 minutes. Overall I think it was a good experience, and it was a neat way to hear more about everybody’s experiences on other teams. Unfortunately I didn’t manage time well and we got kicked out of the meeting room before I had a chance to debrief with anybody about their experience with the game. Some ideas for focus questions I had jotted down (roughly trying to follow an ORID model) were:

  1. What are some of the concepts and examples that came up on the cards?
  2. Were there concepts someone else talked about that you also had a story for? Were any concepts totally new to you?
  3. Did anything surprise you about the experiences others shared? What did you learn about someone that you didn’t know before? What did or didn’t work well about this experience?

and finally:

  1. Would you play again?

Agile Testing book club: Let them feel pain

This is the second part in a series of exercises where I highlight one detail from a chapter or two of Agile Testing by Janet Gregory and Lisa Crispin. Part one of the series can be found here. This installment comes from Chapter 3.

Let them feel pain

This chapter is largely about making the transition into agile workflows, and the growing pains that can come from that. I’ve mentioned before on this blog that when I went through that transition, I worried about maintaining the high standard of testing that we had in place. The book is coming from a slightly different angle, of trying to overcome reluctance to introducing good quality practices, but the idea is the same. This is the sentence that stuck out most to me in the whole chapter:

Let them feel pain: Sometimes you just have to watch the train wreck.

I did eventually learn this lesson, though it took probably 6 months of struggling against the tide and a tutorial session by Mike Sowers at STAR Canada on metrics before it really sank in. Metrics are a bit of a bugaboo in testing these days, but just hold your breath and power through with me for a second. Mike was going over the idea of “Defect Detection Percentage”, which basically just asks what percentage of bugs you caught before releasing. The useful insight was that you can probably push it arbitrarily high, so that you catch 99% of bugs before release, but you have to be willing to spend the time to do it. On the other end, maybe your customers are happy with a few bugs if it means they get access to the new features sooner, in which case you can afford to limit the kinds of testing you do. If you maintain an 80% defect detection percentage and still keep your customers happy, it’s not worth the extra testing time it would take to push that number higher. Yes, this all depends on how you count bugs, and happiness, and which bugs you’re finding, and maybe you can test better instead of faster, but none of that is the point. This is:

If you drop some aspect of testing and the end result is the same, then it’s probably not worth the effort to do it in the first place.

There are dangers here, of course. You don’t want to drop one kind of testing just because it takes a lot of time if it’s covering a real risk. People will be happy only until that risk manifests itself as a nasty failure in the product. As ever, this is an exercise of balancing concerns.
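
For anyone who hasn’t run into the metric before, here’s a minimal sketch of the usual formulation of Defect Detection Percentage. This is the common definition rather than necessarily exactly how Mike presented it, and the numbers are made up purely for illustration:

```python
# A sketch of the usual Defect Detection Percentage calculation; the
# numbers below are invented purely for illustration.

def defect_detection_percentage(found_before_release, found_after_release):
    """Percentage of all known defects that were caught before release."""
    total = found_before_release + found_after_release
    return 100.0 * found_before_release / total

# 80 bugs caught in testing, 20 that escaped to customers -> 80.0
print(defect_detection_percentage(80, 20))
```

The closer you try to push that number to 100, the more testing time it costs, which is exactly the trade-off described above.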

Being in a bad spot

Part of why this idea stuck with me at the time was that the rocky transition I was going through left me in a pretty bad mental space. I eventually found myself thinking, “Fine, if nobody else cares about following these established test processes like I do, then let everybody else stop doing them and we’ll see how you like it when nothing works anymore.”

This is the cynical way of reading the advice from Janet and Lisa to “let them feel pain” and sit back to “watch the train wreck”. In the wrong work environment you can end up reaching the same conclusion from a place of spite, much like I did. But it doesn’t have to come from a negative place. Mike framed it in terms of metrics and balancing cost and benefit in a way that provided some clarity for an analytical mind like mine, and I think Lisa and Janet are being a bit facetious here deliberately. Now that I’m working in a much more positive space (mentally and corporately) I have a much better way of interpreting this idea: the best motivation for people to change their practices is for them to have reason to change them.

What actually happened for us when we started to drop our old test processes was that everything was more or less fine. The official number of bugs recorded went down overall, but I suspect that was as much a consequence of the change in our reporting habits in small agile teams as anything else. We definitely still pushed bugs into production, but they weren’t dramatically worse than before. What I do know for sure is that nobody came running to me saying “Greg you were right all along, we’re putting too many bugs into production, please save us!”

If that had happened, then great, there would be motivation to change something. But it didn’t, so we turned our attention to other things entirely.

Introducing change (and when not to)

When thinking about this, I kept coming back to two other passages that I had highlighted earlier in the same chapter:

If you are a tester in an organization that has no effective quality philosophy, you probably struggle to get good quality practices accepted. The agile approach will provide you with a mechanism for introducing good quality-oriented practices.

and also

If you’re a tester who is pushing for the team to implement continuous integration, but the programmers simply refuse to try, you’re in a bad spot.

Agile might provide a way of introducing new processes, but that doesn’t mean anybody is going to want to embrace or even try them. If you have to twist some arms to get a commitment to try something new for even one sprint, and it doesn’t have a positive (or at least neutral) impact, you had better be prepared to let the team drop it (or convince them that the effects need more time to be seen). If everybody already feels that the deployment process is going swimmingly, why do you need to introduce continuous integration at all?

It might be easy when you’re deciding not to keep something new, but when already-established test processes were on the line, this was a very hard thing for me to do. In a lot of ways it was like being forced to admit that we had been wrong about what testing we needed to be doing, even though all of it had been justified at one time or another. We had to realize that certain tests were generating pain for the team, and the only way we could tell whether that was really worth it was to drop them and see what happened.

The takeaway

Today I’m in a much different place. I’m no longer coping with the loss or fragmentation of huge and well established test processes, but rather looking at establishing new processes on my team and those we work with. As tempting as it is to latch onto various testing ideas and “best” practices I hear about, it’s likely wasted effort if I don’t first ask “where are we feeling the most pain?”

Testing like you’re laughing

I was in a brainstorming meeting recently. The woman running the meeting started setting up an activity by dividing the board into several sections. In one, she wrote “Lessons Learned” and in a second she wrote “Problem Areas”. The idea was that we’d each come up with a few ideas to put into each category and then discuss.

I immediately asked, “What if one of the lessons I learned is that we have a problem area?”

To her credit, she gave a perfectly thoughtful and reasonable answer about how to differentiate the two categories. The details don’t matter; what was important was that others in the room started joking that, as the “only QA” in the room, I immediately started testing her activity and trying to break it. This was all in good fun, and I joked along saying “Sorry, I didn’t mean to be hard on you right away.”

“You’re the QA,” she said, “It’s your job!”

This tickled an idea in the back of my mind but it didn’t come to me right away. Later that day, though, I realized what the answer to that should have been:

“As long as I’m QA-ing with you, not at you.”


Footnote: There’s nothing significant in the use of “QA” over “testing” here; I’m using “QA” only because that’s the lingo used where I am. It works just as well if you replace “QA” with “tester” and “QA-ing” with “testing”, whether or not you care about the difference.

The Greg Score: 12 Steps to Better Testing

Ok, I’ll admit right off the bat that this post is not going to give you 12 steps to better testing on a silver platter, but bear with me.

A while back, I was trying to figure out a way for agile teams without a dedicated tester or QA expert on their team to recognize bottlenecks and inefficiencies in their testing processes. I wanted a way of helping teams see where they could improve their testing by sharing the expertise that already existed elsewhere in the company. I had been reading about maturity models, and though they can be problematic—more on that later—it led me to try to come up with a simple list of good practices teams could aim to adopt.

When I started floating the idea with colleagues and circulating a few early drafts, a friend of mine pointed out that what I was moving towards was a lot like a testing version of the Joel Test:

The Joel Test: 12 Steps to Better Code

Now, to be clear, the Joel Test is 18 years old, and it shows. It’s outdated in a lot of ways, and even a little insulting (“you’re wasting money by having $100/hour programmers do work that can be done by $30/hour testers”). It might be more useful as a representation of where software development was in 2000 than anything else, but some parts of it still hold up. The concept was there, at least. The question for me was: could I come up with a similarly simple list of practices for testing that teams could use to get some perspective on how they were doing?

A testers’ version of the Joel Test

In my first draft I wrote out ideas for good practices, why teams would adopt them, and examples of how they would apply to some of the specific products we worked on. I came up with 20-30 ideas after that first pass. A second pass cut that nearly in half after grouping similar things together, rephrasing some to better expose the core ideas, and getting feedback from testers on a couple of other teams. I don’t have a copy of the list that we came up with anymore, but if I were to come up with one today off the top of my head it might include:

  1. Do tests run automatically as part of every build?
  2. Do developers get instant feedback when a commit causes tests to fail?
  3. Can someone set up a clean test environment instantly?
  4. Does each team have access to a test environment that is independent of other teams?
  5. Do you keep a record of test results for every production release?
  6. Do you discuss as a team how features should be tested before writing any code?
  7. Is test code version controlled in sync with the code it tests?
  8. Does everybody contribute to test code?
  9. Are tests run at multiple levels of development?
  10. Do tests reliably pass when nothing is wrong? (There’s a small sketch of what I mean after the list.)

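To illustrate that last question, here’s a minimal sketch of the difference between a test that can fail when nothing is wrong and one that can’t. The build_greeting function and the pytest-style tests are hypothetical, invented purely for illustration:

```python
import datetime

def build_greeting(today=None):
    # Hypothetical code under test; the date can be injected by a test.
    today = today or datetime.date.today()
    return f"Good morning, it's {today:%Y-%m-%d}"

# Flaky: reads the clock twice, so it can fail if the date rolls over
# between the two reads, even though nothing is wrong with the code.
def test_greeting_flaky():
    assert build_greeting() == f"Good morning, it's {datetime.date.today():%Y-%m-%d}"

# Deterministic: the test controls the date, so it passes whenever the
# code is correct and fails only when it isn't.
def test_greeting_deterministic():
    fixed_day = datetime.date(2019, 1, 15)
    assert build_greeting(fixed_day) == "Good morning, it's 2019-01-15"
```

The deterministic version only fails when the code under test is actually broken, which is the property that question is really asking about.
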
I’m deliberately writing these to be somewhat general now, even though the original list could include a lot of technical details about our products and existing process. After I left the company, someone I had worked with on the idea joked with me that they had started calling the list the “Greg Score”. Unfortunately the whole enterprise was more of a spider than a starfish and as far as I know it never went anywhere after that.

I’m not going to go into detail about what I mean by each of these or why I thought to include them today, because I’m not actually here trying to propose this as a model (so you can hold off on writing that scathing takedown of why this is a terrible list). I want to talk about the idea itself.

The problem with maturity models

When someone recently used the word “mature” in the online community in reference to testing, it sparked immediate debate about what “maturity” really means and whether it’s a useful concept at all. Unsurprisingly, Michael Bolton has written about this before, coming down hard against maturity models, in particular the TMMi. Despite those arguments, the only problem I really see is that the TMMi is someone else’s model for what maturity means. It’s a bunch of ideas about how to do good testing prioritized in a way that made sense to the people writing it at the time. Michael Bolton just happens to have a different idea of what a mature process would look like:

A genuinely mature process shouldn’t emphasize repeatability as a virtue unto itself, but rather as something set up to foster appropriate variation, sustainability, and growth. A mature process should encourage risk-taking and mistakes while taking steps to limit the severity and consequence of the mistakes, because without making mistakes, learning isn’t possible.

— Michael Bolton, Maturity Models Have It Backwards

That sounds like the outline for a maturity model to me.

In coming up with my list, there were a couple of things I wanted to emphasize.

One: This wasn’t about comparing teams to say one is better than another. There is definitely a risk it could be turned into a comparison metric if poorly managed, but even if you wanted to, it would prove impossible pretty quickly because:

Two: I deliberately tried to describe why a team would adopt each idea, not why they should. That is, I wanted to make it explicit that if the reasons a team would consider adopting a process didn’t exist, then they shouldn’t adopt it. If I gave this list to 10 teams, they’d all find at least one thing on it that they’d decide wasn’t important to their process. Given that, who cares if one team scores 2/10 and another 8/10, as long as they’re both producing the appropriate level of quality and value for their contexts? Maybe the six ideas in between don’t matter in the same way to each team, or wouldn’t have the same impact even if you did implement them.

Three: I didn’t make any claims that adopting these 10 or 12 ideas would equate to a “fully mature” or “complete” process; they were just the top 10 or 12 ideas that this workgroup of testers decided could offer the best ROI for teams in need. It was a way of offering some expertise, not of imposing a perfect system.

Different models for different needs

This list doesn’t have everything on it that I would have put on it two years ago, and it likely has things on it that I’ll disagree with two years from now. (Actually, I wrote that list a couple of days ago and I’m already raising an eyebrow at a couple of the items.) I have no reason to expect that this list would make a good model for anybody else. I don’t even have any reason to expect that it would make a good model for my own team, since I didn’t get their input on it. Even if it were, I wouldn’t score perfectly on it, and if you could, that would mean the list is out of date or no longer useful.

What I do suggest is trying to come up with a list like this for yourself, in your own context. It might overlap with mine and it might not. What are the key aspects of your testing that you couldn’t do without, and what do you wish you could add? It would be very interesting to have as many testers as possible write their own ten-point rubric for their ideal test process to see how much overlap there actually is, and then do it again in a year or two to see how it’s changed.