Debating the Modern Testing Principles

Last week I had the opportunity to moderate a discussion with a group of QA folks on the Modern Testing Principles being developed by Alan Page and Brent Jensen. I’m a relative latecomer to the AB Testing podcast, having first subscribed somewhere around Episode 60, but have been quite interested in this take on testing. Armed primarily with the four episodes starting from their initial introduction, and some backup from the article by the Ministry of Testing, we had a pretty interesting discussion.

Discussing the seven principles

After giving a bit of a preamble based on the mission statement—“Accelerate the Achievement of Shippable Quality”—we went through the principles one by one. For each one I asked (roughly) these questions:

  1. What do you think this statement means?
  2. How do you feel about this as a core principle of testing?
  3. How well (or not) does this describe your approach to testing?
  4. Is this a principle you would adopt?

For the first four principles, there was a lot of agreement. We discussed building better products versus trying to assure the product’s quality, the importance of prioritization of tests and identifying bottlenecks, leaky safety nets, data-driven decisions, and the easy alignment with a whole-team Agile mindset. Then it started to get a bit more interesting.

Disagreement one: Judging Quality

The fifth principle started to get problematic for some people:

5. We believe that the customer is the only one capable to judge and evaluate the quality of our product.

There was a lot of debate here. Although a couple of people were on board right away, the biggest question for most in the room was: who is the “customer”? Lots of people could fall into that category. Internally there are stakeholders in different parts of the business, product owners on our team, managers in our department, and the team itself to some degree. We also have both our current end users and the people we want to attract into becoming regular users. Some of you may have simpler environments with a clear-cut individual client, but others could be even more complicated.

What we did agree on was that you have to use the product to be able to judge it. The people testing have to think like the customer and have a good idea of what their expectations are. Interestingly, when we changed “customer” to “anybody who uses the product”, everybody around the table could agree with the principle as a whole.

I suspect, though, that if we only say “anybody who uses the product is capable of judging and evaluating the quality of the product”, the statement loses its power. My feeling is that if this principle feels problematic in its original form, you may just not have a firm idea of who your customer really is. This just highlights for me how important it is to ask whose opinion, at the end of the day, is the one that counts.

Disagreement two: The dedicated specialist

It’s likely unsurprising that a principle suggesting the elimination of the testing specialist would raise a few eyebrows in a group of testing specialists.

7. We expand testing abilities and know-how across the team; understanding that this may reduce (or eliminate) the need for a dedicated testing specialist.

There was no disagreement with the first clause. Many people immediately connected it with the 4th principle, to “coach, lead, and nurture the team towards a more mature quality culture”. Surely endeavouring to “expand the testing abilities and know-how across the team” is a good way to achieve that. When the group initially discussed the 4th principle, we were all in agreement that we wanted to drive a culture of quality and a whole-team approach to testing.

I am still unsure whether the disagreement with eliminating the dedicated specialist was just a knee-jerk reaction or not. I tried to use an analogy of the tester-as-Mary-Poppins: She stays only as long as she is truly needed, and then takes to the wind again to find a new family in need. It didn’t seem to sell the point. We agreed that our teams should be able to function without us… temporarily. There was one assertion that QA was the most important part of the whole process and therefore could not be eliminated. Another one that the skills are different from other roles. And yet another that not everybody wants to be a dev. (Although, of course, the principle doesn’t end with “… so that they can become a developer.”)

Additional context from Alan and Brent helps here too. In some of the episodes after the principles were first introduced, they do talk about how not every tester needs to be a Capital-M Capital-T Modern Tester. I don’t believe the intent is to eventually eliminate the need for testing specialists full stop. It’s not even a given that the specialist would be eliminated on a particular team, just that the need for a specialist should be reduced. To me this principle is a corollary of reducing bottlenecks and building the testing know-how on the team, albeit phrased more provocatively.

Nonetheless, the closest we got to agreement on this was to say we could eventually eliminate the singular position of a testing specialist, but not eliminate the function.

Is that any different or just an easier pill to swallow?

Wrapping up

Both of these, the two biggest objections to the Modern Testing Principles, have a common theme. The 5th principle asserts that testers aren’t the judge of quality or even truly capable of evaluating it. The 7th pushes the idea that given the right expertise and know-how, a testing specialist may not even be needed. Both of these can feel like a threat. Both speak to a fear of losing agency. Alan and Brent also talked about this in the podcasts: one of the motivations for formulating these principles was to prepare people for how testing is changing so that we aren’t all caught off guard. While I have doubts that there’s an apocalyptic testing singularity coming—something I plan to write about in another post—it does emphasize how important it is to be prepared for new ways of thinking in the industry.

To wrap up the discussion, we did a quick summary of the words and concepts that had come up as common themes in the principles. Then, to compare, I asked for testing concepts or buzzwords that had been conspicuously absent. Chief among the latter were automation, defect tracking, reporting, traceability, and documentation; not once did we talk about writing a test case. Pointing out what was not explicitly mentioned in the principles turned out to be a great way to highlight what makes this a different approach compared to a lot of our day-to-day experience. Though some of those “missing” elements may come out naturally as tools necessary to really embrace these principles, I felt it important to emphasize that they were not the goal in and of themselves.

In the end, these differences and the disagreements were the most interesting part of the Modern Testing Principles. Alan described presenting the principles at Test Bash in much the same way—it’s not much fun if everybody just agrees with everything! Hopefully the discussions sparked some new ways of thinking, if only a little bit.

Testing is like a box of rocks

I was inspired today by Alan Page’s Test Automation Snowman. He makes good points, but let’s be honest, the model is the same as the good ol’ test pyramid. The only difference is that he’s being explicit about tests at the top of the pyramid being slow and tests at the bottom being fast. Ok, so maybe the snowman thing is a joke, but it did make me think about what might make a better visualization. I quickly doodled something on a sticky note:

A sticky note with a lot of tiny circles in the bottom third, medium circles in the middle third, and a few large circles in the top third.

If the point we want to emphasize is that UI tests are slow (and therefore take a lot of time), we should include that in the visualization! The problem with the pyramid (and the snowman) is that the big tests take up the least amount of space; the small triangle at the top makes it look like having fewer UI tests also means you do less UI testing.

It doesn’t.

At least, not proportionately. If you had an equal number of UI and unit tests, it’s a safe bet that you’re going to spend more of your time working on the UI tests.
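Some back-of-the-envelope arithmetic makes the point. The numbers below are made up, but probably not wildly off for a lot of projects:

```python
# Hypothetical ballpark figures: equal numbers of tests, very unequal cost.
unit_tests, seconds_per_unit_test = 200, 0.05
ui_tests, seconds_per_ui_test = 200, 30

print(f"unit suite: {unit_tests * seconds_per_unit_test:.0f} seconds")   # ~10 seconds
print(f"UI suite:   {ui_tests * seconds_per_ui_test / 60:.0f} minutes")  # ~100 minutes
```

Equal counts, but the UI suite eats essentially all of the runtime (and, in my experience, most of the maintenance too).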

So instead, let’s say testing is like a box of rocks. Each rock is a test, and I have to choose how to allocate the space in that box to fit all the rocks that I want. A few big rocks are going to take up a lot more space than a bunch of tiny pebbles. Unless I have a good reason why that big boulder is a lot more interesting than the hundred little rocks I could put in its place, I’m going to go for the little rocks! If I have to add a new rock (to kill a new bug, say) I probably want to choose the smallest one that’ll still do the job.

You can still think about the different levels (unit vs API vs UI, for example) if you picture the little rocks at the bottom forming a foundation for bigger rocks on top. I don’t know if rocks work like that. Just be careful not to get this whole thing confused with that dumb life metaphor.

Ok, it might not be the best model, but I’ll stick with it for now. And like Alan’s snowman, you’re free to ignore this one too.

Agile Testing book club: Let them feel pain

This is the second part in a series of exercises where I highlight one detail from a chapter or two of Agile Testing by Janet Gregory and Lisa Crispin. Part one of the series can be found here. This installment comes from Chapter 3.

Let them feel pain

This chapter is largely about making the transition into agile workflows, and the growing pains that can come from that. I’ve mentioned before on this blog that when I went through that transition, I worried about maintaining the high standard of testing that we had in place. The book is coming from a slightly different angle, that of trying to overcome reluctance to introduce good quality practices, but the idea is the same. This is the sentence that stuck out most to me in the whole chapter:

Let them feel pain: Sometimes you just have to watch the train wreck.

I did eventually learn this lesson, though it took probably six months of struggling against the tide and a tutorial session by Mike Sowers at STAR Canada on metrics before it really sank in. Metrics are a bit of a bugaboo in testing these days, but just hold your breath and power through with me for a second. Mike was going over the idea of “Defect Detection Percentage”, which basically just asks what percentage of bugs you caught before releasing. The useful insight was that you can probably push it arbitrarily high, so that you catch 99% of bugs before release, but you have to be willing to spend the time to do it. On the other end, maybe your customers are happy with a few bugs if it means they get access to the new features sooner, in which case you can afford to limit the kinds of testing you do. If you maintain an 80% defect detection percentage and still keep your customers happy, it’s not worth the extra testing time it’d take to push that any higher. Yes, this all depends on how you count bugs, and happiness, and which bugs you’re finding, and maybe you can test better instead of faster, but none of that is the point. This is:

If you drop some aspect of testing and the end result is the same, then it’s probably not worth the effort to do it in the first place.
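For what it’s worth, the arithmetic behind Mike’s metric is about as simple as metrics get. A rough sketch, assuming you can agree on what counts as a bug found before versus after a release:

```python
def defect_detection_percentage(bugs_found_before_release, bugs_found_after_release):
    """Share of all known defects that were caught before the release went out."""
    total = bugs_found_before_release + bugs_found_after_release
    if total == 0:
        return None  # nothing recorded, nothing to measure
    return 100.0 * bugs_found_before_release / total

# e.g. 80 bugs caught in testing, 20 reported from production afterwards
print(defect_detection_percentage(80, 20))  # 80.0
```

The hard part isn’t the calculation, it’s deciding how much that number is worth to you.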

There are dangers here, of course. You don’t want to drop one kind of testing just because it takes a lot of time if it’s covering a real risk. People will be happy only until that risk manifests itself as a nasty failure in the product. As ever, this is an exercise of balancing concerns.

Being in a bad spot

Part of why this idea stuck with me at the time was that the rocky transition I was going through left me in a pretty bad mental space. I eventually found myself thinking, “Fine, if nobody else cares about following these established test processes like I do, then let everybody else stop doing them and we’ll see how you like it when nothing works anymore.”

This is the cynical way of reading the advice from Janet and Lisa to “let them feel pain” and sit back to “watch the train wreck”. In the wrong work environment you can end up reaching the same conclusion from a place of spite, much like I did. But it doesn’t have to come from a negative place. Mike framed it in terms of metrics and balancing cost and benefit in a way that provided some clarity for an analytical mind like mine, and I think Lisa and Janet are deliberately being a bit facetious here. Now that I’m working in a much more positive space (mentally and corporately) I have a much better way of interpreting this idea: the best motivation for people to change their practices is for them to have a reason to change them.

What actually happened for us when we started to drop our old test processes was that everything was more or less fine. The official number of bugs recorded went down overall, but I suspect that was as much a consequence of the change in our reporting habits in small agile teams as anything else. We definitely still pushed bugs into production, but they weren’t dramatically worse than before. What I do know for sure is that nobody came running to me saying “Greg you were right all along, we’re putting too many bugs into production, please save us!”

If that had happened, then great, there would be motivation to change something. But it didn’t, so we turned our attention to other things entirely.

Introducing change (and when not to)

When thinking about this, I kept coming back to two other passages that I had highlighted earlier in the same chapter:

If you are a tester in an organization that has no effective quality philosophy, you probably struggle to get good quality practices accepted. The agile approach will provide you with a mechanism for introducing good quality-oriented practices.

and also

If you’re a tester who is pushing for the team to implement continuous integration, but the programmers simply refuse to try, you’re in a bad spot.

Agile might provide a way of introducing new processes, but it doesn’t mean that anybody is going to want to embrace or even try them. If you have to twist some arms to get a commitment to try something new for even one sprint, and it doesn’t have a positive (or at least neutral) impact, you’d better be prepared to let the team drop it (or convince them that the effects need more time to be seen). If everybody already feels that the deployment process is going swimmingly, why do you need to introduce continuous integration at all?

It might be easy when you’re deciding not to keep something new, but when already established test processes were on the line, this was a very hard thing for me to do. In a lot of ways it was like being forced to admit that we had been wrong about what testing we needed to be doing, even though all of it had been justified at one time or another. We had to accept that certain tests were generating pain for the team, and the only way we could tell whether that pain was really worth it was to drop them and see what happened.

The takeaway

Today I’m in a much different place. I’m no longer coping with the loss or fragmentation of huge and well established test processes, but rather looking at establishing new processes on my team and those we work with. As tempting as it is to latch onto various testing ideas and “best” practices I hear about, it’s likely wasted effort if I don’t first ask “where are we feeling the most pain?”

The Greg Score: 12 Steps to Better Testing

Ok, I’ll admit right off the bat that this post is not going to give you 12 steps to better testing on a silver platter, but bear with me.

A while back, I was trying to figure out a way for agile teams without a dedicated tester or QA expert on their team to recognize bottlenecks and inefficiencies in their testing processes. I wanted a way of helping teams see where they could improve their testing by sharing the expertise that already existed elsewhere in the company. I had been reading about maturity models, and though they can be problematic—more on that later—it led me to try to come up with a simple list of good practices teams could aim to adopt.

When I started floating the idea with colleagues and circulating a few early drafts, a friend of mine pointed out that what I was moving towards was a lot like a testing version of the Joel Test:

The Joel Test: 12 Steps to Better Code

Now, to be clear, the Joel Test is 18 years old, and it shows. It’s outdated in a lot of ways, and even a little insulting (“you’re wasting money by having $100/hour programmers do work that can be done by $30/hour testers”). It might be more useful as a representation of where software development was in 2000 than anything else, but some parts of it still hold up. The concept was there, at least. The question for me was: could I come up with a similarly simple list of practices for testing that teams could use to get some perspective on how they were doing?

A testers’ version of the Joel Test

In my first draft I wrote out ideas for good practices, why a team would adopt each one, and examples of how it would apply to some of the specific products we worked on. I came up with 20-30 ideas after that first pass. A second pass cut that nearly in half after grouping similar things together, rephrasing some to better expose the core ideas, and getting feedback from testers on a couple of other teams. I don’t have a copy of the list that we came up with any more, but if I were to come up with one today off the top of my head it might include:

  1. Do tests run automatically as part of every build?
  2. Do developers get instant feedback when a commit causes tests to fail?
  3. Can someone set up a clean test environment instantly?
  4. Does each team have access to a test environment that is independent of other teams?
  5. Do you keep a record of test results for every production release?
  6. Do you discuss as a team how features should be tested before writing any code?
  7. Is test code version controlled in sync with the code it tests?
  8. Does everybody contribute to test code?
  9. Are tests run at multiple levels of development?
  10. Do tests reliably pass when nothing is wrong?

I’m deliberately writing these to be somewhat general now, even though the original list could include a lot of technical details about our products and existing process. After I left the company, someone I had worked with on the idea joked with me that they had started calling the list the “Greg Score”. Unfortunately the whole enterprise was more of a spider than a starfish and as far as I know it never went anywhere after that.

I’m not going to go into detail about what I mean by each of these or why I thought to include them today, because I’m not actually here trying to propose this as a model (so you can hold off on writing that scathing takedown of why this is a terrible list). I want to talk about the idea itself.

The problem with maturity models

When someone recently used the word “mature” in the online community in reference to testing, it sparked immediate debate about what “maturity” really means and whether it’s a useful concept at all. Unsurprisingly, Michael Bolton has written about this before, coming down hard against maturity models, in particular the TMMi. Despite those arguments, the only problem I really see is that the TMMi is someone else’s model for what maturity means. It’s a bunch of ideas about how to do good testing prioritized in a way that made sense to the people writing it at the time. Michael Bolton just happens to have a different idea of what a mature process would look like:

A genuinely mature process shouldn’t emphasize repeatability as a virtue unto itself, but rather as something set up to foster appropriate variation, sustainability, and growth. A mature process should encourage risk-taking and mistakes while taking steps to limit the severity and consequence of the mistakes, because without making mistakes, learning isn’t possible.

— Michael Bolton, Maturity Models Have It Backwards

That sounds like the outline for a maturity model to me.

In coming up with my list, there were a couple of things I wanted to emphasize.

One: This wasn’t about comparing teams to say one is better than another. There is definitely a risk it could be turned into a comparison metric if poorly managed, but even if you wanted to use it that way, it should quickly prove meaningless because:

Two: I deliberately tried to describe why a team would adopt each idea, not why they should. That is, I wanted to make it explicit that if the reasons a team would consider adopting a process didn’t exist, then they shouldn’t adopt it. If I gave this list to 10 teams, they’d all find at least one thing on it that they’d decide wasn’t important to their process. Given that, who cares if one team has 2/10 and another has 8/10, as long as they’re both producing the appropriate level of quality and value for their contexts? Maybe the six ideas in between don’t matter in the same way to each team, or wouldn’t have the same impact even if you did implement them.

Three: I didn’t make any claims that adopting these 10 or 12 ideas would equate to a “fully mature” or “complete” process; they were just the top 10 or 12 ideas that this workgroup of testers decided could offer the best ROI for teams in need. It was a way of offering some expertise, not of imposing a perfect system.

Different models for different needs

This list doesn’t have everything on it that I would have put on it two years ago, and it likely has things on it that I’ll disagree with two years from now. (Actually, I wrote that list a couple of days ago and I am already raising my eyebrow at a couple of the items.) I have no reason to expect that this list would make a good model for anybody else. I don’t even have any reason to expect that it would make a good model for my own team, since I didn’t get their input on it. Even if it were, I wouldn’t score perfectly on it, and if you could, that would mean the list is out of date or no longer useful.

What I do suggest is to try to come up with a list like this for yourself, in your own context. It might overlap with mine and it might not. What are the key aspects of your testing that you couldn’t do without, and what do you wish you could add? It would be very interesting to have as many testers as possible write their own 10-point rubric for their ideal test process to see how much overlap there actually is, and then do it again in a year or two to see how it’s changed.


Yes, I test in production

Recently a post on Reddit about a company doing tests on a live production environment sparked some conversation on the Testers.io Slack channel about whether “testing in production” is a wise idea or not. One Reddit user commented:

Rule number 1 of testing (i.m.o): DO IT ON A NON-PRODUCTION ENVIRONMENT.
Rule number 2 of testing (i.m.o): Make sure you are NOT, I repeat, NOT on a production environment.

Three years ago I might have agreed with that. Today, I absolutely don’t. Am I crazy?

never test in production… for some definitions of production

The original post describes a situation where some medical equipment stopped working overnight. After much debugging and technical support, the cause was identified: the machines had been remotely put into a special testing mode by the vendor and not restored before the morning.

There’s nothing particularly controversial in saying that this wasn’t a good move on the vendor’s part. They were messing with something a customer was, or could have been, using. Without notifying them about it. Though it’s all the more egregious because this was medical equipment, any customer would be annoyed by this when they found out. But you can’t extend that in a blanket way to all kinds of production environments.

There is certainly a lesson to be learned here, but we will get more from it by being more specific. One might suggest any of:

  • Never test something your customer is already using
  • Never test in a way your customer will notice
  • Never test something your customer will notice without telling them

But I can think of counter-examples to each of these, and it boils down to a very simple observation.

If you never test x, you will miss things about x

If you never test in production, you’re robbing yourself of insights about production. You won’t necessarily miss bugs, but unless you have a test environment that mimics every aspect of production perfectly (and none of us do), there will be something that goes on in production that you won’t see.

This is what I didn’t understand four years ago when I started in this line of work. In my first testing job we didn’t test in production, so why would we ever test in production? The naïveté of a newbie tester, eh?

Over time we got bitten by this in many different ways. The most common category of issues was use cases happening in production that we simply hadn’t anticipated. Of course, some of these you might find earlier by using production data in a test environment, but you will always be limited by your sample. A second category of issues would then arise from faulty assumptions in your test environment. It’s those little details that “shouldn’t affect testing” until they do. If you’re lucky you spot an error right away. If you’re less lucky you push a feature that quietly does nothing until somebody notices it isn’t there. If you’re really having a bad day it silently does the wrong thing.

It’s around this time that you start to catch on to the fact that you need to test new deployments, at the very least to verify that something is working “in the real world”. At this point you’re testing, to some degree, in production. Are we as far along as turning off medical equipment a customer is using? No, but we are already bending the “never test in production” rule.

This is just the beginning of a long list of reasons; iAmALittleTester has compiled a list of many similar scenarios where testing in production can provide information that you aren’t getting otherwise. All of these, though, count on the fact that you aren’t going to break anything by testing. (Maybe this is the crux: if you think that the job of a software tester is to break the software, then you likely think “testing in production” is synonymous with “breaking production”!)

What can you do safely?

One of the key differences between testing in a web or back-end server environment and hardware owned by a customer is that I can usually send requests to a server and examine the response without impacting any other requests hitting the same server. I probably can’t do that with customer hardware. The parallel would be that you probably shouldn’t use a customer’s account for your tests. You need to be careful about any state or statistics a system might be keeping track of. (Though I would raise an eyebrow at anything that needs to be aware that it is being tested. If you have “test mode” in your code, you’ve just created a mini test environment in a live program. Are you gaining any advantages from being in production in that case?)
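To put a shape on that, the kind of production check I’m comfortable with looks something like the sketch below: a read-only, anonymous request against the live service, with no customer account involved and nothing written. The endpoint and response fields here are made up for illustration.

```python
import requests

def test_production_search_is_healthy():
    # One more anonymous, read-only request: no login, no writes, and no special
    # "test mode" on the server -- the same traffic a real user would generate.
    response = requests.get(
        "https://example.com/api/search",  # hypothetical endpoint
        params={"q": "widgets"},
        timeout=5,
    )
    assert response.status_code == 200
    assert "results" in response.json()  # hypothetical response shape
```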

Whether this works, and the types of things you can do, will depend on the issues you expect to find. I’m not going to try stress testing my production environment during peak traffic. If I suspect that a certain kind of request will corrupt the state of the server, I’m certainly not going to do that in production either. If my test has any chance of having a negative effect on a user, I’m not going to risk that. But on a web app, one more anonymous request should be no different from what “real” users are sending the app. And on the subject of “real” users…

I’m not just a QA, I’m also a user

This is the aspect that I’ve found most useful about testing a production environment. Much like experience is often best gained by doing, knowledge of how a product works may be best gained by using it. If you only use a product in a test environment, then you only know how your product works in a test environment. There are lots of insights about a product that can’t come from simply using it, of course, and in some cases it isn’t realistic to expect to be able to use a product as much as the intended users do. But if it is possible, if you can make it possible, then it is an opportunity to see things in a different way.

When I do get to use something I’m working on like an end-user does, on some level I’m always testing it. It’s not a big leap to say that your end users are, on some level, testing your product every time they use it. If your users are testing in production, why aren’t you?


Addendum: Rosie Sherry pulled together some resources on testing in production over at the MoT Club that are definitely worth checking out. Chaos Monkey from Netflix is one really interesting way of testing your production setup that I meant to mention in this post, but was only reminded of again when I came across this thread.