10,000+ synonyms for Quality Assurance

I recently had a conversation with my team about what we should call the status when work has passed “code review” but isn’t yet “done”. The one thing I didn’t want to call it was “In QA”. One of the developers on my team had another idea that I decided to run with:

Let me explain.

Much like Richard Bradshaw says in this video on “QA” as a term, it doesn’t make much sense to me to name one stage of a feature’s development as if that was the only stage at which QA was done. Michael Bolton regularly insists that quality assurance isn’t even a thing testers do (or should do). (Although interestingly when I brought this up in Testers Chat, Michael was the one to ask whether the name we picked would even make a material difference.) The argument generally goes that testers don’t assure quality. We can do all sorts of other things to it—inform it, study it, encourage it, foster it—but we can’t guarantee or enforce it. We especially can’t do it after the code is written.

Maybe this one is better?

I’ve mentioned before that the terminology at my company is “QA” rather than “Testing”. Asking what the difference is between “QA” and “Testing” is another sure-fire way to spark debate, but I don’t think a piece of development should ever be in a discrete “In Testing” phase either. Generally I’m not too concerned about calling it one or the other; I’m much more interested in what people are actually doing. I haven’t seen any of the dire warnings about using “quality assurance” come true where I am now, but I’m not going to risk encouraging it with an “In QA” phase.

Here’s a third attempt at a better name for QA:

The idea that the developer on my team had was this: if I was so set against calling it “QA”, let’s just take synonyms for “quality” and “assurance” and come up with something that didn’t have all that baggage. He was joking, but I ran with it. I ran with it about 13 thousand times.

Here, come up with some of your own:

This is a little script that will randomly pick alternative terms for “quality assurance”. Very rarely it might actually suggest you stick with “quality assurance”. I do not vouch for any of these being good suggestions, but I think at this point I’m more interested in discussing the merits of “quirk investigation” vs “constitution corroboration” than I am in hearing more complaints about “quality assurance” as a term.
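For the curious, the heart of the generator doesn’t need to be anything fancier than this. Here’s a minimal sketch in Python; the word lists are my own short stand-ins, not the actual lists behind the generator (which run to roughly 13 thousand combinations):

```python
import random

# Stand-in synonym lists; the real generator uses much longer ones.
QUALITY_SYNONYMS = [
    "quality", "calibre", "worth", "merit", "constitution",
    "character", "condition", "soundness", "quirk", "virtue",
]
ASSURANCE_SYNONYMS = [
    "assurance", "corroboration", "investigation", "verification",
    "scrutiny", "appraisal", "vindication", "confirmation", "audit",
]

def suggest_qa_synonym() -> str:
    """Pick one synonym for each word and mash them together."""
    return f"{random.choice(QUALITY_SYNONYMS)} {random.choice(ASSURANCE_SYNONYMS)}"

if __name__ == "__main__":
    # Occasionally this will land right back on plain old "quality assurance".
    print(suggest_qa_synonym())
```

Wire something like that up to a scheduled job and you have the daily-tweeting robot, more or less.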

The standalone link is here if you want to keep generating more ideas, and I even made a helpful Twitter robot that’ll tweet out a new idea every day. Hit me up on my Twitter or leave a comment if you want to make sure your favourite synonyms are included. Let the pedantry begin!

The Greg Score: 12 Steps to Better Testing

Ok, I’ll admit right off the bat that this post is not going to give you 12 steps to better testing on a silver platter, but bear with me.

A while back, I was trying to figure out a way for agile teams without a dedicated tester or QA expert to recognize bottlenecks and inefficiencies in their testing processes. I wanted a way of helping teams see where they could improve their testing by sharing the expertise that already existed elsewhere in the company. I had been reading about maturity models, and though they can be problematic (more on that later), it led me to try to come up with a simple list of good practices teams could aim to adopt.

When I started floating the idea with colleagues and circulating a few early drafts, a friend of mine pointed out that what I was moving towards was a lot like a testing version of the Joel Test:

The Joel Test: 12 Steps to Better Code

Now, to be clear, the Joel Test is 18 years old, and it shows. It’s outdated in a lot of ways, and even a little insulting (“you’re wasting money by having $100/hour programmers do work that can be done by $30/hour testers”). It might be more useful as a representation of where software development was in 2000 than anything else, but some parts of it still hold up. The concept was there, at least. The question for me was: could I come up with a similarly simple list of practices for testing that teams could use to get some perspective on how they were doing?

A testers’ version of the Joel Test

In my first draft I wrote out ideas for good practices, why a team would adopt each one, and examples of how each would apply to some of the specific products we worked on. I came up with 20-30 ideas after that first pass. A second pass cut that nearly in half after grouping similar things together, rephrasing some to better expose the core ideas, and getting feedback from testers on a couple of other teams. I don’t have a copy of the list that we came up with any more, but if I were to come up with one today off the top of my head it might include the following (with a throwaway scoring sketch after the list):

  1. Do tests run automatically as part of every build?
  2. Do developers get instant feedback when a commit causes tests to fail?
  3. Can someone set up a clean test environment instantly?
  4. Does each team have access to a test environment that is independent of other teams?
  5. Do you keep a record of test results for every production release?
  6. Do you discuss as a team how features should be tested before writing any code?
  7. Is test code version controlled in sync with the code it tests?
  8. Does everybody contribute to test code?
  9. Are tests run at multiple levels of development?
  10. Do tests reliably pass when nothing is wrong?
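If it helps to picture how a list like this would actually be used, the scoring is exactly as blunt as the Joel Test’s: one point for every “yes”. A throwaway sketch, with a hypothetical team’s answers standing in for real ones:

```python
# Joel Test style scoring: one point for every "yes" answer.
def greg_score(answers: dict[str, bool]) -> int:
    return sum(answers.values())

# A hypothetical team's answers to the ten questions above (abbreviated).
example = {
    "tests run on every build": True,
    "instant feedback on failures": True,
    "clean test environment on demand": False,
    "independent team environments": False,
    "test results recorded per release": False,
    "test approach discussed before coding": True,
    "test code versioned with product code": True,
    "everybody contributes to test code": False,
    "tests at multiple levels": False,
    "tests pass when nothing is wrong": False,
}

print(f"{greg_score(example)}/{len(example)}")  # -> 4/10
```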

I’m deliberately writing these to be somewhat general now, even though the original list included a lot of technical details about our products and existing process. After I left the company, someone I had worked with on the idea joked with me that they had started calling the list the “Greg Score”. Unfortunately the whole enterprise was more of a spider than a starfish, and as far as I know it never went anywhere after that.

I’m not going to go into detail about what I mean by each of these or why I thought to include them today, because I’m not actually here trying to propose this as a model (so you can hold off on writing that scathing takedown of why this is a terrible list). I want to talk about the idea itself.

The problem with maturity models

When someone recently used the word “mature” in the online community in reference to testing, it sparked immediate debate about what “maturity” really means and whether it’s a useful concept at all. Unsurprisingly, Michael Bolton has written about this before, coming down hard against maturity models, in particular the TMMi. Despite those arguments, the only problem I really see is that the TMMi is someone else’s model for what maturity means. It’s a bunch of ideas about how to do good testing prioritized in a way that made sense to the people writing it at the time. Michael Bolton just happens to have a different idea of what a mature process would look like:

A genuinely mature process shouldn’t emphasize repeatability as a virtue unto itself, but rather as something set up to foster appropriate variation, sustainability, and growth. A mature process should encourage risk-taking and mistakes while taking steps to limit the severity and consequence of the mistakes, because without making mistakes, learning isn’t possible.

— Michael Bolton, Maturity Models Have It Backwards

That sounds like the outline for a maturity model to me.

In coming up with my list, there were a couple of things I wanted to emphasize.

One: This wasn’t about comparing teams to say one is better than another. There is definitely a risk it could be turned into a comparison metric if poorly managed, but even if you wanted to, it should prove impossible pretty quickly, because:

Two: I deliberately tried to describe why a team would adopt each idea, not why they should. That is, I wanted to make it explicit that if the reasons a team would consider adopting a process didn’t exist, then they shouldn’t adopt it. If I gave this list to 10 teams, they’d all find at least one thing on it that they’d decide wasn’t important to their process. Given that, who cares if one team has 2/10 and another has 8/10, as long as they’re both producing the appropriate level of quality and value for their contexts? Maybe the six ideas in between don’t matter in the same way to each team, or wouldn’t have the same impact even if you did implement them.

Three: I didn’t make any claims that adopting these 10 or 12 ideas would equate to a “fully mature” or “complete” process; they were just the top 10 or 12 ideas that this workgroup of testers decided could offer the best ROI for teams in need. It was a way of offering some expertise, not of imposing a perfect system.

Different models for different needs

This list doesn’t have everything on it that I would have put on it two years ago, and it likely has things on it that I’ll disagree with two years from now. (Actually, I wrote that list a couple of days ago and I’m already raising an eyebrow at a couple of them.) I have no reason to expect that this list would make a good model for anybody else. I don’t even have any reason to expect that it would make a good model for my own team, since I didn’t get their input on it. Even if it were, I wouldn’t score perfectly on it, and if anyone could, that would mean the list is out of date or no longer useful.

What I do suggest is to try to come up with a list like this for yourself, in your own context. It might overlap with mine and it might not. What are the key aspects of your testing that you couldn’t do without, and what do you wish you could add? It would be very interesting to have as many testers as possible write their own 10-point rubric for their ideal test process to see how much overlap there actually is, and then do it again in a year or two to see how it’s changed.

 

Tests vs Checks and the complication of AI

Much is made in the testing literature of the difference between tests and checks. This seems to have been championed primarily by Michael Bolton and James Bach (with additional background information here), though it has not been without debate. I don’t have anything against the distinction such as it is, but I do think it’s interesting to look at whether it’s really a robust one.

The definitions, just as a starting point, are given in their most recent iteration as:

Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.

These come with a host of caveats and clarifications; do go back and read the full explanation if you haven’t already. The difference does not seem intuitive from the words themselves, which may be why there is such debate. Indeed, I’ve never seen anybody other than testing professionals make the distinction, so in normal usage I almost exclusively hear “test” used, and never “check”. Something I might call an automated test, others might call—and insist that it be called—an automated (or machine) check. This is just a consequence of working day-to-day with developers, not with testing nerds who might care about the difference.
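To make the difference concrete, here is the sort of thing I would casually call an automated test and the definitions above would label a machine check. The `apply_discount` function is a made-up stand-in for the product being observed:

```python
# A made-up stand-in for some piece of the product under test.
def apply_discount(price: float, percent: float) -> float:
    return round(price * (1 - percent / 100), 2)

def check_discount_is_applied() -> None:
    """A machine check: a specific observation of the product (the returned
    price) run through an algorithmic decision rule (the assertion)."""
    observed = apply_discount(100.00, 15)  # specific observation of the product
    assert observed == 85.00               # algorithmic decision rule

if __name__ == "__main__":
    check_discount_is_applied()
    print("check passed")
```

By those definitions, the assertion is the check; deciding that the discount was worth checking at all, and working out what it means when it fails, is presumably the part reserved for testing.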

Along those lines, I also find it interesting that this statement, still quoting from James Bach’s blog:

One common problem in our industry is that checking is confused with testing. Our purpose here is to reduce that confusion.

goes by with little explanation. There is a clear desire to differentiate between what a human can do and what a computer can do. The analogy in the preamble to craftspeople being replaced by factory workers tries to illustrate the problem, but I’m not sure it really works. The factory model also has advantages and requires its own, different, set of skilled workers. I may just be lucky in that I haven’t ever worked in an environment where I was under pressure to blindly automate everything and dismiss the value humans bring to the process, so I’ve never needed the linguistic backing to argue against that. That affords me the privilege of wondering whether this distinction has come about only because of a desire to differentiate between what a computer and a human can do, or because there actually is a fundamental difference.

Obviously, as far as what is possible today, there is no argument. But the more we see AI coming into use in testing, the more difficult this distinction will become. If I have an AI that knows how to use different kinds of apps, and I can give it an app without giving it any specific instructions, what is it doing? Can it ever be testing, or is it by definition only checking? There are AI products being pushed by vendors now that can report differences between builds of an app, though for now these don’t seem to be much more than a glorified diff tool or a monitor for new error messages.

Nonetheless, it’s easy to imagine more and more advanced AIs that can better and better mimic what a real end user (or millions of end users) might do and how they would react. Maybe it can recognize UI elements or simulate the kinds of swipe gestures people make on a phone. Think of the sort of thing I, as a human user, might do when exploring a new app: click everywhere, swipe different ways, move things around, try changing all the settings to see what happens, etc. It’s all that exploration, experimentation, and observation that’s under the “testing” definition above, with some mental model of what I expect from each kind of interaction. I don’t think there’s anything there that an AI fundamentally can’t do, but even then, there would be some kind of report coming out the other end about what the AI found that would have to be evaluated and acted upon by a human. Is the act of running the AI itself the test, and everything else it does just checks? If you’re the type that wants to say that “testing” by its nature can’t be automated, then do you just move the definition of testing to mean interpreting and acting on the results?
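Even a dumb version of that exploring “user” is easy to sketch, which is partly why the question of what to call its output interests me. Everything below is hypothetical: a made-up `App` interface, purely random inputs, and a log handed back at the end:

```python
import random
from typing import Protocol

class App(Protocol):
    """A made-up interface to whatever app is being explored."""
    def elements(self) -> list[str]: ...
    def tap(self, element: str) -> str: ...
    def swipe(self, direction: str) -> str: ...

def explore(app: App, steps: int = 100) -> list[str]:
    """Blindly poke at the app, recording what happened at each step.

    The clicking and swiping is easy to mechanize; reading the log
    afterwards is the part that still needs a human (or a much smarter AI).
    """
    log: list[str] = []
    for _ in range(steps):
        if app.elements() and random.random() < 0.7:
            element = random.choice(app.elements())
            log.append(f"tapped {element}: {app.tap(element)}")
        else:
            direction = random.choice(["up", "down", "left", "right"])
            log.append(f"swiped {direction}: {app.swipe(direction)}")
    return log
```

Swap the random choices for something learned from real usage data and you get the more advanced explorer imagined above; either way, a report comes out the other end and a human still has to decide what it means.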

This passage addresses something along those lines, and seems to answer “yes”:

This is exactly parallel with the long established convention of distinguishing between “programming” and “compiling.” Programming is what human programmers do. Compiling is what a particular tool does for the programmer, even though what a compiler does might appear to be, technically, exactly what programmers do. Come to think of it, no one speaks of automated programming or manual programming. There is programming, and there is lots of other stuff done by tools. Once a tool is created to do that stuff, it is never called programming again.

Unfortunately “compiling” and “programming” are both a distracting choice of words for me (few of the tools I use involve compiling and the actual programming is the least interesting and least important step in producing software). More charitably, perhaps as more and more coding becomes automated (and it is), “programming” as used here becomes the act of deciding how to use those tools to get to some end result. When thinking about how the application of AI might confuse “tests” vs “checks”, this passage stuck out because it reminded me of another idea I’ve heard which I can only paraphrase: “It’s only called machine learning (or AI) until it works, then it’s just software”. Unfortunately I do not recall who said that or if I am even remembering the point correctly.

More to the point, James also notes in the comments:

Testing involves creative interpretation and analysis, and checking does not

This too seems to be a position that, as AI becomes more advanced and encroaches on areas previously thought to be exclusive to human thought, will be difficult to hold on to. Again, I’m not making the argument that an AI can replace a good tester any time soon, but I do think that sufficiently advanced tools will continue to do more and more of what we previously thought was not possible. Maybe the bar will be so high that expert tester AIs are never in high enough demand to be developed, but could we one day get to the point where the main responsibility a human “tester” has is checking the recommendations of tester AIs?

I think it’s more likely that the addition of real AIs to testing just means less checking that things work, and more focus on testing whether they actually do the right thing. Until AIs can predict what customers or users want better than the users themselves, we humans should still have plenty to do, but that distinction is a different one than just “test” vs “check”.