Seven things I learned at CAST

My CAST debrief from last week ended up being mostly about personal reflection, but I also wanted to call out some of the things I picked up that were more specific to testing. After going through my notes I picked out what I thought were the important themes or interesting insights. There are many other tidbits, interactions, and ideas from those notes that I may write about later; these are just the top seven that I think will have an impact right away.

Each section here is paraphrased from the notes I took in each speaker’s session.

1. “Because we don’t like uncertainty we pretend we don’t have it”
Liz Keogh presented a keynote on using Cynefin as a way of embracing uncertainty and chaos in how we work. We are hard-wired to see patterns where they don’t exist. “We treat complexity as if it’s predictable.” People experience the highest stress when shocks are unpredictable, so let’s at least prepare for them. To be safe to fail you have to be able to (1) know if it works, (2) amplify it if it does work, (3) know if it fails, (4) dampen it if it does fail, and (5) have a realistic reason for thinking there will be a positive impact in the first place. It’s not about avoiding failure, it’s about being prepared for it. The linked article above is definitely worth the read.

2. Testers are designers
From John Cutler’s keynote: “Design is the rendering of intent”. Anybody who makes decisions that affect that rendering is a designer. What you test, when you test, who you include, your mental models of the customer, and how you talk about what you find all factor in. UX researchers have to know that they aren’t the user, but often the tester has used the product more than anybody else in the group. There’s no such thing as a design solution, but there is a design rationale. There should be a narrative around the choices we make for our products, and testers provide a ton of the information for that narrative. Without that narrative, no design is happening (nor any testing!) because there’s no sense of the intent.

3. Knowledge goes stale faster than tickets on a board do
From Liz Keogh’s lightning talk: If testers are at their WIP limit and devs can’t push any more work to them, your team will actually go faster if you make the devs go read a book rather than take on new work that will just pile up. You lose more productivity in catching up on work someone has already forgotten about than you gain in “getting ahead”. In practice, of course, you should have the devs help the testers when they’re full and the testers help the devs when they have spare capacity. (By the way, it’s not a Kanban board unless you have a signal for when you can take on new work, like an empty slot in a column. That empty slot is the kanban.)

4. “No matter how it looks at first, it’s always a people problem” – Jerry Weinberg
I don’t recall which session this quote first came up in, but it became a recurring theme in many of the sessions and discussions, especially around coaching testing, communicating, and whole-team testing. Jerry passed away just before the conference started, and though I had never met him he clearly had a strong influence on many of the people there. He’s probably most often cited for his definition that “quality is value to some person”.

5. Pair testing is one of the most powerful techniques we have
Though I don’t think anybody said this explicitly, this was evident to me from how often the concept came up. Lisi Hocke gave a talk about pairing with testers from other companies to improve her own testing while cross-pollinating testing ideas and skills with the wider community. Amit Wertheimer cited pairing with devs as a great way to identify tools and opportunities to make their lives easier. Jose Lima talked about running group exploratory testing sessions and the benefits that brings in learning about the product and coaching testing. Of course, coaching itself is, I think, a form of pairing, so the tutorial with Anne-Marie Charrett contributed to this theme as well. This is something that I need to do more of.

6. BDD is not about testing, it’s about collaboration
From Liz Keogh again: “BDD is an analysis technique, it’s not about testing.” It’s very hard to refactor English, and non-technical people rarely read the Cucumber scenarios anyway; she says to just use a DSL instead. If you’re just implementing Cucumber tests but aren’t having a three-amigos-style conversation with the business about what the scenarios should be, it isn’t BDD. Angie Jones emphasized the same points when introducing Cucumber in her automation tutorial, as a caveat that she was only covering the automation part of BDD, not BDD itself. Though I’ve worked in styles that called themselves “behaviour driven”, I’ve never worked with actual “BDD”, and this was the first time I’ve heard of it being more than a way of automating test cases.
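As a concrete illustration of the “just use a DSL instead” point, here is a minimal sketch of a scenario written as readable code rather than as Gherkin text maintained separately. Nothing here comes from the talks; the AppDriver class and its methods are hypothetical stand-ins for whatever drives the real application.

```python
# A minimal sketch, assuming a hypothetical AppDriver. The Gherkin version
# would read roughly
#
#   Given a registered user
#   When they log in with the wrong password
#   Then they see an error message
#
# but here the same scenario lives directly in code.

class AppDriver:
    """Hypothetical stand-in for whatever drives the real application."""

    def __init__(self):
        self.users = {}
        self.error = None

    def register_user(self, name, password):
        self.users[name] = password

    def log_in(self, name, password):
        if self.users.get(name) != password:
            self.error = "Invalid credentials"

    def error_message(self):
        return self.error


def test_login_with_wrong_password_shows_error():
    app = AppDriver()
    app.register_user("liz", "s3cret")     # Given a registered user
    app.log_in("liz", "wrong-password")    # When they log in with the wrong password
    assert app.error_message() == "Invalid credentials"  # Then they see an error message
```

The point isn’t the syntax; the conversation about what the scenario should be has to happen either way, and code like this is much easier to refactor than the equivalent English.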

7. Want to come up with good test ideas? Don’t read the f*ing manual!
From Paul Holland: Detailed specifications kill creativity. Start with high-level bullet points and brainstorm from there. Share ideas round-robin (think of playing “yes, and”) to build a shared list. Even dumb ideas can trigger good ideas. Encourage even nonsensical ideas, since users will do things you didn’t think of. Give yourself time and space away to be creative, and only after you’ve come up with every test idea you can should you start looking at the details. John Cleese is a big inspiration here.

Bonus fact: Anybody can stop a rocket launch
The “T-minus” countdown for rocket launches isn’t continuous; they pause it at various intervals and reset it back to checkpoints when they have to address something. What does this have to do with testing? Just that the launch of the Parker Solar Probe was the weekend after CAST. At 3:30am on Saturday I sat on the beach with my husband listening to the live feed from NASA as the engineers performed the pre-launch checks: starting, stopping, and resetting the clock as needed. I was struck by the fact that at “T minus 1 minute 55 seconds” one calm comment from one person about one threshold being crossed scrubbed the entire launch without any debate. There wouldn’t be time to rewind to the previous checkpoint at T-minus 4 minutes before the launch window closed, so they shut the whole thing down. I’m sure there’s an analogy to the whole team owning gates in their CD pipelines in there somewhere!

Tests vs Checks and the complication of AI

A lot is made in the testing literature of the difference between tests and checks. This seems to have been championed primarily by Michael Bolton and James Bach (with additional background information here), though it has not been without debate. I don’t have anything against the distinction, such as it is, but I do think it’s interesting to look at whether it’s really a robust one.

The definitions, just as a starting point, are given in their most recent iteration as:

Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.

These come with a host of caveats and clarifications; do go back and read the full explanation if you haven’t already. The difference does not seem intuitive from the words themselves, which may be why there is such debate. Indeed, I’ve never seen anybody other than testing professionals make the distinction, so in normal usage I almost exclusively hear “test” used, and never “check”. Something I might call an automated test, others might call—and insist that it be called—an automated (or machine) check. This is just a consequence of working day-to-day with developers, not with testing nerds who might care about the difference.
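For what it’s worth, here is a minimal sketch of the sort of thing I mean, with a hypothetical add() function standing in for the product. I would casually call this an automated test; by the definitions above it is a machine check, because all it does is apply an algorithmic decision rule to a specific observation of the product.

```python
# A minimal sketch, assuming a hypothetical add() function as the product
# under test. By the definitions above this is a check: an algorithmic
# decision rule applied to a specific observation.

def add(a, b):
    return a + b


def check_add_handles_negatives():
    observation = add(-2, 5)   # a specific observation of the product
    assert observation == 3    # an algorithmic decision rule applied to it
    # Deciding that this was worth checking, and deciding what a failure
    # would mean, are the parts the definitions reserve for "testing".
```

The judgement about what to check and what a failure would mean sits outside the code, which seems to be exactly where the definitions draw the line.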

Along those lines, I also find it interesting that this statement, still quoting from James Bach’s blog:

One common problem in our industry is that checking is confused with testing. Our purpose here is to reduce that confusion.

goes by with little explanation. There is a clear desire to differentiate between what a human can do and what a computer can do. The analogy in the preamble to craftspeople being replaced by factory workers tries to illustrate the problem, but I’m not sure it really works; the factory model also has advantages and requires its own, different, set of skilled workers. I may just be lucky in that I haven’t ever worked in an environment where I was under pressure to blindly automate everything and dismiss the value humans bring to the process, so I’ve never needed the linguistic backing to argue against that. That affords me the privilege of wondering whether this distinction has come about only because of a desire to differentiate between what a computer and a human can do, or because there actually is a fundamental difference.

Obviously, as far as what is possible today, there is no argument. But the more we see AI coming into use in testing, the more difficult this distinction will become. If I have an AI that knows how to use different kinds of apps, and I can give it an app without giving it any specific instructions, what is it doing? Can it ever be testing, or is it by definition only checking? There are AI products being pushed by vendors now that can report differences between builds of an app, though for now these don’t seem to be much more than a glorified diff tool or a monitor for new error messages.

Nonetheless, it’s easy to imagine more and more advanced AIs that can better and better mimic what a real end user (or millions of end users) might do and how they would react. Maybe it can recognize UI elements or simulate the kinds of swipe gestures people make on a phone. Think of the sort of thing I, as a human user, might do when exploring a new app: click everywhere, swipe different ways, move things around, try changing all the settings to see what happens, etc. It’s all that exploration, experimentation, and observation that’s under the “testing” definition above, with some mental model of what I expect from each kind of interaction. I don’t think there’s anything there that an AI fundamentally can’t do, but even then, there would be some kind of report coming out the other end about what the AI found that would have to be evaluated and acted upon by a human. Is the act of running the AI itself the test, and everything else it does just checks? If you’re the type that wants to say that “testing” by its nature can’t be automated, then do you just move the definition of testing to mean interpreting and acting on the results?
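Even a dumb, non-AI version of that exploration is easy to sketch; the toy below (the FakeApp and its actions are entirely made up) pokes randomly at a model of an app and emits a report. The question the tests-versus-checks distinction raises is whether anything in it, or in a far smarter version of it, could ever count as testing once a human still has to interpret the output.

```python
import random

# A toy sketch of automated exploration, nowhere near AI: poke randomly at a
# hypothetical model of an app and report anything that blows up. Whatever it
# finds still has to be evaluated and acted upon by a human.

class FakeApp:
    """Entirely made-up stand-in for an app under exploration."""

    def __init__(self):
        self.screen = "home"

    def tap(self, element):
        # A deliberately planted defect for the explorer to stumble into.
        if self.screen == "settings" and element == "reset":
            raise RuntimeError("crash: reset is not implemented")
        self.screen = element

    def swipe(self, direction):
        if direction == "down":
            self.screen = "home"


def explore(app, steps=100, seed=0):
    rng = random.Random(seed)
    report = []
    for _ in range(steps):
        try:
            if rng.random() < 0.7:
                app.tap(rng.choice(["home", "settings", "profile", "reset"]))
            else:
                app.swipe(rng.choice(["up", "down", "left", "right"]))
        except Exception as exc:
            # The tool can only record what happened; deciding whether it
            # matters is still someone's job.
            report.append(f"while on {app.screen!r}: {exc}")
    return report


if __name__ == "__main__":
    for finding in explore(FakeApp()):
        print(finding)
```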

This passage addresses something along those lines, and seems to answer “yes”:

This is exactly parallel with the long established convention of distinguishing between “programming” and “compiling.” Programming is what human programmers do. Compiling is what a particular tool does for the programmer, even though what a compiler does might appear to be, technically, exactly what programmers do. Come to think of it, no one speaks of automated programming or manual programming. There is programming, and there is lots of other stuff done by tools. Once a tool is created to do that stuff, it is never called programming again.

Unfortunately “compiling” and “programming” are both distracting choices of words for me (few of the tools I use involve compiling, and the actual programming is the least interesting and least important step in producing software). More charitably, perhaps as more and more coding becomes automated (and it is), “programming” as used here becomes the act of deciding how to use those tools to get to some end result. When thinking about how the application of AI might confuse “tests” vs “checks”, this passage stuck out because it reminded me of another idea I’ve heard and can only paraphrase: “It’s only called machine learning (or AI) until it works, then it’s just software”. Unfortunately I don’t recall who said that or whether I’m even remembering the point correctly.

More to the point, James also notes in the comments:

Testing involves creative interpretation and analysis, and checking does not

This too seems to be a position that will be difficult to hold on to as AI becomes more advanced and encroaches on areas previously thought to be exclusive to human thought. Again, I’m not arguing that an AI can replace a good tester any time soon, but I do think that sufficiently advanced tools will continue to do more and more of what we previously thought was not possible. Maybe the bar will be so high that expert-tester AIs are never in high enough demand to be developed, but could we one day get to the point where the main responsibility a human “tester” has is checking the recommendations of tester AIs?

More likely, I think, the addition of real AIs to testing just means less checking that things work and more focus on testing whether they actually do the right thing. Until AIs can predict what customers or users want better than the users themselves, we humans should still have plenty to do, but that distinction is a different one than just “test” vs “check”.