Unit tests versus the unit tested

I recently read the great and oft-cited article about testing microservice architectures by Cindy Sridharan over on Medium. It’s broadly applicable beyond just “microservices”, so I highly recommend giving it a read. I was struck by this passage in particular:

The main thrust of my argument wasn’t that unit testing is completely obviated by end-to-end tests, but that being able to correctly identify the “unit” under test might mean accepting that the unit test might resemble what’s traditionally perceived as “integration testing” involving communication over a network.
— Cindy Sridharan, “Testing Microservices, the Sane Way”

Articulating what you’re actually trying to test is one of the most underrated skills in testing. It seems like a simple question to ask, but it’s often a forgotten one. Asking “what information does this test give me that no other test does?” is a great heuristic for determining whether a test, especially an automated one, is worth keeping. It’s also a convenient go-to when evaluating whether a tester knows what they’re doing, and when trying to understand what a test does.

What interests me about this passage is that Cindy is highlighting how “the thing being tested” gets conflated with the “unit” in “unit testing”.

When I used to train new testers at my company, I would present four levels of testing: unit, component, integration, and system. Invariably, people would ask what the difference between component and integration was. Unit tests were easy because that tended to be the first (and sometimes only) kind of testing incoming devs were familiar with. System tests were easily understood as the ones at the end with everything up and running. But when are you testing a component versus an integration? It’s not obvious, so it’s no surprise that three-tiered descriptions are much more common these days.

For us, testing a component sometimes meant testing a single service. At other times it meant testing a well-defined subsystem of a larger program (one that probably would have been a microservice had we built the system from scratch). Inputs and outputs were controlled directly, and there were no other running services it needed to communicate with. We defined a “unit” as the smallest possible thing that could be tested (often a single function), and the “component” was the actual running program.

Is that different from an “integration” test of those units? Not really. Running the component is an integration of smaller units. But it was convenient for us to separate those tests from testing how separate services communicated with each other. You can ask the same question anywhere along the continuum from function to system. Is testing a class a unit test or a component test? Where do you draw the lines?
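
To make the distinction we used more concrete, here’s a toy sketch in Python (pytest-style, with all names invented for this post): one test exercises a single function in isolation, the other drives a small “component” that assembles those functions behind one entry point.

```python
import json


def parse_qty(payload: str) -> int:
    """A 'unit': the smallest thing we can test, a single function."""
    return json.loads(payload)["qty"]


class OrderComponent:
    """Stands in for 'running the actual program': the assembled units
    behind one entry point, with inputs and outputs controlled directly."""

    def __init__(self):
        self.accepted = []

    def handle(self, payload: str) -> str:
        qty = parse_qty(payload)  # the component is an integration of units
        if qty <= 0:
            return "rejected"
        self.accepted.append(qty)
        return "accepted"


def test_parse_qty():
    # Unit test: one function, no other moving parts.
    assert parse_qty('{"qty": 2}') == 2


def test_order_component():
    # Component test: the whole (toy) program, but no other services running.
    component = OrderComponent()
    assert component.handle('{"qty": 2}') == "accepted"
    assert component.handle('{"qty": 0}') == "rejected"
```

Whether you call the second of those a unit test, a component test, or a small integration test is exactly the labelling question at issue here.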

The confusion highlighted in the passage above, to me, comes down to different definitions of “unit”. If you want to call the thing you’re testing, whether a service or a behaviour, a “unit” instead of a “component”, go for it. If you want to call the communication between two services the “unit” (as her article does), great. This ambiguity should not be an obstacle to understanding the point: you should know what you’re testing and have a reason for testing it.

And, of course, that you should be testing the most important things. Which means, for example, don’t mock away the database if your service’s main responsibility is writing to the database. The problem isn’t that Cindy is saying “unit tests should be communicating over the network,” though you might read it that way if you’re dogmatic about the term “unit test”. She’s saying “communicating over the network is important, so I’m going to prioritize that communication as the subject (unit) of my tests.”
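
A minimal sketch of that point, assuming Python and using sqlite3 as a stand-in for whatever the real database is (names invented): the first test mocks the connection away and passes without telling you much, while the second exercises an actual, if throwaway, in-memory database, which is where the service’s real responsibility lives.

```python
import sqlite3
from unittest import mock


def save_user(conn, name: str) -> None:
    """The behaviour that actually matters: writing to the database."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))
    conn.commit()


def test_save_user_with_mock():
    # Mocks away the database: passes, but proves almost nothing about
    # the thing this code exists to do.
    conn = mock.Mock()
    save_user(conn, "alice")
    conn.execute.assert_called_once()


def test_save_user_against_real_database():
    # Uses a real (in-memory) database, so the subject of the test is the
    # write itself — closer to what the article above is arguing for.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT)")
    save_user(conn, "alice")
    rows = conn.execute("SELECT name FROM users").fetchall()
    assert rows == [("alice",)]
```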

For all these terms, it’s unlikely we’re ever going to settle on unambiguous definitions. Let’s just try to be clear about what we mean when we use them, and clear about what we’re testing with each test.

Tests vs Checks and the complication of AI

There’s a lot made in the testing literature of the difference between tests and checks. The distinction seems to have been championed primarily by Michael Bolton and James Bach (with additional background information here), though it has not been without debate. I don’t have anything against the distinction such as it is, but I do think it’s interesting to look at whether it’s really a robust one.

The definitions, just as a starting point, are given in their most recent iteration as:

Testing is the process of evaluating a product by learning about it through exploration and experimentation, which includes to some degree: questioning, study, modeling, observation, inference, etc.

Checking is the process of making evaluations by applying algorithmic decision rules to specific observations of a product.

These come with a host of caveats and clarifications; do go back and read the full explanation if you haven’t already. The difference does not seem intuitive from the words themselves, which may be why there is such debate. Indeed, I’ve never seen anybody other than testing professionals make the distinction, so in normal usage I almost exclusively hear “test” used, and never “check”. Something I might call an automated test, others might call—and insist that it be called—an automated (or machine) check. This is just a consequence of working day-to-day with developers, not with testing nerds who might care about the difference.
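
For what it’s worth, the kind of thing I’d call an automated test and others would insist is a machine check looks something like this trivial sketch (names and URL are placeholders, not from either article): an algorithmic decision rule applied to a specific observation of the product.

```python
import urllib.request


def homepage_is_up(url: str = "https://example.com") -> bool:
    """A 'check': observe the product, apply a yes/no decision rule."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.status == 200


# The rule is purely algorithmic; interpreting a failure (is it the product,
# the network, or the check itself?) is left to a human.
assert homepage_is_up()
```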

Along those lines, I also find it interesting that this statement, still quoting from James Bach’s blog:

One common problem in our industry is that checking is confused with testing. Our purpose here is to reduce that confusion.

goes by with little explanation. There is a clear desire to differentiate what a human can do from what a computer can do. The analogy in the preamble to craftspeople being replaced by factory workers tries to illustrate the problem, but I’m not sure it really works: the factory model also has advantages and requires its own, different, set of skilled workers. I may just be lucky in that I haven’t ever worked in an environment where I was under pressure to blindly automate everything and dismiss the value humans bring to the process, so I’ve never needed the linguistic backing to argue against that. That affords me the privilege to wonder whether this distinction came about only because of a desire to differentiate between what a computer and a human can do, or because there actually is a fundamental difference.

Obviously, as far as what is possible today, there is no argument. But the more we see AI coming into use in testing, the more difficult this distinction will become. If I have an AI that knows how to use different kinds of apps, and I can give it an app without giving it any specific instructions, what is it doing? Can it ever be testing, or is it by definition only checking? There are AI products being pushed by vendors now that can report differences between builds of an app, though for now these don’t seem to be much more than a glorified diff tool or a monitor for new error messages.

Nonetheless, it’s easy to imagine more and more advanced AIs that can better and better mimic what a real end user (or millions of end users) might do and how they would react. Maybe it can recognize UI elements or simulate the kinds of swipe gestures people make on a phone. Think of the sort of thing I, as a human user, might do when exploring a new app: click everywhere, swipe different ways, move things around, try changing all the settings to see what happens, etc. It’s all that exploration, experimentation, and observation that’s under the “testing” definition above, with some mental model of what I expect from each kind of interaction. I don’t think there’s anything there that an AI fundamentally can’t do, but even then, there would be some kind of report coming out the other end about what the AI found that would have to be evaluated and acted upon by a human. Is the act of running the AI itself the test, and everything else it does just checks? If you’re the type that wants to say that “testing” by its nature can’t be automated, then do you just move the definition of testing to mean interpreting and acting on the results?

This passage addresses something along those lines, and seems to answer “yes”:

This is exactly parallel with the long established convention of distinguishing between “programming” and “compiling.” Programming is what human programmers do. Compiling is what a particular tool does for the programmer, even though what a compiler does might appear to be, technically, exactly what programmers do. Come to think of it, no one speaks of automated programming or manual programming. There is programming, and there is lots of other stuff done by tools. Once a tool is created to do that stuff, it is never called programming again.

Unfortunately, “compiling” and “programming” are both distracting word choices for me (few of the tools I use involve compiling, and the actual programming is the least interesting and least important step in producing software). More charitably, perhaps as more and more coding becomes automated (and it is), “programming” as used here becomes the act of deciding how to use those tools to get to some end result. When thinking about how the application of AI might confuse “tests” vs “checks”, this passage stuck out because it reminded me of another idea I’ve heard and can only paraphrase: “It’s only called machine learning (or AI) until it works; then it’s just software.” I do not recall who said that, or whether I’m even remembering the point correctly.

More to the point, James also notes in the comments:

Testing involves creative interpretation and analysis, and checking does not

This too seems to be a position that, as AI becomes more advanced and encroaches on areas previously thought to be exclusive to human thought, will be difficult to hold on to. Again, I’m not making the argument that an AI can replace a good tester any time soon, but I do think that sufficiently advanced tools will continue to do more and more of what we previously thought was not possible. Maybe the bar will be so high that expert tester AIs are never in high enough demand to be developed, but could we one day get to the point where the main responsibility a human “tester” has is checking the recommendations of tester AIs?

More likely, I think, the addition of real AIs to testing just means less checking that things work and more focus on testing whether they actually do the right thing. Until AIs can predict what customers or users want better than the users themselves, we humans should still have plenty to do, but that distinction is a different one than just “test” vs “check”.