Unit tests versus the unit tested

I recently read the great and oft-cited article about testing microservice architectures by Cindy Sridharan over on Medium. It’s broadly applicable beyond just “microservices”, so I highly recommend giving it a read. I was struck by this passage in particular:

The main thrust of my argument wasn’t that unit testing is completely obviated by end-to-end tests, but that being able to correctly identify the “unit” under test might mean accepting that the unit test might resemble what’s traditionally perceived as “integration testing” involving communication over a network.
— Cindy Sridharan, “Testing Microservices, the Sane Way”

Articulating what you’re actually trying to test is one of the most underrated skills in testing. It seems like a simple question to ask, but it is often forgotten. Asking “what information does this test give me that no other test does?” is a great heuristic for determining whether a test, especially an automated one, is worth keeping. It’s also a convenient go-to when evaluating whether a tester knows what they’re doing, and when trying to understand what a test does.

What interests me about this passage is that Cindy is highlighting how “the thing being tested” gets conflated with the “unit” in “unit testing”.

When I used to train new testers at my company, I would present four levels of testing: unit, component, integration, and system. Invariably, people would ask what the difference between component and integration was. Unit tests were easy because that tended to be the first (and sometimes only) kind of testing incoming devs were familiar with. System tests were easily understood as the ones at the end with everything up and running. But when are you testing a component versus an integration? It’s not obvious, so it’s no surprise that three-tiered descriptions are much more common these days.

For us, a component was sometimes a single service. At other times it was a well-defined subsystem of a larger program (one that probably would have been a microservice if we had built the system from scratch). Inputs and outputs were controlled directly, and no other services needed to be running for it to communicate with. We defined a “unit” as the smallest possible thing that could be tested (often a single function) and the “component” was running the actual program.
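To make that distinction concrete, here is a minimal sketch in Python. The word-counting program is entirely made up for illustration; it is not anything we actually shipped. The point is only that the “unit” test calls one function directly, while the “component” test runs the program through its entry point with inputs and outputs we control.

```python
import io


def word_count(text):
    """The 'unit': the smallest testable piece, a single function."""
    return len(text.split())


def main(stdin, stdout):
    """The 'component': the actual program, with inputs and outputs we control."""
    for line in stdin:
        stdout.write(f"{word_count(line)}\n")
    return 0


def test_unit():
    assert word_count("know what you test") == 4


def test_component():
    # Drive the whole program through its entry point, but with in-memory
    # streams, so no other service needs to be running.
    out = io.StringIO()
    exit_code = main(io.StringIO("one two\nthree\n"), out)
    assert exit_code == 0
    assert out.getvalue() == "2\n1\n"
```

Whether you call the second test a “component” test or an “integration” test of the units inside `main` is exactly the naming question at issue.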

Is that different from an “integration” test of those units? Not really. Running the component is an integration of smaller units. But it was convenient for us to separate those tests from testing how separate services communicated with each other. You can ask the same question anywhere along the continuum from function to system. Is testing a class a unit test or a component test? Where do you draw the lines?

The confusion highlighted in the passage above, to me, is only because of different definitions of “unit”. If you want to call the thing, the service or behaviour, that you’re testing a “unit” instead of a “component”, go for it. If you want to call the communication between two services the “unit” (as this article does), great. This ambiguity should not be an obstacle to understanding the point: you should know what you’re testing and have a reason for testing it.

And, of course, that you should be testing the most important things. Which means, for example, don’t mock away the database if your service’s main responsibility is writing to the database. The problem isn’t that Cindy is saying “unit tests should be communicating over the network,” though you might read it that way if you’re dogmatic about the term “unit test”. She’s saying “communicating over the network is important, so I’m going to prioritize that communication as the subject (unit) of my tests.”
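As a rough sketch of the database example, assume a hypothetical `save_order` function whose whole job is to write a row. The function, the table, and the schema are all invented for illustration, and an in-memory SQLite database stands in for whatever real database the service would use. The test goes through a real database connection rather than a mock, because the write is the thing being tested.

```python
import sqlite3


def save_order(conn, customer, total_cents):
    """Hypothetical service function: its main job is writing to the database."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO orders (customer, total_cents) VALUES (?, ?)",
            (customer, total_cents),
        )
    return cur.lastrowid


def make_test_db():
    """Create a throwaway in-memory database with the (assumed) schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE orders ("
        "id INTEGER PRIMARY KEY, "
        "customer TEXT NOT NULL, "
        "total_cents INTEGER NOT NULL)"
    )
    return conn


def test_save_order_writes_a_row():
    conn = make_test_db()
    order_id = save_order(conn, "alice", 1250)
    row = conn.execute(
        "SELECT customer, total_cents FROM orders WHERE id = ?", (order_id,)
    ).fetchone()
    assert row == ("alice", 1250)
```

A mocked connection could only confirm that `execute` was called with some string. This version also catches a wrong column name or a broken constraint, which is the kind of information no other test would give you.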

For all these terms it’s unlikely we’re ever going to settle on unambiguous definitions. Let’s just try to be clear about what we mean when we use them, and clear about what we’re testing with each test.

If you didn’t test it, it doesn’t work

Gary Bernhardt has a great talk online from 2012 called Boundaries, about how design patterns influence, for better and worse, the testing that can be done. In particular he advocates a “core” and “shell” model, having many independent functional cores and one orchestrating shell around them. The idea is that each functional core can be tested independently from the others more effectively by not having to worry about mocks and dependencies. The tests of each core can be greatly simplified, while focusing on what it’s actually supposed to do.
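Here is a small sketch of what that shape can look like, in Python. This is my own toy illustration and not the code from the talk: the `Cursor` type, the key bindings, and the `run` function are all assumptions. The core is a pair of pure functions over an immutable value, and the shell is a loop that does nothing but route keystrokes to them.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cursor:
    """Immutable value: which item in a list is currently selected."""
    index: int


# --- functional core: pure functions, no I/O, nothing to mock ---
def move_down(cursor, item_count):
    return Cursor(min(cursor.index + 1, item_count - 1))


def move_up(cursor):
    return Cursor(max(cursor.index - 1, 0))


# --- imperative shell: wires keystrokes to the core ---
def run(read_key, render, items):
    """Loop until 'q' is pressed, then return the final cursor."""
    cursor = Cursor(0)
    while True:
        render(items, cursor)
        key = read_key()
        if key == "j":
            cursor = move_down(cursor, len(items))
        elif key == "k":
            cursor = move_up(cursor)
        elif key == "q":
            return cursor
```

Tests of the core collapse to one-liners such as `assert move_down(Cursor(2), 3) == Cursor(2)`, with no setup at all. The shell takes `read_key` and `render` as parameters here only so the sketch can run without a terminal.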

The concept is definitely solid, but I do take issue with one thing he says towards the end. I’ll acknowledge upfront that I’m drawing much more meaning from his choice of words than I think was intended; it just happened to strike a nerve. It happens at about the 29m30s mark, when he’s describing how small the tests for the functional cores became in his example program. As for the shell, the orchestrating code responsible for connecting the wires between these cores, he says:

The shell is actually untested. The only conditional in the shell is the case statement that we saw that was choosing between keys. There’s no other decision made in the entire shell. So I don’t even really feel the need to test it. I fire the program up, I hit ‘j’, as long as that worked, pretty much the whole thing works.

The obvious objection here is that just because the cores work independently doesn’t mean that the shell manipulates those cores correctly. Conditionals aren’t the only things that can have bugs in them. But that’s not my objection. It’s this:

Firing the program up and hitting ‘j’ to see if that works is testing it.

Are there other tests he could do? Sure. At a minimum, ‘j’ isn’t the only keystroke the program supports. So he’s made a risk assessment that the shell is simple enough that this is the only test he needs. It might be an informal assessment, even an unconscious one, but it’s still based on the foundation of unit tests, his knowledge of how the program works, and how it’s going to be used. There’s absolutely nothing wrong with that.

Has he automated his test? No. But is something only a test if it’s automated? Also no. Much like he made a risk assessment that he only needs one test, there’s also a judgement call (perhaps implicit) that this one test isn’t worth automating. There could be any number of reasons for making that call. That doesn’t mean the sequence of firing it up and trying it out is something outside of “testing it”.

I think there is a tendency these days to think that a test only counts if it’s automated, but we should know better than that. It’s been argued that the parts we automate shouldn’t even be called “tests”. Whether or not you agree with the terminology, it has to be acknowledged that testing encompasses a lot more than automating some checks.

I think this same sentiment is what motivates the #NoTesting hashtag on Twitter. If you have a traditional, waterfall style, gated, or heavily regimented picture of what software testing is, you’ll exclude from that umbrella all kinds of other activities that people naturally do to poke at their products. “We don’t want that stuffy bottleneck-creating deadline-breaking time-suck kind of testing around here”, they say, “so therefore we don’t want testing at all.” I bring this up not to put Gary into that camp—in the context of the talk this quibble of mine wouldn’t affect the point he was making—but to point out that people do take this idea too far.

Coincidentally, the same day I came across this talk, I also read The Honest Manual Writer Heuristic by Michael Bolton. It ends with a line that captures this idea that people do a lot more testing than they realize:

Did you notice that I’ve just described testing without using the word “testing”?

If you didn’t test it, it doesn’t work.

Or, to make it perfectly clear what I’m getting at, let’s phrase it like this:

If you know it works, you must have tested it.

Testing is like a box of rocks

I was inspired today by Alan Page’s Test Automation Snowman. He makes all good points, but let’s be honest, the model is the same as the good ol’ test pyramid. The only difference is that he’s being explicit about tests at the top of the pyramid being slow and tests at the bottom being fast. Ok, so maybe the snowman thing is a joke, but it did make me think about what might make a better visualization. I quickly doodled something on a sticky note:

A sticky note with a lot of tiny circles in the bottom third, medium circles in the middle third, and a few large circles in the top third.

If the point we want to emphasize is that UI tests are slow (and therefore take a lot of time), we should include that in the visualization! The problem with the pyramid (and the snowman) is that the big tests take up the least amount of space; the small triangle at the top makes it look like having fewer UI tests also means you do less UI testing.

It doesn’t.

At least, not proportionately. If you had an equal number of UI and unit tests, it’s a safe bet that you’re going to spend more of your time working on the UI tests.

So instead, let’s say testing is like a box of rocks. Each rock is a test, and I have to choose how to allocate the space in that box to fit all the rocks that I want. A few big rocks are going to take up a lot more space than a bunch of tiny pebbles. Unless I have a good reason why that big boulder is a lot more interesting than the hundred little rocks I could put in its place, I’m going to go for the little rocks! If I have to add a new rock (to kill a new bug, say) I probably want to choose the smallest one that’ll still do the job.
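To put some rough numbers on the box, here is a small Python sketch. The suite below is invented purely to show the shape of the argument, and seconds of run time are only a stand-in for the real cost of a test, which also includes writing and maintaining it. It compares how much of the box each level takes up by count and by size.

```python
# Hypothetical suite: (level, number of tests, seconds per test).
# These numbers are made up for illustration only.
SUITE = [
    ("unit", 500, 0.01),
    ("api", 50, 1.0),
    ("ui", 10, 30.0),
]


def shares(suite):
    """Return {level: (share of test count, share of total run time)}."""
    total_count = sum(n for _, n, _ in suite)
    total_time = sum(n * secs for _, n, secs in suite)
    return {
        level: (n / total_count, n * secs / total_time)
        for level, n, secs in suite
    }


if __name__ == "__main__":
    for level, (by_count, by_time) in shares(SUITE).items():
        print(f"{level:>4}: {by_count:6.1%} of tests, {by_time:6.1%} of run time")
```

With these made-up numbers, the ten UI tests are under 2% of the rocks by count but take up more than 80% of the box, which is the picture the pyramid hides.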

You can still think about the different levels (unit vs API vs UI, for example) if you picture the little rocks at the bottom forming a foundation for bigger rocks on top. I don’t know if rocks work like that. Just be careful not to get this whole thing confused with that dumb life metaphor.

Ok, it might not be the best model, but I’ll stick with it for now. And like Alan’s snowman, you’re free to ignore this one too.