How do you know you actually did something?

When I was defending my thesis, one of the most memorable questions I was asked was:

“How do you know you actually did something?”

Not only was it an important question about my work at the time, it has also been a very useful question for me ever since, in at least two ways. In a practical sense, it comes up as something like “did we actually fix that bug?” More foundationally, it can be as simple as “what is this test for?”

Did it actually work?

The original context was a discussion about the efforts I had gone through to remove sources of noise in my data. As I talked about in my previous post, I was using radio telescopes to measure the hydrogen distribution in the early universe. It was a very difficult measurement because, even in the most optimistic models, the signal was expected to be several orders of magnitude dimmer than everything else the telescopes could see. Not only did we have to filter out radio emission from things in the sky we weren’t interested in, but, unsurprisingly, there is also a whole lot of radio coming from Earth. Although we weren’t actually looking in the FM band, it would be a lot like trying to pick out some faint static in the background of the local classic rock radio station.

One of the reasons these telescopes were built in rural India was that there was relatively little radio in use in the area. Some of it we couldn’t do anything about, but it turned out that a fair amount of the radio noise in the area was accidental. The most memorable example was a stray piece of wire that had somehow been caught on some high-voltage power lines and was essentially acting like an antenna, leaking power from the lines and broadcasting it across the countryside.

We developed a way of using the telescopes to scan the horizon for bright sources of radio signals on the ground and triangulate their locations. I spent a good deal of time wandering through the countryside with a GPS device in one hand and a radio antenna in the other, trying to track these sources down. This is what led to what has since become the defining photo of my graduate school experience:

Standing in a field with a radio antenna, next to a cow
Cows don’t care, they have work to do.

Getting back to the question at hand, after spending weeks wandering through fields tightening loose connections, wrapping things in radio shielding, and getting the local power company to clear stray wires off their transmission lines… did we actually fix anything? Did we reduce the noise in our data? Did we make it easier to see the hydrogen signal we were after?

Did we actually fix the bug?

In many ways, getting rid of those errant radio emitters was like removing bugs: noisy little things that were only there by accident and that we could blame for at least some of the noise in our data.

But these weren’t the bugs that you find before you release. Those are the bugs you find because you anticipated them and planned your testing around them. These things, in contrast, were already out in the field. They were the kinds of bugs that come from a user report or from something weird someone notices in the logs. You don’t know what caused them, you didn’t uncover them in testing, and you’re not sure at first what is triggering them. These are the ones that are met only with a hunch, an idea that “this might have to do with X” and “we should try doing Y and that might fix it.”

But how do you actually know that you’ve fixed it? Ideally you should have a test for it, but coming up with a new test that will catch the regression and seeing it pass isn’t enough. The test needs to fail first. If it doesn’t, you can’t show that fixing the bug is what makes it pass. If you weren’t able to see the issue in the first place, then making a fix and still not seeing the issue tells you nothing.
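In code, that discipline might look something like the sketch below (Python, with a made-up parse_record function and a made-up bug standing in for whatever you’re actually chasing):

```python
# Hypothetical regression test: parse_record() and the bug it guards against
# are stand-ins, not anyone's real code.
from mypackage import parse_record  # assumed module under test


def test_parse_record_tolerates_missing_timestamp():
    # Reproduces the report: records without a timestamp used to blow up.
    record = {"id": 42, "timestamp": None}

    result = parse_record(record)

    # Run this against the unfixed code first and watch it fail; only then
    # does a passing run after the fix demonstrate anything.
    assert result["timestamp"] is not None
```

The assertion itself is beside the point; the order of operations is what makes the eventual passing run mean something.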

For us in the field, the equivalent of reproducing the bug was going out with an antenna, pointing it at what we thought was a source, and hearing static on a handheld radio. The first step after fixing it (or after the power company told us they had fixed it) was to go out with the same antenna and see whether the noise had gone away. The next step was turning the telescopes back on and measuring the noise again; push the fix to production and see what happens.

What is this test for?

Where this can go wrong — whether you know there’s a bug there or not — is when you have a test that doesn’t actually test anything useful. The classic example is an automated test that doesn’t actually check anything, but it can just as easily be the test that checks the wrong thing, the test that doesn’t check what it claims, or even the test that doesn’t check anything different from another one.

To me this is just like asking “did you actually do something?”, because running tests that don’t check anything useful doesn’t do anything useful. If your tests don’t fail when there are bugs, then your tests don’t work.
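To make that concrete, here is a hypothetical pair of tests (normalize_username is made up): both exercise the same code, but only one of them can ever fail, so only one of them tells you anything.

```python
from mypackage import normalize_username  # hypothetical function under test


def test_normalize_username_runs():
    # Passes as long as nothing throws. It makes no claim about the result,
    # so almost no bug can ever make it fail.
    normalize_username("  Alice ")


def test_normalize_username_strips_and_lowercases():
    # Makes a claim that can actually be falsified.
    assert normalize_username("  Alice ") == "alice"
```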

In a world where time is limited and context is king, whether you can articulate what a test is for can be a useful heuristic for deciding whether or not something is worth doing. It’s much trickier than knowing whether you fixed a specific bug, though. We could go out into the field and hear clearly on the radio that a noise source had been fixed, but it was much harder to answer whether it paid off for our research project overall. Was it worth investing more time into it or not?

How do you know whether a test is worth doing?

How I got into testing

In my first post, I talked a bit about why I decided to start this blog. I often get asked how I ended up in testing, given how different my previous career seems, so I thought I would step back a few years and talk about what made testing such a good fit for me.

Before my first job in software testing, this is where I used to work:

The Giant Metrewave Radio Telescope
Or at least, that’s where I worked a few weeks out of the year while I was collecting data for my research. Before software testing, I was an astrophysicist.

My research involved using the Giant Metrewave Radio Telescope — three antennas of which are pictured above — to study the distribution of hydrogen gas billions of years ago. I was trying to study the universe’s transition from the “Dark Ages” before the first stars formed to the age of light that we know today. Though I didn’t know that what I was doing had anything to do with software testing (or even that “software testing” was its own thing), this is where I was honing the skills that I would need when I changed careers. There are two major reasons for that.

To completely oversimplify, the first reason was that I spent a lot of time dealing with really buggy software.

Debugging data pipelines

At the end of the day we were trying to measure one number that nobody had ever measured before using methods nobody had ever tried. That’s what science is all about! What this meant on a practical level was that we had to figure out a way of recording data and processing it using almost entirely custom software. There were packages to do all the fundamental math for us, and the underlying scientific theory was well understood, but it was up to us to build the pipeline that would turn voltages on those radio antennas into the single temperature measurement we wanted.

With custom software, of course, comes custom bugs.

A lot of the code was already established by the research group before I took over, so I basically became the product owner and sole developer of a legacy system without any documentation (not even comments) on day one, and was tasked with extending it into new science without any guarantee that it actually worked in the first place. And believe me, it didn’t. I had signed up for an astrophysics program, but here I was learning how to debug Fortran.

I never got as far as writing explicit “tests”, but I certainly did have to test everything. Made a change to the code? Run the data through again and see if it comes out the same. Getting a weird result? Put through some simple data and see if something sensible comes out. Your 6-day long data reduction pipeline is crashing halfway through one out of every ten times? Requisition some nodes on the computing cluster, learn how to run a debugger, and better hope you don’t have anything else to do for the next week. If I didn’t find and fix the bugs, my research would either be meaningless or take unreasonably long to complete.
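If I were writing those checks today, they might look something like this sketch (Python and NumPy here, with a hypothetical run_pipeline wrapper standing in for the real Fortran code, and a saved output from a previously trusted run as the reference):

```python
import numpy as np

from mypipeline import run_pipeline  # hypothetical wrapper around the real pipeline


def check_against_reference(raw_path: str, reference_path: str) -> None:
    """Made a change? Re-run the same raw data and compare against a trusted run."""
    result = run_pipeline(np.load(raw_path))
    reference = np.load(reference_path)
    # Agreement within floating-point tolerance means the change didn't
    # quietly alter the science; a mismatch means I have some digging to do.
    if not np.allclose(result, reference):
        raise AssertionError("Pipeline output drifted from the reference run")


def smoke_test_simple_input() -> None:
    """Getting a weird result? Put simple data through and see if it stays sensible."""
    flat_sky = np.ones((64, 64))  # featureless input with a predictable outcome
    result = run_pipeline(flat_sky)
    if not np.all(np.isfinite(result)):
        raise AssertionError("Pipeline produced NaNs or infinities on trivial input")
```

The “reference” here is nothing more than the saved output of an earlier run that was already trusted.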

The second reason this experience set me up well for testing was that testing and science, believe it or not, are both all about asking questions and running experiments to find the answers.

Experiments are tests

I got into science because I wanted to know more about how the world worked. As a kid, I loved learning why prisms made rainbows and what made the pitch of a race car engine change as it drove by. Can you put the rainbow back in and get white light back out? What happens if the light hitting the prism isn’t white? How fast does the car have to go to break the sound barrier? What if the temperature of the air changes? What happens if the car goes faster than light? The questions got more complicated as I got more degrees under my belt, but the motivation was the same. What happens if we orient the telescopes differently? Or point at a different patch of sky? Get data at a different time of day? Add this new step to the data processing? How about visualizing the data between two steps?

When I left academia, the first company that hired me actually brought me on as a data engineer, since I had experience dealing with hundreds of terabytes at a time. The transition from “scientist” to “data scientist” seemed like it should be easy. But within the first week of training, I had asked so many questions and poked at their systems from so many different directions that they asked if I would consider switching to the test team. I didn’t see anything special about the way I was exploring their system and thinking up new scenarios to try, but they saw someone who knew how to test. What happens if you turn this switch off? What if I set multiple values for this one? What if I start putting things into these columns that you left out of the training notes? What if these two inputs disagree with each other? Why does the system let me input inconsistent data at all?

I may not have learned how to ask those questions because of my experience in science, but that’s the kind of thinking you need both in a good scientist and in a good tester. I didn’t really know a thing about software engineering, but with years of teaching myself how to code and debug unfamiliar software, I was ready to give it a shot.

Without knowing it, I had always been a tester. The only thing that really changed was that now I was testing software.

Introduction about calling your shots

Oh good, yet another blog about software testing.

I want to start by introducing why I decided to start blogging about an area of work that seems to have no shortage of blogs and communities and podcasts and companies all clamoring for attention. As a relative newbie on the scene, how much new is there that I can add to an already very active conversation?

I came into testing as a profession in 2014, just by being in the right place at the right time. It wasn’t something I had planned on doing, nor something I had any training in. I’ve taken precisely one computer science course, 10 years ago. In the meantime, I had been pursuing an academic career in physics. I’ve had a lot of catching up to do.

In science, you can go to any one of a hundred introductory textbooks and start learning the same fundamentals. There are books on electromagnetism that all have Maxwell’s equations and there are classes on quantum mechanics that all talk about Hamiltonians. There’s really only one “physics” until you get to the bleeding edge of it, and even then there’s an underlying assumption that even where there are competing ideas, they’ll eventually converge on the same truth. We all agree that we’re studying the same universe.

Software testing is nothing like that.

We all do software testing, but none of us are testing the same software. Even though we use a lot of the same terms, there are as many ideas about what they mean as there are testers using them. There’s a vast array of different ideas about just about every aspect of what we do. That’s part of what makes it exciting! But it also makes it difficult to feel like I know what I’m doing. How do I actually learn about a discipline that has so much information, in so many different places, from so many different perspectives, without completely overwhelming myself?

That’s where curling comes in.

Curling rocks in play
Photo by Benson Kua

In case you didn’t already know that I’m Canadian, I’m also a curler. In a lot of ways, curling is physics-as-sport. And what does it have to do with blogging or testing?

Curling is all about sliding rocks down over 100 feet (30 meters) of ice and having them land in the right place. The two biggest variables are simply the direction and how fast you throw it. Once it’s out of the thrower’s hands, it’s the job of the sweepers to tell if it’s going the right speed to stop in the right place or not, and the job of the skip at the far end of the ice to watch if it’s going in the right direction. They need to communicate, since if either one of those variables is off, the sweepers can brush the ice to affect where the rock goes.


As the guy who’s walking down the ice trying to guess where this thing is going to land 100 feet from now, I can tell you it’s damn hard to get that right. When I first started playing, it was very easy to escort 47 rocks down the ice and still not have any idea where the 48th was going to land. That changed when I got a very simple, but oddly frightening, piece of advice.

Just commit to something.

Experienced curlers have a system for communicating how fast a rock is moving by shouting a number from 1 to 10. A “four” means it’s going to stop at the top of the outermost ring. A “seven” means it’ll be right on the button (a bulls-eye, so to speak). I knew this system and I would think about the numbers in my head as I walked beside those rocks, but it wasn’t until I started committing to specific numbers by calling them out to my team that I started to actually get it.

What made it frightening was that those first few times I called out a number, I was way off. And I knew that I was going to be way off. I knew I stood a good chance of being wrong, loudly, in front of everybody else on the ice. But by doing it, I actually started to see how the end result compared to what I had committed to. Not in the wishy-washy way I did when I would run through those numbers in my head (“that’s about what I would have guessed”), but in a concrete way. It’s similar to how you think you know all the answers when you’re watching Jeopardy, but it’s a lot harder when you have to say them out loud. I started to think through the numbers more, pay more attention to how the rocks were moving, commit out loud to something, and take in the feedback to learn something.

Can you see where I’m going with this now?

Even though software testing blogs are a dime a dozen, if I want to actually become an expert in this field I think it’s time to start forcing myself to get my thoughts together and commit to something.

My goal with this blog, then, is to think through testing concepts and my experiences and commit those thoughts to paper. I’m not going to try to explain basic terms as if I’m an authority, but I might try to talk through whether some of those concepts are useful to me or not and how I see them actually being used. I plan to talk about my experiences and views as a tester, as “a QA”, as a developer, and as a member of this community, so that I can commit to growing as a professional.

If nobody else reads this it’ll still be a useful exercise for myself, but I do hope that there’s occasionally a skip on the other end of the ice who’ll hear my “IT’S A TWO!” and shout back “OBVIOUSLY TEN!”