Little Alchemy: A silly little test exercise

Lately I’ve been entertaining myself with a silly little mobile app called Little Alchemy 2, a simple revamp of an earlier concept game. It distracted me enough this week that I didn’t prepare a respectable blog post. While questioning my life choices, I started to convince myself that maybe I had been playing at testing this whole time. Testers, myself included, love to make analogies to science, and this is a game that simulates the precursor to modern chemistry. Of course it can be used as a tortured metaphor for software testing! Follow me on my journey…

Exploration

[Image: screenshot of Little Alchemy]

The main concept of the game is to drag icons of real life objects on top of each other, combining them to make new items.

At first there are only a few options: Water, Earth, Fire, Air. The classics. Not hard to just play around with different combinations and see how this game works.

Fire + Water = Steam. Makes sense.

Water + Earth = Mud. Great.

Now I wonder what happens if I combine steam and mud…

After a while you start to get a sense for how this universe works.

Developing heuristics

Soon I start to see patterns. If I combine the same thing with itself I get a bigger version of that thing; can I do that with everything? Fire seems to transform most items. When I find out how to make paper or wood, I’m probably going to try to throw those on the fire too.

Combinations quickly multiply.

Before long I have hundreds of items of all sorts. Too many to keep in my head. I start to think I’m probably missing combinations, or I’ve forgotten which combinations I’ve already tried. I know I tried putting wood into fire, but did I put wood into the campfire? Am I missing some obvious combination that’s right in front of me?

Using an Oracle

That’s when I start googling around for hints and suggestions. This part gets a bit cheaty, but at the time it was what I needed to do to get a handle on the problem and keep making progress. I found a site that would randomly show me an item, and I could use that to see if I had the pieces I needed to make it. No more guessing, I was given the right answer.

I suppose this is where I could veer off into automation, but where’s the fun in that? After a while, I started to exhaust the hint system anyway; with only random items the ratio of new to known items started to decline. ROI went down. My oracle was getting less useful, not telling me anything new.

Brute force

There was still an intractable number of items, and I had seen enough unexpected combinations that I didn’t trust myself to reason them all out. So instead, I turned to brute force.

First item on the list. Try to combine with every item below it. Repeat.

Now I should really think about automation if my goal is just to find all combinations. This is a pure algorithm with finite steps, since the game promises a finite number of items. But things start to change after running it manually for a few iterations. Happily, the game removes items that have no undiscovered combinations, so in theory the complexity will peak and the game will start to get simpler again. (Wouldn’t it be nice if software always got simpler with time?) Moreover, a little bit of brute force starts to show patterns that I hadn’t been aware of before. I start to skip ahead: “aha! if that combination works, then I bet this other one will too…” One new strategy begets another!
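If I did give in and automate it, the loop might look something like the sketch below. This is only an illustration, not how the game works internally: `RECIPES`, `combine`, and `brute_force` are names I made up, and apart from steam and mud the recipes in the table are illustrative guesses that I haven't checked against the game. It also doesn't model the game retiring exhausted items.

```python
from itertools import combinations_with_replacement

# Hypothetical recipe table: an unordered pair of items -> the item it makes.
# Steam and mud come from the post; the other two are illustrative guesses.
RECIPES = {
    frozenset(["fire", "water"]): "steam",
    frozenset(["water", "earth"]): "mud",
    frozenset(["water"]): "puddle",       # water + water (assumed)
    frozenset(["fire", "mud"]): "brick",  # assumed
}


def combine(a, b):
    """Stand-in for dragging one icon onto another; returns the new item or None."""
    return RECIPES.get(frozenset([a, b]))


def brute_force(starting_items):
    """Pair each item with itself and every item below it; repeat until nothing new shows up."""
    items = list(starting_items)
    tried = set()
    found_new = True
    while found_new:
        found_new = False
        # Iterate over a snapshot so new discoveries wait for the next pass.
        for a, b in combinations_with_replacement(list(items), 2):
            pair = frozenset([a, b])
            if pair in tried:
                continue  # never repeat a combination already attempted
            tried.add(pair)
            result = combine(a, b)
            if result is not None and result not in items:
                items.append(result)
                found_new = True
    return items, len(tried)


items, attempts = brute_force(["water", "earth", "fire", "air"])
```

Keeping the `tried` set is the part I kept wishing for when playing by hand: it answers “did I already put wood into the campfire?” without relying on memory.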

Inference and deduction

I reach a breaking point where items are being removed from play more and more often. This feels like the home stretch but brute forcing it will still take ages. Often, though, an item only has one or two combinations left to discover before it’s exhausted. I use this to my advantage.

Enter another oracle of sorts; this time, it’s the documentation of everything I’ve done so far. For any item still on the board, the game tells me how many undiscovered combinations an item has, items that have been used to make it, and all the other items it has been used to make so far. This is all data I can analyse to look for yet more patterns, and spot gaps in patterns that have been missed so far. The rate at which I clear items off the board goes way up, and I’m still using the same manual interactions I’ve used all along.

End game

I’m not there yet. Do I still have more to learn? Is another strategy going to make itself obvious before I finish the game, or an old one resurface as the dynamics change?

And what am I going to explore next?

How I got into testing

In my first post, I talked a bit about why I decided to start this blog. I often get asked how I ended up in testing given my previous career seems so different, so I thought I would step back a few years and talk about what made testing such a good fit for me.

Before my first job in software testing, this is where I used to work:

[Image: the Giant Metrewave Radio Telescope]

Or at least, that’s where I worked for a few weeks out of the year while I was collecting data for my research. Before software testing, I was an astrophysicist.

My research involved using the Giant Metrewave Radio Telescope — three antennas of which are pictured above — to study the distribution of hydrogen gas billions of years ago. I was trying to study the universe’s transition from the “Dark Ages” before the first stars formed to the age of light that we know today. Though I didn’t know that what I was doing had anything to do with software testing (or even that “software testing” was its own thing), this is where I was honing the skills that I would need when I changed careers. There are two major reasons for that.

To completely oversimplify, the first reason was that I spent a lot of time dealing with really buggy software.

Debugging data pipelines

At the end of the day we were trying to measure one number that nobody had ever measured before using methods nobody had ever tried. That’s what science is all about! What this meant on a practical level was that we had to figure out a way of recording data and processing it using almost entirely custom software. There were packages to do all the fundamental math for us, and the underlying scientific theory was well understood, but it was up to us to build the pipeline that would turn voltages on those radio antennas into the single temperature measurement we wanted.

With custom software, of course, comes custom bugs.

A lot of the code was already established by the research group before I took over, so I basically became the product owner and sole developer of a legacy system without any documentation (not even comments) on day one, and was tasked with extending it into new science without any guarantee that it actually worked in the first place. And believe me, it didn’t. I had signed up for an astrophysics program, but here I was learning how to debug Fortran.

I never got as far as writing explicit “tests”, but I certainly did have to test everything. Made a change to the code? Run the data through again and see if it comes out the same. Getting a weird result? Put through some simple data and see if something sensible comes out. Your 6-day long data reduction pipeline is crashing halfway through one out of every ten times? Requisition some nodes on the computing cluster, learn how to run a debugger, and better hope you don’t have anything else to do for the next week. If I didn’t find and fix the bugs, my research would either be meaningless or take unreasonably long to complete.
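In hindsight, those first two habits are what a tester would call a regression check and a sanity check. Here is a minimal sketch of the idea in Python rather than Fortran; `reduce_data` is a hypothetical stand-in for one pipeline step (mean power of some voltage samples), not the actual code I worked on, and the sample values are made up.

```python
import math


def reduce_data(voltages):
    """Hypothetical stand-in for a pipeline step: mean power of the voltage samples."""
    if not voltages:
        raise ValueError("no samples to reduce")
    return sum(v * v for v in voltages) / len(voltages)


def matches_reference(result, reference, rel_tol=1e-9):
    """True if a rerun reproduces the saved result within floating-point tolerance."""
    return math.isclose(result, reference, rel_tol=rel_tol)


# "Put through some simple data": a constant 2.0 signal should have mean power 4.0.
simple_input = [2.0] * 1000
simple_ok = matches_reference(reduce_data(simple_input), 4.0)

# "Run the data through again": compare a rerun against a result saved before the change.
samples = [0.5, -1.5, 2.0, 1.0]       # made-up sample values
saved_reference = reduce_data(samples)  # pretend this was stored before editing the code
rerun_ok = matches_reference(reduce_data(samples), saved_reference)
```

Comparing with a tolerance instead of exact equality matters for numerical code, where a harmless reordering of operations can change the last few digits.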

The second reason this experience set me up well for testing was that testing and science, believe it or not, are both all about asking questions and running experiments to find the answers.

Experiments are tests

I got into science because I wanted to know more about how the world worked. As a kid, I loved learning why prisms made rainbows and what made the pitch of a race car engine change as it drove by. Can you put the rainbow back in and get white light back out? What happens if the light hitting the prism isn’t white? How fast does the car have to go to break the sound barrier? What if the temperature of the air changes? What happens if the car goes faster than light? The questions got more complicated as I got more degrees under my belt, but the motivation was the same. What happens if we orient the telescopes differently? Or point at a different patch of sky? Get data at a different time of day? Add this new step to the data processing? How about visualizing the data between two steps?

When I left academia, the first company that hired me actually brought me on as a data engineer, since I had experience dealing with hundreds of terabytes at a time. The transition from “scientist” to “data scientist” seemed like it should be easy. But within the first week of training, I had asked so many questions and poked at their systems from so many different directions that they asked if I would consider switching to the test team. I didn’t see anything special about the way I was exploring their system and thinking up new scenarios to try, but they saw someone who knew how to test. What happens if you turn this switch off? What if I set multiple values for this one? What if I start putting things into these columns that you left out of the training notes? What if these two inputs disagree with each other? Why does the system let me input inconsistent data at all?

I may not have learned how to ask those questions because of my experience in science, but that’s the kind of thinking you need in both a good scientist and a good tester. I didn’t really know a thing about software engineering, but with years of teaching myself how to code and debug unfamiliar software, I was ready to give it a shot.

Without knowing it, I had always been a tester. The only thing that really changed was that now I was testing software.