Sunday, January 9, 2022

Use the Force Multiplier

On Fridays I pair with doctors from Ada's medical quality team. It's a fun and productive collaboration where I gain deeper insight into the way that diagnostic information is encoded in our product and they get to see a testing perspective unhindered by domain knowledge.

We meet at the same time each week and decide late on our focus, choosing something that one of us is working on that's in a state where it can be shared. This week we picked up a task that I'd been hoping to get to for a while: exploring an API which takes a list of symptoms and returns a list of potential medical conditions that are consistent with those symptoms. 

I was interested to know whether I could find small input differences that led to large output differences. Without domain knowledge, though, I wasn't really sure what "small" and "large" might mean.

I prepared an input payload and wrote a simple shell script which did the following:

  • make a timestamped directory for this run, results
  • copy the payload to results
  • POST the payload to the API endpoint
  • copy the response to results
  • parse the response to summarise just the condition list
  • copy the condition list to results
  • echo the condition list to the terminal
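The post doesn't include the script itself, so here is a minimal bash sketch of a runner that follows those steps. Everything specific in it is an assumption for illustration: the endpoint URL, the `API_URL` variable, the response shape (a JSON object with a `conditions` array whose items have a `name`), and the file names `response.json` and `conditions.txt`. It uses curl for the POST and jq to summarise the response.

```shell
#!/usr/bin/env bash
# Sketch of a runner, under stated assumptions (none of these come from the post):
#  - API_URL is a hypothetical endpoint; override it via the environment
#  - the response is JSON with a "conditions" array of objects with a "name"
#  - output file names response.json and conditions.txt are made up
API_URL="${API_URL:-https://api.example.com/v1/conditions}"  # hypothetical

# POST a JSON payload file and print the raw response body.
post_payload() {
  curl --silent --show-error --fail \
    -H 'Content-Type: application/json' \
    --data-binary @"$1" "$API_URL"
}

# Reduce a raw response to one condition name per line (requires jq).
summarise_conditions() {
  jq -r '.conditions[].name' "$1"
}

run() {
  local payload="$1"
  local dir
  dir="results/$(date +%Y-%m-%d_%H%M%S)"            # timestamped directory for this run
  mkdir -p "$dir" || return
  cp "$payload" "$dir/input.json" || return          # keep the exact payload sent
  post_payload "$dir/input.json" > "$dir/response.json" || return     # raw response
  summarise_conditions "$dir/response.json" > "$dir/conditions.txt" || return
  cat "$dir/conditions.txt"                          # echo the list to the terminal
}

# Only run when executed directly with a payload argument,
# so the functions can also be sourced into an interactive shell.
if [[ "${BASH_SOURCE[0]}" == "$0" && $# -gt 0 ]]; then
  run "$1"
fi
```

Saved as, say, a hypothetical `runner.sh`, a run would be `./runner.sh input.json`. Keeping every payload and response on disk, rather than only printing the summary, is what makes the later searching and re-running cheap.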

This super simple runner gave us the ability to loop tightly and efficiently like this:

  • edit the payload
  • call the runner
  • inspect the conditions
  • choose the next edit

We found that we often wanted to compare two successive runs, and this was easy because the console showed them one after the other:

$ input.json


$ input.json

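Eyeballing the console worked well for us, but when the lists get longer a diff of the two most recent runs could help. This is a sketch that assumes the timestamped `results/YYYY-MM-DD_HHMMSS/` layout and a per-run `conditions.txt` file holding the summarised list (that file name is an assumption); `compare_last_two` is a hypothetical helper, not something from the session.

```shell
#!/usr/bin/env bash
# Hypothetical helper: diff the condition lists of the two most recent runs.
# Assumes run directories named results/YYYY-MM-DD_HHMMSS/, which sort
# chronologically as a glob, each containing conditions.txt (assumed name).
compare_last_two() {
  local runs=(results/*/)          # glob expands in sorted (chronological) order
  local n=${#runs[@]}
  if (( n < 2 )) || [[ ! -d "${runs[0]}" ]]; then
    echo "need at least two runs under results/" >&2
    return 2
  fi
  # Lines starting "<" are only in the older run, ">" only in the newer one.
  diff "${runs[n-2]}conditions.txt" "${runs[n-1]}conditions.txt"
}
```

As with `diff` itself, the exit status is 0 when the lists match and 1 when they differ, which makes it easy to use in a loop over many variations.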
If we needed more of the metadata around the list, we could look at the raw response. If we needed to check exactly which payload had produced a specific response, or which payloads contained a specific symptom, we could easily search; and if we wanted to re-run a particular payload, that was straightforward too:
$ grep -rl symptomX results/

$ results/2022-01-09_092501/input.json


At the start of our session we prioritised a set of strategies that we thought had the potential to show the kind of effect I was interested in. As it happens, none of the differences we saw were medically significant. But that's not a major problem: we spent only an hour on this work, starting from a list of symptoms I had created more or less at random.

I now have a tool with which I can easily control and observe the system under test and some insight into differences that might matter. With those pieces I can take lists of more plausible symptoms, create many slight variations of them, run the script and use my new heuristics to target relevant differences.
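One simple way to make "many slight variations" of a symptom list is leave-one-out: for each symptom, write a payload containing all the others. The sketch below is only an illustration of that idea, not what we did in the session. It assumes a plain-text list with one symptom per line, a hypothetical `{"symptoms": [...]}` payload shape, and symptom names without double quotes or backslashes (it does no JSON escaping); `make_leave_one_out` is a made-up name.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: leave-one-out variations of a symptom list.
# Input: a text file, one symptom per line. Output: outdir/without_<i>.json,
# each omitting symptom i. Payload shape {"symptoms": [...]} is an assumption,
# and names are assumed to contain no double quotes or backslashes.
make_leave_one_out() {
  local list="$1" outdir="$2"
  local symptoms=() line
  while IFS= read -r line || [[ -n "$line" ]]; do   # also read a final unterminated line
    if [[ -n "$line" ]]; then
      symptoms+=("$line")
    fi
  done < "$list"
  mkdir -p "$outdir" || return
  local i j sep json
  for i in "${!symptoms[@]}"; do
    json='{"symptoms": ['
    sep=''
    for j in "${!symptoms[@]}"; do
      if (( i == j )); then
        continue                                     # drop the i-th symptom
      fi
      json+="$sep\"${symptoms[j]}\""
      sep=', '
    done
    json+=']}'
    printf '%s\n' "$json" > "$outdir/without_$i.json"
  done
}
```

For a three-symptom list this writes three payloads, each with one symptom missing; each file in the output directory can then be passed to the runner in turn, and the resulting condition lists compared against the run with the full list.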

I love testing by exploring and I love the way that automation can be a force multiplier for that exploration.
Image: Science Fiction and Fantasy Stack Exchange
Highlighting: Pinetools
