Step on It

In recent times I've spoken and written about how much fun and how productive it's been to build random walkers to help me to test services I've been working on: Walking the Talk webinar, A Model Student, and Navigate, Survey, and Explore.

The walkers are clients which use dice rolls to make decisions as they navigate paths through a service, asserting things about the state as they go.

Traditional unit tests tend to be extremely specific. They imagine the system in a particular state with a hard-coded input and an expected output.

In my walkers, the assertions are relatively generic, for example:

a service response payload conforms to a schema
the value of a field representing progress will not decrease across a series of interactions
some values in service responses must be in a particular relationship to others, or to the input.
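
To make those three kinds of assertion concrete, here is a minimal sketch of a walker checking them against a toy service. Everything in it is an assumption for illustration: `ToyService`, its `progress` and `echo` fields, and the `walk` helper are hypothetical stand-ins, not the real service or my real walkers.

```python
import random


class ToyService:
    """Hypothetical stand-in for a real service; each call returns a payload."""

    def __init__(self):
        self.progress = 0

    def answer(self, value):
        # Progress advances by `value` and is capped at 100.
        self.progress = min(100, self.progress + value)
        return {"progress": self.progress, "echo": value}


def check_schema(response):
    # Invariant 1: the payload has the expected fields with the expected types.
    assert set(response) == {"progress", "echo"}
    assert isinstance(response["progress"], int)
    assert isinstance(response["echo"], int)


def check_progress(previous, response):
    # Invariant 2: progress never decreases across a series of interactions.
    assert response["progress"] >= previous


def check_relationship(value, response):
    # Invariant 3: a response value stands in a known relationship to the input.
    assert response["echo"] == value


def walk(steps, seed):
    rng = random.Random(seed)  # seeded so that a failing walk can be replayed
    service = ToyService()
    previous = 0
    for _ in range(steps):
        value = rng.randint(0, 10)  # the dice roll
        response = service.answer(value)
        check_schema(response)
        check_progress(previous, response)
        check_relationship(value, response)
        previous = response["progress"]
    return previous
```

None of the checks knows which step it is on or what the input was meant to be; each only states something that should hold wherever the walk happens to go.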

I doubt this is a technically correct use of the term but I've been thinking of these things as invariants: statements that are true in broad, defined contexts, or perhaps even across the whole system under test.

The approach has a strong link with property-based testing, which generalises unit tests by specifying properties of the inputs (e.g. integers) and the consequent outputs (e.g. a naturally-sorted list).

From the Hypothesis library documentation, the testing "works by generating arbitrary data matching [the] specification and checking that [the output] guarantee [...] holds."
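
The idea can be sketched by hand with only the standard library. This is not the Hypothesis API, just a rough approximation of the loop it describes: the names `check_property`, `random_integers`, and `is_sorted_permutation` are made up for this example, and the real library does much more, including shrinking failing examples.

```python
import random


def check_property(function, generate, holds, runs=200, seed=0):
    """Feed `function` generated inputs; return the first input that breaks `holds`, or None."""
    rng = random.Random(seed)
    for _ in range(runs):
        data = generate(rng)
        result = function(list(data))  # pass a copy so `data` stays intact
        if not holds(data, result):
            return data
    return None


def random_integers(rng):
    # Specification of the input: a list of 0 to 20 integers.
    return [rng.randint(-1000, 1000) for _ in range(rng.randint(0, 20))]


def is_sorted_permutation(data, result):
    # Guarantee on the output: the same elements, in non-decreasing order.
    in_order = all(a <= b for a, b in zip(result, result[1:]))
    return in_order and sorted(data) == sorted(result)


# The built-in `sorted` is the function under test here.
counterexample = check_property(sorted, random_integers, is_sorted_permutation)
```

If the function under test were broken, `counterexample` would hold an input that exposes the problem rather than `None`.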

It's also in a tradition of model-based testing where a model (informally, a kind of flowchart describing the system under test) is traversed and correctness checks are made along the way. As Harry Robinson puts it in Finite State Model-Based Testing on a Shoestring:

Create a finite state model of an application.
Generate sequences of test actions from the model.
Execute the test actions against the application.
Determine if the application worked right.
Find bugs.
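
Robinson's steps can be sketched in a few lines. The model, the `ToyApplication` class, and the action names below are all hypothetical, chosen only to keep the example self-contained; a real walker would drive a real service through its API.

```python
import random

# Step 1: a finite state model. Each state maps actions to the state they lead to.
MODEL = {
    "logged_out": {"log_in": "logged_in"},
    "logged_in": {"open_item": "viewing", "log_out": "logged_out"},
    "viewing": {"close_item": "logged_in", "log_out": "logged_out"},
}


class ToyApplication:
    """Hypothetical application under test, standing in for a real service."""

    def __init__(self):
        self.user = None
        self.item = None

    def perform(self, action):
        if action == "log_in":
            self.user = "tester"
        elif action == "log_out":
            self.user, self.item = None, None
        elif action == "open_item":
            self.item = "item-1"
        elif action == "close_item":
            self.item = None

    def state(self):
        if self.user is None:
            return "logged_out"
        return "viewing" if self.item else "logged_in"


def generate_actions(length, rng, start="logged_out"):
    # Step 2: random walk over the model, recording each action and expected state.
    state, steps = start, []
    for _ in range(length):
        action = rng.choice(sorted(MODEL[state]))
        state = MODEL[state][action]
        steps.append((action, state))
    return steps


def run_walk(length, seed):
    rng = random.Random(seed)
    app = ToyApplication()
    for action, expected in generate_actions(length, rng):
        app.perform(action)  # Step 3: execute the action against the application
        actual = app.state()  # Step 4: determine if the application worked right
        if actual != expected:
            return (action, expected, actual)  # Step 5: a bug
    return None
```

Because the walk is seeded, any mismatch that `run_walk` returns can be replayed with the same seed and length.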

Believe me, this stuff is exciting and powerful, but a dry and abstract description like the one I've given can make it sound extremely, well, dry and abstract.

I gave a quick demo and presentation about a walker at work last week. I wanted something that visualised why it was exciting and powerful and practical and how it complements what we already do.

This is what I came up with:

The background light green blob is a minefield. It represents the space of possible states of our service. In it, somewhere, are mines, or bugs. We don't know all of their locations and, in any case, as we develop the software the locations change.

One way, reductive for sure, to think about testing is that it seeks to find mines by stamping on the minefield. An interesting aspect of that game (safety considerations aside!) is to find valuable places and productive ways to stamp.

Unit tests are the tiny circles. They stamp repeatedly on the same bit of the minefield.

The unit test might be in that place because, for example, we care to check that a specific input gives a specific output, or because a bug was found there in the past, or because it's considered to be a representative place to check for mines in the surrounding area (the mid-green clouds).

We might call these surrounding areas equivalence classes.

Exploratory testing, particularly when driven by risk and the current context, will regularly march around other parts of the field, pursuing paths that look promising tactically and strategically, and sometimes overlapping the coverage of existing automation.

By default, the walker doesn't care about trying to focus on areas with previous bugs or areas of risk or equivalence classes or anything else. It just puts on its heaviest boots and runs around the field. (I like to think it's also waving its hands in the air, screaming with pure pleasure, and grinning like a maniac.)

These approaches are complementary and can be combined.

I use walkers as a tool in my exploratory testing. I either hack or configure them to bias to particular areas, or navigation tactics, or to collecting particular kinds of data that I can explore later.
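
One simple way to bias a walker is to weight its dice. This is only a sketch under assumptions: the `choose_action` helper, the `bias` dictionary, and the action names are invented for the example and are not how any particular walker is configured.

```python
import random

# Hypothetical default: every available action is equally likely.
DEFAULT_WEIGHT = 1.0


def choose_action(actions, rng, bias=None):
    """Pick one action, favouring any that appear in `bias`."""
    bias = bias or {}
    weights = [bias.get(action, DEFAULT_WEIGHT) for action in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


rng = random.Random(42)
actions = ["next", "back", "skip", "submit"]
# Make "back" ten times as likely as each other action, to stress navigation history.
picks = [choose_action(actions, rng, bias={"back": 10.0}) for _ in range(1000)]
```

The walk stays random, so it can still wander anywhere, but it now spends most of its time in the area I'm curious about.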

I can take assertions from unit tests and generalise them for the walkers, and I can take issues found by the walker and add unit tests for them, or change existing unit tests to better reflect a new understanding of the service.

I'm pleased to say that the analogy with the minefield was helpful but, much like the walkers themselves, it's neither my idea nor novel. Looking back, I think I first came across it in either the Rapid Software Testing or possibly the Black Box Software Testing course over ten years ago.

A little cursory searching turns up Cem Kaner talking about it in 2002's Paradigms of Black Box Software Testing, but with the possibility that the idea goes back at least to 1994.

But both he and James Bach, in Reasons to Repeat Tests, credit Brian Marick as the originator. Bach mentions being inspired by a talk of Marick's called Classic Testing Mistakes. I didn't find the talk, but the paper of the same title is itself a classic.

Why does this matter? Well, I'm thinking that this stuff, in theory and in practice, has been around for ages but is little known or used in our industry. How can we speed up its adoption?
Image: https://flic.kr/p/5TCvmi

Edit: Paul Hankin noted on Twitter: "Fuzzing is an analogous testing idea -- and the cleverest fuzzers understand the code they are testing and try to find paths that exercise so-far-untested parts of the code (rather than just try random inputs)."

Edit 2: I contacted both Brian Marick and James Bach. Brian doesn't recall originating the minefield analogy; James feels certain that Brian mentioned it in his talk.

Edit 3: I found a reference on Marick's web site to the Classic Testing Mistakes talk being delivered at STAR 97.
