
Infinite Loop * Infinite Space

The Wikipedia page on infinite loops in programming describes them as "a sequence of instructions that, as written, will continue endlessly, unless an external intervention occurs." A common example is a while loop whose exit condition is never met and which has to be aborted by a human pressing Control-C.

With that concept in mind, we can make an easy analogy to software development where the same kinds of events happen over and over and over until our product becomes irrelevant, or uneconomic, or our organisation closes down, and the loop is exited. 

Inside the loop, the world in which our product exists will change, the market in that world will change, the requirements on our product in the market will change, the product itself changes as our teams add features, or fix bugs, or update libraries, or run on new platforms, and so on. 

So, for their lifetimes, our products inhabit an infinite loop of change and, if they are non-trivial, also contain an infinite space of possibility: potential inputs, possible outputs, paths through the execution, timings, integrations, network, and the rest.

And that's one of our challenges as testers: 

  • How can we decide what to test and to what extent in the face of that change? 
  • How can we test it in the face of that huge range of potential states? 
  • Given that we already do both of those things today somehow, how could we do it more efficiently and effectively?

--00-- 

On my current project, a dialog engine for a medical symptom checking application, I responded by building a tool, the dialog walker, which:

  • Runs on any available dialog, not just one standard end-to-end test example.
  • Makes random inputs, so the input and output data changes on every run.
  • Follows legal rather than expected paths, exploring product possibilities rather than assumed user behaviour.
  • Provides a framework for me to ask and answer questions.
  • Generates large amounts of structured data for post-hoc review.

In large, complex, interconnected systems we can't know in general where we might see issues, and we find it difficult to model and reason about where they could arise. The walker puts the system under test into novel states — and novel sequences of states — on each run in an attempt to find states where the system doesn't behave as we would expect.

I keep it in step with the product as it develops, teaching it to understand and use new features as they are added, and allowing it to work around bugs until they are fixed.

 --00-- 

The sequence below shows a handful of steps from an assessment dialog in Ada's symptom checker. The user provides different kinds of inputs (highlighted in red) in different turns and ultimately ends up at a list of potential conditions, reflecting the information the user entered:

You can imagine, from the open-ended free text input, the range of choice points that can affect the dialog flow, and the broad range of medical conditions a system like this needs to handle, that there are many possible dialogs, if not an infinite number. If you factor in the ability to go back a step at any point, the variability in time taken between turns, and the dependencies on external services, it becomes effectively infinite.

  --00--

In its first iteration, the dialog walker contained a model of a generic dialog, understood how to use the engine's API to traverse it, and used dice rolls to decide what choices to make and what input to provide at each turn in the dialog: 
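
In code terms, that first iteration amounted to something like the rough sketch below. The types and method names are invented for illustration; they are not the walker's real API:

```java
import java.util.List;
import java.util.Random;

// Rough sketch of the first walker: hypothetical types, not the real implementation.
public class DialogWalkerSketch {

    // Minimal view of a dialog turn as the engine's API might report it.
    record Step(String id, List<String> legalInputs, boolean terminal) {}

    // Stand-in for the dialog engine's API client.
    interface DialogEngine {
        Step start();
        Step submit(String stepId, String input);
    }

    private final Random dice = new Random();

    // Walk one dialog to completion, choosing a legal input at random on each turn.
    public void walk(DialogEngine engine) {
        Step step = engine.start();
        while (!step.terminal()) {
            List<String> options = step.legalInputs();
            String choice = options.get(dice.nextInt(options.size()));
            step = engine.submit(step.id(), choice);
        }
    }
}
```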

The orange bubbles on the walker show the kinds of things that it could assert on, for example that some raw template text like {he/she/they} had been sent to the client side rather than being processed into a specific pronoun in the engine, or that a valid user input had provoked an HTTP error code from the service. These are generic, invariant assertions: they are true for any dialog.

You'll notice that there are orange bubbles on the system under test too; that's because inspecting the behaviour and output of the system under test while running the walker can be valuable. This is especially true when running the walker at scale: you can choose how many iterations to execute and how many dialogs to run in parallel in each iteration.
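
As a sketch of what a scaled run can mean in practice, assuming a simple thread pool and a placeholder walkOneDialog() routine rather than the walker's actual machinery, each iteration fans out a batch of dialogs and waits for them to finish:

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Sketch of a scaled run: several iterations, each running a batch of dialogs in parallel.
// The numbers and walkOneDialog() are placeholders, not the walker's actual configuration.
public class ScaledRunSketch {
    public static void main(String[] args) throws InterruptedException {
        int iterations = 10;
        int dialogsPerIteration = 20;
        for (int i = 0; i < iterations; i++) {
            ExecutorService pool = Executors.newFixedThreadPool(dialogsPerIteration);
            for (int d = 0; d < dialogsPerIteration; d++) {
                pool.submit(ScaledRunSketch::walkOneDialog);
            }
            pool.shutdown();
            pool.awaitTermination(30, TimeUnit.MINUTES);
        }
    }

    static void walkOneDialog() {
        // Placeholder: start a dialog via the engine's API and walk it to completion.
    }
}
```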

Note that the walker is not asserting on specific outcomes for specific inputs. It can and does assert on data consistency but in order to operate on any dialog (such as a symptom assessment, questionnaire, or user feedback collection) it needs to have invariants to look for: what must always be true here? what would tell us that something was wrong? what would tell us that something might be wrong?
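
For illustration only, two of the generic checks mentioned above could be written as something like the following; the method and its wiring are mine rather than the walker's:

```java
// Illustrative invariant checks, applied to every response regardless of the dialog being walked.
public class InvariantChecks {

    static void assertInvariants(int httpStatus, String bodyText) {
        // A valid user input should never provoke an HTTP error from the service.
        if (httpStatus >= 400) {
            throw new AssertionError("HTTP error " + httpStatus + " for a valid input");
        }
        // Raw template text such as {he/she/they} should never reach the client unprocessed.
        if (bodyText.matches("(?s).*\\{[a-z]+(/[a-z]+)+\\}.*")) {
            throw new AssertionError("Unprocessed template text in response: " + bodyText);
        }
    }
}
```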

 --00-- 

Little by little, I've added features as I've had the need for them. The architecture diagram below shows logs, configuration, access to the SUT's database, scripts, and visualisation of runs: 

The logs record all HTTP traffic between the walker and the dialog engine. With access to logs, I can ask questions after a run. How many times was state X entered? What kinds of error responses were seen? Were any dialog steps never encountered? As I learn which of these are productive, I add them to the walker as assertions.
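
The log format and file name below are assumptions for the sake of example, but a post-run question can be as simple as counting matching lines:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

// Illustrative post-run questions over the HTTP log; the file name and the
// strings being matched are assumptions, not the walker's real log format.
public class LogQuestions {
    public static void main(String[] args) throws IOException {
        Path log = Path.of("walker-http.log");
        long stateXEntries = Files.readAllLines(log).stream()
                .filter(line -> line.contains("\"state\":\"X\""))
                .count();
        long errorResponses = Files.readAllLines(log).stream()
                .filter(line -> line.contains("\"status\":5"))
                .count();
        System.out.println("State X entered: " + stateXEntries);
        System.out.println("5xx responses seen: " + errorResponses);
    }
}
```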

The scripts are used to replay dialogs to help with debugging and reproduction. They're simple brute-force bash scripts that use jq and curl to replay a turn and then wait for a keypress. This means that I can find interesting runs and step through them at my own pace with the application under test running in the debugger and a client open against the back-end database.
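
The real scripts are bash, built on jq and curl; purely to show the shape of the idea, here is the same stepping loop sketched in Java, with the file name and engine URL invented for illustration:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;
import java.util.Scanner;

// Replay a recorded dialog one turn at a time, pausing for a keypress between turns.
// The file name and URL are assumptions, not the project's actual values.
public class ReplayDialog {
    public static void main(String[] args) throws Exception {
        HttpClient client = HttpClient.newHttpClient();
        List<String> turns = Files.readAllLines(Path.of("interesting-run-turns.jsonl"));
        Scanner keyboard = new Scanner(System.in);
        for (String body : turns) {
            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create("http://localhost:8080/dialog/turn"))
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();
            HttpResponse<String> response = client.send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + ": " + response.body());
            System.out.print("Press Enter to replay the next turn...");
            keyboard.nextLine();
        }
    }
}
```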

Configuration is used to tune the route taken through the dialogs, for example to say how often to go back a step, or to give a list of possible inputs for specific questions. This helps to drive the walker towards particular behaviours that I want to test.
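
A sketch of how that kind of tuning might plug into the walk, with invented knob names (a go-back probability and pinned answers for particular questions):

```java
import java.util.Map;
import java.util.Random;

// Sketch of configuration-driven steering; the property names are invented for illustration.
public class WalkTuning {
    private final double goBackProbability;          // e.g. 0.1 = go back on roughly 10% of turns
    private final Map<String, String> pinnedAnswers; // question id -> answer to always give
    private final Random dice = new Random();

    public WalkTuning(double goBackProbability, Map<String, String> pinnedAnswers) {
        this.goBackProbability = goBackProbability;
        this.pinnedAnswers = pinnedAnswers;
    }

    // Decide whether to step back rather than answer this turn.
    public boolean shouldGoBack() {
        return dice.nextDouble() < goBackProbability;
    }

    // Use a configured answer when one exists, otherwise fall back to a random legal choice.
    public String answerFor(String questionId, String randomChoice) {
        return pinnedAnswers.getOrDefault(questionId, randomChoice);
    }
}
```

The idea is that the dice still drive the walk, but the configuration skews them towards the behaviour under investigation.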

Access to the application's database is interesting. With it, the walker can make assertions about internal state consistency, for instance that the information given to the client is the same information that the application sees itself, or that the information provided by the client ends up in the right place in the database.
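
As an illustration of that kind of internal-state assertion, assuming a hypothetical dialog_turns table rather than the application's real schema:

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

// Illustrative internal-consistency check; the table and column names are assumptions.
public class DatabaseAssertions {

    // Assert that the answer the client sent is the answer stored against that turn.
    static void assertAnswerStored(Connection db, String dialogId, String turnId,
                                   String answerSentByClient) throws SQLException {
        String sql = "SELECT answer FROM dialog_turns WHERE dialog_id = ? AND turn_id = ?";
        try (PreparedStatement query = db.prepareStatement(sql)) {
            query.setString(1, dialogId);
            query.setString(2, turnId);
            try (ResultSet row = query.executeQuery()) {
                if (!row.next()) {
                    throw new AssertionError("No stored turn for " + dialogId + "/" + turnId);
                }
                String stored = row.getString("answer");
                if (!answerSentByClient.equals(stored)) {
                    throw new AssertionError("Client sent '" + answerSentByClient
                            + "' but database holds '" + stored + "'");
                }
            }
        }
    }
}
```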

  --00-- 

Visualisation was introduced most recently. After a run, I can start the visualiser and it shows a summary of the outcomes, flagging particular instances that had assertion failures:

Clicking through from there, I can look at logs, a traditional HTTP request-response view, or a "chatbot" representation of the dialog (as shown below). This presents the data that the user would see along with metadata per turn and a chunk of system state (greyed out in the image below, sorry):

The system state is particularly interesting when there is a problem because the before and after states can be diffed to help with diagnosis. There's also a search interface that helps to find dialogs where the "user" saw particular text, or entered particular states.

The visualisation is incredibly useful for me to understand what was happening at any given point in a dialog, but it also encourages others to use the walker and is exceptional for sharing findings (as my team's PM noted).

--00--  

Which all sounds great, for sure, but how does it help to address the infinite loop and the infinite space?

Infinite space first: over time and over runs, because of the random nature of the paths taken, more and more of the space of possible assessments is covered. Of course, because of the infinite loop the space is not static, so by choosing how many runs to make and tuning the way they run, I can cover the space for today's investigation based on current perceived risk.

On the infinite loop, the walker helps at different levels. When a PR is ready for review I will often kick off the walker against the engine in that branch while I explore the product changes. Sometimes this produces an interesting finding immediately and, if it doesn't, I can either configure the walker to try to exercise a specific thing, change the walker to touch the new feature, or accept that there's low risk of unexpected side-effects.

Some of the developers on my team run the walker themselves while they are writing code, and some ask to run the walker with me after they have finished. A significant value to them is finding side-effects in behaviours they did not expect to change and so didn't test for. It's a safety net of sorts against limited confirmatory testing. 

My team has a good test culture, in the sense that the developers see it as their responsibility to write unit, integration, and even occasional end-to-end tests in code when they change the product. However, those kinds of tests only cover the things that the author (a) can think of, (b) can think of a way to implement, and (c) has time and motivation to actually do. 

The walker complements this because (a) it isn't trying to think about potential consequences, (b) there is no additional cost to running along one path or another, and (c) it runs unattended, for as many times as you ask it, without complaint.

Above the implementation level, I can frame questions like: could we ever see ...? how likely is it that ...? is there a route to ...? For example, could we ever see data from separate but concurrent user sessions being mixed up in the database? 

Now, absolutely, we can and should investigate this by inspecting the code, by writing tests for the integration between the product and the database, and so on. But we can additionally run lots of dialogs in parallel and then check for that kind of cross-contamination. Because the dialogs are random, the experiment is more "user-like" than running the same handful of canned dialogs over and over, which is what most regression tests will be doing, and so increases the chances of finding the particular state where there is scope for some kind of data issue.
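
A sketch of what such a post-run check could look like, assuming the walker keeps a record of what each "user" entered per session and that we can read the equivalent records back from the database:

```java
import java.util.Map;
import java.util.Set;

// Illustrative cross-contamination check after a parallel run: every answer stored
// against a session must be one that the walker actually entered in that session.
// The data shapes here are assumptions about the walker's records and the database.
public class CrossContaminationCheck {

    static void assertNoMixing(Map<String, Set<String>> answersEnteredBySession,
                               Map<String, Set<String>> answersStoredBySession) {
        answersStoredBySession.forEach((sessionId, storedAnswers) -> {
            Set<String> entered = answersEnteredBySession.getOrDefault(sessionId, Set.of());
            for (String answer : storedAnswers) {
                if (!entered.contains(answer)) {
                    throw new AssertionError("Session " + sessionId
                            + " holds an answer it never entered: " + answer);
                }
            }
        });
    }
}
```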

--00--

The walker amplifies my exploratory testing by helping me to ask specific questions and generating data to help answer them, and by looking for potential problems that it knows about already. It operates unsupervised to cover the infinite space and is configurable to allow it to be focused when that's necessary.

It takes ideas from model-based testing, unit testing, property testing, and observability to create a custom tool that pays back the investment I have made in it in spades. But that investment has been incremental: the first version was a horrible bash script, just enough to navigate a dialog randomly against an early skeleton of the product. The current version is a mixture of JavaScript for the UI and Java for the walker itself, built on GraphWalker, and in the last year I have used Cursor for many of the changes, notably in building the UI.

I have been talking about exploring with automation on this blog for a long time and I have been actually exploring with automation for a very long time. I find it a productive approach for starting to address the problem of the infinite space inside the infinite loop and, to be honest, I also wouldn't want to work any other way.
Image: Ivan Slade on Unsplash  

This post is a summary of a talk (slides are here) I gave at EC Utbildning and Ministry of Testing Cambridge recently.
