
Testing All the Way Down, and Other Directions

This is a prettied-up version of the notes I based my CEWT #3 talk on.

Explore It! by Elisabeth Hendrickson is a classic book on exploratory testing that we read - and enjoyed - in the Test Team book club at Linguamatics a few months ago. Intriguingly, to me, although the core focus of the book is exploration, I found myself over and again drawn back to a definition given early on (p.6):
Tested = Checked + Explored
where, to elaborate (p.5):
Checking [is testing] that you design in advance to check that the implementation behaves as intended under supported configurations and conditions.
Exploratory Testing [is] simultaneously designing and executing tests to learn about the system, using your insights from the last experiment to inform the next.
And both of these aspects are necessary for testing to have been performed (p.4-5):
 ... you need a test strategy that answers two core questions:
 1. Does the software behave as intended under the conditions it’s supposed to be able to handle?
 2. Are there any other risks?
Which doesn't sound particularly controversial, does it, at first blush: that testing should involve checking against what we know and exploring for other risks? So why did I find myself worrying away at it? Here are a few thoughts:

[1] The definition of testing is cast in terms of a mathematical formula. So I could wonder what ranges of values those variables could take, and their types, and what units (if any) they are measured in, and what kind of algebra is being used.

Perhaps it's something like numerical addition, and so Checked and Explored represent some value associated with their corresponding activity - bug counts or coverage or something; or maybe Checked and Explored are more akin to sets or multisets and I should interpret "+" as a kind of union operator; or, alternatively, perhaps Checked and Explored are simply Boolean values and "+" is something like an AND operation, which makes Tested only true if both Checked and Explored are true.

[2] I wonder whether Checked and Explored can overlap. Can one kind of action qualify as both checking and exploration? Can the same instance of an action qualify as both?

[3] Choosing the past tense to express a definition of Tested appears to wrap up two things: testing and the completion of testing.

[4] My intuition about what I'm prepared to regard as testing is itself tested by a statement like (p.5):
Neither checking nor exploring is sufficient on its own.
For instance, I can imagine circumstances in which I'd accept that the testing for some project would consist only of "checking" via unit tests. And I might do that on the basis of a high-level risk analysis of this project vs other projects in some product, timescale, the kind of application, the expected usage, resource availability and so on.

[5] Further, there's also the suggestion that exploratory testing alone is sufficient for testing. In this exchange, Elisabeth is helping a tester to see what their exploration of requirements is (p.121):
"Huh. Sounds like testing," I said. I waited to hear his response ... His face brightened. "Oh!" he exclaimed. "I get it. I hadn’t thought of it that way before. I am testing the requirements ..."
No checking is mentioned although, of course, it's possible to imagine pro-forma constraints that could be checked, for example that any document is in a language understandable by all parties that need to understand it, or that it has a particular nominated format, or that it's not written in invisible ink.

[6] There's another (implicit, non-formal) definition of testing on page 3:
interact with the software or system, observe its actual behavior, and compare that to your expectations 
which appears to tie testing to software and software systems (such as firmware, infrastructure, collections of software components, say) unless, perhaps, the "system" here can be interpreted as something much more general such as processes. And if exploring requirements can be testing then "system" probably is more general. But, if so, what might be meant by observing the "actual behaviour" of a functional specification or a bunch of Given, When, Thens?

[7] There are other well-known binary divisions of testing, which overlap in terminology with Explore It! and each other: Michael Bolton and James Bach have testing and checking while Paul Gerrard prefers exploring and testing. There's some discussion of the commonality between Bolton/Bach and Gerrard in this Twitter thread and Paul's blog and I wondered how Elisabeth's variant fitted into that kind of space.

"Picky, picky, picky," you may be thinking, "picky, pick, pick, pick ..."

Perhaps. But please don't get the idea that this essay is about pulling apart, or kicking, or rejecting Explore It! on this basis, because it's not. I've got a lot of time for the book and the definitional stuff I've referred to all occurs at the beginning, in the set up, with a light touch, to motivate a description of what the book is and is not about. The depth in the book is - intentionally - in the description and discussion and examples of exploration and how it can be utilised in software development generally and testing specifically.

Which is perhaps a little ironic, because thinking about the definition turned into this essay and is the exploration of what testing means for me. The starting point was the observation that I kept on returning to my model of the Explore It! view of testing even though the core focus of the book is exploration.

Instinctively, I am happy to regard the thoughts that lead to the kinds of questions I've listed above as testing: I am testing the definition of testing in the book using my testing skills to find and interpret evidence, skills such as: reading, re-reading, skim reading, searching, cataloguing, note-taking, cross-referencing, grouping related terms, looking for consistency and inconsistency within and outside the book, comparing to my own intuition, reflecting on the reaction of my colleagues in the book club when I said that I'd been distracted by the definition, filtering, factoring, staying alert, experimenting, evaluating, scepticism, critical thinking, lateral thinking, ...

I feel like I was performing actions which are at least consistent with testing activities:
  • I criticised the definition,
  • I challenged my model of the definition,
  • I analysed Elisabeth’s answers,
  • I reflected on the way I asked questions,
  • I wondered at why I cared about this,
  • I sought justification for all the above.

But I felt that the definition I was thinking about wouldn't classify what I was doing as testing.

Amongst the perennial testing problems (and Explore It!, as you would expect, talks about many of them) are these: which direction to follow and when to stop. In my case I decided, after my initial analysis, that I wanted to continue and that I'd do so by contacting Elisabeth herself and asking her questions about the definitions, her use of them in the book and her views on them today. And she graciously agreed to that.

With additional evidence from our conversation I was able to filter out some of the uncertainties I had and to refine my model of the system under test, if you will. I could then ask further questions and refine my model still further. I could then switch and ask questions of my model: test the model. If I thought that some section of the model was underdeveloped because it didn't stand up to questioning, then I could, and did, go back to the book and re-read sections of it, again adding data to my model. And I could correlate data from any of these sources, I could choose what approach to take next based on what I'd just discovered, and so on.

These things are all completely analogous (for me) to the exploration of some application when software testing. And I can take exactly this kind of approach when I'm reading requirements, when talking to the product owner, when reviewing proposals for new internal systems, when looking at my own ideas about how we might organise our team, when I'm thinking about talks or blog posts that I'm writing, ... More or less anything can have this kind of exploratory analysis applied to it, I think.

It can even, to take it to another - meta - level, be applied to the analysis itself: testing the testing. For example, when I was talking to Elisabeth I wanted to review what I said, how Elisabeth responded, how I interpreted her responses and so on, to understand whether:
  • I felt like I was getting my point across clearly; and, if not, then whether I could find another way such as giving examples or reframing or using different words.
  • I could see that the answers helped to resolve some point of uncertainty for me; and, if not, wondering why not: perhaps it was my setup (the context in which I placed the question) or some misapprehension in my model.
  • I was refining my model with any given experiment; and, if not, then ask whether the line of investigation is valid or worthwhile.
  • ...

Further, when I was testing the definition of testing in Explore It! I was curious about why I cared, and so tried to understand that. How? By testing it!  A couple of questions recurred and required strong answers from me:
  • What value do I see in this exercise, and who for?
  • Am I reading too much into the definitions, building ideas on something that isn't intended to bear deep analysis?

And the techniques that I choose to use for these kinds of analyses are themselves testing techniques such as questioning, review, idea generation, comparison, critical thinking, and so on. Not everything that I am applying my testing to necessarily exhibits "behaviour" (as in Elisabeth's second definition) but it can yield information in response to experiments against it.

At some point, I cast around for other views of testing. It's not uncommon to view testing as a recursive activity. In his keynote at EuroSTAR 2015 Rikard Edgren said this beautiful thing:
Testing is simple: you understand what is important and then you test it.
Adam Knight has a couple of really nice blogs on fractal exploratory testing, and presented a talk on it at Linguamatics recently too.  He argues that, in exploratory testing, each exploration uncovers points which can themselves be explored, and so on, down and down and down, with the same techniques applicable in the same kinds of ways at each step:
as each flaw .. is discovered ... [a] mini exploration will result in a more targeted testing exploration around this feature area
I enjoy this insight. And I have a lot of sympathy for that view of a possible traversal of a testing space. I feel like I follow that pattern while I'm testing. But I also feel that that kind of self-similar structure applies in other ways than simply increasing resolution at each step.  For me, testing can be done across, and around, and inside and outside, and above and below, and at meta levels of a system.

Which you might be happy enough to accept. But I also think that these different dimensions of testing can be taking place in different planes, looking in different directions, seeking different goals, and often many of these at the same time. Imagine this scenario, where I've been asked to test some product feature or other, and I have access to the product owner (PO):
  • While I am putting my questions to the PO, and getting her answers, I am interpreting her response and feeding that data to my model of the system we're testing, and so asking questions of the model.
  • But I'm also wondering whether I could have got more helpful answers to my questions if I'd phrased them a different way and evaluating whether or not I should risk upsetting her now by re-asking, or wait for another opportunity to practise a different question format. 
  • I take a high-level view to find out what the stakeholders want from the testing I'm doing, which makes me question whether what I've done returns that value, which in turn makes me question what I'm doing.
  • I want to find out which stakeholders have useful information about which aspects of the feature. By talking to them I begin to understand where they are reliable, their degrees of uncertainty, the clarity of their vision. While I'm doing this I'm testing their ability to express their opinion, and I'm feeding that into my model by adding uncertainties.
  • At the same time, I'm running an ad hoc experiment against the system based on the PO's data and noticing out of the corner of my eye that some of the text on the dialog we need to use is misaligned, and I recall that there have been similar examples in the past on other dialogs and so I shift my model of that problem into focus.
  • As I start thinking about it, I check myself and realise that I've missed what the PO is saying. 
  • I review that decision and curse myself.
  • And then I observe something that seems at odds with what the PO is saying. It could be that the software is wrong, or it could be that my model of the PO's view of the world is wrong, or the PO's view of the world or the PO's expression of their view of the world, or something else. I frame another experiment - perhaps more questions - to try to isolate whether and where there's an issue. 

And so on and so on and so on. Sometimes multiple activities feed into another. Sometimes one activity feeds into multiple others. Activities can run in parallel, overlap, be serial. A single activity can have multiple intended or accidental outcomes, ... By the definition in Explore It! as I interpret it, only some parts of this are testing. But, for me, it's just testing: all the way down, and the other directions.

I started off by being curious about a definition and then about my own curiosity and then about the value of either of those things. That led to some interesting thoughts and a very enjoyable exchange and some introspection, and indeed to this essay. But, when analysing something, a natural question to ask can be: well, what are the alternatives?

It might not always be regarded as within a tester's remit to come up with alternatives but, as a testing tool, finding or generating alternatives is very useful.  Perhaps unsurprisingly there are numerous alternatives available and Arborosa's blog post What is Testing?  lists many. Taking inspiration from Michael Bolton's training session at Linguamatics I tried to create one that reflected testing for me, and this is what I came up with:
Testing is the pursuit of actual or potential incongruity.
There is no specific technique; it is not limited to the software; it doesn't have to be linear; there don't need to be requirements or expectations; the same actions can contribute to multiple paths of investigation at the same time; it can apply at many levels and those levels can be distinct or overlapping in space and time.

That's my idea so far. Feel free to test it.

Particular thanks to Elisabeth Hendrickson for being open to my questions.

Edit: I later wrote some more about how I arrived at the specific wording in The Anatomy of a Definition of Testing.

Image: You Are Not So Smart


  1. Thanks for an interesting article, I am not sure about the full coverage of that sentence:
    "Checking ...behaves as intended"
    I would say "as assumed" since in checking we are making assumptions in advance and test by it - it could be a Negative Testing assumption, which we hope to dispel.
    @halperinko - Kobi Halperin

    1. Hi Kobi, I'm glad you found it interesting, thanks.

      That quote comes from Explore It! and Elisabeth's definitions of testing that I've seen typically have some notion of expectation in them (see

