Skip to main content

On Being a Test Charter

Managing a complex set of variables, of variables that interact, of interactions that are interdependent, of interdependencies that are opaque, of opacities that are ... well, you get the idea. That can be hard. And that's just the job some days.

Investigating an apparent performance issue recently, I had variables including platform, product version, build type (debug vs release), compiler options, hardware, machine configuration, data sources and more. I was working with both our Dev and Ops teams to determine which of these seemed most likely in what combination to be able to explain the observations I'd made.

Up to my neck in a potential combinatorial explosion, it occurred to me that in order to proceed I was adopting an approach similar to the ideas behind chart parsing in linguistics. Essentially:
  • keep track of all findings to date, but don't necessarily commit to them (yet)
  • maintain multiple potentially contradictory analyses (hopefully efficiently)
  • pack all of the variables that are consistent to some level in some scenario together while looking at other factors
Some background: parsing is the process of analysing a sequence of symbols for conformance to a set of grammatical rules. You've probably come across this in the context of computer programs - when the compiler or interpreter rejects your carefully crafted code by pointing at a stupid schoolboy syntax error, it's a parser that's laughing at you.

Programming languages will generally be engineered to reduce ambiguity in their syntax in order to reduce the scope for ambiguity in the meaning of any statement. It's advantageous to a programmer if they can be reasonably certain that the compiler or interpreter will understand the same thing that they do for any given program. (And in that respect Perl haters should get a chuckle from this.)

But natural languages such as English are a whole different story. These kinds of languages are by definition not designed and much effort has been expended by linguists to create grammars that describe them. The task is difficult for several reasons, amongst which is the sheer number of possible syntactic analyses in general. And this is a decent analogy for open-ended investigations.

Here's an incomplete syntactic analysis of the simple sentence Sam saw a baby with a telescope - note that the PP node is not attached to the rest.


The parse combines words in the sentence together into a structures according to grammatical rules like these which are conceptually very similar to the kind of grammar you'll see for programming languages such as Python or in, say, the XML specs:
 NP -> DET N
 VP -> V NP
 PP -> P NP
 NP -> Det N PP
 VP -> V NP PP
 S  -> NP VP
The bottom level of these structures are the grammatical category of each word in the sentence e.g. nouns (N), verbs (V), determiners such as "a" or "the" (DET) and prepositions like "in" or "with" (P).

Above this level, a noun phrase (NP) can be a determiner followed by a noun (e.g. the product) and a verb phrase (VP) can be a verb followed by a noun phrase (tested the product) and a sentence can be a noun phrase followed by a verb phrase (A tester tested the product).

The sentence we're considering is taken from a paper by Doug Arnold:
 Sam saw the baby with the telescope
In a first pass, looking only at the words we can see that saw is ambiguous between a noun and a verb. Perhaps you'd think that because you understand the sentence it'd be easy to reject the noun interpretation, but there are similar examples with the same structure which are probably acceptable to you such as:
 Bill Boot the gangster with the gun
So, on the basis of simple syntax alone, we probably don't want to reject anything yet - although we might assign a higher or lower weight to the possibilities. In the case of chart parsing, both are preserved in a single chart data structure which will aggregate information through the parse:
In the analogy with an exploratory investigation, this corresponds to an experimental result with multiple potential causes. We need to keep both in mind but we can prefer one over the other to some extent at any stage, and change our minds as new information is discovered.

As a parser attempts to fit some subset of its rules to a sentence there's a chance that it'll discover the same potential analyses multiple times. For efficiency reasons we'd prefer not to spend time working out that a baby is a noun phrase from first principles over and over.

The chart data structure achieves this by holding information discovered in previous passes for reuse in subsequent ones, but crucially doesn't preclude some other analysis also being found by some other rule. So, although a baby fits one rule well, another rule might say that baby with is a potential, if contradictory, analysis. Both will be available in the chart.

Mapping this to testing, we might say that multiple experiments can generate data which supports a particular analysis and we should provide ourselves the opportunity to recognise when data does this, but not be side-tracked into thinking that there are not other interpretations which cross-cut one another.

In some cases of ambiguity in parsing we'll find that high-level analyses can be satisfied by multiple different lower-level analyses. Recall that the example syntactic analysis given above did not have the PP with the telescope incorporated into it. How might it fit? Well, two possible interpretations involve seeing a baby through a telescope or seeing a baby who has a telescope.

This kind of ambiguity comes from the problem of prepositional phrase attachment: which other element in the parse does the PP with the telescope modify: the seeing (so it attaches to the VP) or the baby (so NP)?

Interestingly, at the syntactic level, both of these result in a verb phrase covering the words saw the baby with the telescope and so in any candidate parse we can consider the rest of the sentence without reference to any of the internal structure below the VP. Here's a chart showing just the two VP interpretations:

You can think of this as a kind of "temporary black box" approach that can usefully reduce complexity when coping with permutations of variables in experiments.

The example sentence and grammar used here are trivial: real natural language grammars might have hundreds of rules and real-life sentences can have hundreds of potential parses. In the course of generating, running and interpreting experiments, however, we don't necessarily yet know the words in the sentence, or know that we have the complete set of rules, so there's another dimension of complexity to consider.

I've tried to restrict to simple syntax in this discussion, but other factors will come into play when determining whether or not a potential parse is plausible - knowledge of the frequencies with which particular sets of words occur in combination would be one. The same will be true in the experimental context, for example you won't always need to complete an experiment to know that the results are going to be useless because you have knowledge from elsewhere.

Also, in making this analogy I'm not suggesting that any particular chart parsing algorithm provides a useful way through the experimental complexity, although there's probably some mapping between such algorithms and ways of exploring the test space.  I am suggesting that being aware of data structures that are designed to cope with complexity can be useful when confronted with it.
Images: Doug Arnoldhttps://flic.kr/p/dxFxXG

Comments

  1. As mentioned, multiple experiments can generate data which supports a particular analysis but Test Data should be considered in both positive and negative ways. Poorly designed test data may not test all possible test scenarios which will hamper the quality of the software. Test data can be used for positive testing to verify a given set of input to a given function if it produces an expected result and also used for negative testing to test the ability of the program to handle unusual, extreme, exceptional, or unexpected input.

    ReplyDelete

Post a Comment

Popular posts from this blog

Can Code, Can't Code, Is Useful

The Association for Software Testing is crowd-sourcing a book,  Navigating the World as a Context-Driven Tester , which aims to provide  responses to common questions and statements about testing from a  context-driven perspective . It's being edited by  Lee Hawkins  who is  posing questions on  Twitter ,   LinkedIn , Mastodon , Slack , and the AST  mailing list  and then collating the replies, focusing on practice over theory. I've decided to  contribute  by answering briefly, and without a lot of editing or crafting, by imagining that I'm speaking to someone in software development who's acting in good faith, cares about their work and mine, but doesn't have much visibility of what testing can be. Perhaps you'd like to join me?   --00-- "If testers can’t code, they’re of no use to us" My first reaction is to wonder what you expect from your testers. I am immediately interested in your working context and the way

Testing (AI) is Testing

Last November I gave a talk, Random Exploration of a Chatbot API , at the BCS Testing, Diversity, AI Conference .  It was a nice surprise afterwards to be offered a book from their catalogue and I chose Artificial Intelligence and Software Testing by Rex Black, James Davenport, Joanna Olszewska, Jeremias Rößler, Adam Leon Smith, and Jonathon Wright.  This week, on a couple of train journeys around East Anglia, I read it and made sketchnotes. As someone not deeply into this field, but who has been experimenting with AI as a testing tool at work, I found the landscape view provided by the book interesting, particularly the lists: of challenges in testing AI, of approaches to testing AI, and of quality aspects to consider when evaluating AI.  Despite the hype around the area right now there's much that any competent tester will be familiar with, and skills that translate directly. Where there's likely to be novelty is in the technology, and the technical domain, and the effect of

Testers are Gate-Crashers

  The Association for Software Testing is crowd-sourcing a book,  Navigating the World as a Context-Driven Tester , which aims to provide  responses to common questions and statements about testing from a  context-driven perspective . It's being edited by  Lee Hawkins  who is  posing questions on  Twitter ,   LinkedIn , Mastodon , Slack , and the AST  mailing list  and then collating the replies, focusing on practice over theory. I've decided to  contribute  by answering briefly, and without a lot of editing or crafting, by imagining that I'm speaking to someone in software development who's acting in good faith, cares about their work and mine, but doesn't have much visibility of what testing can be. Perhaps you'd like to join me?   --00-- "Testers are the gatekeepers of quality" Instinctively I don't like the sound of that, but I wonder what you mean by it. Perhaps one or more of these? Testers set the quality sta

Am I Wrong?

I happened across Exploratory Testing: Why Is It Not Ideal for Agile Projects? by Vitaly Prus this week and I was triggered. But why? I took a few minutes to think that through. Partly, I guess, I feel directly challenged. I work on an agile project (by the definition in the article) and I would say that I use exclusively exploratory testing. Naturally, I like to think I'm doing a good job. Am I wrong? After calming down, and re-reading the article a couple of times, I don't think so. 😸 From the start, even the title makes me tense. The ideal solution is a perfect solution, the best solution. My context-driven instincts are reluctant to accept the premise, and I wonder what the author thinks is an ideal solution for an agile project, or any project. I notice also that I slid so easily from "an approach is not ideal" into "I am not doing a good job" and, in retrospect, that makes me smile. It doesn't do any harm to be reminded that your cognitive bias

Play to Play

I'm reading Rick Rubin's The Creative Act: A Way of Being . It's spiritual without being religious, simultaneously vague and specific, and unerring positive about the power and ubiquity of creativity.  We artists — and we are all artists he says — can boost our creativity by being open and welcoming to knowledge and experiences and layering them with past knowledge and experiences to create new knowledge and experiences.  If that sounds a little New Age to you, well it does to me too, yet also fits with how I think about how I work. This is in part due to that vagueness, in part due to the human tendency to pattern-match, and in part because it's true. I'm only about a quarter of the way through the book but already I am making connections to things that I think and that I have thought in the past. For example, in some ways it resembles essay-format Oblique Strategy cards and I wrote about the potential value of them to testers 12 years ago. This week I found the f

Meet Me Halfway?

  The Association for Software Testing is crowd-sourcing a book,  Navigating the World as a Context-Driven Tester , which aims to provide  responses to common questions and statements about testing from a  context-driven perspective . It's being edited by  Lee Hawkins  who is  posing questions on  Twitter ,   LinkedIn , Mastodon , Slack , and the AST  mailing list  and then collating the replies, focusing on practice over theory. I've decided to  contribute  by answering briefly, and without a lot of editing or crafting, by imagining that I'm speaking to someone in software development who's acting in good faith, cares about their work and mine, but doesn't have much visibility of what testing can be. Perhaps you'd like to join me?   --00-- "Stop answering my questions with questions." Sure, I can do that. In return, please stop asking me questions so open to interpretation that any answer would be almost meaningless and certa

Test Now

The Association for Software Testing is crowd-sourcing a book,  Navigating the World as a Context-Driven Tester , which aims to provide  responses to common questions and statements about testing from a  context-driven perspective . It's being edited by  Lee Hawkins  who is  posing questions on  Twitter ,   LinkedIn , Mastodon , Slack , and the AST  mailing list  and then collating the replies, focusing on practice over theory. I've decided to  contribute  by answering briefly, and without a lot of editing or crafting, by imagining that I'm speaking to someone in software development who's acting in good faith, cares about their work and mine, but doesn't have much visibility of what testing can be. Perhaps you'd like to join me?   --00-- "When is the best time to test?" Twenty posts in , I hope you're not expecting an answer without nuance? You are? Well, I'll do my best. For me, the best time to test is when there

Rage Against the Machinery

  I often review and collaborate on unit tests at work. One of the patterns I see a lot is this: there are a handful of tests, each about a page long the tests share a lot of functionality, copy-pasted the test data is a complex object, created inside the test the test data varies little from test to test. In Kotlin-ish pseudocode, each unit test might look something like this: @Test fun `test input against response for endpoint` () { setupMocks() setupTestContext() ... val input = Object(a, OtherObject(b, c), AnotherObject(d)) ... val response = someHttpCall(endPoint, method, headers, createBodyFromInput(input) ) ... val expected = Object(w, OtherObject(x, y), AnotherObject (z)) val output = Object(process(response.getField()), otherProcess(response.getOtherField()), response.getLastField()) assertEquals(expected, output) } ... While these tests are generally functional, and I rarely have reason to doubt that they

A Qualified Answer

The Association for Software Testing is crowd-sourcing a book,  Navigating the World as a Context-Driven Tester , which aims to provide  responses to common questions and statements about testing from a  context-driven perspective . It's being edited by  Lee Hawkins  who is  posing questions on  Twitter ,   LinkedIn ,   Slack , and the AST  mailing list  and then collating the replies, focusing on practice over theory. I've decided to  contribute  by answering briefly, and without a lot of editing or crafting, by imagining that I'm speaking to someone in software development who's acting in good faith, cares about their work and mine, but doesn't have much visibility of what testing can be. Perhaps you'd like to join me?   --00-- "Whenever possible, you should hire testers with testing certifications"  Interesting. Which would you value more? (a) a candidate who was sent on loads of courses approved by some organisation you don't know and ru

README

    This week at work my team attended a Myers Briggs Type Indicator workshop. Beforehand we each completed a questionnaire which assigned us a personality type based on our position on five behavioural preference axes. For what it's worth, this time I was labelled INFJ-A and roughly at the mid-point on every axis.  I am sceptical about the value of such labels . In my less charitable moments, I imagine that the MBTI exercise gives us each a box and, later when work shows up, we try to force the work into the box regardless of any compatiblity in size and shape. On the other hand, I am not sceptical about the value of having conversations with those I work with about how we each like to work or, if you prefer it, what shape our boxes are, how much they flex, and how eager we are to chop problems up so that they fit into our boxes. Wondering how to stretch the workshop's conversational value into something ongoing I decided to write a README for me and