Saturday, January 25, 2020

Failure, Am I?

It's nothing personal I'm sure but at this week's Cambridge Agile Exchange Ben Mancini told me that I'm a failure. Ouch!

Failure, he says, is "the state or condition of not meeting a desirable or intended objective ... may be viewed as the opposite of success" and his hypothesis is that failure provides more learning than success but that we talk more about success than failure.

In his talk, Ben showed examples of people who have succeeded despite repeated failure, talked about the negative effects of failure, ways to reframe perceived failure, and the benefits of pushing through failure's cloud to its silver lining. Along the way he shared some of his own failures and invited us to share personal faux pas with our neighbours in the audience then look for the learnings in them.

One of the reasons I attend meetups is to be provoked into thought so, in no particular order ...

In my time I've certainly done things that I'd have preferred not to, and some things that others would have preferred I hadn't, and some things that were indisputably not what was required or expected by anyone. Perhaps I'm over-sensitive, but I wonder whether that makes me a failure, or just someone who has, on occasion, failed? If I'm going to call something a failure, I find that intuitively I want to judge actions rather than people.

Ben's definition of failure comes from Wikipedia. At first flush it seems reasonable but read further down the Wikipedia page and you'll find nuance that again accords with my own instinct. As with so many things, failure is subject to the relative rule: failure is failure to somebody at some time. The same event might be viewed differently by different people or by an individual in the moment and then later.

It's easy to say "failure is a better teacher than success" but historically I've been sceptical about it, at least stated as baldly as that: is it really better every time, for any person, for all kinds of learning? I changed my position slightly after reading Matthew Syed's excellent book, Black Box Thinking. I think learning from an action — positive or negative — requires reflection about (for example) what was done, what happened, the relationship between the two, other things that were done contemporaneously and other things that might have happened.

A failure might provoke more of that kind of reflection, for sure. As Richard Cook writes in his paper How Complex Systems Fail:
    ... all practitioner actions are actually gambles, that is, acts that take place in the face of uncertain outcomes. The degree of uncertainty may change from moment to moment. That practitioner actions are gambles appears clear after accidents; in general, post hoc analysis regards these gambles as poor ones. But the converse: that successful outcomes are also the result of gambles; is not widely appreciated.

For me, the key here is post-hoc. The same kind of learning might be taken from positive events if we reviewed them. To reflect on Ben's hypothesis: do we really talk more about success than failure? Which we, measured how?

I find Cook compelling on sociotechnical modes of failure, see e.g. Everybody Flirts, Fail Over, and Read it and Weep, and his system-wide perspective prompts another interesting question: whose learning are we interested in? The individual or the system?

In the talk, James Dyson was used as an example of someone who had failed many times before succeeding. His quest to create a bagless vacuum cleaner is well-documented and for me is an example of one context in which I'm comfortable to say that failure (on some specific interpretation) is indisputably a learning experience.

Dyson created thousands of incrementally different prototypes, iterating his way to one that had all of the functionality that he needed. Was each attempt a failure? Or was each attempt a step towards his desired outcome, a quantum of success? Setting up actions as experiments means that getting a result at all is the goal. Generate-and-test is a legitimate strategy.
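That strategy can be sketched in a few lines: generate a candidate, score it, keep the best so far, and treat every attempt as data rather than failure. The names and the toy scoring function below are purely illustrative, and have nothing to do with Dyson's actual engineering:

```python
import random

def generate_and_test(score, generate, attempts=5000, seed=0):
    """Keep the best candidate seen so far; every attempt is a step, not a failure."""
    rng = random.Random(seed)
    best = generate(rng)
    for _ in range(attempts - 1):
        candidate = generate(rng)
        if score(candidate) > score(best):
            best = candidate
    return best

# Toy stand-in for prototyping: search for a value near some sweet spot.
target = 0.42
best = generate_and_test(
    score=lambda x: -abs(x - target),   # closer to the sweet spot is better
    generate=lambda rng: rng.random(),  # each "prototype" is a fresh guess
)
print(best)
```

None of the 4999 discarded candidates is a failure in any interesting sense; getting a result at all, and converging on the target, is the point.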

Related: which parent hasn't allowed an action to proceed, confident that it will not work, because a bruised knee or a burnt finger or a low mark in the test appears to be the way that the offspring wants to learn a particular lesson? Advice could have taught it, but a painful experience can help too. How should we view this kind of event? From the child's perspective, in the moment, it's a painful failure. From the parent's perspective, in the moment, it's vindication, success. Who is right? For how long?

Most parents would try to set up those kinds of outcomes in a safe way. Assuming that learning from failure does have high value, I wonder whether it is diminished by happening in a safe environment? Might the scale of learning increase with jeopardy? But some failure could be terminal: should we learn to walk the tightrope by crossing Niagara Falls?

As a manager I've used the safe space tactic although I try to be open and explicit about it. Perhaps I'll set it up as "I would be delighted if you showed me I was wrong" or "If this works for you, then I'll have learned something too." I think of this as a way of making the effort into an experiment.

Some jobs are apparently set up for failure: a salesperson might expect to sell once in 100 opportunities. Is that 99 failures per success? I've heard it cast in this way: each rejection means one less before the sale. There is no fear of failure with this philosophy and, while those of us who cringe at the idea of working in sales might find it hard to believe, that kind of approach can be learned.

I wonder how to control the learning that comes from failure. It's well-known that machine learning approaches which rely on being "taught" by trying, failing, and being corrected can pick up on unexpected aspects of their training material. Is there an analogue for human learning? Ben listed a bunch of authors in his talk, people who'd tried, tried, and tried again to be published despite numerous rejections. What was their learning? To be resilient? To sell themselves well? To find their tribe? To get better at writing?

Could it be that some of those people learned nothing through failure to convince an editor that they had a story worth telling? Could they, for example, simply be already resilient folk with large egos that needed satisfying? What about survivorship bias? Where are all the people who failed as many times but didn't ultimately get published? What was their learning? Was it greater than those who were published? How is that even measured?

My goal in this post was to spend a limited time working through the right-now thoughts that were spurred by Ben's talk. I think I achieved that. Having seen them, you might decide that all I have is shallow, half-cooked, or just plain nonsense. If so, have I failed? If you liked the notes, have I succeeded? Could either outcome make me a success or a failure? To whom? On what basis?

Thursday, January 23, 2020

Metric or Treat

A friend lent me two books on metrics. One was so thumpingly wrong I could not finish it. One was so joyfully right I had to read it end-to-end, despite the author's recommendation not to.

Dave Nicolette will be pleased to know that his book, Software Development Metrics, was the second of the two. Why so right? Because immediately I felt like I was in the hands of a pragmatist, a practitioner, and a personable guide. By page 8 he's chattily differentiated measures and metrics, identified a bunch of dimensions for projects, and then categorised metrics too. Along the way, he drops this piece of wisdom (p. 5):
The sort of development process you're using will influence your choice of metrics. Some metrics depend on the work being done in a certain way. A common problem is that people believe they're using a given process, when in fact they're working according to a conflicting set of assumptions. If you apply metrics that depend on the process being done correctly you won't obtain information that can help you steer the work or measure the results of process-improvement efforts. You have to measure what's really happening, regardless of the buzzwords people use to describe it.

For Nicolette, metrics face forwards (towards an emerging solution) or backwards (towards a predefined plan), are trailing indicators (give feedback on work done) or leading indicators (forecast future work), can have any of three functions (prediction, diagnosis, motivation), and can be used to steer work in progress or process improvements.

He provides axes on which to plot projects, acknowledging that the real world isn't quite as neat as the breakdown might suggest:
  • process model: linear (proceeds through gated phases), iterative (repeated refinement of the solution), time-boxed (iterative in increments), continuous flow (controls work in progress).
  • delivery mode: project (with a start and end date and some goal to achieve before delivery) or ongoing (repeated, frequent, incremental delivery).
  • approach: traditional (up-front planning followed by implementation) or adaptive (evolving plans based on implementation so far).
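As a sketch of how this categorisation might be put to work, the axes above and the metric dimensions from earlier could be encoded as data and queried for fit. To be clear, the field names, the example metric and its applicability are my invention for illustration, not Nicolette's:

```python
from dataclasses import dataclass

# Hypothetical encoding of the axes; the axis values come from the book's
# breakdown, but the example metric's details are invented.
@dataclass(frozen=True)
class Project:
    process_model: str   # 'linear' | 'iterative' | 'time-boxed' | 'continuous flow'
    delivery_mode: str   # 'project' | 'ongoing'
    approach: str        # 'traditional' | 'adaptive'

@dataclass(frozen=True)
class Metric:
    name: str
    direction: str       # 'forward' | 'backward'
    indicator: str       # 'leading' | 'trailing'
    functions: frozenset # any of {'prediction', 'diagnosis', 'motivation'}
    applies_to: frozenset  # process models the metric depends on

def applicable(metric, project):
    """A metric only yields useful information if the process it assumes is real."""
    return project.process_model in metric.applies_to

burn_up = Metric('burn-up (example only)', 'forward', 'trailing',
                 frozenset({'prediction', 'motivation'}),
                 frozenset({'iterative', 'time-boxed'}))
team = Project('time-boxed', 'ongoing', 'adaptive')
print(applicable(burn_up, team))
```

Note that `applicable` checks what the team actually does, not what it says it does, which is exactly the p. 5 warning: measure what's really happening.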

A set of metrics for both steering and process improvement are presented with Top Trumps-style summaries to permit quick reference and comparison. (This is the point at which he advises not reading from start to finish.) These details include factors that are crucial for the success of the metric, and it's striking that they tend to require someone to make some effort to generate and/or record data. Good data.

There is no such thing as a free lunch.

The book has a table describing the applicability of metrics to project types but I found myself wanting some kind of visualisation of it, I think, so that I could look for similarities and differences across the traditional and adaptive approaches:

I was interested by the similarities, although there are subtleties and caveats about usage in particular contexts that it's important to take note of. Other commentary on each of the metrics includes warnings about common anti-patterns and examples based on data provided in an accompanying spreadsheet.

The wisdom comes at regular intervals. For example when management wants a particular metric regardless of its validity (p. 42): "If you're required to report progress in this way then do so; just don't assume that you can use the numbers to make critical decisions about your work." There are other, more worthwhile, hills to die on.

If you (or your superiors) find yourselves tempted to attribute magic to the numbers, heed this warning: "metrics won't directly tell you the root causes of a problem; they'll only indicate when reality diverges from expectations or exceeds limits you've defined as 'normal.'" (p. 108) Perhaps it's your expectation or your view of normal that needs refinement.

Finally, a motivation to try to find metrics that work for your stakeholders to help them work for you: "One of the key benefits of tracking planning predictability is that it can enhance stakeholder satisfaction ... When stakeholders know they can count on receiving more or less what they're told to expect, they feel confident in the delivery organization and offer greater autonomy and trust to development teams." (p. 131)

As you might expect for a book that's designed to be dipped into, the content can be repetitive across metrics which serve similar purposes. Despite this, and chapters which talk about using and reporting metrics in practice (with warnings), the book still comes in at a tight 160 pages and is an extremely easy read. A real treat.
Image: Manning

Saturday, January 11, 2020

Steady State

The seventh State of Testing survey is now open and continues its mission to "identify the existing characteristics, practices, and challenges facing the testing community today in hopes to shed light and provoke a fruitful discussion towards improvement." Results of previous rounds of the survey are available at the same link.

Kudos to PractiTest and Tea Time With Testers for reliably keeping on keeping on with this.

Thursday, January 9, 2020

My Software Under Test and Other Animals

What feels like a zillion years ago I wrote a few pieces for the Ministry of Testing's Testing Planet newspaper. Understandably, they've since mostly been replaced with much better stuff on the Dojo but MoT have kindly given me permission to re-run them here.


Let's not kid ourselves; we know that software always ships with bugs in it and that we’re likely to be asked about the release-worthiness of the current build at some point in the release cycle. Putting aside the question of whether or not testers should be making go/no-go release decisions, we have to be able to give a coherent, credible and useful response to that kind of request.

Such a response will be part of your testing or product story at a level appropriate to the person who asked. You might talk about whatever metrics you’re required to produce, or produce for your own benefit (and the value you perceive them to have) and perhaps performance requirements such as execution speed for important use cases, stability and robustness. You might refer back to initial requirements or specification items or user stories that didn’t make it or that didn’t make it entirely and if you have it, include feedback from any beta programmes, internal users, your operations team and so on ...

Read the rest of the ebook at My Software Under Test and Other Animals.
Image: Ministry of Testing

Skills Paradox

What feels like a zillion years ago I wrote a few pieces for the Ministry of Testing's Testing Planet newspaper. Understandably, they've since mostly been replaced with much better stuff on the Dojo but MoT have kindly given me permission to re-run them here.


You've got skills, but still you know you want more skills. Skills for testing, naturally, for the tools you use, for choosing which tools to use, for the domain you work in, the kind of product you work on, the environment your product is deployed in, for working with your colleagues, for reporting to your boss, for managing your boss, for selecting a testing magazine to read, for searching for information about all of the above.

You want to hone your skills, to develop, consolidate and extend them. You want to learn new skills that will help you do what you do more effectively, efficiently, efficaciously, or just effing better. You want to find ways to learn new skills more easily, and you know that learning skills is itself a skill, to be improved by practice. You want to grow the skill that tells you what skills you should seek out in order to accomplish the next challenge. You want to find ways to communicate skills to your colleagues and the wider testing community. You know skills are important. You need skills, as the Beastie Boys said, to pay the bills.

So what you probably don't want is to be told that increased skill can ultimately lead to an increased reliance on luck.

And if that's the case you should definitely not read Michael Mauboussin's The Paradox of Skill: Why Greater Skill Leads to More Luck, a white paper in which he says that, in any environment where there are two forces in competition, both striving to improve their skills, there will be a point when skill sets cancel each other out and outcomes will be more dependent on fortune than facility. He cites a baseball example in which Major League batting averages have decreased since 1941 even though, by parallel with other sports and their records (such as athletics and the time to run 100 metres), you'd expect that the players, techniques, equipment, coaching and so on have all improved, and hence that averages would too.

The reason why they haven't, he concludes, is that batters don't bat in isolation and as their skills improve, so do the skills of the pitchers. It's a process where improvements on either side cancel out improvements on the other. In this scenario, chance events can have a disproportionate effect.

Much as I'd never want to roll out a conflict analogy to describe the relationship between Dev and Test *cough* it's easy to see this kind of arms race in software development. At a simple level: New code is produced. Issues are reported. Issues are fixed. Old code is wrapped in test. New issues become harder to find. Something changes. New issues are found. And so on. The key is which something we choose to change.

We'll often try to provoke new issues by introducing novelty: new test vectors, data, approaches and, of course, skills. But we can also echo Mauboussin and try to invoke chance interactions in areas we already know and understand to break through the status quo. For example, we might use time and/or repetition to increase our chances of encountering low-probability events. You can't control every parameter of your test environment, so running the same tests over and over again will naturally be testing in different environments (while a virus checker is scanning the disk, or an update daemon is reinstalling some services, or summer time starts, and so on). Or we could operate in a deliberately messy way during otherwise sympathetic testing, opening the way for other events to impinge, perhaps by re-using a test environment from a previous round of test. James Bach likes to galumph through a product, and then there are approaches like fuzzing and other deliberate uses of randomness in interactions with, or data for, our application.
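A minimal sketch of that tactic: run the same check many times under randomised conditions and collect anything that breaks. Here `flaky_save` is an invented stand-in for the system under test, with a bug that only fires when a rare background event coincides with an awkward input:

```python
import random

def flaky_save(payload, disk_busy):
    """Invented system under test: fails only in a rare environment state."""
    if disk_busy and len(payload) == 0:
        raise IOError("save failed while disk was being scanned")
    return True

def soak(test, runs=10000, seed=1):
    """Repeat the same test under randomised conditions, collecting failures."""
    rng = random.Random(seed)
    failures = []
    for i in range(runs):
        disk_busy = rng.random() < 0.01                 # rare background event
        payload = rng.choice(["ok", "", "big" * 100])   # galumphing input choices
        try:
            test(payload, disk_busy)
        except Exception as e:
            failures.append((i, payload, disk_busy, str(e)))
    return failures

failures = soak(flaky_save)
print(len(failures))
```

Any single run will almost certainly pass; it's the repetition, plus the uncontrolled variation, that eventually puts the test into the environment where the low-probability bug can show itself.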

Of course, there's still a skill at play here: choosing the most productive way to spend time to simulate chance. But if you can't decide on that you could always toss a coin.

Show Business

What feels like a zillion years ago I wrote a few pieces for the Ministry of Testing's Testing Planet newspaper. Understandably, they've since mostly been replaced with much better stuff on the Dojo but MoT have kindly given me permission to re-run them here.


It was Irving Berlin who famously said that there was no business like it but, for me, there’s no business without it. Irving’s version scans better, I’ll give him that, but mine captures an essential aspect of pretty much all successful projects with multiple participants: demonstration.

On a project, at some point or points, you're going to clue your collaborators in on what you're doing, or what you've done or what you reckon you will do. And also what you aren't, didn't or won't. If not, you risk confusion about important stuff like responsibilities, scope, delivery, budget and whose turn it is to make the tea.

Your team mates are a great source of test ideas and, whether they're on the project or not, should be interested in what's going on in the part of the product you're working on. We use a wiki at our place for documenting test plans, exploratory session charters, test tours, test results and so on. People can subscribe to any page and so watch and comment on a piece of work easily. We also have a daily standup, peer reviews and pair testing, all of which encourage discussion, sharing, showing.

Bug reports. An old chestnut. But never forget that they are a public document of your research and another way to demonstrate and collaborate. They expose to others how to provoke a problem, how you assess the application’s behaviour and why you think it’s relevant. Take the opportunity to explain these things well and you’ll benefit everyone.

Other stakeholders, or just interested observers, might like to be kept apprised of progress and it's an important part of a tester's role to be able to provide them with reports. These can take many forms, from a quick chat while you're making the tea (your turn again?) where you'll need to be able to filter out and explain the high-level headline news, to a last-minute request to attend a management discussion where detailed explanation of some aspect of your testing and the risks it's exposed will be required.

You’ll encounter these, and other, situations all the time at work and when you do bear in mind these five Cs and do it as clearly, correctly, comprehensively, consistently and concisely as you can. Now, let’s go on with the show.

Tuesday, January 7, 2020

Peers Exchanging Ideas

Peers Exchanging Ideas: The AST Peer Conference Guide

The Association for Software Testing is an organisation dedicated to advancing the craft of software testing and one of the ways we do that is by promoting and sponsoring peer conferences.

When we say peer conference we don’t have a particular format in mind. Adrian Segar describes a peer conference as
a conference that is small, attendee-driven, inclusive, structured, safe, supportive, interactive, community-building, and provides opportunities for personal and group reflection and action.
We like that kind of framing as it provides room for many local variants, although we’d add that disseminating the results of the conference could be an important outcome too. If we had to boil down our own perspective, it would be something like this: peers exchanging ideas.

To help the conferences that we sponsor, we decided to write a checklist based on the experiences of the AST Board members and other organisers that we've spoken to or who have published their own reports.

I took the task of writing the first version but, as I collected material and review comments, the checklist grew into a guide, Peers Exchanging Ideas, available now from our Github account.

You can read the guide from start to finish to get an overview of the kinds of things you might want to consider for a conference. It might seem like there's a lot! Don't worry, it's definitely doable — we know, because we've done it! However, if you were just looking for a basic checklist of stuff to remember, the Appendices give that, with links back to details in the main text.

Despite our mission as an organisation, the guide isn't specific to software testing. We took a decision to make it generally applicable and we think it's got real value for those thinking of, or actually running, peer conferences.

We are extremely grateful to the conferences and our reviewers for helping to make this a thing. Please have a look, share it, perhaps try a peer conference for yourself, and let us know how you got on.