Sunday, October 14, 2018

And There's More

When new staff join Linguamatics they'll get a brief intro to each of the groups in the company as part of their induction. I give the one for the Test team and, as many of our joiners haven't worked in software before, I've evolved my spiel to be a sky-high view of testing, how we service other parts of the company, and how anyone might take advantage of that service, even if they're not developing software.

This takes a whiteboard and about 10 minutes and I'll then answer questions for as long as anyone cares to ask. Afterwards we all go our separate ways happy (I hope) that sufficient information was shared for now and that I'm always available for further discussion on that material or anything else.

I mentioned the helicopter perspective that I give to Karo Stoltzenburg when she was thinking about when and how to draw a testing/checking distinction in her Exploratory Testing 101 workshop for TestBash Manchester. I was delighted that she was able to find something from it to use in her session, and also that she produced a picture that looks significantly nicer than any version I've ever scrawled as I talked.

Karo's picture is above, her notes from the whole session are on the Ministry of Testing club, and below is the kind of thing I typically say to new staff ...

What many people think software testing is, if they've ever even thought for a second about software testing, goes something like this:
  • an important person creates a big, thick document (the specification) full of the things a piece of software is supposed to do
  • this spec gets given to someone else (the developer) to build the software
  • the spec and the software are in turn given to another person (the tester) who checks that all the things listed in the spec are in the software.

In this view of the world, the tester takes specification items such as "when X is input, Y is output" and checks the software to see whether Y is indeed produced when X is put in. The result of testing in this kind of worldview probably looks like a big chart of specification items with ticks or crosses against them, and if there are enough crosses in important enough places the software goes through another round of development.

While checking against specification can be an important part of testing, there's much more that can be done. For example, I want my testers to be thinking about input values other than X, I want them to be wondering what other values can give Y, I want them to be exploring situations where there is no X, or when X is clearly invalid, or when X and some other input are incompatible, or what kinds of things the software shouldn't do and under what conditions ...
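To make that distinction concrete, here's a minimal sketch in Python. The function `parse_age` and its one-line spec item are hypothetical, invented purely for illustration; they're not from any real product. `check_spec` is the "when X is input, Y is output" check, and `explore` is a small taste of the other questions I'd like testers to ask.

```python
def parse_age(text):
    """Hypothetical function under test. Spec item: input "42" gives output 42."""
    return int(text)


def check_spec():
    # The checking view: when X is input, is Y output?
    return parse_age("42") == 42


def explore():
    # The wider view: inputs other than X, other routes to Y, no X, invalid X.
    observations = {}
    for candidate in ["42", " 42 ", "042", "", "forty-two", "-1", "4.2"]:
        try:
            observations[candidate] = parse_age(candidate)
        except ValueError:
            observations[candidate] = "ValueError"
    return observations


if __name__ == "__main__":
    print("spec check passes:", check_spec())
    for candidate, result in explore().items():
        print(repr(candidate), "->", result)
```

The spec check passes, but the exploration shows that three different inputs all give Y, that a negative age is accepted, and that an empty input raises an error. None of those is necessarily a bug; each is a question worth raising with whoever cares about the behaviour.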

That's all good stuff, but there's scope for more. I also want my testers to be wondering what the motivation for building this software is, who the users are, and whether the specification seems appropriate for a software product that meets that need for those users. I'd also expect them to think about whether the project, and the team — including themselves — is likely to be able to create the product, given that requirement and specification, in the current context. For example, is there time to do this work, is there resource to do this work, do the team have sufficient tooling, expertise, or other dependencies, ...?

Even assuming the spec is appropriate, the context is conducive to implementation, and the team are not blocked, you won't be surprised to find that more possibilities exist. I'd like the tester to be alert to factors that might be important but which might not be in the specification much at all. These might include usability, performance, or integration considerations, ...

For me, one of the joys of testing is the intellectual challenge of identifying the places where there might be risk and wondering which of those it makes sense to explore with what priority. Checking the specification can certainly be part of testing, but it's not all that testing is: there's always more.
Images: Jimmy Cricket, Karo Stoltzenburg

Sunday, October 7, 2018

My Goodness

The six presentations at CEWT #6 took different angles on the topic of good testing or good testers. Here's my brief summary of each of them:

Whatever good testing is, Karo Stoltzenburg said, it's not likely to be improved by putting a box around its practitioners. In fact, pretty much any box you care to construct will fit some other profession just about as well as it fits testing. Testers, like people in general, are individuals and it's in the combination of diverse individuals that we spread our risk of missing something and so increase our chances of doing the good testing we all want.

What makes a senior tester? That's the question Neil Younger has been asking himself, and his colleagues at DisplayLink. He shared some of his preliminary thoughts with us. Like Karo he wants to avoid boxes, or at least to reduce their rigidity, but against that there's a desire from some for objective criteria, some kind of definition of done that moves them from one box to another. A senior tester for Neil must naturally also be a good tester, and the draft criteria he shared spoke to the kinds of extras he'd expect such a tester to be able to provide. Things like mentoring, coaching, unblocking, improvement, awareness, and knowledge.

"Any kind of metric for testing that involves bug counts focusses on how we're sorting out the failures we find, not on team successes." Chris Kelly has seen metrics such as net promoter score applied to technical support and was musing out loud about whether metrics could be applied to testing. If they could, what should they look like? The discussion covered the difference between teams and individuals. Can a team member do good testing while the team fails? One way of judging success was suggested: ask the stakeholders whether they're getting what they want from your work.

In the limit, good testing would lead to perfect testing, Šime Simic proposed, perhaps a little teasingly. We aren't going to get there, naturally, but we can do things that help us along the road. In particular, he talked about knowing the mission, aligning the mission with stakeholder needs, testing in line with the mission, doing what we can inside the constraints that the project has, and then reflecting on how it went and how we might change next time in similar circumstances. Testing, he went on, should not exist in a vacuum, and it's unrealistic if not unfair to try to judge it apart from the whole development process. (References)

Because they have different needs it's perfectly possible, and indeed reasonable, for two people to have different views of the same piece of testing. When they are the test manager and the tester, however, it may lead to some problems. For the tester, on the project Helen Stokes and Claire Banks talked about, exercising the product was the primary requirement. For the manager, visibility of the work, and the results of the work, were imperative. "There's more to good testing than doing the tests," they concluded.

My own presentation was about how, assuming we could know what good testing is, it can be a challenge to know whether someone is capable of doing it. I talked about how this particular problem manifests in recruitment all the time, and about how testing skills can be used to assess candidates for testing roles. (Slides)

Friday, September 28, 2018

CEWT Lean Coffee

At CEWT #6 we used Lean Coffee as a way to reflect on some of the threads we'd discussed during the day. Here's a few brief, aggregated comments and questions that came up.

Different Perspectives

  • Claire and Helen's talk was about how the testers and test manager on a project had very different views of the quality of the testing.
  • The developer perspective is also interesting, and often different.
  • Whose needs are being met by the testing?
  • Which lens are we looking through: experience, knowledge, context, ..?
  • Good testing is inherently perspective-based.
  • The relative rule.
  • What about outside of software, e.g. in laboratory science?
  • Stop Working and Start Thinking.

What makes good tests?

  • Consistent, deterministic, a specific outcome.
  • Really?
  • What about if the software is non-deterministic?
  • Isn't testing about information?
  • Is gathering data (e.g. performance data) testing?
  • There needs to be a pass/fail.
  • Really?
  • Is the tester the best person to judge pass/fail?
  • A good test gives actionable information.
  • A good test has a mission.
  • A good test has to answer a question.
  • A good test has to be fun to run.

Is good testing perpetual?

  • If testing was thought to be good when you did it, does it remain good?
  • If testing was thought to be bad when you did it, does it remain bad?
  • Be wary of hindsight bias.
  • The context can change and more information can be available later on.
  • Testing needs to be judged in the context in which it was done.
  • Perhaps our standards have changed.
  • Is it ever fair to compare X then to Y now?
  • Requirements change.
  • Regression suites must evolve.
  • I wouldn't want to work in an environment where testing stagnated.
  • I don't want to work on a production line.
  • Stand up for your (past) testing.
  • Impostor syndrome.
  • I won't compare myself to someone else.
  • There's more than one way to do good testing.

Monday, September 24, 2018

The Factor The Matter

At CEWT #6 we discussed what good testing and good testers are. We didn't set ourselves the mission of coming up with some kind of definition, we didn't explicitly take that mission on during the course of the day, and to my mind we didn't converge on anything particularly concrete that could form the basis of one either.

Reviewing my notes, I thought it might be interesting to just list some of the factors that were thrown into the conversation during the day. Here they are:
  • Good relative to what?
  • Good relative to who?
  • Good relative to when?
  • Good for what?
  • Good for who?
  • Good for when?
  • Goodness can be quantified.
  • The existence of bugs found by non-testers is a way to judge testers or testing.
  • Goodness cannot be quantified.
  • The existence of bugs found by non-testers is not a way to judge testers or testing.
  • Goodness can't be separated from context.
  • Goodness can't be separated from perspective.
  • The value and quality of testing is subjective.
  • Good testing as a team versus as an individual.
  • Diversity of thought and action is good.
  • Some testing can be good and bad at the same time.
  • On a project, all testing will not be of the same standard.
  • There's more than one way to do good testing.
  • You can't judge testing outside of the whole development process.
  • Ask a stakeholder: are you getting what you want from us?
  • Many people judge testing (or anything) on what's visible.
  • Is goodness a feature of intent, or outcome?
  • Is goodness only really meaningful in hindsight? (But how long to wait?)
  • When the context changes, the view of already-done testing might change.
  • Good is a box.
  • Does defining good create a dichotomy: must there be bad?
  • Does defining good provide a way for people to aspire to a level?
  • Does defining good provide a demotivation for people who feel they are not good?
  • What's specific to testing that would make it good?
  • Perfect is the logical extension of good. Can it exist?
  • What is a good test case? (Kaner)
  • Good testing is not just about running the tests.
  • Does good testing have to tie in to business practices?
  • Perhaps good testing is when the results of the testing change the testing?
  • You can prepare to do good testing, but you can't guarantee it.
  • A mission helps to promote good testing.
  • A healthy environment, with trust, promotes good testing.
  • Good testers are short with curly hair.
Image: PNG Meter 

Sunday, September 23, 2018

Testing vs Chicken

At CEWT #6 we were asking what constitutes good testing. Here's prettied-up versions of the notes I used for my talk.

One line of argument that I've heard in this context and others is that even though it's hard to say what good testing is, we know it when we see it. But there's an obvious potential chicken and egg problem with that.

I'm going to simplify things by assuming that the chicken came first. In my scenario, we'll assert that we can know what good testing would look like for a project which is very similar to projects we've done before, where the results of testing were praised by stakeholders. The problem I'll address is that we don't have anyone who can do that project, so we have to hire. The question is:
What can we do as recruiters, during recruitment, to understand the potential for a candidate to deliver that good testing?
I've been recruiting technical staff for around 15 years, predominantly testers in the last ten, and my approach has changed enormously in that time. Back in the early days, I would largely base my phone screen interviews around a chronological traversal of the candidate's CV asking confirmatory questions. Checking the candidate, if you like.

These days, a CV to me is more of a filter, and a source of potential topics for exploration. I have also spent a lot of time thinking about how I want to advertise positions, and about the kinds of information I want to give and get from each stage of the process, and how I'll attempt to do that.

I have a simple model to help me. I call it the Egg of Testing Recruitment.

The yolk is the core testing stuff; crucial to our identified needs. The white is the other stuff; important to functioning in our context. It supports the yolk.

Some people will tell you that eggs are easy to cook. Some people also think that recruitment is straightforward: identify a need, describe it, find the person who best matches it, hire them, relax. But eggs don't always come out the way the chef intended.

And recruitment likewise. Here are a few reasons why:
  • multiple important factors
  • limited time and engagements
  • a space of reasonable outcomes
  • a dynamic feedback system
That last one is particularly interesting: as a recruiter, be aware candidates will be looking to align themselves with your needs. If, for example, you do and say things that suggest you favour exploratory testing, then don't be surprised when answers which support their exploratory testing skills start to come.

But recruitment is starting to sound a lot like testing: the extraction of relevant information from an arbitrarily complex system under various constraints. And, if it's testing, I'll want a mission. And if I had a mission I might phrase it something like this:

The kinds of materials you can usually expect in standard hiring rounds are:
  • a cover letter
  • a CV
  • a phone screen
  • a technical exercise
  • a face-to-face interview
And then there's others that are reasonably common these days, including:
  • social media
  • blog
  • personal website
  • open source projects
All of these hold data about the system under test ... erm, about the candidate. I know that some recruiters disregard the cover letter. I love a cover letter. First, it's more data. Second, it is an opportunity for the candidate to speak direct to me, in their own time, in their own words, about the fit of this role to them and them to this role.

When it comes to conversation and exercises, I use the Egg of Testing Recruitment to remind me of what I'm after.

The yolk: when I can interact with the candidate I tend to want to explore core skills that can only really be demonstrated interactively. I'll want to put the candidate in a position where they can't know everything, where there's ambiguity, and see how they deal with it.

Do they ask for clarification, do they tell me what their assumptions are, do they offer caveated answers, do they say "in this context ..", do they use safety language? In this respect I regard interviews as more like an audition -- asking the candidate to perform something like a testing task, and being able to explain their thought processes around it.

The white: I'll be looking for reporting, presentation, consistency and the like in the written material. I'll also be noting stuff that could be ways in to understanding other aspects, particular technical expertise that I can ask about, for example. I can't ask for demonstration of all skills, but I can ask behavioural questions such as "can you tell me about a time when someone doubted the value of testing or when someone asked you to justify your testing?"

In the real world, of course, the egg model looks very different.

The yolk and white cannot be separated so cleanly. But that's OK. In the interview, I can be testing both at once. For example, on any answer the candidate gives I can be looking for consistency. I can gauge the correctness or reasonableness or depth of a series of answers and use them as checks on the candidate's reliability of answering.

Having explored the candidate using conversation and exercises I need to evaluate them. A job advert that reflects what you actually want helps here. (It's worth remembering that when you're writing it.)

This evaluation is again like testing; you've stopped because you've spent the time that is available. Of course you could have done more. Of course you could have taken alternative routes. But you didn't and now you have to report: what you did, what you found, and the values and risks associated with that.

In your day job this probably goes to a stakeholder who ultimately makes a decision. In recruitment scenarios, you may well also be the stakeholder. But that shouldn't alter the way you go about your business, unless it makes you care even more than you would normally to do a good job.

I think there's three major points here. To put yourself in a position to recruit testers who can do the kind of good testing you're after:
  • understand your mission
  • treat interviews as auditions
  • explore the candidate
Here's my slides:

Saturday, September 15, 2018

Look at the Time

I'll be quick. In the kitchen. With tea brewing. Talking to a developer about new code. Exploring assumptions. An enjoyable conversation. A valuable five minutes.

A third party provides an object identifier to our code. We use it as a key because it's our understanding that the identifier is unique. We have never seen identifier collision.

Try again: we have never seen identifier collision at any given point in time.

Do we know that the third party will not recycle identifiers as the objects they represent are discarded? What would happen if they did?
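Here's a minimal sketch in Python of what could happen if they did. Everything in it is hypothetical: the post doesn't say how our code stores things, so `store_by_identifier`, the identifiers, and the payloads are assumptions made only to illustrate the risk.

```python
def store_by_identifier(events):
    """Key each payload by its third-party identifier, in arrival order.

    Returns the final store plus a record of every time an identifier
    that was already present arrived again with a different payload.
    """
    store = {}
    overwritten = []
    for identifier, payload in events:
        if identifier in store and store[identifier] != payload:
            overwritten.append((identifier, store[identifier], payload))
        store[identifier] = payload
    return store, overwritten


# Hypothetical timeline: the third party discards the object behind "id-7"
# and later hands the same identifier out for a brand new object.
events = [
    ("id-7", "report A"),
    ("id-8", "report B"),
    ("id-7", "report C"),
]

store, overwritten = store_by_identifier(events)
print(store)
print(overwritten)
```

At any single moment the third party's identifiers are unique, yet over time our key silently points at a different object and the original is gone. Whether that matters depends on what we keep and for how long, which is exactly the assumption worth exploring.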

No longer in the kitchen. Tea on desk.

Thursday, August 30, 2018

Boxing Clever

Meticulous research, tireless reiteration of core concepts, and passion for the topic. You didn't ask, but if you had done that'd be what I'd say about the writing of Matthew Syed based on You Are Awesome (reviewed here a few months back) and now also Black Box Thinking.

The basic thesis of the latter is captured nicely in a blog post of his from last year:
Black Box Thinking can be summarised in one, deceptively simple sentence: learning from mistakes. This is the methodology of science, which has changed the world precisely because it is constantly updating its theories in the light of their failures. In a complex world, failure is inevitable. The question is: do we learn, or do we conceal and self-justify?
Who wouldn't want to learn from their mistakes, you might ask? Lots of us, it turns out. The aviation industry tends to come out well in Syed's analysis. Accidents, mishaps, and near-misses are reviewed for ways in which future flights might be less likely to repeat them, and the knowledge is shared across the board. Blaming is minimised in order that all participants are encouraged to share their evidence and thoughts.

The medical and healthcare industries, and also politicians, tend not to do so well. In these areas, blame culture and a fear of reprisals are said to hinder the extent to which mistakes are admitted to, investigated, and subsequently mitigated.

Atul Gawande's The Checklist Manifesto makes similar points, and prescribes the use of checklists as one way to mitigate the future risk. Syed spends a lot of time on the ways in which cultural changes in philosophy, mindset, and practice, need to be made in order to get to a point where the risks are identified, accepted, and then provoke some kind of positive action.

There's so much material packed so densely into this book that I can't do it justice here. In lieu of that, here's some of the entwined key threads as I saw them:
  • We live and work in complex systems 
  • ... where failures will happen.
  • A blaming culture is likely to result in lower visibility of issues and more ass-covering 
  • ... whereas open cultures encourage and support self-reporting.
  • A "production" failure should be seen as a learning opportunity
  • ... and a chance to reduce the risk of future such failures.
  • Use "development" failure as a tool
  • ... particularly within an iterative development environment.
  • Expertise comes from practice and feedback
  • ... but a mixture of theory and practice helps avoid local maxima.
  • A fixed mindset is less likely to innovate
  • ... and broadening our outlook makes creative connections more likely.
  • On the whole, we prefer narrative over data 
  • ... and when beliefs and data disagree, we tend to deny the data.
  • Understanding what to measure and record is key
  • ... and sometimes it's sensible to experiment to understand what to measure.
This last point in this list gives the book its title — the black box recorder on an aeroplane is often crucial in understanding the circumstances that lead to an incident — while the first point is hammered home repeatedly: there is often no one single root cause for which an individual can clearly be held responsible.

This complexity is itself hinted at in the list: there are many variables at play, and they are interconnected. There is generally no silver bullet, no quick-fix, no one size to fit all. On this point, in a particularly nice meta twist, Syed notes that the approaches espoused for learning, say, how to build a product can also be used on the approaches themselves — in order to learn better how to build, perhaps we first need to learn better how to learn.

On learning then, three things that I'm taking away from this book.

I have historically been sceptical when I hear people blithely say that we learn more from failure than success. Out of context, I still don't believe that's necessarily a given but I think perhaps now I have more nuanced thinking here.

First, using a generate-and-test approach in development, and treating each generation that doesn't improve our test metric as a failure, we might say that the volume of failure drives our learning more than the final success. Syed gives the example of James Dyson who made thousands of incrementally different prototype vacuum cleaners before arriving at his first production model. Thousands of failures, each of which helped to point the way to success.

Alternatively, I wonder whether it might mean that analysis of the differences between success and multiple failures allows us to understand the factors important to success in a way that simple (ahem!) success does not.

Also new to me, and hidden in a footnote (p. 172), there's an interesting term:
"Observational statistics" is a phrase that encompasses all the statistics drawn from looking at what happened. Randomised control trials are different because they encompass not merely what happened, but also construct a counterfactual for comparison.
That counterfactual is key; it helps to balance survivorship bias. A well-known example comes from the second world war: deciding where to add armour to planes based on where there are bullet holes in those that returned to base is to miss the massive value of the unobserved data. Those that got shot down and never made it back might well have been hit elsewhere. (For a brief summary see e.g. Mother Jones.)
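A toy illustration in Python of the bullet-hole example. The sortie data here is entirely made up for the sketch, not historical, and it only shows the effect of the unobserved planes; it isn't a randomised control trial.

```python
from collections import Counter

# Hypothetical data: (section of the plane that was hit, did it return to base?)
sorties = [
    ("wing", True), ("wing", True), ("wing", True), ("wing", False),
    ("fuselage", True), ("fuselage", True),
    ("engine", True), ("engine", False), ("engine", False), ("engine", False),
]


def hits_on_returned(sorties):
    # What we can observe at base: bullet holes on the planes that came back.
    return Counter(section for section, returned in sorties if returned)


def loss_rate(sorties):
    # What we'd see with the full data: the fraction of planes hit in each
    # section that never made it back.
    hits = Counter(section for section, _ in sorties)
    lost = Counter(section for section, returned in sorties if not returned)
    return {section: lost[section] / hits[section] for section in hits}


print(hits_on_returned(sorties))
print(loss_rate(sorties))
```

Counting holes on the returners points at the wings. Including the planes that were lost points at the engines, which is the information the observational view at base can't give us.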

Another footnote (p. 220) raises an interesting potential tension that I realise I've been aware of but perhaps never surfaced before:
Getting the manufacturing process running seamlessly is often about ironing out unwanted deviations. It is about using process controls and the like to reduce variation. Creative change is often about experimentation: in other words, increasing variation.
Sensitivity to variability, to the unknown, should be adjusted consciously based on the context in which we are operating. More frequently, it appears to me, we have a relatively fixed level of comfort which can compromise our ability to operate in one or other of the scenarios that Syed identifies.

Black Box Thinking, despite the repetition due to the interconnectedness of the ideas it puts forward and despite its sardine tin consistency, is a book worth persevering with. It's helped me to both learn and reflect on many concepts I've been thinking about for some time myself. Here's a few:
Image: Amazon