Sunday, October 17, 2021

Fix The Right Bugs

 

Earlier this year I did the Black Box Software Testing course in Bug Advocacy with the Association for Software Testing, and loved it.

That and other BBST courses are run in collaboration between AST and Altom, and four members of the Altom team (Oana Casapu, Denisa Nistor, Raluca Popa, and Ru Cindrea) recently did a webinar, Bug Advocacy in the Time of Agile and Automation, on the RIMGEN mnemonic in the context of hard-to-reproduce bugs sitting around in a team's backlog.

Cem Kaner says that our mission as testers includes getting the right bugs off the backlog and fixed. The webinar described how we can work towards that by thinking about how to Reproduce, Isolate, Maximise, Generalise, and Externalise the issue, then reporting what we found using a Neutral tone.

Friday, October 15, 2021

But It Does Depend


The Association for Software Testing is crowd-sourcing a book, Navigating the World as a Context-Driven Tester, which aims to provide responses to common questions and statements about testing from a context-driven perspective.

It's being edited by Lee Hawkins, who is posing questions on Twitter, LinkedIn, Slack, and the AST mailing list, and then collating the replies, focusing on practice over theory.

I've decided to contribute by answering briefly, and without a lot of editing or crafting, by imagining that I'm speaking to someone in software development who's acting in good faith, cares about their work and mine, but doesn't have much visibility of what testing can be.

Perhaps you'd like to join me?

 --00--

“Stop saying 'it depends' when I ask you a question”

I'm sorry about that; I understand that it can be frustrating not to get a direct answer to what you perceive to be a direct question.

My motivation is to give you a sense of the variables that can influence my response, not to avoid answering, or to piss you off. (Well, not very often.)

For example, when you ask about the project end date, I could reply with one of:

  • two weeks
  • it depends on Team X delivering their piece. I asked but they won't say when it'll be ready. Given past history, I'd guess between one and three weeks.

At some granularity, they are both two weeks. Which would you find more helpful?

Did you say ... it depends?

Hmm.

But you're right. Both are good answers, in different circumstances, for different audiences with different needs.

When I compose an answer to your question I am using experience and context to decide which variables matter to you at this time. I am making assumptions about how much time you have, how deep you want to go, how much you care, and what you will do with the data. I am likely biased towards the data I have rather than the data I don't, and maybe even towards the data I think will make me look good, or bad.

This reasoning is necessarily imprecise because I don't know what your need is, or what you plan to do with the answer. Sometimes I'll ask, but you have been impatient with clarifying questions in the past and I have learned to be sparing with them.

Often my compromise across all of these variables is to give you an answer at a depth I guess will be acceptable and explain something about how I arrived at it, and what the risks around it are.  You then get the choice to dig or not.

You could help me reduce the risk of an answer you'd consider unhelpful by filling in the context when you ask. For example:

  • I have to do another stupid management PowerPoint no-one will read. What's your best guess for expected delivery time, no error bars allowed?
  • I have to prepare a detailed roadmap. What dependencies are there on your delivery, and what do they do to the end date?

One last thing: my intention is to help you and so I do my best not to answer with just "it depends." If I do that — outside of us joking around — please call me out on it.

Thursday, October 7, 2021

Parklife!


A significant part of testing is thinking up questions it might be interesting to ask, identifying ways that they could be asked, choosing a question and a way, and setting off. 

Parklife.

I find that I can usually think of more questions than I have time for. 

Parklife.

I find that there is usually more than one way to address them.

Parklife.

I find that after setting off, exploration usually throws up observations that prompt additional questions.

Parklife.

I know that I will never be able to address all of the questions in all of the ways.

Parklife.

So it's key to be able to pick the ones I think are most likely to give the most important information to the people who matter, cost-effectively, at the right time.

Parklife.

The rest have to be parked. 

Parklife.

The result? As testers, our view of the system under test will never be as high resolution as it could theoretically be. 

Parklife.

My advice? Accept that your vision will be blurred and embrace ...

Parklife!
Image: Wikipedia

Friday, October 1, 2021

#SomeEstimates

A while ago my team was asked for estimates on a customer project that had become both urgent and important. Unfortunately, but not unusually, there was a reasonable amount of uncertainty around the customer need and two possible approaches were being proposed. 

It fell to me to organise the team's response.

First off, should we refuse to estimate? Woody Zuill talks compellingly about #NoEstimates approaches. In Control, or the Fear of Losing It I summarised his perspective as:

We think that we estimate because it gives us control. In reality, we estimate because we fear losing control. The irony, of course, is that we aren't in control: estimates are inaccurate, decisions are still based on them, commitments are also based on them, projects overrun, commitments are broken, costs spiral, ...

Ron Jeffries has a typically nuanced take on the idea in The #NoEstimates Movement:

How to apply #NoEstimates isn’t entirely clear. Does it really mean that all estimates are bad? If not, which ones are OK? How can we tell the difference between an estimate that’s useful enough that we should do it, and one that is pernicious and never should be done?

And I find George Dinwiddie to be a pragmatic guide, noting in Software Estimation Without Guessing that there are many ways to estimate and that they do not all suit all people in all circumstances. The key is to find a useful approach at an appropriate cost, given the context.

In this case, I felt that we were being asked to help the project team to move past a decision point. My instinct was that analysis was probably more important than precise numbers, and I wanted to keep effort, and team interruption, to a minimum. 

This is what I did...

I drafted a document that listed the following for each of the two implementations (let's call them A and B):

  • what I understood were concrete requirements for each
  • assumptions the team would make in order to generate estimates
  • risks associated with each project, the process we were in, and estimating itself

I delivered this quickly and requested immediate feedback from the stakeholders. This clarified some aspects, identified things that I had missed or got wrong, and exposed differences in perspective amongst the sponsors. It also showed that I was taking the work seriously.

Next, I made a spreadsheet with a rough list of feature items we'd need to implement for each of A and B, and I passed that by the team to skim for obvious errors.

Finally, the team got together on a short call. We briefly kicked around tactics for estimating and decided between us to each give a low (or optimistic) and high (or pessimistic) estimate for each line item for each of A and B. We did this on the count of three to avoid biasing each other, and we wrapped up all of our uncertainties, worries, assumptions, and so on into the numbers. 

For each item I dropped the lowest low and highest high into the spreadsheet (like the example at the top) and totalled the values to give very crude error bars around potential implementation routes for each version of the project. 
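
For what it's worth, here's a minimal sketch of that arithmetic in Python. The line items and numbers are invented for illustration; it only shows the lowest-low and highest-high aggregation, not the spreadsheet we actually used.

    # Toy illustration of the spreadsheet arithmetic: for each feature item,
    # every estimator gives a (low, high) pair in days; keep the lowest low
    # and the highest high, then total the columns for crude error bars.
    # The items and numbers below are made up.
    estimates = {
        "data import": [(2, 5), (3, 8), (2, 6)],
        "reporting":   [(1, 3), (2, 4), (1, 5)],
        "API changes": [(4, 9), (3, 10), (5, 8)],
    }

    def bounds(pairs):
        # Lowest low and highest high across all estimators for one item.
        return min(low for low, _ in pairs), max(high for _, high in pairs)

    per_item = {item: bounds(pairs) for item, pairs in estimates.items()}
    total_low = sum(low for low, _ in per_item.values())
    total_high = sum(high for _, high in per_item.values())

    for item, (low, high) in per_item.items():
        print(f"{item:12s} {low:3d} - {high:3d} days")
    print(f"{'TOTAL':12s} {total_low:3d} - {total_high:3d} days")

Running the same calculation over the line items for A and for B gives two ranges to put side by side, which was all the precision this decision needed.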

I updated the document with this finding and delivered it back to the project with a recommendation that we de-risk by choosing B given the urgency of delivery. 

The stakeholders accepted the suggestion and my work was done.

Retrospecting, then: I was very happy with the process we bootstrapped here and I would use something like it again in similar circumstances to enable a decision.

To be clear, I would not trust the absolute numbers we created but I would have some faith that the relative comparisons are valuable. In our case, B was about half the size of A and this accorded with intuition about the amount of uncertainty and the complexity of A over B.

Also important is the context in which the numbers are set. Explicitly listing the assumptions, risks, and approach gives us a shared understanding and helps us to see when something changes that might affect the estimates.

Choosing not to unpack everyone's personal feelings on every number was a real efficiency gain. Gut instinct is built on data and experience, and we can access it unconsciously and quickly. Taking a low and a high number emphasises to stakeholders that there is uncertainty in the figures.

I tried to choose a pragmatic, context-based approach to estimation, where the numbers might be somewhat brown* but, along with the contextual information, facilitated a decision. At another time, in another situation, I might have refused, or done something different. #SomeEstimates.

* I am indebted to Jason Trenouth for the concept of a brown number, so called because of the place they're pulled out of.

Tuesday, September 21, 2021

Capping it Off


I'm lucky that my current role at Ada Health gives me, and the rest of the staff, a fortnightly community day for sharing and learning.

I've done my, erm, share of sharing, but today I took advantage of the learning on offer to attend a workshop on our approach to making medical terminology accessible to non-experts, a presentation on how we manage our medical knowledgebase, another on the single sign-on infrastructure we're using in our customer integrations, and a riskstorming workshop using TestSphere to assess an air fryer.

So that would have been a great day by itself, but I, erm, capped it off by attending Capgemini's TestJam event, to see the keynotes by Janet Gregory and Lisi Hocke.

Janet talked about holistic testing, or the kinds of critical review, discovery, and mitigation activities that can take place at any point in the software development (and deployment, and release) life cycle. The foundation for all of this is good communication and relationships between the people and teams involved, and she often sees testers being the ones to cultivate that.

The key thing about a cycle is that there is no end. Release isn't where we wash our hands of the frickin' thing and relax, it's the point at which we can begin to observe what our customers are doing with it, and frame some hypotheses about how we could improve their experience. Testers should be here, framing experiments that feed into the next round of discovery that leads to planning and new features.


Experiments were the focus of Lisi Hocke's talk (slides), an experience report on an experiment she conducted at the company level to encourage teams to experiment with their own activities.

In a business with a large number of autonomous cross-functional teams, there was a perception that quality was a black box: no common perspective on what it meant, and hard to judge its level. Lisi, and others, co-ordinated an experiment to improve the quality culture in the company, hypothesising that transparency in approach and status, along with the sharing of ideas and techniques, would help to bring teams to a level where each of them had explicit test strategies and could talk about what quality meant to them.

Several teams volunteered and a few of them were selected as participants in a series of workshops which identified pain points, risks, and implicit test strategies. This was followed by the framing and running of experiments, each deliberately focused on improving one thing, with explicit hypotheses and criteria to judge success.

It was a lot of effort but definitely had some positive outcomes: lots of useful conversations, much more awareness of what was possible, and tangible improvements. Unfortunately there were also negatives: silos remained, some people felt inhibited from participating, and there was inertia to change. 

In addition, the overall project success criteria were only partially met. This might be acceptable in the first iteration, if not for the fact that the approach taken was so heavyweight that it was clear it wouldn't scale. So, a second hypothesis: a leaner process, with less hand-holding and more facilitated peer-based activity, could have the same kind of outcomes. 

Good news, it did! Bad news, COVID hit and other activities were prioritised.

Reflection on the experience still gave some useful learning, including:

  • perhaps don't solve the team's problems but instead support the team in solving the problems
  • put effort into making improvement desirable by showing good outcomes
  • make the system reward the behaviours you want to see

Saturday, September 18, 2021

RIP Clive Sinclair


Sliding doors, naturally, but it feels like the Sinclair ZX Spectrum 16k I got as a combined birthday and Christmas present when I was a boy was significant in where I've ended up.

I recall with fondness the tedium-expectation opposition of typing in BASIC programs from printouts and then debugging them, only to find that the monster was a letter M, you were an asterisk, and collision detection was a concept the author had only a passing grasp of.

I have nightmares about trying and failing to install several sets of RAM chips to upgrade the machine to 48k and instead ending up with a wobbly and unreliable external RAM pack. I mourn the times we had to take the whole computer back to the shop for repairs.

I regret spending my hard-earned paper round money on a Brother printer and then spending my hard-won free time trying to work out how to get it to print reliably, or at all. 

I can still feel the covers of the thick ring-bound manuals, introducing me to BASIC and helping me to write my own programs. It was magical when I realised there was an assembly language world beyond BASIC and that I could PEEK and POKE values directly into the heart of the computer!

Of course I read the monthly magazines religiously, and I played the games, played the games, played the games, ...

In retrospect, that was an amazing introduction to the pleasures and frustrations of computers and software, to the possibilities and the failures, to the often stark differences between desire and reality. It spurred my imagination and helped me to dream.


Thank you Clive Sinclair.
Image: Wikipedia

Friday, September 10, 2021

69.3%, OK?


The Association for Software Testing is crowd-sourcing a book, Navigating the World as a Context-Driven Tester, which aims to provide responses to common questions and statements about testing from a context-driven perspective.

It's being edited by Lee Hawkins, who is posing questions on Twitter, LinkedIn, Slack, and the AST mailing list, and then collating the replies, focusing on practice over theory.

I've decided to contribute by answering briefly, and without a lot of editing or crafting, by imagining that I'm speaking to someone in software development who's acting in good faith, cares about their work and mine, but doesn't have much visibility of what testing can be.

Perhaps you'd like to join me?

 --00--

"What percentage of our test cases are automated?"

There's a lot wrapped up in that question, particularly when it's a metric for monitoring the state of testing.

It's not the first time I've been asked either. In my experience, it comes when someone has latched onto automating test cases because (a) they've heard of it, (b) test cases are countable, and (c) they have been tasked with providing a management-acceptable figure for the "Testing" value in a PowerPoint deck of several hundred slides mailed monthly to a large number of people who will not look at it.

If that sounds cynical ... well, I suppose it is. But any cynicism over this particular measure doesn't mean I'm not interested in understanding your need and trying to help you get something that fulfils it. Can we talk about what you're after and why?

We can? Great!

I'll start. Some of the issues I have with the question as it stands are:

  • it seems to be perceived as a measure of our testing
  • such a number would say nothing about the value of the testing done
  • the definition of a test case is moot
  • ... and, whatever they are, test cases are only a part of our testing
  • there's an implicit assumption that more automation is better
  • ... but automation comes with its own risks
  • ... and, whatever automation means, automated test cases are only a part of our test automation

If I look at how we test, and what we might call test cases, I can think of three ways I could answer your question right now (there's a toy calculation after the list):

  1. We don't have test cases in the sense I think the question intends. All of our ongoing testing is exploratory and, while we might document the results of the testing with automation, there is no sense in which a manual or scripted test case existed and was then automated. We score 0%.
  2. For the purposes of this exercise, I would be prepared to describe each assertion in our regression test suites as a test case. As they would be our only test cases, all of them are automated. 100%!
  3. OK, we do have some items in a test case management system. These are historical release-time checks that (mostly) people outside the test team run through before we ship. I like to think of them more as checklists or jumping off points, but I'm realistic and know that some of my colleagues simply want to follow steps. Relative to the number of "automated test cases" there are few of them but if we include them in our calculation we'd bring the score down to, say, 99%.
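
Here's the toy calculation behind those three numbers, in Python. The counts are invented, and the only point is how much the answer swings depending on what we choose to count as a test case.

    def percent_automated(automated, manual):
        # Percentage of "test cases" that are automated, for a given way of counting.
        total = automated + manual
        return 100.0 * automated / total if total else 0.0

    print(percent_automated(automated=0, manual=0))      # 1. no test cases at all: 0.0
    print(percent_automated(automated=5000, manual=0))   # 2. every assertion is a test case: 100.0
    print(percent_automated(automated=5000, manual=50))  # 3. plus the release checklist items: ~99.0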

Those answers don't seem very satisfactory to either of us, do they?

To me, at very best, this kind of metric covers a small slice of what we do and the assumptions underlying it are very questionable. To you, the metric itself matters less than having some plausible number, representing how well the testing is going, to include in that monster PowerPoint deck.

I have some thoughts on that too:

  • testing, for me, is knowledge work and so notoriously hard to measure in simple numbers
  • testing does not exist in isolation from other product development activities
  • good testing can be done without the creation of artefacts such as test cases
  • metrics imposed without conversation and justification are likely to be viewed with suspicion
  • metrics are likely to be gamed when (perceived to be) used as a target, or to judge
  • starting with a list of artefacts (test cases, bug tickets, etc.) is cart-before-horse
  • ... it's much better to ask first what you want to measure and why

So, for example, is the desire to measure customer satisfaction with the product? Is it to measure the testing contribution to that? Is it to see where time is being spent on certain kinds of activities that the business wants to stop? Is it to look for bottlenecks? Or something else?

If we do agree some kind of metrics, how can we reassure testers that they are not being judged, and that they should not pervert their working practices just to make the numbers look good?

We'll need something more than glib words.  Imagine you were told your performance would be judged on how many emails you sent. How would you react? Would you scoff at it but send more emails anyway? Would you send emails instead of having conversations? Would you care about the potential detrimental effects to you, others, the business? How could someone convince you to behave differently?

Finally, is there a real desire from you to look into sensible metrics with good intent and to act on the findings?

If so, then I will do all that I can to assist in getting something that is justifiable, that has explicit caveats, that is equitable, that is transparent, that acknowledges the messiness involved in its collection, that can be derived efficiently from data that we have, that sits within agreed error margins, and that reflects the work we're doing.

If not, then I'll ask you what kind of number will pass the cursory level of inspection that we both know it will receive, and I'll simply give you that: let's say 69.3%, OK?