Friday, January 15, 2021

Cypress Thrill


 Last night I attended Cypress: beyond the "Hello World" test with Gleb Bahmutov. Here's the blurb:

Gleb Bahmutov, a Distinguished Engineer at Cypress will show how to write realistic Cypress tests beyond a simple "Hello World" test. We will look at reusable commands, starting and stopping servers, running tests on CI, creating page objects, and other advanced topics. Everyone who is just starting with Cypress or is an advanced user will benefit from attending this free meetup.

I'm always interested in building up my background knowledge around test tooling so a presentation from an expert with a bit of depth but still suitable for a relative newbie seemed ideal. I also thought I could take the opportunity to practice my sketchnoting on a technical talk, something I've failed at in the past.

Last things first, then: I still found sketchnoting hard. In this area, where I know concepts but not much specific, I don't have a good instinct for the level of detail to capture. I was pleased that the notes weren't a total scribble disaster, and I can see this morning that they jog my memory, so I'm counting that as a win.

In terms of the webinar's content, my lack of hands-on experience with Cypress meant that I sometimes had no context for the points being made, or they were at a lower implementation level than I need right now. However, because the talk was essentially a series of largely unrelated tips I never fell completely out of touch.

I got some of the background material that I was looking for, too. Anyone who's been around testing for a while will have seen chatter about the relative merits of Selenium and Cypress. For example, Jason Huggins, the creator of Selenium, says things like this:

It's true, tho. Cypress' in-browser, JS-only, trapped-inside-the-browser's-security-sandbox approach is the same architecture as the first version of Selenium. We abandoned it because it turned out to be a bad idea. I wish them luck though. Maybe they'll make it work this time.

Huggins is also on this Twitter thread, where Richard Bradshaw is trying to nuance the conversation:

My main point is it’s not Selenium vs Cypress. It’s a custom framework with many other libraries vs Cypress. That makes for a better comparison and also stops many negatives aspects bing slammed on Selenium, when really it’s the other libraries or poor code/design.

Naturally, Gleb talked about ways of executing code outside the browser using Node.js and also about the advantages of running inside the browser, such as being able to access browser-native objects. With that capability, some of the perceived weaknesses of Cypress can be worked around, for example by overriding window.open() to capture data about links opening in another page.
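As a concrete illustration of that idea, here is a minimal sketch of stubbing window.open() in a Cypress test. It's my own example rather than code from the talk, and the page URL and link selector are placeholders:

    describe('links that open another page', () => {
      it('captures the window.open call instead of following it', () => {
        cy.visit('/some-page', {
          onBeforeLoad(win) {
            // Replace the browser-native window.open with a stub we can inspect later.
            cy.stub(win, 'open').as('windowOpen')
          },
        })

        // Clicking the link now calls the stub rather than opening a new page.
        cy.get('a.external-link').click()

        // Assert on the captured call; its arguments say which URL would have opened.
        cy.get('@windowOpen').should('have.been.calledOnce')
      })
    })

Because the stub records its arguments, the test can go on to assert on the URL that would have been opened, which is exactly the kind of data the new page would otherwise have taken with it.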

He also covered the problem of moving between multiple domains (for example when a commerce site subcontracts payment processing to a third party), where they're looking at an approach that, conceptually at least, pushes state to a stack at the point of changing domain and pops it on return. I think this is a related ticket in the Cypress GitHub repo.
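To make the stack idea a little more concrete, here's a toy sketch of my own in TypeScript. It isn't Cypress code and the DomainState shape is invented; it just shows the current domain's state being saved on the way out and restored on the way back:

    // A toy illustration of the push/pop idea, not Cypress's implementation.
    type DomainState = {
      origin: string
      cookies: Record<string, string>
      storage: Record<string, string>
    }

    const domainStack: DomainState[] = []

    // Entering a third-party domain: remember where we came from, start fresh there.
    function enterDomain(current: DomainState, nextOrigin: string): DomainState {
      domainStack.push(current)
      return { origin: nextOrigin, cookies: {}, storage: {} }
    }

    // Returning, e.g. after payment completes: restore the state saved on the way out.
    function returnFromDomain(): DomainState | undefined {
      return domainStack.pop()
    }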

I get a kick out of listening to people who know and care deeply about a topic. It's clear that Gleb is one of them and an hour in his company has helped me to incrementally improve my knowledge of the state of browser automation.

Gleb has made his slides available and the talk was recorded: 

https://twitter.com/FriendlyTester/status/1278343153729384449?s=20

Tuesday, January 12, 2021

Practical AI

Last night I attended Using Artificial Intelligence, a webinar hosted by BCS SIGIST, the Special Interest Group in Software Testing of The Chartered Institute for IT. In different ways, the two presentations were both concerned with practical applications of AI technology.

The first speaker, Larissa Suzuki, gave a broad but shallow overview of machine learning systems in production. She started by making the case for ML, notably pointing out that it's not suited for all types of application, before running through barriers to deployment of hardened real-world systems.

Testing was covered a little, with emphasis on testing the incoming data at ingestion, and again after each stage of processing, and then again in the context of the models that are built, and then again when the system is live.

To finish, Larissa described an emerging common pipeline for getting from idea to usable system which highlighted how much pure software engineering there needs to be around the box of AI tricks.

In the second half, Adam Leon Smith walked us through three demonstrations of artificial intelligence tooling with potential application for testing. 

He showed us EvoSuite (video, code), a library that, unsupervised, creates unit tests that cover a codebase. There's no guarantee that these are tests a human would have written, and Adam noted a bias towards negative cases, but in some sense the tool captures a behavioural snapshot of the code which could be used to identify later changes.

In the next demo (video, code) Adam trained a model on images of magnifying glasses and used it to identify the search icon on Amazon's home page, an approach that might be used to check for the presence of expected icon types without requiring a fixed gold standard. Finally, he showed how synthetic test data could be generated by AI systems, using as an example thispersondoesnotexist.com, which creates photorealistic images of non-existent people.

Friday, January 1, 2021

How To Test Anything (In Three Minutes)


I was very happy to contribute to QA Daily's Inspirational Talks 2021 series this week but, in case you're here for an unrealistic quick fix, I have to tell you that the three minutes in question is the length of the videos and not the time you'll need for testing.

So how do I test anything? These are the things I've found to be helpful across contexts:

  • I know what testing means to me.
  • I find tools that can help me to achieve that testing.
  • I'm not afraid to start where I am and iterate.

If that sounds interesting, there's greater depth in How To Test Anything and the related (40 minute) talk that I did for OnlineTestConf in 2020:


Thursday, December 24, 2020

Trust Us: Push, Publicise, and Punish

In the recent peer conference organised by the Association for Software Testing and BCS SIGIST we asked ourselves the question Should the Public Care about Software Testing?

I summarised the presentations in Who Cares? a couple of weeks ago and now the AST and SIGIST have published a joint report which manages to pull together and contextualise both the presentations and a whole day of conversation into a coherent whole.

The report outlines a number of risks around contemporary software development that it's thought the general public are largely not aware of, but suggests that people should only need to care about software testing to the extent that they can trust that experts have exercised good judgement about where, what, how, why, and when to test. 

It goes on to propose three categories of approach for establishing public trust — push, publicise, and punish — where pushes are applied up front to influence behaviour during the development of a product; publicisation puts information into the public domain to help consumers ask the right kinds of questions; and punishments penalise undesirable behaviour and introduce additional practices to attempt to prevent it in future.

Standards can fit into all three of these categories and are interesting particularly because there has been recent controversy over the ISO 29119 standard for software testing. The report notes that if a testing standard is expected to be a proxy for a product quality standard, then it is risking trying to "drive software development from the back of the bus." It then offers what the participants considered were important factors for any attempt at a testing standard to consider.

The conference wasn't set up to make proposals on behalf of AST or SIGIST, or commit the organisations to a policy or a position, but the report does conclude with a few threads that recurred during the conference about the relationship between software testing and the public:

  • Software is made for humans, and humans should be at the centre of its development and use. This includes understanding human biases and accounting for them.
  • Software production has mixed incentives, notably tension between business needs and societal needs. These have to be balanced carefully and include incentives to test appropriately according to the context.
  • Testing software appropriately is important, but we testers should not fixate on testing for its own sake, nor on the craft of testing, above the bigger picture concerns of making the software work, and work safely, for its users.

The material created at the conference is jointly owned by the participants: Lalitkumar Bhamare, Fiona Charles, Janet Gregory, Paul Holland, Nicola Martin, Eric Proegler, Huib Schoots, Adam Leon Smith, James Thomas, and Amit Wertheimer. 

Read the full report: Should the Public Care About Software Testing?

Saturday, December 19, 2020

Exploring to the Choir


I attended Rob Sabourin's talk Experiences in Exploratory Test Automation for the Test Tribe this morning. If I had to summarise it I'd say something like this: 

  • it is a myth that exploratory testing cannot exploit programmable tools
  • exploratory testing is deliberate learning using whatever tools are appropriate
  • automation in exploratory testing can be one-shot, just good enough, targeted at learning

Rob is preaching to my choir, and I've written about ways that I combine exploratory investigation with automation many times over the years, for example:

Maaret Pyhäjärvi has long been an evangelist in this area and, while we're here, take a look at her Intersection of Automation and Exploratory Testing presentation where she demos exploratory testing using approval tests.

Friday, December 18, 2020

A Remote Possibility


Last year, in the days before SARS-CoV-2, I wrote a guide to peer conferences for the Association for Software Testing. It didn't mention running a peer conference remotely. This year, I found myself setting up a peer conference between AST and the BCS Special Interest Group in Software Testing which had to be run remotely.

Much of the guide still holds, and in some respects organisational concerns are simplified without travel, accommodation, and catering to worry about. But — and it's a Sir Mix-a-Lot-sized but — we've all done enough calls now to understand how hard it is to get the vibe right during a lengthy video meeting with more than a couple of participants.

So what did we consider, what did we do, and how did it work out?

On the purely logistical front, we had a few decisions to make. AST and BCS have worldwide memberships so choosing a time that didn't disadvantage some members was impossible. In the end, we ran a one-day conference, on a Sunday, from 3pm to 10pm BST. If we'd run across multiple days we could potentially have changed the hours each day to spread the pain around. However, as with in-person conferences, we were sensitive to the tension between giving enough space to explore the topic and excluding those with other important demands on their time.

We decided to keep the number of participants reasonably low, conscious of the fact that it can be easy to zone out as numbers increase. The flipside of this is the risk that we might end up with too small a group to have a varied discussion. On this occasion our drop-out rate was 20% (about normal) and I didn't feel that the level of conversation was lacking. Side note: we asked everyone to keep their cameras on as much as possible to give us all a sense of being together and able to see, as well as hear, reactions.

The structure of this peer conference was LAWST-style: several presentations, each followed by an "open season" discussion. It's usual in these kinds of events for the first couple of discussions to take a disproportionate amount of time as many general topics are aired. For our conference, we decided to timebox presentations at 10 minutes and open season at 35 which meant we could easily stick to a schedule with 10 minute breaks every hour — something we felt was important for health reasons and to keep energy levels high — and be sure to get more than a couple of presentations in. We scheduled a long break at around half-time and we shared the schedule at the start of the day so that all participants knew what was coming.

As it wasn't going to be possible for everyone to present we needed a way to choose presentations. I circulated abstracts a few days before the conference and set up a Google Doc for dot voting. In retrospect, I probably over-engineered the doc a little by asking people to drag images when it would have been simple and just as functional to have them type "X" against the talks they wanted to see. 

Finally on the logistical side, we anticipated that some kind of administrative communication channel for the organisers would be needed. In the real world a quick glance, gesture, or note slid over the table would all be possible. In the virtual world we felt we needed something specific that we could be watching all the time, so we set one up in Slack (see below). Ultimately we hardly used it but I'd still have one next time just in case it was needed.

Which brings us to the software we used. In advance we thought our requirements included these things: video conferencing software that could stay on all day; the ability to have global, multi-user, and 1-1 chat; multiple channels for chat; threads in chat; the host able to mute and unmute participants; participants able to share their screens; participants able to see all other participants at all times; and breakout rooms.

Zoom satisfied many of these requirements, is familiar to most of us these days, and was readily available, so was a straightforward choice. What it didn't give us was the flexibility we wanted around chat but all of those gaps were filled by another familiar tool, Slack.

As it happened, the only listed feature we didn't use was breakout rooms. Our intention was to set them up during breaks but in the moment we never felt the need. Some side conversation happened in Slack and I think we mostly regarded the breaks as a welcome relief away from our keyboards. 

The facilitator, Paul Holland, didn't mute anyone as I recall, but he did unmute people a couple of times. This may have been helped by agreeing on general microphone etiquette: the presenter's mic would be up throughout open season but everyone else would mute unless their comment was live.

The final, and crucial, component that we considered was facilitation. It's traditional for AST events to manage discussion in open season with K-cards, where participants hold up coloured cards to show that they want to contribute to the discussion, and how:

  • Green: I have a new thread. 
  • Yellow: I want to say something on the current thread.
  • Red: I must speak now (on topic or admin).
  • Purple: I think this conversation has gone down a rat hole.

We did wonder about trying to use physical cards over video but felt that it would be too hard for the facilitator to monitor and also difficult for the participants to know they'd been seen. 

So instead we decided to experiment with electronic cards and Slack threads. It quickly evolved into this:

  • We had a dedicated Slack channel for open season.
  • We had the convention of using a different coloured icon for each of green, yellow, and red K-cards
  • ... and we documented and updated the conventions as we went:

  • At the start of each presentation we placed a prominent comment into the channel to separate it from previous threads:

  • During the presentation and open season, participants added green cards with a brief comment into the channel:

  • During open season, the facilitator made one of the threads current by commenting into it with a traffic light icon and a note, "Active thread"
  • ... and while that thread was live, participants dropped yellow cards into the thread:

  • The facilitator picked comments to be live and invited the commenter to speak
  • ... and conversation continued in the thread until all comments were addressed.
  • The facilitator then picked a new thread from the channel and started again.

A couple of emergent behaviours were interesting and really improved things:

  • We started off intending to use the words "new", "same", "NOW!" for the K-cards, but participants quickly switched to icons. You can see this change in Paul's text about cards above.
  • We didn't ask for a note with a card, but it felt very natural to put one.
  • We initially asked participants to publish thread comments into the main channel too, but it was too noisy.
  • We found that some comments were made into the thread without cards. These were generally interesting asides that didn't merit conversation but increased the discussion's bandwidth.
  • We saw that side conversations took place inside the thread, again without cards, to explore some points of mutual interest to a few participants.
  • We started putting references and links to related material in a general channel rather than with the threads.

Paul's facilitation really helped with these aspects; he noted when people were trying things and suggested that we follow some of the patterns generally.

Although we had an icon for the red card we didn't need it on the day and we didn't define a rat hole card at all, although Eric Proegler managed to improvise one:


The conference went really well, with great conversation, room for everyone to make their points, and a real buzz from the participants. The thought we put into the organisation was well worth it, but I loved how adaptable we were on the day too.

When I do this again I will be happy to use Slack threads for K-cards. I'd also like to find a way to introduce side conversations or breakout discussions, but I'd want a model that didn't dampen any of the vibe and momentum built up in the conversation.

The participants at this peer conference were Lalitkumar Bhamare, Fiona Charles, Janet Gregory, Paul Holland, Nicola Martin, Eric Proegler, Huib Schoots, Adam Leon Smith, James Thomas, and Amit Wertheimer. Thank you to Adam Leon Smith, Eric Proegler, and Paul Holland for help with the organisation.

Friday, December 11, 2020

Plan, Do, Something, Act

Last night I attended a Heart of England Scrum User Group meetup where Mike Harris was asking So where did all this agile stuff come from? Luckily he was answering also: W. Edwards Deming.

Mike's presentation was a high-level overview of the history of Lean and Agile in which he traced it back to foundational work done by Deming, and then to Deming's influence, Walter Shewhart, who integrated scientific methodology and statistics into industrial quality practices.

I have read a little Deming but I'm motivated to look more deeply after this. In particular, Mike drew attention to the fact that Plan, Do, Study, Act and Plan, Do, Check, Act were, for Deming, very much not the same thing. I had never realised this. 

The paper Circling Back: Clearing up myths about the Deming cycle and seeing how it keeps evolving by Ronald D. Moen and Clifford L. Norman talks about the strength of Deming's feeling about it, quoting him:

They bear no relation to each other ... [PDSA] is a quality control program. It is a plan for management. Four steps: Design it, make it, sell it, then test it in service. Repeat the four steps, over and over, redesign it, make it, etc. Maybe you could say that [PDSA] is for management, and the [PDCA] is for a group of people that work on faults encountered at the local level.

Their source is Proceedings from the U.S. Government Accounting Office's Roundtable Discussion Product quality — Japan vs. United States (1980) which looks like a fascinating document. Just prior to the words that Moen and Norman cite, Deming says:

I think, to most people, quality assurance is figures that show where you have been, whereas quality control is a program for continual improvement.

Again that's not a distinction I'd come across before, at least not with those names. 

Given that I've done only a cursory review, my sense is that Deming is suggesting that both PDCA and PDSA are quality control programs, each wanting to incrementally act, inspect, and adapt, but that they operate at a different granularity and so will naturally have different real-world activities.

Two of my own experiences come to mind. I have been struck many times by the fact that organisations will promote "agile" working practices yet persist in "big up-front design" for other activities, such as annual reviews. Management by do-what-I-say-not-what-I-do. 

I attended a Scrum Inc course last year and, when we were asked to summarise Scrum in 30 seconds or less, I encouraged my group to use PDCA to abstract away from the ceremonies and other process baggage. That kind of big picture view exposes the underlying iteration, inspection, and improvement cycle. The trainers said no-one had done that before on their courses and that, usually, people simply dive into the details of the practices.

What these anecdotes hint at is that, while we can certainly argue the toss about S or C, there are probably more deep-seated concerns around people not being aware of the way they work, why they work that way, and the outcomes they create by doing so.

There's clearly plenty for me to dig into here. Mike shared his references after the talk:

P.S. I attended this meetup sitting in my freezing cold car outside the pool while my daughter was inside at swimming club. Drawing notes with icicle fingers by the light of my laptop screen was an interesting challenge and didn't result in much that was readable so I copied the scribble out tidily for here.