
Posts

Showing posts from January, 2025

How do I Test AI?

Recently a few people have asked me how I test AI. I'm happy to share my experiences, but I frame the question more broadly, perhaps something like this: what kinds of things do I consider when testing systems with artificial intelligence components. I freestyled liberally the first time I answered but when the question came up again I thought I'd write a few bullets to help me remember key things. This post is the latest iteration of that list. Caveats: I'm not an expert; what you see below is a reminder of things to pick up on during conversations so it's quite minimal; it's also messy; it's absolutely not a guide or a set of best practices; each point should be applied in context; the categories are very rough; it's certainly not complete. Also note that I work with teams who really know what they're doing on the domain, tech, and medical safety fronts and some of the things listed here are things they'd typically do some or all of. Testing ...

HS MS BS

My daughter's high school requires its students to have access to a laptop at home for school work. The school's trust is a Microsoft shop so her computer needs MS Teams to be installed locally and she has access to the online Office 365 suite through a school account. No problem, right? Yeah, right, until it was. Last year sometime she complained to me that she couldn't log in. When she tried she was confronted with a dialog box "Your Organization requires you to change your pin" that wouldn't let her proceed with setting a "Hello PIN". I have an account on her machine (because of course) and I found that I was getting the same problem and the same message. But which organisation is this? I don't want to sound too full of myself but I am the Lord of IT in our household organisation and I certainly didn't ask for this. Naturally, I did some research online and on her machine ... after reluctantly setting th...

Don't Know? Find Out!

In What We Know We Don't Know, Hillel Wayne crisply summarises a handful of research findings about software development, describes how the research is carried out and reviewed and how he explores it, and contrasts those evidence-based results with the pronouncements of charismatic thought leaders. He also notes how and why this kind of research is hard in the software world. I won't pull much from the talk because I want to encourage you to watch it. Go on, it's reasonably short, it's comprehensible for me at 1.25x, and you can skip the section on Domain-Driven Design (the talk was at DDD Europe) if that's not your bag. Let me just give the same example that he opens with: research shows that most code reviews focus more on the first file presented to reviewers rather than the most important file in the eye of the developer. What we should learn: flag the starting and other critical files to receive more productive reviews. You never even thought about that possi...