Saturday, May 25, 2019

Looking at Observability


The Test team book club is reading Guide: Achieving Observability from Honeycomb, a high-level white paper outlining what observability of a system means, why you might want it, and factors relevant to achieving and getting value from it.

It's not a particularly technical piece, but it's sketched out in sufficient depth that our conversations have compared the content of the guide with the approaches taken in some of our internal projects, the problems those projects present, and our current solutions to them.

While I enjoy that practical stuff a great deal, I also enjoy chewing over the semantics of the terminology and making connections between domains. Here are a couple of first thoughts in that area.

The guide distinguishes between monitoring and observability.
  • monitoring: "Monitoring ... will tell you when something you know about but haven't fixed yet happens again" and "... you are unable to answer any questions you didn’t predict in advance. Monitoring ... discard[s] all the context of your events".
  • observability: "Observability is all about answering questions about your system using data", "rapidly iterate through hypothesis after hypothesis" and "strengthen the human element, the curiosity element, the ability to make connections."
I don't find that there's a particularly bright line between monitoring and observability. Both record data for subsequent analysis, and whether the system is (appropriately) observable depends on whether the data recorded is sufficient to answer the questions that need to be asked of it. I think this maps interestingly onto conversations around checking and testing, and the intent of the data gathering versus the questions later asked of that data.
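To make that distinction concrete, here's a minimal Python sketch of the two approaches. It assumes nothing about Honeycomb's actual API: the service and field names are invented, and print stands in for shipping events to whatever store you use.

```python
import json
import time

# Monitoring: a pre-aggregated counter answers only the question
# defined up front; per-request context is discarded at write time.
error_count = 0

def record_error_metric():
    global error_count
    error_count += 1  # later we can only ask "how many errors?"

# Observability: emit one wide, structured event per unit of work,
# keeping the context so that new questions can be asked afterwards.
def emit_event(**fields):
    event = {"timestamp": time.time(), **fields}
    print(json.dumps(event))  # stand-in for sending to an event store

emit_event(
    service="checkout",     # invented service name
    user_id="u-180224",     # high-cardinality field
    request_id="req-9f3a",  # high-cardinality field
    duration_ms=412,
    status=500,
    error="upstream timeout",
)
```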

Increasing observability will generally result in more data being collected. A couple of dimensions are important here:
  • high cardinality: refers to keys that can take many values. User, request, or session identifiers in a busy web service might be good examples. Being able to slice out data according to these kinds of variables allows analysis to take place at appropriate resolutions for given questions (there's a short sketch of this after the list).
  • richness: refers to the variety of data collected or, to put it another way, the amount of context that's associated with any "event" that the system records.
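As a sketch of what slicing on a high-cardinality key can look like, here are a few lines of Python over some made-up events of the kind emitted in the earlier sketch; the field names and values are hypothetical.

```python
from collections import defaultdict

# Made-up wide events, one dict per request, as in the sketch above.
events = [
    {"user_id": "u-1", "endpoint": "/search", "duration_ms": 120},
    {"user_id": "u-2", "endpoint": "/search", "duration_ms": 980},
    {"user_id": "u-1", "endpoint": "/basket", "duration_ms": 45},
    {"user_id": "u-2", "endpoint": "/search", "duration_ms": 1010},
]

# Slice on a high-cardinality key to answer a question that wasn't
# anticipated when the data was recorded: which users see the
# slowest requests?
by_user = defaultdict(list)
for event in events:
    by_user[event["user_id"]].append(event["duration_ms"])

for user, durations in sorted(by_user.items()):
    print(user, max(durations))  # worst-case latency per user
```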
That emphasis on retained context relates nicely to a quote I pulled from Edward Tufte's Beautiful Evidence last year, and chimes very much with my own experience:
When assessing evidence, it is helpful to see a full data matrix, all observations for all variables, those private numbers from which the public displays are constructed. No telling what will turn up. (p. 45)
This is one of the reasons that, at a small scale, I think Excel's pivot tables are such a valuable tool for quick, convenient, exploratory data analysis, and why one day I'll write a post about that.
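I won't pre-empt that post, but as a taste of the same idea outside Excel, here's a small sketch using pandas' pivot_table on an invented event table: rows per endpoint, columns per status, mean duration in each cell (combinations with no data come out as NaN).

```python
import pandas as pd

# An invented event table of the kind you might paste into Excel.
df = pd.DataFrame({
    "endpoint":    ["/search", "/search", "/basket", "/basket"],
    "status":      [200, 500, 200, 200],
    "duration_ms": [120, 980, 45, 60],
})

# The pandas analogue of an Excel pivot table: group rows by
# endpoint, columns by status, and average the durations.
pivot = pd.pivot_table(
    df,
    values="duration_ms",
    index="endpoint",
    columns="status",
    aggfunc="mean",
)
print(pivot)
```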
Image: https://flic.kr/p/24JYSq
