Saturday, September 15, 2018

Look at the Time


I'll be quick. In the kitchen. With tea brewing. Talking to a developer about new code. Exploring assumptions. An enjoyable conversation. A valuable five minutes.

A third party provides an object identifier to our code. We use it as a key because it's our understanding that the identifier is unique. We have never seen identifier collision.

Try again: we have never seen identifier collision at any given point in time.

Do we know that the third party will not recycle identifiers as the objects they represent are discarded? What would happen if they did?
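To make the risk concrete, here's a minimal sketch (hypothetical names, not our actual code) of a cache keyed directly on a third-party identifier, and what happens if that identifier is later recycled for a different object:

```python
# Hypothetical sketch of the hazard: a lookup keyed directly on a
# third-party identifier, which is unique at any given point in time
# but may be recycled after the object it names is discarded.

cache = {}

def remember(external_id, data):
    """Store data against the provider's identifier."""
    cache[external_id] = data

def lookup(external_id):
    """Return whatever we last stored for this identifier."""
    return cache.get(external_id)

# The provider issues identifier "42" for an object, which we cache...
remember("42", {"name": "original object"})

# ...the provider later discards that object and reuses "42" for a new
# one. Until we hear about the new object, lookups silently return the
# stale data; one mitigation is to key on (identifier, creation time).
stale = lookup("42")  # still the discarded object's record
```

Nothing breaks loudly here, which is exactly the problem: the collision only shows up as quietly wrong answers.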

No longer in the kitchen. Tea on desk.
Image: https://flic.kr/p/aCWUN5

Thursday, August 30, 2018

Boxing Clever


Meticulous research, tireless reiteration of core concepts, and passion for the topic. You didn't ask, but if you had, that's what I'd say about the writing of Matthew Syed, based on You Are Awesome, reviewed here a few months back — and now also Black Box Thinking.

The basic thesis of the latter is captured nicely in a blog post of his from last year:
Black Box Thinking can be summarised in one, deceptively simple sentence: learning from mistakes. This is the methodology of science, which has changed the world precisely because it is constantly updating its theories in the light of their failures. In a complex world, failure is inevitable. The question is: do we learn, or do we conceal and self-justify?
Who wouldn't want to learn from their mistakes, you might ask? Lots of us, it turns out. The aviation industry tends to come out well in Syed's analysis. Accidents, mishaps, and near-misses are reviewed for ways in which future flights might be less likely to repeat them, and the knowledge is shared across the board. Blaming is minimised in order that all participants are encouraged to share their evidence and thoughts.

The medical and healthcare industries, and also politicians, tend not to do so well. In these areas, blame culture and a fear of reprisals are said to hinder the extent to which mistakes are admitted to, investigated, and subsequently mitigated.

Atul Gawande's The Checklist Manifesto makes similar points, and prescribes the use of checklists as one way to mitigate future risk. Syed spends a lot of time on the ways in which cultural changes in philosophy, mindset, and practice need to be made in order to get to a point where the risks are identified, accepted, and then provoke some kind of positive action.

There's so much material packed so densely into this book that I can't do it justice here. In lieu of that, here are some of the entwined key threads as I saw them:
  • We live and work in complex systems 
  • ... where failures will happen.
  • A blaming culture is likely to result in lower visibility of issues and more ass-covering 
  • ... whereas open cultures encourage and support self-reporting.
  • A "production" failure should be seen as a learning opportunity
  • ... and a chance to reduce the risk of future such failures.
  • Use "development" failure as a tool
  • ... particularly within an iterative development environment.
  • Expertise comes from practice and feedback
  • ... but a mixture of theory and practice helps avoid local maxima.
  • A fixed mindset is less likely to innovate
  • ... and broadening our outlook makes creative connections more likely.
  • On the whole, we prefer narrative over data 
  • ... and when beliefs and data disagree, we tend to deny the data.
  • Understanding what to measure and record is key
  • ... and sometimes it's sensible to experiment to understand what to measure.
The last point in the list gives the book its title — the black box recorder on an aeroplane is often crucial in understanding the circumstances that led to an incident — while the first point is hammered home repeatedly: there is often no one single root cause for which an individual can clearly be held responsible.

This complexity is itself hinted at in the list: there are many variables at play, and they are interconnected. There is generally no silver bullet, no quick-fix, no one size to fit all. On this point, in a particularly nice meta twist, Syed notes that the approaches espoused for learning, say, how to build a product can also be used on the approaches themselves — in order to learn better how to build, perhaps we first need to learn better how to learn.

On learning then, three things that I'm taking away from this book.

I have historically been sceptical when I hear people blithely say that we learn more from failure than success. Out of context, I still don't believe that's necessarily a given, but I think perhaps I now have a more nuanced view.

First, using a generate-and-test approach in development, and treating each generation that doesn't improve our test metric as a failure, we might say that the volume of failure drives our learning more than the final success. Syed gives the example of James Dyson, who made thousands of incrementally different prototype vacuum cleaners before arriving at his first production model. Thousands of failures, each of which helped to point the way to success.
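That generate-and-test loop can be sketched in a few lines (my own toy illustration, not Dyson's process): keep the best candidate so far, generate a variation, and count every non-improving variation as a failure that nonetheless narrows the search:

```python
import random

def generate_and_test(score, start, steps=1000, seed=0):
    """Hill-climb by random variation: keep improvements, count failures."""
    rng = random.Random(seed)
    best = start
    best_score = score(start)
    failures = 0
    for _ in range(steps):
        candidate = best + rng.uniform(-1.0, 1.0)  # a new "prototype"
        candidate_score = score(candidate)
        if candidate_score > best_score:
            best, best_score = candidate, candidate_score
        else:
            failures += 1  # didn't improve, but still informed the search
    return best, failures

# Example: climb a simple hill whose peak is at x = 3.
best, failures = generate_and_test(lambda x: -(x - 3.0) ** 2, start=0.0)
```

Run it and the vast majority of the prototypes are failures; the handful of improvements they bracket is what delivers the final design.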

Alternatively, I wonder whether it might mean that the analysis of the differences between one success and multiple failures allows us to understand the factors important to success in a way that simple (ahem!) success does not.

Also new to me, and hidden in a footnote (p. 172), there's an interesting term:
"Observational statistics" is a phrase that encompasses all the statistics drawn from looking at what happened. Randomised control trials are different because they encompass not merely what happened, but also construct a counterfactual for comparison.
That counterfactual is key; it helps to balance survivorship bias. A well-known example comes from the second world war: deciding where to add armour to planes based on where there are bullet holes in those that returned to base is to miss the massive value of the unobserved data. Those that got shot down and never made it back might well have been hit elsewhere. (For a brief summary see e.g. Mother Jones.)
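The bias is easy to demonstrate with a toy simulation (entirely my own numbers, not from the book): suppose hits land equally on two areas, but engine hits are usually fatal. The holes observed on returning planes then point away from the area that most needs armour:

```python
import random

def returning_plane_holes(n=10_000, seed=1):
    """Count hit locations among survivors only, as the ground crew would."""
    rng = random.Random(seed)
    observed = {"engine": 0, "fuselage": 0}
    for _ in range(n):
        area = rng.choice(["engine", "fuselage"])  # hits are 50/50
        fatal_rate = 0.9 if area == "engine" else 0.1
        if rng.random() > fatal_rate:  # only survivors are observed
            observed[area] += 1
    return observed

holes = returning_plane_holes()
# Fuselage holes dominate the observed sample even though hits were
# evenly split: the counterfactual (the downed planes) carries the signal.
```

The observational statistics say "armour the fuselage"; the constructed counterfactual says the opposite.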

Another footnote (p. 220) raises an interesting potential tension that I realise I've been aware of but perhaps never surfaced before:
Getting the manufacturing process running seamlessly is often about ironing out unwanted deviations. It is about using process controls and the like to reduce variation. Creative change is often about experimentation: in other words, increasing variation.
Sensitivity to variability, to the unknown, should be adjusted consciously based on the context in which we are operating. More frequently, it appears to me, we have a relatively fixed level of comfort which can compromise our ability to operate in one or other of the scenarios that Syed identifies.

Black Box Thinking, despite the repetition due to the interconnectedness of the ideas it puts forward, and despite its sardine tin consistency, is a book worth persevering with. It's helped me both to learn and to reflect on many concepts I've been thinking about for some time myself.
Image: Amazon

Thursday, August 16, 2018

Tufte: Visual Explanations


Last year I read a bunch of Edward Tufte books: The Visual Display of Quantitative Information, Envisioning Information, Visual Explanations: Images and Quantities, Evidence and Narrative, Beautiful Evidence, and The Cognitive Style of PowerPoint. I found them compelling and ended up writing You've Got To See This for the Gurock Blog. 

In the intervening year I've found ways to incorporate aspects of what I learned into my work: I've tried hard to remove the junk from my figures and charts; I've noted that when we're talking about how to talk about our data, something like small multiples can help us to visualise more of it more easily; I've encouraged members of my team to think about the difference between exploring data in a tool such as Excel, and presenting data in a chart produced by Excel.

After that experience, I thought it might be interesting to review the notes I took as I went through the books (which I did, and it was). Then I thought it might also be useful to share them (which I'm doing, and you can judge).

This short set of posts contains the quotes I took from each book, presented in the order that I happened to read them. Themes recur across the series, but the quotes don't necessarily reflect that; instead they show something of what was interesting to me in the context of what I'd already read, what I already knew, and what I was working on at the time.
All of the books are published by Graphics Press and available direct from the author at edwardtufte.com. Particular thanks go to Šime for the loans and the comments.

--00--

How are we to assess the integrity of visual evidence? What ethical standards are to be observed in the production of such images? (p. 25)

... the reason we seek causal explanations is in order to intervene, to govern the cause so as to govern the effect ... (p. 28)

... descriptive narration is not causal explanation; the passage of time [can be] a poor explanatory variable ... (p. 29)

The deep, fundamental question in statistical analysis is Compared with what? (p. 30)

Time-series are exquisitely sensitive to choice of intervals and end points. (p. 37)

Displays of evidence implicitly but powerfully define the scope of the relevant, as presented data are selected from a larger pool of material. Like magicians, chartmakers reveal what they choose to reveal. (p. 43)

When assessing evidence, it is helpful to see a full data matrix, all observations for all variables, those private numbers from which the public displays are constructed. No telling what will turn up. (p. 45)

... there are right ways and wrong ways to show data: there are displays that reveal the truth and displays that do not. (p. 45)

... lack of visual clarity in arranging evidence is a sign of a lack of intellectual clarity in reasoning about evidence (p. 48)

Informational displays should serve the analytical purpose at hand: if the substantive matter is a possible cause-effect relationship, then graphs should organize data so as to illuminate such a link. (p. 49)

In magical performances, knowledge about the revealed frontview (what appears to be done) fails to yield reliable knowledge about the concealed backview (what is actually done) — and it is the audience's misdirected assumption about such symmetric reliability that makes the magic. (p. 55)

[techniques of deception practised by magicians], when reversed, reinforce strategies of presentation used by good teachers. Your audience should know beforehand what you are going to do; then they can evaluate how your verbal and visual evidence supports your argument. (p. 68)

If a clear statement of the problem cannot be formulated, then that is a sure sign that the content of the presentation is deficient. (p. 68)

Relevant to nearly every display of data, the smallest effective difference is the Occam's razor ... of information design. (p. 71)

Congruity of structure across multiple images gives the eye a context for assessing data variation. (p. 82)

Multiple images reveal repetition and change, pattern and surprise — the defining elements in the idea of information. (p. 105)

Excellence in the display of information is a lot like clear thinking. (p. 141)
Image: Tufte

Tufte: The Cognitive Style of PowerPoint

Last year I read a bunch of Edward Tufte books: The Visual Display of Quantitative Information, Envisioning Information, Visual Explanations: Images and Quantities, Evidence and Narrative, Beautiful Evidence, and The Cognitive Style of PowerPoint. I found them compelling and ended up writing You've Got To See This for the Gurock Blog. 

In the intervening year I've found ways to incorporate aspects of what I learned into my work: I've tried hard to remove the junk from my figures and charts; I've noted that when we're talking about how to talk about our data, something like small multiples can help us to visualise more of it more easily; I've encouraged members of my team to think about the difference between exploring data in a tool such as Excel, and presenting data in a chart produced by Excel.

After that experience, I thought it might be interesting to review the notes I took as I went through the books (which I did, and it was). Then I thought it might also be useful to share them (which I'm doing, and you can judge).

This short set of posts contains the quotes I took from each book, presented in the order that I happened to read them. Themes recur across the series, but the quotes don't necessarily reflect that; instead they show something of what was interesting to me in the context of what I'd already read, what I already knew, and what I was working on at the time.
All of the books are published by Graphics Press and available direct from the author at edwardtufte.com. Particular thanks go to Šime for the loans and the comments.

--00--

Visual reasoning usually works more effectively when the relevant evidence is shown adjacent in space within our eyespan. (p. 5)

Many true statements are too long to fit on a PowerPoint slide, but this does not mean we should abbreviate the truth to make the words fit. It means we should find a better tool to make presentations. (p. 5)

How is it that each elaborate architecture of thought always fits exactly on one slide? (p. 12)

By using PP to report technical work, presenters quickly damage their credibility ... Both [reviews of NASA's investigations into Shuttle disasters] concluded that (1) PowerPoint is an inappropriate tool for engineering reports, presentations and documentation and (2) the technical report is superior to PP. (p. 14)

... the PowerPoint slide typically shows 40 words, which is about 8 seconds of silent reading material. (p. 15)

This poverty of content has several sources. The PP design style, which uses about 40% to 60% of the space available on a slide to show unique content, with remaining space devoted to Phluff, bullets, frames, and branding. The slide projection of text, which requires very large type so the audience can see the words. Most importantly, presenters who don't have all that much to say (p. 15)

Sometimes PowerPoint's low resolution is said to promote a clarity of reading and thinking. Yet in visual reasoning, arts, typography, cartography, even sculpture, the quantity of detail is an issue completely separate from the difficulty of reading ... meaning and reasoning are relentlessly contextual. Less is bore. (p. 16)

To make smarter presentations, try smarter tools. (p. 28)

PowerPoint promotes a cognitive style that disrupts and trivialises evidence. (p. 30)

Preparing a technical report requires deeper intellectual work than simply compiling a list of bullets on slides. Writing sentences forces presenters to be smarter. And presentations based on sentences makes consumers smarter as well. (p. 30)

Our evidence concerning PP's performance is relevant only to serious presentations, where the audience (1) needs to understand something, (2) to assess the credibility of the presenter. (p. 31)

Consumers of presentations might well be skeptical of speakers who rely on PowerPoint's cognitive style. It is possible that these speakers are not evidence-oriented, and are serving up some PP Phluff to mask their lousy content ... (p. 31)
Image: Tufte

Tufte: The Visual Display of Quantitative Information


Last year I read a bunch of Edward Tufte books: The Visual Display of Quantitative Information, Envisioning Information, Visual Explanations: Images and Quantities, Evidence and Narrative, Beautiful Evidence, and The Cognitive Style of PowerPoint. I found them compelling and ended up writing You've Got To See This for the Gurock Blog. 

In the intervening year I've found ways to incorporate aspects of what I learned into my work: I've tried hard to remove the junk from my figures and charts; I've noted that when we're talking about how to talk about our data, something like small multiples can help us to visualise more of it more easily; I've encouraged members of my team to think about the difference between exploring data in a tool such as Excel, and presenting data in a chart produced by Excel.

After that experience, I thought it might be interesting to review the notes I took as I went through the books (which I did, and it was). Then I thought it might also be useful to share them (which I'm doing, and you can judge).

This short set of posts contains the quotes I took from each book, presented in the order that I happened to read them. Themes recur across the series, but the quotes don't necessarily reflect that; instead they show something of what was interesting to me in the context of what I'd already read, what I already knew, and what I was working on at the time.
All of the books are published by Graphics Press and available direct from the author at edwardtufte.com. Particular thanks go to Šime for the loans and the comments.

--00--

For Playfair, graphics were preferable to tables because graphics showed the shape of the data in a comparative perspective. (p. 32)

... small non-comparative, highly labeled data sets usually belong in tables. (p. 33)

... the relational graphic — in its barest form, the scatterplot and its variants — is the greatest of all graphical designs ... It confronts causal theories that X causes Y with empirical evidence as the actual relationship between X and Y (p. 47)

Graphical excellence consists of complex ideas communicated with clarity, precision, and efficiency. (p. 51)

Particularly disheartening is the securely established finding that the reported perception of something as clear and simple as line length depends on the context and what other people have already said about the lines. (p. 56)

... given the perceptual difficulties, the best we can hope for is some uniformity in graphics (if not in the perceivers) and some assurance that two perceivers have a fair chance of getting the numbers right. (p. 56)

Deception results from the incorrect extrapolation of visual expectations generated at one place on the graphic to other places. (p. 60)

Show data variation, not design variation. (p. 61)

Graphics must not quote data out of context. (p. 74)

If the statistics are boring then you've got the wrong numbers. Finding the right numbers requires as much specialized skill — statistical skill — and hard work as creating a beautiful design or covering a complex news story. (p. 80)

Occasionally artfulness of design makes a graphic worthy of the Museum of Modern Art, but essentially statistical graphs are instruments to help people reason about quantitative information. (p. 91)

Above all else show the data. (p. 92)

The best designs ... are intriguing and curiosity-provoking, drawing the viewer into the wonder of the data, sometimes by narrative power, sometimes by immense details, and sometimes by elegant presentation of simple but interesting data. (p. 121)

John Tukey wrote: "If we are going to make a mark, it may as well be a meaningful one. The simplest — and most useful — meaningful mark is a digit" (p. 140)

Small multiples resemble the frames of a movie: a series of graphics showing the same combination of variables, indexed by changes in another variable. (p. 168)


Small multiples are inherently multivariate, like nearly all interesting problems and solutions in data analysis. (p. 169)

Tables are clearly the best way to show exact numerical values, although the entries can be arranged in semi-graphical form. (p. 178)

Given their low data-density and failure to order numbers along a visual dimension, pie charts should never be used. (p. 178)

Explanations that give access to the richness of the data make graphics more attractive to the viewer. (p. 180)

Words and pictures belong together. Viewers need the help that words can provide. (p. 180)

Thus, for graphics in exploratory data analysis, words should tell the viewer how to read the design ... and not what to read in terms of content. (p. 182)

What is to be sought in designs for the display of information is the clear portrayal of complexity. Not the complication of the simple; rather the task of the designer is to give visual access to the subtle and difficult — that is, the revelation of the complex (p. 191)

Tufte: Envisioning Information


Last year I read a bunch of Edward Tufte books: The Visual Display of Quantitative Information, Envisioning Information, Visual Explanations: Images and Quantities, Evidence and Narrative, Beautiful Evidence, and The Cognitive Style of PowerPoint. I found them compelling and ended up writing You've Got To See This for the Gurock Blog. 

In the intervening year I've found ways to incorporate aspects of what I learned into my work: I've tried hard to remove the junk from my figures and charts; I've noted that when we're talking about how to talk about our data, something like small multiples can help us to visualise more of it more easily; I've encouraged members of my team to think about the difference between exploring data in a tool such as Excel, and presenting data in a chart produced by Excel.

After that experience, I thought it might be interesting to review the notes I took as I went through the books (which I did, and it was). Then I thought it might also be useful to share them (which I'm doing, and you can judge).

This short set of posts contains the quotes I took from each book, presented in the order that I happened to read them. Themes recur across the series, but the quotes don't necessarily reflect that; instead they show something of what was interesting to me in the context of what I'd already read, what I already knew, and what I was working on at the time.

All of the books are published by Graphics Press and available direct from the author at edwardtufte.com. Particular thanks go to Šime for the loans and the comments.

--00--

Emaciated data-thin designs ... provoke suspicion — and rightfully so — about the quality of measurement and analysis. (p. 32)

Small multiples, whether tabular or pictorial, move to the heart of visual reasoning ... Their multiplied smallness enforces local comparisons with our eyespan. (p. 33)

We envision information in order to reason about, communicate, document, and preserve that knowledge ... (p. 33)

Standards of excellence for information design are set by high quality maps, with diverse, bountiful detail, several layers of close reading combined with an overview, and rigorous data from engineering surveys. (p. 35)

Simplicity of reading derives from the context of detailed and complex information, properly arranged. A most unconventional design strategy is revealed: to clarify, add detail. (p. 37)

If the visual task is contrast, comparison, and choice — as it so often is — then the more relevant information within eyespan the better. (p. 50)

Simpleness is another aesthetic preference, not an information display strategy, not a guide to clarity. What we seek instead is a rich texture of data, a comparative context, an understanding of complexity revealed with an economy of means. (p. 51)


One line plus one line results in many meanings — Josef Albers. (p. 61; image above)

The noise in 1 + 1 = 3 is directly proportional to the contrast in value (light/dark) between figure and ground. (p. 62)

Careful visual editing diminishes 1 + 1 = 3 clutter. These are not trivial cosmetic matters, for signal enhancement through noise reduction can reduce viewer fatigue as well as improve accuracy of readings from a computer interface, a flight-control display, or a medical instrument. (p. 62)

The arrangement of many computer interfaces is similarly overwrought. (p. 64)

Information consists of differences that make a difference. (p. 65)

At the heart of quantitative reasoning is a single question: Compared to what? (p. 67)

Comparisons must be enforced within the scope of the eyespan, a fundamental point occasionally forgotten in practice. (p. 76)


The Swiss maps are excellent because they are governed by good ideas and executed with superb craft. Ideas not only guide work, but also help defend our designs (by providing reasons for choices) against arbitrary taste preferences. (p. 82; similar image above)

Noise is costly, since computer displays are low-resolution devices, working at extremely thin data densities ... at every screen are two powerful information-processing capabilities, human and computer. Yet all communication between the two must pass through the low-resolution, narrow-band video display terminal, which chokes off fast, precise and complex communication. (p. 89)
Images: Tufte, Archemind, Bücher

Tufte: Beautiful Evidence


Last year I read a bunch of Edward Tufte books: The Visual Display of Quantitative Information, Envisioning Information, Visual Explanations: Images and Quantities, Evidence and Narrative, Beautiful Evidence, and The Cognitive Style of PowerPoint. I found them compelling and ended up writing You've Got To See This for the Gurock Blog. 

In the intervening year I've found ways to incorporate aspects of what I learned into my work: I've tried hard to remove the junk from my figures and charts; I've noted that when we're talking about how to talk about our data, something like small multiples can help us to visualise more of it more easily; I've encouraged members of my team to think about the difference between exploring data in a tool such as Excel, and presenting data in a chart produced by Excel.

After that experience, I thought it might be interesting to review the notes I took as I went through the books (which I did, and it was). Then I thought it might also be useful to share them (which I'm doing, and you can judge).

This short set of posts contains the quotes I took from each book, presented in the order that I happened to read them. Themes recur across the series, but the quotes don't necessarily reflect that; instead they show something of what was interesting to me in the context of what I'd already read, what I already knew, and what I was working on at the time.


All of the books are published by Graphics Press and available direct from the author at edwardtufte.com. Particular thanks go to Šime for the loans and the comments.

--00--

For showing evidence, the map metaphor suggests that labels belong on images, that external grids help to scale images, and that data are more credible when contextualised. (p. 21)

The idea is to be approximately right rather than exactly wrong. (p. 50)  [so show more data at a lower precision to give context etc, c.f. sparklines]

[by adding sparklines to tables] Readers can scan ... making simultaneous multiple comparisons, searching for nonrandom patterns (p. 51)

Variations in slopes are best detected when the slopes are around 45 degrees ... aspect ratios should be such that time-series graphics tend towards a lumpy profile (p. 60)

A good way to assess a display for unintentional optical clutter is to ask "Do the prominent visual effects convey relevant content?" (p. 62)

[When creating diagrams with links] focus on causality (p. 78)

The essential test of text/image relations is how well they assist understanding of the content (p. 88)

Show comparisons, contrasts, differences. (p. 127)

Show multivariate data; that is, show more than one or two variables (p. 130)

... explanatory investigations, if they are to be honest and genuine, must seek out and present all relevant evidence regardless of mode (p. 131)  [where "mode" means the kind of data, e.g. text, image, table etc]

The first question is What are the content-reasoning tasks that this display is supposed to help with? (p. 136)

If the intellectual task is to make comparisons, as it is in nearly all data analysis then "Show comparisons" is the design principle. (p. 137)

Like the passive voice, the bullet-list format collaborates with evasive presenters to promote effects without causes (p. 143) [on the design choices that PowerPoint imposes]

proxy [is] statistical jargon for "pun" (p. 149) [and so do not use the same word to refer to multiple things in the same analysis, nor confound the things that one concept represents]

PowerPoint is presenter-oriented, not content-oriented, not audience-oriented (p. 158)
Image: Edward Tufte