Doing it—if not p then what. This part is about being able to make sense of the statistics you see in articles and reports. After exploring the general issue of the job of statistics further in Chapter 5, Chapter 6 covers traditional statistics (hypothesis testing and confidence intervals), and Chapter 7 introduces Bayesian statistics. For each we will consider what they mean and, as importantly, misinterpretations. Chapter 8 describes some of the common issues and problems faced by all these statistical methods including the dangers of cherry picking and the benefits of simulation and empirical methods made possible by computation. Chapter 9 focuses on the differences and my own recommendations for best choice of methods.
Design and interpretation. The last part of this book is focused on the decisions you need to make as you design your own studies and experiments, and interpret the results. Chapter 10 is about increasing the statistical power of your studies, that is making it more likely you will spot real effects. Chapter 11 moves on to when you have results and want to make sense of them and present them to others; however, much of this advice is also relevant when you are reading the work of others. Finally, Chapter 12 reviews the current state of statistics within HCI and recent developments including adoption of new statistical methods and the analysis of big data.
PART I
Wild and Wide – Concerning Randomness and Distributions
CHAPTER 2
The unexpected wildness of random
How random is the world? We often underestimate just how wild random phenomena are—we expect to see patterns and reasons for what is sometimes entirely arbitrary.
By ‘wild’ here I mean that the behaviour of random phenomena is often far more chaotic than we expect. Perhaps because, barring the weather, so many aspects of life are controlled, we have become used to ‘tame,’ predictable phenomena. Crucially, this may lead to misinterpreting data, especially in graphs, either seeing patterns that are in fact pure randomness, or missing trends hidden by noise.
The mathematics of formal statistics attempts to see through this noise and give a clear view of robust properties of the underlying phenomenon. This chapter may help you see why you need to do this sometimes. However, we are also aiming to develop a ‘gut’ feeling for randomness, which is most important when you are simply eyeballing data, getting that first impression, to help you sort out the spurious from the serious and know when to reach for the formal stats.
2.1 EXPERIMENTS IN RANDOMNESS
Through a story and some exercises, I hope that you will get a better feel for how wild randomness is. We sometimes expect random things to end up close to their average behaviour, but we’ll see that variability is often large.
When you have real data you have a combination of some real effect and random ‘noise.’ However, if you do some coin tossing experiments you can be sure that the coins you are dealing with are (near enough) fair—everything you see will be sheer randomness.
2.1.1 RAINFALL IN GHEISRA
We’ll start with a story:
In the far-off land of Gheisra there lies the Plain of Nali. For 100 miles in each direction it spreads, featureless and flat, no vegetation, no habitation; except, at its very centre, a pavement of 25 tiles of stone, each perfectly level with the others and with the surrounding land.
The origins of this pavement are unknown—whether it was set there by some ancient race for its own purposes, or whether it was there from the beginning of the world.
Figure 2.1: Three days in Gheisra—Which are mere chance and which are an omen?
Rain falls but rarely on that barren plain, but when clouds are seen gathering over the Plain of Nali, the monks of Gheisra journey on pilgrimage to this shrine of the ancients, to watch for the patterns of the raindrops on the tiles. Oftentimes the rain falls by chance, but sometimes the raindrops form patterns, giving omens of events afar off.
Some of the patterns recorded by the monks are shown in Fig. 2.1. All of them at first glance seem quite random, but are they really? Do some have properties or tendencies that are not entirely like random rainfall? Which are mere chance, and which foretell great omens? Before reading on make your choices and record why you made your decision.
Before we reveal the true omens, you might like to know how you fare alongside three- and seven-year-olds.
When very young children are presented with this choice (with an appropriate story for their age) they give very mixed answers, but have a small tendency to think that distributions like Day 1 are real rainfall, whereas those like Day 3 are an omen.
In contrast, once children are older, seven or so, they are more consistent and tend to plump for Day 3 as the random rainfall.
Were you more like the three-year-old and thought Day 1 was random rainfall, or more like the seven-year-old and thought Day 1 was an omen and Day 3 random? Or perhaps you were like neither of them and thought Day 2 was true random rainfall?
Let’s see who is right.
Day 1 When you looked at Day 1 you might have seen a slight diagonal tendency with the lower-right corner less dense than the upper-left. Or you may have noted the suspiciously collinear three dots in the second tile on the top row. However, this pattern, the preferred choice of the three-year-old, is in fact the random rainfall—or at least as random as a computer random number generator can manage! In true random phenomena you often do get gaps, dense spots, or apparent patterns, but this is just pure chance.
Day 2 In Day 2 you might have thought it looked a little clumped toward the middle. In fact, this is perfectly right: it has exactly the same tiles as Day 1, but re-ordered so that the fuller tiles are toward the centre and the part-empty ones toward the edges. This is an omen!
Day 3 Finally, Day 3 is also an omen. This is the pattern that seven-year-olds most often choose as the random rainfall and also, I have found, the one most often chosen by 27-, 37-, and 47-year-olds. However, it is too uniform. The drops on each tile are distributed randomly within it, but there are precisely five drops on each tile. At some point during our early education we ‘learn’ (wrongly!) that random phenomena are uniform. Although this is nearly true when there are very large numbers involved (maybe 12,500 drops rather than 125), with smaller numbers the effects are far more chaotic than one might imagine.
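If you would like to check this for yourself, here is a minimal sketch in Python. It assumes that each drop is equally likely to land on any of the 25 tiles. The function names drops_per_tile and relative_spread and the fixed seed are illustrative choices of mine, not part of the story. It counts the drops on each tile for 125 drops and for 12,500 drops, and compares how far the fullest and emptiest tiles are from one another relative to the average.

```python
import random
from collections import Counter


def drops_per_tile(n_drops, n_tiles=25, rng=None):
    """Let each drop fall on a uniformly chosen tile; return the count per tile."""
    if rng is None:
        rng = random.Random()
    counts = Counter(rng.randrange(n_tiles) for _ in range(n_drops))
    return [counts[tile] for tile in range(n_tiles)]


def relative_spread(counts):
    """Gap between fullest and emptiest tile, divided by the mean count per tile."""
    mean = sum(counts) / len(counts)
    return (max(counts) - min(counts)) / mean


# Fixed seed only so that repeated runs match; the value itself is arbitrary.
rng = random.Random(2)
small = drops_per_tile(125, rng=rng)      # average of 5 drops per tile
large = drops_per_tile(12_500, rng=rng)   # average of 500 drops per tile

print("125 drops, count per tile:", small)
print("125 drops, relative spread:   ", round(relative_spread(small), 2))
print("12,500 drops, relative spread:", round(relative_spread(large), 2))
```

Try changing the seed, or removing it altogether, and look at how rarely the 125-drop pattern comes anywhere near five drops on every tile.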
2.1.2 TWO-HORSE RACES
Now for a different exercise, and this time you don’t just have to choose, you have to do something.
Find a coin or, even better, 20 coins if you have them. Toss the coins one by one and put the heads into one row and the tails into another. Keep on tossing until one line of coins has ten coins in it … you could even mark a finish line ten coins away from the start (like Fig. 2.2). If you only have one coin you’ll have to toss it lots of times and keep tally.
If you are on your own repeat this several times, but if you are in a group, perhaps a class, do it fewer times and look at each other’s coins as well as your own.
Before you start, think about what you expect to see, and only then do the coin tossing. So what happened? Did you get a clear winner, or were they neck and neck? Is it what you expected to happen?
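Once you have tried this by hand, you may want to run many more races than you have patience to toss. The following is a rough Python sketch of the same race, assuming a perfectly fair coin. The function name race, the seed, and the choice of 1,000 races are my own illustrative assumptions. It tallies how far ahead the winning row was when it crossed the finish line.

```python
import random
from collections import Counter


def race(finish=10, rng=None):
    """Toss a fair coin until heads or tails reaches `finish`; return (heads, tails)."""
    if rng is None:
        rng = random.Random()
    heads = tails = 0
    while heads < finish and tails < finish:
        if rng.random() < 0.5:
            heads += 1
        else:
            tails += 1
    return heads, tails


# Fixed seed only so that repeated runs match; the value itself is arbitrary.
rng = random.Random(7)
results = [race(rng=rng) for _ in range(1000)]

# Winning margin: how many coins ahead the winner was at the finish.
margins = Counter(abs(heads - tails) for heads, tails in results)
for margin in sorted(margins):
    print(f"winning margin {margin:2d}: {margins[margin]:4d} races")
```

A margin of 1 is a neck-and-neck finish (10 to 9), while a margin of 10 means one side never scored at all. Compare the tally with what you expected before you started tossing.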
I