Preface
Sometimes, it seems we are bombarded with numbers, from global warming to utility bills. In user research or academic studies, we may also encounter more formal statistics such as significance testing (all those p-values) or Bayesian methods; and graphs and tables, of course, are everywhere.
Those of us who work with people know that numbers do not capture the complexities of social activity or the nuances of human feelings, which are often more appropriately explored through rich qualitative studies. Indeed, many researchers shun anything numerical as, at best, simplistic and, at worst, dehumanising.
However, the truth is that we all use statistics, both in our work and day-to-day lives. This may be obvious if you read an article with explicit statistics, but mostly the statistics we use are informal and implicit. If you eyeball a graph, table of results, or simple summary of survey responses, and it affects your opinions, you are making a statistical inference. If you interview a selection of people or conduct a user trial of new software and notice that most people mention a particular issue or have a particular problem, you are using statistics.
Below the surface, our brains constantly average and weigh odds, and we may be subconsciously aware of statistical patterns in the world well before we explicitly recognise them. Statistics are everywhere and, consciously or unconsciously, we are all statisticians. The core question is how well we understand this.
This book is intended to fill the gap between the ‘how to’ knowledge in basic statistics books and a real understanding of what those statistics mean. It will help you make sense of the various alternative approaches presented in recent articles in HCI and wider scientific literature. In addition, the later chapters will present aspects of statistical ‘craft’ skills that are rarely considered in standard textbooks. Some of the book relates to more formal statistics, while other parts will be useful even if you are only eyeballing graphs or making qualitative judgements about data.
There are some excellent books on advanced statistical techniques within HCI: Robertson and Kaptein’s collection Modern Statistical Methods for HCI [62] and Cairns’ Doing Better Statistics in Human–Computer Interaction [9]. This book is intended to complement these, allowing you to follow statistical arguments without necessarily knowing how to perform each of the analyses yourself, and, if you are using more advanced techniques, to understand them more thoroughly.
This book arose from a course on “Understanding Statistics” at CHI 2017, which itself drew on earlier short courses and tutorials from 20 years before. The fundamentals of statistics changed little in those 20 years; indeed, I could and should have written this book then. However, there have been two main developments that have intensified both the need for the book and its timeliness. The first is the increased availability, usability, and power of statistical tools such as R. These make it so much easier to apply statistics but can also lead to a false sense of security when complex methods are applied without understanding their purpose, assumptions, and limitations. The second change has been growing publicity about the problems of badly applied statistics, the so-called ‘statistical crisis’: topics that were once only discussed amongst professional statisticians are now a matter of intense debate on the pages of Nature and in the halls of CHI. Again, this awareness is a very positive step but comes with the danger that HCI researchers and UX practitioners may reach for new forms of statistics with even less understanding and greater potential for misuse. Even worse, the fear of doing it wrong may lead some to avoid using statistics where they are appropriate, or may serve as an excuse for abandoning them entirely.
We are in a world where big data rules, and nowhere more than in HCI, where A–B testing and similar analysis of fine-grained logging mean that automated analysis appears to be overtaking design expertise. To make sense of big data, as well as the results of smaller laboratory experiments, surveys, or field studies, we need to understand the statistics used to interpret quantitative data, the limitations of numbers, and how quantitative and qualitative methods can work together.
By the end of the book, you should have a richer understanding of: the nature of random phenomena and different kinds of uncertainty; the different options for analysing data and their strengths and weaknesses; ways to design studies and experiments to increase ‘power’—the likelihood of successfully uncovering real effects; and the pitfalls to avoid and issues to consider when dealing with empirical data. I hope that you will be better equipped to understand reports, data, and academic papers that use statistical techniques and to critically assess the validity of their results and how they may apply to your own practice or research. Most importantly, you will be better placed to design studies that efficiently use available resources and appropriately, effectively, and reliably analyse the results.
INTENDED READERSHIP
This book is intended for both experienced researchers and students who have already engaged, or intend to engage, in quantitative analysis of empirical data or other forms of statistical analysis. It will also be of value to practitioners using quantitative evaluation. There will be occasional formulae, but the focus of the book is on conceptual understanding, not mathematical skills.
Alan Dix
April 2020
Acknowledgments
First, I would like to thank Fiona, my wife, for her ongoing support and for reading this manuscript with her customary attention to detail, not least by highlighting my continual tendency to write ‘it’ and ‘this’ when it is not at all clear what they refer to. Thanks also to the reviewers, whose constructive comments led to quite substantial changes to the structure of this book, and to the attendees at various tutorials and courses over the years who have given feedback on earlier versions of this material, including Ben, for pointing out various errors (including one very embarrassing one) in a late draft. The photo of me on the cover was taken by Daniel Parry, who managed to fit me in at short notice just before the country shut down due to COVID-19. Many thanks, of course, to all the staff at Morgan & Claypool, especially Diane, Tondo, and Christine, and, I’m sure, many others whom I don’t know by name but who have contributed in many ways to ensuring this book is of the highest quality.
Finally, writing this under coronavirus lockdown, the importance of understanding quantitative data is reinforced. I would like to dedicate this book to the frontline workers across the world during this critical time; in the UK, especially the staff of the NHS, but also all those providing essential services—from pharmacists and workers in care homes, to supermarket checkout assistants and parcel deliverers. Looking at the UK income distribution in Section 4.13, it is sobering to think that many of those who are putting their health and lives on the line will have incomes at the lowest end of these graphs. Behind every number is a human life. We can either use statistics to distance ourselves from the harsh reality of life or as a window to expose the neglected and overlooked. I hope that this book can help you achieve the latter.
Alan Dix
April 2020
CHAPTER 1
Introduction
In this introductory chapter we consider:
• the nature of human cognition, which makes it hard to understand probability, and hence why we need formal statistics;
• whether you need to worry about statistics at all;
• the way statistics operates to offer us insight into the complexities of the world; and
• the different phases in research and software development and where different forms of qualitative and quantitative analysis are appropriate.
1.1 WHY ARE PROBABILITY AND STATISTICS SO HARD?
Do you