The first five chapters of the book have been used successfully in quarter‐length courses at a number of institutions. An alternative approach for a quarter‐length course would be to skip some of the material in Chapters 4 and 5 and substitute one or more of the case studies in Chapter 6 (www.wiley.com/go/pardoe/AppliedRegressionModeling3e), or briefly introduce some of the topics in Chapter 7 (www.wiley.com/go/pardoe/AppliedRegressionModeling3e). A semester‐length course could comfortably cover all the material in the book.
The website for the book, which can be found at www.wiley.com/go/pardoe/AppliedRegressionModeling3e, contains supplementary material designed to help both the instructor teaching from this book and the student learning from it. There you'll find all the datasets used for examples and homework problems in formats suitable for most statistical software packages, as well as detailed instructions for using the major packages, including SPSS, Minitab, SAS, JMP, Data Desk, EViews, Stata, Statistica, R, and Python. There is also some information on using the Microsoft Excel spreadsheet package for some of the analyses covered in the book (dedicated statistical software is necessary to carry out all of the analyses). The website also includes information on obtaining an instructor's manual containing complete answers to all the homework problems, as well as instructional videos, practice quizzes, and further ideas for organizing class time around the material in the book.
The book contains the following stylistic conventions:
When displaying calculated values, the general approach is to be as accurate as possible when it matters (such as in intermediate calculations for problems with many steps), but to round appropriately when convenient or when reporting final results for real‐world questions. Displayed results from statistical software use the default rounding employed in R throughout.
In the author's experience, many students find some traditional approaches to notation and terminology a barrier to learning and understanding. Thus, some traditions have been altered to improve ease of understanding. These include: using familiar Roman letters in place of unfamiliar Greek letters (e.g., E(Y) rather than μ and b rather than β); replacing the nonintuitive Ȳ for the sample mean of Y with m_Y; using NH and AH for null hypothesis and alternative hypothesis, respectively, rather than the usual H0 and Ha.
Major changes for the third edition
The second edition of this book was used in the regression analysis course run by Statistics.com from 2012 to 2020. The lively discussion boards provided an invaluable source for suggestions for changes to the book. This edition clarifies and expands on concepts that students found challenging and addresses every question posed in those discussions.
There is expanded material on assessing model assumptions, analysis of variance, sums of squares, lack of fit testing, hierarchical models, influential observations, weighted least squares, multicollinearity, and logistic regression.
A new appendix provides an informal overview of matrices in the context of multiple linear regression.
I've added learning objectives to the beginning of each chapter and text boxes at the end of each section that summarize the important concepts.
As in the first two editions, this edition uses mathematics to explain methods and techniques only where necessary, and formulas are used within the text only when they are instructive. However, the book also includes additional formulas in optional sections to aid those students who can benefit from more mathematical detail.
I've added many more end‐of‐chapter problems. In total, the number of problems has increased by nearly 70%.
I've updated and added new references.
The book website has been expanded to include instructional videos and practice quizzes.
Iain Pardoe
Nelson, British Columbia
January, 2020
Acknowledgments
I am grateful to a number of people who helped to make this book a reality. Dennis Cook and Sandy Weisberg first gave me the textbook‐writing bug when they approached me to work with them on their classic applied regression book [Cook and Weisberg, 1999], and Dennis subsequently motivated me to transform my teaching class notes into my own applied regression book. People who provided data for examples used throughout the book include: Victoria Whitman for the house price examples; Wolfgang Jank for the autocorrelation example on beverage sales; Craig Allen for the case study on pharmaceutical patches; Cathy Durham for the Poisson regression example in the chapter on extensions. The multilevel and Bayesian modeling sections of the chapter on extensions are based on work by Andrew Gelman and Hal Stern. A variety of anonymous reviewers provided extremely useful feedback on the second edition of the book, as did many of my students at the University of Oregon and Statistics.com. Finally, I'd like to thank colleagues at Thompson Rivers University and the Pennsylvania State University, as well as Kathleen Santoloci and Mindy Okura‐Marszycki at Wiley.
Iain Pardoe
INTRODUCTION
I.1 STATISTICS IN PRACTICE
Statistics is used in many fields of application since it provides an effective way to analyze quantitative information. Some examples include:
A pharmaceutical company is developing a new drug for treating a particular disease more effectively. How might statistics help you decide whether the drug will be safe and effective if brought to market? Clinical trials involve large‐scale statistical studies of people—usually both patients with the disease and healthy volunteers—who are assessed for their response to the drug. To determine that the drug is both safe and effective requires careful statistical analysis of the trial results, which can involve controlling for the personal characteristics of the people (e.g., age, gender, health history) and possible placebo effects, comparisons with alternative treatments, and so on.
A manufacturing firm is not getting paid by its customers in a timely manner—this costs the firm money on lost interest. You've collected recent data for the customer accounts on amount owed, number of days since the customer was billed, and size of the customer (small, medium, large). How might statistics help