Handbook of Regression Analysis With Applications in R. Samprit Chatterjee.

Author: Samprit Chatterjee
Publisher: John Wiley & Sons Limited
Genre: Mathematics
ISBN: 9781119392484
11, 13, 14, 15, and 16, with the former Chapter 11 on nonlinear regression moving to Chapter 12) expand greatly on the power and applicability of regression models beyond what was discussed in the first edition. For this reason, many more references are provided in these chapters than in the earlier ones, since some of the material in the new chapters is less established and less well‐known, with much of it still the subject of active research. In keeping with that, we do not spend much (or any) time on issues for which there still isn't necessarily a consensus in the statistical community, but point to books and monographs that can help the analyst get some perspective on that kind of material.

      Chapter 13 extends applications to data with multiple observations for each subject, consistent with some structure from the underlying process. Such data can take the form of nested or clustered data (such as students all in one classroom) or longitudinal data (where a variable is measured at multiple times for each subject). In this situation, ignoring that structure results in an induced correlation among observations that reflects unmodeled differences between classrooms and between subjects, respectively. Mixed effects models generalize analysis of variance (ANOVA) models and time series models to this more complicated situation. Models with linear effects based on Gaussian distributions can be generalized to nonlinear models, and can also be generalized to non‐Gaussian distributions through the use of generalized linear mixed effects models.
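
The induced correlation mentioned above can be made concrete with the intraclass correlation, the share of total variance attributable to differences between clusters. The following is a minimal, dependency-free Python sketch and not the authors' code (the book's own analyses are in R). The classroom scores and the function name `intraclass_correlation` are hypothetical, and the estimator used is the classical one-way ANOVA estimator for equal-sized groups rather than a fitted mixed effects model.

```python
# Hypothetical test scores for three classrooms of three students each.
classrooms = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
    [7.0, 8.0, 9.0],
]


def intraclass_correlation(groups):
    """One-way ANOVA estimate of the intraclass correlation (balanced groups)."""
    g = len(groups)       # number of clusters (classrooms)
    k = len(groups[0])    # observations per cluster, assumed equal
    if any(len(grp) != k for grp in groups):
        raise ValueError("this sketch assumes equal-sized groups")
    grand_mean = sum(sum(grp) for grp in groups) / (g * k)
    means = [sum(grp) / k for grp in groups]
    # Between-cluster mean square: spread of the cluster means.
    msb = k * sum((m - grand_mean) ** 2 for m in means) / (g - 1)
    # Within-cluster mean square: spread of observations around their own mean.
    msw = sum(
        (x - m) ** 2 for grp, m in zip(groups, means) for x in grp
    ) / (g * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


icc = intraclass_correlation(classrooms)
print(round(icc, 3))  # 0.897
```

A value near 1, as here, says that two students from the same classroom are far more alike than two students from different classrooms, so treating all nine observations as independent would overstate the information in the data.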

      Modern data applications can involve very large (even massive) numbers of predictors, which can cause major problems for standard regression methods. Best subsets regression (discussed in Chapter 2) does not scale well to very large numbers of predictors, and Chapter 14 discusses approaches that do. Forward stepwise regression, in which potential predictors are stepped in one at a time, is an alternative to best subsets that scales to massive data sets. A systematic approach to reducing the dimensionality of a chosen regression model is through the use of regularization, in which the usual estimation criterion is augmented with a penalty that encourages sparsity; the most commonly used version of this is the lasso estimator, and it and its generalizations are discussed further.
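
To illustrate how a penalty can encourage sparsity, here is a minimal Python sketch of the lasso fitted by cyclic coordinate descent. It is not the authors' code (the book's own analyses are in R), and it assumes centered predictors and response, no intercept, and the criterion (1/(2n)) × residual sum of squares + lam × sum of |beta_j|. The function names, the toy data, and the penalty value `lam = 0.5` are all hypothetical choices made for illustration.

```python
def soft_threshold(z, t):
    """Shrink z toward zero by t; return exactly 0 when |z| <= t."""
    if z > t:
        return z - t
    if z < -t:
        return z + t
    return 0.0


def lasso_coordinate_descent(X, y, lam, n_iter=50):
    """Minimize (1/(2n)) * RSS + lam * sum(|beta_j|), one coefficient at a time.

    Assumes the columns of X and the response y are centered (no intercept).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        for j in range(p):
            rho = 0.0   # covariance of predictor j with the partial residual
            norm = 0.0  # sum of squares of predictor j
            for i in range(n):
                fit_others = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - fit_others)
                norm += X[i][j] ** 2
            beta[j] = soft_threshold(rho / n, lam) / (norm / n)
    return beta


# Toy orthogonal design: y = 3 * x0 + 0.1 * x1 + 0 * x2 (no noise).
X = [
    [1.0, 1.0, 1.0],
    [1.0, -1.0, -1.0],
    [-1.0, 1.0, -1.0],
    [-1.0, -1.0, 1.0],
]
y = [3.1, 2.9, -2.9, -3.1]

beta = lasso_coordinate_descent(X, y, lam=0.5)
print([round(b, 3) for b in beta])  # [2.5, 0.0, 0.0]
```

Because the predictors in this toy design are orthogonal, the lasso simply soft-thresholds the least squares coefficients (3, 0.1, 0): the large coefficient is shrunk by the penalty to 2.5, while the small one is set exactly to zero, which is the sparsity the penalty is meant to produce.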

      A final small change from the first edition to the second edition is in the title, as it now includes the phrase With Applications in R. This is not really a change, of course, as all of the analyses in the first edition were performed using the statistics package R. Code for the output and figures in the book can (still) be found at its associated web site at http://people.stern.nyu.edu/jsimonof/RegressionHandbook/. As was the case in the first edition, even though analyses are performed in R, we still refer to general issues relevant to a data analyst in the use of statistical software even if those issues don't specifically apply to R.

      We would like to once again thank our students and colleagues for their encouragement and support, and in particular students for the tough questions that have definitely affected our views on statistical modeling and by extension this book. We would like to thank Jon Gurstelle, and later Kathleen Santoloci and Mindy Okura‐Marszycki, for approaching us with encouragement to undertake a second edition. We would like to thank Sarah Keegan for her patient support in bringing the book to fruition in her role as Project Editor. We would like to thank Roni Chambers for computing assistance, and Glenn Heller and Marc Scott for looking at earlier drafts of chapters. Finally, we would like to thank our families for their continuing love and support.

      SAMPRIT CHATTERJEE

      Brooksville, Maine

      JEFFREY S. SIMONOFF

      New York, New York

      October, 2019

      How to Use This Book

      This book is designed to be a practical guide to regression modeling. There is little theory here, and methodology appears in the service of the ultimate goal of analyzing real data using appropriate regression tools. As such, the target audience of the book includes anyone who is faced with regression data [that is, data where there is a response variable that is being modeled as a function of other variable(s)], and whose goal is to learn as much as possible from that data.

      The book can be used as a text for an applied regression course (indeed, much of it is based on handouts that have been given to students in such a course), but that is not its primary purpose; rather, it is aimed much more broadly as a source of practical advice on how to address the problems that come up when dealing with regression data. While a text is usually organized in a way that makes the chapters interdependent, successively building on each other, that is not the case here. Indeed, we encourage readers to dip into different chapters for practical advice on specific topics as needed. The pace of the book is faster than might typically be the