with $\beta_1 = \beta_2$. This equality condition is called a linear restriction, because it defines a linear condition on the parameters of the regression model (that is, it only involves additions, subtractions, and equalities of coefficients and constants).
The question of whether the total SAT score is sufficient to predict grade point average can be stated using a hypothesis test about this linear restriction. As always, the null hypothesis gets the benefit of the doubt; in this case, that is the simpler restricted (subset) model that the sum of the Verbal and Math SAT scores is adequate, since it says that only one predictor is needed, rather than two. The alternative hypothesis is the unrestricted full model (with no conditions on the coefficients). That is,
$$
H_0\colon \beta_1 = \beta_2
$$
versus
$$
H_a\colon \beta_1 \neq \beta_2.
$$
These hypotheses are tested using a partial $F$-test. The $F$-statistic has the form
$$
F = \frac{(\text{Residual SS}_{\text{subset}} - \text{Residual SS}_{\text{full}})/d}{\text{Residual SS}_{\text{full}}/(n-p-1)},
$$
where $n$ is the sample size, $p$ is the number of predictors in the full model, and $d$ is the difference between the number of parameters in the full model and the number of parameters in the subset model. This statistic is compared to an $F$ distribution on $(d, n-p-1)$ degrees of freedom. So, for example, for this GPA/SAT example, $p = 2$ and $d = 1$ (three parameters in the full model versus two in the subset model), so the observed $F$-statistic would be compared to an $F$ distribution on $(1, n-3)$ degrees of freedom. Some statistical packages allow specification of the full and subset models and will calculate the partial $F$-test, but others do not, and the statistic has to be calculated manually based on the fits of the two models.
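When the partial $F$-test must be computed by hand, the calculation only requires the residual sums of squares from the two fits. The sketch below shows the arithmetic in Python with statsmodels, using simulated data in place of the GPA/SAT data; the variable names satv, satm, and gpa are hypothetical and the numbers are not from the book.

```python
# Manual partial F-test for the restriction beta1 = beta2 (total SAT score
# adequate), sketched on simulated data.
import numpy as np
import statsmodels.api as sm
from scipy import stats

rng = np.random.default_rng(0)
n = 100
satv = rng.normal(600, 70, n)                      # hypothetical SAT Verbal scores
satm = rng.normal(620, 70, n)                      # hypothetical SAT Math scores
gpa = 0.5 + 0.002 * satv + 0.003 * satm + rng.normal(0, 0.3, n)

# Full model: separate coefficients for the two predictors.
full = sm.OLS(gpa, sm.add_constant(np.column_stack([satv, satm]))).fit()

# Subset model: only the total score, i.e. the full model with beta1 = beta2.
subset = sm.OLS(gpa, sm.add_constant(satv + satm)).fit()

p = 2                                              # predictors in the full model
d = 1                                              # 3 parameters in full - 2 in subset
F = ((subset.ssr - full.ssr) / d) / (full.ssr / (n - p - 1))
p_value = stats.f.sf(F, d, n - p - 1)
print(F, p_value)

# statsmodels can also perform the comparison directly for nested fits:
print(full.compare_f_test(subset))                 # (F, p-value, d)
```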
An alternative form for the $F$-test above might make clearer what is going on here:
$$
F = \frac{(R^2_{\text{full}} - R^2_{\text{subset}})/d}{(1 - R^2_{\text{full}})/(n-p-1)}.
$$
That is, if the strength of the fit of the full model (measured by $R^2$) isn't much larger than that of the subset model, the $F$-statistic is small, and we do not reject the subset model; if, on the other hand, the difference in $R^2$ values is large (implying that the fit of the full model is noticeably stronger), we do reject the subset model in favor of the full model.
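To see why the two forms agree: both models are fit to the same response, so they share the same total sum of squares, and $R^2 = 1 - \text{Residual SS}/\text{Total SS}$ for each fit. Substituting,
$$
\text{Residual SS}_{\text{subset}} - \text{Residual SS}_{\text{full}} = \text{Total SS}\,(R^2_{\text{full}} - R^2_{\text{subset}})
\qquad\text{and}\qquad
\text{Residual SS}_{\text{full}} = \text{Total SS}\,(1 - R^2_{\text{full}}),
$$
and the Total SS factors cancel in the ratio, giving the $R^2$ form of the statistic.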
The $F$-statistic to test the overall significance of the regression is a special case of this construction (with restriction $\beta_1 = \cdots = \beta_p = 0$), as is each of the individual $t$-statistics that test the significance of any variable (with restriction $\beta_j = 0$). In the latter case $F = t_j^2$.
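As a quick check of the $F = t_j^2$ relationship, the following sketch (simulated data, hypothetical variable names) drops a single predictor and compares the resulting partial $F$-statistic to the squared $t$-statistic of that predictor in the full model.

```python
# Partial F-test for the single restriction beta2 = 0 versus the squared
# t-statistic of x2 in the full model; the two agree up to rounding.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(1)
n = 100
x1 = rng.normal(size=n)
x2 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 0.2 * x2 + rng.normal(size=n)

full = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()
subset = sm.OLS(y, sm.add_constant(x1)).fit()      # x2 dropped (beta2 = 0)

F, p_value, d = full.compare_f_test(subset)
print(F, full.tvalues[2] ** 2)                     # essentially identical values
```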
2.2.2 COLLINEARITY
Recall that the importance of a predictor can be difficult to assess using $t$-tests when predictors are correlated with each other. A related issue is that of collinearity (sometimes somewhat redundantly referred to as multicollinearity), which refers to the situation when (some of) the predictors are highly correlated with each other. The presence of predicting variables that are highly correlated with each other can lead to instability in the regression coefficients, increasing their standard errors, and as a result the $t$-statistics for the variables can be deflated. This can be seen in Figure 2.1. The two plots refer to identical data sets, other than the one data point that is lightly colored. Dropping the data points down to the predictor plane makes clear the high correlation between the predictors. The estimated regression plane changes substantially from the top plot to the bottom plot; a small change in only one data point causes a major change in the estimated regression function.
Thus, from a practical point of view, collinearity leads to two problems. First, it can happen that the overall $F$-statistic is significant, yet each of the individual $t$-statistics is not significant (more generally, the tail probability for the $F$-test is considerably smaller than that of any of the individual coefficient $t$-tests). Second, if the data are changed only slightly, the fitted regression coefficients can change dramatically. Note that while collinearity can have a large effect on regression coefficients and their associated $t$-statistics, it has little effect on the overall fit of the regression or on predictions made from it.
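A small simulation can make both problems visible; the sketch below (simulated data, not the data behind Figure 2.1) builds two nearly identical predictors, fits the regression, and then refits after nudging a single observation.

```python
# Collinearity in action: near-duplicate predictors, then a refit after a
# small change to one observation.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(2)
n = 30
x1 = rng.normal(size=n)
x2 = x1 + rng.normal(scale=0.05, size=n)           # x2 nearly equal to x1
y = 2.0 + 1.0 * x1 + 1.0 * x2 + rng.normal(scale=0.5, size=n)

fit1 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2]))).fit()

x2_new = x2.copy()
x2_new[0] += 0.5                                   # perturb one data point slightly
fit2 = sm.OLS(y, sm.add_constant(np.column_stack([x1, x2_new]))).fit()

print(fit1.params, fit2.params, sep="\n")          # slope estimates can shift a lot
print(fit1.fvalue, fit1.f_pvalue)                  # overall F is typically highly significant
print(fit1.tvalues)                                # ...while individual t-statistics look weak
```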