FIGURE 2.1: Least squares estimation under collinearity. The only change in the data sets is the lightly colored data point. The planes are the estimated least squares fits.
Another problem with collinearity comes from attempting to use a fitted regression model for prediction. As was noted in Chapter 1, simple models tend to forecast better than more complex ones, since they make fewer assumptions about what the future will look like. If a model exhibiting collinearity is used for future prediction, the implicit assumption is that the relationships among the predicting variables, as well as their relationship with the target variable, remain the same in the future. This is less likely to be true if the predicting variables are collinear.
How can collinearity be diagnosed? The two-predictor model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$$
provides some guidance. It can be shown that in this case
$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{(1 - r_{12}^2)\sum_{i=1}^n (x_{i1} - \bar{x}_1)^2}$$
and
$$\mathrm{Var}(\hat{\beta}_2) = \frac{\sigma^2}{(1 - r_{12}^2)\sum_{i=1}^n (x_{i2} - \bar{x}_2)^2},$$
where $r_{12}$ is the sample correlation between the two predictors.
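To see this effect empirically, one can simulate the two-predictor model at several values of $r_{12}$ and track the spread of the estimated slope across repeated samples. The following is a minimal simulation sketch in Python; the sample size, true coefficients, and correlation values are illustrative choices rather than anything specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_sd(r12, n=50, sigma=1.0, reps=2000):
    """Monte Carlo spread of the least squares slope estimate for x1
    when the two predictors have population correlation r12."""
    cov = np.array([[1.0, r12], [r12, 1.0]])  # unit-variance predictors
    estimates = np.empty(reps)
    for k in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        # Illustrative true model: beta0 = 1, beta1 = 2, beta2 = 3
        y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0.0, sigma, n)
        X_design = np.column_stack([np.ones(n), X])  # add intercept column
        estimates[k] = np.linalg.lstsq(X_design, y, rcond=None)[0][1]
    return estimates.std()

for r12 in [0.0, 0.9, 0.99]:
    print(f"r12 = {r12:4.2f}: sd of estimated slope approx {slope_sd(r12):.3f}")
```

Consistent with the variance formula above, the simulated standard deviation of the slope estimate should grow roughly like $1/\sqrt{1 - r_{12}^2}$ as the correlation between the predictors increases.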
Table 2.1: Variance inflation caused by correlation of predictors in a two‐predictor model.
| Correlation $r_{12}$ | Variance inflation, $1/(1 - r_{12}^2)$ |
| --- | --- |
| 0.00 | 1.00 |
| 0.50 | 1.33 |
| 0.70 | 1.96 |
| 0.80 | 2.78 |
| 0.90 | 5.26 |
| 0.95 | 10.26 |
| 0.97 | 16.92 |
| 0.99 | 50.25 |
| 0.995 | 100.25 |
| 0.999 | 500.25 |
This ratio describes by how much the variances of the estimated slope coefficients are inflated due to observed collinearity relative to when the predictors are uncorrelated. It is clear that when the correlation is high, the variability (and hence the instability) of the estimated slopes can increase dramatically.
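As a quick check, the inflation factor can be evaluated directly from the formula; the short Python sketch below simply computes $1/(1 - r_{12}^2)$ at the correlation values used in Table 2.1.

```python
# Variance inflation 1 / (1 - r12^2) for a range of predictor correlations.
correlations = [0.0, 0.5, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99, 0.995, 0.999]
for r12 in correlations:
    inflation = 1.0 / (1.0 - r12 ** 2)
    print(f"r12 = {r12:5.3f}   variance inflation = {inflation:7.2f}")
```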
A diagnostic to determine this in general is the variance inflation factor ($VIF_j$) for each predictor,
$$VIF_j = \frac{1}{1 - R_j^2},$$
where