FIGURE 2.1: Least squares estimation under collinearity. The only change in the data sets is the lightly colored data point. The planes are the estimated least squares fits.
Another problem with collinearity comes from attempting to use a fitted regression model for prediction. As was noted in Chapter 1, simple models tend to forecast better than more complex ones, since they make fewer assumptions about what the future will look like. If a model exhibiting collinearity is used for future prediction, the implicit assumption is that the relationships among the predicting variables, as well as their relationship with the target variable, remain the same in the future. This is less likely to be true if the predicting variables are collinear.
How can collinearity be diagnosed? The two-predictor model
$$y_i = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \varepsilon_i$$
provides some guidance. It can be shown that in this case
$$\mathrm{Var}(\hat{\beta}_1) = \frac{\sigma^2}{(1 - r_{12}^2)\sum_{i=1}^n (x_{i1} - \bar{x}_1)^2}$$
and
$$\mathrm{Var}(\hat{\beta}_2) = \frac{\sigma^2}{(1 - r_{12}^2)\sum_{i=1}^n (x_{i2} - \bar{x}_2)^2},$$
where $r_{12}$ is the sample correlation between the two predictors.
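To see this effect empirically, one can simulate the two-predictor model at several values of $r_{12}$ and track the spread of the estimated slope across repeated samples. The following is a minimal simulation sketch in Python; the sample size, true coefficients, and correlation values are illustrative choices rather than anything specified in the text.

```python
import numpy as np

rng = np.random.default_rng(0)

def slope_sd(r12, n=50, sigma=1.0, reps=2000):
    """Monte Carlo spread of the least squares slope estimate for x1
    when the two predictors have population correlation r12."""
    cov = np.array([[1.0, r12], [r12, 1.0]])  # unit-variance predictors
    estimates = np.empty(reps)
    for k in range(reps):
        X = rng.multivariate_normal([0.0, 0.0], cov, size=n)
        # Illustrative true model: beta0 = 1, beta1 = 2, beta2 = 3
        y = 1.0 + 2.0 * X[:, 0] + 3.0 * X[:, 1] + rng.normal(0.0, sigma, n)
        X_design = np.column_stack([np.ones(n), X])  # add intercept column
        estimates[k] = np.linalg.lstsq(X_design, y, rcond=None)[0][1]
    return estimates.std()

for r12 in [0.0, 0.9, 0.99]:
    print(f"r12 = {r12:4.2f}: sd of estimated slope approx {slope_sd(r12):.3f}")
```

Consistent with the variance formula above, the simulated standard deviation of the slope estimate should grow roughly like $1/\sqrt{1 - r_{12}^2}$ as the correlation between the predictors increases.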
Table 2.1: Variance inflation caused by correlation of predictors in a two‐predictor model.
| Correlation $r_{12}$ | Variance inflation, $1/(1 - r_{12}^2)$ |
| --- | --- |
| 0.00 | 1.00 |
| 0.50 | 1.33 |
| 0.70 | 1.96 |
| 0.80 | 2.78 |
| 0.90 | 5.26 |
| 0.95 | 10.26 |
| 0.97 | 16.92 |
| 0.99 | 50.25 |
| 0.995 | 100.25 |
| 0.999 | 500.25 |
This ratio describes by how much the variances of the estimated slope coefficients are inflated due to observed collinearity relative to when the predictors are uncorrelated. It is clear that when the correlation is high, the variability (and hence the instability) of the estimated slopes can increase dramatically.
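As a quick check, the inflation factor can be evaluated directly from the formula; the short Python sketch below simply computes $1/(1 - r_{12}^2)$ at the correlation values used in Table 2.1.

```python
# Variance inflation 1 / (1 - r12^2) for a range of predictor correlations.
correlations = [0.0, 0.5, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99, 0.995, 0.999]
for r12 in correlations:
    inflation = 1.0 / (1.0 - r12 ** 2)
    print(f"r12 = {r12:5.3f}   variance inflation = {inflation:7.2f}")
```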
A diagnostic to determine this in general is the variance inflation factor ($VIF_j$) for each predictor,
$$VIF_j = \frac{1}{1 - R_j^2},$$
where