Notice that the estimates of beta1 and beta2 can be quite different from the true values in the high correlation scenario (bottom plot in Figure 2.4). Consistent with this, the standard errors are larger and the confidence intervals are wider.
Let’s get more insight into the impact that changing the correlation value has on the estimates of the coefficients. Increase the Number of Simulations to about 500 using the slider, and again simulate with two different values of the correlation, one near zero and one near one. You should obtain results similar to those in Figure 2.5.
Figure 2.5: Plots of Estimates for Coefficients, Low and High Predictor Correlation
These plots show Estimate beta1 and Estimate beta2, the estimated values of β1 and β2, from the 500 or so regression fits. The reference lines show the true values of the corresponding parameters, so that the intersection of these two lines shows the pair of true values used to simulate the data. In an ideal world, all of the estimate pairs would be very close to the point defined by the true values.
When X1 and X2 have correlation close to zero, the parameter estimates cluster rather uniformly around the true values. However, the impact of high correlation between X1 and X2 is quite dramatic. As this correlation increases, the estimates of β1 and β2 become much more variable, and also more strongly (and negatively) correlated with each other. As the correlation between the two predictors X1 and X2 approaches +1.0 or –1.0, we say that the X’X matrix involved in the MLR solution becomes ill-conditioned.
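The behavior described above can also be reproduced outside JMP. The following is a minimal Python sketch, not the Multicollinearity.jsl script itself: the true coefficient values, the sample size of 20, the noise level, and the function name simulate_estimates are all assumptions chosen for illustration.

```python
import numpy as np

def simulate_estimates(rho, n_sims=500, n_obs=20, seed=0):
    """Return an (n_sims, 2) array of least-squares estimates of (beta1, beta2)."""
    rng = np.random.default_rng(seed)
    # Hypothetical true parameter values; not taken from the JMP script.
    beta0, beta1, beta2, sigma = 1.0, 2.0, 3.0, 1.0
    cov = np.array([[1.0, rho], [rho, 1.0]])  # predictor correlation is rho
    estimates = np.empty((n_sims, 2))
    for i in range(n_sims):
        X = rng.multivariate_normal(np.zeros(2), cov, size=n_obs)
        noise = rng.normal(scale=sigma, size=n_obs)
        y = beta0 + beta1 * X[:, 0] + beta2 * X[:, 1] + noise
        design = np.column_stack([np.ones(n_obs), X])  # intercept column plus predictors
        coef, *_ = np.linalg.lstsq(design, y, rcond=None)
        estimates[i] = coef[1:]  # keep the two slopes, drop the intercept
    return estimates

for rho in (0.0, 0.95):
    est = simulate_estimates(rho)
    sd = est.std(axis=0, ddof=1)
    r = np.corrcoef(est[:, 0], est[:, 1])[0, 1]
    print(f"rho={rho}: sd(b1)={sd[0]:.3f}, sd(b2)={sd[1]:.3f}, corr(b1, b2)={r:.3f}")
```

Under these assumptions, the printed standard deviations should be noticeably larger for the high-correlation run, and the correlation between the two estimates should be strongly negative, which is the pattern the plots of estimates display.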
In fact, when there is perfect correlation between X1 and X2, the MLR coefficient estimates cannot be computed because the matrix (X’X)⁻¹ does not exist. The situation is similar to trying to build a regression model for MPG with two redundant variables, say, “Weight of Car in Kilograms” and “Weight of Car in Pounds.” Because the two predictors are redundant, there really is only a single predictor, and the MLR algorithm doesn’t know where to place its coefficients. There are infinitely many ways that the coefficients can be allocated to both terms to produce the same model.
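The redundant-weights situation can be illustrated with a short Python sketch. The car weights below are randomly generated stand-ins, not real data, and the two coefficient allocations are arbitrary choices made only to show that different allocations give the same fitted values.

```python
import numpy as np

rng = np.random.default_rng(1)
weight_kg = rng.uniform(900, 2200, size=30)  # hypothetical car weights in kilograms
weight_lb = weight_kg * 2.20462              # the same information in pounds

# Design matrix: intercept, weight in kg, weight in lb.
X = np.column_stack([np.ones(30), weight_kg, weight_lb])

# Three columns, but the rank is lower because one column is a multiple of another,
# so X'X is singular (numerically, its condition number is enormous).
print("rank of X:", np.linalg.matrix_rank(X))
print("condition number of X'X:", np.linalg.cond(X.T @ X))

# Two different coefficient allocations that describe the same model.
coef_a = np.array([50.0, -0.02, 0.0])            # all weight effect on kilograms
coef_b = np.array([50.0, 0.0, -0.02 / 2.20462])  # all weight effect on pounds
fit_a = X @ coef_a
fit_b = X @ coef_b
print("same fitted values:", np.allclose(fit_a, fit_b))
```

Any mixture of these two allocations gives the same fitted values as well, which is why no unique least-squares solution exists in this case.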
In cases of multicollinearity, the coefficient estimates are highly variable, as you see in Figure 2.5. This means that estimates have high standard errors, so that confidence intervals for the parameters are wide. Also, hypothesis tests can be ineffective because of the uncertainty inherent in the parameter estimates. Much research has been devoted to detecting multicollinearity and dealing with its consequences. Ridge regression and the lasso method (Hastie et al. 2001) are examples of regularization techniques that can be useful when high multicollinearity is present. (In JMP Pro 11, select Help > Books > Fitting Linear Models and search for “Generalized Regression Models”.)
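To give a feel for how regularization helps, here is a minimal Python sketch of ridge regression using its closed-form solution. It is not the JMP Pro Generalized Regression implementation. The penalty value lam = 5.0, the true coefficients, the sample size, and the helper name ols_and_ridge are assumptions for illustration only.

```python
import numpy as np

def ols_and_ridge(X, y, lam):
    """Closed-form OLS and ridge slopes on centered data (centering absorbs the intercept)."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    gram = Xc.T @ Xc
    ols = np.linalg.solve(gram, Xc.T @ yc)
    # Ridge adds lam to the diagonal of X'X, which keeps the system well-conditioned.
    ridge = np.linalg.solve(gram + lam * np.eye(X.shape[1]), Xc.T @ yc)
    return ols, ridge

rng = np.random.default_rng(0)
cov = np.array([[1.0, 0.95], [0.95, 1.0]])  # highly correlated predictors
true_beta = np.array([1.0, 1.0])            # hypothetical true slopes

ols_fits, ridge_fits = [], []
for _ in range(500):
    X = rng.multivariate_normal(np.zeros(2), cov, size=20)
    y = X @ true_beta + rng.normal(size=20)
    ols, ridge = ols_and_ridge(X, y, lam=5.0)
    ols_fits.append(ols)
    ridge_fits.append(ridge)

ols_sd = np.std(ols_fits, axis=0, ddof=1)
ridge_sd = np.std(ridge_fits, axis=0, ddof=1)
print("sd of OLS estimates:  ", ols_sd)
print("sd of ridge estimates:", ridge_sd)
```

The ridge estimates vary less from sample to sample, at the cost of some bias toward zero. In practice the penalty is chosen from the data, for example by cross-validation, rather than fixed in advance as it is here.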
Whether multicollinearity is of concern depends on your modeling objective. Are you interested in explaining or predicting? Multicollinearity is more troublesome for explanatory models, where the goal is to figure out which predictors have an important effect on the response. This is because the parameter estimates have high variability, which negatively impacts any inference about the predictors. For prediction, the model is useful, subject to the general caveat that an empirical statistical model is good only for interpolation, rather than extrapolation. For example, in the correlated case shown in Figure 2.4, one would not be confident in making predictions when X1 = +1 and X2 = -1 because the model is not supported by any data in that region.
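The extrapolation caveat can be quantified. For a least-squares fit, the variance of the predicted mean at a point x0 is proportional to x0’(X’X)⁻¹x0. The Python sketch below compares that factor at a point inside the cloud of correlated data with the point X1 = +1, X2 = -1. The correlation of 0.95, the sample size of 50, and the function name variance_factor are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
rho = 0.95  # assumed predictor correlation
X = rng.multivariate_normal(np.zeros(2), [[1.0, rho], [rho, 1.0]], size=50)
design = np.column_stack([np.ones(len(X)), X])  # intercept, X1, X2
XtX_inv = np.linalg.inv(design.T @ design)

def variance_factor(x1, x2):
    """Return x0'(X'X)^-1 x0, the multiplier of sigma^2 in the variance of the predicted mean."""
    x0 = np.array([1.0, x1, x2])
    return float(x0 @ XtX_inv @ x0)

inside = variance_factor(1.0, 1.0)    # along the direction where the data lie
outside = variance_factor(1.0, -1.0)  # a region with essentially no data
print(f"variance factor at (1, 1):  {inside:.3f}")
print(f"variance factor at (1, -1): {outside:.3f}")
```

The factor at (1, -1) is much larger than at (1, 1), so a prediction there carries far more uncertainty even though both points are the same distance from the center of the data.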
You can close the reports produced by Multicollinearity.jsl at this point.
3
Principal Components Analysis: A Brief Visit
Centering and Scaling: An Example
The Importance of Exploratory Data Analysis in Multivariate Studies
Dimensionality Reduction via PCA
Principal Components Analysis
Like PLS, principal components analysis (PCA) attempts to use a relatively small number of components to model the information in a set of data that consists of many variables. Its goal is to describe the internal structure of the data by modeling its variance. It differs from PLS in that it does not interpret variables as inputs or outputs, but rather deals only with a single matrix. The single matrix is usually denoted by X. Although the components that are extracted can be used in predictive models, in PCA there is no direct connection to a Y matrix.
Let’s look very briefly at an example. Open the data table Solubility.jmp by clicking on the correct link in the master journal. This JMP sample data table contains data on 72 chemical compounds that were measured for solubility in six different solvents, and is shown in part in Figure 3.1. The first column gives the name of the compound. The next six columns give the solubility measurements. We would like to develop a better understanding of the essential features of this data set, which consists of a 72 x 6 matrix.
Figure 3.1: Partial View of Solubility.jmp
PCA works by extracting linear combinations of the variables. First, it finds a linear combination of the variables that maximizes the variance. This is done subject to a constraint on the sizes of the coefficients, so that a solution exists. Subject to this constraint, the first linear combination explains as much of the variability in the data as possible. Applying this linear combination to the variable values of each observation produces a score for that observation. The vector of scores is called the first principal component. The vector of coefficients for the linear combination is sometimes called the first loading vector.
Next, PCA finds a linear combination which, among all linear combinations that are orthogonal to the first, has the highest variance. (Again, a constraint is placed on the sizes of the coefficients.) This second loading vector is used to compute scores for the observations, resulting in the second principal component. This second principal component explains as much variance as possible in a direction orthogonal to that of the first loading vector. Subsequent linear combinations are extracted similarly, to explain the maximum variance in the space that is orthogonal to the loading vectors that have been previously extracted.
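The extraction just described amounts to an eigen-decomposition. The Python sketch below shows one way to carry it out, under several assumptions: the 72 × 6 matrix is synthetic (it is not the Solubility.jmp data), the variables are centered and scaled to unit variance before the analysis, and the size constraint on each loading vector is taken to be unit length.

```python
import numpy as np

rng = np.random.default_rng(0)
# Synthetic 72 x 6 data driven by two hidden factors plus noise (not the solubility data).
latent = rng.normal(size=(72, 2))
mixing = rng.normal(size=(2, 6))
X = latent @ mixing + 0.3 * rng.normal(size=(72, 6))

# Center and scale each variable, then form the correlation matrix.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
R = np.cov(Z, rowvar=False)

# Eigenvectors of R are the loading vectors; eigenvalues are the score variances.
eigvals, eigvecs = np.linalg.eigh(R)
order = np.argsort(eigvals)[::-1]  # sort from largest to smallest variance
eigvals = eigvals[order]
loadings = eigvecs[:, order]       # column j is the j-th loading vector (unit length)

scores = Z @ loadings              # column j is the j-th principal component
explained = eigvals / eigvals.sum()
print("proportion of variance explained:", np.round(explained, 3))
```

The loading vectors are mutually orthogonal, the variance of each column of scores equals the corresponding eigenvalue, and the eigenvalues sum to the number of variables because the data were standardized.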
To perform PCA for this data set in JMP:
1. Select Analyze > Multivariate Methods > Principal Components.
2. Select the columns 1-Octanol through Hexane and add them as Y, Columns.
3. Click OK.
4. In the red triangle menu for the resulting report, select Eigenvalues.
Your report should appear as in Figure 3.2. (Alternatively, you can simply run the last script in the data table panel, Principal Components.)
Figure 3.2: PCA Analysis for Solubility.jmp
Each row of data is transformed to a score on each principal component. Plots of these scores for the first two principal components