As mentioned earlier, our example is deliberately simplified. It is not uncommon for spectra to be measured at a thousand wavelengths rather than 81. One challenge for software is to find useful representations, especially graphical representations, that help tame this complexity. Here we have seen that, for this type of data, PLS holds the promise of providing useful predictive models, whereas MLR clearly fails.
You might like to rerun the simulation with different settings to see how these plots and other results change. Once you are finished, you can close the reports produced by the script SpectralData.jsl.
How Does PLS Work?
So what, at a high level at least, is going on behind the scenes in a PLS analysis? We use the script PLSGeometry.jsl to illustrate. This script generates an invisible data table consisting of 20 rows of data with three Xs and three Ys. It then models these data using either one or two factors. Run the script by clicking on the correct link in the master journal. In the control panel window that appears (Figure 4.8), leave the Number of Factors set at One and click OK.
Figure 4.8: Control Panel for PLSGeometryDemo.jsl
This results in a two-by-two arrangement of 3-D scatterplots, shown in Figure 4.9. A Data Filter window also opens, shown in Figure 4.10.
Because the data table behind these plots contains three responses and n = 20 observations, the response matrix Y is 20 × 3. This is the first time that we have encountered a matrix Y of responses rather than a single column vector. To use MLR in this case, we would have to fit a separate model to each column of Y, so any information about how the three responses vary jointly would be lost. Although in certain cases it is desirable to model each response separately, PLS gives us the flexibility to leverage information about the joint variation of multiple responses. It also makes it easy to model large numbers of responses simultaneously in a single model.
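A quick numerical check makes the point about MLR concrete. The following is a minimal Python/NumPy sketch on simulated data (the seed, coefficient matrix, and noise level are hypothetical and are not the data generated by PLSGeometry.jsl). Least squares applied to the whole 20 × 3 matrix Y produces exactly the coefficients that three separate column-by-column fits produce, so the joint variation of the responses never enters the estimates.

```python
import numpy as np

# Hypothetical simulated data: 20 observations, 3 predictors, 3 responses
rng = np.random.default_rng(1)
n = 20
X = rng.normal(size=(n, 3))
B_true = rng.normal(size=(3, 3))
Y = X @ B_true + 0.1 * rng.normal(size=(n, 3))

# Add an intercept column so each fit is an ordinary MLR model
X1 = np.column_stack([np.ones(n), X])

# One least-squares call with the whole 20 x 3 response matrix...
B_joint, *_ = np.linalg.lstsq(X1, Y, rcond=None)

# ...matches three separate fits, one per response column
B_separate = np.column_stack(
    [np.linalg.lstsq(X1, Y[:, j], rcond=None)[0] for j in range(Y.shape[1])]
)

print(np.allclose(B_joint, B_separate))  # True
```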
Figure 4.9: 3-D Scatterplots for One Factor
The two 3-D scatterplots on the left enable us to see the actual values for all six variables for all observations simultaneously, with the predictors in the top plot and the responses in the bottom plot. By rotating the plot in the upper left, you can see that the 20 points do not fill the whole cube. Instead, they cluster together, indicating that the three predictors are quite strongly correlated.
You can see how the measured response values relate to the measured predictor values by clicking the Go arrow in the Animation Controls of the Data Filter window that the script produced (Figure 4.10). The highlighting then loops over the observations and shows each one's position in the other plots. To pause the animation, click the button with two vertical bars that has replaced the Go arrow.
Figure 4.10: Data Filter for Demonstration
The two plots on the right give us some insight into how PLS works. These plots display predicted, rather than actual, values. By rotating both plots, you can easily see that the predicted X values are perfectly collinear, and so are the predicted Y values. In other words, for both the Xs and the Ys, the partial least squares algorithm has projected the three-dimensional cloud of points onto a line.
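The projection onto a line can be reproduced outside JMP. The Python sketch below is a minimal one-factor PLS under stated assumptions: the data are simulated from a single hypothetical latent variable, each column is centered and scaled, and the first weight vector is taken as the leading left singular vector of X'Y (the direction that the first factor of NIPALS-style PLS converges to). It is not necessarily how JMP computes its results; Appendix 1 gives the algorithm actually used. The helper names `simulate` and `standardize` are hypothetical.

```python
import numpy as np

def simulate(n=20, seed=0):
    """Hypothetical data: three Xs and three Ys driven by one latent variable."""
    rng = np.random.default_rng(seed)
    latent = rng.normal(size=(n, 1))
    X = latent @ np.array([[1.0, 0.8, 0.6]]) + 0.3 * rng.normal(size=(n, 3))
    Y = latent @ np.array([[0.9, -0.5, 0.7]]) + 0.3 * rng.normal(size=(n, 3))
    return X, Y

def standardize(A):
    """Center each column and scale it to unit standard deviation."""
    return (A - A.mean(axis=0)) / A.std(axis=0, ddof=1)

X, Y = simulate()
Xs, Ys = standardize(X), standardize(Y)

# First PLS weight vector: leading left singular vector of X'Y,
# the direction in X space with maximal covariance with the Ys
U, s, Vt = np.linalg.svd(Xs.T @ Ys)
w = U[:, 0]

t = Xs @ w                      # factor scores, one number per observation
p = Xs.T @ t / (t @ t)          # X loadings
q = Ys.T @ t / (t @ t)          # Y loadings

X_hat = np.outer(t, p)          # predicted (standardized) X values
Y_hat = np.outer(t, q)          # predicted (standardized) Y values

# Each predicted point is a multiple of one fixed vector, so each cloud
# lies on a line through the origin: both matrices have rank 1
print(np.linalg.matrix_rank(X_hat), np.linalg.matrix_rank(Y_hat))  # 1 1
```

The same score t drives both X_hat and Y_hat, which is why moving along one line moves you along the other.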
Now, once again, run through the points using the Animation Controls in the Data Filter window and observe the two plots on the right. Note that, as you move progressively along the line in Predicted X Values, you also move progressively along the line in Predicted Y Values. This indicates that PLS not only projects the point clouds of the Xs and the Ys onto a lower-dimensional subspace, but does so in a way that reflects the correlation structure between the Xs and the Ys. Given a new observation's three X coordinates, PLS would locate that observation's position along the line to obtain its predicted X values, and it would use that same position to compute the corresponding predicted Y values.
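Predicting a new observation can be sketched in the same spirit. In this hypothetical Python example (simulated data, a one-factor fit computed as in a standard NIPALS-style first factor, and an invented new point), the new X coordinates are standardized with the training means and standard deviations, reduced to a single score along the factor, and that one score generates both the predicted X values and the predicted Y values. The helper `predict` is illustrative only and is not a JMP function.

```python
import numpy as np

# Hypothetical training data with one latent variable behind both Xs and Ys
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 1))
X = latent @ np.array([[1.0, 0.8, 0.6]]) + 0.3 * rng.normal(size=(20, 3))
Y = latent @ np.array([[0.9, -0.5, 0.7]]) + 0.3 * rng.normal(size=(20, 3))

# Centering and scaling constants from the training data
x_mean, x_sd = X.mean(axis=0), X.std(axis=0, ddof=1)
y_mean, y_sd = Y.mean(axis=0), Y.std(axis=0, ddof=1)
Xs, Ys = (X - x_mean) / x_sd, (Y - y_mean) / y_sd

# One-factor PLS fit: weights, scores, X loadings, Y loadings
w = np.linalg.svd(Xs.T @ Ys)[0][:, 0]
t = Xs @ w
p = Xs.T @ t / (t @ t)
q = Ys.T @ t / (t @ t)

def predict(x_new):
    """Hypothetical helper: project a new observation onto the factor."""
    t_new = ((x_new - x_mean) / x_sd) @ w       # position along the X line
    x_hat = x_mean + x_sd * (t_new * p)         # predicted X, original units
    y_hat = y_mean + y_sd * (t_new * q)         # predicted Y from the same score
    return t_new, x_hat, y_hat

# Hypothetical new observation
t_new, x_hat, y_hat = predict(np.array([0.5, 0.4, 0.2]))
print(round(t_new, 3), x_hat.round(3), y_hat.round(3))
```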
In Appendix 1, we give the algorithm used in computing these results. For now, simply note that we have illustrated the statement made earlier in this chapter that PLS is a projection method that reduces dimensionality. In our example, we have taken a three-dimensional cloud of points and represented those points using a one-dimensional subspace, namely a line. As the launch window indicates, we say that we have extracted one factor from the data. In our example, we have used this one factor to define a linear subspace onto which to project both the Xs and the Ys.
Because it works by extracting latent variables, partial least squares is also called projection to latent structures. While the term "partial least squares" stresses the relationship of PLS to other regression methods, the term "projection to latent structures" emphasizes a more fundamental empirical principle: namely, the underlying structure of high-dimensional data associated with complex phenomena is often largely determined by a smaller number of factors or latent variables that are not directly accessible to observation or measurement (Tabachnick and Fidell 2001). It is this aspect of projection, which is fundamental to PLS, that the image on the front cover is intended to portray.
Note that, if the data have little correlation structure, attempts to reduce their dimensionality are unlikely to be fruitful. However, as we saw in the spectroscopy example, there are cases where the predictors are necessarily highly correlated. PLS actually exploits the very situation, strong correlation among the predictors, that poses difficulties for MLR.
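One way to quantify how much there is to gain from dimension reduction is to ask what share of the total standardized variance the single best direction captures, that is, the largest eigenvalue of the correlation matrix divided by the number of variables. The Python sketch below compares strongly correlated predictors with independent ones; the data are simulated and `first_direction_share` is a hypothetical helper, not part of JMP.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 20
latent = rng.normal(size=(n, 1))
# Hypothetical predictors: three noisy copies of one latent variable
X_corr = latent @ np.ones((1, 3)) + 0.3 * rng.normal(size=(n, 3))
# Hypothetical predictors with no built-in correlation
X_indep = rng.normal(size=(n, 3))

def first_direction_share(X):
    """Share of total standardized variance along the best single direction."""
    eigvals = np.linalg.eigvalsh(np.corrcoef(X, rowvar=False))  # ascending order
    return eigvals[-1] / eigvals.sum()

print(round(first_direction_share(X_corr), 2))   # large: one direction dominates
print(round(first_direction_share(X_indep), 2))  # smaller: variance is spread out
```

When one direction carries most of the variance, projecting onto it loses little; when the variance is spread evenly, any low-dimensional projection discards a lot.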
Given this background, close the Data Filter window and the 3-D scatterplot report. Then rerun PLSGeometry.jsl, this time choosing Two as the Number of Factors. The underlying data structure is the same as before. Rotate the plots on the right and observe that the predicted values for the Xs, and likewise for the Ys, now fall on a plane: the two-dimensional subspace defined by the two factors. Again, loop through these points using the Animation Controls in the Data Filter window. As you might expect, the two-factor model describes the original data better than the one-factor model does.
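Extracting a second factor can be sketched by deflation. This is an assumption about the style of algorithm (a NIPALS-like scheme; JMP's own method is described in Appendix 1): after each factor is extracted, the part of X and Y it explains is subtracted, and the next weight vector is computed from what remains. The hypothetical helper `pls_fit` below returns the predicted X and Y values after a given number of factors. With two factors, the predicted X values span a plane (rank 2), and the residual variation can only shrink.

```python
import numpy as np

def pls_fit(Xs, Ys, n_factors):
    """Hypothetical NIPALS-style PLS sketch: extract factors one at a time."""
    E, F = Xs.copy(), Ys.copy()           # residual X and Y
    X_hat = np.zeros_like(Xs)
    Y_hat = np.zeros_like(Ys)
    for _ in range(n_factors):
        w = np.linalg.svd(E.T @ F)[0][:, 0]   # weight: max covariance with residual Ys
        t = E @ w                             # scores
        p = E.T @ t / (t @ t)                 # X loadings
        q = F.T @ t / (t @ t)                 # Y loadings
        X_hat += np.outer(t, p)
        Y_hat += np.outer(t, q)
        E -= np.outer(t, p)                   # deflate: remove what this factor explains
        F -= np.outer(t, q)
    return X_hat, Y_hat

# Hypothetical data with the same one-latent-variable structure as before
rng = np.random.default_rng(0)
latent = rng.normal(size=(20, 1))
X = latent @ np.array([[1.0, 0.8, 0.6]]) + 0.3 * rng.normal(size=(20, 3))
Y = latent @ np.array([[0.9, -0.5, 0.7]]) + 0.3 * rng.normal(size=(20, 3))
Xs = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
Ys = (Y - Y.mean(axis=0)) / Y.std(axis=0, ddof=1)

for k in (1, 2):
    X_hat, Y_hat = pls_fit(Xs, Ys, k)
    x_left = ((Xs - X_hat) ** 2).sum() / (Xs ** 2).sum()   # unexplained share of X
    print(k, np.linalg.matrix_rank(X_hat), round(x_left, 3))
```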
When data are high-dimensional, a critical decision is how many factors to extract to provide a sound representation of the original data. This comes back to finding a balance between underfitting and overfitting. We address this later in our examples. For now, close the script PLSGeometry.jsl and its associated reports.
PLS versus PCA
As described earlier, PCA uses the correlation matrix for all variables of interest, whereas PLS uses the submatrix that links responses and predictors. In a situation where there are both Ys and Xs, Figure 4.1 indicates that PCA uses the orange-colored correlations, whereas PLS uses the green-colored correlations. These green entries are the correlations that link the responses and predictors. PLS attempts to identify factors that simultaneously reduce dimensionality and provide predictive power.
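The difference in which correlations each method uses can be seen in a small simulation with two predictors and one response, similar in spirit to the setup described next. In this Python sketch (hypothetical data, and PCA applied to the predictors only, as one would in principal components regression), X1 and X2 vary mostly together, but Y depends on their difference. PCA, working from the predictor correlations alone, picks the direction of greatest spread, roughly ±(0.71, 0.71). The first PLS weight, which for a single response is proportional to the correlations linking the predictors to Y, points along the contrast instead.

```python
import numpy as np

rng = np.random.default_rng(3)
n = 1000
# Hypothetical design: most spread along X1 + X2, but Y depends on X1 - X2
z_main = rng.normal(size=n)
z_minor = 0.3 * rng.normal(size=n)
X = np.column_stack([z_main + z_minor, z_main - z_minor])
y = 2.0 * z_minor + 0.05 * rng.normal(size=n)

R = np.corrcoef(np.column_stack([X, y]), rowvar=False)   # 3 x 3 correlation matrix
R_xx = R[:2, :2]     # correlations among the predictors
r_xy = R[:2, 2]      # correlations linking predictors and response

# PCA of the predictors: eigenvector of R_xx with the largest eigenvalue
pca_dir = np.linalg.eigh(R_xx)[1][:, -1]     # eigh sorts eigenvalues ascending

# First PLS weight for a single standardized response: proportional to r_xy
pls_dir = r_xy / np.linalg.norm(r_xy)

print(pca_dir.round(2), pls_dir.round(2))
```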
To see a geometric representation that contrasts PLS and PCA, run the script PLS_PCA.jsl by clicking on the correct link in the master journal. This script simulates values for two predictors, X1 and X2, and a single response Y. A report generated by this script is shown in Figure 4.11.
Figure 4.11: Plots Contrasting PCA and PLS
The Contour Plot for Y on the left shows how the true value of Y changes with X1 and X2. The continuous color intensity scale shows large values of Y in red and small values in blue, as indicated by the legend to the right of the plot. The contour plot indicates that the response surface is a plane tilted so that it slopes upward