ORDINATION TECHNIQUES
Principal Component Analysis
Principal component analysis (PCA) was first described by Karl Pearson in 1901, and in 1933 Harold Hotelling developed computational methods (Manly 1986; Gotelli and Ellison 2004). It is one of the simplest multivariate approaches and has been widely used in ecological studies, although more recently its use has been supplanted by other approaches (Ludwig and Reynolds 1988). The objective is to create linear combinations (components) of the original variables that are uncorrelated with one another and that capture most of the variation. If the analysis is successful, the original variables are replaced by a smaller number of principal components, making interpretation of the data easier (Manly 1986). The gain in simplification is greatest when the original measured variables are highly correlated. Indeed, if the original variables are themselves uncorrelated (generally this would be rare in ecological studies), then as many components as original variables would be needed and nothing would be gained by the analysis (Manly 1986). Because PCA is sensitive to the magnitude of the original variables, they are usually standardized to means of zero and unit variances. In addition, because PCA is a linear model, its usefulness declines when relationships among the variables are nonlinear.
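The steps described above (standardize, form uncorrelated linear combinations, and see how much variation the first few capture) can be sketched in a few lines. This is a minimal illustration, assuming NumPy is available; the function name `pca` and the habitat measurements are hypothetical and are not taken from any study cited here.

```python
import numpy as np

def pca(data):
    """PCA on standardized variables (equivalent to using the correlation matrix)."""
    data = np.asarray(data, dtype=float)
    # Standardize each variable (column) to a mean of zero and unit variance.
    z = (data - data.mean(axis=0)) / data.std(axis=0, ddof=1)
    # Covariance of standardized data is the correlation matrix.
    corr = np.cov(z, rowvar=False)
    # eigh is meant for symmetric matrices; it returns eigenvalues in ascending order.
    eigvals, eigvecs = np.linalg.eigh(corr)
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    scores = z @ eigvecs                       # component scores for each site
    explained = eigvals / eigvals.sum()        # proportion of variation per component
    return scores, eigvecs, explained

# Hypothetical habitat data for six sites:
# depth (m), current speed (m/s), substrate size (mm).
habitat = [
    [0.2, 0.10, 2.0],
    [0.4, 0.25, 8.0],
    [0.5, 0.30, 16.0],
    [0.9, 0.55, 30.0],
    [1.1, 0.60, 64.0],
    [1.5, 0.85, 90.0],
]
scores, loadings, explained = pca(habitat)
print(np.round(explained, 3))
```

Because the three hypothetical variables are strongly correlated, nearly all of the variation falls on the first component, which is the situation in which PCA simplifies the data most.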
Factor Analysis
Factor analysis was developed by Charles Spearman in 1904 for the purpose of measuring human intelligence (Gotelli and Ellison 2004). As with PCA, the goal is to reduce the original number of variables to fewer, uncorrelated variables (factors). Unlike PCA, where the components are linear combinations of the measured variables, factor analysis assumes that the measured variables are linear combinations of underlying factors, with the number of factors usually being less than the original number of variables (Kim and Mueller 1978; Sokal and Rohlf 1995; Gotelli and Ellison 2004). Factor analysis is especially useful as an exploratory approach to identify possible causal factors behind the original correlations in the data set (Sokal and Rohlf 1995; Gotelli and Ellison 2004). Factor analysis can use principal components as initial factors and, as with PCA, variables are first standardized to means of zero and unit variances (Manly 1986).
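The difference in direction between the two models may be easier to see when written out. The notation below is assumed for illustration only (p standardized variables X, components Z, m underlying factors F with m < p, coefficients a and loadings λ, and a term e specific to each variable); it is a sketch of the general form rather than the formulation of any one of the cited sources.

```latex
% PCA: each component is a linear combination of the measured variables
Z_i = a_{i1} X_1 + a_{i2} X_2 + \cdots + a_{ip} X_p, \qquad i = 1, \ldots, p

% Factor analysis: each measured variable is a linear combination of underlying factors
X_j = \lambda_{j1} F_1 + \lambda_{j2} F_2 + \cdots + \lambda_{jm} F_m + e_j, \qquad j = 1, \ldots, p, \quad m < p
```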
Discriminant Function Analysis
Discriminant function analysis is really a special case of factor analysis, in which the goal is to extract factors (now referred to as discriminant functions) that best separate groups recognized prior to the analysis (Cooley and Lohnes 1971). Groups could be individuals of the same species or sex, or fish assemblages at the same latitude. The discriminant functions are linear combinations of variables that best separate the groups, and each function is uncorrelated with the other functions. A useful feature of discriminant analysis is that once functions have been determined, they can be used to classify new data to groups (Gotelli and Ellison 2004).
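The two uses of a discriminant function, separating known groups and then classifying new observations, can be illustrated for the simplest case of two groups. The sketch below uses Fisher's linear discriminant with a pooled covariance matrix and assumes equal prior probabilities; the function names, group labels, and body measurements are hypothetical. With more than two groups, several functions would be extracted, which this sketch does not attempt.

```python
import numpy as np

def fit_discriminant(group_a, group_b):
    """Fit a two-group linear discriminant function (Fisher's method)."""
    a = np.asarray(group_a, dtype=float)
    b = np.asarray(group_b, dtype=float)
    mean_a, mean_b = a.mean(axis=0), b.mean(axis=0)
    # Pooled within-group covariance matrix.
    pooled = ((len(a) - 1) * np.cov(a, rowvar=False)
              + (len(b) - 1) * np.cov(b, rowvar=False)) / (len(a) + len(b) - 2)
    # Coefficients of the linear combination that best separates the group means.
    weights = np.linalg.solve(pooled, mean_a - mean_b)
    # Cutoff halfway between the projected group means (assumes equal priors).
    cutoff = weights @ (mean_a + mean_b) / 2.0
    return weights, cutoff

def classify(observation, weights, cutoff):
    """Assign a new observation to group 'A' or 'B' from its discriminant score."""
    score = weights @ np.asarray(observation, dtype=float)
    return "A" if score > cutoff else "B"

# Hypothetical measurements: standard length (mm), body depth (mm).
small = [[50, 12], [55, 14], [60, 13], [58, 15], [52, 13]]   # group A
large = [[70, 22], [75, 21], [72, 24], [78, 23], [74, 25]]   # group B

weights, cutoff = fit_discriminant(small, large)
print(classify([54, 13], weights, cutoff))
print(classify([76, 23], weights, cutoff))
```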
Correspondence Analysis
Correspondence analysis (CA), or reciprocal averaging, is another approach used to relate groups, such as species or functional groups, to habitat characteristics (ter Braak 1995). Unlike the linear methods described above, CA assumes that groups show unimodal distributions across the environmental variables. It also differs from the other approaches in performing a simultaneous ordination of rows and columns to maximize the separation of the groups along each axis (Gotelli and Ellison 2004). Mathematical properties of CA, and of other ordination techniques, compress the extremes of an environmental gradient and accentuate the middle, producing what is variously referred to as the “horseshoe” or “arch” effect (Wartenberg et al. 1987; ter Braak 1995). Modifications to CA, collectively referred to as detrended correspondence analysis (DCA), were designed to deal with the distortion (ter Braak 1995). However, methods used to remove the curvature of scaling all have limitations (Wartenberg et al. 1987; Gotelli and Ellison 2004).
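The simultaneous ordination of rows and columns can be sketched with a singular value decomposition of the standardized residuals of a site-by-species table, which is one standard way of computing CA (reciprocal averaging arrives at the same axes iteratively). The function name and the abundance table are hypothetical, and NumPy is assumed.

```python
import numpy as np

def correspondence_analysis(counts):
    """Correspondence analysis of a table of counts via SVD."""
    n = np.asarray(counts, dtype=float)
    p = n / n.sum()                       # proportions of the grand total
    r = p.sum(axis=1)                     # row (site) masses
    c = p.sum(axis=0)                     # column (species) masses
    expected = np.outer(r, c)             # proportions expected under independence
    resid = (p - expected) / np.sqrt(expected)
    u, sing, vt = np.linalg.svd(resid, full_matrices=False)
    # Principal coordinates: rows and columns are ordinated on the same axes.
    row_scores = (u * sing) / np.sqrt(r)[:, None]
    col_scores = (vt.T * sing) / np.sqrt(c)[:, None]
    inertia = sing ** 2                   # variation accounted for by each axis
    return row_scores, col_scores, inertia

# Hypothetical abundances: five sites (rows) along a gradient, four species (columns),
# each species showing a single peak (a unimodal distribution) along the gradient.
abundance = [
    [10, 5, 0, 0],
    [6, 9, 3, 0],
    [1, 7, 8, 2],
    [0, 2, 9, 7],
    [0, 0, 4, 11],
]
row_scores, col_scores, inertia = correspondence_analysis(abundance)
print(np.round(row_scores[:, 0], 2))   # site positions on the first axis
print(np.round(col_scores[:, 0], 2))   # species positions on the same axis
```

Plotting the first axis against the second for a long gradient of this kind is where the arch described above would be expected to appear.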
Nonmetric Multidimensional Scaling (NMDS)
Unlike the previous ordination techniques, which generally retain the original spacing of observations in multivariate space, NMDS is based on ranked distances. It can be used with any distance measure, and its goal is to arrange objects in a small number of dimensions so that dissimilar objects are placed far apart and similar objects close together. It is particularly useful in ecological studies because it performs well with data containing many zero values and is robust to deviations from multinormality (Gotelli and Ellison 2004; Paavola et al. 2006).
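The reliance on ranks rather than on the distances themselves can be shown with the first stage of the calculation. The sketch below is not a full NMDS; it only computes pairwise dissimilarities and converts them to ranks, which is the information an NMDS algorithm would then try to preserve. Bray-Curtis dissimilarity is used here as one possible distance measure, an assumption made for illustration, and the samples are hypothetical. Ties are not given special treatment.

```python
from itertools import combinations

def bray_curtis(a, b):
    """Bray-Curtis dissimilarity between two abundance vectors (0 = identical)."""
    difference = sum(abs(x - y) for x, y in zip(a, b))
    total = sum(a) + sum(b)
    return difference / total

def ranked_dissimilarities(samples):
    """Return pairwise dissimilarities and their ranks (1 = most similar pair)."""
    pairs = list(combinations(range(len(samples)), 2))
    dists = {pair: bray_curtis(samples[pair[0]], samples[pair[1]]) for pair in pairs}
    ordered = sorted(pairs, key=lambda pair: dists[pair])
    ranks = {pair: rank for rank, pair in enumerate(ordered, start=1)}
    return dists, ranks

# Hypothetical abundances of four species in four samples; note the many zeros.
samples = [
    [12, 0, 3, 0],
    [9, 2, 5, 0],
    [0, 8, 0, 5],
    [2, 0, 0, 9],
]
dists, ranks = ranked_dissimilarities(samples)
for pair in sorted(ranks, key=ranks.get):
    print(pair, ranks[pair], round(dists[pair], 3))
```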
CLASSIFICATION ANALYSIS
In contrast to ordination, the goal of which is to separate objects or variables along meaningful axes, classification analysis seeks to form discrete groupings. Cluster analysis, based on hierarchical methods, is the most commonly used form of classification analysis in ecological studies (Gotelli and Ellison 2004).
Cluster Analysis
Cluster analysis is particularly useful as an exploratory tool. It creates hierarchical groupings of objects by variables (Q analysis) or of variables by objects (R analysis). Approaches to hierarchical cluster analysis are based on similarity or dissimilarity matrices and most commonly form groups by nearest-neighbor joining (Legendre and Legendre 1998; Gotelli and Ellison 2004).
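Nearest-neighbor joining from a dissimilarity matrix can be sketched directly. The example below is a minimal, unoptimized version of agglomerative clustering with single (nearest-neighbor) linkage; the function name and the dissimilarity values among four sites are hypothetical.

```python
def single_linkage(dist):
    """Hierarchical clustering by nearest-neighbor (single-linkage) joining.

    dist is a symmetric matrix (list of lists) of dissimilarities among objects.
    Returns the merge history as (cluster_a, cluster_b, distance) tuples.
    """
    clusters = [(i,) for i in range(len(dist))]   # start with each object alone
    history = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Distance between clusters = distance between their closest members.
                d = min(dist[a][b] for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        history.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return history

# Hypothetical dissimilarities among four sites (0 = identical assemblages).
dissimilarity = [
    [0.0, 0.2, 0.9, 0.8],
    [0.2, 0.0, 0.7, 0.85],
    [0.9, 0.7, 0.0, 0.3],
    [0.8, 0.85, 0.3, 0.0],
]
history = single_linkage(dissimilarity)
for left, right, distance in history:
    print(left, right, distance)
```

The merge history is what a dendrogram displays: each joining event and the dissimilarity at which it occurs.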
USEFUL REFERENCES
Gotelli, N. J., and A. M. Ellison. 2004. A primer of ecological statistics. Sinauer Associates, Sunderland, Massachusetts.
Jongman, R. H. G., C. J. F. ter Braak, and O. F. R. van Tongeren. 1995. Data analysis in community and landscape ecology. Cambridge University Press, New York.
Kim, J.-O., and C. W. Mueller. 1978. Introduction to factor analysis. Sage University Paper series on Quantitative Applications in the Social Sciences, series 07-013. Sage Publications, Beverly Hills, California.
Ludwig, J. A., and J. F. Reynolds. 1988. Statistical ecology. John Wiley and Sons, New York.
Manly, B. F. J. 1986. Multivariate statistical methods, a primer. Chapman and Hall, New York.
Multivariate Statistics and Fish Assemblages
Pioneering multivariate studies relating habitat characteristics and fishes include G. R. Smith and Fisher (1970), dealing with the distribution patterns of fishes in Kansas, and Stevenson et al. (1974), who studied 53 species of western and central Oklahoma fishes from 27 drainage units. Both studies were based on factor analysis, which treats the variation and covariation of the original variables as a linear combination of underlying factors, with the number of factors usually being less than the original number of variables (Sokal and Rohlf 1995; Gotelli and Ellison 2004). Factor analysis thus is a means of reducing the number of variables (i.e., the factors replace the original variables) and of identifying possible causal factors behind the original correlations in the data set (Sokal and Rohlf 1995).
The Oklahoma study (Stevenson et al. 1974) included fish and environmental data from tributaries of the Arkansas, South Canadian, and Red river drainages. Species diversity is generally low in this environmentally harsh region, but numbers of individuals can be quite high (Matthews 1988). The analysis