Studies such as these reduce a large number of variables to a more manageable set and also offer an objective assessment of the linkages among species and of the associations between species and their environments. As with current applications of multivariate techniques, the outcomes depend on appropriate sampling designs, on the choice of appropriate spatial and temporal scales for analysis, and on meeting the assumptions (e.g., normality) of the statistical tests.
Another important feature of these studies is their predictive ability. For instance, the Oklahoma study of Stevenson et al. (1974) identified a group of fishes, comprising Red River Pupfish (Cyprinodon rubrofluviatilis), Red River Shiner (Notropis bairdi), Speckled Chub (Macrhybopsis aestivalis), and Chub Shiner (N. potteri), that was positively associated with indicators of natural brine. They surmised that if salinity increased as a result of oil and gas extraction, these species would expand their ranges, provided they had access to the newly saline habitats. Consistent with this prediction, Red River Pupfish introduced (perhaps by bait dealers) into a saline tributary (2.4 ppt) of the Cimarron River in northwestern Oklahoma are reproducing and appear to be established (McNeely et al. 2004).
Because different multivariate techniques have different strengths and weaknesses, more recent studies often combine several approaches. Rahel (1984) analyzed fish assemblages from 43 bog lakes in northern Wisconsin. Bog lakes are late-successional lakes in transition from lakes to wetlands and are characterized by low pH, low oxygen levels, and generally low fish species diversity (20 species in this study).
Fish species were grouped into assemblages using the multivariate technique of detrended correspondence analysis (DCA) applied to habitat distribution data (Box 4.2). Three assemblages were identified: a centrarchid assemblage consisting of bass and sunfish plus associated species such as Northern Pike (Esox lucius); a cyprinid assemblage; and a Central Mudminnow (Umbra limi)-Yellow Perch (Perca flavescens) assemblage. The last group, which also occurred together with the other two assemblages, formed a “core species group,” to which other species could be added in lakes with less harsh environments.
The lakes were grouped by their environmental characteristics using principal components analysis (PCA). From nine original variables, PCA captured 71% of the environmental variation among the lakes in three derived variables (principal components) (Box 4.2). The first principal component (PC-I) reflected lake size and habitat diversity: lakes with high scores on PC-I were large and had well-developed, complex littoral zones (Figure 4.7). The second principal component (PC-II) was largely a measure of lake productivity and acidity, with lakes scoring high on this axis having higher pH, alkalinity, and conductivity. The third axis (not shown in Figure 4.7) reflected lake depth and the development of adjoining wetlands. Rahel then overlaid the three DCA-defined fish assemblages on this ordination of lakes, plotting the mean component scores of the lakes in which each assemblage occurred (Figure 4.7). Centrarchid assemblages were characteristic of large, highly productive lakes, whereas the cyprinid assemblage tended to occur in smaller, less productive lakes, and the Central Mudminnow-Yellow Perch assemblage occurred in low-productivity, highly acidic lakes. In terms of succession, the Central Mudminnow-Yellow Perch assemblage occupied late-successional lakes that were transitioning to wetlands (Figure 4.7), whereas the centrarchid assemblage was characteristic of lakes at an early successional stage. The successional pattern thus shows a decline in fish species diversity as environmental conditions become more limiting.
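The two steps described above (reducing many environmental variables to a few principal components, then placing each fish assemblage on those components by its mean score) can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not Rahel's actual analysis: it assumes a correlation-matrix PCA (each variable standardized first, a common choice when variables such as pH and lake area are in different units), the data are random placeholders rather than Rahel's measurements, and the function names `pca_correlation` and `mean_scores_by_group` are hypothetical.

```python
import numpy as np


def pca_correlation(X, n_components):
    """PCA of a samples-by-variables matrix on the correlation matrix
    (each variable is standardized to mean 0, SD 1 before the analysis)."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    R = np.cov(Z, rowvar=False)              # equals the correlation matrix of X
    eigvals, eigvecs = np.linalg.eigh(R)     # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]        # reorder so the largest component comes first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    explained = eigvals / eigvals.sum()      # proportion of total variance per component
    scores = Z @ eigvecs[:, :n_components]   # component scores for each sample (lake)
    # Loadings: correlation between each original variable and each component
    loadings = eigvecs[:, :n_components] * np.sqrt(eigvals[:n_components])
    return scores, loadings, explained


def mean_scores_by_group(scores, groups):
    """Mean component scores of the samples belonging to each group (assemblage)."""
    groups = np.asarray(groups)
    return {str(g): scores[groups == g].mean(axis=0) for g in np.unique(groups)}


# Placeholder data: 43 lakes x 9 environmental variables (random, NOT Rahel's data)
rng = np.random.default_rng(1)
lakes = rng.normal(size=(43, 9))
# Placeholder assemblage label for each lake (hypothetical assignment)
assemblage = np.array(["centrarchid", "cyprinid", "mudminnow-perch"] * 15)[:43]

scores, loadings, explained = pca_correlation(lakes, n_components=3)
print(f"First three components capture {explained[:3].sum():.0%} of the variance")

means = mean_scores_by_group(scores, assemblage)
for name, centroid in means.items():
    # Mean PC-I and PC-II scores, the coordinates plotted in a Figure 4.7-style diagram
    print(name, np.round(centroid[:2], 2))
```

With real data the rows would be lakes and the columns the measured environmental variables. The `loadings` (correlations between each original variable and each component) are what justify interpreting PC-I as lake size and habitat diversity and PC-II as productivity and acidity, whereas the `scores` locate individual lakes, and hence assemblages, along those axes.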
The application of multivariate statistical approaches to understanding the distributions of fish species and fish assemblage structure is now widespread, and many such studies are treated in other sections of this chapter and in other chapters. A more recent approach, again made possible by advances in computing power, combines point-location data, such as museum collection records, with the information available in geographic information systems (GIS) to predict potential species occurrences. These approaches, sometimes referred to as “niche modeling,” use information associated with actual species occurrences within a GIS framework to automatically generate a map of additional localities where the species is likely to occur. One of the first approaches used an iterative artificial intelligence software package called GARP (Genetic Algorithm for Rule-Set Production) (Stockwell and Peters 1999; Peterson 2001). Genetic algorithms are useful when the original data (generally museum records of species occurrences) and the environmental data do not meet the assumptions of most multivariate statistical methods. Perhaps the chief distinction between these niche models and the multivariate approaches discussed earlier is that niche models generally incorporate more environmental data (often 30 or more data layers), focusing especially on topographic and climatic data that can be placed in a GIS. Another obvious difference is that such studies generally are done remotely, without fieldwork other than the initial fish collections. Niche modeling thus links museum data on species occurrences with the power of modern GIS, but it is generally limited to data that are available in electronic databases. However, as long as environmental data can be provided in a form suitable for GIS, there is in principle no limit on what could be included (McNyset 2005).
Detailed information on local habitat use, such as focal-point water velocities, substratum selection, or position in the water column, is typically not included, nor are the effects of interactions with other species on local occurrences. Thus niche modeling does not capture the local dimensions of a species’ realized niche and is instead more analogous to the fundamental niche (Hutchinson 1957a). Wiley et al. (2003) aptly refer to these as “partial niche models.”
FIGURE 4.7. Principal components analysis of Wisconsin lakes on the basis of environmental characteristics. Data show mean factor scores of three fish assemblages, defined by detrended correspondence analysis, on PC-I and PC-II. See text for further explanation. Based on data from Rahel (1984).
FIGURE 4.8. The prediction of the Kansas distribution of Bluntface Shiner (Cyprinella camura) using a niche model (GARP). The solid gray line surrounds the known distribution of Bluntface Shiner in Kansas; the dashed line shows the predicted distribution. Circles show the locality data used to build the model; triangles show data used to test the model predictions. Based on McNyset (2005). See text for additional explanation.
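Figure 4.8 separates localities used to build the model (circles) from localities withheld to test its predictions (triangles). The logic of that split-sample test can be sketched in Python. This is a minimal illustration under stated assumptions: occurrences are hypothetical grid cells, and `buffer_model` is a deliberately crude, hypothetical stand-in for a niche model (it is not GARP), included only so that there is a predicted distribution to test against. The helper names `split_occurrences` and `hit_rate` are likewise illustrative.

```python
import random


def split_occurrences(records, test_fraction=0.3, seed=None):
    """Randomly split occurrence records into a model-building set and a withheld test set."""
    rng = random.Random(seed)
    shuffled = list(records)
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]  # (training, withheld)


def buffer_model(training, radius=1):
    """Hypothetical stand-in model (NOT GARP): predicts presence in every grid cell
    within `radius` cells of a training locality."""
    predicted = set()
    for r, c in training:
        for dr in range(-radius, radius + 1):
            for dc in range(-radius, radius + 1):
                predicted.add((r + dr, c + dc))
    return predicted


def hit_rate(test_points, predicted):
    """Proportion of withheld localities that fall inside the predicted distribution."""
    if not test_points:
        raise ValueError("no withheld records to test against")
    return sum(p in predicted for p in test_points) / len(test_points)


# Hypothetical occurrence records as (row, col) grid cells along a stream course
occurrences = [(i, i // 2) for i in range(20)]
train, test = split_occurrences(occurrences, test_fraction=0.3, seed=42)
predicted = buffer_model(train, radius=1)
print(f"{len(test)} withheld records, hit rate = {hit_rate(test, predicted):.2f}")
```

A hit rate means little on its own: a model that predicted the species everywhere would capture every withheld record. In practice, the proportion of withheld localities predicted correctly is therefore judged against how much of the study area the model predicts as suitable.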
Predictive capabilities of GARP models, as with other multivariate models (e.g., Ross et al. 1987), can be tested by building the models with only a random subset of the species occurrence data. The withheld occurrence records can then be plotted on the map along with the model predictions to see whether they fall within the predicted distribution. Another way to test model output, of course, is to “ground truth” it with follow-up field studies. By either approach, GARP models have generally proven quite successful in predicting species occurrences for North American birds (Peterson 2001), marine fishes (Wiley et al. 2003), Mexican freshwater fishes (Domínguez-Domínguez et al.