Usually, systematic sampling would provide us with random individuals – unless for some reason every tenth individual is more likely to share certain characteristics. Suppose we used systematic sampling to examine the distribution of ants' nests in a grassland. We could place 2 m × 2 m quadrats evenly 10 m apart across the site and then count the number of nests within each quadrat. However, if ants' nests are in competition with each other, they are likely to be spaced out. If this spacing happens to be at about 10 m distances, we would either overestimate the number of nests if our sequence of samples included the nests, or underestimate if we just missed including nests in each quadrat. It would be better in this situation to use a mixture of random and systematic sampling (called stratified random sampling – Figure 1.4c) where the area was divided into blocks (say of 10 m × 10 m) and then the 2 m × 2 m quadrats were placed randomly within each of these. This type of sampling design can also be applied to temporal situations by, for example, dividing the day into blocks of 4 hours and allocating the order of the sites to be sampled within each block using different random numbers.
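The stratified random layout described above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `stratified_quadrats`, the 50 m × 30 m site dimensions and the fixed seed are assumptions made for the example, not part of any published protocol.

```python
import random


def stratified_quadrats(site_width, site_height, block=10.0, quadrat=2.0, seed=None):
    """Return one randomly placed quadrat origin (x, y) for every block of the site.

    Each origin is the lower-left corner of a quadrat that lies wholly
    inside its block. All distances are in metres.
    """
    rng = random.Random(seed)
    n_cols = int(site_width // block)
    n_rows = int(site_height // block)
    origins = []
    for row in range(n_rows):
        for col in range(n_cols):
            # Random offset within the block, leaving room for the quadrat itself
            x = col * block + rng.uniform(0, block - quadrat)
            y = row * block + rng.uniform(0, block - quadrat)
            origins.append((x, y))
    return origins


# Hypothetical 50 m x 30 m grassland: 5 x 3 = 15 blocks, one quadrat in each
quadrats = stratified_quadrats(50, 30, seed=1)
for x, y in quadrats:
    print(f"quadrat origin at ({x:.1f} m, {y:.1f} m)")
```

Because every block receives exactly one quadrat, coverage stays even across the site, while the random offset within each block avoids any regular spacing that could coincide with the spacing of the nests.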
More sophisticated methods of laying out sampling plots (or allocating sampling periods) may be useful for planning experiments. We could lay out a series of treatments in rows so that we have replicates of each treatment (Figure 1.5a). Although this would be relatively easy to manage (adding fertiliser, dealing with particular cutting regimes, etc.) since each treatment is clustered together, there may be variability within the plot that masks the impacts of the treatments themselves. An alternative is to ensure that each row and each column of the plot has one of each treatment (see Figure 1.5b). It is even better if these treatments can be distributed randomly, whilst still maintaining an even spread across the rows and columns using a Latin square design (see Figure 1.5c). Variations on this theme have been proposed, including ones based on the patterns used in the Sudoku game (Sarkar and Sinhar 2015).
Figure 1.5 Experimental layouts for five different treatments. (a) Clustered design; (b) stratified design; (c) Latin square design. Each treatment is represented by a different symbol.
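A randomised Latin square of the kind described above can be generated as follows. This is a minimal sketch under stated assumptions: the treatment labels A to E and the function name `latin_square` are hypothetical. It starts from a simple cyclic square and shuffles the rows and columns, which always gives a valid Latin square but does not pick from all possible Latin squares with equal probability.

```python
import random


def latin_square(treatments, seed=None):
    """Return an n x n layout in which each treatment occurs once per row and column."""
    rng = random.Random(seed)
    n = len(treatments)
    labels = list(treatments)
    rng.shuffle(labels)  # randomise which treatment takes which position
    # Cyclic square: each row is the label list shifted one place further along
    square = [[labels[(r + c) % n] for c in range(n)] for r in range(n)]
    rng.shuffle(square)  # randomise the order of the rows
    col_order = list(range(n))
    rng.shuffle(col_order)  # randomise the order of the columns
    return [[row[c] for c in col_order] for row in square]


# Five hypothetical treatments, as in the five-treatment layouts of Figure 1.5
layout = latin_square(["A", "B", "C", "D", "E"], seed=42)
for row in layout:
    print(" ".join(row))
```

Shuffling whole rows and whole columns keeps the defining property intact, since each row and each column still contains every treatment exactly once.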
Planning statistical analysis
Although at this stage we will not discuss in detail the ways in which data are analysed, it is important to at least have sight of the likely methods that may be used (see Chapter 5 for a more detailed coverage of statistical analysis). This is because different statistical methods are required to deal with different research questions. For example, if we were interested in trends over time, a regression model would seem appropriate; but if we were motivated to look for differences between treatments, then an ANOVA might be our test of choice. It should be noted that most statistical techniques require data to be gathered in a particular manner and so a lack of care at this stage could result in data being collected that cannot answer the question posed. It could also mean that some statistical methods cannot be used because the data do not meet the minimum number of observations that are needed to obtain meaningful results. This might also apply to those statistical approaches that require balanced designs (i.e. the same number of data points in each factor measured). In this section, we will discuss some of the major types of analyses that you could employ to answer certain commonly asked questions. As always, it is worth looking at the literature to see what types of analysis have been used in similar studies to the one you propose to do. There are several major groups of analysis based on the broad types of approach required.
Describing data
We need a variety of techniques to describe the data that we collect. This might be for data exploration (to check how variable a data set is, what sort of distribution we get, etc.), to understand some aspects of the data (e.g. how diverse communities are), or for communication purposes (to be able to discuss the results, orally and in writing, with other people).
Here, simple plotting of measured variables on frequency histograms (or tables), cross‐tabulation of one (nominal) variable against another, and examining the range of the data (from the minimum to the maximum) may help to check for errors and ensure that we can choose the correct type of test for subsequent analysis. Extracting statistics, such as diversity indices to describe species richness, or evenness to describe how equal the proportions of species are within a community, can be important to assess what sort of community we have. Likewise, the estimation of population size or density might also be important. Similarly, we can use the average value of a variable to describe the magnitude of the majority of the data points (usually in conjunction with some measure of how variable the values are and how many data points were collected). These descriptive statistical techniques are reviewed later and more detail can be found in Wheater and Cook (2000, 2015).
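As an illustration of such descriptive statistics, the sketch below calculates species richness, a diversity index and an evenness measure from a list of counts. The choice of the Shannon index and Pielou's evenness is an assumption made for the example (other indices are equally valid), and the counts are invented.

```python
import math


def shannon_diversity(counts):
    """Shannon index H' = -sum(p * ln p), where p is each species' proportion."""
    total = sum(counts)
    proportions = [c / total for c in counts if c > 0]
    return -sum(p * math.log(p) for p in proportions)


def pielou_evenness(counts):
    """Pielou's J = H' / ln(S), where S is the number of species recorded."""
    richness = sum(1 for c in counts if c > 0)
    if richness < 2:
        return 0.0  # evenness is not meaningful for one species; 0 is returned here
    return shannon_diversity(counts) / math.log(richness)


# Invented counts of individuals for four species in two communities
even_community = [25, 25, 25, 25]
uneven_community = [85, 5, 5, 5]

for name, counts in [("even", even_community), ("uneven", uneven_community)]:
    richness = sum(1 for c in counts if c > 0)
    print(name, richness, round(shannon_diversity(counts), 3),
          round(pielou_evenness(counts), 3))
```

Both communities have the same species richness, but the community dominated by a single species has lower diversity and evenness values, which is why richness alone can be a misleading description.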
Table 1.3 Common statistical tests. Note that in each case, there are possible questions (and analyses) dealing with more than two samples and/or variables – see Chapter 5 for further details.
Example question | Null hypothesis | Type of test | Data required |
Is there a difference between the number of birds found in deciduous woodlands and coniferous woodlands? | There is no significant difference between the number of birds in deciduous and coniferous woodlands. | Difference tests, e.g. a t test or a Mann–Whitney U test (p. 305). | Two variables: one nominal describing the woodland type and one based on either measurements (i.e. actual numbers) or on a ranked scale that describes the number of birds. |
Is there a relationship between the number of birds and the size of the woodland? | There is no significant relationship between the number of birds and the size of the woodland. | Relationship tests, e.g. correlation analysis (p. 307). | Two variables: one (either measured or ranked) that describes the number of birds and one (either measured or ranked) that describes the size of the woodland. |
Is there an association between whether birds are resident or not and whether the woodlands are deciduous or coniferous? | There is no significant association between the frequency of residency and the frequency of woodland type. | Frequency analysis, e.g. a Chi‐square test (p. 312). | Two variables: one nominal describing the residency status of the birds and one nominal describing the woodland type. |
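To show how the last row of Table 1.3 works in practice, the sketch below calculates a chi-square statistic by hand for a cross-tabulation of residency status against woodland type. The observed counts are invented for illustration, the function name `chi_square_statistic` is hypothetical, and no continuity correction is applied. The probability calculation shown is valid only for a 2 × 2 table (one degree of freedom).

```python
import math


def chi_square_statistic(table):
    """Return (chi-square statistic, degrees of freedom) for a table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were not associated
            expected = row_totals[i] * col_totals[j] / grand_total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return chi2, df


# Invented counts. Rows: deciduous, coniferous. Columns: resident, non-resident.
observed = [[30, 10],
            [20, 20]]

chi2, df = chi_square_statistic(observed)
# With one degree of freedom, the upper-tail probability has a closed form
p_value = math.erfc(math.sqrt(chi2 / 2))
print(f"chi-square = {chi2:.2f}, df = {df}, p = {p_value:.3f}")
```

Both variables here are nominal, matching the data requirements given in the table. In practice, a statistical package would normally be used, and the expected counts should be checked to confirm that they are large enough for the test to be reliable.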
Asking questions about data
If we wish to ask specific questions of