Similarly, the upper quartile is calculated from the top half of the data (i.e. the observations with the largest values). The upper half of the data also has eight observations, so the cut‐point for the upper quartile is the observation that splits the eight highest ranked observations (ordered observations 9–16) into two halves again (i.e. four observations in each ‘half’). Thus, the upper quartile lies somewhere between the 12th and 13th ordered observations. Since the quartile lies between two observations, the easiest option is to take the mean of the two. Therefore, the upper quartile is (4 + 5)/2 = 4.5 mm. So the interquartile range (IQR) for the corn size data is from 2.0 to 4.5 mm or, as a single number, 2.5 mm.
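The median-of-halves rule described above can be sketched in a few lines of Python. This is only an illustration: the function name `quartiles_by_halves` is hypothetical, and the 16 values are an assumption, chosen to agree with the summary figures quoted in the text rather than copied from Table 2.5.

```python
from statistics import median

# Hypothetical values (mm), chosen only to agree with the summary figures
# quoted in the text; the actual observations are those in Table 2.5.
corn_sizes_mm = [1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 10]


def quartiles_by_halves(values):
    """Return (lower quartile, upper quartile) as the medians of the
    bottom and top halves of the ordered data."""
    ordered = sorted(values)
    half = len(ordered) // 2
    lower_half = ordered[:half]
    # For an odd number of observations this leaves out the middle value,
    # which is one of several conventions in use.
    upper_half = ordered[len(ordered) - half:]
    return median(lower_half), median(upper_half)


lower_q, upper_q = quartiles_by_halves(corn_sizes_mm)
iqr = upper_q - lower_q
print(f"Lower quartile: {lower_q} mm")
print(f"Upper quartile: {upper_q} mm")
print(f"Interquartile range: {iqr} mm")
```

With these assumed values the sketch gives quartiles of 2.0 and 4.5 mm and an IQR of 2.5 mm, in line with the worked example. Statistical packages often use other interpolation rules for quartiles, so their results can differ slightly for small samples.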
Standard Deviation and Variance
A third measure of the amount of spread or variability in a data set is the standard deviation. It is based on the idea of averaging the distance of each value from the sample mean, $\bar{x}$.
Table 2.5 Calculating the median, quartiles, and interquartile range for the corn size data.
The variance is expressed in squared units, so it is not a convenient measure for describing variability because it is not in the same units as the raw data. The solution is to take the square root of the variance to return to the original units. This gives us the standard deviation (usually abbreviated to SD or s), defined as:

$$s = \sqrt{\frac{\sum_{i=1}^{n} (x_i - \bar{x})^2}{n - 1}}$$
Examining this expression it can be seen that if all the x's were the same, then they would all equal the mean $\bar{x}$, and so $s$ would be zero.
Illustrative Example – Calculation of the Standard Deviation – Foot Corn Size
The calculations to work out the standard deviation for the 16 corn sizes are given in Table 2.6.
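The column-by-column calculation can also be sketched in Python. This is a minimal illustration under stated assumptions: the function name `sample_sd_steps` is hypothetical, the 16 values are assumed (chosen to agree with the mean of 3.625 mm shown in Table 2.6, not copied from the study data), and the sum of squares is divided by n − 1, as in the usual sample formula.

```python
from math import sqrt

# Hypothetical values (mm), chosen only to agree with the mean quoted in
# Table 2.6; they are an assumption, not the study's recorded data.
corn_sizes_mm = [1, 2, 2, 2, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 10]


def sample_sd_steps(values):
    """Return (mean, sum of squared differences, variance, SD),
    following the columns of the table in order."""
    n = len(values)
    mean = sum(values) / n
    # Differences from the mean: some negative, some positive.
    differences = [x - mean for x in values]
    # Squaring removes the negative signs.
    squared_differences = [d ** 2 for d in differences]
    sum_of_squares = sum(squared_differences)
    # Assumption: divide by n - 1 (the usual sample variance), not n.
    variance = sum_of_squares / (n - 1)
    return mean, sum_of_squares, variance, sqrt(variance)


mean, sum_sq, variance, sd = sample_sd_steps(corn_sizes_mm)
print(f"Mean: {mean:.3f} mm")
print(f"Sum of squared differences: {sum_sq:.3f} mm^2")
print(f"Variance: {variance:.3f} mm^2")
print(f"Standard deviation: {sd:.3f} mm")
```

With these assumed values the sum of squared differences is 75.75 mm², marginally below the 75.756 mm² quoted in the text; the gap is consistent with the table rounding each squared difference to three decimal places before summing. The resulting variance is 5.05 mm² and the standard deviation is about 2.25 mm.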
A convenient method of removing the negative signs is to square the deviations, as shown in the next column; the squared deviations are then summed to give 75.756 mm². Note that the majority of this sum (54%) is contributed by one observation, the value of 10 mm from subject 16, which is the observation furthest from the mean. This illustrates that much of the value of an SD is derived from the outlying observations. (The standard deviation is vulnerable to outliers, so if the 10 were replaced by 100 we would get a very different result.) We now need to find the average squared deviation. Common sense would suggest dividing by n, but it turns out that this gives an estimate of the population variance that is too small. This is because we use the estimated mean $\bar{x}$ in the calculation in place of the true population mean.
Table 2.6 Calculation of the variance and standard deviation for 16 subjects from the corn size data.
Subject ($i$) | Corn size (mm) ($x_i$) | Mean ($\bar{x}$) | Differences from mean ($x_i - \bar{x}$) | Square of differences from mean ($(x_i - \bar{x})^2$) | |
---|---|---|---|---|---|
1 | 1 | 3.625 | −2.625 | 6.891 | |
2 | 2 | 3.625 | −1.625 | 2.641 | |
3 | 2 | 3.625 | −1.625 | 2.641 | |
4 | 2 | 3.625 | −1.625 | 2.641 | |
5
|