4.5 The Normal Distribution
This symmetric ‘bell‐shaped’ distribution mentioned above is known as the Normal distribution and is one of the most important distributions in statistics. One example is the distribution of birthweight (in kilogrammes) of 3226 new‐born babies, shown as a histogram in Figure 4.9.
Figure 4.9 Distribution of birthweight in 3226 new‐born babies.
(Source: data from O'Cathain et al. 2002).
The histogram of the sample data is an estimate of the population distribution of birthweight in new‐born babies. This population distribution can be approximated by the superimposed smooth ‘bell‐shaped’ curve or ‘Normal’ distribution shown. We presume that if we were able to look at the entire population of new‐born babies then the distribution of birthweight would have exactly the Normal shape. The Normal distribution has the properties summarised in Figure 4.10.
Figure 4.10 The Normal probability distribution.
The Normal distribution (Figure 4.10) is completely described by two parameters: one, μ, represents the population mean or centre of the distribution and the other, σ, the population standard deviation. The formula for the Normal distribution is given as Eq. (4.3). Populations with small values of the standard deviation σ have a distribution concentrated close to the centre, μ; those with a large standard deviation have a distribution widely spread along the measurement axis (Figure 4.11).
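The effect of σ on the shape of the curve can be checked numerically. The sketch below is illustrative only: it assumes the usual Normal density formula (Eq. (4.3) is not reproduced in this section), and the function name `normal_pdf` and the parameter values are hypothetical choices, not taken from the birthweight data.

```python
import math

def normal_pdf(x, mu, sigma):
    """Height of the Normal curve at x for mean mu and standard deviation sigma."""
    z = (x - mu) / sigma
    return math.exp(-0.5 * z * z) / (sigma * math.sqrt(2.0 * math.pi))

# Hypothetical parameters: same centre, two different spreads.
mu = 0.0
narrow_sigma, wide_sigma = 1.0, 2.0

# Height of each curve at the centre of the distribution.
peak_narrow = normal_pdf(mu, mu, narrow_sigma)
peak_wide = normal_pdf(mu, mu, wide_sigma)

# Height of each curve 3 units above the centre.
tail_narrow = normal_pdf(mu + 3.0, mu, narrow_sigma)
tail_wide = normal_pdf(mu + 3.0, mu, wide_sigma)

print(f"height at centre:   sigma=1 -> {peak_narrow:.4f}, sigma=2 -> {peak_wide:.4f}")
print(f"height 3 units out: sigma=1 -> {tail_narrow:.4f}, sigma=2 -> {tail_wide:.4f}")
```

The curve with the smaller σ is taller at the centre and lower far from it, which is the concentration near μ described above.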
Figure 4.11 Probability distribution functions of the Normal distributions with different means and standard deviations. (a) Effect of changing mean (μ2 > μ1). (b) Effect of changing SD (σ2 > σ1).
There are infinitely many Normal distributions depending on the values of μ and σ. The Standard Normal distribution has a mean of zero and a variance (and standard deviation) of one and a shape as shown in Figure 4.10. The formula is given as Eq. (4.4) in Section 4.9. If the random variable X has a Normal distribution with mean μ and standard deviation σ, then the standardised Normal deviate Z = (X − μ)/σ has a Standard Normal distribution.
The areas under the Standard Normal distribution curve have been tabulated in Table T1 in the appendix, and some examples are given in Table 4.1. In column (i), the table gives, for a positive value of Z (that is, the number of standard deviations above the mean of zero), the area under the Normal curve to the right of this value. The same value is obtained for the area below the same numerical, but negative, value −Z. Column (ii) gives the combination of these two equal areas. Using Figure 4.12 or Table 4.1, we can note that much of the probability (about 68%) lies between −1 and +1 SD, the large majority (about 95%) between −2 and +2 SD, and almost all (99.7%) between −3 and +3 SD.
Table 4.1 Selected probabilities associated with the Normal distribution.
Standardised deviate Z = (X − μ)/σ | Probability of greater deviation: (i) area in one direction | Probability of greater deviation: (ii) area in both directions |
---|---|---|
0 | 0.5000 | 1.0000 |
1.000 | 0.1590 | 0.3170 |
1.645 | 0.0500 | 0.1000 |
1.960 | 0.0250 | 0.0500 |
2.000 | 0.0230 | 0.0460 |
2.576 | 0.0050 | 0.0100 |
3.000 | 0.0013 | 0.0027 |
Figure 4.12 Areas (percentages of total probability) under the standard Normal curve. (a) 31.7% of observations lie outside the mean ± 1 SD. (b) 4.6% of observations lie outside the mean ± 2SD.
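The tail areas for a given standardised deviate can also be computed rather than looked up. The following is a minimal sketch using `statistics.NormalDist` from the Python standard library (Python 3.8 or later); the helper name `tail_areas` is a hypothetical choice for illustration. Its output can be compared with columns (i) and (ii) of Table 4.1.

```python
from statistics import NormalDist

# Standard Normal distribution: mean 0, standard deviation 1.
standard_normal = NormalDist(mu=0.0, sigma=1.0)

def tail_areas(z):
    """Return (area in one direction, area in both directions) for a positive deviate z."""
    one_direction = 1.0 - standard_normal.cdf(z)  # area to the right of z
    both_directions = 2.0 * one_direction         # by symmetry, add the area below -z
    return one_direction, both_directions

# The Z values listed in Table 4.1.
table_rows = {z: tail_areas(z) for z in (0.0, 1.0, 1.645, 1.96, 2.0, 2.576, 3.0)}

for z, (one, both) in table_rows.items():
    print(f"{z:5.3f}  {one:.4f}  {both:.4f}")
```

Doubling the one-direction area relies only on the symmetry of the Normal curve about its mean.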
As can be seen from Table 4.1, a Z value of 1.96 (that is, 1.96 SD away from the mean) leaves 5% of the area in the two tails combined, so almost exactly 95% of the Normal distribution lies between μ − 1.96σ and μ + 1.96σ.
If the multiplier 1.96 is changed to 2.58, approximately 99% of the Normal distribution lies between μ − 2.58σ and μ + 2.58σ.
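The link between a multiplier and the proportion of the distribution it encloses can be checked in both directions. This is a sketch under the same assumption as before (Python's standard-library `statistics.NormalDist`); the function names `central_coverage` and `multiplier_for` are hypothetical.

```python
from statistics import NormalDist

standard_normal = NormalDist()  # mean 0, standard deviation 1

def central_coverage(k):
    """Proportion of a Normal distribution lying within k SDs of the mean."""
    return standard_normal.cdf(k) - standard_normal.cdf(-k)

def multiplier_for(coverage):
    """Number of SDs either side of the mean that encloses the given central proportion."""
    upper_tail = (1.0 - coverage) / 2.0  # half of the excluded area lies in each tail
    return standard_normal.inv_cdf(1.0 - upper_tail)

for k in (1.96, 2.58):
    print(f"within {k} SD of the mean: {central_coverage(k):.4f}")

for coverage in (0.95, 0.99):
    print(f"{coverage:.0%} of the distribution needs {multiplier_for(coverage):.3f} SD")
```

Because the calculation is done on the standardised scale, the same multipliers apply to any Normal distribution, whatever its μ and σ.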
How Do We Use the Normal Distribution?
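The Normal probability distribution can be used to calculate the probability of different values occurring. We could be interested in the probability of being within 1 SD of the mean (or outside it). We can use a Normal distribution table, which tells us the probability of being outside this value. For example, Table 4.1 gives a probability of 0.317 of being more than 1 SD from the mean in either direction, so the probability of being within 1 SD of the mean is 1 − 0.317 = 0.683.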
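Such a probability can also be computed directly for any Normal distribution. In the sketch below, the helper `prob_between` is a hypothetical name, and the values of μ and σ are arbitrary numbers chosen only for illustration; they are not estimates from the birthweight data.

```python
from statistics import NormalDist

def prob_between(lower, upper, mu, sigma):
    """Probability that a Normal(mu, sigma) variable falls between lower and upper."""
    dist = NormalDist(mu, sigma)
    return dist.cdf(upper) - dist.cdf(lower)

# Arbitrary illustrative parameters (not from the source data).
mu, sigma = 10.0, 2.0

within_1sd = prob_between(mu - sigma, mu + sigma, mu, sigma)
outside_1sd = 1.0 - within_1sd

print(f"P(within 1 SD of the mean)  = {within_1sd:.3f}")
print(f"P(outside 1 SD of the mean) = {outside_1sd:.3f}")
```

Changing `mu` or `sigma` leaves both probabilities unchanged, since the limits are defined in units of SD from the mean.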