Using the birthweight data from the O'Cathain et al. (2002) study, let us assume that the birthweight of new‐born babies has a Normal distribution with a mean of 3.4 kg and a standard deviation of 0.6 kg. What is the probability of giving birth to a baby with a birthweight of 4.5 kg or higher?
Since birthweight is assumed to follow a Normal distribution, with mean of 3.4 kg and SD of 0.6 kg, we therefore know that approximately 68% of birthweights will lie between 2.8 and 4.0 kg and about 95% of birthweights will lie between 2.2 and 4.6 kg. Using Figure 4.13 we can see that a birthweight of 4.5 kg is between one and two standard deviations away from the mean.
Figure 4.13 Normal distribution curve for birthweight with a mean of 3.4 kg and SD of 0.6 kg.
First calculate Z, the number of standard deviations that 4.5 kg lies away from the mean of 3.4 kg, that is, Z = (4.5 − 3.4)/0.6 = 1.83.
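The calculation of Z, and of the probability it corresponds to, can be sketched in a few lines of Python. This is a minimal illustration under the assumptions stated in the example (mean 3.4 kg, SD 0.6 kg). The function name `upper_tail_probability` is hypothetical, and the standard library function `math.erfc` stands in for the Standard Normal table that would otherwise be used to look up the area above Z.

```python
import math

def upper_tail_probability(x, mean, sd):
    """Return (z, p): the z-score of x and P(X >= x) for X ~ Normal(mean, sd)."""
    z = (x - mean) / sd
    # For a Standard Normal variable Z, P(Z >= z) = 0.5 * erfc(z / sqrt(2)).
    p = 0.5 * math.erfc(z / math.sqrt(2))
    return z, p

# Values assumed in the text: mean 3.4 kg, SD 0.6 kg, threshold 4.5 kg.
z, p = upper_tail_probability(4.5, mean=3.4, sd=0.6)
print(f"Z = {z:.2f}, P(birthweight >= 4.5 kg) = {p:.3f}")
```

Under these assumptions Z is about 1.83 and the probability is roughly 0.03, so about 3 in every 100 babies would be expected to weigh 4.5 kg or more.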
The Normal distribution also has other uses in statistics and is often used as an approximation to the Binomial and Poisson distributions. Figure 4.4 shows that the Binomial distribution for any particular value of the parameter π approaches the shape of a Normal distribution as the other parameter n increases. The approach to Normality is more rapid for values of π near 0.5 than for values near to 0 or 1. Thus, provided n is large enough, a count may be regarded as approximately Normally distributed with mean nπ and standard deviation √[nπ(1 − π)].
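The quality of this approximation can be checked numerically. The sketch below is an illustration rather than part of the original text: it compares exact Binomial probabilities with the density of a Normal distribution that has mean nπ and standard deviation √[nπ(1 − π)], and reports the largest discrepancy. The function names and the chosen values of n and π are assumptions made for the example.

```python
import math

def binomial_pmf(k, n, pi):
    """Exact Binomial probability of k events in n trials."""
    return math.comb(n, k) * pi**k * (1 - pi) ** (n - k)

def normal_pdf(x, mean, sd):
    """Density of a Normal distribution with the given mean and SD."""
    return math.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * math.sqrt(2 * math.pi))

def max_abs_error(n, pi):
    """Largest gap between the Binomial pmf and the matching Normal density."""
    mean = n * pi
    sd = math.sqrt(n * pi * (1 - pi))
    return max(
        abs(binomial_pmf(k, n, pi) - normal_pdf(k, mean, sd))
        for k in range(n + 1)
    )

# Illustrative values only: the error shrinks as n grows, and is
# smaller for pi = 0.5 than for pi = 0.1 at the same n.
for n in (5, 20, 100):
    print(n, round(max_abs_error(n, 0.5), 4), round(max_abs_error(n, 0.1), 4))
```

The printed discrepancies fall as n increases, and for a given n they are smaller when π is 0.5 than when π is 0.1, which matches the pattern described above.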
4.6 Reference Ranges
Diagnostic tests use patient data to classify individuals as either normal or abnormal. A related statistical problem is the description of the variability in normal individuals, to provide a basis for assessing the test results of other individuals. The most common form of presenting such data is as a range of values or interval that contains the values obtained from the majority of a sample of normal subjects. The reference interval is often referred to as a normal range or reference range. To distinguish between the two uses of the same word, throughout this book we use a lower case n for the normal range and an upper case N for the Normal distribution.
Worked Example – Reference Range – Birthweight
We can use the fact that our sample birthweight data from the O'Cathain et al. (2002) study (see Figure 4.9) appear Normally distributed to calculate a reference range for birthweights. We have already mentioned that about 95% of the observations from a Normal distribution lie within 1.96 SDs either side of the mean. So a reference range obtained from this sample of babies runs from the sample mean − (1.96 × SD) to the sample mean + (1.96 × SD).
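As a minimal sketch of this calculation, the Python function below returns the limits that lie 1.96 SDs either side of the mean. The function name `normal_reference_range` is hypothetical, and the inputs are the mean of 3.4 kg and SD of 0.6 kg assumed in the earlier birthweight example, used here for illustration only rather than the estimates from the study sample.

```python
def normal_reference_range(mean, sd, z=1.96):
    """Return the (lower, upper) limits of mean ± z × SD."""
    return mean - z * sd, mean + z * sd

# Illustrative inputs only: the assumed mean and SD from the earlier
# worked example, not the sample estimates from the study.
lower, upper = normal_reference_range(mean=3.4, sd=0.6)
print(f"95% reference range: {lower:.2f} to {upper:.2f} kg")
```

Replacing the two inputs with the mean and SD calculated from a sample gives the Normal-based reference range for that sample.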
If the baby data were not Normally distributed, the reference range would instead be obtained from the calculated percentiles of the sample, as described in Chapter 2. The 2.5 percentile is the weight below which 2.5% of the babies fall, which here equals 2.19 kg. Correspondingly, the estimated 97.5 percentile suggests that only 2.5% of babies are heavier than 4.43 kg at birth. The percentile‐based reference range for baby birthweight is therefore estimated to be 2.19 to 4.43 kg. This is very close to that obtained when we assume the birthweight has a Normal distribution.
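A percentile‐based range of this kind can be sketched with Python's standard library. The data below are simulated purely for illustration and are not the study sample. The function name, the seed and the simulated sample size are all assumptions. `statistics.quantiles` with `n=40` returns cut points at 2.5% steps, so the first and last cut points are the 2.5 and 97.5 percentiles.

```python
import random
import statistics

def percentile_reference_range(values):
    """Return the 2.5th and 97.5th percentiles of the sample."""
    # n=40 splits the data into 40 equal-probability groups, so the
    # 39 cut points fall at 2.5%, 5%, ..., 97.5%.
    cuts = statistics.quantiles(values, n=40)
    return cuts[0], cuts[-1]

# Hypothetical data: simulated birthweights (kg), not the study sample.
rng = random.Random(1)
weights = [rng.gauss(3.4, 0.6) for _ in range(5000)]

lower, upper = percentile_reference_range(weights)
print(f"Percentile-based range: {lower:.2f} to {upper:.2f} kg")
```

Because this approach makes no assumption about the shape of the distribution, it can be used when the data are skewed, although the estimated limits are less precise in small samples.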
Most reference ranges are based on samples larger than 3500 people. Over many years, and millions of births, the World Health Organization (WHO) has come up with a normal birthweight range for new‐born babies. These ranges represent results that are acceptable in new‐born babies and actually cover the middle 80% of the population distribution, that is, the 10th to the 90th centile. Low birthweight babies are usually defined (by the WHO) as weighing less than 2500 g (the 10th centile) regardless of gestational age, and large birthweight babies are defined as weighing above 4000 g (the 90th centile). Hence the normal birthweight range is around 2.5 to 4.0 kg. For our sample data, the 10th to 90th centile range was similar, at 2.75 to 4.03 kg.
4.7 Other Distributions
There are many other probability distributions used in statistics. In this section we briefly list and describe those that are more commonly used.
t‐distribution
Student's t‐distribution is any member of a family of continuous probability distributions that arises when estimating the mean of a Normally distributed variable (in the population) in situations where the sample size is small and the population standard deviation is unknown. It was developed by William Sealy Gosset under the pseudonym Student.
The t‐distribution plays an important role in a number of widely used statistical analyses, including Student's t‐test for assessing the statistical significance of the difference between two sample means, the construction of confidence intervals for the difference between two population means, and in linear regression analysis.
The t‐distribution is symmetric and bell‐shaped, like the Normal distribution, but has heavier tails, meaning that it is more prone than a Standard Normal distribution to producing values that fall far from its mean (Figure 4.14a). The exact shape of the t‐distribution is determined by the mean and variance plus what are known as the degrees of freedom, df. These are derived from the sample size. As the df increases, the shape of the t‐distribution becomes closer to the Normal distribution; and when the sample size (and degrees of freedom) are greater than 30, the t‐distribution is very similar to the Standard Normal distribution.
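The heavier tails, and the convergence towards the Standard Normal distribution as the degrees of freedom increase, can be seen by comparing the two density functions at a point far from the mean. The sketch below is illustrative only: the function names, the comparison point of 3 and the chosen degrees of freedom are assumptions, and the t density is written out from its standard formula using only the Python standard library.

```python
import math

def t_pdf(x, df):
    """Density of Student's t-distribution with df degrees of freedom."""
    # Work on the log scale so the Gamma functions stay stable for large df.
    log_const = (
        math.lgamma((df + 1) / 2)
        - math.lgamma(df / 2)
        - 0.5 * math.log(df * math.pi)
    )
    return math.exp(log_const - (df + 1) / 2 * math.log1p(x * x / df))

def standard_normal_pdf(x):
    """Density of the Standard Normal distribution."""
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

# Density at 3, a value three units from the centre of each distribution.
for df in (2, 5, 30, 100):
    print(f"t, df = {df:>3}: {t_pdf(3.0, df):.5f}")
print(f"Standard Normal: {standard_normal_pdf(3.0):.5f}")
```

The t density at 3 is largest for small df and shrinks towards the Standard Normal value as df increases, which is the pattern shown in Figure 4.14a.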
Figure 4.14 Examples of probability density/distribution functions for the t‐, chi‐squared, F‐ and Uniform distributions. (a) t‐distribution. (b) chi‐squared distribution. (c) F‐distribution. (d) Uniform distribution.
Chi‐squared Distribution
The chi‐squared distribution (or χ²‐distribution) with n degrees of freedom (Figure 4.14b) is the distribution of a sum of the squares of n independent