This is an example of the DNA sequence found at a microsatellite locus: the 24.1 allele from the fibrinogen alpha chain gene, or FGA locus (GenBank accession no. AY749636; see Figure 2.8). The repeat unit is the 4 bp sequence CTTT, and most alleles differ from one another by some number of complete CTTT repeats. However, some alleles contain partial repeats or interruptions in the repeat pattern, for example, the TTTCT and CTC sequences embedded among the perfect CTTT repeats. In this case, the 24.1 allele is 1 bp longer than the 24 allele.
GCCCCATAGGTTTTGAACTCACAGATTAAACTGTAACCAAAATAAAATTAGGCATTATTTACAAGCTAGTTT CTTT CTTT CTTT TTTCT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTTT CTC CTTC CTTC CTTT CTTC CTTT CTTT TTTGCTGGCA ATTACAGACAAATCAA
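The decimal allele name can be illustrated with a small sketch. It assumes the length of the repeat region in base pairs is already known; where that region starts and ends is fixed by locus-specific nomenclature conventions not covered here, and the function name `allele_designation` is hypothetical.

```python
def allele_designation(repeat_region_bp, repeat_unit_bp=4):
    """Name an STR allele as <complete repeats>.<extra bases>.

    Hypothetical helper: assumes the repeat-region length is already known.
    """
    complete_repeats, extra_bases = divmod(repeat_region_bp, repeat_unit_bp)
    if extra_bases == 0:
        return str(complete_repeats)
    return f"{complete_repeats}.{extra_bases}"


# 24 complete 4 bp repeats span 96 bp; one additional base gives 97 bp.
print(allele_designation(96))
print(allele_designation(97))
```

Here 96 bp is named "24" and 97 bp is named "24.1", matching the statement that the 24.1 allele is 1 bp longer than the 24 allele.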
Table 2.4 Expected numbers of each of the three MN blood group genotypes under the null hypothesis of Hardy–Weinberg equilibrium. Genotype frequencies are based on a sample of 1066 individuals of the Chukchi, a native people of eastern Siberia (Roychoudhury and Nei 1988).
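Frequency of M: p = (2 × 165 + 562)/(2 × 1066) = 0.4184; frequency of N: q = 1 − p = 0.5816; sample size n = 1066

Genotype | Observed | Expected number of genotypes | Observed − Expected |
---|---|---|---|
MM | 165 | p²n = 186.6 | −21.6 |
MN | 562 | 2pqn = 518.8 | 43.2 |
NN | 339 | q²n = 360.6 | −21.6 |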
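The expected counts in Table 2.4 follow from the M allele frequency estimated by gene counting. The sketch below reproduces them from the observed genotype counts alone; the helper name `hw_expected_counts` is hypothetical.

```python
def hw_expected_counts(n_mm, n_mn, n_nn):
    """Return the M allele frequency and Hardy-Weinberg expected genotype counts.

    Hypothetical helper: p is estimated by gene counting, since each MM
    individual carries two M alleles and each MN individual carries one.
    """
    n = n_mm + n_mn + n_nn              # number of individuals
    p = (2 * n_mm + n_mn) / (2 * n)     # frequency of allele M
    q = 1 - p                           # frequency of allele N
    return p, [p * p * n, 2 * p * q * n, q * q * n]


observed = [165, 562, 339]              # MM, MN, NN
p, expected = hw_expected_counts(*observed)
print(f"p = {p:.4f}")
for genotype, o, e in zip(["MM", "MN", "NN"], observed, expected):
    print(f"{genotype}: expected {e:.1f}, observed - expected {o - e:.1f}")
```

This prints p = 0.4184 and the expected counts 186.6, 518.8 and 360.6, which sum to the sample size of 1066.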
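In more general terms, the expected frequency of an event, p, times the number of trials or samples, n, gives the expected number of events, np. To test the hypothesis that p is the frequency of an event in an actual population, we compare np with the observed number of events using the chi-squared (χ2) statistic:

$$\chi^2 = \sum \frac{(\text{observed} - \text{expected})^2}{\text{expected}} \qquad (2.7)$$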
where ∑ (pronounced “sigma”) indicates taking the sum of multiple terms.
The χ2 formula makes intuitive sense. The numerator is the difference between the observed and Hardy–Weinberg expected numbers of individuals. This difference is squared, much as in a variance, because we care only about the magnitude of the difference, not its direction. The denominator divides by the expected number of individuals, which makes the squared difference relative. For example, a squared difference of 4 is small if the expected number is 100 (it is 4%) but relatively large if the expected number is 8 (it is 50%). Summing these relative squared differences gives the total relative squared deviation across all genotypes.
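Equation 2.7 translates directly into a one-line sum. In the sketch below, the expected numbers are recovered from Table 2.4 as observed minus the Observed − Expected column; the function name `chi_square` is illustrative.

```python
def chi_square(observed, expected):
    """Chi-square statistic: sum over classes of (O - E)**2 / E."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [165, 562, 339]                  # MM, MN, NN
obs_minus_exp = [-21.6, 43.2, -21.6]        # last column of Table 2.4
expected = [o - d for o, d in zip(observed, obs_minus_exp)]

# Each term is a squared deviation scaled by its own expected count.
terms = [(o - e) ** 2 / e for o, e in zip(observed, expected)]
print([round(t, 2) for t in terms])
print(round(chi_square(observed, expected), 2))
```

The heterozygote term is the largest contributor even though its expected count is also the largest, because its deviation (43.2) is twice that of either homozygote.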
For the MN blood group data in Table 2.4, the χ2 statistic is

$$\chi^2 = \frac{(165 - 186.6)^2}{186.6} + \frac{(562 - 518.8)^2}{518.8} + \frac{(339 - 360.6)^2}{360.6} = 2.50 + 3.60 + 1.29 = 7.39 \qquad (2.8)$$

We now compare this statistic with values from the χ2 distribution. First, though, we need the degrees of freedom (commonly abbreviated as df), which measure how much independent information went into the χ2 statistic. In general, degrees of freedom are based on the number of categories of data: df = no. of classes compared − no. of parameters estimated − 1, where the final 1 is subtracted because the expected numbers are constrained to sum to the total sample size. Here, df = 3 − 1 − 1 = 1 for three genotypes and one estimated allele frequency (with two alleles, the second allele frequency is fixed once the first has been estimated).
Figure 2.9 shows a χ2 distribution for one degree of freedom. Small deviations between observed and expected are more probable because they leave more of the distribution's area to the right of the χ2 value. As the χ2 value grows, the probability that the difference between observed and expected is due to chance sampling alone decreases (the area under the curve to the right gets smaller). Put another way, the more the observed and expected differ, the less plausible it becomes that the process described by the Hardy–Weinberg null hypothesis is determining genotype frequencies. Using Table 2.5, we see that a χ2 value of 7.39 with 1 df has a probability between 0.01 and 0.001. That is, a deviation from Hardy–Weinberg expectations this large or larger would be observed less than 1% of the time in a population that actually had Hardy–Weinberg genotype frequencies. By convention, we reject chance as the explanation for the differences if the χ2 value has a probability of 0.05 or less; in other words, if chance alone would produce a difference at least this large in five or fewer trials out of 100, we reject the hypothesis that the observed and expected patterns are the same. The critical value above which we reject the null hypothesis for a χ2 test with 1 df is 3.84, written $\chi^2_{0.05,\,1} = 3.84$. In this case, the data show an excess of heterozygotes and a deficit of homozygotes, and the χ2 test allows us to conclude that the genotype frequencies in this population depart from Hardy–Weinberg expectations.
Figure 2.9 A χ2 distribution with one degree of freedom. The χ2 value for the Hardy–Weinberg test with MN blood group genotypes as well as the critical value to reject the null hypothesis are shown. The area under the curve to the right of the arrow indicates the probability of observing that much or more difference between the observed and expected outcomes.
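The table lookup and the area in Figure 2.9 can be checked numerically. For one degree of freedom only, a χ2 variable is the square of a standard normal variable, so its upper-tail area has a closed form through the complementary error function. The sketch below relies on that shortcut (a general df would need a chi-square survival function, for example `scipy.stats.chi2.sf`), and the helper name `chi_square_upper_tail_1df` is hypothetical.

```python
import math


def chi_square_upper_tail_1df(x):
    """P(X >= x) for a chi-square variable X with 1 degree of freedom.

    With 1 df, X = Z**2 for a standard normal Z, so
    P(X >= x) = P(|Z| >= sqrt(x)) = erfc(sqrt(x / 2)).
    """
    return math.erfc(math.sqrt(x / 2))


# Degrees of freedom: classes - estimated parameters - 1
df = 3 - 1 - 1
assert df == 1  # the closed form above is valid only for 1 df

observed = [165, 562, 339]                  # MM, MN, NN
obs_minus_exp = [-21.6, 43.2, -21.6]        # last column of Table 2.4
expected = [o - d for o, d in zip(observed, obs_minus_exp)]
stat = sum(d ** 2 / e for d, e in zip(obs_minus_exp, expected))

p_value = chi_square_upper_tail_1df(stat)   # area to the right of stat
critical_value = 3.84                       # 5% critical value for 1 df
print(f"chi-square = {stat:.2f}, df = {df}, P = {p_value:.4f}")
print("reject Hardy-Weinberg" if stat > critical_value else "fail to reject")
```

The computed tail area falls between 0.001 and 0.01, in line with the Table 2.5 lookup, and the same function returns about 0.05 at the critical value of 3.84.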