CHAPTER 1 Thinking like a population geneticist
All scientific fields possess a body of concepts as well as a specialized vocabulary used to express these concepts precisely. Population genetics is no different, and the entirety of this book is designed to introduce, explain, and demonstrate these concepts and vocabulary. What may be unique about population genetics among the natural sciences is the way that its practitioners approach questions about the biological world. Population genetics is a dialog between predictions based on the principles of Mendelian inheritance and observations from the empirical measurement of genotype and allele frequencies. Idealized predictions stemming from general principles form the basis of hypotheses that can be tested through observation, experiment, and comparison. At the same time, empirical patterns observed within and among populations are evaluated for evidence of their causes via predictive models. This first chapter will explore some of the ways that population genetics approaches and defines problems that are relevant to the topics in all chapters. The chapter is also intended to give some insight into how to approach the study of population genetics.
1.1 Expectations
What Do We Expect to Happen?
Expectations Are the Basis of Understanding Cause and Effect
In our everyday lives, there are many things that we expect to occur or not to occur based on our knowledge of our surroundings and past experience. For example, you probably do not expect to get hit by a meteorite while walking to your next population genetics class. Why not? Meteorites do impact the surface of the Earth and, on occasion, strike something noticeable to people nearby. A few times in the distant past, in fact, large meteors have hit the Earth and left evidence like the Chicxulub impact crater on the Yucatán Peninsula in Mexico. What influences your lack of concern? It is probably a combination of basic knowledge of the principles of physics that apply to meteors as well as your empirical observations of the frequency and location of meteor strikes. Basic physics tells us that a small meteor on a collision course with the Earth is unlikely to hit the surface since most objects burn up from the heat generated as they travel through the Earth's atmosphere. You might also reason that even if the object is big enough to pass through the atmosphere intact (and there are far fewer of these), the Earth is a large place and, just by chance, the impact is unlikely to be even remotely near you. Finally, you have most probably never witnessed a large meteorite impact or even heard of one occurring during your lifetime. You have combined your knowledge of the physical world and your experience to arrive (perhaps unconsciously) at a prediction or an expectation: meteorite strikes are possible but are so infrequent that the risk of being struck while on the way to class is minuscule. In this very same way, you have constructed models of many events and processes in your physical and social world and used the resulting predictions to make comparisons and decisions.
Expectation: The expected value of a random variable, especially the average; a prediction or forecast.
The study of population genetics similarly revolves around constructing and testing expectations for genetic variation in populations of individual organisms. Expectations attempt to predict things like how much genetic variation is present in a population, how genetic variation in a population changes over time, and the pattern of genetic variation that might be left behind by a given biological process that acts over time or through space. Building these expectations involves the use of first principles or the set of very basic rules and assumptions that define how natural systems work at their lowest, most basic levels. A first principle in physics is the force of gravity. In population genetics, first principles are the very basic mechanisms of Mendelian particulate inheritance and processes such as mutation, mating patterns, gene flow, and natural selection that increase, decrease, and shape genetic variation. These foundational rules and processes are used and combined in population genetics with the ultimate goal of building a comprehensive set of predictions that can be applied to any species and any genetic system.
Empirical study in population genetics also plays a central role in constructing and evaluating predictions. In population genetics as in all sciences, empirical evidence is drawn from intentional observations, cleverly constructed comparisons, and experiments. Genetic patterns observed in actual populations are compared with expected patterns to test models constructed using general principles and assumptions. For example, we could construct a mathematical or computer simulation model of random genetic drift (change in allele frequency due to sampling from finite populations) based on abstract principles of sampling from a finite population and biological reproduction. We could then compare the predictions of such a model to the observed change in allele frequency through time in a laboratory population of Drosophila melanogaster (fruit flies). If the change in allele frequency in the fruit fly population matched the change in allele frequency predicted using the model of genetic drift, then we could conclude that the model effectively summarizes the biological sampling processes that take place in fruit fly populations.
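A minimal sketch of the kind of computer simulation model described above, assuming a Wright-Fisher style scheme in which every allele copy in the next generation is drawn at random according to the current allele frequency. The function name `simulate_drift`, the population size, the starting frequency, and the number of generations are illustrative assumptions rather than values taken from the text.

```python
import random

def simulate_drift(p0, n_diploid, generations, rng):
    """Return the allele frequency trajectory under genetic drift alone.

    Assumed idealizations: constant population size, non-overlapping
    generations, random mating, and no mutation, selection, or gene flow.
    """
    n_copies = 2 * n_diploid  # each diploid individual carries two allele copies
    p = p0
    trajectory = [p]
    for _ in range(generations):
        # Each allele copy in the next generation is an A with probability p.
        count_a = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count_a / n_copies
        trajectory.append(p)
    return trajectory

rng = random.Random(1)  # fixed seed so the run can be repeated exactly
trajectory = simulate_drift(p0=0.5, n_diploid=20, generations=30, rng=rng)
print([round(p, 3) for p in trajectory])
```

Running the function with different seeds gives different trajectories from the same starting frequency, which is the sampling effect the model is meant to capture. An observed laboratory trajectory could then be compared against many such simulated runs.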
It is also possible to use well‐tested and accepted model expectations as a basis to hypothesize what processes caused an observed pattern in a biological population. Again, to use a D. melanogaster population as an example, we might ask whether an observed change in allele frequency over some generations in a wild population could be explained by genetic drift. If the observed allele frequency change is within the range of the predicted change in allele frequencies based on a model of genetic drift, then we have identified a possible cause of the observed pattern. Comparing observed genetic patterns in populations with model expectations often requires modifications to existing models or the construction of novel models in order to develop appropriate expectations. For example, a model of genetic drift constructed for D. melanogaster might naturally assume that all individuals in the population are diploid (individuals that possess paired sets of homologous chromosomes). If we wanted to use that same model to predict genetic drift in a population of honeybees, we would have to account for the fact that their males are haploid (individuals that possess single copies of each chromosome) while females are diploid. This change in reproductive biology could be taken into account by altering the assumptions of the model of genetic drift to make predictions appropriate for honeybee populations. Note that without some modifications, a single model of genetic drift would not accurately predict allele frequencies over time in both fruit flies and honeybees since their patterns of reproduction and chromosomal inheritance are different.
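The comparison of an observed allele frequency with the range predicted by a drift model could be sketched as follows. This is one possible approach under stated assumptions, not a prescribed method: it assumes a fully diploid population with Wright-Fisher style random sampling, summarizes the prediction as the central 95% of simulated outcomes, and uses a made-up observed frequency (`observed = 0.62`) purely for illustration. The function names are hypothetical. A honeybee population would need a different model and is not covered here.

```python
import random

def drift_final_frequency(p0, n_copies, generations, rng):
    """Allele frequency after some generations of drift (assumed diploid model)."""
    p = p0
    for _ in range(generations):
        # Draw every allele copy of the next generation at random.
        count_a = sum(1 for _ in range(n_copies) if rng.random() < p)
        p = count_a / n_copies
    return p

def drift_interval(p0, n_copies, generations, replicates, rng, coverage=0.95):
    """Range containing the central `coverage` fraction of simulated outcomes."""
    finals = sorted(
        drift_final_frequency(p0, n_copies, generations, rng)
        for _ in range(replicates)
    )
    tail = (1.0 - coverage) / 2.0
    lower = finals[int(tail * (replicates - 1))]
    upper = finals[int((1.0 - tail) * (replicates - 1))]
    return lower, upper

rng = random.Random(42)
lower, upper = drift_interval(
    p0=0.5, n_copies=200, generations=10, replicates=1000, rng=rng
)

observed = 0.62  # hypothetical observed frequency, not real data
consistent = lower <= observed <= upper
print(f"predicted range under drift: {lower:.3f} to {upper:.3f}")
print("observed value consistent with drift alone:", consistent)
```

An observed value inside the range identifies drift as a possible cause, as the text notes. It does not rule out other processes that could produce the same change.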
Parameters and parameter estimates
While developing the expectations of population genetics in this book, we will most often be working with idealized quantities. For example, allele frequency in a population is a fundamental quantity. For a genetic locus with two alleles, A and a, it is common to say that p equals the frequency of the A allele and q equals the frequency of the a allele. In mathematics, parameter is another term for an idealized quantity like an allele frequency. It is assumed that parameters have an exact value. Put another way, parameters are idealized quantities where the messy, real‐life details of how to measure the quantities they represent are completely ignored.
Empirical population genetics measures quantities such as allele frequencies to give parameter estimates by sampling and then measuring the alleles and genotypes present in actual populations. All experiments, observations, and even simulations in population genetics produce parameter estimates of some sort. There is a subtle notational convention used to indicate an estimate, that is, the hat or ^ character above a variable. Estimates wear hats whereas parameters do not. Using allele frequency as an example, we would say
$\hat{p}$ (pronounced “p hat”) equals the number of A alleles sampled divided by the total number of alleles sampled. Intuitively, we can see from the denominator in the expression for $\hat{p}$ that the allele frequency estimate will depend on the sample we gather to make the estimate.
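A short sketch of how the p-hat estimate depends on the sample, assuming alleles are sampled independently from a population whose true frequency of A is known to the simulation. The value `true_p = 0.7`, the sample sizes, and the function name `estimate_allele_frequency` are illustrative assumptions and do not come from the text.

```python
import random

def estimate_allele_frequency(sampled_alleles, allele="A"):
    """Estimate: number of `allele` copies sampled / total alleles sampled."""
    return sampled_alleles.count(allele) / len(sampled_alleles)

true_p = 0.7  # hypothetical parameter value; unknown in a real study
rng = random.Random(7)

for n_alleles in (10, 100, 1000):
    # Draw a sample of alleles; each one is an A with probability true_p.
    sample = ["A" if rng.random() < true_p else "a" for _ in range(n_alleles)]
    p_hat = estimate_allele_frequency(sample)
    print(n_alleles, round(p_hat, 3))
```

The parameter stays fixed while the estimate changes from sample to sample. Larger samples tend to give estimates closer to the parameter value.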