One of the most common nonprobability sampling designs is quota sampling. Quota sampling involves identifying relevant subgroups in a population and sampling fixed proportions from each subgroup. Schoenberg et al. (2005) used quota sampling to explore differences and similarities in lay knowledge about diabetes among African Americans, Mexican Americans, Great Lakes Indians, and rural Whites. They set a quota of 20 participants from each group. This design balanced a desire for larger subsample sizes against practical constraints on the number of time-intensive, in-depth interviews researchers could complete. Within each group, Schoenberg et al. selected respondents whose age, ethnicity, and residential area increased the likelihood of experiencing diabetes. This strategy reflects the theoretical purpose of sampling cultural knowledge rather than estimating individual attributes.
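The logic of filling fixed quotas can be sketched in a few lines of Python. This is a minimal illustration, not Schoenberg et al.'s procedure: the function name `fill_quotas`, the recruitment stream, and the eligibility rule are all hypothetical placeholders. Candidates are accepted in the order they are reached until each group's quota is full, which is also why quota samples are nonprobability samples: who gets in depends on recruitment order, not on a known selection probability.

```python
from collections import Counter


def fill_quotas(candidates, quotas, eligible):
    """Accept candidates in recruitment order until every group's quota is met.

    candidates: iterable of (person_id, group) pairs, in the order people are reached
    quotas: dict mapping group name -> number of participants wanted
    eligible: function(person_id, group) -> bool for within-group selection criteria
    """
    counts = Counter()
    selected = []
    for person_id, group in candidates:
        # Skip groups outside the design and groups whose quota is already full.
        if counts[group] >= quotas.get(group, 0):
            continue
        if eligible(person_id, group):
            selected.append((person_id, group))
            counts[group] += 1
    return selected, counts


# Hypothetical example: four groups with a quota of 20 each.
groups = ["African American", "Mexican American", "Great Lakes Indian", "rural White"]
quotas = {g: 20 for g in groups}
# Hypothetical recruitment stream and a placeholder eligibility rule.
stream = [(i, groups[i % 4]) for i in range(400)]
selected, counts = fill_quotas(stream, quotas, lambda pid, g: pid % 3 != 0)
print(dict(counts))
```

In a real study the `eligible` function would encode within-group criteria such as the age and residence rules described above.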
Many medical anthropologists also use purposive sampling techniques. The goal of purposive sampling is to represent important dimensions of variation relevant to the aims of research. There are many approaches to purposive sampling, including selection of extreme, typical, unique, or politically important cases; selection to maximize homogeneity or heterogeneity of the sample; and identification of critical cases who have specialized knowledge or experiences relevant to the subject of interest (Onwuegbuzie and Leech 2007). In ethnographic research, the selection of key informants is an example of critical-case sampling.
Medical anthropologists often combine the building blocks in Table 4.1 to construct complex, multistage sampling designs. Baer et al. (2003) used a two-stage sampling design in their study of cross-cultural differences and similarities in the meaning of the folk illness nervios in Mexico, Guatemala, and the United States. In each of four sites, they purposively selected clusters – “a village, neighborhood, or census tract” (p. 319) – based on differences in social class, ethnicity, and other factors. Then they randomly selected roughly 40 households from each site, for a total sample size of 158.
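A two-stage design of this kind can be sketched as a purposive first stage followed by simple random sampling within each chosen cluster. The sketch below is illustrative only: the cluster names and sizes are hypothetical, and the rule that a cluster smaller than the target contributes all of its households is a design choice of this sketch, not something reported by Baer et al.

```python
import random


def two_stage_sample(clusters, chosen, per_cluster, seed=None):
    """Two-stage sample.

    Stage 1 (purposive): keep only the clusters the researcher named in `chosen`.
    Stage 2 (probability): simple random sample of households within each cluster.
    clusters: dict mapping cluster name -> list of household IDs
    """
    rng = random.Random(seed)  # seeded generator so the draw is reproducible
    sample = {}
    for name in chosen:
        households = clusters[name]
        k = min(per_cluster, len(households))  # small clusters contribute everyone
        sample[name] = rng.sample(households, k)
    return sample


# Hypothetical sampling frame: household lists for four candidate clusters.
clusters = {
    "village_a": [f"a{i}" for i in range(200)],
    "neighborhood_b": [f"b{i}" for i in range(150)],
    "tract_c": [f"c{i}" for i in range(35)],
    "tract_d": [f"d{i}" for i in range(90)],
}
sample = two_stage_sample(clusters, ["village_a", "neighborhood_b", "tract_c"], per_cluster=40, seed=1)
print({name: len(hh) for name, hh in sample.items()})
```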
The combination of probability and nonprobability sampling methods in multistage designs can be particularly useful for testing hypotheses about sociocultural influences on health. For example, my colleagues and I used a variant of cluster sampling that combines probability and nonprobability techniques in our work on skin color, social classification, and blood pressure in Puerto Rico (Gravlee et al. 2005). We identified clusters purposively to maximize contrasts in key explanatory variables – social class and skin color – and sampled randomly within clusters. This strategy, like all decisions in research design, involved trade-offs: Identifying clusters using nonprobability methods limited generalizability but probably made it more efficient to detect sociocultural processes related to class and color. Given limited resources, that’s a trade-off we were willing to make.
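One generic way to choose clusters that "maximize contrasts" in an explanatory variable is extreme-group selection: rank candidate clusters on the variable and keep those at both ends. The sketch below assumes a single hypothetical cluster-level score; it is not a description of how Gravlee et al. (2005) identified their clusters, and a study contrasting both class and skin color would need to cross-classify clusters on two dimensions. Random sampling of households would then proceed within the selected clusters.

```python
def contrast_clusters(scores, k):
    """Purposively pick the k lowest- and k highest-scoring clusters.

    scores: dict mapping cluster name -> value of an explanatory variable
            (e.g., a hypothetical neighborhood socioeconomic index)
    Returns (low_group, high_group), each a list of cluster names.
    """
    if k < 1 or 2 * k > len(scores):
        raise ValueError("need k >= 1 and enough clusters for two non-overlapping groups")
    ranked = sorted(scores, key=scores.get)  # cluster names, lowest score first
    return ranked[:k], ranked[-k:]


# Hypothetical scores for five candidate neighborhoods.
scores = {"n1": 0.2, "n2": 0.9, "n3": 0.5, "n4": 0.1, "n5": 0.7}
low, high = contrast_clusters(scores, 2)
print(low, high)
```

Dropping the middle of the distribution sharpens the contrast, which is the efficiency gain described above, at the cost of saying little about the excluded middle.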
Sample Size
Sample size, Bernard (2018, p. 127) notes, is a function of four things: (1) how much variation exists in the population, (2) the number of subgroups you want to compare, (3) how big the differences are between subgroups, and (4) how precise your estimates need to be. These principles apply to studies large and small and are relevant to collecting either attribute or cultural data.
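The familiar formula for estimating a proportion under simple random sampling makes several of these factors concrete: variation enters as p(1 − p), which is largest at p = .5, and precision enters as the margin of error. The sketch below uses the standard normal approximation; it is offered as an illustration of the principles, not as Bernard's own presentation, and the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_for_proportion(p, margin, confidence=0.95):
    """Sample size to estimate a proportion p within +/- margin (normal approximation)."""
    z = NormalDist().inv_cdf(1 - (1 - confidence) / 2)  # about 1.96 for 95% confidence
    return math.ceil(z ** 2 * p * (1 - p) / margin ** 2)


print(n_for_proportion(0.5, 0.05))  # maximum variation, +/- 5 points
print(n_for_proportion(0.1, 0.05))  # less variation, same precision
print(n_for_proportion(0.5, 0.10))  # same variation, half the precision
```

If each of several subgroups needs its own estimate at the same precision, the required total grows roughly in proportion to the number of subgroups.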
Procedures for estimating sample size in confirmatory survey or experimental research are well established (Cohen 1992). In exploratory research, the theoretical and empirical basis for evaluating sample size is less developed (Onwuegbuzie and Leech 2007), although it is an active area of research. Until recently, all we had were rules of thumb. Morse (1994) proposed sample sizes of 5–50 informants, depending on the purpose of the study. Charmaz (2014) suggested 20–30 for a grounded theory study. Creswell (2007, pp. 126–128) recommended one or two participants in narrative research, 3–10 in phenomenological research, 20–30 in grounded theory research, 4–5 cases in case study research, and “numerous artifacts, observations, and interviews…until the workings of the cultural-group are clear” in ethnography (p. 128).
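For contrast with those rules of thumb, here is what a confirmatory calculation looks like: the number of participants per group needed to detect a standardized mean difference d between two groups. This sketch uses the common normal approximation, so it runs slightly below Cohen's (1992) tables, which give 64 per group for a medium effect (d = .5) at α = .05 and power .80; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def n_per_group(d, alpha=0.05, power=0.80):
    """Approximate n per group to detect effect size d with a two-sided, two-sample test.

    Uses the normal approximation n = 2 * ((z_(1 - alpha/2) + z_power) / d) ** 2.
    """
    z = NormalDist().inv_cdf
    n = 2 * ((z(1 - alpha / 2) + z(power)) / d) ** 2
    return math.ceil(n)


for d in (0.2, 0.5, 0.8):  # Cohen's small, medium, and large effects
    print(d, n_per_group(d))
```

Nothing comparable exists for deciding in advance how many open-ended interviews will exhaust a topic, which is the gap the rules of thumb above try to fill.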
As Creswell’s advice for ethnographers suggests, a guiding principle is theoretical saturation: Your sample is large enough when you stop getting new information. But how can you estimate in advance how large that will be? Guest et al. (2006) addressed this issue in a study of HIV prevention in Ghana and Nigeria. They interviewed a total of 60 female sex workers. After every six interviews, they tracked which new themes appeared, how frequently each theme occurred, and how much codebook definitions changed. By these measures, Guest et al. reached saturation after only 12 interviews. This finding is consistent with rule-of-thumb guidelines and with predictions from cultural consensus theory (Romney et al. 1986). But Guest et al. note two important caveats. First, the semistructured interview guide was narrowly focused, and all women answered the same questions. In fully unstructured interviews, it would be harder to reach saturation, because new themes would appear as researchers introduced new questions over time. Second, the sample included only one relatively homogeneous subgroup: young, urban, female sex workers. Because sample size is a function of heterogeneity in the phenomenon of interest, adding other subgroups likely would have increased the sample size necessary to reach saturation.
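The first of Guest et al.'s three measures, which new themes appear in each batch of six interviews, is easy to track once interviews are coded. The sketch below models only that measure (not theme frequency or codebook change), and the theme codes and coding data are hypothetical. Each interview is represented as the set of themes coded in it.

```python
def new_themes_by_batch(interviews, batch_size=6):
    """Return, for each batch of interviews, the themes first seen in that batch.

    interviews: list of sets of theme codes, in the order interviews were coded
    """
    seen = set()
    batches = []
    for start in range(0, len(interviews), batch_size):
        new = set()
        for themes in interviews[start:start + batch_size]:
            new |= themes - seen  # themes never coded in any earlier interview
            seen |= themes
        batches.append(new)
    return batches


# Hypothetical coding of 18 interviews.
interviews = [
    {"a", "b"}, {"b", "c"}, {"a"}, {"c"}, {"a", "b"}, {"b"},  # batch 1
    {"d"}, {"a"}, {"b"}, {"c"}, {"a", "d"}, {"b"},            # batch 2
    {"a"}, {"b"}, {"c"}, {"d"}, {"a"}, {"b"},                 # batch 3
]
for i, new in enumerate(new_themes_by_batch(interviews), start=1):
    print(f"batch {i}: {sorted(new)}")
```

A run of empty batches is one signal that saturation has been reached for the questions being asked.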
Hagaman and Wutich (2016) showed that to be the case. They analyzed semistructured ethnographic interviews from a cross-cultural study on water issues in four research sites: one each in Bolivia, Fiji, New Zealand, and the United States. The question was how many interviews were necessary to reach data saturation for themes and metathemes within and across sites. Hagaman and Wutich operationalized saturation as having identified a theme in three separate interviews. Most themes appeared for the first time quickly – within only 3–5 interviews – but it generally took up to 10 interviews for the second instance of a theme and 16 for the third. These numbers are averages, however: Hagaman and Wutich found that even 30 interviews weren’t enough in the U.S. site and that, to identify metathemes cross-culturally, it took up to 39 interviews. These findings underscore the principle that the more heterogeneous the population, the larger the sample you will need.
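The operational definition used here, a theme counts as saturated once it has been identified in three separate interviews, can be sketched as follows. The data and function name are hypothetical, and Hagaman and Wutich's actual coding procedure may differ in its details. Because each interview is a set of themes, a theme mentioned several times within one interview still counts only once toward the threshold.

```python
def saturation_points(interviews, threshold=3):
    """Find when each theme reaches `threshold` separate interviews.

    interviews: list of sets of theme codes, in interview order
    Returns (reached, unsaturated): reached maps theme -> 1-based interview number
    at which it hit the threshold; unsaturated is the set of themes that never did.
    """
    counts = {}
    reached = {}
    for i, themes in enumerate(interviews, start=1):
        for theme in themes:
            counts[theme] = counts.get(theme, 0) + 1
            if counts[theme] == threshold:
                reached[theme] = i
    unsaturated = set(counts) - set(reached)
    return reached, unsaturated


# Hypothetical coding of seven interviews.
interviews = [{"a", "b"}, {"a"}, {"a", "c"}, {"b"}, {"b", "c"}, {"c"}, {"d"}]
reached, unsaturated = saturation_points(interviews)
print(reached, unsaturated)
```

Running the same function separately on each site's interviews, and then on the pooled set, mirrors the within-site versus cross-site comparison described above.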
Cultural consensus theory (Romney et al. 1986) formalizes the relationship between heterogeneity and sample size. The theory draws on a cognitive view of culture as shared and socially transmitted knowledge; it then provides a formal model for measuring the extent to which knowledge is shared or contested. The implication for sample size is that the higher the sharing, the smaller the sample necessary to detect consensual beliefs. If we wanted to know how Americans carve up the calendar into days of the week, a handful of informants would do, because this cultural knowledge is widely shared. But if we wanted to understand how days of the week relate to more complex domains – eating, drinking, family life, or sources of stress – we would need a larger sample to capture the variation. Consensus theory formalizes this intuition, and Weller (2007) provides tables for calculating necessary sample sizes to achieve desired levels of accuracy and validity, given varying levels of agreement among informants. To use these tables in designing a study, you’d have to make some assumptions about how much agreement you expect to find.
Baer et al. (2003) used this approach to calculate subsample sizes in their cross-cultural study of nervios. They anticipated a moderate level of consensus (.50) and used stringent criteria for accuracy (.95) and level of confidence (.999). Using these conservative assumptions, the tables in Weller (2007) show that at least 29 informants were necessary in each research site. Baer et al. went a bit beyond the minimum and set subsample sizes at 40 per site “to be sure that we had sufficient individuals for comparative purposes within samples” (p. 323).
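The intuition that more sharing means fewer informants can also be illustrated with the Spearman-Brown prophecy formula, a standard psychometric result for the reliability of an average of n raters. This is a related approximation, not the calculation behind Weller's tables: it ignores the confidence level, so it will not reproduce those tables exactly. Two assumptions are built in and flagged in the comments: that mean inter-informant correlation is roughly the square of average competence, and that the validity of the aggregate is the square root of its reliability. The function names are hypothetical.

```python
import math


def aggregate_validity(n, r):
    """Validity of the average of n informants with mean inter-informant correlation r.

    Spearman-Brown reliability n*r / (1 + (n - 1)*r); validity is ASSUMED to be
    its square root (the aggregate's correlation with the true answers).
    """
    reliability = n * r / (1 + (n - 1) * r)
    return math.sqrt(reliability)


def informants_needed(r, target_validity):
    """Smallest n whose aggregate validity meets target_validity."""
    if not 0 < r < 1 or not 0 < target_validity < 1:
        raise ValueError("r and target_validity must lie strictly between 0 and 1")
    n = 1
    while aggregate_validity(n, r) < target_validity:
        n += 1
    return n


# ASSUMPTION: average competence of .50 implies mean correlation of about .50 * .50 = .25.
print(informants_needed(0.25, 0.95))  # prints 28
```

Under this simplification, moderate consensus and a validity target of .95 call for 28 informants, in the same range as the 29 from Weller's tables; raising the assumed agreement to .50 cuts the requirement to 10, which is the heterogeneity principle in numerical form.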
Table 4.2 shows how Christopher McCarty and I incorporated consensus theory and emerging evidence