Working with latent variables requires an entirely different approach to measurement (Bollen 1989). For single and multi-variable operationalizations, researchers assume that sociological concepts can be directly observed. When we investigate variables such as years of age or schooling, we assume that people accurately know and report that information (with some random error being acceptable). Suppose, however, we’re interested in the amount of depression in the general population. Do people always know if they’re depressed? Does everyone have the same definition of depression? This is a trickier situation. Fortunately, the Center for Epidemiologic Studies Depression (CES-D) scale, which originally included 20 items (Radloff 1977), is a well-established measure of depression. There are a number of statistical approaches to combining indicators of depression like the CES-D items into a single scale (e.g. Payton 2009; Perreira et al. 2005), but they all assume that we can create a latent variable for depression that avoids the measurement error we would encounter with a single question asking directly about depression.
Measurement concerns might seem pedantic at times, but taking measurement issues seriously can generate important research. For example, Montez et al. (2012) evaluated models of US mortality with 13 different measures of education derived from Hummer and Lariscy (2011). The preferred functional form of education included a linear decline in mortality risk from 0 to 11 years of schooling, a notably larger reduction in mortality risk upon high school completion, and a steep linear decline thereafter. Although their primary aims were methodological in nature, searching for the optimal form of education revealed an interesting theoretical insight: educational attainment benefits survival through both human capital accumulation and socially meaningful credentials (Collins 1979).
Research on health lifestyles offers another example of the importance of measurement (Cockerham 2005; Cockerham et al. 2020). Health lifestyles refer to meaningful combinations of health behaviors that people adopt. We can imagine a lifestyle involving regular exercise, a nutritious diet, and abstention from smoking and heavy drinking. Alternatively, we can imagine a largely sedentary lifestyle with limited concern for a nutritious diet. One could continue with several other possible clusters of health behaviors that coalesce into recognizable health lifestyles. To investigate these potential lifestyles and their relationship to adult health, Cockerham et al. (2020) identified latent classes of different health lifestyles, and their results revealed a unique pattern of associations between health lifestyle and health status due to diagnosed conditions that affected lifestyles in middle adulthood.
Answering Descriptive Research Questions
Many questions of interest to medical sociologists are descriptive in nature. Their answers provide descriptions of circumstances without necessarily providing a sense of how they might be altered. For instance, we might be interested in knowing the extent to which immigrants exhibit better health than non-immigrants, trends in life expectancy among members of different communities over time, or differences in health care expenditures across countries. Answers to such questions not only provide valuable information for medical sociologists but are also critical for policymakers and other constituencies that rely on data to inform decision-making.
In some cases descriptive research questions can be addressed in a straightforward manner through calculating simple statistics such as means and proportions. In other cases, finding answers to descriptive research questions requires the use of a statistical model that takes into account multiple factors. For instance, in estimating trends in death rates due to cancer for different communities over time we might want to adjust for different age structures across the communities. A statistical model provides the means to make such an adjustment.
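One common way to make the age adjustment described above is direct standardization, in which each community's age-specific death rates are weighted by a single shared age distribution. The sketch below is only an illustration of that idea: the three age groups, the counts, and the standard weights are all invented, and the function names are hypothetical. The two communities are constructed to have identical age-specific rates but different age structures.

```python
# Hypothetical illustration of direct age standardization.
# All counts and weights below are invented for the example, not real data.

def crude_rate(deaths, population):
    """Deaths per 100,000 people, ignoring age structure."""
    return 100_000 * sum(deaths) / sum(population)

def age_standardized_rate(deaths, population, standard_weights):
    """Weight each age-specific rate by a shared standard age distribution."""
    if abs(sum(standard_weights) - 1.0) > 1e-9:
        raise ValueError("standard weights must sum to 1")
    rates = [d / p for d, p in zip(deaths, population)]
    return 100_000 * sum(w * r for w, r in zip(standard_weights, rates))

# Assumed age groups: under 40, 40 to 64, 65 and over.
standard_weights = [0.5, 0.3, 0.2]

# Community A is older and community B is younger,
# but their age-specific death rates are the same.
deaths_a, pop_a = [10, 120, 700], [10_000, 30_000, 70_000]
deaths_b, pop_b = [70, 120, 100], [70_000, 30_000, 10_000]

crude_a = crude_rate(deaths_a, pop_a)
crude_b = crude_rate(deaths_b, pop_b)
adjusted_a = age_standardized_rate(deaths_a, pop_a, standard_weights)
adjusted_b = age_standardized_rate(deaths_b, pop_b, standard_weights)

print(f"Crude rates:    A = {crude_a:.1f}, B = {crude_b:.1f}")
print(f"Adjusted rates: A = {adjusted_a:.1f}, B = {adjusted_b:.1f}")
```

In this constructed example the crude rates differ sharply only because community A is older; once both communities are weighted to the same age distribution, the adjusted rates coincide. A regression model with age as a predictor pursues the same goal in a more flexible way.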
Various forms of regression models are the primary statistical models used in quantitative medical sociology research (Gelman and Hill 2007; Kalbfleisch and Prentice 2011; Long 1997). Regression models take the form of regressing an outcome or dependent variable on a set of predictors or independent variables, one of which is often considered the focal independent variable. The estimates from fitting a regression model provide a means for assessing the relationship between a focal independent variable and an outcome (e.g. years of schooling and self-rated health) while adjusting or controlling for one or more other independent variables. The outcome variable may be continuous (e.g. health care expenditures in constant dollars), categorical (e.g. an indicator for smoking in the past month), or even unobserved (e.g. the risk of dying in a given year). The independent variables may take any level of measurement.
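To make the idea of adjusting for another independent variable concrete, the following sketch fits a linear regression twice to simulated data: once with only the focal independent variable (years of schooling) and once also controlling for age. Everything here is an assumption made for illustration: the data are simulated, self-rated health is treated as a continuous score, the coefficients used to generate the data are arbitrary, and the helper `ols` is a hypothetical name. The fit uses ordinary least squares via NumPy.

```python
import numpy as np

# Simulated, hypothetical data: not drawn from any real survey.
rng = np.random.default_rng(0)
n = 5_000

age = rng.uniform(25, 75, size=n)
# Assumption: older respondents completed somewhat fewer years of schooling.
schooling = 16 - 0.05 * age + rng.normal(0, 2, size=n)
# Assumption: health improves with schooling and declines with age.
health = 3.0 + 0.10 * schooling - 0.02 * age + rng.normal(0, 0.5, size=n)

def ols(outcome, *predictors):
    """Return least-squares coefficients: intercept first, then one per predictor."""
    design = np.column_stack([np.ones(len(outcome)), *predictors])
    coefficients, *_ = np.linalg.lstsq(design, outcome, rcond=None)
    return coefficients

unadjusted = ols(health, schooling)      # schooling only
adjusted = ols(health, schooling, age)   # schooling, controlling for age

print(f"Schooling coefficient, unadjusted: {unadjusted[1]:.3f}")
print(f"Schooling coefficient, adjusted:   {adjusted[1]:.3f}")
```

Because age is related to both schooling and health in the simulated data, the unadjusted coefficient overstates the association built into the simulation, while the adjusted coefficient lands close to the value of 0.10 used to generate the data. Categorical or time-to-event outcomes would call for other members of the regression family, such as logistic or hazard models.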
Answering Causal Research Questions
Despite the value of answers to descriptive research questions, we often want to talk about the causes of the statistical associations we observe. In the physical sciences, causal research questions are addressed via experiments. As noted above, experiments are also used in medical sociology research, but more frequently we rely on observational data. Because observational data lack the statistical properties of experimental data, conclusions regarding causal processes are often tentative. However, over the past 20 to 30 years, methodologists have developed a framework for causal analysis with observational data referred to as the counterfactual framework (Morgan and Winship 2015; Pearl 2009).
The counterfactual framework has two components, the potential outcomes model and causal graphs. The potential outcomes model provides a precise statement of what we mean when we say that a focal independent variable has a causal effect on an outcome (Rubin 1974). In particular, we mean the difference between the outcome observed for a given case and the outcome that would have been observed had that case experienced a different exposure on the focal independent variable. Let us consider smoking during pregnancy as our focal independent variable and the child’s birthweight as our outcome. Then the causal effect of smoking would be defined as the difference between the birthweight of a child whose mother smoked during pregnancy and the birthweight that same child would have had if the mother had not smoked. It’s impossible, however, to observe both of these states simultaneously (i.e. the birthweight of a child from a mother who smoked, and the birthweight of the same child had the same mother not smoked). Instead, we do the best we can to construct a comparison group to estimate what birthweight would have been observed had the mother not smoked.
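A small simulation can illustrate why the choice of comparison group matters. In a simulation, unlike in real data, we can generate both potential outcomes for every mother and then hide the one that was not realized. The sketch below is hypothetical throughout: the birthweights, the constant effect of 200 grams, and the single confounder labeled `disadvantage` are assumptions chosen for illustration, not estimates from any study.

```python
import numpy as np

# Simulated, hypothetical data: every number here is an assumption.
rng = np.random.default_rng(1)
n = 100_000

# Assumed confounder: a standardized measure of socioeconomic disadvantage.
disadvantage = rng.normal(0, 1, size=n)

# Potential outcomes (birthweight in grams) for every mother.
y_if_not_smoking = 3400 - 100 * disadvantage + rng.normal(0, 400, size=n)
y_if_smoking = y_if_not_smoking - 200  # assumed constant causal effect

# Assumption: disadvantaged mothers are more likely to smoke.
probability_smoking = 1 / (1 + np.exp(-(-1.5 + 1.0 * disadvantage)))
smoked = rng.random(n) < probability_smoking

# Only one potential outcome is ever observed for each mother.
observed = np.where(smoked, y_if_smoking, y_if_not_smoking)

true_effect = np.mean(y_if_smoking - y_if_not_smoking)
naive_difference = observed[smoked].mean() - observed[~smoked].mean()

print(f"True average effect:         {true_effect:.1f} g")
print(f"Naive smoker vs. non-smoker: {naive_difference:.1f} g")
```

In this setup the naive comparison exaggerates the harm of smoking, because mothers who smoked would have had lighter babies on average even had they not smoked. Non-smokers as a group are therefore a poor stand-in for the missing potential outcome unless the comparison accounts for the confounder.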
Figure 3.1 Causal graph of the relationships between adult children’s education (ACE), a vector of mediators (M), and mortality (MOR). X represents a vector of pretreatment confounders (e.g., respondent education), and U1 and U2 represent potential unobserved pretreatment and posttreatment confounders, respectively.
The second component, causal graphs, provides a systematic approach to determining what strategies for causal analysis are available with a given set of data and which variables need to be included in the analysis (Pearl 2009). Causal graphs depict the relevant variables for an analysis as nodes and the causal relationships between them as directed edges from the predictor to the outcome. In addition, bidirected edges can be used to indicate that two variables share a common cause. The relationships indicated in a causal graph are non-parametric and include all possible interactions. Figure 3.1 illustrates an example of a causal graph of mortality. The graph indicates key confounders of the effect of education on mortality (X and U1) as well as confounders of the effect of mediators on mortality (U2). Confounders are variables that affect both the focal independent variable and the outcome. If confounders are not