Data Collection
Research in medical sociology uses a variety of data collection techniques. What follows is a brief outline of the techniques contemporary medical sociologists use to obtain data. The most common tool for data collection is the questionnaire. Questionnaires are used in many types of research, ranging from in-person interviews to simple online surveys, and there’s an entire science devoted to designing them in ways that improve the chances that respondents will actually answer the questions (Couper 2017; Schaeffer and Presser 2003).
Non-experimental surveys that collect information with questionnaires are the most common source of data in sociology. Surveys like the General Social Survey (GSS, https://gss.norc.org) provide a wide array of information about the US population (currently around 330 million), including health information, by asking a carefully selected sample of only several thousand people. The GSS is also an example of secondary data, which are data collected by other researchers. Several prominent secondary data sources appear frequently in medical sociology research, including the National Longitudinal Study of Adolescent to Adult Health (Add Health, https://www.cpc.unc.edu/projects/addhealth), the National Longitudinal Surveys (NLS, https://www.bls.gov/nls), and the Behavioral Risk Factor Surveillance System (BRFSS, https://www.cdc.gov/brfss). These examples, like many other popular secondary datasets, are publicly available. Secondary data relieve researchers of the burden of data collection, although researchers cannot always operationalize their theories exactly as they would like, which increases the potential for bias. Most social scientists don’t have the resources required to develop and implement national surveys, so publicly available data from large surveys are an essential resource for medical sociology.
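Why can a few thousand respondents describe a population of hundreds of millions? When the sample is a small fraction of the population, the precision of an estimate depends mainly on the sample size, not on the population size. The Python sketch below is a minimal illustration under stated assumptions. It assumes simple random sampling, which the GSS does not actually use (it relies on a more complex multistage probability design, so its real margins of error differ). It also assumes a hypothetical population in which 30% of people have some health characteristic. The function names are illustrative.

```python
import math
import random


def margin_of_error(p_hat: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a sample proportion,
    assuming simple random sampling from a large population."""
    return z * math.sqrt(p_hat * (1 - p_hat) / n)


def simulate_survey(true_share: float, n: int, seed: int = 42) -> float:
    """Draw n respondents independently from a (hypothetical) very large
    population in which `true_share` of people have the characteristic,
    and return the share observed in the sample."""
    rng = random.Random(seed)
    hits = sum(1 for _ in range(n) if rng.random() < true_share)
    return hits / n


if __name__ == "__main__":
    n = 2500  # a sample of a few thousand people (illustrative size)
    p_hat = simulate_survey(true_share=0.30, n=n)
    moe = margin_of_error(p_hat, n)
    print(f"estimate = {p_hat:.3f} +/- {moe:.3f}")
```

The population size never appears in the margin-of-error formula. Quadrupling the sample size only halves the margin of error, which is one reason national surveys settle on samples of a few thousand.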
Experimental and quasi-experimental methods of data collection are less common in sociology but still provide useful information for medical sociologists. Experimental methods have been used to good effect in, for instance, vignette studies in which researchers randomly manipulate characteristics of patients and study the conditions under which physicians treat them differently (Stepanikova 2012). Quasi-experimental methods exploit real-world events and policy changes that affect some groups of people more than others in an “as if” random way. For instance, studies of the effect of education on health have exploited the uneven roll-out of changes in compulsory schooling laws, and studies of the effect of trauma on mental health have exploited quasi-random variation in which communities were exposed to a natural disaster, such as a tsunami (Courtin et al. 2019; Frankenberg et al. 2012).
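The logic of a vignette experiment fits in a few lines of code. Each physician is randomly assigned one version of a vignette, so a simple difference in mean responses between versions estimates the effect of the manipulated patient characteristic. The Python sketch below is a minimal illustration, not the design used by Stepanikova (2012). Several details are hypothetical:

- the two-version design;
- the condition labels "A" and "B";
- the simulated treatment-recommendation score, which has a made-up effect of -1.0 built in.

```python
import random
import statistics


def assign_conditions(respondent_ids, conditions, seed=7):
    """Randomly assign each respondent to one vignette condition, keeping
    group sizes as equal as possible (shuffle a balanced list of labels)."""
    rng = random.Random(seed)
    labels = [conditions[i % len(conditions)] for i in range(len(respondent_ids))]
    rng.shuffle(labels)
    return dict(zip(respondent_ids, labels))


def simulate_response(condition, rng):
    """Hypothetical outcome model: a physician's score for how strongly they
    would recommend treatment; version "B" lowers the mean by 1.0 (made up)."""
    base = 7.0 if condition == "A" else 6.0
    return rng.gauss(base, 1.5)


def difference_in_means(assignments, responses, treated, control):
    """Mean response in the treated condition minus mean in the control condition."""
    treated_scores = [responses[r] for r, c in assignments.items() if c == treated]
    control_scores = [responses[r] for r, c in assignments.items() if c == control]
    return statistics.mean(treated_scores) - statistics.mean(control_scores)


if __name__ == "__main__":
    physicians = [f"md{i:03d}" for i in range(400)]
    # "A" and "B" stand for two versions of one vignette that differ only in
    # a single patient characteristic (hypothetical labels).
    assignments = assign_conditions(physicians, ["A", "B"])
    rng = random.Random(11)
    responses = {p: simulate_response(c, rng) for p, c in assignments.items()}
    effect = difference_in_means(assignments, responses, treated="B", control="A")
    print(f"estimated effect of version B vs. A: {effect:.2f}")
```

Random assignment is what licenses reading the difference in means as a causal effect: on average, physicians in the two groups differ only in the vignette version they saw. Quasi-experimental designs try to approximate this condition with events the researcher did not control.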
Ethnography, participant observation, and in-depth interviews are common techniques for generating the rich data necessary to understand social phenomena. Researchers who enter the field to collect qualitative data gain access to a more expansive social world than researchers who rely solely on questionnaires. Qualitative data collection allows researchers to observe the serendipitous social action that so often goes unobserved under other data collection strategies, and it is especially good at incorporating the context of social life. Because qualitative researchers are themselves the data collection instruments, these approaches require researchers to reflect constantly on how their own social positions affect what they observe (Fine and Hancock 2017).
“Found” data are having a growing influence on social science research. A creative researcher can build unique datasets from administrative records, historical documents, social networks, and other publicly available data, which are increasingly located online. For example, Cotti et al. (2015) merged the BRFSS with publicly available data on the Dow Jones Industrial Average, a stock index, and found that large drops in the stock market were associated with poorer mental health and higher levels of smoking, binge drinking, and fatal car accidents involving alcohol. Government websites in particular can hold treasure troves of data for researchers willing to wade into the deep waters of internet data collection and organization. Found data offer an interesting new approach, but inefficiency lurks: data collection can go on ad infinitum if the researcher hasn’t adequately conceptualized the key components of the research question.
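Linking a survey to found data usually comes down to a join on a shared key, such as a date or a geographic identifier. The Python sketch below shows one way such a merge could work. It is not a reconstruction of how Cotti et al. (2015) built their dataset. It assumes both sources carry a calendar date, and the following are all hypothetical:

- the respondent records and their field names;
- the made-up index values;
- the exposure measure, defined as the index's percent change on the most recent trading day.

```python
from datetime import date, timedelta

# Hypothetical survey records: interview date plus a self-reported outcome
# (days of poor mental health in the past 30 days). Values are made up.
SURVEY = [
    {"id": 1, "interview_date": date(2024, 3, 4), "poor_mh_days": 4},
    {"id": 2, "interview_date": date(2024, 3, 5), "poor_mh_days": 0},
    {"id": 3, "interview_date": date(2024, 3, 8), "poor_mh_days": 10},  # no index value that day
]

# Hypothetical daily closing values of a stock index (trading days only).
INDEX_CLOSE = {
    date(2024, 3, 1): 200.0,
    date(2024, 3, 4): 190.0,
    date(2024, 3, 5): 171.0,
    date(2024, 3, 6): 180.0,
}


def latest_trading_day(d, closes, max_lookback=7):
    """Return the most recent date on or before d that has a closing value."""
    for offset in range(max_lookback + 1):
        candidate = d - timedelta(days=offset)
        if candidate in closes:
            return candidate
    return None


def pct_change_on(day, closes):
    """Percent change in the close on `day` relative to the previous trading day."""
    earlier = sorted(k for k in closes if k < day)
    if not earlier:
        return None
    prev = earlier[-1]
    return 100.0 * (closes[day] - closes[prev]) / closes[prev]


def merge_survey_with_index(records, closes):
    """Attach the index exposure to each survey record by date (a left join)."""
    merged = []
    for rec in records:
        day = latest_trading_day(rec["interview_date"], closes)
        change = pct_change_on(day, closes) if day is not None else None
        merged.append({**rec, "index_day": day, "index_pct_change": change})
    return merged


if __name__ == "__main__":
    for row in merge_survey_with_index(SURVEY, INDEX_CLOSE):
        print(row)
```

Falling back to the most recent trading day is one defensible way to handle interviews on days with no index value. A real study would need to justify that choice along with the exposure window, which echoes the point above about conceptualizing a research question before collecting found data.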
METHODS FOR ANALYZING DATA
Once a researcher has data in hand, they have a wide variety of research methods at their disposal for exploring the data to answer their research questions. In the sections that follow, we review the principal methods used to analyze data in ways that have helped advance medical sociology.
Quantitative Methods
If a researcher has numerical data, then they’ll use quantitative methods for the analysis. A numerical array of data typically consists of columns representing study variables and rows containing the observations of those variables for each case (participant). For example, if we collected information about years of schooling from a sample of 100 adults, then each of the 100 rows would contain the education of a single participant. Variables are chosen through a process called operationalization (or simply measurement), in which researchers choose how to empirically represent concepts from their research questions. Because of the diversity of perspectives in medical sociology, peer-reviewed journals typically expect researchers to provide rigorous justifications for their measurement choices. This makes careful operationalization a critical feature of quantitative research in medical sociology (Aneshensel 2002; Link 2002; McAlpine et al. 2018).
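The rows-and-columns layout described above can be illustrated with the 100-adult schooling example. The Python sketch below uses only the standard library. The simulated years of schooling and the extra `age` column are hypothetical, and the helper `column` is an illustrative name, not part of any particular package.

```python
import random
import statistics

# Each row is one case (participant); each key is one study variable (column).
# Values are simulated for illustration only.
rng = random.Random(2024)
rows = [
    {"id": i, "age": rng.randint(25, 64), "years_schooling": rng.randint(8, 20)}
    for i in range(1, 101)
]


def column(data, name):
    """Extract one variable (a column) across all cases."""
    return [row[name] for row in data]


if __name__ == "__main__":
    print(len(rows))  # 100 rows, one per participant
    print(rows[0])    # all observations for the first participant
    print(statistics.mean(column(rows, "years_schooling")))
```

Reading across a row gives everything measured for one person; reading down a column gives one variable for everyone, which is the form most statistical procedures operate on.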
Measurement
Operationalization involves two distinct stages: identifying (or designing) a valid measure of a concept, and then determining exactly how to incorporate that measure into a statistical model. The first stage demands a good deal of perseverance and creativity from researchers. Operationalizing complex sociological concepts generally requires a detailed review of prior studies in light of one’s specific research question. The second stage is more circumscribed and often depends on the availability of measures and on the degree to which measurement models will help inform one’s research question.
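The second stage can be made concrete with a small example. The same measure of education can enter a statistical model as a single continuous variable or as a set of category indicators. Each choice implies a different assumption about how education relates to the outcome. The Python sketch below is minimal. The cut points (12 and 16 years) and the use of "less than high school" as the reference category are hypothetical and would need to be justified for a real research question. The function names are illustrative.

```python
CATEGORIES = ["less_than_hs", "hs_or_some_college", "ba_or_more"]


def education_category(years: int) -> str:
    """Collapse years of schooling into categories (hypothetical cut points)."""
    if years < 12:
        return "less_than_hs"
    if years < 16:
        return "hs_or_some_college"
    return "ba_or_more"


def continuous_code(years: int) -> dict:
    """Option 1: enter education as one numeric variable; a linear model then
    assumes each additional year has the same association with the outcome."""
    return {"edu_years": years}


def dummy_code(years: int, reference: str = "less_than_hs") -> dict:
    """Option 2: one 0/1 indicator per non-reference category. The reference
    category is omitted (all indicators 0) so the indicators are not perfectly
    collinear with a model's intercept; each category gets its own association."""
    cat = education_category(years)
    return {f"edu_{c}": int(cat == c) for c in CATEGORIES if c != reference}


if __name__ == "__main__":
    for years in (10, 12, 14, 16, 18):
        print(years, continuous_code(years), dummy_code(years))
```

Neither coding is correct in general. The continuous version is more parsimonious, while the indicator version can capture credential effects at particular thresholds, which is the kind of trade-off researchers are expected to justify.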
Researchers often take one of the following approaches to incorporating measures into statistical models: (1) single variable, (2) multiple variables, or (3) latent variable (Bollen et al. 2001). In some cases, the second stage of operationalization is relatively straightforward. For example, if we are interested in estimating the extent to which age is related to alcohol use in a population, then we would look for data that include respondents’ age and some assessment of alcohol consumption. This would be a single variable approach because we