Confederates: People who appear to be participants in a research project but are actually assisting the experimenter.
Pilot test: Using a small number of participants to test aspects of the research and receive feedback on measures and/or manipulations.
Manipulation check: Questions posed in research, typically at the end, to assess whether the participants were aware of the level of the independent variable to which they were assigned.
Pilot Tests and Manipulation Checks
If you are doing an experiment and are manipulating one or more variables, you are naturally concerned with whether the manipulation worked, that is, whether it produced its intended effect. To avoid being disappointed when the null hypothesis cannot be rejected, it is wise first to make sure that the manipulation works (through a pilot test). To assess the effectiveness of the manipulation in the study itself, you use a manipulation check at the end of the study. In a pilot test, you run your study, often with just a few fellow students, to test your conditions. For example, if you were using a problem-solving task and your review of the literature had not specified the difficulty or number of items to ask people to solve, you might need to try out your protocol. You would want to make sure that you weren’t making the task too easy (and hence producing a ceiling effect, where essentially everyone answers all of the items correctly and the scores cluster at the top of the distribution) or, conversely, making it too hard (and hence producing a floor effect, where essentially everyone answers none or very few of the items and the scores cluster at the bottom of the distribution).
Ceiling effect: Clustering of scores on a measure at the high end of the possible scale values; creates difficulty in evaluating group differences.
Floor effect: Clustering of scores on a measure at the low end of the possible scale values; typically linked to the difficulty of the assessment; creates difficulty in evaluating group differences.
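To make the pilot-test check concrete, here is a minimal sketch in Python of how you might screen pilot scores for ceiling or floor effects. The scores, the maximum value, and the 75% cutoff are illustrative assumptions, not values prescribed in the text.

```python
# Minimal sketch (illustrative assumptions): screening pilot-test scores
# for ceiling or floor effects.
import numpy as np

# Hypothetical pilot data: number of problems solved (out of 10) by 8 pilot participants
max_score = 10
pilot_scores = np.array([9, 10, 10, 8, 10, 9, 10, 10])

prop_at_top = np.mean(pilot_scores >= 0.9 * max_score)      # share of near-perfect scores
prop_at_bottom = np.mean(pilot_scores <= 0.1 * max_score)   # share of near-zero scores

if prop_at_top > 0.75:
    print("Possible ceiling effect: most scores cluster at the top; consider harder items.")
elif prop_at_bottom > 0.75:
    print("Possible floor effect: most scores cluster at the bottom; consider easier items.")
else:
    print("Scores are spread out; the task difficulty looks workable.")
```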
Related to the pilot test is the notion of a manipulation check. Imagine you were conducting research on the effect of hair color on hirability and you used an application that takes a photo and changes the hair color of the person in it. Apple, Inc. (Cupertino, California) offers such an application, which a student used in a research project (https://itunes.apple.com/us/app/hair-color/id485420312). If your hypothesis is that blondes are rated differently from brunettes or redheads, you need to be sure that people can discern the hair color and pay attention to the hair color they see. If not, your manipulation is unlikely to be effective and you will not have adequately tested your hypothesis. At the end of your study, you would include a question asking the participants to indicate the hair color of the person they rated.
If a manipulation check does not reflect the expected differences between conditions, you have no way of knowing whether participants failed to attend to the targeted information in your conditions when they made their responses or attended to it but were not influenced by it. One recommendation for avoiding this situation is to run a pilot test and analyze the manipulation check question(s) at that time. If you have already started running the study, you could analyze the manipulation check question(s) fairly early on, when about five to seven participants per condition have completed the study. You should be able to get a sense of whether the manipulation check reveals the expected group differences. For example, if you varied hair color (blond, brunette) in a study of its effect on hirability, your manipulation check question might be, “What was the hair color of the woman in the photo?” If most respondents say something that is on track (e.g., for the brunette condition, acceptable responses might be brown, dark hair, or brunette), you can assume that they perceived the hair color.
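As an illustration of that early check, here is a minimal sketch in Python of how you might tally manipulation check answers by condition after the first several participants. The responses and the set of “on track” answers are hypothetical assumptions for the hair-color example, not data from the text.

```python
# Minimal sketch (hypothetical data): tallying manipulation check answers by condition.
from collections import Counter

# Early data: (condition, reported hair color) for each participant
responses = [
    ("brunette", "brown"), ("brunette", "dark hair"), ("brunette", "blond"),
    ("brunette", "brunette"), ("brunette", "brown"),
    ("blond", "blond"), ("blond", "blonde"), ("blond", "light hair"),
    ("blond", "blond"), ("blond", "brown"),
]

# Answers counted as "on track" for each condition
acceptable = {
    "brunette": {"brown", "dark hair", "brunette"},
    "blond": {"blond", "blonde", "light hair"},
}

for condition in ("brunette", "blond"):
    answers = [ans for cond, ans in responses if cond == condition]
    on_track = sum(ans in acceptable[condition] for ans in answers)
    print(f"{condition}: {on_track}/{len(answers)} on track; answers = {Counter(answers)}")
```

If most answers in each condition fall in the acceptable set, you can be reasonably confident that participants perceived the manipulated hair color.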
Summary of Additional Threats to Internal Validity
Campbell and Stanley’s (1963) list (history, maturation, testing, instrumentation, statistical regression, differential selection, experimental mortality, selection–maturation interaction):
Demand characteristics
Beliefs/attitudes of participants
Effectiveness of cover story
Effectiveness of manipulation checks
Try This Now 3.4
How would you describe your role attitude as a participant? Have you tried to guess the hypotheses of experiments in which you have taken part?
Revisit and Respond 3.4
Of the threats to internal validity listed by Campbell and Stanley (1963), where do you have the most control? The least control?
Give an example of a demand characteristic.
What are the different role attitudes participants can have, according to Adair (1973)?
What is the difference between a cover story and a manipulation check?
External Validity and Ecological Validity
A good deal of attention has been paid here to internal validity; let’s turn our attention outward, to external validity and ecological validity (they are not the same). External validity is the ability to apply the results of research more broadly, that is, beyond the sample used in a particular study. Generalizability is a major emphasis of external validity.
External validity: The ability to apply the results of research more broadly—beyond the sample used in a particular study.
Ecological validity: Validity in which the emphasis is on the degree of representativeness of the research to the events and characteristics of real life.
External validity needs to be distinguished from ecological validity. In ecological validity, the emphasis is on the degree of verisimilitude (i.e., lifelikeness) of the research to the events and characteristics of real life. The two concepts overlap in that both relate to situations beyond the immediate research setting, but ecological validity specifically emphasizes the degree to which the research situation reflects real life, which is not necessarily the case for external validity.
There is tension between external validity and ecological validity in the sense that when you emphasize the discovery of generalizable principles, experimental control is paramount (Banaji & Crowder, 1989). On the other hand, when you emphasize whether a finding occurs in a particular way in the real world, your concern is with natural settings, which typically lend themselves to reduced experimental control (Neisser, 1978). Different methods are typically associated with external and ecological validity. In the case of external validity, laboratory research is emphasized; in the case of ecological validity, field studies are often the focus. These different strategies