Furthermore, program staff can be wonderfully creative in blending local procedures with randomization in order to ensure that they are serving their target populations while preserving the experiment’s integrity. For example, the U.S. Department of Health and Human Services’s Family and Youth Services Bureau (FYSB) is operating an evaluation of a homeless youth program called the Transitional Living Program (Walker, Copson, de Sousa, McCall, & Santucci, 2019; U.S. Department of Health and Human Services (DHHS), n.d.a). The evaluation worked with program staff to help them use their existing needs-assessment tools to prioritize youth for the program in conjunction with a randomization process that considers those preferences: It is a win-win arrangement. Related scholarship has established procedures for embedding preferences within randomization (Olsen, Bell, & Nichols, 2017), ensuring the technical aspects of the approach as well as mitigating program concerns about ethics.
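To make the idea concrete, one simple way to let staff priorities shape random assignment is to give higher-priority applicants a higher, but still less than certain, chance of being offered the program. The sketch below is illustrative only: it is not the procedure used in the Transitional Living Program evaluation, nor necessarily the one developed by Olsen, Bell, and Nichols (2017). The function name, the tier labels, and the probabilities are hypothetical. Because assignment probabilities differ by tier, the sketch also records an inverse-probability weight for each person, which an analyst would typically need in order to combine tiers when estimating impacts.

```python
import random


def assign_with_priority(applicants, treat_prob_by_tier, seed=0):
    """Randomly assign applicants, with assignment odds set by priority tier.

    applicants: list of (applicant_id, tier) pairs; the tier comes from the
        program's own needs assessment (hypothetical labels here).
    treat_prob_by_tier: dict mapping tier -> probability of a program offer.
        Every probability must lie strictly between 0 and 1 so that each
        tier contributes members to both treatment and control groups.
    """
    rng = random.Random(seed)  # fixed seed keeps the assignment auditable
    assignments = {}
    for applicant_id, tier in applicants:
        p = treat_prob_by_tier[tier]
        if not 0 < p < 1:
            raise ValueError(f"tier {tier!r} needs a probability in (0, 1)")
        treated = rng.random() < p
        # Inverse-probability weight: corrects for unequal odds across tiers.
        weight = 1 / p if treated else 1 / (1 - p)
        assignments[applicant_id] = {
            "tier": tier,
            "treated": treated,
            "weight": weight,
        }
    return assignments


if __name__ == "__main__":
    # Hypothetical applicant pool: staff rate half as high priority.
    pool = [(i, "high" if i % 2 == 0 else "standard") for i in range(10)]
    odds = {"high": 0.8, "standard": 0.4}  # assumed values for illustration
    for applicant_id, record in assign_with_priority(pool, odds).items():
        print(applicant_id, record)
```

The design choice worth noting is that no tier is assigned with certainty. Staff preferences raise the odds for the youth they judge to be in greatest need, while chance still determines each individual outcome, which is what preserves the experiment's integrity.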
Even if control group members either are perceived to be or actually are disadvantaged, random assignment still might not be unethical (Blustein, 2005). For example, society benefits from accurate information about program effectiveness and, accordingly, research may be justified in allowing some citizens to be temporarily disadvantaged in order to gather information to achieve wider benefits for many (e.g., Slavin, 2013). Society regularly disadvantages individuals based on government policy decisions undertaken for nonresearch reasons. High-occupancy vehicle (HOV) lanes are an everyday example: They disadvantage solo commuters to the benefit of carpoolers. Unlike an evaluation’s control group exclusions, those policy decisions (such as establishing HOV lanes) are permanent, not temporary.
In an example from the private sector, Meyer (2015) argues that managers who engage in A/B testing—where staff are subjected to alternative policies—without the consent of their employees operate more ethically than those who implement a policy change without evidence to support that change. Indeed, the latter seems “more likely to exploit her position of power over users or employees, to treat them as mere means to the corporation’s ends, and to deprive them of information necessary for them to make a considered judgment about what is in their best interests” (Meyer, 2015, p. 279).
Moreover, in a world of scarce resources, I argue that it is unethical to continue to operate ineffective programs. Resources should be directed toward program improvement (or in some cases termination) when evidence suggests that a program is not generating desired impacts. From this alternative perspective, it is unethical not to use rigorous impact evaluation to provide strong evidence to guide spending decisions.
It is worth noting that policy experiments are in widespread use, signaling that society has already judged them to be ethically acceptable. Of course, it is always essential to ensure the ethics of evaluation research, not only in terms of design but also in terms of treatment of research participants. Moreover, I acknowledge that there are instances where it is clearly unethical (in part because it may also be illegal) to randomize an individual out of a program. For example, entitlement programs in the U.S. entitle people to a benefit, and that entitlement cannot and should not be denied, even for what might be valuable research reasons. That does not imply, however, that we cannot or should not continue to learn about the effectiveness of entitlement programs. Instead, the kinds of questions that we ask about them are different from “Do they work?” That is, the focus is less on the overall, average treatment effects and more on the impact variation that arises from variation in program design or implementation. For instance, we might be interested to know what level of assistance is most effective for achieving certain goals. A recent example of this involves the U.S. Department of Agriculture’s extension of children’s food assistance into the summer. The Summer Electronic Benefits Transfer for Children (SEBTC) Demonstration, which replaced no summer cash/near-cash assistance with a stipend of $30 or $60 per month, is indeed an ethical (and creative) way to ascertain whether such assistance reduces hunger among vulnerable children when school is out of session (Collins et al., 2016; Klerman, Wolf, Collins, Bell, & Briefel, 2017).
This leads to my final point about ethics. Much of the general concern is about randomizing individuals into a “no services” control group. But, as the remainder of this book elaborates, conceiving the control group that way is unnecessary. Increasingly, experimental evaluation designs are being used to compare alternative treatments to one another rather than compare some stand-alone treatment to nothing. As such, concerns about ethics are much assuaged. As we try to figure out whether Program A is better or worse than Program B, or whether a program should be configured this way or that way, eligible individuals get access to something. When research shows which “something” is the better option, then all individuals can begin to be served through that better program option.
What This Book Covers
This book considers a range of experimental evaluation designs, highlighting their flexibility to accommodate a range of applied questions of interest to program managers. These questions about impact variation (that is, about what drives successful programs) have tended to be outside the purview of experimental evaluations. Historically, they have been under the purview of nonexperimental approaches to impact evaluation, including theory-driven evaluation, case-based designs, and other descriptive or correlational analytical strategies. It is my contention that experimental evaluation designs, contrary to what many evaluators believe, can be used to address what works, for whom, and under what circumstances.
It is my hope that the designs discussed will motivate their greater use for program improvement for the betterment of humankind.
Why a focus on experimental evaluation? I focus on experimental evaluation because of its relative importance to funders, its ability to establish causal evidence, and its increasing flexibility to answer questions addressing more than the average treatment effect.
Why a focus on experimental evaluation designs? I focus on experimental evaluation designs because (1) alternative, nonexperimental designs are covered in other texts, and (2) many analytic strategies aimed at uncovering insights about “black box” mechanisms necessitate specialized analytic training that is beyond the scope of this book.
Why not a focus on nonexperimental designs and analysis strategies? There is substantial, active research in the “design replication” (or “within-study comparison”) literature that considers the conditions under which nonexperimental designs can produce the same results as an experimental evaluation. As with advanced analytic strategies, it is beyond the scope of this book to offer details, let alone a primer, on the many, varied nonexperimental evaluation designs. Suffice it to say that those designs exist and are the subjects of other books.
Using experimental evaluation designs to answer “black box” type questions—what works, for whom, and under what circumstances—holds substantial promise. Making a shift from thinking about a denied control group toward thinking about comparative and enhanced treatments opens opportunities for connecting experimental evaluation designs to the practice of program management and evidence-based improvement efforts.
The book is organized as follows: After this Introduction, Chapter 2 suggests a conceptual framework, building from the well-known program logic model and extending that to an evaluation logic model. Chapter 3 offers an introduction to the two-group experimental evaluation design. As the center of the book, Chapter