Study Design
The study design process includes all clinical, molecular, bioinformatics, and analytic components of your study. A study population must be defined, along with the process for selecting individuals from that population for study participation (ascertainment) and the criteria for identifying individuals who exhibit the trait of interest (phenotype definition). One must determine the molecular technology to be used (e.g. genome‐wide association study versus rare variant gene burden testing), establish the electronic process for storing and retrieving the clinical and molecular data, and choose the analysis methods. Finally, one must ensure that all ethical, legal, and social issues are addressed. Each component of the study design process deserves careful consideration, because the decisions made at each step affect the conclusions that can be drawn from the results. Furthermore, many of these decisions are interrelated; for example, the ascertainment process determines which analysis methods can be performed. Because many of the steps in the study design process are covered in detail in other chapters of this book, they are not discussed further here. This chapter covers only the selection of a study population and an ascertainment scheme, topics that are expanded upon in Chapter 4.
Selecting a Study Population
In a well‐designed study, the study population is clearly defined and individuals from it are systematically selected for participation. The two primary types of study population are population‐based and clinic‐based. A population‐based study is the gold standard, but a clinic‐based design is more practical for many genetic studies.
Population‐Based
Ideally, one selects a study population that is an unambiguous subset of a larger underlying population. Examples include individuals selected from a state newborn screening registry (e.g. determining the frequency of congenital hypothyroidism in individuals from the California Newborn Screening Program (Waller et al. 2000)), children who attend a particular school system (e.g. screening for FRAXA and FRAXE in a special needs population (Meadows et al. 1996)), and individuals in a local cancer registry (e.g. examining the genetic contribution to fallopian tube cancer in patients identified from the Ontario Cancer Registry (Aziz et al. 2001)). This epidemiologic approach to selecting a study population is much less susceptible to ascertainment bias than a clinic‐based approach, and conclusions drawn from the analysis can usually be extended to the larger population from which the study population was drawn. The disadvantage of this approach is that it may be difficult to achieve the desired sample size for rare disorders, and even for common disorders, identifying eligible participants can be quite laborious. For example, for a condition with a frequency of 1/1 000, one would need to screen 100 000 individuals to identify even 100 cases, a rather small sample size.
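The screening arithmetic in the preceding paragraph can be sketched in a few lines of Python. This is an illustrative model only, not a method from the cited studies: it assumes each screened individual is independently affected with the stated frequency, so the number of cases found follows a binomial distribution. The function names and the 95% target are hypothetical choices made for the example.

```python
import math


def expected_cases(n_screened: int, frequency: float) -> float:
    """Expected number of affected individuals among n_screened people."""
    return n_screened * frequency


def prob_at_least(k: int, n_screened: int, frequency: float) -> float:
    """P(X >= k) for X ~ Binomial(n_screened, frequency).

    Assumption (hypothetical model): cases occur independently with a fixed
    frequency. Terms are computed on the log scale with lgamma so that large
    n_screened does not overflow.
    """
    if k <= 0:
        return 1.0
    if n_screened < k:
        return 0.0
    log_p = math.log(frequency)
    log_q = math.log1p(-frequency)
    log_n_fact = math.lgamma(n_screened + 1)
    below = 0.0
    for i in range(k):  # accumulate P(X = i) for i = 0 .. k-1
        log_pmf = (log_n_fact - math.lgamma(i + 1) - math.lgamma(n_screened - i + 1)
                   + i * log_p + (n_screened - i) * log_q)
        below += math.exp(log_pmf)
    return max(0.0, 1.0 - below)


def screening_needed(target_cases: int, frequency: float, confidence: float = 0.95) -> int:
    """Smallest n with P(at least target_cases affected) >= confidence.

    Uses binary search, which is valid because the tail probability
    increases with the number of people screened.
    """
    lo, hi = target_cases, max(target_cases, 1)
    while prob_at_least(target_cases, hi, frequency) < confidence:
        hi *= 2  # grow the upper bound until it satisfies the target
    while lo < hi:
        mid = (lo + hi) // 2
        if prob_at_least(target_cases, mid, frequency) >= confidence:
            hi = mid
        else:
            lo = mid + 1
    return lo


if __name__ == "__main__":
    freq = 1 / 1000
    print(expected_cases(100_000, freq))
    print(round(prob_at_least(100, 100_000, freq), 3))
    print(screening_needed(100, freq))
```

Under this model, screening 100 000 individuals yields 100 cases on average, but only about a 50% chance of actually reaching 100; a 95% chance requires screening noticeably more people (roughly 117 000). The binomial model ignores clustering of cases and incomplete participation, both of which would push the required number higher.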
Clinic‐Based
In practice, most genetic studies rely on clinic‐based study populations because they give investigators faster access to patients, and there are numerous examples of this design. Generally, patients are selected for participation from existing patient populations, such as a specialty clinic (e.g. examining the phenotypic heterogeneity of age‐related macular degeneration in patients selected from ophthalmology clinics at two academic centers (De La Paz et al. 1997)). This approach requires less effort for the initial identification of patients than the population‐based approach. However, it can be quite challenging to determine the larger underlying population from which a clinic population was drawn, because most large academic medical centers attract patients not only from the immediate geographic region but often from all over the world. For this reason, identifying appropriate control subjects for clinic‐based studies can be difficult, as discussed later in this chapter. A further complication is that patients ascertained through a clinic may exhibit a more severe form of the condition, simply because they have sought medical treatment or have been referred for expert diagnosis and care. For example, some individuals with myotonic dystrophy have cataracts as their only phenotypic manifestation (Meola 2013) and therefore seldom seek medical attention, whereas others are severely affected and seek treatment regularly. Consequently, ascertaining patients with myotonic dystrophy from a neurology clinic would likely over‐sample the most severely affected individuals and miss those with the milder, cataract‐only form of the condition.
Thus, while clinic‐based populations provide fast access to desired patients, results obtained in such studies may not be applicable to the general population, or even to all individuals with the condition.
Ascertainment
There are three basic designs for case ascertainment in a genetic analysis of a binary (present/absent) trait: collection of single affected individuals (cases), relative pairs from a family, and extended families with multiple affected and unaffected individuals. Examples of these ascertainment schemes are shown in Figure 3.1. As discussed below, certain sampling schemes limit the types of analyses that can be performed. However, the sampling scheme is often dictated by the natural history of the condition under investigation. For example, with late‐onset disorders such as Alzheimer disease, Parkinson disease, and chronic obstructive pulmonary disease, collecting the parents of an affected individual is often not feasible because the parents are frequently deceased. In such cases, one may be restricted to collecting affected sibpairs or a case‐control sample.
Figure 3.1 Ascertainment schemes for genetic analysis.
Single Affected Individual
Collection of a single affected individual can take place in the context of the traditional epidemiologic cross‐sectional, case‐control, and cohort designs, as well as the case‐parent trio design. The case‐parent trio design is used primarily in family‐based association analysis and involves collecting a case and both of their parents. An alternative to the case‐control and trio designs is the case‐only design (Khoury and Flanders 1996). Like the trio design, the case‐only approach arose from concerns about selecting appropriate controls for the study of genetic factors in the traditional case‐control approach. The case‐only approach has been promoted as particularly useful for examining gene–environment interactions (Piegorsch et al. 1994). From the ascertainment perspective, collecting single affected individuals is more feasible because, in complex disorders, large families with multiple affected individuals are often difficult to identify. A disadvantage of this approach is that it restricts the statistical genetic analyses to specific types of association methods (discussed in more detail in Chapter 8). Traditional linkage analysis (described in more detail in Chapter 6) usually cannot be performed on a case‐control or trio data set, because the family structure that most genetic models require is not available.
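To illustrate why the case‐only design suits gene–environment studies, the standard case‐only result (associated with Piegorsch et al. 1994) is sketched below under its usual assumptions, which are stated here rather than taken from this chapter. Take a binary genotype G and a binary exposure E, and let n_ge be the number of cases with genotype status g (1 = carrier) and exposure status e (1 = exposed). The G–E odds ratio computed among cases alone then approximates the multiplicative interaction ψ, provided that G and E are independent in the source population and the disease is rare:

```latex
% OR_G, OR_E, OR_GE: disease odds ratios for G only, E only, and both,
% each relative to individuals with neither factor.
% Approximation assumes G independent of E in the source population and a rare disease.
\mathrm{OR}_{\text{case-only}}
  = \frac{n_{11}\, n_{00}}{n_{10}\, n_{01}}
  \approx \psi
  = \frac{\mathrm{OR}_{GE}}{\mathrm{OR}_{G}\,\mathrm{OR}_{E}}
```

If G and E are associated in the source population, the case‐only estimate is biased, which is the main caveat of the design.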
Relative Pairs
The use of relative pairs has been a common ascertainment design in the genetic analysis of complex disorders. This approach may include the use of sibling pairs that are either concordant for the disorder (affected sibpairs) or discordant (one affected and one unaffected). Monozygotic (identical) and dizygotic (nonidentical) twins are a special case of sibling pairs, and the utility of twins in genetic analysis is described