Extended Families
Extended families are large families with many affected individuals across several generations. This study design is optimal for traditional linkage analysis, but such families are rare in complex disorders. If such a family is identified, it is possible that the genetic liability in this particular family is due to a single gene, rather than a more complex etiology. Such a family would provide a unique opportunity to localize a single gene that has a large effect on disease risk in that family but may have a more moderate effect on disease etiology in the general population. Advances in high‐throughput sequencing technologies (described further in Chapter 10) have made genome sequencing of small numbers of affected family members feasible, allowing direct examination of the segregation of variants with disease in these pedigrees (described further in Chapter 6). Association methods may also be used with extended families. However, one must ensure that the association method either accounts for within‐family dependence (such as the Pedigree Disequilibrium Test (Martin et al. 2000b) or GenABEL (Aulchenko et al. 2007)) or uses only one affected individual from each family in the analysis. A special case can be made for analyzing X‐linked variants within families (Choi et al. 2016; Turkmen and Lin 2020).
There are also variations on these three ascertainment schemes. For example, in an analysis of breast cancer in Australia, Hopper and colleagues (1999) employed a “case‐control‐family” design. In this approach, the cases and controls were selected first and subsequently additional family members were recruited based on the family history. If applied correctly, this approach will have the analytic advantages of a family study, and the results can be placed in the context of an epidemiologic study. Statistical issues associated with this design have been reviewed by other investigators (Liang and Pulver 1996; Seybolt et al. 1997) and will not be discussed here.
Many investigators have explored sampling schemes to determine the optimal ascertainment scheme for genetic analysis of complex disorders. McCarthy and colleagues (1998) considered sampling strategies for affected sibpairs and found that the power to detect a disease gene locus is highly dependent on the larger pedigree structure from which the sibpairs were drawn. Furthermore, they concluded that imposing a few restrictions on that pedigree structure (such as the presence of at least one unaffected sibling or parent) can provide a modest increase in power, and that ascertaining random affected sibpairs (regardless of the larger pedigree structure) tends to be a robust approach under a variety of genetic inheritance models. The advantage of restricting the pedigree structure to at most one affected parent is that it reduces the possibility of bilineality in the pedigree. Terwilliger and Goring (2000) have argued that, even for complex disorders, ascertainment of large pedigrees is a more successful approach for genetic analysis than a case‐control approach, because large pedigrees increase the likelihood of genetic homogeneity. In addition, once large pedigrees have been ascertained, one has more flexibility with regard to the types of analyses that may be performed. For example, one can analyze the entire pedigree for linkage analysis, and also, by breaking the family structure into smaller units, consider affected sibpair, affected relative pair, or trio approaches as complementary methods for identifying the disease genes. Badner et al. (1998), however, suggest that there is no benefit to collecting large pedigrees under certain genetic models (a qualitative trait with common alleles under single locus, additive and multiplicative inheritance models). In spite of the many elegant theoretical considerations of sampling schemes, there does not appear to be any consensus with regard to an optimal sampling scheme for complex disorders (Baron 1999).
The optimal study design for a particular condition is influenced by the underlying genetic model, which is unknown in complex diseases. Consequently, the choice of ascertainment design will be determined primarily by the natural history of the condition under investigation and the available resources (both financial and personnel) rather than theoretical concerns.
Healthy or Unaffected Controls
For some analyses, it is necessary to have control samples to use for comparison with the patient samples. These control samples may include spouses and siblings of affected individuals, classmates, other members of the community, or even untransmitted genetic alleles. Regardless of the relationship of the control sample to the patient sample, one must ensure that the controls are ascertained from the same study population as the patients. Furthermore, the controls can be matched to the patients for confounding factors (any factor that might influence the association between the disease and genotype), such as age, sex, ethnicity, and geographic location. There are two approaches for matching controls to the cases. First, one can select controls such that the overall distribution of cases and controls is comparable with respect to the frequency of the confounders (e.g. for a study of autism spectrum disorders, both cases and controls have a sex ratio of 3 : 1 males to females). This is referred to as frequency or category matching. Alternatively, one or more control individuals may be selected to match each case based on the confounding characteristics (e.g. the case and the control are both African‐American females, eight years of age, and reside in Durham County, North Carolina). This approach is called individual matching. An alternative to matching is to consider these potential confounders in statistical analyses, although this may be a less statistically powerful approach. With the increasing availability of publicly accessible data sets, it has become feasible to utilize existing controls, so long as there is careful consideration of the potential confounding factors. A landmark study by The Wellcome Trust Case Control Consortium (2007) was the first to robustly demonstrate the use of a common set of controls for identifying genetic factors associated with multiple conditions. Subsequently, it has become commonplace to utilize common, publicly available control samples.
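The two matching approaches described above can be illustrated with a minimal Python sketch. The function names (frequency_match, individual_match), the dictionary record layout, and the use of a single exact-match key are assumptions made for illustration only; a real study would typically match on several confounders and may allow tolerances (for example, age within a few years).

```python
import random
from collections import Counter, defaultdict


def _group_by(pool, key):
    """Group candidate controls by their confounder stratum."""
    groups = defaultdict(list)
    for person in pool:
        groups[key(person)].append(person)
    return groups


def frequency_match(cases, pool, key, ratio=1, seed=0):
    """Frequency (category) matching: draw controls so that each stratum
    holds `ratio` controls per case, reproducing the case distribution."""
    rng = random.Random(seed)
    groups = _group_by(pool, key)
    controls = []
    for stratum, n_cases in Counter(key(c) for c in cases).items():
        candidates = groups[stratum]
        # Take as many as needed, or all that are available if too few.
        k = min(len(candidates), n_cases * ratio)
        controls.extend(rng.sample(candidates, k))
    return controls


def individual_match(cases, pool, key, seed=0):
    """Individual matching: pair each case with one unused control that
    has the same confounder values; None if no control remains."""
    rng = random.Random(seed)
    groups = _group_by(pool, key)
    for candidates in groups.values():
        rng.shuffle(candidates)
    pairs = []
    for case in cases:
        candidates = groups[key(case)]
        pairs.append((case, candidates.pop() if candidates else None))
    return pairs


# Hypothetical records: three male cases and one female case (3 : 1).
cases = [{"id": i, "sex": s} for i, s in enumerate("MMMF")]
pool = [{"id": 100 + i, "sex": s} for i, s in enumerate("MFMFMFMF")]
controls = frequency_match(cases, pool, key=lambda p: p["sex"])
pairs = individual_match(cases, pool, key=lambda p: p["sex"])
```

In frequency matching only the stratum totals are constrained, whereas individual matching retains the case-control pairing, which a matched analysis can then exploit.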
It is important to keep in mind that improper selection of controls can lead to incorrect conclusions. For example, if cases and controls are not appropriately matched on ethnicity and the frequency of alleles for the genetic marker differs by ethnicity, an association study can yield spurious results. One may falsely conclude an association between a genetic marker and the condition if the “at‐risk” marker allele is more prevalent in the predominant ethnicity of the cases versus the controls (e.g. Knowler et al. 1988). Examples of population stratification and approaches for its detection and control of its effects are described further in Chapter 8.
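A small worked example may make the mechanism concrete. The counts below are entirely hypothetical (they are not taken from Knowler et al. 1988 or any other study): two groups, labeled A and B, differ in both marker allele frequency and in their share of cases, yet within each group the marker is unrelated to disease. Pooling the groups without regard to group membership produces an apparent association.

```python
def odds_ratio(case_carriers, case_noncarriers, ctrl_carriers, ctrl_noncarriers):
    """Odds ratio for carrying the marker allele, cases versus controls."""
    return (case_carriers * ctrl_noncarriers) / (case_noncarriers * ctrl_carriers)


# Hypothetical counts per group, in the order:
# (case carriers, case noncarriers, control carriers, control noncarriers).
# Group A: 80% carriers, contributes most of the cases.
# Group B: 20% carriers, contributes most of the controls.
strata = {
    "A": (640, 160, 160, 40),
    "B": (40, 160, 160, 640),
}

# Within each group, carriers are equally common among cases and controls.
within = {name: odds_ratio(*table) for name, table in strata.items()}

# Ignoring group membership sums the tables column by column.
pooled_table = tuple(sum(column) for column in zip(*strata.values()))
pooled = odds_ratio(*pooled_table)

print(within)   # odds ratio of 1.0 in both groups
print(pooled)   # pooled odds ratio of about 4.5, a spurious association
```

The pooled odds ratio departs from 1 only because the cases are drawn mostly from the group in which the allele is common; matching or stratifying on group removes the effect in this example.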
Ascertainment Bias
In genetic studies, research subjects are selected for participation based on the presence or absence of the trait of interest. The family member who comes to the investigator’s attention (through admission to a hospital or solicitation of support groups, for example) is called the proband. Most often, the proband is an individual who exhibits the trait of interest. Ascertainment through an affected individual can bias the distribution of the numbers of affected and unaffected family members present in the analysis. Because the ascertainment scheme requires that the family have at least one affected individual (the proband), families that carry the genetic liability of interest but, by chance, do not contain an affected family member will not be ascertained. This phenomenon is referred to as ascertainment bias and is demonstrated in Figure 3.2. Depending on the type of analysis, ascertainment bias may greatly influence the results.
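The effect can be illustrated with a short simulation. This is a sketch under stated assumptions, not a reproduction of Figure 3.2: every family is assumed to have three children, each child is assumed to be affected independently with probability 0.25 (as for a fully penetrant autosomal recessive trait when both parents are carriers), and every family with at least one affected child is assumed to be ascertained. The function names are illustrative.

```python
import random


def simulate_sibships(n_families, sibship_size, risk, seed=0):
    """Return the number of affected children in each simulated family."""
    rng = random.Random(seed)
    return [
        sum(rng.random() < risk for _ in range(sibship_size))
        for _ in range(n_families)
    ]


def proportion_affected(affected_counts, sibship_size):
    """Naive estimate: affected children divided by all children."""
    return sum(affected_counts) / (len(affected_counts) * sibship_size)


def expected_ascertained_proportion(risk, sibship_size):
    """Expected naive estimate when families with no affected child are
    never observed: risk / (1 - (1 - risk) ** sibship_size)."""
    return risk / (1 - (1 - risk) ** sibship_size)


SIBSHIP_SIZE = 3   # assumed for illustration
RISK = 0.25        # assumed per-child risk

all_families = simulate_sibships(100_000, SIBSHIP_SIZE, RISK)
# Ascertainment through an affected individual drops families with none.
ascertained = [k for k in all_families if k >= 1]

print(proportion_affected(all_families, SIBSHIP_SIZE))   # close to 0.25
print(proportion_affected(ascertained, SIBSHIP_SIZE))    # inflated
print(expected_ascertained_proportion(RISK, SIBSHIP_SIZE))
```

Under these assumptions the proportion of affected children among ascertained families is expected to be about 0.43 rather than 0.25, because the families with no affected children (roughly 42% of at‐risk families of this size) never enter the sample.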
In general, ascertainment bias should not affect the ability to accept or reject linkage in linkage analysis (Chapter 6). However, it can affect the estimate of the recombination fraction